Monday, November 2, 2009

Analysis of a word game

There is a word game which is played like this: When it's your turn, you say a word that starts with the last letter of the previously said word as quickly as you can. The player to start the game can choose any word to say. This is a noncompetitive game, and there is no real winner or loser, you just play until you don't feel like playing anymore. I used to play this with my mom when I was young.

I played this game with a few friends earlier in the semester, and during the game we noticed that we kept saying words that end with 'E', and soon enough we began having difficulty thinking of words that start with 'E'.

I got to thinking about the patterns we encountered and decided to take a look at what Wikipedia knows about letter frequencies:
Wikipedia - Relative frequencies of letters in the English language

The article shows that 'E' is the most common letter in the English vocabulary, followed in second place by 'T'. However, this only describes the frequency of appearance of a letter, irrelevant of a letter's location within the word.

I was specifically interested in pinpointing the frequencies of letters appearing at the beginning and at the end of words, so I decided to do some quick scripting on a dictionary file. I got the following results on a ~32,000 word dictionary file (Source: http://wordlist.sourceforge.net/):

Top 5 letters at the END of a word:
E (5866)
N (2907)
T (2865)
Y (2772)
R (2026)

This seems to follow, somewhat, with the data in the Wikipedia article: E is the most common appearing letter, and T is very close to second place.

Top 5 letters at the BEGINNING of a word:
S (3590)
C (2921)
P (2481)
A (1886)
D (1855)

'E' is nowhere near being in this Top 5 list. I don't show it here, but it's in 13th place with a measly count of 1205. That means that there are around: 5866/1205 = ~5 times as many words that end with E than start with E.

This data gives some indication as to why we had a lot of words ending with E, and why we had more trouble thinking of words starting with E.

Of course, if we look at absolute numbers, 1205 words beginning with E is more than enough to choose from, so I guess we don't have too good of an excuse.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Some more fun letters to compare:
K-end (621) vs. K-start (209). K words are pretty tough.

The situation is even worse for X...
X-end (120) vs X-start (9). Wow, only 9! I could spend some time learning words that end with X and then no one will ever want to play this game with me again.

Z is pretty cool. Unlike most of the other letters, more words start with Z (48) than end with Z (30).

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Source code (PHP) for the code script can be found here (includes the dictionary file - but please be aware of its original source):
http://www.romanstolper.com/files/lettercount.zip

No comments:

Post a Comment