Friday, November 6, 2009

Terrible idea for generating random numbers

The Operating Systems course I am taking has just finished going over threading and concurrency, and it introduced me to multiplexing and thread scheduling. For whatever reason, I was thinking about random number generation the other day, and I thought: what if random numbers could be produced by making use of the nondeterminism of thread concurrency?

My idea:

To produce a random integer from 1 to N, create one global variable and run N threads in parallel. Each thread has a unique ID number ranging from 1 to N, and at each iteration of the thread's while loop, it tries to write its ID number to the global variable. In this way, the global variable's value will keep changing nondeterministically. To get a random number, just read the value of the global variable.

Result:

It works! It's an utterly horrible way to generate random numbers, but hey, it was just an experiment.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~
UPDATE:

After running the program for around 10 hours, and generating random numbers from 0-9 (inclusive) at around 3-4 numbers per second (over 100,000 numbers), the average of all the numbers was around 4.38. This isn't bad considering the target average for this range of numbers is 4.5.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Source code (Java 1.6) can be found here:
http://www.romanstolper.com/files/RandomThreads.java

Monday, November 2, 2009

Analysis of a word game

There is a word game which is played like this: When it's your turn, you say a word that starts with the last letter of the previously said word as quickly as you can. The player to start the game can choose any word to say. This is a noncompetitive game, and there is no real winner or loser, you just play until you don't feel like playing anymore. I used to play this with my mom when I was young.

I played this game with a few friends earlier in the semester, and during the game we noticed that we kept saying words that end with 'E', and soon enough we began having difficulty thinking of words that start with 'E'.

I got to thinking about the patterns we encountered and decided to take a look at what Wikipedia knows about letter frequencies:
Wikipedia - Relative frequencies of letters in the English language

The article shows that 'E' is the most common letter in the English vocabulary, followed in second place by 'T'. However, this only describes the frequency of appearance of a letter, irrelevant of a letter's location within the word.

I was specifically interested in pinpointing the frequencies of letters appearing at the beginning and at the end of words, so I decided to do some quick scripting on a dictionary file. I got the following results on a ~32,000 word dictionary file (Source: http://wordlist.sourceforge.net/):

Top 5 letters at the END of a word:
E (5866)
N (2907)
T (2865)
Y (2772)
R (2026)

This seems to follow, somewhat, with the data in the Wikipedia article: E is the most common appearing letter, and T is very close to second place.

Top 5 letters at the BEGINNING of a word:
S (3590)
C (2921)
P (2481)
A (1886)
D (1855)

'E' is nowhere near being in this Top 5 list. I don't show it here, but it's in 13th place with a measly count of 1205. That means that there are around: 5866/1205 = ~5 times as many words that end with E than start with E.

This data gives some indication as to why we had a lot of words ending with E, and why we had more trouble thinking of words starting with E.

Of course, if we look at absolute numbers, 1205 words beginning with E is more than enough to choose from, so I guess we don't have too good of an excuse.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Some more fun letters to compare:
K-end (621) vs. K-start (209). K words are pretty tough.

The situation is even worse for X...
X-end (120) vs X-start (9). Wow, only 9! I could spend some time learning words that end with X and then no one will ever want to play this game with me again.

Z is pretty cool. Unlike most of the other letters, more words start with Z (48) than end with Z (30).

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Source code (PHP) for the code script can be found here (includes the dictionary file - but please be aware of its original source):
http://www.romanstolper.com/files/lettercount.zip