Bad behavior from random()

Background

I’ve been warned over the years to be skeptical of random number generators (or more properly, pseudo-random number generators). Way back in the day when I was first starting out doing C coding, I used to use a function called rand() for my random numbers. It was known to be pretty bad — the low bit would oscillated between 0 and 1 (i.e. even and odd numbers), and in general the low several bits were known to be periodic. But the upper bits were supposedly not too terrible.

At some point, I switched to random(), which was apparently much better. It sure seemed fine for my purposes. I used random() on Sun Workstations, then in Linux for many years, and more recently on Mac OS-X from about 10.5 until 10.8. I used it mostly for stochastic spatial lattice models I work with, but also for patch models or community-structured epidemiological models. All of my research simulations were done using random().

Then, one of my students Isaac was learning C, and coded up one of my household models. We ran it using parameter values which would produce a known result. The results we got were correct to several decimal places, but by luck we happened to notice that among several runs we did, the results always came out just slightly above the expected values, and never below them. By symmetry, it seemed the simulations should be equally likely to be above and below the theoretical expected value.

Further exploration showed that in fact random() was exhibiting some odd behavior, but only when called in the particular patterns in which we used it. Namely, inside of our main iteration loop, we would call random() twice. Depending on the result of the second call, we would either call random() once more or three more times (to process a death event or birth event in the simulation). Changing the structure of the code in any way that altered the number of calls to random() in various parts of the loop made the problem go away.

We then tried the Mersenne Twister Prime random number generator (we got a copy from here), which made the problem go away. Perhaps there is some other basic simulation out there waiting to be written, which will demonstrate odd behavior with the twister prime. But for now, all of my simulations will use these random numbers instead of those from random().

Code to demonstrate the problem

In the randproblem.c code above, the main “if/else” statement should come up TRUE 1/4 of the time, and FALSE 3/4 of the time. That is, the ratio of frequencies of the two clauses should be 3 to 1. If you run the code, you’ll find that it is systematically slightly above 3. The randproblemMT.c version of the code uses the Mersenne Twister Prime RNG, and does not exhibit the problem.