A Million Random Digits with 100,000 Normal Deviates

Foreword to the Online Edition

This book was a product of RAND's computing power (and patience). The tables of random numbers in the book have become a standard reference in engineering and econometrics textbooks and have been widely used in gaming and simulations that employ Monte Carlo trials. Still the largest known source of random digits and normal deviates, the work is routinely used by statisticians, physicists, polltakers, market analysts, lottery administrators, and quality control engineers.

A humorous sidelight: The New York Public Library originally indexed this book under the heading "Psychology."

Acknowledgments

The following persons participated in the production, testing, and preparation for publication of the tables of random digits and random normal deviates: Paul Armer, E. C. Bower, Mrs. Bernice Brown, G. W. Brown, Walter Frantz, J. J. Goodpasture, W. F. Gunning, Cecil Hastings, Olaf Helmer, M. L. Juncosa, J. D. Madden, A. M. Mood, R. T. Nash, J. D. Williams. These tables were prepared in connection with analyses done for the United States Air Force.

Introduction

Early in the course of research at The RAND Corporation a demand arose for random numbers; these were needed to solve problems of various kinds by experimental probability procedures, which have come to be called Monte Carlo methods. Many of the applications required a large supply of random digits or normal deviates of high quality, and the tables presented here were produced to meet those requirements. The numbers have been used extensively by research workers at RAND, and by many others, in the solution of a wide range of problems during the past seven years.

One distinguishing feature of the digit table is its size. On numerous RAND problems the largest existing table of Kendall and Smith (Ref. 1) would have had to be used many times over, with the consequent dangers of introducing unwanted correlations. The feasibility of working with as large a table as the present one resulted from developments in computing machinery which made possible the solving of very complicated distribution problems in a reasonable time by Monte Carlo methods. The tables were constructed primarily for use with punched card machines. With the high-speed electronic computers recently developed, the storage of such tables is usually not practical and, in fact, much larger tables than the present one are often required; these machines have caused research workers to turn to pseudo-random numbers which are computed by simple arithmetic processes directly by the machine as needed. These developments are summarized in Refs. 2, 3, and 4, where other references may be found. Refs. 5, 6, 7, and 8 discuss the uses and applications of Monte Carlo methods and give references to other applications.

Production of the Random Digits

The random digits in this book were produced by rerandomization of a basic table generated by an electronic roulette wheel. Briefly, a random frequency pulse source, providing on the average about 100,000 pulses per second, was gated about once per second by a constant frequency pulse. Pulse standardization circuits passed the pulses through a 5-place binary counter. In principle the machine was a 32-place roulette wheel which made, on the average, about 3000 revolutions per trial and produced one number per second. A binary-to-decimal converter was used which converted 20 of the 32 numbers (the other twelve were discarded) and retained only the final digit of two-digit numbers; this final digit was fed into an IBM punch to produce finally a punched card table of random digits.
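The counting scheme just described can be imitated in a few lines. This is only a sketch under stated assumptions: the pulse count per gate is modelled as Gaussian around 100,000, and the twelve discarded counter states are taken to be 20 through 31. Neither detail is given in the text, and the function names are hypothetical.

```python
import random

def roulette_digit(rng, mean_pulses=100_000):
    """Return one decimal digit, or None when the counter state is discarded."""
    # Assumption: pulses arriving in one gate interval are roughly Gaussian.
    pulses = max(0, round(rng.gauss(mean_pulses, mean_pulses ** 0.5)))
    state = pulses % 32      # 5-place binary counter, i.e. a 32-place wheel
    if state >= 20:          # assumption: states 20..31 are the twelve discarded
        return None
    return state % 10        # keep only the final digit of 0..19

def generate_digits(n, seed=0):
    """Collect n digits, skipping discarded trials."""
    rng = random.Random(seed)
    digits = []
    while len(digits) < n:
        d = roulette_digit(rng)
        if d is not None:
            digits.append(d)
    return digits

print(generate_digits(20))
```

Because 20 of the 32 states map onto the ten digits twice each, every digit is equally likely whenever the counter state itself is uniform.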

Production from the original machine showed statistically significant biases, and the engineers had to make several modifications and refinements of the circuits before production of apparently satisfactory numbers was achieved. The basic table of a million digits was then produced during May and June of 1947. This table was subjected to fairly exhaustive tests and it was found that it still contained small but statistically significant biases. For example, the following table[2] shows the results of three tests (described later) on two blocks of 125,000 digits:

                           Block 1                Block 2
                       Chi   Probability      Chi   Probability
Frequency (9 d.f.*)     6.0      .74          21.0      .02
Odd-even (1 d.f.)       3.0      .09           7.0     <.01
Serial (81 d.f.)       78.7      .55         105.6      .03
*The letters "d.f." (degrees of freedom) identify a parameter associated with the test. A discussion of the test may be found in any textbook on statistics.

Block 1 was produced immediately after a careful tune-up of the machine; Block 2 was produced after one month of continuous operation without adjustment. Apparently the machine had been running down despite the fact that periodic electronic checks indicated it had remained in good order.

The table was regarded as reasonably satisfactory because the deviations from expectations in the various tests were all very small--the largest being less than 2 per cent--and no further effort was made to generate better numbers with the machine. However, the table was transformed by adding pairs of digits modulo 10 in order to improve the distribution of the digits. There were 20,000 punched cards with 50 digits per card; each digit on a given card was added modulo 10 to the corresponding digit of the preceding card to yield a rerandomized digit. It is this transformed table which is published here and which is the subject of the tests described below.
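The card-by-card transformation can be sketched as follows. The text does not say how the first card was handled, nor whether the preceding card was taken before or after its own transformation; here the original cards are used throughout and the first card is paired with the last, both of which are assumptions.

```python
def rerandomize(cards):
    """Add each card, digit by digit and modulo 10, to the preceding card."""
    out = []
    for i, card in enumerate(cards):
        # Assumption: card 0 is paired with the final card (index -1).
        prev = cards[i - 1]
        out.append([(a + b) % 10 for a, b in zip(card, prev)])
    return out

# Three hypothetical 3-digit "cards" (the real cards held 50 digits each).
cards = [[1, 9, 4], [7, 3, 8], [5, 5, 0]]
print(rerandomize(cards))  # [[6, 4, 4], [8, 2, 2], [2, 8, 8]]
```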

The transformation was expected to, and did, improve the distribution in view of a limit theorem to the effect that sums of random variables modulo 1 have the uniform distribution over the unit interval as their limiting distribution. (See Ref. 9 for a version of this theorem for discrete variates.)
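The effect of adding two digits modulo 10 can be checked exactly for a small, hypothetical bias. The sketch below convolves a slightly non-uniform digit distribution with itself; the bias chosen (0.102 for digit 0, 0.098 for digit 1) is illustrative only and is not the measured bias of the RAND machine.

```python
from fractions import Fraction

TENTH = Fraction(1, 10)

def sum_mod10(p, q):
    """Exact distribution of (X + Y) mod 10 for independent digits X ~ p, Y ~ q."""
    r = [Fraction(0)] * 10
    for i in range(10):
        for j in range(10):
            r[(i + j) % 10] += p[i] * q[j]
    return r

def max_bias(p):
    """Largest absolute departure of any digit probability from 1/10."""
    return max(abs(x - TENTH) for x in p)

# Hypothetical biased source.
biased = [Fraction(102, 1000), Fraction(98, 1000)] + [TENTH] * 8
summed = sum_mod10(biased, biased)

print(float(max_bias(biased)))  # 0.002
print(float(max_bias(summed)))  # 8e-06
```

In this example a bias of 2 parts in 1000 shrinks to 8 parts in a million after a single addition, which is the behavior the limit theorem leads one to expect.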

These tables were reproduced by photo-offset from pages printed by the IBM model 856 Cardatype. Because of the very nature of the tables, it did not seem necessary to proofread every page of the final manuscript in order to catch random errors of the Cardatype. All pages were scanned for systematic errors, every twentieth page was proofread (starting with page 10 for both the digits and deviates), and every fortieth page (starting with page 5 for both the digits and deviates) was summed and the totals checked against sums obtained from the cards.[3]

Tests on the Random Digits

Frequency Tests. The table was divided into 1000 blocks of 1000 digits each and the frequency of each digit was recorded for each block. Then for each block a goodness-of-fit Chi was computed with 9 d.f. These 1000 values of Chi provided an empirical fit to the Chi distribution (with 9 d.f.); to test the fit, a goodness-of-fit Chi was computed using 50 class intervals, each of which was expected to contain 2 per cent of the values. (The number of intervals was chosen in accordance with the result of Wald and Mann (Ref. 10).) The value of the test Chi was 54.6 which, for 49 d.f., corresponded to about the 0.45 probability level.

To examine further the frequencies, the digits were tallied in 20 blocks of 50,000 digits each. The results are shown in Table 1 together with the goodness-of-fit Chi for each block. On the total frequencies the Chi (13.316) for 9 d.f. has been partitioned into three components as follows:

Chi d.f. Probability
Odd versus even digits 1.37 1 0.25
Within groups of odd digits 7.90 4 0.10
Within groups of even digits 4.04 4 0.40
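One way to arrive at such a partition is sketched below, using the total frequencies from the last row of Table 1. The within-group expectations are taken as the group means, which is an assumption about how the partition was carried out; the components computed this way agree with the tabulated values to within rounding but need not add exactly to the total.

```python
def chi_square(observed, expected):
    """Pearson goodness-of-fit statistic."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Total frequencies of the digits 0..9 (last row of Table 1).
totals = [99802, 100050, 100641, 100311, 100094,
          100214, 99942, 99559, 100107, 99280]
n = sum(totals)
even = totals[0::2]   # digits 0, 2, 4, 6, 8
odd = totals[1::2]    # digits 1, 3, 5, 7, 9

chi_total = chi_square(totals, [n / 10] * 10)
chi_odd_even = chi_square([sum(odd), sum(even)], [n / 2, n / 2])
# Assumption: within each group the expected count is the group mean.
chi_within_odd = chi_square(odd, [sum(odd) / 5] * 5)
chi_within_even = chi_square(even, [sum(even) / 5] * 5)

for name, value in [("total", chi_total), ("odd vs even", chi_odd_even),
                    ("within odd", chi_within_odd), ("within even", chi_within_even)]:
    print(f"{name:12s} {value:7.3f}")
```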

Table 1
Frequencies of One Million Digits

Block No.       0        1        2        3        4        5        6        7        8        9       Chi
    1         4923     5013     4916     4951     5109     4993     5055     5080     4986     4974     7.556
    2         4870     4956     5080     5097     5066     5034     4902     4974     5012     5009    10.132
    3         5065     5014     5034     5057     4902     5061     4942     4946     4960     5019     6.078
    4         5009     5053     4966     4891     5031     4895     5037     5062     5170     4886    15.004
    5         5033     4982     5180     5074     4892     4992     5011     5005     4959     4872    13.846
    6         4976     4993     4932     5039     4965     5034     4943     4932     5116     5070     7.076
    7         5011     5152     4990     5047     4974     5107     4869     4925     5023     4902    14.116
    8         5003     5092     5163     4936     5020     5069     4914     4943     4914     4946    13.051
    9         4860     4899     5138     4959     5089     5047     5030     5039     5002     4937    13.410
   10         4998     4957     4964     5124     4909     4995     5053     4946     4995     5059     7.212
   11         4948     5048     5041     5077     5051     5004     5024     4886     4917     5004     7.142
   12         4958     4993     5064     4987     5041     4984     4991     4987     5113     4882     6.992
   13         4968     4961     5029     5038     5022     5023     5010     4988     4936     5025     2.162
   14         5110     4923     5025     4975     5095     5051     5035     4962     4942     4882    10.172
   15         5094     4962     4945     4891     5014     5002     5038     5023     5179     4852    16.261
   16         4957     5035     5051     5021     5036     4927     5022     4988     4910     5053     4.856
   17         5088     4989     5042     4948     4999     5028     5037     4893     5004     4972     5.347
   18         4970     5034     4996     5008     5049     5016     4954     4989     4970     5014     1.625
   19         4998     4981     4984     5107     4874     4980     5057     5020     4978     5021     6.584
   20         4963     5013     5101     5084     4956     4972     5018     4971     5021     4901     6.584
Total        99802   100050   100641   100311   100094   100214    99942    99559   100107    99280    13.316

Of the 200 frequencies recorded in Table 1, 59 (29.5 per cent) deviate from 5000 by more than Sigma (= 30√5 = 67.08), and 8 (4 per cent) deviate from 5000 by more than 2 Sigma. Of the twenty Chi values in Table 1, eight exceed the 50 per cent value (8.34), two fall below the 10 per cent value (4.17), and two exceed the 90 per cent value (14.7).
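The block Chi values and the value of Sigma quoted above follow from the standard formulas. A minimal sketch, using the frequencies of blocks 1 and 13 from Table 1 (the function name is hypothetical):

```python
import math

def frequency_chi(counts):
    """Goodness-of-fit Chi (9 d.f.) for ten digit frequencies, equal expectation."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)

block_1 = [4923, 5013, 4916, 4951, 5109, 4993, 5055, 5080, 4986, 4974]
block_13 = [4968, 4961, 5029, 5038, 5022, 5023, 5010, 4988, 4936, 5025]

print(round(frequency_chi(block_1), 3))   # 7.556
print(round(frequency_chi(block_13), 3))  # 2.162

# Standard deviation of one digit's frequency in a block of 50,000 digits.
sigma = math.sqrt(50_000 * 0.1 * 0.9)
print(round(sigma, 2))                    # 67.08
```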

Poker Tests. Sets of 5 digits in blocks of 5000 digits were taken to be poker hands and were classified as:

Class         Symbol    Expected Frequency Per Block
Busts         abcde     302.4
Pairs         aabcd     504
Two pairs     aabbc     108
Threes        aaabc      72
Full house    aaabb       9
Fours         aaaab       4.5
Fives         aaaaa       0.1

There were 200 sets of 1000 poker hands in the table, and for each set a goodness-of-fit Chi was computed with 5 d.f. (the fours and fives were combined). The manner in which these 200 values fit the Chi distribution is shown in Table 2.
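The expected frequencies per 1000 hands can be confirmed by classifying all 100,000 possible five-digit hands. The sketch below does this by brute force; the class names follow the table, and the function name is hypothetical.

```python
from collections import Counter
from itertools import product

PATTERNS = {
    (1, 1, 1, 1, 1): "busts",
    (2, 1, 1, 1): "pairs",
    (2, 2, 1): "two pairs",
    (3, 1, 1): "threes",
    (3, 2): "full house",
    (4, 1): "fours",
    (5,): "fives",
}

def classify(hand):
    """Classify five digits by the multiplicities of the digits they contain."""
    pattern = tuple(sorted(Counter(hand).values(), reverse=True))
    return PATTERNS[pattern]

# Enumerate every five-digit hand once; all are equally likely.
counts = Counter(classify(h) for h in product(range(10), repeat=5))
for name in PATTERNS.values():
    print(f"{name:11s} {counts[name] / 100:6.1f} per 1000 hands")
```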

Table 2
Distribution of Chi-square Values

Probability         Values of Chi    Expected Frequency    Observed Frequency
P > .90             0 – 1.60                 20                    22
.90 > P > .80       1.61 – 2.35              20                    19
.80 > P > .70       2.36 – 3.00              20                    22
.70 > P > .60       3.01 – 3.70              20                    19
.60 > P > .50       3.71 – 4.35              20                    20
.50 > P > .40       4.36 – 5.20              20                    29
.40 > P > .30       5.21 – 6.10              20                    22
.30 > P > .20       6.11 – 7.30              20                    15
.20 > P > .10       7.31 – 9.20              20                    15
P < .10             9.21 or more             20                    17
                                            200                   200

The goodness-of-fit test gives:

Chi = 7.7 for 9 d.f.,    P = 0.55.
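The value of 7.7 can be re-derived directly from the observed frequencies of Table 2, each class having an expectation of 20:

```python
observed = [22, 19, 22, 19, 20, 29, 22, 15, 15, 17]
expected = 20  # 200 Chi values spread over ten equally likely classes

chi = sum((o - expected) ** 2 / expected for o in observed)
print(round(chi, 1))  # 7.7
```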

The combined frequencies of poker hands in the whole table are shown in Table 3. The largest difference between expected and observed frequencies (for threes) is about 2.25 times its standard deviation, which is at roughly the 9 or 10 per cent probability level (looking merely at the largest of five independent normal observations).

Table 3
Poker Test on The Million Digits (200,000 Poker Hands)

Classes               Expected Frequency    Observed Frequency
Busts (abcde)              60,480                60,479
Pairs (aabcd)             100,800               100,570
Two pairs (aabbc)          21,600                21,572
Threes (aaabc)             14,400                14,659
Full house (aaabb)          1,800                 1,788
Fours (aaaab)                 900                   914
Fives (aaaaa)                  20                    18
                          200,000               200,000

(Fours and fives were combined for the goodness-of-fit test.)

The goodness-of-fit test gives:

Chi = 5.5 for 5 d.f.,    P = 0.35.
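Both the Chi value and the standardized deviation for threes can be recomputed from Table 3. In this sketch the fours and fives are pooled into one class, as the 5 d.f. implies, and the standard deviation is the binomial one for 200,000 hands.

```python
import math

expected = {"busts": 60480, "pairs": 100800, "two pairs": 21600,
            "threes": 14400, "full house": 1800, "fours+fives": 920}
observed = {"busts": 60479, "pairs": 100570, "two pairs": 21572,
            "threes": 14659, "full house": 1788, "fours+fives": 932}

chi = sum((observed[k] - expected[k]) ** 2 / expected[k] for k in expected)

n = 200_000
p = expected["threes"] / n
z_threes = (observed["threes"] - expected["threes"]) / math.sqrt(n * p * (1 - p))

print(round(chi, 1))       # 5.5
print(round(z_threes, 2))  # 2.24
```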

Also, the frequencies of poker hands were computed for each of ten blocks of 100,000 digits and the mean and standard deviation were computed from the ten values for each kind of hand. The results are shown in Table 4.

Table 4
Mean and Standard Deviation of Frequencies in Seven Classes of Poker Hands

Classes       Theoretical Mean    Actual Mean    Theoretical Std. Dev.    Actual Std. Dev.
Busts               6048             6047.9              64.9                   60.3
Pairs              10080            10057.0              70.7                   78.4
Two pairs           2160             2157.2              43.9                   45.8
Threes              1440             1465.9              36.9                   26.6
Full house           180              178.8              13.4                    8.9
Fours                 90               91.4               9.5                   11.5
Fives                  2                1.8               1.4                    1.9

Serial and Run Tests. Some further tests were made on the first block of 50,000 digits to look particularly for any evidence of serial association among the digits. The serial test classified every successive pair of digits by each digit of the pair in a ten-by-ten table. The frequencies of the different pairs are given in Table 5, where the first digit of the pair is shown in the left column of the table and the second digit is shown at the top. Thus there were 510 cases in which a zero followed a one. The frequency Chi for the row (or column) totals is 7.56, which is about the 0.60 probability level for 9 d.f.

Table 5
Frequencies of Ordered Pairs of Digits

                                       Second Digit
First Digit      0      1      2      3      4      5      6      7      8      9    Total
     0         508    456    509    507    502    489    471    504    488    489     4923
     1         510    514    474    514    504    481    496    486    507    527     5013
     2         451    523    493    484    502    466    514    506    493    484     4916
     3         500    472    476    466    513    478    540    513    530    463     4951
     4         513    561    481    485    526    513    485    510    524    511     5109
     5         475    490    527    507    493    481    489    512    465    554     4993
     6         494    486    491    483    525    504    530    539    513    490     5055
     7         508    512    454    498    550    533    516    504    485    520     5080
     8         463    503    475    514    520    544    514    491    520    442     4986
     9         501    496    536    493    474    504    500    515    461    494     4974
   Total      4923   5013   4916   4951   5109   4993   5055   5080   4986   4974    50000

Several essentially equivalent Chi values were computed from Table 5. First, assuming all pairs equally likely (expected value of 500 for each cell), a Chi of 107.8 was computed, which for 90 d.f. (because row totals equal column totals) is about the 0.10 probability level. Second, given the row frequencies and assuming digits equally likely to follow (expected value of 492.3 for cells of the first row, for example), a Chi of 98.9 was computed which is about the 0.25 level for 90 d.f. Third, the expected cell sizes were computed as 1/10 the column totals to give a Chi value of 100.4, which is about the 0.20 level. Fourth, fitting all means to both row and column totals gave a Chi of 91.9 with 81 d.f., which is at about the 0.19 probability level.
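The four ways of forming expected cell values can be written out as below, with the pair frequencies of Table 5 entered by hand. This is a sketch of the four expectations as they are described in the text, not a reconstruction of the original computing procedure, and the printed figures are offered for comparison with the quoted values rather than as an exact reproduction of them.

```python
# Rows: first digit of the pair; columns: second digit (Table 5).
table = [
    [508, 456, 509, 507, 502, 489, 471, 504, 488, 489],
    [510, 514, 474, 514, 504, 481, 496, 486, 507, 527],
    [451, 523, 493, 484, 502, 466, 514, 506, 493, 484],
    [500, 472, 476, 466, 513, 478, 540, 513, 530, 463],
    [513, 561, 481, 485, 526, 513, 485, 510, 524, 511],
    [475, 490, 527, 507, 493, 481, 489, 512, 465, 554],
    [494, 486, 491, 483, 525, 504, 530, 539, 513, 490],
    [508, 512, 454, 498, 550, 533, 516, 504, 485, 520],
    [463, 503, 475, 514, 520, 544, 514, 491, 520, 442],
    [501, 496, 536, 493, 474, 504, 500, 515, 461, 494],
]
n = sum(map(sum, table))
rows = [sum(r) for r in table]
cols = [sum(c) for c in zip(*table)]

def chi(expected):
    """Pearson Chi over all 100 cells; expected(i, j) gives the cell expectation."""
    return sum((table[i][j] - expected(i, j)) ** 2 / expected(i, j)
               for i in range(10) for j in range(10))

chi_marginal = sum((r - n / 10) ** 2 / (n / 10) for r in rows)
chi_equal = chi(lambda i, j: n / 100)              # all pairs equally likely
chi_rows = chi(lambda i, j: rows[i] / 10)          # given the row frequencies
chi_cols = chi(lambda i, j: cols[j] / 10)          # 1/10 of the column totals
chi_indep = chi(lambda i, j: rows[i] * cols[j] / n)  # fitted to both margins

for name, value in [("row totals", chi_marginal), ("equal cells", chi_equal),
                    ("given rows", chi_rows), ("given columns", chi_cols),
                    ("both margins", chi_indep)]:
    print(f"{name:14s} {value:7.2f}")
```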

Finally, in the same block of 50,000 digits all runs were counted with the obviously satisfactory results shown in Table 6.

Table 6
Run Test

Length of Run    Expected Frequency    Observed Frequency
r = 1                 40500                  40410
r = 2                  4050                   4055
r = 3                   405                    421
r = 4                    40.5                   48
r = 5                     4.5                    5
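The expected frequencies in Table 6 are consistent with taking a run to be a maximal stretch of identical digits, which is the assumption made in this sketch. A run of exactly r digits then requires a change of digit on either side, giving about N × 0.9² × 0.1^(r − 1) runs in N digits when end effects are ignored. The last tabulated value, 4.5, equals N × 0.9 × 0.1⁴, the expectation for runs of five or more, so the final row may pool the longer runs; that reading is an inference, not something the text states.

```python
from itertools import groupby

def expected_runs(n_digits, r):
    """Approximate expected number of runs of exactly r identical digits."""
    return n_digits * 0.9 ** 2 * 0.1 ** (r - 1)

def count_runs(digits):
    """Tally maximal runs of identical digits by their length."""
    counts = {}
    for _, group in groupby(digits):
        length = len(list(group))
        counts[length] = counts.get(length, 0) + 1
    return counts

for r in range(1, 5):
    print(r, round(expected_runs(50_000, r), 1))

print(count_runs([1, 1, 2, 3, 3, 3, 4]))  # {2: 1, 1: 2, 3: 1}
```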

Normal Deviates

Half of the random digit table was used to produce 100,000 standard normal deviates by solving for x in the equation

\[ 10^{-5}\,(D + 0.5) = F(x), \tag{1} \]

where D is a five-digit number from the table and

\[ F(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^{2}/2}\, dt \]

is the cumulative standard normal distribution. The Bureau of Standards tables of F(x) were used (Ref. 11).
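With a modern library the table lookup can be replaced by a numerical inverse of F(x). The sketch below assumes Eq. (1) has the form (D + 0.5) × 10⁻⁵ = F(x), the reading consistent with the worked example given later under "Use of the Tables"; it uses Python's statistics.NormalDist rather than the Bureau of Standards tables, so the last printed digit may differ from the published deviates.

```python
from statistics import NormalDist

def normal_deviate(d, digits=5):
    """Solve (d + 0.5) * 10**-digits = F(x) for x, with F the standard normal CDF."""
    p = (d + 0.5) * 10 ** -digits
    return NormalDist().inv_cdf(p)

# Hypothetical five-digit numbers, chosen only for illustration.
for d in (10097, 50000, 97499):
    print(f"{d:05d} -> {normal_deviate(d):+.3f}")
```

The half-unit added to D keeps the argument strictly between 0 and 1, so even D = 00000 and D = 99999 give finite deviates.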

The deviates were determined by the five-digit numbers on the left-hand half of every page of the digit table. The deviates in the first column correspond page by page with the five-figure digits in the first column of the first 200 pages of the digit table; the deviates in the second column correspond page by page with the first column of the second 200 pages of the digit table. Similarly, the third and fourth columns of deviates were derived from the second column of five-figure digits, etc.

A Chi test of the fit of the entire table of deviates to the normal distribution was performed using 400 class intervals (Ref. 10) with roughly 250 expected in each. The Chi value was found to be 346.4, which for 399 d.f. indicates a very close fit; the probability of a larger value of Chi is about 0.97. The detailed data for this test are given in Table 7.

Table 7
Goodness-of-fit Test for Normal Deviates

A more refined test of the fit in the tails was made on the deviates exceeding 2.326 in absolute value. Eighty intervals (Ref. 10) were used, each with an expectation of approximately 25. The Chi value was 76.26, with 80 d.f.; the probability of a larger value is about 0.61. The details of this test are given in Table 8.

Table 8
Goodness-of-fit of Normal Deviates in 1 Per Cent Tails

The only tests made on the squares of the deviates consisted in computing sums of k squares and comparing the distribution of the sums with the Chi distribution with k d.f., employing again the standard goodness-of-fit test. This was done for k = 25, 50, 100, 300, with the following results:

  k    Number of Sums    Number of Intervals (i)    Chi with i – 1 Degrees of Freedom    Probability of a Larger Chi
 25         4000                  100                            92.92                              0.66
 50         2000                  100                            92.45                              0.67
100         1000                   50                            57.75                              0.19
300          333                   34                            38.70                              0.23

The fourth column gives the goodness-of-fit Chi value for the fit to the Chi distribution with k degrees of freedom. Intervals of approximately equal probability were used in all cases.

Use of the Tables

The lines of the digit table are numbered from 00000 to 19999. In any use of the table, one should first find a random starting position. A common procedure for doing this is to open the book to an unselected page of the digit table and blindly choose a five-digit number; this number with the first digit reduced modulo 2 determines the starting line; the two digits to the right of the initially selected five-digit number are reduced modulo 50 to determine the starting column in the starting line. To guard against the tendency of books to open repeatedly at the same page and the natural tendency of a person to choose a number toward the center of the page: every five-digit number used to determine a starting position should be marked and not used a second time for this purpose.

The digit table is also used to find a random starting position in the deviate table: Select a five-digit number as before; the first four digits give the starting line (the lines being numbered from 0000 to 9999) and the fifth digit gives the starting position in the line.
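The arithmetic of the two starting-position rules can be sketched as follows. The numbers are passed as strings so that leading zeros survive; counting columns and positions from 0 is an assumption, since the text does not say where the numbering starts, and the function names are hypothetical.

```python
def digit_table_start(number, next_two):
    """Starting (line, column) in the digit table.

    number:   the blindly chosen five-digit number, as a string
    next_two: the two digits immediately to its right, as a string
    """
    line = (int(number[0]) % 2) * 10_000 + int(number[1:])  # 00000..19999
    column = int(next_two) % 50                             # assumed 0..49
    return line, column

def deviate_table_start(number):
    """Starting (line, position) in the deviate table from a five-digit number."""
    return int(number[:4]), int(number[4])                  # line 0000..9999

print(digit_table_start("73645", "82"))  # (13645, 32)
print(deviate_table_start("73645"))      # (7364, 5)
```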

Ordinarily, the table is read in the same direction as a book is read; however, the size of the table may be effectively increased by varying the direction in which it is read. Thus, one may read columns instead of lines, may read the table backward, may read lines forward but pages from bottom to top, etc. Of course, care must be taken in using these devices to avoid introducing correlations when the table is used more than once on the same problem.

To obtain a random permutation of the integers 1, 2, . . . , n, select a random starting position; use the five-digit number containing the starting position and the following n - 1 five-digit numbers; put the integers in the same order as these n five-digit numbers. In case of ties among the five-digit numbers, use additional columns to the right to make six or more digit numbers. The same procedure is used to obtain a random permutation of n objects, some of which are indistinguishable, by merely numbering the objects arbitrarily from 1 to n.
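One reading of this rule is that each integer takes the rank of the corresponding five-digit number. A minimal sketch under that interpretation, with hypothetical numbers standing in for values read from the table:

```python
def permutation_from_numbers(numbers):
    """Return the ranks (1..n) of the given numbers, i.e. a permutation of 1..n."""
    if len(set(numbers)) != len(numbers):
        # The text resolves ties by appending further columns of digits.
        raise ValueError("ties present: extend the numbers with more digits")
    order = sorted(range(len(numbers)), key=lambda i: numbers[i])
    ranks = [0] * len(numbers)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return ranks

# Four hypothetical five-digit numbers (09243 written without its leading zero).
print(permutation_from_numbers([53479, 81115, 9243, 70328]))  # [2, 4, 1, 3]
```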

To obtain random observations from any distribution G(x), use Eq. (1), substitute G(x) for F(x), and employ as many digits in D as required for the desired accuracy of the observations. Of course the negative exponent of 10 in Eq. (1) must be equal to the number of digits in D. If G(x) has a discontinuity at x0, define it to be continuous on the right and take the solution of Eq. (1) to be x0 when the left side of Eq. (1) falls between G(x0-) and G(x0). For example, if

G(x) = 1 – e^{-x},

and one is content with three-figure accuracy, then the three-digit number 082 determines an observation from a population distributed by G(x) as follows:

.0825 = 1 – e^{-x},

x = .086.
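For this exponential example the equation can be inverted in closed form, x = –ln(1 – u) with u = (D + 0.5) × 10⁻³. A short check of the arithmetic (the function name is hypothetical):

```python
import math

def exponential_observation(d, digits=3):
    """Solve (d + 0.5) * 10**-digits = 1 - exp(-x) for x."""
    u = (d + 0.5) * 10 ** -digits
    return -math.log(1.0 - u)

print(round(exponential_observation(82), 3))  # 0.086
```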

A technique suggested by von Neumann, called the "rejection method," enables one to substitute for the solution of Eq. (1) a stochastic process involving a much simpler computation; this technique will be discussed in a forthcoming book by Kahn (Ref. 8).

To obtain pairs of normal deviates with given correlation Rho, use pairs (x, y) of independent deviates from the table and transform them to

\[ \left( x,\;\; \rho\, x + \sqrt{1 - \rho^{2}}\; y \right) \]

Thus for Rho = -.6, for example, if (.732, -1.205) are two deviates from the table, then

(.732, -1.403)

is a pair of deviates from a normal population with the desired correlation.[4]
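The numbers in this example are reproduced by keeping x and replacing y with Rho·x + sqrt(1 – Rho²)·y, which is the standard construction for a pair with correlation Rho. A minimal check:

```python
import math

def correlated_pair(x, y, rho):
    """Turn independent standard normal deviates (x, y) into a pair with correlation rho."""
    return x, rho * x + math.sqrt(1.0 - rho * rho) * y

x, y2 = correlated_pair(0.732, -1.205, -0.6)
print(x, round(y2, 3))  # 0.732 -1.403
```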

In general, to obtain a random observation from a bivariate population with distribution G(x,y), one uses a marginal distribution on one variate, say, G1(x), and the conditional distribution, say, G2(y/x), on the other. Two random numbers determine the observation: one determines x by employing G1(x) in Eq. (1), and the other determines y by employing G2(y/x) in Eq. (1). Thus, if a probability density is uniform (and equal to two) over the triangle bounded by x = 0, y = 0, x + y = 1 and is zero elsewhere, then

\[ G_1(x) = 2x - x^{2}, \qquad G_2(y/x) = \frac{y}{1 - x}, \]

and two four-digit random numbers, 5402 and 1770, determine the observation (.3220, .1200). The direct generalization of this procedure will determine observations from multivariate populations.
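The observation quoted for the triangle can be checked numerically. For a density equal to two on the triangle, the marginal distribution of x is G1(x) = 2x – x² and the conditional distribution of y is G2(y/x) = y/(1 – x); these follow from integrating the density and are assumed here to be the forms intended in the text. Each is inverted with u = (D + 0.5) × 10⁻⁴.

```python
import math

def triangle_observation(d1, d2, digits=4):
    """Sample (x, y) uniformly on the triangle x >= 0, y >= 0, x + y <= 1."""
    u = (d1 + 0.5) * 10 ** -digits
    v = (d2 + 0.5) * 10 ** -digits
    x = 1.0 - math.sqrt(1.0 - u)  # solves u = 2x - x^2 on 0 <= x <= 1
    y = v * (1.0 - x)             # solves v = y / (1 - x)
    return x, y

x, y = triangle_observation(5402, 1770)
print(f"({x:.4f}, {y:.4f})")
```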



The tables of random digits and normal deviates comprise very large files, so only a sample page of each is included here.

Access to the complete tables in compressed format is provided on the document's main page.


References

1. Kendall, M. G., and B. B. Smith, Random Sampling Numbers, Cambridge University Press, 1939.

2. Juncosa, M. L., Random Number Generation on the BRL High-Speed Computing Machines, Ballistic Research Laboratories Report No. 855, Aberdeen Proving Ground, Maryland, 1953.

3. Meyer, H. A., L. S. Gephart, and N. L. Rasmussen, On the Generation and Testing of Random Digits, WADC Technical Report 54-55, Wright-Patterson Air Force Base, Ohio, 1954.

4. Moshman, Jack, "Generation of Pseudo-random Numbers on a Decimal Calculator," J. Assoc. Computing Machinery, Vol. 1, 1954, p. 88.

5. The Monte Carlo Method (Proceedings of a Symposium held in 1949), National Bureau of Standards Report AMS 12, Government Printing Office, Washington 25, D.C., 1951.

6. Curtiss, J. H., "Sampling Methods Applied to Differential and Difference Equations," Seminar on Scientific Computation, International Business Machines Corp., New York, 1949.

7. Kahn, H., and A. W. Marshall, "Methods of Reducing Sample Size in Monte Carlo Computations," J. Operations Res. Soc. of Amer., Vol. 1, 1953, pp. 263-278.

8. Kahn, H., Applications of Monte Carlo, The RAND Corporation (to be published).

9. Horton, H. B., and R. T. Smith, "A Direct Method for Producing Random Digits in Any Number System," Ann. Math. Statistics, Vol. 20, 1949, pp. 82-90.

10. Mann, H. B., and A. Wald, "On the Choice of the Number of Class Intervals in the Application of the Chi-Square Test," Ann. Math. Statistics, Vol. 13, 1942, pp. 306-317.

11. Tables of Probability Functions, 2d ed., U.S. Government Printing Office, Washington, D.C., 1948.


[1] References are listed in the References section above.

[2] For readers who are not statisticians: The Chi (chi-square) test is a standard statistical criterion used to measure discrepancy from expectation. The probability value associated with the criterion ranges between zero and one. A small probability value, e.g., less than .05, indicates the possibility of a discrepancy or bias. For very large samples, such as is the case here, with 125,000 digits, the test is extremely sensitive and will result in a small probability value even though the bias may be quite trivial from a practical standpoint.

[3] Some issues concerning discrepancies in the numbers presented here are discussed in an addendum prepared in April 1997.

[4] RAND has a table of deviations for Gaussian-Markov chains with a discrete time parameter. There is one chain of 10,000 observations for each of the correlations: .600, .800, .900, .950, .970, .980, .990, .995.


Copyright © 1955 by The RAND Corporation

All rights reserved. Permission is given to duplicate this on-line document for personal use only, as long as it is unaltered and complete. Copies may not be duplicated for commercial purposes.

RAND is a nonprofit institution that helps improve public policy through research and analysis. RAND's publications do not necessarily reflect the opinions or policies of its research sponsors.