A Million Random Digits with 100,000 Normal Deviates
Foreword to the Online Edition
This book was a product of RAND's computing power (and patience). The tables of random numbers in the book have become a standard reference in engineering and econometrics textbooks and have been widely used in gaming and simulations that employ Monte Carlo trials. Still the largest known source of random digits and normal deviates, the work is routinely used by statisticians, physicists, polltakers, market analysts, lottery administrators, and quality control engineers.
A humorous sidelight: The New York Public Library originally indexed this book under the heading "Psychology."
Acknowledgments
The following persons participated in the production, testing, and preparation for publication of the tables of random digits and random normal deviates: Paul Armer, E. C. Bower, Mrs. Bernice Brown, G. W. Brown, Walter Frantz, J. J. Goodpasture, W. F. Gunning, Cecil Hastings, Olaf Helmer, M. L. Juncosa, J. D. Madden, A. M. Mood, R. T. Nash, J. D. Williams. These tables were prepared in connection with analyses done for the United States Air Force.
Introduction
Early in the course of research at The RAND Corporation a demand arose for random numbers; these were needed to solve problems of various kinds by experimental probability procedures, which have come to be called Monte Carlo methods. Many of the applications required a large supply of random digits or normal deviates of high quality, and the tables presented here were produced to meet those requirements. The numbers have been used extensively by research workers at RAND, and by many others, in the solution of a wide range of problems during the past seven years.
One distinguishing feature of the digit table is its size. On numerous RAND problems the largest existing table of Kendall and Smith (Ref. 1) would have had to be used many times over, with the consequent dangers of introducing unwanted correlations. The feasibility of working with as large a table as the present one resulted from developments in computing machinery which made possible the solving of very complicated distribution problems in a reasonable time by Monte Carlo methods. The tables were constructed primarily for use with punched card machines. With the high-speed electronic computers recently developed, the storage of such tables is usually not practical and, in fact, much larger tables than the present one are often required; these machines have caused research workers to turn to pseudo-random numbers which are computed by simple arithmetic processes directly by the machine as needed. These developments are summarized in Refs. 2, 3, and 4, where other references may be found. Refs. 5, 6, 7, and 8 discuss the uses and applications of Monte Carlo methods and give references to other applications.
Production of the Random Digits
The random digits in this book were produced by rerandomization of a basic table generated by an electronic roulette wheel. Briefly, a random frequency pulse source, providing on the average about 100,000 pulses per second, was gated about once per second by a constant frequency pulse. Pulse standardization circuits passed the pulses through a 5-place binary counter. In principle the machine was a 32-place roulette wheel which made, on the average, about 3000 revolutions per trial and produced one number per second. A binary-to-decimal converter was used which converted 20 of the 32 numbers (the other twelve were discarded) and retained only the final digit of two-digit numbers; this final digit was fed into an IBM punch to produce finally a punched card table of random digits.
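The mechanism described above can be mimicked in software. The following is only a rough sketch under stated assumptions, not a model of the actual circuitry: the pulse count per gate is approximated by a normal distribution around 100,000, and positions 0 through 19 of the counter are taken to be the twenty that were converted (the text does not say which twenty were kept). The function name `roulette_digits` is hypothetical.

```python
import random

def roulette_digits(n_digits, pulses_per_gate=100_000, seed=None):
    """Simulate the 32-place wheel and return n_digits decimal digits."""
    rng = random.Random(seed)
    digits = []
    while len(digits) < n_digits:
        # Assumed pulse-count model: roughly Poisson, approximated by a normal.
        pulses = max(0, round(rng.gauss(pulses_per_gate, pulses_per_gate ** 0.5)))
        state = pulses % 32        # position of the 5-place binary counter
        if state >= 20:            # assumed: positions 20-31 are the discarded twelve
            continue
        digits.append(state % 10)  # retain only the final decimal digit
    return digits

# 100,000 pulses on a 32-place wheel is 3125 revolutions, i.e. "about 3000".
revolutions = 100_000 / 32

sample = roulette_digits(10_000, seed=1)
counts = [sample.count(d) for d in range(10)]
```

Because twelve of every thirty-two positions are rejected, the simulated wheel needs on average 1.6 gate intervals per retained digit.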
Production from the original machine showed statistically significant biases, and the engineers had to make several modifications and refinements of the circuits before production of apparently satisfactory numbers was achieved. The basic table of a million digits was then produced during May and June of 1947. This table was subjected to fairly exhaustive tests and it was found that it still contained small but statistically significant biases. For example, the following table[2] shows the results of three tests (described later) on two blocks of 125,000 digits:
Test | Block 1 χ² | Block 1 Probability | Block 2 χ² | Block 2 Probability |
Frequency (9 d.f.*) | 6.0 | .74 | 21.0 | .02 |
Odd-even (1 d.f.) | 3.0 | .09 | 7.0 | <.01 |
Serial (81 d.f.) | 78.7 | .55 | 105.6 | .03 |
*The letters "d.f." (degrees of freedom) identify a parameter associated with the test. A discussion of the test may be found in any textbook on statistics.
Block 1 was produced immediately after a careful tune-up of the machine; Block 2 was produced after one month of continuous operation without adjustment. Apparently the machine had been running down despite the fact that periodic electronic checks indicated it had remained in good order.
The table was regarded as reasonably satisfactory because the deviations from expectations in the various tests were all very small--the largest being less than 2 per cent--and no further effort was made to generate better numbers with the machine. However, the table was transformed by adding pairs of digits modulo 10 in order to improve the distribution of the digits. There were 20,000 punched cards with 50 digits per card; each digit on a given card was added modulo 10 to the corresponding digit of the preceding card to yield a rerandomized digit. It is this transformed table which is published here and which is the subject of the tests described below.
The transformation was expected to, and did, improve the distribution in view of a limit theorem to the effect that sums of random variables modulo 1 have the uniform distribution over the unit interval as their limiting distribution. (See Ref. 9 for a version of this theorem for discrete variates.)
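The rerandomization step, and the reason it reduces bias, can be illustrated with a short sketch. Two assumptions are made here that the text does not state: the first card is paired with the last card (wrap-around), and the biased distribution used for the demonstration is invented purely for illustration. The function names are hypothetical.

```python
def rerandomize(cards):
    """Add each card, digit by digit and modulo 10, to the preceding card."""
    # cards[i - 1] wraps around to the last card when i == 0 (an assumption).
    return [
        [(d + p) % 10 for d, p in zip(cards[i], cards[i - 1])]
        for i in range(len(cards))
    ]

def sum_mod10_distribution(p):
    """Distribution of (X + Y) mod 10 for independent X, Y, each distributed as p."""
    return [sum(p[i] * p[(k - i) % 10] for i in range(10)) for k in range(10)]

cards = [[1, 2, 3], [9, 9, 9], [4, 5, 6]]
new_cards = rerandomize(cards)

# Illustrative bias only: even digits slightly too frequent.
biased = [0.12, 0.08] * 5
after = sum_mod10_distribution(biased)
max_bias_before = max(abs(x - 0.1) for x in biased)
max_bias_after = max(abs(x - 0.1) for x in after)
```

In this example the largest departure from the ideal probability of 0.1 falls from 0.02 to 0.004 after a single addition, which is the behavior the limit theorem predicts.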
These tables were reproduced by photo-offset from pages printed by the IBM model 856 Cardatype. Because of the very nature of the tables, it did not seem necessary to proofread every page of the final manuscript in order to catch random errors of the Cardatype. All pages were scanned for systematic errors, every twentieth page was proofread (starting with page 10 for both the digits and deviates), and every fortieth page (starting with page 5 for both the digits and deviates) was summed and the totals checked against sums obtained from the cards.[3]
Tests on the Random Digits
Frequency Tests. The table was divided into 1000 blocks of 1000 digits each and the frequency of each digit was recorded for each block. Then for each block a goodness-of-fit χ² was computed with 9 d.f. These 1000 values of χ² provided an empirical fit to the χ² distribution (with 9 d.f.); to test the fit, a goodness-of-fit χ² was computed using 50 class intervals, each of which was expected to contain 2 per cent of the values. (The number of intervals was chosen in accordance with the result of Wald and Mann (Ref. 10).) The value of the test χ² was 54.6 which, for 49 d.f., corresponded to about the 0.45 probability level.
To examine further the frequencies, the digits were tallied in 20 blocks of 50,000 digits each. The results are shown in Table 1 together with the goodness-of-fit χ² for each block. On the total frequencies the χ² (13.316) for 9 d.f. has been partitioned into three components as follows:
Component | χ² | d.f. | Probability |
Odd versus even digits | 1.37 | 1 | 0.25 |
Within groups of odd digits | 7.90 | 4 | 0.10 |
Within groups of even digits | 4.04 | 4 | 0.40 |
Table 1
Frequencies of One Million Digits
Block No. | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | χ² |
1 | 4923 | 5013 | 4916 | 4951 | 5109 | 4993 | 5055 | 5080 | 4986 | 4974 | 7.556 |
2 | 4870 | 4956 | 5080 | 5097 | 5066 | 5034 | 4902 | 4974 | 5012 | 5009 | 10.132 |
3 | 5065 | 5014 | 5034 | 5057 | 4902 | 5061 | 4942 | 4946 | 4960 | 5019 | 6.078 |
4 | 5009 | 5053 | 4966 | 4891 | 5031 | 4895 | 5037 | 5062 | 5170 | 4886 | 15.004 |
5 | 5033 | 4982 | 5180 | 5074 | 4892 | 4992 | 5011 | 5005 | 4959 | 4872 | 13.846 |
6 | 4976 | 4993 | 4932 | 5039 | 4965 | 5034 | 4943 | 4932 | 5116 | 5070 | 7.076 |
7 | 5011 | 5152 | 4990 | 5047 | 4974 | 5107 | 4869 | 4925 | 5023 | 4902 | 14.116 |
8 | 5003 | 5092 | 5163 | 4936 | 5020 | 5069 | 4914 | 4943 | 4914 | 4946 | 13.051 |
9 | 4860 | 4899 | 5138 | 4959 | 5089 | 5047 | 5030 | 5039 | 5002 | 4937 | 13.410 |
10 | 4998 | 4957 | 4964 | 5124 | 4909 | 4995 | 5053 | 4946 | 4995 | 5059 | 7.212 |
11 | 4948 | 5048 | 5041 | 5077 | 5051 | 5004 | 5024 | 4886 | 4917 | 5004 | 7.142 |
12 | 4958 | 4993 | 5064 | 4987 | 5041 | 4984 | 4991 | 4987 | 5113 | 4882 | 6.992 |
13 | 4968 | 4961 | 5029 | 5038 | 5022 | 5023 | 5010 | 4988 | 4936 | 5025 | 2.162 |
14 | 5110 | 4923 | 5025 | 4975 | 5095 | 5051 | 5035 | 4962 | 4942 | 4882 | 10.172 |
15 | 5094 | 4962 | 4945 | 4891 | 5014 | 5002 | 5038 | 5023 | 5179 | 4852 | 16.261 |
16 | 4957 | 5035 | 5051 | 5021 | 5036 | 4927 | 5022 | 4988 | 4910 | 5053 | 4.856 |
17 | 5088 | 4989 | 5042 | 4948 | 4999 | 5028 | 5037 | 4893 | 5004 | 4972 | 5.347 |
18 | 4970 | 5034 | 4996 | 5008 | 5049 | 5016 | 4954 | 4989 | 4970 | 5014 | 1.625 |
19 | 4998 | 4981 | 4984 | 5107 | 4874 | 4980 | 5057 | 5020 | 4978 | 5021 | 6.584 |
20 | 4963 | 5013 | 5101 | 5084 | 4956 | 4972 | 5018 | 4971 | 5021 | 4901 | 6.584 |
Total | 99802 | 100050 | 100641 | 100311 | 100094 | 100214 | 99942 | 99559 | 100107 | 99280 | 13.316 |
Of the 200 frequencies recorded in Table 1, 59 (29.5 per cent) deviate from 5000 by more than σ (= 30√5 = 67.08), and 8 (4 per cent) deviate from 5000 by more than 2σ. Of the twenty χ² values in Table 1, eight exceed the 50 per cent value (8.34), two fall below the 10 per cent value (4.17), and two exceed the 90 per cent value (14.7).
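The χ² for the total row of Table 1, its partition into three components, and the standard deviation of a block frequency can be rechecked with a few lines of arithmetic. This is a sketch using the usual Pearson statistic; the helper name `chi_square` is hypothetical, and the within-group components are computed against each group's own mean, which is an assumption about how the partition was made.

```python
import math

# Total row of Table 1 (digits 0 through 9).
totals = [99802, 100050, 100641, 100311, 100094,
          100214, 99942, 99559, 100107, 99280]

def chi_square(observed, expected):
    """Pearson goodness-of-fit statistic."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

n = sum(totals)
chi2_total = chi_square(totals, [n / 10] * 10)

evens, odds = totals[0::2], totals[1::2]
chi2_odd_even = chi_square([sum(odds), sum(evens)], [n / 2] * 2)
chi2_within_odd = chi_square(odds, [sum(odds) / 5] * 5)
chi2_within_even = chi_square(evens, [sum(evens) / 5] * 5)

# Standard deviation of one digit's frequency in a block of 50,000 digits.
sigma = math.sqrt(50_000 * 0.1 * 0.9)
```

The three components agree with the tabulated 1.37, 7.90, and 4.04 to within rounding, and they sum only approximately to the total because each component uses its own expected values.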
Poker Tests. Sets of 5 digits in blocks of 5000 digits were taken to be poker hands and were classified as:
Class | Symbol | Expected Frequency Per Block |
Busts | abcde | 302.4 |
Pairs | aabcd | 504 |
Two pairs | aabbc | 108 |
Threes | aaabc | 72 |
Full house | aaabb | 9 |
Fours | aaaab | 4.5 |
Fives | aaaaa | 0.1 |
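The expected frequencies per block of 1000 hands can be confirmed by brute force, since there are only 100,000 possible five-digit hands. The sketch below classifies each hand by the multiset of its digit counts; the names `CLASS_BY_PATTERN` and `classify` are hypothetical.

```python
from collections import Counter
from itertools import product

# Sorted digit multiplicities within a hand determine its class.
CLASS_BY_PATTERN = {
    (1, 1, 1, 1, 1): "busts",
    (1, 1, 1, 2): "pairs",
    (1, 2, 2): "two pairs",
    (1, 1, 3): "threes",
    (2, 3): "full house",
    (1, 4): "fours",
    (5,): "fives",
}

def classify(hand):
    """Return the poker class of a sequence of five digits."""
    pattern = tuple(sorted(Counter(hand).values()))
    return CLASS_BY_PATTERN[pattern]

# Enumerate every equally likely hand and scale to a block of 1000 hands.
tally = Counter(classify(hand) for hand in product(range(10), repeat=5))
expected_per_1000 = {name: 1000 * count / 10 ** 5 for name, count in tally.items()}
```

The same `classify` function could be applied to successive groups of five digits from any digit sequence to obtain observed frequencies.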
There were 200 sets of 1000 poker hands in the table, and for each set a goodness-of-fit χ² was computed with 5 d.f. (the fours and fives were combined). The manner in which these 200 values fit the χ² distribution is shown in Table 2.
Table 2
Distribution of Chi-square Values
Probability | Values of χ² | Expected Frequency | Observed Frequency |
P > .90 | 0 - 1.60 | 20 | 22 |
.90 > P > .80 | 1.61 – 2.35 | 20 | 19 |
.80 > P > .70 | 2.36 – 3.00 | 20 | 22 |
.70 > P > .60 | 3.01 – 3.70 | 20 | 19 |
.60 > P > .50 | 3.71 – 4.35 | 20 | 20 |
.50 > P > .40 | 4.36 – 5.20 | 20 | 29 |
.40 > P > .30 | 5.21 – 6.10 | 20 | 22 |
.30 > P > .20 | 6.11 – 7.30 | 20 | 15 |
.20 > P > .10 | 7.31 – 9.20 | 20 | 15 |
P < .10 | 9.21 or more | 20 | 17 |
Total | | 200 | 200 |
The goodness-of-fit test gives:
χ² = 7.7 for 9 d.f., P = 0.55.
The combined frequencies of poker hands in the whole table are shown in Table 3. The largest difference between expected and observed frequencies (for threes) is about 2.25 times its standard deviation, which is roughly at the 9 or 10 per cent probability level (looking merely at the largest of five independent normal observations).
Table 3
Poker Test on The Million Digits (200,000 Poker Hands)
Classes | Expected Frequency | Observed Frequency |
Busts (abcde) | 60,480 | 60,479 |
Pairs (aabcd) | 100,800 | 100,570 |
Two pairs (aabbc) | 21,600 | 21,572 |
Threes (aaabc) | 14,400 | 14,659 |
Full house (aaabb) | 1,800 | 1,788 |
Fours (aaaab) | 900 | 914 |
Fives (aaaaa) | 20 | 18 |
Total | 200,000 | 200,000 |
The goodness-of-fit test gives:
χ² = 5.5 for 5 d.f., P = 0.35.
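The two goodness-of-fit values quoted for the poker tests, and the standardized deviation for threes, can be reproduced from the tabulated counts. This is a sketch using the Pearson statistic with fours and fives combined; the binomial standard deviation for threes is an assumption about how the "2.25 times its standard deviation" figure was obtained.

```python
import math

def chi_square(observed, expected):
    """Pearson goodness-of-fit statistic."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Table 2: observed counts in ten classes, each expected to hold 20 values.
table2_observed = [22, 19, 22, 19, 20, 29, 22, 15, 15, 17]
chi2_table2 = chi_square(table2_observed, [20] * 10)

# Table 3: busts, pairs, two pairs, threes, full house, fours plus fives.
expected = [60480, 100800, 21600, 14400, 1800, 900 + 20]
observed = [60479, 100570, 21572, 14659, 1788, 914 + 18]
chi2_table3 = chi_square(observed, expected)

# Standardized deviation of the observed number of threes.
n_hands, p_threes = 200_000, 0.072
sd_threes = math.sqrt(n_hands * p_threes * (1 - p_threes))
z_threes = (14659 - 14400) / sd_threes
```

Nearly all of the Table 3 statistic comes from the threes class, which is why the text singles it out.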
Also, the frequencies of poker hands were computed for each of ten blocks of 100,000 digits and the mean and standard deviation were computed from the ten values for each kind of hand. The results are shown in Table 4.
Table 4
Mean and Standard Deviation of Frequencies in Seven Classes of Poker Hands
Classes | Theoretical Mean | Actual Mean | Theoretical Std. Dev. | Actual Std. Dev. |
Busts | 6048 | 6047.9 | 64.9 | 60.3 |
Pairs | 10080 | 10057.0 | 70.7 | 78.4 |
Two pairs | 2160 | 2157.2 | 43.9 | 45.8 |
Threes | 1440 | 1465.9 | 36.9 | 26.6 |
Full house | 180 | 178.8 | 13.4 | 8.9 |
Fours | 90 | 91.4 | 9.5 | 11.5 |
Fives | 2 | 1.8 | 1.4 | 1.9 |
Serial and Run Tests. Some further tests were made on the first block of 50,000 digits to look particularly for any evidence of serial association among the digits. The serial test classified every successive pair of digits by each digit of the pair in a ten-by-ten table. The frequencies of the different pairs are given in Table 5, where the first digit of the pair is shown in the left column of the table and the second digit is shown at the top. Thus there were 510 cases in which a zero followed a one. The frequency χ² for the row (or column) totals is 7.56, which is about the 0.60 probability level for 9 d.f.
Table 5
Frequencies of Ordered Pairs of Digits
First Digit \ Second Digit | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | Total |
0 | 508 | 456 | 509 | 507 | 502 | 489 | 471 | 504 | 488 | 489 | 4923 |
1 | 510 | 514 | 474 | 514 | 504 | 481 | 496 | 486 | 507 | 527 | 5013 |
2 | 451 | 523 | 493 | 484 | 502 | 466 | 514 | 506 | 493 | 484 | 4916 |
3 | 500 | 472 | 476 | 466 | 513 | 478 | 540 | 513 | 530 | 463 | 4951 |
4 | 513 | 561 | 481 | 485 | 526 | 513 | 485 | 510 | 524 | 511 | 5109 |
5 | 475 | 490 | 527 | 507 | 493 | 481 | 489 | 512 | 465 | 554 | 4993 |
6 | 494 | 486 | 491 | 483 | 525 | 504 | 530 | 539 | 513 | 490 | 5055 |
7 | 508 | 512 | 454 | 498 | 550 | 533 | 516 | 504 | 485 | 520 | 5080 |
8 | 463 | 503 | 475 | 514 | 520 | 544 | 514 | 491 | 520 | 442 | 4986 |
9 | 501 | 496 | 536 | 493 | 474 | 504 | 500 | 515 | 461 | 494 | 4974 |
Total | 4923 | 5013 | 4916 | 4951 | 5109 | 4993 | 5055 | 5080 | 4986 | 4974 | 50000 |
Several essentially equivalent χ² values were computed from Table 5. First, assuming all pairs equally likely (expected value of 500 for each cell), a χ² of 107.8 was computed, which for 90 d.f. (because row totals equal column totals) is about the 0.10 probability level. Second, given the row frequencies and assuming digits equally likely to follow (expected value of 492.3 for cells of the first row, for example), a χ² of 98.9 was computed, which is about the 0.25 level for 90 d.f. Third, the expected cell sizes were computed as 1/10 the column totals to give a χ² value of 100.4, which is about the 0.20 level. Fourth, fitting all means to both row and column totals gave a χ² of 91.9 with 81 d.f., which is at about the 0.19 probability level.
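The first of these statistics, and the frequency χ² on the row totals, can be recomputed directly from the cell counts of Table 5. The sketch below assumes the counts as transcribed in this edition; small differences from the quoted values are possible if any cell differs from the printed original.

```python
# Cell counts of Table 5: pairs[first_digit][second_digit].
pairs = [
    [508, 456, 509, 507, 502, 489, 471, 504, 488, 489],
    [510, 514, 474, 514, 504, 481, 496, 486, 507, 527],
    [451, 523, 493, 484, 502, 466, 514, 506, 493, 484],
    [500, 472, 476, 466, 513, 478, 540, 513, 530, 463],
    [513, 561, 481, 485, 526, 513, 485, 510, 524, 511],
    [475, 490, 527, 507, 493, 481, 489, 512, 465, 554],
    [494, 486, 491, 483, 525, 504, 530, 539, 513, 490],
    [508, 512, 454, 498, 550, 533, 516, 504, 485, 520],
    [463, 503, 475, 514, 520, 544, 514, 491, 520, 442],
    [501, 496, 536, 493, 474, 504, 500, 515, 461, 494],
]

row_totals = [sum(row) for row in pairs]

# All 100 ordered pairs equally likely: expected count 500 per cell.
chi2_equal = sum((o - 500) ** 2 / 500 for row in pairs for o in row)

# Frequency test on the row totals: expected count 5000 per digit.
chi2_rows = sum((t - 5000) ** 2 / 5000 for t in row_totals)
```

With these counts `chi2_equal` comes out close to the 107.8 quoted above and `chi2_rows` matches the 7.56 given for the row totals.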
Finally, in the same block of 50,000 digits all runs were counted with the obviously satisfactory results shown in Table 6.
Table 6
Run Test
Length of Run | Expected Frequency | Observed Frequency |
r = 1 | 40500 | 40410 |
r = 2 | 4050 | 4055 |
r = 3 | 405 | 421 |
r = 4 | 40.5 | 48 |
r = 5 | 4.5 | 5 |
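The expected frequencies in Table 6 are consistent with a simple approximation that ignores end effects: a run of exactly r equal digits begins at a given position with probability 0.9 × 0.1^(r-1) × 0.9. The text does not state how the expectations were derived, so this is an interpretation; under it the last row (4.5) matches runs of length 5 or more rather than exactly 5. The function names are hypothetical.

```python
from itertools import groupby

def expected_runs(n, r, or_longer=False):
    """Approximate expected number of runs of length r among n random digits."""
    # A run must be started by a change of digit (0.9); a run of exactly r
    # must also be ended by a change (another 0.9).
    tail = 0.9 if or_longer else 0.9 ** 2
    return n * tail * 0.1 ** (r - 1)

def count_runs(digits):
    """Tally runs of identical consecutive digits by length."""
    counts = {}
    for _, group in groupby(digits):
        length = len(list(group))
        counts[length] = counts.get(length, 0) + 1
    return counts

n = 50_000
expected = [expected_runs(n, r) for r in (1, 2, 3, 4)]
expected.append(expected_runs(n, 5, or_longer=True))

example = count_runs([1, 1, 2, 3, 3, 3, 4])
```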
Normal Deviates
Half of the random digit table was used to produce 100,000 standard normal deviates by solving for x in the equation
F(x) = 10^{-5} (D + 0.5),    (1)
where D is a five-digit number from the table and
F(x) = (1/\sqrt{2\pi}) \int_{-\infty}^{x} e^{-t^{2}/2} dt
is the cumulative standard normal distribution. The Bureau of Standards tables of F(x) were used (Ref. 11).
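The conversion from a five-digit number to a normal deviate can be sketched with a library inverse in place of printed tables. The sketch takes Eq. (1) to be F(x) = 10^{-5}(D + 0.5), the form implied by the worked example later in this section in which the three-digit number 082 yields .0825. It uses `statistics.NormalDist.inv_cdf` from the Python standard library, so results may differ in the last place from values interpolated in the Bureau of Standards tables. The function name `deviate` is hypothetical.

```python
from statistics import NormalDist

def deviate(d, n_digits=5):
    """Solve F(x) = 10**(-n_digits) * (d + 0.5) for x, F being the standard normal CDF."""
    u = (d + 0.5) / 10 ** n_digits
    return NormalDist().inv_cdf(u)

middle = deviate(50000)   # just above zero
upper = deviate(97500)    # close to the familiar 1.96 point
lowest = deviate(0)       # most extreme negative deviate obtainable
```

Adding 0.5 to D keeps the argument strictly between 0 and 1, so even D = 00000 and D = 99999 give finite deviates.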
The deviates were determined by the five-digit numbers on the left-hand half of every page of the digit table. The deviates in the first column correspond page by page with the five-figure digits in the first column of the first 200 pages of the digit table; the deviates in the second column correspond page by page with the first column of the second 200 pages of the digit table. Similarly, the third and fourth columns of deviates were derived from the second column of five-figure digits, etc.
A χ² test of the fit of the entire table of deviates to the normal distribution was performed using 400 class intervals (Ref. 10) with roughly 250 expected in each. The χ² value was found to be 346.4, which for 399 d.f. indicates a very close fit; the probability of a larger value of χ² is about 0.97. The detailed data for this test are given in Table 7.
Table 7
Goodness-of-fit Test for Normal Deviates
A more refined test of the fit in the tails was made on the deviates exceeding 2.326 in absolute value. Eighty intervals (Ref. 10) were used, each with an expectation of approximately 25. The χ² value was 76.26, with 80 d.f.; the probability of a larger value is about 0.61. The details of this test are given in Table 8.
Table 8
Goodness-of-fit of Normal Deviates in 1 Per Cent Tails
The only tests made on the squares of the deviates consisted in computing sums of k squares and comparing the distribution of the sums with the χ² distribution with k d.f., employing again the standard goodness-of-fit test. This was done for k = 25, 50, 100, 300, with the following results:
k | Number of Sums | Number of Intervals (i) | χ² with i – 1 Degrees of Freedom | Probability of a Larger χ² |
25 | 4000 | 100 | 92.92 | 0.66 |
50 | 2000 | 100 | 92.45 | 0.67 |
100 | 1000 | 50 | 57.75 | 0.19 |
300 | 333 | 34 | 38.70 | 0.23 |
The fourth column gives the goodness-of-fit χ² value for the fit to the χ² distribution with k degrees of freedom. Intervals of approximately equal probability were used in all cases.
Use of the Tables
The lines of the digit table are numbered from 00000 to 19999. In any use of the table, one should first find a random starting position. A common procedure for doing this is to open the book to an unselected page of the digit table and blindly choose a five-digit number; this number with the first digit reduced modulo 2 determines the starting line; the two digits to the right of the initially selected five-digit number are reduced modulo 50 to determine the starting column in the starting line. To guard against the tendency of books to open repeatedly at the same page and the natural tendency of a person to choose a number toward the center of the page, every five-digit number used to determine a starting position should be marked and not used a second time for this purpose.
The digit table is also used to find a random starting position in the deviate table: Select a five-digit number as before; the first four digits give the starting line (the lines being numbered from 0000 to 9999) and the fifth digit gives the starting position in the line.
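The two starting-position rules reduce to a little modular arithmetic. The sketch below takes the selected digits as strings so that leading zeros are preserved; the function names and the example digits are hypothetical, and columns are assumed to be numbered from 0 to 49.

```python
def digit_table_start(five_digits, next_two):
    """Starting (line, column) in the digit table."""
    first = int(five_digits[0]) % 2           # first digit reduced modulo 2
    line = int(str(first) + five_digits[1:])  # line number in 00000..19999
    column = int(next_two) % 50               # two following digits modulo 50
    return line, column

def deviate_table_start(five_digits):
    """Starting (line, position) in the deviate table."""
    # First four digits give the line (0000..9999), the fifth the position.
    return int(five_digits[:4]), int(five_digits[4])

start_digits = digit_table_start("73492", "68")
start_deviates = deviate_table_start("73492")
```

For the blindly chosen number 73492 followed by 68, the digit table is entered at line 13492, column 18, and the deviate table at line 7349, position 2.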
Ordinarily, the table is read in the same direction as a book is read; however, the size of the table may be effectively increased by varying the direction in which it is read. Thus, one may read columns instead of lines, may read the table backward, may read lines forward but pages from bottom to top, etc. Of course, care must be taken in using these devices to avoid introducing correlations when the table is used more than once on the same problem.
To obtain a random permutation of the integers 1, 2, . . . , n, select a random starting position; use the five-digit number containing the starting position and the following n - 1 five-digit numbers; put the integers in the same order as these n five-digit numbers. In case of ties among the five-digit numbers, use additional columns to the right to make six or more digit numbers. The same procedure is used to obtain a random permutation of n objects, some of which are indistinguishable, by merely numbering the objects arbitrarily from 1 to n.
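One reading of this procedure is that integer i receives the rank of the i-th five-digit number among the n numbers drawn. The sketch below implements that reading and refuses tied keys, leaving the caller to extend the numbers with further digits as the text directs; the function name and the example keys are hypothetical.

```python
def random_permutation(keys):
    """Return a permutation of 1..n ordered like the given distinct keys."""
    if len(set(keys)) != len(keys):
        raise ValueError("tied keys: extend them with additional digits")
    order = sorted(range(len(keys)), key=lambda i: keys[i])
    perm = [0] * len(keys)
    for rank, i in enumerate(order, start=1):
        perm[i] = rank   # the i-th key is the rank-th smallest
    return perm

perm = random_permutation([50123, 17, 99999, 31415])
```

Here the keys 50123, 00017, 99999, 31415 rank third, first, fourth, and second, so the permutation is 3, 1, 4, 2.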
To obtain random observations from any distribution G(x), use Eq. (1), substitute G(x) for F(x), and employ as many digits in D as required for the desired accuracy of the observations. Of course the negative exponent of 10 in Eq. (1) must be equal to the number of digits in D. If G(x) has a discontinuity at x_{0}, define it to be continuous on the right and take the solution of Eq. (1) to be x_{0} when the left side of Eq. (1) falls between G(x_{0}-) and G(x_{0}). For example, if
G(x) = 1 – e^{-x},
and one is content with three-figure accuracy, then the three-digit number 082 determines an observation from a population distributed by G(x) as follows:
.0825 = 1 – e^{-x},
x = .086.
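The worked example can be checked numerically. For G(x) = 1 – e^{-x} the equation has the closed-form solution x = -ln(1 - u), where u is the digit string with 5 appended as a decimal fraction (082 becomes .0825). The function name is hypothetical.

```python
import math

def exponential_observation(digits):
    """Observation from G(x) = 1 - exp(-x) determined by a string of digits."""
    u = (int(digits) + 0.5) / 10 ** len(digits)   # "082" -> 0.0825
    return -math.log(1.0 - u)                     # solves u = 1 - exp(-x)

x = exponential_observation("082")
```

Passing the digits as a string keeps the leading zero, which fixes the power of ten in the denominator.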
A technique suggested by von Neumann, called the "rejection method," enables one to substitute for the solution of Eq. (1) a stochastic process involving a much simpler computation; this technique will be discussed in a forthcoming book by Kahn (Ref. 8).
To obtain pairs of normal deviates with given correlation ρ, use pairs (x, y) of independent deviates from the table and transform them to
(x, ρx + \sqrt{1 - ρ^{2}} y).
Thus for ρ = -.6, for example, if (.732, -1.205) are two deviates from the table, then
(.732, -1.403)
is a pair of deviates from a normal population with the desired correlation.[4]
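The numerical example is reproduced by the standard transformation that keeps x and replaces y with ρx + sqrt(1 - ρ²)·y, where ρ is the desired correlation. A brief sketch follows; the function name is hypothetical.

```python
import math

def correlate(x, y, rho):
    """Turn independent standard normal deviates into a pair with correlation rho."""
    return x, rho * x + math.sqrt(1.0 - rho ** 2) * y

pair = correlate(0.732, -1.205, -0.6)
```

The second member keeps unit variance because ρ² + (1 - ρ²) = 1, and its covariance with x is ρ.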
In general, to obtain a random observation from a bivariate population with distribution G(x,y), one uses a marginal distribution on one variate, say, G_{1}(x), and the conditional distribution, say, G_{2}(y/x), on the other. Two random numbers determine the observation: one determines x by employing G_{1}(x) in Eq. (1), and the other determines y by employing G_{2}(y/x) in Eq. (1). Thus, if a probability density is uniform (and equal to two) over the triangle bounded by x = 0, y = 0, x + y = 1 and is zero elsewhere, then
G_{1}(x) = 2x - x^{2},    G_{2}(y/x) = y / (1 - x),    0 ≤ x ≤ 1,  0 ≤ y ≤ 1 - x,
and two four-digit random numbers, 5402 and 1770, determine the observation (.3220, .1200). The direct generalization of this procedure will determine observations from multivariate populations.
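For the uniform density on the triangle, integrating out y gives the marginal distribution 2x - x² for x, and given x the conditional distribution of y is uniform on (0, 1 - x). Both invert in closed form, so the quoted observation can be checked directly. The sketch assumes each four-digit number is converted as in Eq. (1), that is, to 10^{-4}(D + 0.5); the function name is hypothetical.

```python
import math

def triangle_observation(d1, d2):
    """Observation from the uniform density on x >= 0, y >= 0, x + y <= 1."""
    u1 = (int(d1) + 0.5) / 10 ** len(d1)
    u2 = (int(d2) + 0.5) / 10 ** len(d2)
    x = 1.0 - math.sqrt(1.0 - u1)   # solves 2x - x**2 = u1 on [0, 1]
    y = (1.0 - x) * u2              # solves y / (1 - x) = u2
    return x, y

x, y = triangle_observation("5402", "1770")
```

To four places the result agrees with the observation (.3220, .1200) given in the text.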
The tables of random digits and normal deviates comprise very large files, so only a sample page of each is included here.
Access to the complete tables in compressed format is provided on the document's main page.
References
1. Kendall, M. G., and B. B. Smith, Random Sampling Numbers, Cambridge University Press, 1939.
2. Juncosa, M. L., Random Number Generation on the BRL High-Speed Computing Machines, Ballistic Research Laboratories Report No. 855, Aberdeen Proving Ground, Maryland, 1953.
3. Meyer, H. A., L. S. Gephart, and N. L. Rasmussen, On the Generation and Testing of Random Digits, WADC Technical Report 54-55, Wright-Patterson Air Force Base, Ohio, 1954.
4. Moshman, Jack, "Generation of Pseudo-random Numbers on a Decimal Calculator," J. Assoc. Computing Machinery, Vol. 1, 1954, p. 88.
5. The Monte Carlo Method (Proceedings of a Symposium held in 1949), National Bureau of Standards Report AMS 12, Government Printing Office, Washington 25, D.C., 1951.
6. Curtiss, J. H., "Sampling Methods Applied to Differential and Difference Equations," Seminar on Scientific Computation, International Business Machines Corp., New York, 1949.
7. Kahn, H., and A. W. Marshall, "Methods of Reducing Sample Size in Monte Carlo Computations," J. Operations Res. Soc. of Amer., Vol. 1, 1953, pp. 263-278.
8. Kahn, H., Applications of Monte Carlo, The RAND Corporation (to be published).
9. Horton, H. B., and R. T. Smith, "A Direct Method for Producing Random Digits in Any Number System," Ann. Math. Statistics, Vol. 20, 1949, pp. 82-90.
10. Mann, H. B., and A. Wald, "On the Choice of the Number of Class Intervals in the Application of the Chi-Square Test," Ann. Math. Statistics, Vol. 13, 1942, pp. 306-317.
11. Tables of Probability Functions, 2d ed., U.S. Government Printing Office, Washington, D.C., 1948.
[1] References are listed in the References section above.
[2] For readers who are not statisticians: The χ² (chi-square) test is a standard statistical criterion used to measure discrepancy from expectation. The probability value associated with the criterion ranges between zero and one. A small probability value, e.g., less than .05, indicates the possibility of a discrepancy or bias. For very large samples, such as is the case here, with 125,000 digits, the test is extremely sensitive and will result in a small probability value even though the bias may be quite trivial from a practical standpoint.
[3] Some issues concerning discrepancies in the numbers presented here are discussed in an addendum prepared in April 1997.
[4] RAND has a table of deviations for Gaussian-Markov chains with a discrete time parameter. There is one chain of 10,000 observations for each of the correlations: .600, .800, .900, .950, .970, .980, .990, .995.
Copyright © 1955 by The RAND Corporation
All rights reserved. Permission is given to duplicate this on-line document for personal use only, as long as it is unaltered and complete. Copies may not be duplicated for commercial purposes.
RAND is a nonprofit institution that helps improve public policy through research and analysis. RAND's publications do not necessarily reflect the opinions or policies of its research sponsors.