
Statistics & Probability Letters 19 (1994) 259-266 15 March 1994

North-Holland

Rapid evaluation of the inverse of the normal distribution function
George Marsaglia * and Arif Zaman
Supercomputer Computations Research Institute and Department of Statistics, The Florida State University, Tallahassee, FL, USA

John C.W. Marsaglia


Department of Computer Science, Western Oregon State College, Corvallis, OR, USA

Received August 1992; revised May 1993

Correspondence to: Dr. G. Marsaglia, Department of Statistics, Florida State University, Tallahassee FL 32306, USA.
* Research supported by the National Science Foundation, Grant DM9206972.

Abstract: Here is a method for very fast evaluation of the inverse of the normal distribution, in two versions. The first, given u, rapidly produces the solution x to $2\int_x^\infty \phi(t)\,dt = u$, to within the accuracy available in single precision arithmetic. The second is faster. Using one less term in an expansion, it provides accuracy to within 0.000002, suitable for generating a normal random variable by direct inversion of its distribution function.

1. Introduction

The main result of this article is a method for generating a normal random variable by direct inversion of
its distribution function. Such a goal has become increasingly important for use in supercomputers or
massively parallel systems, where hundreds or thousands of processors are in lock-step. The next step in
a Monte Carlo simulation cannot begin until all of the processors have finished generating their
particular normal random variable in the current step.
For single CPUs, the fastest methods achieve high average speed with a mixture of a very fast and a quite slow part. (The fastest current method is probably the ziggurat method of Marsaglia and Tsang (1985).) Such methods are not suited for implementation in massively parallel systems, where speed is
determined not by the average, but by the slowest of the times to produce many normal variables.
In addition, producing a normal random variable x as a monotonic function of a uniform variable u
has important uses in variance reduction. Such x’s can only be produced by inverting the normal
distribution or its complement.
We have recently published a fast, constant-time method for generating normal variates (Marsaglia, 1991). It also inverts the normal distribution function and is comparable in speed to the one we develop here. But it requires coefficients for the minimax fit of cubics to 256 ranges of the input uniform variable, too many for listing in a journal; a program to generate them is too long as well. These coefficients are
readily distributed by e-mail, and we have responded to a number of requests. But many correspondents, for example in such places as Bulgaria, Cuba and China, have had to get the tables by ordinary mail and enter them by hand.
So we suggest an alternative method here. It is marginally faster, producing the normal variable $\pm x$ in terms of a uniform variable u as the solution to the equation $\mathrm{cPhi}(x) = \tfrac12 u$, where cPhi is the complementary normal integral $\mathrm{cPhi}(x) = \int_x^\infty \phi(t)\,dt$. It has the additional advantage that its large table of constants is more easily generated than that for the earlier method (Marsaglia, 1991).
To make the constants readily available, we provide a short Fortran program for very accurate determination of the complementary normal distribution function cPhi.
There is no need to dwell on the importance of the normal distribution and its inverse in statistical computing, but some remarks on methods, or the lack of suitable methods, seem appropriate. Many scientific subroutine libraries have the normal distribution available, but only indirectly, through the error function erf. Thus $\Phi(x) = \int_{-\infty}^x \phi(t)\,dt = 0.5 + 0.5\,\mathrm{erf}(x/\sqrt{2})$. Values of erf(x) are usually obtained by rational approximations when x is small and through an asymptotic or continued fraction expansion when x is large.
These methods might not require large tables of constants, but they often are complicated programs that result in quite large .exe files, often larger than those from faster methods using many more tabled constants but simpler arithmetic.
Concern for memory locations played an important role when methods for erf were developed. But
now memory is cheap, and a few thousand memory locations are of no concern, particularly when they
lead to much faster and simpler methods.
Our main concern here is the inverse of the normal distribution function. We develop a method so fast that it is suitable for generating a normal random variable by inverting its distribution, using a random uniform argument. But the method requires two large tables of 1024 constants. These must be obtained from $\Phi$ itself. So, in the last section, we give methods for evaluating $\Phi$, or, more properly, the complementary standard normal integral $\mathrm{cPhi}(x) = \int_x^\infty \phi(t)\,dt$.
We provide a small Fortran program that will compute cPhi(x) to the full 15 significant digits available in double precision for the range of x of most interest, $0 \le x < 6.4$. For example, it returns cPhi(1.96D0) = 0.0249978951482205, while the true value to 17 significant digits is 0.024997895148220434. For the rarely required x's out to x = 15, the number of correct significant digits might be only 12 or 13. For example, the little Fortran program will return cPhi(12.4D0) as 0.130661798312446D-34, while the true value begins 0.13066179831246405980D-34.

2. The inverse normal distribution

Our primary goal is to develop a very fast method for inverting the normal distribution, using a uniform random variable u as input. Almost all such u's come from floating a 31- or 32-bit integer from a random number generator. This puts some restrictions on the available u's: they must lie in the range $2^{-32} < u < 1$. For generating a normal variable by inverting cPhi we will use a uniform variate u on $(-1, 1)$. Such a u comes from multiplying a 32-bit (signed) integer by $2^{-32}$. We use the positive part for the inversion and the sign to provide negative x's.
So assume a uniform (0, 1) random variable u, expressed in normalized floating point form. Such a u may be represented

$$u = 2^{-k} \times \left(\tfrac12 + j/64 + m/2^{24}\right), \qquad (1)$$

with $0 \le k < 32$, $0 \le j < 32$ and $0 \le m < 2^{18}$. In most CPUs with bits $b_0 b_1 \cdots b_{31}$, k is 126 minus the integer represented by bits 1 to 8, j is the integer represented by bits 9 to 13 and m is the integer in bits 14 to 31.

260
Volume 19, Number 4 STATISTICS & PROBABILITY LETTERS 15 March 1994

It is easy, and fast, to recover k, j and m from a normalized floating point variable u by means of a few machine language instructions or through instructions in C. Many Fortran compilers have instructions (right shifts, logical and's) that will also provide k, j and m.
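
For illustration only, here is a minimal sketch of the extraction in Fortran, using the Fortran 90 transfer intrinsic in place of machine language; it assumes IEEE 754 single-precision reals, and the program and variable names are ours, not part of the original:

      program bits
c Sketch: recover k, j, m of representation (1) from the bit pattern
c of a positive single-precision u.  Assumes IEEE 754 format.
      real u
      integer iu, k, j, m
      u = 0.515625
      iu = transfer(u, iu)
c bits 1 to 8 hold the biased exponent; k is 126 minus that integer
      k = 126 - iand(ishft(iu, -23), 255)
c bits 9 to 13 are the leading five fraction bits
      j = iand(ishft(iu, -18), 31)
c bits 14 to 31 are the remaining 18 fraction bits
      m = iand(iu, 262143)
      print *, k, j, m, 2.**(-k)*(.5 + j/64. + m/2.**24)
      end

For u = 0.515625 this prints k = 0, j = 1, m = 0 and reconstructs u exactly from (1).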
We now describe a method based on this idea: for each choice of k and j (32 × 32 = 1024 choices in all), let

$$u_0 = 2^{-k} \times \left(\tfrac12 + j/64\right).$$

Then for each such $u_0$, define $x_0$ by the relation

$$2\int_{x_0}^\infty \phi(t)\,dt = u_0,$$

where $\phi(t) = (2\pi)^{-1/2}e^{-t^2/2}$ is the standard normal density.


Our goal is to evaluate the function x(u) defined by

$$2\int_{x(u)}^\infty \phi(t)\,dt = u.$$

(We need only solve for positive x, and since $\Phi(0) = 0.5$, we get more accuracy by solving the displayed equation. If we solved for u rather than for $\pm u$ we would 'waste' one of u's bits.)
We choose 1024 'base' values $u_0$ and will use the Taylor series for x(u):

$$x(u_0 + h) = x(u_0) + x'(u_0)h + \tfrac12 x''(u_0)h^2 + \tfrac16 x'''(u_0)h^3 + \cdots.$$

Our choice of the base values is motivated by the way floating point numbers are stored in most modern CPUs. Every value u with representation (1) may be written $u = u_0 + h$, with

$$u_0 = 2^{-k} \times \left(\tfrac12 + j/64\right) \quad\text{and}\quad h = 2^{-k} \times \left(m/2^{24}\right).$$

Now, to find the Taylor expansion of x(u), differentiate both sides of the relation $2\int_x^\infty \phi(t)\,dt = u$ with respect to x to get $du/dx = -2\phi(x)$. The reciprocal then gives $dx/du = -1/[2\phi(x)]$. Successive differentiation with respect to u then provides the next several derivatives:

$$\frac{d^2x}{du^2} = \frac{x}{4\phi(x)^2}, \qquad \frac{d^3x}{du^3} = -\frac{1+2x^2}{8\phi(x)^3}, \qquad \frac{d^4x}{du^4} = \frac{7x+6x^3}{16\phi(x)^4}.$$

These provide a series for x(u). If $u = u_0 + h$, let $x_0$ be the value associated with the base value $u_0$: $2\int_{x_0}^\infty \phi(t)\,dt = u_0$. Then, with $t = h/\phi(x_0)$,

$$x(u_0 + h) = x_0 - \tfrac12 t + \tfrac18 x_0 t^2 - \tfrac{1}{48}\left(1 + 2x_0^2\right)t^3 + \cdots. \qquad (2)$$


Since we have an alternating series with terms of diminishing magnitude, stopping after the $t^3$ term will cause an error less than $|\tfrac{1}{384}(7x + 6x^3)t^4|$. For the base values $u_0$ we have chosen, t will be quite small. The greatest error occurs when $k = j = 0$ and $m = 2^{18} - 1$. In that case, $u = u_0 + h = 0.5 + 0.015625$ and the true x associated with that u is 0.650104. Series (2) through the $t^3$ term yields x = 0.650104, to the limit of single precision accuracy. Thus all u values in $0 < u < 1$, using (2) through the $t^3$ term, provide an x value to within the limit of single precision arithmetic. This may also be verified by comparing with x's that result from a high-precision routine for inverting cPhi(), such as that below. The result is a saw-toothed error curve with peaks at points for which $m = 2^{18} - 1$.
Note that using series (2) only through the $t^2$ term provides accuracy suitable for generating a normal random variable in terms of a uniform random variable u. The greatest error is again at $u = 0.5 + 0.015625$, when $k = j = 0$ and $m = 2^{18} - 1$. The returned x value is 0.650108 compared to the true value of 0.650104. Few users would be unhappy with this error, the worst possible, in a normal random variable. (The tabled $x_0$ values may be adjusted so that the worst error is halved, making it 0.000002.)
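
The worst case is easy to check directly. The following sketch is ours, not part of the original: it enters $x_0$, the solution of $2\int_{x_0}^\infty \phi(t)\,dt = 0.5$, as a constant and evaluates series (2) through the $t^3$ and $t^2$ terms at u = 0.515625:

      program worst
      implicit real*8 (a-h,o-z)
c x0 solves 2*integral(phi, x0, infinity) = 0.5
      x0 = 0.6744897501960817d0
      h = 0.015625d0
c phi(x0) = exp(-x0**2/2 - log(sqrt(2*pi)))
      phi = dexp(-.5d0*x0*x0 - .918938533204672d0)
      t = h/phi
c series (2) through the t**3 term, then through the t**2 term
      x3 = x0 - .5d0*t + .125d0*x0*t**2 - (1d0 + 2d0*x0*x0)*t**3/48d0
      x2 = x0 - .5d0*t + .125d0*x0*t**2
      print *, x3, x2
      end

The two printed values, 0.650104... and 0.650108..., reproduce the figures quoted above.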

2.1. Implementation: General remarks

We have 1024 base values $u_0$ corresponding to the u values $u = 2^{-k} \times (\tfrac12 + j/64)$ for $0 \le k < 32$ and $0 \le j < 32$. We must create a table, say A(k, j), of x values corresponding to those u values. Thus

$$2\int_{A(k,j)}^\infty \phi(t)\,dt = 2^{-k} \times \left(\tfrac12 + j/64\right).$$

The A's are all we need to evaluate series (2), since $\phi(A(k, j))$ may be computed, albeit slowly. With memory so cheap and plentiful, it is reasonable to store a second table of 1024 values, say B(k, j), defined by

$$B(k, j) = \exp\left(\tfrac12 A(k, j)^2 + 0.918938533204672\right).$$

Then, to solve the equation $2\int_x^\infty \phi(t)\,dt = u$ for x to within the limits of single precision, given u as a 32-bit normalized floating point number:

Extract k and j from u.
Form $h = m/2^{24+k}$ (by subtracting from u the result of 'anding' out the last 18 bits of u).
Then, with $t = hB(k, j)$,

$$x = A(k, j) - \tfrac12 t + \tfrac18 A(k, j)t^2 - \tfrac{1}{48}\left(1 + 2A(k, j)^2\right)t^3.$$

For worst-case accuracy of about 0.000002, adequate for generating normal random variables, the last statement may be replaced by the simpler

$$x = A(k, j) - \tfrac12 t + \tfrac18 A(k, j)t^2.$$

Note that code for these steps can be speeded up in several ways, omitted above for clarity. For example, if B(k, j) is divided by $\sqrt{8}$, then using $v = h \cdot B(k, j)$ changes the second denominator, 8, in the truncated series to 1. This saves a multiplication, and of course using Horner's rule for the polynomial saves another: with $v = t/\sqrt{8}$,

$$A - \tfrac12 t + \tfrac18 A t^2 = A - v\left(\sqrt{2} - Av\right).$$
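
A two-line numerical check of this identity (our sketch; the values of a and t are arbitrary):

      program horner
      implicit real*8 (a-h,o-z)
      a = 0.6744897501960817d0
      t = 0.049170d0
      v = t/dsqrt(8d0)
c both expressions should print the same value
      print *, a - .5d0*t + .125d0*a*t*t
      print *, a - v*(dsqrt(2d0) - a*v)
      end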

2.2. Implementation: Some shortcuts

We need tables A(k, j) and B(k, j), but more efficient code comes from using a single parameter, n, determined by the appropriate bits of u: bits 1 to 8 for k and 9 to 13 for j. The range $0 \le k < 32$ and the offset-126 representation of floating point numbers require a little thought about the mapping of the pair (k, j) to n and its inverse.
It turns out that we table A(n) and B(n), where n is extracted directly from bits $b_1 b_2 \cdots b_{13}$ of the 32-bit floating point input u: n = L - 3040, where L is the integer resulting from shifting u 18 bits to the right. (For example, when k = j = 0 the biased exponent is 126, so L = 32 × 126 = 4032 and n = 992.)


Conversely, in creating the tabled values we may loop k and j from 0 to 31 and put n = 992 - 32*k + j. For example, this Fortran code (with statements separated by semicolons to save space) will create the tables:

do 2 k=0,31; do 2 j=0,31; u=2.**(-k)*(.5+j/64.); n=992-32*k+j;
A(n)=cPhinv(.5*u); 2 B(n)=exp(.5*A(n)**2-0.12078224)

That Fortran segment requires a function cPhinv(p) which, for input p, returns the value x that solves $\int_x^\infty \phi(t)\,dt = p$. This may be done by using Newton's method on the error function erf(x) available in most scientific libraries. A subroutine for computing the normal distribution $\Phi(x)$ or its complement cPhi(x) is described below, as well as suggested code for inverting cPhi to get cPhinv by Newton's method.
Finally, for positive input u, this segment of Fortran code, again compacted, will compute the solution x to $2\int_x^\infty \phi(t)\,dt = u$ to within the limit of single-precision arithmetic:

equivalence(u,iu),(w,iw); iw=iand(iu,2147221504); n=ishft(iw,-18)-3040;
v=(u-w)*B(n); x=A(n)-v*(1.414214-v*(A(n)-.4714045*(1.+2.*A(n)**2)*v))

Some Fortran compilers will do shifts and logical and's on any computer word rather than only on integers. Others, such as Lahey's, require integer arguments, hence the equivalence statement. Use of the i-prefix in ishft and iand emphasizes the integer input and output of those functions. A negative second argument in ishft means a right shift.
The reader should easily adapt these statements to his particular language. Of course, any serious implementation of these ideas should be done in machine language.
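
As one example of such an adaptation, here is a sketch of ours (the function name phinv1 is hypothetical, not from the original): a self-contained function using the Fortran 90 transfer intrinsic instead of an equivalence, again assuming IEEE 754 reals and tables a and b filled as above, with b already divided by $\sqrt{8}$:

      real function phinv1(u, a, b)
c Sketch: solve 2*integral(phi, x, infinity) = u for positive u,
c to single precision, given prepared tables a and b.
      real u, a(0:1023), b(0:1023), w, v
      integer iu, iw, n
      iu = transfer(u, iu)
c keep the exponent and leading five fraction bits: w is u0
      iw = iand(iu, 2147221504)
      w = transfer(iw, w)
      n = ishft(iw, -18) - 3040
c v = t/sqrt(8); then the Horner form of the truncated series
      v = (u - w)*b(n)
      phinv1 = a(n) - v*(1.414214 - v*(a(n) -
     &         .4714045*(1. + 2.*a(n)**2)*v))
      end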
We conclude this section with a short Fortran function subprogram rnor() for generating a normal random variable. It calls a random number generator vni() that returns a signed uniform variable between -1 and 1 (such as what arises from multiplying a signed 32-bit random integer by $2^{-32}$). It also requires: (1) use of the statement entry abset() to initialize the A and B tables before subsequent calls to the primary entry rnor() (note that some compilers may require a save a,b statement) and (2) a function cPhinv(p) that will provide the solution x to the equation $\int_x^\infty \phi(t)\,dt = p$. A function that provides such a solution through Newton's method is described below.
The following compacted Fortran program for generating a normal variable is provided to illustrate the method. It may be easily translated to other high level languages, again for illustration. Any serious implementation should be in machine language.

function rnor(); real*8 cPhinv; real*4 a(0:1023),b(0:1023);
equivalence(u,iu),(w,iw); u=vni(); iw=iand(iu,2147221504)
L=ishft(iw,-18); n=L-3040; v=(abs(u)-w)*b(n)
rnor=sign(a(n)-v*(1.414214-v*a(n)),u); return;
entry abset(); do 2 k=0,31; do 2 j=0,31; n=992-32*k+j;
u=2.**(-k)*(.5+j/64.); a(n)=cPhinv(.5d0*u);
2 b(n)=dexp(.5d0*a(n)**2-0.120782237635245d0); abset=1.; return; end
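
A hypothetical driver (our sketch, assuming a uniform generator vni() and the cPhinv() of the next section have been supplied) would initialize once, then draw variates:

      program demo
      real rnor, abset, s
c fill the a, b tables once, then draw normal variates
      s = abset()
      do 1 i = 1, 5
    1 print *, rnor()
      end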

3. Evaluating the complementary normal distribution function $\mathrm{cPhi}(x) = 1 - \Phi(x) = \int_x^\infty \phi(t)\,dt$

The above method for rapid inversion of the normal distribution function requires a table of 1024 ‘exact’
values, at points determined by the exponent and first five bits of the fraction of a floating point
argument.


How does one establish such a table? If a double-precision error function, erf, or better, the complementary error function, erfc, is available, one may use it with, say, Newton's method to solve the equation $\int_x^\infty \phi(t)\,dt = p$ for x, given p. This would provide the function cPhinv(u) required above.
Those not having a double-precision error or complementary error function available may want to write their own. Better yet, even those with an error function available may want to fashion a function, say cPhi(x), that will provide $\int_x^\infty \phi(t)\,dt$ to the full accuracy available in double precision on most systems, those using paired 32-bit words for double-precision arithmetic. Given such a function cPhi, one may solve the equation cPhi(x) = p for x as a function of p > 0 by this segment of Fortran code, starting with an accurate initial guess related to the Polya approximation:

x=dsqrt(abs(-1.6d0*dlog(1d0-(1d0-2d0*p)**2)))
1 y=x+(cPhi(x)-p)*dexp(.5d0*x**2+.918938533204672d0)
if(dabs(y-x).lt.1.d-12) go to 2
x=y
go to 1
2 cPhinv=y

This Newton iteration may be used with double precision erf or erfc functions if they are available, or with one of the following two versions of a cPhi function. The last one is self-contained and easily able to provide the accuracy needed for making the table A(0), ..., A(1023) needed for rapid inversion of cPhi, described above.
We now describe a method for evaluating $\mathrm{cPhi}(x) = \int_x^\infty \phi(t)\,dt$ to that precision. It avoids the nuisance conversion of $\int e^{-t^2}$ to the more commonly required $\int e^{-t^2/2}$. In addition, it uses a single method for all x, unlike implementations of erf or erfc, which use rational approximations over some ranges of x and asymptotic approximations for others.
Define the ratio $R(x) = \mathrm{cPhi}(x)/\phi(x)$. The function R(x), $x \ge 0$, is well behaved, starting at 1.2533... then dropping steadily toward zero. The graph of y = R(x) looks much like that of $y = 2/(x + \sqrt{x^2 + 8/\pi})$. R(x) has an easily-developed Taylor expansion, and from R(x) one easily obtains cPhi(x) by multiplication: $\mathrm{cPhi}(x) = R(x)\phi(x)$.
We describe two versions of a program for cPhi based on R(x). The first uses a table of 121 constants and computes eight terms of the Taylor series for R(x). It will compute cPhi(x) for all x < 15 with relative accuracy of $10^{-15}$. For example, the true value of cPhi(14.123) is, to 16 digits, $0.1370354214957889 \times 10^{-44}$, while this program returns the value $0.137035421495790 \times 10^{-44}$.
The second method uses only 15 constants, but more terms of the Taylor series, to get 15 digit accuracy for all but extreme values of cPhi, where accuracy might be only 12 digits. Note: that is relative accuracy, much more difficult to attain than absolute accuracy for the function cPhi. For example, the true value of cPhi(10.5) is $0.43190063178092303465 \times 10^{-25}$ to 20 places, while our second method will return cPhi(10.5D0) = 0.431900631780923D-25. One easily gets 24 place absolute accuracy by merely returning cPhi(10.5) = 0.
For the first method, make a table of 121 constants:

$$V(0) = R(0), \quad V(1) = R(\tfrac18), \quad V(2) = R(\tfrac28), \quad \ldots, \quad V(120) = R(15).$$

Then with $x = \tfrac18 j + h$, with j from 0 to 120 and $|h| \le 0.0625$,

$$R(\tfrac18 j + h) = V(j) + h(R_1 + h(R_2 + h(R_3 + h(R_4 + h(R_5 + h(R_6 + h(R_7 + hR_8)))))))$$

will compute R(x) with a maximum relative error of less than $10^{-15}$ for all x in $0 \le x \le 15$. The coefficients $R_1, R_2, \ldots, R_8$ come from the derivatives of R(x), developed recursively: with $z = \tfrac18 j$, $R_0 = R(z) = V(j)$ and $R_1 = zR_0 - 1$, the recursion is $R_n = (R_{n-2} + zR_{n-1})/n$.


Except for the tabled values V(0), ..., V(120), a Fortran program for this method is short enough to list directly, using a semicolon to separate statements:

function cPhi(x); implicit real*8(a-h,o-z); real*8 v(0:120); data v/.../;
j=8d0*dabs(x)+.5d0; j=min(j,120); z=j*.125d0; h=dabs(x)-z; r=v(j);
r1=r*z-1d0; r2=(r+z*r1)*.5d0; r3=(r1+z*r2)/3d0; r4=(r2+z*r3)*.25d0;
r5=(r3+z*r4)*.2d0; r6=(r4+z*r5)/6d0; r7=(r5+z*r6)/7d0;
r8=(r6+z*r7)*.125d0;
t=r+h*(r1+h*(r2+h*(r3+h*(r4+h*(r5+h*(r6+h*(r7+h*r8)))))));
cPhi=t*dexp(-.5d0*x*x-.9189385332046727d0);
if(x.lt.0d0) cPhi=1d0-cPhi; return; end

That is the entire program, except for the values v(0), ..., v(120), which will have to be listed in data statements. They must be calculated to full 16 digit accuracy by some other method having access to an erfc function or equivalent, in order to get $V(j) = \int_{j/8}^\infty \phi(t)\,dt / \phi(\tfrac18 j)$.
This is the program we have in our library of statistical functions. If you have access to a function library with a double precision erf or erfc function, and are content with its black-box uncertainties and the nuisance of manipulating arguments to convert to the Phi function, so important in Statistics, you may ignore this version. But those who want a direct evaluation of Phi(x) and an understanding of the way it is obtained may wish to implement the above.
We now list a second program in its entirety, including the constants. It has lower accuracy: worst-case 12 significant digits rather than 15, but 12-digit accuracy is adequate for all but a few applications. This version uses more terms in the Taylor series for R(x), but requires so few constants, V(0), ..., V(14), that it is feasible to list them.

function cPhi(x); implicit real*8(a-h,o-z); real*8 v(0:14); data v/
&1.253314137315500d0, .6556795424187985d0, .4213692292880545d0,
&.3045902987101033d0, .2366523829135607d0, .1928081047153158d0,
&.1623776608968675d0, .1401041834530502d0, .1231319632579329d0,
&.1097872825783083d0, .9902859647173193d-1, .9017567550106468d-1,
&.8276628650136917d-1, .7647576101624854d-1, .7106958053885211d-1/
cPhi=.5d0-sign(.5d0,x); if(dabs(x).ge.15d0) return; j=dabs(x)+.5d0;
j=min(j,14); z=j; h=dabs(x)-z; a=v(j); b=z*a-1d0; pwr=1d0; sum=a+h*b;
do 2 i=2,24-j,2; a=(a+z*b)/i; b=(b+z*a)/(i+1); pwr=pwr*h**2;
2 sum=sum+pwr*(a+h*b); cPhi=sum*dexp(-.5d0*x*x-.918938533204672d0);
if(x.lt.0d0) cPhi=1d0-cPhi; return; end
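
As a quick test, a sketch of ours that spot-checks the function against values quoted earlier in this article:

      program check
      real*8 cPhi
c expect 0.0249978951482205 (see Section 1)
      print *, cPhi(1.96d0)
c expect 0.431900631780923d-25 (quoted above)
      print *, cPhi(10.5d0)
      end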

Take only a few minutes to type in the statements of this program and check it out. Then, presto: you have a nice little Fortran (or equivalent) subroutine for the complementary normal distribution function without the bother or uncertainty of error functions on the system library. Furthermore, you'll have about 15 digit accuracy for the all-important range $0 \le x \le 6.23026$, where cPhi(x) ranges from $2^{-1}$ to $2^{-32}$.
That is the range of x's likely to be required in practice. The subroutine only begins to lose a few digits for extreme values of x, when cPhi(x) becomes vanishingly small. The worst case is around x = 14.5, where the latter version returns cPhi(14.5D0) = $0.605749476441401 \times 10^{-47}$, while the earlier, 120-constant version returns $0.605749476441522 \times 10^{-47}$. The true value, to 18 significant digits, is $0.605749476441522078 \times 10^{-47}$.
Who among us is likely to want more detail in probabilities on the order of $10^{-47}$?


References

Marsaglia, G. (1991), Normal (Gaussian) random variables for supercomputers, J. Supercomput. 5, 49-55.
Marsaglia, G. and W.W. Tsang (1985), A fast, easily implemented method for sampling from decreasing or symmetric unimodal densities, SIAM J. Sci. Statist. Comput. 5, 349-359.
