The references are two postings from the sci.math newsgroup. In short: compute R(x) = (1 - cdfN(x)) / pdfN(x) through its Taylor series (which converges well and also gives error estimates) around x = a for a = 1, ..., 10, with externally pre-computed values R(a). Then multiply by pdfN(x) = exp(-x*x/2 - 0.91893...) to get cdfN. For higher accuracy take a finer grid; George Marsaglia explains why, see below. For larger x classical asymptotics would be better.

The math is easier than a rational approximation like Moro's; the problem for good precision here _is_ Excel, and that is what I wanted to know. One can use the data type Decimal, but the guys from M$ quietly suggest it only for |values| below ~7, so I rearranged the code accordingly. To get cdfN by multiplication one additionally has to set up a routine for exp (the built-in one cuts off after 16 digits). The sheet contains a function localExp for negative arguments using the same technique, which for arguments between -1 and 0 should give accuracy better than 10^-30 (in theory; compute the Lagrange error term).

In practice it is not clear how the ugly data type behaves, so I did some checking: I tested the string output of myN(x) in the range -10 <= x <= 10 in steps of 1/1000 against Maple (there setting significant digits to 60 for the computation and pre-defining all error values as 9.99... to be overwritten by the correct ones). Maple needs about 3 minutes to spit out the statistics:

  count(errorList);
    20001
  range(errorList);
    -.31669918176968306653738544000586364777397 * 10^(-19) ..
     .31669918176968306653738544000586364777397 * 10^(-19)
  mean(errorList);
    .8243012849357532123393830308484575760162 * 10^(-26)

That says: all 20001 values have been processed successfully (otherwise 9.99... would appear in the range of the error list), and even at the 'boundaries' a - 0.001 (a = 1, 2, ..., 10) the errors are below 0.32 * 10^(-19). A second test would look more closely at these ranges, but I am too lazy for that; a check with Maple shows that the theoretical bound holds.
-------------------------------------------------------------------------
Subject: Re: Approx of Normal Distribution and its inverse
Date: Thu, 10 Jan 2002 21:29:33 +0100
From: Hugo Pfoertner
Organization: Talkline Internet Division http://www.tli.de
Newsgroups: sci.math

George Marsaglia wrote:
>
> Axel Vogt wrote in message news:3C20F835.DE905AE5@axelvogt.de...
> > Hi.
> >
> > In the 'encyclopedic dictionary' I find some polynomial approximations
> > of the normal distribution and one for its inverse with a maximal
> > error of 10^-4.
> >
> > I remember (?) it has something to do with diverging series.
> >
> > Are there any further results (available on the web) with smaller
> > error (maybe not using rational functions and exp, but for the
> > 'center' and not only far outside)?
>
> Let Phi be the normal distribution function and f its density.
> If you write 1 - Phi(x) = R(x)*f(x)
> then terms for the Taylor series of R are easily expressed recursively.
> Thus with a table of 20 values of 1-Phi and 8 terms of the Taylor series
> you can get Phi accurate to about 12 places, and with a table of
> 120 you get Phi to 15 places.
>
> I have had a Fortran version for some 30 years, if anyone would like a copy,
> with a C version of a few years ago.
>
> George Marsaglia
>
> On second thought, since people such as Harris can clutter up the Internet
> with trash, why not include a copy of the 20-table Fortran version?
> function Phi(z)
> implicit real*8(a-h,o-z)
> real*8 r(0:19)
> data r/1.253314137005841, 1.175451907906121, 1.101176439464860,
>      & 1.030105984709400, .9618781946289486, .8961465252562088,
>      & .8325768132417638, .7708440227285019, .7106292052927357,
>      & .6516167774466720, .593492322760325,  .535941292274201,
>      & .478649243294768,  .421304670042080,  .363606084920330,
>      & .305275792412258,  .246083511077006,  .185882720934051,
>      & .124658929655036,  .0625777004443072/
> x=abs(z)
> j=20.-31.91538/(x+sqrt(x*x+2.546479))
> v=(1-j/20d0)*r(0)
> yj=1/v-v*.6366197723675813
> H=dABS(X)-yj
> F=r(J)
> F1=F*x-1d0
> F2=(F+x*F1)*.5d0
> F3=(F1+x*F2)*.333333333333333D0
> F4=(F2+x*F3)*.25D0
> F5=(F3+x*F4)*.2D0
> F6=(F4+x*F5)*.166666666666667D0
> F7=(F5+x*F6)*.14285714285714286D0
> F8=(F6+x*F7)*.125D0
> TAYLOR=F+H*(F1+H*(F2+H*(F3+H*(F4+H*(F5+H*(F6+H*(F7+H*F8)))))))
> Phi=TAYLOR*dEXP(-.5d0*X*X-.918938533204672D0)
> if(z.lt.0d0) Phi=1d0-Phi
> return
> end

I have tested this program and it only works correctly if the following
change is made. Replace the part

F1=F*x-1d0
F2=(F+x*F1)*.5d0
F3=(F1+x*F2)*.333333333333333D0
F4=(F2+x*F3)*.25D0
F5=(F3+x*F4)*.2D0
F6=(F4+x*F5)*.166666666666667D0
F7=(F5+x*F6)*.14285714285714286D0
F8=(F6+x*F7)*.125D0

by

F1=F*yj-1d0
F2=(F+yj*F1)*.5d0
F3=(F1+yj*F2)*.333333333333333D0
F4=(F2+yj*F3)*.25D0
F5=(F3+yj*F4)*.2D0
F6=(F4+yj*F5)*.166666666666667D0
F7=(F5+yj*F6)*.14285714285714286D0
F8=(F6+yj*F7)*.125D0

The Taylor expansion is made at the left boundary of the intervals, which
is yj. George Marsaglia also sent me the version with a table of 120 Phi
values at equally spaced abscissas. This 120-table version worked
correctly.

Hugo Pfoertner

------------------------------------------------------------------------
Subject: Re: Integrating a Bell Curve
Date: Tue, 20 Aug 2002 11:29:38 GMT
From: "George Marsaglia"
Organization: Giganews.Com - Premium News Outsourcing
Newsgroups: sci.math
...
&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&

Attached are extracted portions of a LaTeX file that will provide the
file cPhi.ps for the next edition of The Marsaglia Random Number CDROM
with The Diehard Battery of Tests of Randomness. It may be of interest
to those considering methods for "integrating a bell curve".

\newcommand{\cPhi}{\mbox{cPhi}}
\newcommand{\RR}{\mbox{R}}

\begin{title}{Evaluating the Normal Distribution Function}

Let $\phi(x)$ be the standard normal density function,
$\phi(x)=e^{-x^2/2}/\sqrt{2\pi}$, and let $\Phi(x)$ and $\cPhi(x)$ be the
corresponding distribution and complementary distribution functions:
$$\Phi(x)=\int^x_{-\infty} \phi(t)\,dt,\qquad
  \cPhi(x)=1-\Phi(x)=\int_x^\infty\phi(t)\,dt .$$
This note advocates a simple method for direct evaluation of $\cPhi(x)$
for $x>0$ to the accuracy of the floating point system
involved---single, double or extended precision. Unlike most of the
available methods, which evaluate the error function at $x/\sqrt{2}$
using often mysterious methods for different ranges of $x$---continued
fractions, rational or Chebyshev approximation, asymptotic
expansions---the method here is a straightforward application of a
Taylor series whose terms arise from a simple recursion.

It is an unfortunate accident of history that the `error function',
named for the way that sample values differ from the assumed true value
and obviously based on the normal distribution, was set up for
$e^{-t^2}$ rather than for $e^{-t^2/2}$, the proper exponent belonging
to the standard normal density. So the overwhelming majority of users
have to put up with transforming
$\mbox{erf}(x)=(2/\sqrt{\pi})\int_0^xe^{-t^2}\,dt$ to the standard
normal distribution function by means of
$$\Phi(x)=(1/\sqrt{2\pi})\int_{-\infty}^xe^{-t^2/2}\,dt=
  \frac{1}{2}+\frac{1}{2}\mbox{erf}(x/\sqrt{2}).$$
Curses on that unknown scientist (a physicist? astronomer?) who
inflicted that onerous transformation on us.
(To the argument that the error function is really defined as a solution
to the differential equation $y''(z)+2zy'(z)=0$, one can reply that
$y''(z)+zy'(z)=0$ is simpler and has solution $\Phi(z)$.)

Now define the function $\RR(x)$ by means of
$$\RR(x)\phi(x)=\int_x^\infty \phi(t)\,dt, \mbox{ that is, }
  \RR(x)=\cPhi(x)/\phi(x).$$
The ratio $\RR(x)=\cPhi(x)/\phi(x)$, $x\ge0$, is a well-behaved, convex
function. It starts at\\
$\RR(0)=\sqrt{\pi/2}=1.2533\ldots$ then drops steadily toward zero. The
graph of $y=\RR(x)$ looks much like that of $y=2/(x+\sqrt{x^2+8/\pi})$.
Both are plotted in Figure 1.\\[2mm]
Because $\RR(x)$ has an easily-developed Taylor expansion, one can use a
few exact values, say $\RR(1)$, $\RR(2)$, $\RR(3)$, $\RR(4),\ldots$,
then use the Taylor expansion to get $\RR$ at intermediate points. From
$\RR(x)$ one easily obtains $\cPhi(x)$ by multiplication:
$\cPhi(x)=\RR(x)\phi(x)$.

To get the Taylor expansion of $\RR$, differentiate
$\RR(x)\phi(x)=\int_x^{\infty} \phi(t)\,dt$ on both sides to get
$$ \RR'(x)\phi(x)-x\RR(x)\phi(x)=-\phi(x),\mbox{ that is, }
  \RR'=x\RR-1.$$
Then
$$\RR''=x\RR'+\RR;\qquad \RR'''=x\RR''+2\RR';\qquad
  \RR''''=x\RR'''+3\RR'';$$
and in general, if $\RR^{[k]}$ means the $k$'th derivative of $\RR$,
then for $k\ge1$:
$$\RR^{[k+1]}(x)=x\RR^{[k]}(x)+k\RR^{[k-1]}(x).$$
This leads to a simple way to evaluate $\RR(z+h)$, given the exact value
$a=\RR(z)$ (using C expressions):
\begin{verbatim}
   b=z*a-1; pwr=1; s=a+h*b;
   for(i=2;;i+=2)
   { a=(a+z*b)/i;
     b=(b+z*a)/(i+1);
     pwr=pwr*h*h;
     t=s;
     s=s+pwr*(a+h*b);
     if(s==t) break;
   }
\end{verbatim}
One expects the worst errors will be when $h$ is nearly $-1$, that is,
when $x$ just exceeds one of the tabled values, say
$x=.05,1.05,2.05,\ldots,14.05$.
Here is a list of the values returned for those $x$'s, followed by the
exact values, all to the nearest 15 digits:
\begin{verbatim}
cPhi( .05)=.480061194161628      cPhi( 8.05)=.413970181627315e-15
           480061194161628                   413970181627315
cPhi(1.05)=.146859056375896      cPhi( 9.05)=.714841701126973e-19
           146859056375896                   714841701126975
cPhi(2.05)=.201822154057044e-1   cPhi(10.05)=.459337105561296e-23
           201822154057044                   459337105561297
cPhi(3.05)=.114420683102270e-2   cPhi(11.05)=.109607441199641e-27
           114420683102270                   109407441199642
cPhi(4.05)=.256088164740415e-4   cPhi(12.05)=.969749658013101e-33
           256088164740414                   969749658013097
cPhi(5.05)=.220905032269544e-6   cPhi(13.05)=.317737014468943e-38
           220905032269543                   317737014468942
cPhi(6.05)=.724229170513765e-9   cPhi(14.05)=.385170178073225e-44
           724229170513760                   385170178073226
cPhi(7.05)=.894588955876993e-12
           894588955876992
\end{verbatim}
One rarely needs values of the normal CDF $\Phi(x)$ or its complement
$\cPhi(x)$ accurate to as many as 15 digits, or for values $x$ beyond 6
or so, but the easily implemented Taylor series for
$\RR(x)=\cPhi(x)/\phi(x)$ makes such accuracy available with only a few
stored values of $\RR(x)$ and a few lines of code. Here is a listing of
a C function that evaluates cPhi to that 15-digit accuracy:
\begin{verbatim}
double cPhi(double x){
 long double v[]={0.,.65567954241879847154L,
 .42136922928805447322L,.30459029871010329573L,.23665238291356067062L,
 .19280810471531576488L,.16237766089686746182L,.14010418345305024160L,
 .12313196325793229628L,.10978728257830829123L,
 .99028596471731921395e-1L,.90175675501064682280e-1L,
 .82766286501369177252e-1L,.76475761016248502993e-1L,
 .71069580538852107091e-1L,.66374235823250173591e-1L};
 long double h,a,b,z,t,sum,pwr;
 int i,j;
 if(x>15.) return (0.);
 if(x<-15.)
 return (1.);
 j=fabs(x)+1.;
 z=j;
 h=fabs(x)-z;
 a=v[j];
 b=z*a-1.;
 pwr=1.;
 sum=a+h*b;
 for(i=2;i<60;i+=2){
  a=(a+z*b)/i;
  b=(b+z*a)/(i+1);
  pwr=pwr*h*h;
  t=sum;
  sum=sum+pwr*(a+h*b);
  if(sum==t) break;
 }
 sum=sum*exp(-.5*x*x-.91893853320467274178L);
 if(x<0.) sum=1.-sum;
 return ((double) sum);
}
\end{verbatim}
A Fortran version is virtually the same, except a {\tt goto} may be
needed to exit the loop when the new term is too small to affect the
sum.

George Marsaglia