How do I calculate the inverse of the cumulative distribution function (CDF) of the normal distribution in Python?
Which library should I use? Possibly scipy?
How do I calculate the inverse of the cumulative distribution function (CDF) of the normal distribution in Python?
Which library should I use? Possibly scipy?
NORMSINV (mentioned in a comment) is the inverse of the CDF of the standard normal distribution. Using scipy
, you can compute this with the ppf
method of the scipy.stats.norm
object. The acronym ppf
stands for percent point function, which is another name for the quantile function.
In [20]: from scipy.stats import norm
In [21]: norm.ppf(0.95)
Out[21]: 1.6448536269514722
Check that it is the inverse of the CDF:
In [34]: norm.cdf(norm.ppf(0.95))
Out[34]: 0.94999999999999996
By default, norm.ppf
uses mean=0 and stddev=1, which is the "standard" normal distribution. You can use a different mean and standard deviation by specifying the loc
and scale
arguments, respectively.
In [35]: norm.ppf(0.95, loc=10, scale=2)
Out[35]: 13.289707253902945
If you look at the source code for scipy.stats.norm
, you'll find that the ppf
method ultimately calls scipy.special.ndtri
. So to compute the inverse of the CDF of the standard normal distribution, you could use that function directly:
In [43]: from scipy.special import ndtri
In [44]: ndtri(0.95)
Out[44]: 1.6448536269514722
ndtri
is much faster than norm.ppf
:
In [46]: %timeit norm.ppf(0.95)
240 µs ± 1.75 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
In [47]: %timeit ndtri(0.95)
1.47 µs ± 1.3 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
scipy.stats.norm
methods.
Jan 29, 2021 at 19:55
norm.cdf(norm.ppf(0.95, loc=10, scale=2))
and I thought it was weird norm.cdf
did not have loc=10
and scale=2
too, I guess it should.
Jan 30, 2021 at 5:33
Starting Python 3.8
, the standard library provides the NormalDist
object as part of the statistics
module.
It can be used to get the inverse cumulative distribution function (inv_cdf
- inverse of the cdf
), also known as the quantile function or the percent-point function for a given mean (mu
) and standard deviation (sigma
):
from statistics import NormalDist
NormalDist(mu=10, sigma=2).inv_cdf(0.95)
# 13.289707253902943
Which can be simplified for the standard normal distribution (mu = 0
and sigma = 1
):
NormalDist().inv_cdf(0.95)
# 1.6448536269514715
# given random variable X (house price) with population muy = 60, sigma = 40
import scipy as sc
import scipy.stats as sct
sc.version.full_version # 0.15.1
#a. Find P(X<50)
sct.norm.cdf(x=50,loc=60,scale=40) # 0.4012936743170763
#b. Find P(X>=50)
sct.norm.sf(x=50,loc=60,scale=40) # 0.5987063256829237
#c. Find P(60<=X<=80)
sct.norm.cdf(x=80,loc=60,scale=40) - sct.norm.cdf(x=60,loc=60,scale=40)
#d. how much top most 5% expensive house cost at least? or find x where P(X>=x) = 0.05
sct.norm.isf(q=0.05,loc=60,scale=40)
#e. how much top most 5% cheapest house cost at least? or find x where P(X<=x) = 0.05
sct.norm.ppf(q=0.05,loc=60,scale=40)