Poisson Distribution Confidence Intervals in Scipy
2012 June 7
As happens often when doing transient searches, lately I’ve found myself wanting to compute confidence intervals in rate measurements from a small number of events. Clearly I want to be using the properties of the Poisson distribution, but how so, exactly?
The “Parameter Estimation” section of the above Wikipedia page gives some
hints but involves a few notation switches and also special functions that
don’t seem to be implemented in the handy
scipy.special
or
scipy.stats
`
modules. Once you track down the notation, though, it turns out to be really
easy. We want the inverse incomplete regularized Γ function.
Let’s say you observe n events in a period and want to compute the k confidence interval on the true rate — that is, 0 < k ≤ 1, and k = 0.95 would be the equivalent of 2σ. Let a = 1 − k, i.e. 0.05. The lower bound of the confidence interval, expressed as a potential number of events, is
scipy.special.gammaincinv(n, 0.5 * a)
and the upper bound is
scipy.special.gammaincinv(n + 1, 1 - 0.5 * a)
Conversion to rates is then just a matter of dividing by the relevant
duration, area, whatever. These apparently give the “exact” interval which may
be minutely more strict than is necessary. The halving of a is just because
the 95% confidence interval is made up of two tails of 2.5% each, so the
gammaincinv
function is really, once you chop through the obscurity, exactly
what you want.
So in this example, n is 14, a is 0.05, and we divide by 400 to get the interval bounds as reported (0.019135 to 0.058724, in case that webpage disappears).
It took me a while to understand the numbers in the first table on this page, but they’re very helpful once you get over that: they give the 95% confidence limits for a given number of observed events, when that number is small. I.e., if I observe one event over some period, my 95% confidence interval is 0.0253 to 5.5616 events. These numbers don’t quite make sense on their own, since you can only get integral numbers of events in practice, but you can use them to compute rate limits that are perfectly meaningful.
(Once you think about it, the relationship between the expressions for the two ends of the interval is kind of cute. Because the Poisson distribution can only take on integral values, the 2.5% upper limit to the value n must be the 2.5% lower limit to the value n + 1, because there are no intervening values that can be chosen.)