Poisson Distribution Confidence Intervals in Scipy

2012 June 7

As happens often when doing transient searches, lately I’ve found myself wanting to compute confidence intervals in rate measurements from a small number of events. Clearly I want to be using the properties of the Poisson distribution, but how so, exactly?

The “Parameter Estimation” section of the above Wikipedia page gives some
hints but involves a few notation switches and also special functions that
don’t seem to be implemented in the handy
`scipy.special`

or
`scipy.stats`

`
modules. Once you track down the notation, though, it turns out to be really
easy. We want the inverse incomplete regularized Γ function.

Let’s say you observe *n* events in a period and want to compute the *k*
confidence interval on the true rate — that is, 0 < *k* ≤ 1, and *k* = 0.95
would be the equivalent of 2σ. Let *a* = 1 − *k*, *i.e.* 0.05. The lower bound
of the confidence interval, expressed as a potential number of events, is

```
scipy.special.gammaincinv(n, 0.5 * a)
```

and the upper bound is

```
scipy.special.gammaincinv(n + 1, 1 - 0.5 * a)
```

Conversion to rates is then just a matter of dividing by the relevant
duration, area, whatever. These apparently give the “exact” interval which may
be minutely more strict than is necessary. The halving of *a* is just because
the 95% confidence interval is made up of two tails of 2.5% each, so the
`gammaincinv`

function is really, once you chop through the obscurity, exactly
what you want.

So in
this example,
*n* is 14, *a* is 0.05, and we divide by 400 to get the interval bounds as
reported (0.019135 to 0.058724, in case that webpage disappears).

It took me a while to understand the numbers in the first table
on this page, but
they’re very helpful once you get over that: they give the 95% confidence
limits for a given number of observed events, when that number is small.
*I.e.*, if I observe one event over some period, my 95% confidence interval is
0.0253 to 5.5616 events. These numbers don’t quite make sense on their own,
since you can only get integral numbers of events in practice, but you can use
them to compute rate limits that *are* perfectly meaningful.

(Once you think about it, the relationship between the expressions for the two
ends of the interval is kind of cute. Because the Poisson distribution can
only take on integral values, the 2.5% upper limit to the value *n* must be
the 2.5% lower limit to the value *n* + 1, because there are no intervening
values that can be chosen.)