Match Up a Histogram and a Normal Distribution

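Given some data that have been histogrammed, how do we overplot the normal distribution with the maximum likelihood parameters of the underlying unbinned data? The key is getting the normalization right, and the key to that is that we can compute the area under both the histogram and the distribution. A histogram of counts with equal-width bins has an area of the number of data points times the bin width, while the normal density has an area of 1, so the curve must be scaled up by the number of data points times the bin width.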

Given a NumPy array of data d, histogrammed into nbins equal-width bins spanning d.min() to d.max():

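import numpy as np

m = d.mean()
s = d.std()  # ddof=0 (the default) gives the maximum likelihood estimate
xmin = d.min()  # these bounds can be tweaked, but should match the histogram's range
xmax = d.max()
x = np.linspace(xmin, xmax, 512)  # the 512 is also adjustable
# Scale factor: (number of points) * (bin width) / (sqrt(2 pi) * sigma)
norm = d.size * (xmax - xmin) / (nbins * np.sqrt(2 * np.pi) * s)
y = norm * np.exp(-0.5 * ((x - m) / s)**2)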
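As a rough check on the normalization, the sketch below wraps the recipe above in a helper and compares the area under the histogram with the area under the scaled curve. The function name scaled_normal, its npoints parameter, and the synthetic data are all illustrative assumptions, not part of the original recipe. It also assumes the histogram is built with np.histogram over its default range, which runs from d.min() to d.max().

```python
import numpy as np


def scaled_normal(d, nbins, npoints=512):
    """Return (x, y) for a normal curve scaled to a count histogram of d.

    Hypothetical helper: assumes nbins equal-width bins spanning
    d.min() to d.max().
    """
    d = np.asarray(d, dtype=float)
    m = d.mean()
    s = d.std()  # ddof=0, the maximum likelihood estimate
    xmin, xmax = d.min(), d.max()
    width = (xmax - xmin) / nbins  # width of one histogram bin
    x = np.linspace(xmin, xmax, npoints)
    norm = d.size * width / (np.sqrt(2 * np.pi) * s)
    y = norm * np.exp(-0.5 * ((x - m) / s) ** 2)
    return x, y


# Synthetic data, purely for illustration.
rng = np.random.default_rng(0)
d = rng.normal(loc=3.0, scale=2.0, size=10_000)
nbins = 40

counts, edges = np.histogram(d, bins=nbins)
x, y = scaled_normal(d, nbins)

# Area under the histogram: sum of count * bin width.
hist_area = np.sum(counts * np.diff(edges))
# Area under the curve, by the trapezoid rule written out by hand.
curve_area = np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x))

print(hist_area, curve_area)
```

The two areas should agree closely but not exactly, because the curve is only integrated between the smallest and largest data values and so misses a little of each tail. If plotting with matplotlib, calling plt.hist(d, bins=nbins) and then plt.plot(x, y) should overlay the two, assuming the histogram keeps its default range.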
