Matplotlib cookbook
Let me show you how to plot histograms suitable for high energy physics using the Python package matplotlib. I use it to draw figures for publications and such. It's really very nifty, and you can make just about anything. I recently read (full disclosure: the publisher sent it to me, asking for a review) a cookbook for making plots with matplotlib, this guy here by Alexandre Devert. First question you might want to ask is the following: Given that matplotlib is free software, with a quite extensive base of very introductory tutorials, which I can get for free online, why would I pay money for a pdf consisting basically of tutorials? Well, I guess if you don't know anything about Pyhton or matplotlib to begin with -- say you are an interested high school student or just started university and want to make an extra effort -- then I guess this book could be suited for you.The book basically goes through a number of recipe for plotting various figures, very similar to the ones found in the matplotlib doc. I'm not gonna go on about it here, I would rather show you how I use matplotlib to plot nice histograms for high energy physics.
So, matplotlib is easy. Almost too easy. If you want to plot a histogram, documentation (and the cookbook) tells you to do something like this (in Python):
import numpy
import matplotlib.pyplot as plt
values = numpy.random.randn(1000)
plt.hist(values)
plt.savefig("hist.jpg")
to get something like this figure:
Very easy. Almost too easy. Problem is: this is not how I, nor anyone I know, works. I use histograms a lot. Basically any experiment you can do in high energy physics involves counting, and histograms are the way to visualize this. I cannot work with big, rectangular bins, I need errorbars, and I need to be able to put several histograms in one plot and actually be able to view the contents. I would also need access to bin contents along the way, to produce eg. the ratio of multiple histograms. I am fully aware that the philosophy of matplotlib is to make the easy very easy, but I might argue that the example above is a bit too easy to be useful. Luckily I can do what I want, and I will walk you through the process.
import numpy
import matplotlib.pyplot as plt
# Give me the same values as before
values = numpy.random.randn(1000)
# I use the numpy histogram as it is easier to get its contents
# In real life application I use a dedicated histogramming package
# Second argument is my choice of binning
_bins, _edges = numpy.histogram(values,numpy.arange(-5,5,0.1))
plt.plot(_edges[:len(_edges)-1],_bins)
plt.savefig("hist2.jpg")
which would give the figure below:This is not way more complicated, but very neccesary to provide the tools for what we're going to do now. We could provide errorbars. Only problem is that the builtin function errorbar() has the same problems as hist(). So I draw my own, simply by drawing a vertical line. I also want to emphasize that this is binned data without drawing very wide bins. I do that by marking bin centers with a small o. All in all it's just a matter of adding a function to calculate the error, and then drawing it:
import numpy
import matplotlib.pyplot as plt
def error(bins,edges):
# Just estimate the error as the sqrt of the count
err = [numpy.sqrt(x) for x in bins]
errmin = []
errmax = []
for x,err in zip(bins,err):
errmin.append(x-err/2)
errmax.append(x+err/2)
return errmin, errmax
# Give me the same values as before
values = numpy.random.randn(1000)
# I use the numpy histogram as it is easier to get its contents
# In real life application I use a dedicated histogramming package
# Second argument is my choice of binning
_bins, _edges = numpy.histogram(values,numpy.arange(-5,5,0.5))
bins = _bins
edges = _edges[:len(_edges)-1]
errmin, errmax = error(bins,edges)
# Plot as colored o's, connected
plt.plot(edges,bins,'o-')
# Plot the error bars
plt.vlines(edges,errmin,errmax)
plt.savefig("hist3.jpg")
which would give something like this:I know we are somewhat beyond introductory now, but let me proceed, as this is stuff I would have loved someone to tell me at some point. Now let me add another histogram, and plot both, along with a ratio of the two. This is also something I use all the time, as a ratio plot is excellent for showing the difference between eg. data and simulation. I have also added a legend and an axis label such that it is visible how to do something like that. Happy plotting.
import numpy
import matplotlib.pyplot as plt
def error(bins,edges):
# Just estimate the error as the sqrt of the count
err = [numpy.sqrt(x) for x in bins]
errmin = []
errmax = []
for x,err in zip(bins,err):
errmin.append(x-err/2)
errmax.append(x+err/2)
return errmin, errmax
def getRatio(bin1,bin2):
# Sanity check
if len(bin1) != len(bin2):
print "Cannot make ratio!"
bins = []
for b1,b2 in zip(bin1,bin2):
if b1==0 and b2==0:
bins.append(1.)
elif b2==0:
bins.append(0.)
else:
bins.append(float(b1)/float(b2))
# The ratio can of course be expanded with eg. error
return bins
# Give me the same values as before
values = numpy.random.randn(10000)
# Also values for another histogram
values2 = numpy.random.randn(10000)
# I use the numpy histogram as it is easier to get its contents
# In real life application I use a dedicated histogramming package
# Second argument is my choice of binning
_bins, _edges = numpy.histogram(values,numpy.arange(-5,5,0.5))
_bins2, _edges2 = numpy.histogram(values2,numpy.arange(-5,5,0.5))
bins = _bins
edges = _edges[:len(_edges)-1]
bins2 = _bins2
edges2 = _edges2[:len(_edges2)-1]
errmin, errmax = error(bins,edges)
errmin2, errmax2 = error(bins2,edges2)
fig = plt.figure()
ax = fig.add_subplot(2,1,1)
# Plot as colored o's, connected
ax.plot(edges,bins,'o-',color="blue",lw=2,label="Blue hist")
ax.plot(edges2,bins2,'o-',color="red",lw=2,label="Red hist")
leg = ax.legend()
# Plot the error bars
ax.vlines(edges,errmin,errmax)
ax.vlines(edges2,errmin2,errmax2)
# Plot the ratio
ax = fig.add_subplot(2,1,2)
rat = getRatio(bins,bins2)
ax.plot(edges,rat)
ax.set_ylim(0,2)
ax.set_xlabel("Variable name")
plt.savefig("hist4.jpg")