### Matplotlib cookbook

Let me show you how to plot histograms suitable for high energy physics using the Python package matplotlib. I use it to draw figures for publications and such. It's really very nifty, and you can make just about anything. I recently read (full disclosure: the publisher sent it to me, asking for a review) a cookbook for making plots with matplotlib, this guy here by Alexandre Devert. First question you might want to ask is the following: Given that matplotlib is free software, with a quite extensive base of very introductory tutorials, which I can get for free online, why would I pay money for a pdf consisting basically of tutorials? Well, I guess if you don't know anything about Pyhton or matplotlib to begin with -- say you are an interested high school student or just started university and want to make an extra effort -- then I guess this book could be suited for you.
The book basically goes through a number of recipe for plotting various figures, very similar to the ones found in the matplotlib doc. I'm not gonna go on about it here, I would rather show you how I use matplotlib to plot nice histograms for high energy physics.
So, matplotlib is easy. Almost too easy. If you want to plot a histogram, documentation (and the cookbook) tells you to do something like this (in Python):  import numpy import matplotlib.pyplot as plt values = numpy.random.randn(1000) plt.hist(values) plt.savefig("hist.jpg")  to get something like this figure: Very easy. Almost too easy. Problem is: this is not how I, nor anyone I know, works. I use histograms a lot. Basically any experiment you can do in high energy physics involves counting, and histograms are the way to visualize this. I cannot work with big, rectangular bins, I need errorbars, and I need to be able to put several histograms in one plot and actually be able to view the contents. I would also need access to bin contents along the way, to produce eg. the ratio of multiple histograms. I am fully aware that the philosophy of matplotlib is to make the easy very easy, but I might argue that the example above is a bit too easy to be useful. Luckily I can do what I want, and I will walk you through the process.
 import numpy import matplotlib.pyplot as plt # Give me the same values as before values = numpy.random.randn(1000) # I use the numpy histogram as it is easier to get its contents # In real life application I use a dedicated histogramming package # Second argument is my choice of binning _bins, _edges = numpy.histogram(values,numpy.arange(-5,5,0.1)) plt.plot(_edges[:len(_edges)-1],_bins) plt.savefig("hist2.jpg")  which would give the figure below: This is not way more complicated, but very neccesary to provide the tools for what we're going to do now. We could provide errorbars. Only problem is that the builtin function errorbar() has the same problems as hist(). So I draw my own, simply by drawing a vertical line. I also want to emphasize that this is binned data without drawing very wide bins. I do that by marking bin centers with a small o. All in all it's just a matter of adding a function to calculate the error, and then drawing it:
 import numpy import matplotlib.pyplot as plt def error(bins,edges): # Just estimate the error as the sqrt of the count err = [numpy.sqrt(x) for x in bins] errmin = [] errmax = [] for x,err in zip(bins,err): errmin.append(x-err/2) errmax.append(x+err/2) return errmin, errmax # Give me the same values as before values = numpy.random.randn(1000) # I use the numpy histogram as it is easier to get its contents # In real life application I use a dedicated histogramming package # Second argument is my choice of binning _bins, _edges = numpy.histogram(values,numpy.arange(-5,5,0.5)) bins = _bins edges = _edges[:len(_edges)-1] errmin, errmax = error(bins,edges) # Plot as colored o's, connected plt.plot(edges,bins,'o-') # Plot the error bars plt.vlines(edges,errmin,errmax) plt.savefig("hist3.jpg")  which would give something like this: I know we are somewhat beyond introductory now, but let me proceed, as this is stuff I would have loved someone to tell me at some point. Now let me add another histogram, and plot both, along with a ratio of the two. This is also something I use all the time, as a ratio plot is excellent for showing the difference between eg. data and simulation. I have also added a legend and an axis label such that it is visible how to do something like that. Happy plotting.
 import numpy import matplotlib.pyplot as plt def error(bins,edges): # Just estimate the error as the sqrt of the count err = [numpy.sqrt(x) for x in bins] errmin = [] errmax = [] for x,err in zip(bins,err): errmin.append(x-err/2) errmax.append(x+err/2) return errmin, errmax def getRatio(bin1,bin2): # Sanity check if len(bin1) != len(bin2): print "Cannot make ratio!" bins = [] for b1,b2 in zip(bin1,bin2): if b1==0 and b2==0: bins.append(1.) elif b2==0: bins.append(0.) else: bins.append(float(b1)/float(b2)) # The ratio can of course be expanded with eg. error return bins # Give me the same values as before values = numpy.random.randn(10000) # Also values for another histogram values2 = numpy.random.randn(10000) # I use the numpy histogram as it is easier to get its contents # In real life application I use a dedicated histogramming package # Second argument is my choice of binning _bins, _edges = numpy.histogram(values,numpy.arange(-5,5,0.5)) _bins2, _edges2 = numpy.histogram(values2,numpy.arange(-5,5,0.5)) bins = _bins edges = _edges[:len(_edges)-1] bins2 = _bins2 edges2 = _edges2[:len(_edges2)-1] errmin, errmax = error(bins,edges) errmin2, errmax2 = error(bins2,edges2) fig = plt.figure() ax = fig.add_subplot(2,1,1) # Plot as colored o's, connected ax.plot(edges,bins,'o-',color="blue",lw=2,label="Blue hist") ax.plot(edges2,bins2,'o-',color="red",lw=2,label="Red hist") leg = ax.legend() # Plot the error bars ax.vlines(edges,errmin,errmax) ax.vlines(edges2,errmin2,errmax2) # Plot the ratio ax = fig.add_subplot(2,1,2) rat = getRatio(bins,bins2) ax.plot(edges,rat) ax.set_ylim(0,2) ax.set_xlabel("Variable name") plt.savefig("hist4.jpg") 