Open election data

After the 2014 election for the European Parliament, I did a fun, small project with open election data. It turns out that the Danish national office for statistics (dst) makes data about all polling stations freely available to the public through their website (largely in Danish). The data are basically all personal and party votes from all polling stations. Since historical data can be viewed as well, it is possible to correlate with results from earlier elections and see whether voters have moved.

I wrote two simple Python functions to a) fetch and parse a URL with election information using an XML library and b) use that parsed information to build a tree of votes. The first one is pretty simple:

    import os
    import urllib
    import xml.etree.ElementTree as et

    def getroot(url):
        # Python 2 code (urllib.urlretrieve). Files are cached locally as
        # <election name>/<file name>, both taken from the URL path.
        name = url.split('/')
        elecname = name[-3]
        filename = name[-1]
        path = os.path.join(elecname, filename)
        if not os.path.isdir(elecname):
            os.mkdir(elecname)
        # Download only if the file is not already in the local cache.
        if not os.path.isfile(path):
            urllib.urlretrieve(url, path)
        with open(path) as f:
            tree = et.parse(f)
        return tree.getroot()

As dst puts the information about an election in separate files for separate polling stations, all listed in an XML file on their webpage, it makes sense to just parse that information and build a tree from it. Try calling the function above with such an XML file as the argument, like:

    ft09 = ""
    tree = getroot(ft09)

Now the election data can be extracted on a district level using the following function. (I apologize for the fact that the code is partly written in Danish. This is of course because the source data are in Danish.)
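    # Kreds is not shown in the original post; a plain attribute
    # container like this one is assumed.
    class Kreds(object):
        pass

    def kredsniveau(plist, pshort, root):
        kredse = []
        for child in root:
            # Opstillingskredse = nomination districts
            if child.tag == "Opstillingskredse":
                for sub in child:
                    k = Kreds()
                    # The attribute name was lost in the original listing;
                    # `name` is assumed.
                    k.name = sub.text.strip('1234567890.').strip()
                    # Each district has its own file (filnavn = file name).
                    kredsroot = getroot(sub.get("filnavn"))
                    for ch in kredsroot:
                        if ch.tag == "Status":
                            k.status = int(ch.get("Kode"))
                        elif ch.tag == "Stemmeberettigede":  # eligible voters
                            k.voters = int(ch.text)
                        elif ch.tag == "Stemmer":  # votes
                            for party in ch:
                                if party.get("Bogstav") is not None:  # party letter
                                    let = party.get("Bogstav")
                                    # Store party letter Ø under "E" so the
                                    # attribute name stays ASCII.
                                    if let == u'\xd8':
                                        let = "E"
                                    setattr(k, let + "votes", party.get("StemmerAntal"))
                                elif party.tag == "IAltGyldigeStemmer":  # total valid votes
                                    k.validvotes = int(party.text)
                                elif party.tag == "IAltUgyldigeStemmer":  # total invalid votes
                                    k.invalidvotes = int(party.text)
                        elif ch.tag == "Personer":  # candidates
                            for party in ch:
                                for person in party:
                                    # Store the personal votes of each candidate
                                    # in plist under the matching short name.
                                    for p, q in zip(plist, pshort):
                                        if person.get("Navn") == p:
                                            setattr(k, q, int(person.get("PersonligeStemmer")))
                    kredse.append(k)
        return kredse

And now the fun begins, as we can loop through the districts (kredse) and extract data. For example, to compare the votes for the Socialist People's Party (SF) in the 2011 election for the national parliament with the 2014 EP election, and plot the result using matplotlib:

    import matplotlib.pyplot as plt

    # plist and pshort hold candidate names and the short attribute names
    # for their personal votes; their definitions are not shown in the post.
    ep14 = ""
    root = getroot(ep14)
    ep14kredse = kredsniveau(plist, pshort, root)

    ft09 = ""
    elist = []
    eshort = []
    root = getroot(ft09)
    ft09kredse = kredsniveau(elist, eshort, root)

    x = []
    y = []
    label = []
    # zip assumes both elections list the districts in the same order.
    for ep, ft in zip(ep14kredse, ft09kredse):
        if ep.status == 12:
            try:
                # Vote shares in percent, to match the axis labels.
                xval = 100.0 * float(ft.Fvotes) / float(ft.validvotes)
                yval = 100.0 * float(ep.Fvotes) / float(ep.validvotes)
                x.append(xval)
                y.append(yval)
                # The argument was lost in the original listing;
                # the district name is assumed.
                label.append(ep.name)
            except (AttributeError, ValueError, ZeroDivisionError):
                print "Problem at ", ep.name

    plt.figure()
    plt.subplot(111)
    plt.scatter(x, y)
    plt.xlabel("SF (FT09) (%)")
    plt.ylabel("SF (EP14) (%)")
    plt.show()

Adding a bit of flavour to the above by writing out the names of outliers produces a figure like this:

Figure: SF correlation plot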
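The code that writes out the outlier names is not shown in the post. A minimal sketch of how it could be done follows, assuming that "outlier" simply means a district whose SF share changed by more than some number of percentage points between the two elections. The function names, the threshold and the sample numbers are all made up for illustration, and the sketch is written for Python 3.

```python
def find_outliers(x, y, labels, threshold=3.0):
    """Return (x, y, label) triples lying more than `threshold` from the y = x line.

    "Outlier" here simply means a large change between the two elections,
    measured in percentage points; this definition is an assumption.
    """
    return [(xv, yv, name)
            for xv, yv, name in zip(x, y, labels)
            if abs(yv - xv) > threshold]


def plot_with_outlier_names(x, y, labels, threshold=3.0):
    """Scatter plot of y against x with the outliers' names written next to them."""
    # Imported here so that find_outliers can be used without matplotlib installed.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.scatter(x, y)
    for xv, yv, name in find_outliers(x, y, labels, threshold):
        # Offset the text a few points so it does not sit on top of the marker.
        ax.annotate(name, (xv, yv), xytext=(5, 5), textcoords="offset points")
    ax.set_xlabel("SF (FT09) (%)")
    ax.set_ylabel("SF (EP14) (%)")
    return fig


# Made-up vote shares in percent, for illustration only.
demo_x = [10.0, 8.0, 12.0]
demo_y = [11.0, 15.0, 5.0]
demo_labels = ["District A", "District B", "District C"]
demo_outliers = find_outliers(demo_x, demo_y, demo_labels)
```

With the `x`, `y` and `label` lists from the loop above (and shares expressed in percent), `plot_with_outlier_names(x, y, label).savefig("sf.png")` would write the labelled figure to a file.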
So SF clearly did a better job at the EP election than at the last election for the national parliament (FT). The district of Lolland is funny, since SF did significantly worse there for EP than for FT. My guess would be that the votes there were stolen by the EU sceptics from the nationalist Danish People's Party.

Looking at personal votes, the top candidates from the two EU-sceptical parties are fun to correlate. Morten Messerschmidt is from the nationalist Danish People's Party and Rina Ronja Kari is from the center-left People's Movement against the EU.

Figure: Rina/Morten correlation plot
The correlation plot clearly shows that even though Kari and Messerschmidt are both EU-sceptical, they get their votes from different people: Kari is comparatively stronger in places that traditionally have strong support for left-wing parties.

You are more than welcome to download and play with the whole thing from here.
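As a possible follow-up, the visual impression from the correlation plots could be backed by a number. The sketch below computes the Pearson correlation coefficient between two lists of per-district vote shares. It is not part of the original analysis; the function name `pearson` and the sample values are made up, and the code is written for Python 3 using only the standard library.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 values each")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and the two sums of squared deviations.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    var_x = sum((a - mean_x) ** 2 for a in xs)
    var_y = sum((b - mean_y) ** 2 for b in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up personal vote shares (percent) for two candidates in four districts.
candidate_a = [12.0, 9.0, 15.0, 7.0]
candidate_b = [3.0, 6.0, 2.0, 8.0]
r = pearson(candidate_a, candidate_b)
```

A value of `r` near +1 would mean the two candidates are strong in the same districts, a value near -1 that one is strong where the other is weak, and a value near 0 that there is no linear relation.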