Machine Learning Contest

I have come across a contest that looks really exciting. The people over at Kaggle are hosting a programming contest about particle physics. Kaggle is a rather interesting platform, which hosts contests involving "big data". You know, the much celebrated buzzword. The idea behind the contests is, that all participants is given a large histotic dataset or a simulation, and based on that, using any technique they want, they are supposed to predict future behaviour. Say for example that an insurance company would like to know what kinds of extra insurance they could sell to existing customers. They would supply the Kaggle contest with historic sales data, and based on that, contestants could for example find that they should focus on selling life insurance to a certain subgroup of people, to maximize the value from their effort. The means to do such an analysis, is often various machine learning techniques. Luckily these can also be used to solve problems that are actually interesting... The problem that needs to be solved is rooted in the fact that even though the aim of the large particle physics experiments, like ATLAS , is to produce exotic particles such as the much celebrated Higgs boson, such particles are never seen directly in the detector, as they decay even before they leave the innermost part of the detector; the pipe in which the proton beams collide. All the physicists at the experiments can do is to try and calculate "back" from the stuff the Higgs bosons decay to. The efficiency of such calculations is of course an important matter, as it will determine how well the Higgs bosons can be observed. The contest at kaggle wants participants to employ machine learning techniques to do this calculation for a certain type of Higgs decays. As this is something I have done before in one of my courses, albeit for another decay channel, this is of course something I want to to, probably over the summer. My idea is to take this opportunity to learn another of the free, open source frameworks for image analysis techniques, perhaps OpenCV. My idea is to use traditional techniqes like boosted decision trees, but with the caveat that I dont want to just train them against a signal background sample, but instead train against the different types of background vs. the signal. I suspect that this would allow me to achieve better separation, as different processes would populate phase space differently. If you think this sounds interesting and would like to know more, you should head over to Kaggle. If you already know something, especially about more exotic machine learning techniques, and want to form a team, feel free to contact me!