Posted: July 12, 2010

Prediction markets to facilitate real-time science?

(Nanowerk News) Kaggle, a web platform for data prediction competitions, has just issued a press release that touts the usefulness of open prediction technology for data-heavy scientific problems. The platform allows researchers and organizations to post their problems and have them scrutinized by the world's best statisticians, who compete to predict the future (produce the best forecasts) or predict the past (find the best insights hiding in data).
In the age of ubiquitous communications technologies, it's inexplicable that the scientific literature evolves much as it did one hundred years ago. Competitions offer a promising way forward.
Kaggle is currently hosting a bioinformatics contest requiring participants to pick genetic markers that correlate with a change in the severity of HIV infection. Within a week and a half, the best entry to this contest had outperformed the best methods in the scientific literature. Whereas the scientific literature tends to evolve slowly (somebody writes a paper, somebody else tweaks that paper and so on), a competition inspires rapid innovation by introducing the problem to a wide audience.
When headlines this week announced the discovery of genetic markers that correlate with extreme longevity, what they missed was that the work took 15 years from beginning to publication. Had the study been run as a competition, with the raw data available to all, the results would have been generated in real time. Insights would have been available much sooner and with more precision.
Moreover, competitions might help to avoid situations in which a valuable technique is overlooked by the scientific establishment. This aspect of the case for competitions is best illustrated by Ruslan Salakhutdinov, now a postdoctoral fellow at the Massachusetts Institute of Technology, who had a new algorithm rejected by the NIPS conference. According to Ruslan, the reviewer 'basically said "it's junk and I am very confident it's junk"'. It later turned out that his algorithm was good enough to make him an early leader in the $1m Netflix Prize and 135th overall, a remarkable achievement when you consider that many of the top teams used a suite of models, making his one of the better-performing single algorithms.
Source: Kaggle