Are you looking for real-world data science problems to sharpen your skills? In this post, we introduce four platforms that host data science competitions. Competitions are a great way to gain practical experience with real-world data, and the competitive environment can boost your motivation. Check them out: competitions are a lot of fun!
Kaggle is the best-known platform for data science competitions. Data scientists and statisticians compete to build the best models for describing and predicting datasets uploaded by companies or NGOs. From predicting house prices in the US to the demographics of mobile phone users in China or the properties of soil in Africa, Kaggle offers many interesting challenges built on real-world problems. Check out their No Free Hunch blog, which features the winners of each competition. The platform was acquired by Google in 2017 and also offers a wide range of datasets for training your algorithms, along with other useful resources for improving your data science skill set.
DrivenData works much like the other platforms: the dataset is available online and participants submit their best predictive models. The great thing about DrivenData competitions is that the questions and datasets come from the work of non-profits, which is especially interesting for those who want to contribute to a good cause. The problems are no less diverse, ranging from predicting dengue fever cases to estimating the penguin population in the Antarctic and forecasting energy consumption levels. For some challenges the best model wins a prize; for others you get the glory and the knowledge that you applied your skill set to make the world a better place. DrivenData offers great opportunities to tackle real-world problems with real-world impact.
Numerai is a data science competition platform focused on finance applications. What makes its competitions particularly interesting is that the participants’ predictions are used by the underlying hedge fund. Data scientists entering Numerai's tournaments currently receive an encrypted dataset every week. The dataset is an abstract representation of stock market information that preserves its structure without revealing details. The data scientists then build machine-learning models to find patterns in the data, and they test their models by uploading their predictions to the website. Numerai then creates a meta-model from all submissions to make its investments. The models are ranked, with the top 100 earning Numeraire coins, a cryptocurrency launched by Numerai. Numerai's mix of data science, cryptography, artificial intelligence, crowdsourcing and cryptocurrency has given the fledgling business an exciting flair.
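To make the idea of a meta-model built from many submissions more concrete, here is a minimal toy sketch in Python. It is not Numerai's actual method, which this post does not describe: the rank-averaging ensemble, the correlation-based ranking, the function names and the example data are all assumptions chosen for illustration.

```python
from statistics import mean


def rank_normalize(values):
    """Map predictions to evenly spaced ranks in (0, 1); ties are broken by position."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    for rank, idx in enumerate(order):
        ranks[idx] = (rank + 0.5) / len(values)
    return ranks


def meta_model(submissions):
    """Average the rank-normalized predictions of all submissions, row by row."""
    normalized = [rank_normalize(preds) for preds in submissions.values()]
    return [mean(column) for column in zip(*normalized)]


def pearson(xs, ys):
    """Pearson correlation between two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def leaderboard(submissions, targets):
    """Order participant names by correlation with the targets, best first."""
    scores = {name: pearson(preds, targets) for name, preds in submissions.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical data: three participants predict the same four rows.
submissions = {
    "alice": [0.1, 0.4, 0.35, 0.8],
    "bob": [0.9, 0.2, 0.6, 0.1],
    "carol": [0.2, 0.3, 0.5, 0.7],
}
targets = [0, 0, 1, 1]  # hypothetical outcomes, known only after the round

combined = meta_model(submissions)
ranking = leaderboard(submissions, targets)
print(combined)
print(ranking)
```

Averaging ranks rather than raw predictions keeps a single participant with extreme values from dominating the combined signal, which is one common reason to ensemble this way.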
Tianchi is a data competition platform by Alibaba Cloud, the cloud computing arm of Alibaba Group, and has strong similarities with Kaggle. The platform focuses on Chinese data scientists, but most pages are also available in English. Tianchi boasts a community of over 150,000 data scientists and 3,000 institutes and business groups from over 80 countries. Besides the competitions, the platform also offers datasets and a notebook environment for running Python 3 scripts.