Blog

  For many machine learning problems with a large number of features or a low number of observations, a linear model tends to overfit and variable selection is tricky. Models that use shrinkage such as Lasso and Ridge can improve the prediction accuracy as they reduce the estimation variance while providing an interpretable final model. In this tutorial, we will examine Ridge and Lasso...
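To make the idea of shrinkage concrete, here is a minimal sketch in plain Python for the simplest possible case: one feature and no intercept. The data values and the penalty strengths are made up for illustration, and the function names are our own rather than from any library. Ridge divides the least-squares coefficient by a larger denominator, while Lasso subtracts the penalty from it and can set it exactly to zero.

```python
def ols_1d(xs, ys):
    """Least-squares slope for y = b * x (no intercept)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / sxx


def ridge_1d(xs, ys, lam):
    """Minimises 0.5 * sum((y - b*x)^2) + 0.5 * lam * b^2."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


def lasso_1d(xs, ys, lam):
    """Minimises 0.5 * sum((y - b*x)^2) + lam * |b| (soft-thresholding)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    sign = 1.0 if sxy >= 0 else -1.0
    return sign * max(abs(sxy) - lam, 0.0) / sxx


# Illustrative data only: roughly y = 2 * x with some noise.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

b_ols = ols_1d(xs, ys)
b_ridge = ridge_1d(xs, ys, lam=10.0)
b_lasso = lasso_1d(xs, ys, lam=10.0)
b_lasso_strong = lasso_1d(xs, ys, lam=60.0)

print("OLS:", b_ols)
print("Ridge:", b_ridge)
print("Lasso:", b_lasso)
print("Lasso with a strong penalty:", b_lasso_strong)
```

Both penalised estimates are smaller in magnitude than the least-squares one. With a strong enough penalty the Lasso coefficient becomes exactly zero, which is why Lasso doubles as a variable selection tool, whereas the Ridge coefficient only approaches zero.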
Companies have a growing demand to visualize their data with business intelligence tools. We compared salaries across 10 European countries using Glassdoor, which offers self-reported salary information by location and employer, giving us some key insights into the salaries of people with “Business Intelligence” in their job title. Switzerland with the highest salary for Business Intelligence There...
The EU Open Data Portal gives access to open data published by EU institutions, agencies and other bodies. Around 70 EU institutions, bodies or departments use the platform to make over 12,500 datasets available. In this Jupyter Notebook we will retrieve data from the open data portal "http://data.europa.eu/euodp/en/home". The portal is based on the open source project CKAN. CKAN stands for...
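As a rough sketch of what retrieval from a CKAN-based portal can look like: CKAN exposes an "action" API, and its `package_search` action returns matching datasets as JSON. The base URL below is a placeholder and the sample response is hand-written, so this only illustrates building the request URL and reading a CKAN-style reply. Whether a given portal serves the API under exactly this path is an assumption to check against its documentation.

```python
import json
from urllib.parse import urlencode


def package_search_url(base_url, query, rows=10):
    """Build a CKAN package_search URL (path layout assumed to be the CKAN default)."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{params}"


def dataset_names(response_text):
    """Extract dataset names from a CKAN package_search JSON response."""
    payload = json.loads(response_text)
    if not payload.get("success"):
        raise ValueError("CKAN reported an unsuccessful request")
    return [dataset["name"] for dataset in payload["result"]["results"]]


# Hand-written stand-in for a real response; no network request is made here.
sample_response = json.dumps({
    "success": True,
    "result": {
        "count": 2,
        "results": [
            {"name": "dataset-a", "title": "Dataset A"},
            {"name": "dataset-b", "title": "Dataset B"},
        ],
    },
})

# "https://example.org/" is a placeholder, not the portal's real API address.
url = package_search_url("https://example.org/", "energy prices", rows=5)
names = dataset_names(sample_response)

print(url)
print(names)
```

In a notebook, the text passed to `dataset_names` would come from an HTTP GET on the generated URL, for example with `urllib.request.urlopen`.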
Google has become the main starting point for our online activities. The search engine processes about 40,000 searches every second, or 3.5 billion searches per day. It records what people are interested in, what they worry about or where they want to travel. In a unique manner, the search engine captures trends in interests and behavior. Hidden racism, sexual orientation or ad returns - check out the work by Seth...
opendata.swiss is the Swiss authorities’ portal for open data. Currently, 65 governmental organizations (many federal agencies, but also cantonal agencies, SBB and Post) provide access to 6,278 datasets. The portal offers an easily searchable catalogue of available datasets. Manually downloading the datasets can be cumbersome, and retrieving data through the API can save time. In this Jupyter...
Since the dawn of the digital age, the amount of data stored on servers has risen dramatically. With this increase, more and more firms are looking for talent that can handle their datasets and generate insights for business decisions. Google Trends shows that the global volume of the search term “Data Analyst” nearly tripled over the last 5 years. How does the increasing demand translate into earnings of data analysts in...
With the rise in the amount of data stored on servers, demand has also risen for data engineers to help manage the vast amounts of data now available to us. Data engineers are in high demand, and Google Trends shows that the global volume of the search term “Data Engineer” has tripled since 2014. More and more people are seeking skilled data engineers to help manage the vast amounts of data stored across the globe, and we...
Are you looking for real-world data science problems to sharpen your skills? In this post, we introduce you to four platforms hosting data science competitions. Data science competitions can be a great way to gain practical experience with real-world data and to boost your motivation through the competitive environment they provide. Check them out: competitions are a lot of fun! Kaggle Kaggle is the best-known platform...
Companies use machine learning to improve their business decisions. Algorithms select ads, predict consumers’ interest or optimize the use of storage. However, few stories of machine learning applications for public policy are out there, even though public employees often make comparable decisions. Similar to the business examples, decisions by public employees often try to optimize the use of limited resources. Algorithms may assist...
Curious about neural networks and deep learning? This post will inspire you to get started in deep learning. Why are we witnessing this kind of buildup around neural networks? It is because of their amazing applications, which include image classification, face recognition, pattern recognition, automatic machine translation, and so on. So, let’s get started now. Machine Learning is a field of computer science that...
The open-source project R is among the leading tools for data science and machine learning tasks. Because it is open source, contributions are continuous and new packages with new features appear frequently. Currently, the CRAN package repository features 12,525 available packages. This post takes a look at the most popular and useful packages that have set the standards for solving data manipulation, visualization, and...
GBM is a highly popular prediction model among data scientists. As top Kaggler Owen Zhang describes it: "My confession: I (over)use GBM. When in doubt, use GBM." GradientBoostingClassifier from sklearn is a popular and user-friendly implementation of Gradient Boosting in Python (another nice and even faster tool is xgboost). Apart from setting up the feature space and fitting the model, parameter tuning is a crucial task...
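To give a feel for why tuning matters, here is a toy, from-scratch sketch of gradient boosting for squared loss with one-split trees (stumps). It is not sklearn's GradientBoostingClassifier and not the method from the post. All names and the data are invented for illustration. It fits the model for a small grid of `n_estimators` and `learning_rate` values and picks the pair with the lowest error on held-out points.

```python
def fit_stump(xs, residuals):
    """Find the single split on x that minimises squared error on the residuals.

    Assumes at least two distinct x values.
    """
    pairs = sorted(zip(xs, residuals))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical x values
        left = [r for _, r in pairs[:i]]
        right = [r for _, r in pairs[i:]]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def predict(model, x):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(lv if x < t else rv for t, lv, rv in stumps)


def boost(xs, ys, n_estimators, learning_rate):
    """Each round fits a stump to the current residuals and adds a damped copy."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_estimators):
        residuals = [y - p for y, p in zip(ys, preds)]
        t, lv, rv = fit_stump(xs, residuals)
        stumps.append((t, lv, rv))
        preds = [p + learning_rate * (lv if x < t else rv)
                 for x, p in zip(xs, preds)]
    return base, stumps, learning_rate


def mse(model, xs, ys):
    return sum((y - predict(model, x)) ** 2 for x, y in zip(xs, ys)) / len(ys)


# Illustrative data: y = x^2, with every second point held out for validation.
all_x = [i / 10 for i in range(20)]
train_x, valid_x = all_x[0::2], all_x[1::2]
train_y = [x ** 2 for x in train_x]
valid_y = [x ** 2 for x in valid_x]

train_scores, valid_scores = {}, {}
for learning_rate in (0.1, 0.5):
    for n_estimators in (5, 50):
        model = boost(train_x, train_y, n_estimators, learning_rate)
        key = (learning_rate, n_estimators)
        train_scores[key] = mse(model, train_x, train_y)
        valid_scores[key] = mse(model, valid_x, valid_y)

best = min(valid_scores, key=valid_scores.get)
print("Best (learning_rate, n_estimators) on held-out data:", best)
```

The two parameters interact: a small learning rate needs more estimators to reach the same training fit, which is one reason they are usually tuned together rather than one at a time.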