Blog

Recently, a lot of attention has been paid to powerful languages and tools for data science and machine learning tasks, and R is definitely at the forefront. It’s a language and an environment for statistical computing and graphics. Since R is an open-source project, it receives continuous contributions and support from R practitioners. Its package library is therefore rich and significantly expands the basic...
GBM is a highly popular prediction model among data scientists. As top Kaggler Owen Zhang describes it: "My confession: I (over)use GBM. When in doubt, use GBM." GradientBoostingClassifier from sklearn is a popular and user-friendly implementation of Gradient Boosting in Python (another nice and even faster tool is xgboost). Apart from setting up the feature space and fitting the model, parameter tuning is a crucial task...
AI Took My Job! Ken Jennings’ name is vaguely familiar to people, but why? Because his profound knowledge of all things trivial made him the unbeatable champion of a TV game show called Jeopardy! It also put him in the gunsights of IBM. They spent thousands of hours and invested millions of dollars, all just to build a machine named WATSON that could defeat him at that TV-derived game. See how Ken deals with the...
Demand for professionals in data science and analytics is expected to rise significantly over the coming years (cf. this study by IBM). To keep track of future job trends, we started the DataCareer Job Market Index (DJMI) in July 2017. We track job openings on the biggest online job board, Indeed, in the fields of data science and analytics, data engineering, business intelligence, artificial intelligence and statistics. Sign...
For individuals, businesses and research institutes working with emerging technologies, it is important to follow and shape societal debates revolving around their field. Sooner or later, societal debates are likely to translate into political action, which may greatly impact work on emerging technologies – for better or worse. Also, if research institutes and businesses aim for more than research results and profit, they’re...
Mobile phone data has a vast scope. Our phones track our location, record social activities by listing who we call or message, and know what we like or what we’re looking for by collecting data on our online behavior and use of apps. The recent Mobile User Demographics Challenge on Kaggle (by the Chinese platform TalkingData) offers some insight into the volume and precision of the information available on mobile...
Big Data, AI and Machine Learning are today's buzzwords. Data nerds, business executives and politicians alike are talking about data-related opportunities and potential risks. But since when has this been the case, and how have data-related interests developed over time? We've looked into this question using Google Trends data. Google searches reveal people's interests. Google search queries have become a powerful tool to...
How can Facebook ads be used strategically for your company? How does virality work on Twitter? Which tools allow online advertising to be tailored precisely to target groups? How can you survey 500 people within a few minutes and find out which images, slogans and framings are liked best and generate the most clicks? For...
Image recognition has been a major challenge in machine learning, and working with large labelled datasets to train your algorithms can be time-consuming. One efficient approach for getting such data is to outsource the work to a large crowd of users. Google uses this approach with the game “Quick, Draw!” to create the world’s largest doodling dataset, which has recently been made publicly available. In this...
Much has been written on the most popular software and programming languages for Data Science (recall, for instance, the infamous “Python vs R battle”). We approached this question by scraping job ads from Indeed and counting how often each software package is mentioned, as a measure of current employer demand. In a recent blog post, we analyzed the Data Science software Swiss employers want job applicants to know (showing...
Social science researchers collect much of their data through online surveys. In many cases, they offer incentives to the participants. These incentives can take the form of lotteries for more valuable prizes or individual gift card codes. We are doing the latter in our studies here at CEPA Labs at Stanford. Specifically, our survey participants receive a gift card code from Amazon. However, sending these gift card...
Learning new programming languages is an investment in human capital. Figuring out the return on investment can thus be very informative. There are very specific requirements for each industry and specific job, and finding a generalizable answer to the question proves quite difficult. One approach is to analyze the required software skills in job postings, which reflect current demand and may therefore indicate general return on investment....