Your Career Platform for Big Data

Be part of the digital revolution in Switzerland

 

View all Jobs

DataCareer Blog

Use your extra time at home (and your data skills) for a good cause: Check out the Kaggle COVID-19 Open Research Dataset Challenge. In response to the COVID-19 pandemic, the White House and a coalition of leading research groups have prepared the COVID-19 Open Research Dataset (CORD-19). This dataset is a resource of over 29,000 scholarly articles, including over 13,000 with full text, about COVID-19, SARS-CoV-2, and related coronaviruses. Today we, along with the White House and global health organizations, are asking for your help to develop text and data mining tools that can help the medical community develop answers to high priority scientific questions.
Today, data science specialists are among the most sought-after professionals in the labor market. Because they can extract meaningful insights from large amounts of information, they help companies and organizations optimize their operations. The field of data science is developing rapidly, and the demand for talent is changing with it. We use job postings to analyze the current demand. Following a first analysis in 2017, this article provides an update and more detail on job openings, roles, required skills, locations and employers. We collected data for the last month from Indeed through its API. Indeed aggregates job openings from various websites and therefore provides a good starting point for an analysis.

Data science job offers in Switzerland: first sight

We collected job openings for the search queries Data Analyst, Data Scientist, Machine Learning and Big Data. At the time of writing, Indeed listed 458 vacancies in the field of data science for Switzerland. Most likely, Indeed does not capture all jobs. Nevertheless, the dataset offers clear and structured information that allows us to analyze the market.

To get a better idea of the different job types offered in the market, we analyze the required level of experience, the required skills and the geographical distribution within the country. The bar plot of job titles shows that Senior Data Scientist and Data Scientist are the most common titles in this area. However, many jobs are listed with a job-specific title rather than a general one.

Based on the job title, we group jobs by the required experience level. The resulting plot shows a high demand for experienced talent. More than 60% of job openings require mid-level specialists, while around a quarter of the offers look for seniors. Interns and juniors make up less than 10% of the total number of vacancies.
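The grouping by experience level described above could be sketched as follows. This is a minimal illustration, not the method used for the article: the keyword rules, the level names and the sample titles are all assumptions, and titles without a seniority keyword are treated as mid-level.

```python
import re
from collections import Counter

# Hypothetical keyword rules; the article does not state which ones were used.
# Order matters: the first matching rule wins.
LEVEL_PATTERNS = [
    ("Intern", re.compile(r"\b(intern|internship|trainee)\b", re.I)),
    ("Junior", re.compile(r"\bjunior\b", re.I)),
    ("Lead", re.compile(r"\b(lead|head|principal)\b", re.I)),
    ("Senior", re.compile(r"\b(senior|sr)\b", re.I)),
]


def experience_level(title: str) -> str:
    """Return the first matching level, or 'Mid-level' if no keyword matches."""
    for level, pattern in LEVEL_PATTERNS:
        if pattern.search(title):
            return level
    return "Mid-level"


# Made-up sample titles for illustration only
titles = [
    "Senior Data Scientist",
    "Data Scientist",
    "Data Analyst Intern",
    "Junior Machine Learning Engineer",
    "Lead Data Engineer",
]

counts = Counter(experience_level(t) for t in titles)
print(counts)
```

Word boundaries keep "Learning" from being read as "Lead". Real postings would need more rules, for example for German or French titles.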
Zurich is the Swiss data science center

In a second step, we analyze the regional distribution of jobs. Not surprisingly, Zurich offers the most positions, followed by Basel, Geneva (Genf), and Lausanne. The seniority mix appears similar across cities: in general, there is a high demand for people with some years of on-the-job experience. The majority of job offers for interns and juniors are found in Zurich. Both senior and lead data scientists are in high demand in Zurich and Basel, followed by Geneva and Lausanne.

Strong demand for Python skills

Let's review the demand for specific skills, measured as the number of times each skill is mentioned. Python clearly leads among the programming languages with more than 400 mentions, followed by R, Java, Scala, JavaScript, and C/C++. AWS, Spark, and Azure are the top 3 big data technologies, while Tableau and SAS head the visualization and analysis tools.

Academia, Tech, Pharma and Finance hire data skills

Our dataset captures about 260 companies in Switzerland that offer jobs in data science. Most of them belong to the IT, FMCG and pharmaceutical sectors, as well as the financial and digital sectors. We extracted the companies with the most job offerings. In the figure, the size of each company name represents its number of job offerings.

Google, Siemens, and EPAM offer many data science jobs in IT and digital areas. Roche and Novartis are large pharmaceutical firms. Credit Suisse is a well-known financial group. There are also hiring agencies like Myitjob or Oxygen Digital Recruitment. Johnson&Johnson is a giant FMCG company. EPFL, the University of Basel, the University of Bern and other educational institutions are also among the top employers of data talent. Note that myitjobs appears in the analysis although it is only a job website; Indeed misclassified it as an employer.
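The skill counts reported in the section on Python skills could be reproduced roughly as in the sketch below. It is an illustration under assumptions: the skill list is taken from the names mentioned in the text, the sample descriptions are made up, and each posting is counted at most once per skill. The article does not say how mentions were actually counted.

```python
import re
from collections import Counter

# Skill names as mentioned in the text; the matching rules are an assumption.
SKILLS = ["Python", "R", "Java", "Scala", "JavaScript", "C++",
          "AWS", "Spark", "Azure", "Tableau", "SAS"]


def skill_pattern(skill: str) -> re.Pattern:
    # Lookarounds instead of \b so that "C++" can match, "R" does not match
    # inside other words, and "Java" does not match inside "JavaScript".
    # Matching is case-sensitive on purpose (otherwise "R" matches any "r").
    return re.compile(
        r"(?<![A-Za-z0-9+#])" + re.escape(skill) + r"(?![A-Za-z0-9+#])"
    )


def count_skill_mentions(descriptions):
    """Count in how many descriptions each skill appears at least once."""
    patterns = {skill: skill_pattern(skill) for skill in SKILLS}
    counts = Counter()
    for text in descriptions:
        for skill, pattern in patterns.items():
            if pattern.search(text):
                counts[skill] += 1
    return counts


# Made-up sample descriptions for illustration only
descriptions = [
    "Strong Python and SQL skills; experience with Spark on AWS.",
    "We use R, Python and Tableau. JavaScript is a plus.",
    "Java or Scala developer with C++ background.",
]

counts = count_skill_mentions(descriptions)
print(counts.most_common())
```

Single-letter names such as R remain error-prone (for example "R&D" would count as a mention), so a manual check of the matches is advisable.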
Summing up our analysis, we can see that there is substantial demand for individuals with experience, while fewer opportunities exist to acquire experience through entry-level positions or internships. Strong Python skills seem to be a must-have these days in the data science world. The dominance of Python has grown compared to 2 years ago, when R or Matlab were mentioned more often.
Accessing datasets in a structured form through an API can often simplify the life of a data analyst, especially if the same data series are used repeatedly. Unfortunately, many public data sources such as the Federal Statistical Office (BFS) do not provide data access through an API (STAT-TAB makes life a bit easier, but is not fully automated). opendata.swiss offers a great way to explore available public datasets, but the high number of different topics, datasets and data contributors makes standardization very difficult. Hence, the data arrives in its original structure and often requires in-depth processing (see our blog on fetching data from opendata.swiss in Python).

For people interested in data on regional development, Novalytica offers an open data API covering several hundred local-level data series with a regional reference (e.g. municipality, MS region, canton). The advantage is that the data comes in a structured format and is always up to date: once an automated analysis or report is prepared, it automatically pulls the most recent data. The free subscription includes more than 200 series that cover socio-demographics, economic development, real estate and politics. The Premium subscription offers more detailed local-level datasets that are not from public sources, for example real estate prices and rents, job openings, Airbnb listings, investor returns, bankruptcies and company foundings, as well as several machine learning based industry indicators.

Let's try it out: After signing up on the website, I received my account for the free subscription within 8 hours, along with a full list of available data series and their respective API keys. The website provides example scripts for R and Python, which makes it easy to get started. I am interested in plotting the development of the share of singles over the last five years for the 5 largest Swiss cities.
Let's first load the required Python libraries:

    import requests
    import pandas as pd
    import seaborn as sns
    import matplotlib.pyplot as plt
    from matplotlib.ticker import PercentFormatter

    sns.set()

    # Replace with your own credentials
    username = 'USER'
    password = 'PASSWORD'

I first send a token request for authentication. With the obtained token, I can then send all further requests.

    url_tokens = 'http://nova-db.com/api/v1/tokens'
    url_series = 'http://nova-db.com/api/v1/series/'

    # Token request
    r = requests.post(url_tokens, auth=(username, password))
    r.raise_for_status()  # raises an HTTPError on a 4xx/5xx response
    response = r.json()

    # Prepare token for bearer authentication
    token = response['token']
    token_bearer = 'Bearer ' + token
    print(token_bearer)
    headers = {'Authorization': token_bearer}

Output:

    Bearer 1tA7S4XIq1HIe4OuFAeOR28AOSBhvH4h

Now, we can define the series of interest, in our case the share of singles with the key 'civilstatus_single'. We pull all municipalities and then exclude everything but the 5 cities.

    # Series of interest
    series = 'civilstatus_single'

    # Optional search parameters. Set to "None" to get all.
    params = {}

    # Data request
    s = requests.get(url_series + series, params=params, headers=headers)
    s.raise_for_status()  # raises an HTTPError on a 4xx/5xx response
    result = s.json()

    # Reformat to a data frame. pd.json_normalize replaces the deprecated
    # pandas.io.json.json_normalize import (pandas >= 1.0).
    df = pd.json_normalize(result)
    df['regionnr'] = pd.to_numeric(df['regionnr'], downcast='float')
    df['value'] = pd.to_numeric(df['value'], downcast='float')
    df.head()

Output:

       freq  period  regionname       regionnr  topic      value     variable
    0  a     2010    Aeugst am Albis  1.0       sociodemo  0.419408  civilstatus_single
    1  a     2011    Aeugst am Albis  1.0       sociodemo  0.425654  civilstatus_single
    2  a     2012    Aeugst am Albis  1.0       sociodemo  0.429668  civilstatus_single
    3  a     2013    Aeugst am Albis  1.0       sociodemo  0.424949  civilstatus_single
    4  a     2014    Aeugst am Albis  1.0       sociodemo  0.424196  civilstatus_single

As we are only interested in the 5 main cities, we select a subset of the data frame. Keep in mind that the dataset does not use the special characters common in German (ä, ö, ü) or French (é, è, â etc.).

    cities = ['Bern', 'Basel', 'Geneve', 'Lausanne', 'Zuerich']
    # .copy() avoids a SettingWithCopyWarning when modifying the subset below
    df = df.loc[df['regionname'].isin(cities)].copy()

    df['value'] = df['value'] * 100  # convert shares to percent
    fig = plt.figure(figsize=(8, 5))
    p = sns.lineplot(x="period", y="value", hue="regionname", data=df)
    p.set(xlabel='Year', ylabel='Share of singles')
    p.set_title("Share of singles in Swiss cities")
    p.yaxis.set_major_formatter(PercentFormatter())
    plt.show()

We observe that the share of singles is rising, which is not entirely unexpected. More interesting are the differences in level between cities and the fact that the change is almost constant over time for all cities. Thus, the API lets us pull, analyze and plot many data series at municipality level within minutes. As soon as new data is available, we can simply rerun the script and obtain the updated analysis within seconds.
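The stated goal was the development over the last five years, while the plotting code uses every period the API returns. A minimal sketch of restricting the data to the five most recent years and computing the change per city could look like this. The values below are made up, and the helper `last_n_periods` is hypothetical; only the column names (`regionname`, `period`, `value`) follow the data frame used in the walkthrough.

```python
import pandas as pd

# Made-up values in the same column layout as the API response.
df = pd.DataFrame({
    "regionname": ["Bern"] * 6 + ["Zuerich"] * 6,
    "period": list(range(2012, 2018)) * 2,
    "value": [0.50, 0.51, 0.52, 0.53, 0.54, 0.55,
              0.52, 0.53, 0.54, 0.55, 0.56, 0.57],
})


def last_n_periods(frame: pd.DataFrame, n: int = 5) -> pd.DataFrame:
    """Keep rows from the n most recent years (assumes consecutive annual data)."""
    frame = frame.assign(period=pd.to_numeric(frame["period"]))
    latest = frame["period"].max()
    return frame[frame["period"] > latest - n]


recent = last_n_periods(df)

# One column per city, one row per year
wide = recent.pivot(index="period", columns="regionname", values="value")

# Change between the first and last year of the window, in percentage points
change_pp = (wide.iloc[-1] - wide.iloc[0]) * 100
print(change_pp.round(2))
```

Passing `recent` instead of the full data frame to the plotting code would then limit the chart to the same five-year window.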
If you often work in Excel, also check out the Nova Excel Add-In, which lets you access the same database from within Excel: novalytica.com/excel
Get more information on the API at: novalytica.com/api
View all blog posts

Data Academy

Programming with R for Beginners

R is one of the leading solutions for data science. The course introduces the programming language and its open-source software ecosystem.

  • Duration: 2 days
  • Materials as PDF and code
  • 2h project consulting
  • Incl. lunch and coffee
  • Price: CHF 990

Data Analysis in Python for Beginners

Python enjoys great popularity. In this course, you will become familiar with the fundamentals of Python.

  • Duration: 2 days
  • Materials as PDF and code
  • Incl. lunch and coffee
  • Price: CHF 990

Exploratory Data Analysis & Visualizations

Cleaning, transforming and visualizing data are the basic building blocks of any analysis. This course provides a comprehensive introduction to exploratory analysis with applications in R.

  • Duration: 1 day
  • Materials as PDF and code
  • 2h project consulting
  • Incl. lunch and coffee
  • Price: CHF 690

View all Courses