Blog > API

Nova-DB: Swiss Local Data API

10/10/2019

Accessing datasets in a structured form through an API can often simplify the life of a data analyst - especially if the same data series are used repeatedly. Unfortunately, many public data sources such as the Federal Statistical Office (BFS) do not provide data access through an API (STAT-TAB makes life a bit easier, but is not fully automated). While opendata.swiss offers a great way to explore available public datasets. The high number of different topics, datasets and data contributors makes standardization highly difficult. Hence, when pulling the data, it comes in its original structure and often requires in depth processing (See our blog on fetching data from opendata.swiss in Python.

For people interested in data capturing regional development, Novalytica offers an open data API covering several hundred data series in the area of local level data that contain a regional relation (e.g. municipality, ms-region, canton etc.). This offers the advantage that the data comes in a structured format and is always updated - thus, once an automated analysis or report is prepared, it automatically pulls the most recent data.

The free subscription includes more than 200 series that cover the areas socio-demographics, economic development, real estate and politics. The Premium subscription offers more detailed local level datasets that are not from public sources, for example real estate prices and rents, job openings, Airbnb listings, investor returns, bankruptcies and company foundings as well as several machine learning based industry indicators.

Let’s try it out: After signing up on the website, within 8 hours, I received my account to the free subscription as well as a full list of available data series and their respective API keys. On their website, they provide example scripts for R and Python, what makes it easy to get started.

I am interested in plotting the development of the share of singles over the last five years for the 5 largest Swiss cities. Let's first load the required Python libraries:

In [1]:

import requests
import pandas as pd
from pandas.io.json import json_normalize
import seaborn as sns; sns.set()
import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter


username = 'USER'
password = 'PASSWORD'

I first send a token request for authentication. With the obtained token, I can then send all further requests.

In [2]:

url_tokens = 'http://nova-db.com/api/v1/tokens'
url_series = 'http://nova-db.com/api/v1/series/'

#TOKEN REQUEST 
r = requests.post(url_tokens, auth=(username, password))
r.status_code # 200 for successful response

response = r.json()

# Prepare token for bearer authentication
token = response['token']
token_bearer = 'Bearer ' + token
print(token_bearer)

headers = {
'Authorization': token_bearer,
}

Bearer 1tA7S4XIq1HIe4OuFAeOR28AOSBhvH4h

Now, we can define the series of interest. In our case the share of singles with the key 'civilstatus_single'. We pull all municipalities and then exclude everything but the 5 cities.

In [3]:

# Series of interest
series = 'civilstatus_single'

# Optional search parameters. Set to "None" to get all.
params = {}         

#DATA REQUEST 
s = requests.get(url_series + series, params=params, headers=headers)

s.status_code # 200 for successful response

result = s.json()

#reformat to dataframe 
df = pd.DataFrame.from_dict(json_normalize(result), orient='columns')
df['regionnr']=pd.to_numeric(df['regionnr'], downcast='float')
df['value']=pd.to_numeric(df['value'], downcast='float')

df.head()

Out[3]:

	freq	period	regionname	regionnr	topic	value	variable
0	a	2010	Aeugst am Albis	1.0	sociodemo	0.419408	civilstatus_single
1	a	2011	Aeugst am Albis	1.0	sociodemo	0.425654	civilstatus_single
2	a	2012	Aeugst am Albis	1.0	sociodemo	0.429668	civilstatus_single
3	a	2013	Aeugst am Albis	1.0	sociodemo	0.424949	civilstatus_single
4	a	2014	Aeugst am Albis	1.0	sociodemo	0.424196	civilstatus_single

As we are only interested in the 5 main cities, we select a subset of the data frame. It is good to keep in mind that the dataset does not use any special characters often used in German (ä, ö, ü) or French (é, è, â etc.).

In [4]:

cities=['Bern', 'Basel', 'Geneve', 'Lausanne', 'Zuerich' ]
df=df.loc[df['regionname'].isin(cities)]

In [5]:

df['value']=df['value']*100
fig=plt.figure(figsize=(8, 5))
p=sns.lineplot(x="period", y="value",hue="regionname", data=df)
p.set(xlabel='Year', ylabel='Share singles')
p.set_title("Share single in Swiss cities")
p.yaxis.set_major_formatter(PercentFormatter())

plt.show()

We observe that the share of singles is rising - what is not fully unexpected. Interesting are the differences in level by city and that the change is almost constant over time for all cities. Thus, the API let us pull, analyse and plot many data series on municipality level within minutes.

As soon as new data is available, we can simply rerun the script and obtain the analysis within seconds.

If you are often working in Excel, also check out the Nova Excel Add-In, that allows you to access the same database within Excel: novalytica.com/excel

Get more information on the API at: novalytica.com/api