Accessing datasets in a structured form through an API can often simplify the life of a data analyst - especially if the same data series are used repeatedly. Unfortunately, many public data sources such as the Federal Statistical Office (BFS) do not provide data access through an API (STAT-TAB makes life a bit easier, but is not fully automated). While opendata.swiss offers a great way to explore available public datasets. The high number of different topics, datasets and data contributors makes standardization highly difficult. Hence, when pulling the data, it comes in its original structure and often requires in depth processing (See our blog on fetching data from opendata.swiss in Python.
For people interested in data capturing regional development, Novalytica offers an open data API covering several hundred data series in the area of local level data that contain a regional relation (e.g. municipality, ms-region, canton etc.). This offers the advantage that the data comes in a structured format and is always updated - thus, once an automated analysis or report is prepared, it automatically pulls the most recent data.
The free subscription includes more than 200 series that cover the areas socio-demographics, economic development, real estate and politics. The Premium subscription offers more detailed local level datasets that are not from public sources, for example real estate prices and rents, job openings, Airbnb listings, investor returns, bankruptcies and company foundings as well as several machine learning based industry indicators.
Let’s try it out: After signing up on the website, within 8 hours, I received my account to the free subscription as well as a full list of available data series and their respective API keys. On their website, they provide example scripts for R and Python, what makes it easy to get started.
I am interested in plotting the development of the share of singles over the last five years for the 5 largest Swiss cities. Let's first load the required Python libraries:
import requests
import pandas as pd
from pandas.io.json import json_normalize
import seaborn as sns; sns.set()
import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter
username = 'USER'
password = 'PASSWORD'
I first send a token request for authentication. With the obtained token, I can then send all further requests.
url_tokens = 'http://nova-db.com/api/v1/tokens'
url_series = 'http://nova-db.com/api/v1/series/'
#TOKEN REQUEST
r = requests.post(url_tokens, auth=(username, password))
r.status_code # 200 for successful response
response = r.json()
# Prepare token for bearer authentication
token = response['token']
token_bearer = 'Bearer ' + token
print(token_bearer)
headers = {
'Authorization': token_bearer,
}
Bearer 1tA7S4XIq1HIe4OuFAeOR28AOSBhvH4h
Now, we can define the series of interest. In our case the share of singles with the key 'civilstatus_single'. We pull all municipalities and then exclude everything but the 5 cities.
# Series of interest
series = 'civilstatus_single'
# Optional search parameters. Set to "None" to get all.
params = {}
#DATA REQUEST
s = requests.get(url_series + series, params=params, headers=headers)
s.status_code # 200 for successful response
result = s.json()
#reformat to dataframe
df = pd.DataFrame.from_dict(json_normalize(result), orient='columns')
df['regionnr']=pd.to_numeric(df['regionnr'], downcast='float')
df['value']=pd.to_numeric(df['value'], downcast='float')
df.head()
freq | period | regionname | regionnr | topic | value | variable | |
---|---|---|---|---|---|---|---|
0 | a | 2010 | Aeugst am Albis | 1.0 | sociodemo | 0.419408 | civilstatus_single |
1 | a | 2011 | Aeugst am Albis | 1.0 | sociodemo | 0.425654 | civilstatus_single |
2 | a | 2012 | Aeugst am Albis | 1.0 | sociodemo | 0.429668 | civilstatus_single |
3 | a | 2013 | Aeugst am Albis | 1.0 | sociodemo | 0.424949 | civilstatus_single |
4 | a | 2014 | Aeugst am Albis | 1.0 | sociodemo | 0.424196 | civilstatus_single |
As we are only interested in the 5 main cities, we select a subset of the data frame. It is good to keep in mind that the dataset does not use any special characters often used in German (ä, ö, ü) or French (é, è, â etc.).
cities=['Bern', 'Basel', 'Geneve', 'Lausanne', 'Zuerich' ]
df=df.loc[df['regionname'].isin(cities)]
df['value']=df['value']*100
fig=plt.figure(figsize=(8, 5))
p=sns.lineplot(x="period", y="value",hue="regionname", data=df)
p.set(xlabel='Year', ylabel='Share singles')
p.set_title("Share single in Swiss cities")
p.yaxis.set_major_formatter(PercentFormatter())
plt.show()
We observe that the share of singles is rising - what is not fully unexpected. Interesting are the differences in level by city and that the change is almost constant over time for all cities. Thus, the API let us pull, analyse and plot many data series on municipality level within minutes.
As soon as new data is available, we can simply rerun the script and obtain the analysis within seconds.
If you are often working in Excel, also check out the Nova Excel Add-In, that allows you to access the same database within Excel: novalytica.com/excel
Get more information on the API at: novalytica.com/api