Blog > API

Fetching data from the opendata.swiss API: A short tutorial in Python

 

opendata.swiss is the Swiss authorities’ portal for open data. Currently, 65 governmental organizations (many federal agencies, but also cantonal agencies, SBB and Post) provide access to 6,278 datasets. The portal offers an easily searchable catalogue of available datasets. Manually downloading the datasets can be cumbersome and the retrieval of data through the API can save time.

 

In this Jupyter Notebook we will retrieve data from open data portal "opendata.swiss". The portal is based on the open source project CKAN. CKAN stands for Comprehensive Knowledge Archive Network. It provides an extensive API for the metadata of the open data catalogue. This means that the information about the datasets can be retrieved from CKAN, but the data itself will have to be downloaded from the servers of the contributors ("opendata.swiss" in this cases).

In this tutorial we will first take a look at GDP of Basel and then at the population of Switzerland using Python 3. Let's start with importing some packages we will use for this exercise.

In [1]:
import pprint
import requests     # 2.18.4
import json         # 2.0.9
import pandas as pd # 0.23.0
 

Like mentioned, the CKAN API functions as a catalog for datasets. We need to define the URL for "opendata.swiss".

In [2]:
# Package list of the Swiss open data portal
packages = 'https://opendata.swiss/api/3/action/package_list'
 

Let's get a list of all the datasets (called packages in CKAN) listed by "opendata.swiss".

In [3]:
# Make the HTTP request
response = requests.get(packages)

# Use the json module to load CKAN's response into a dictionary
response_dict = json.loads(response.content)

# Check the contents of the response
assert response_dict['success'] is True  # make sure if response is OK
result = response_dict['result']         # extract all the packages from the response
pprint.pprint(result)                    # pretty print the list to the screen
 
['__',
 '__1',
 '__10',
 '__100',
 '__101',
 '__102',
 '__103',
 '__104',
 '__105',
 '__106',
 '__107',
 '__108',
 '__109',
 '10-jahre-berufsmaturitat-1994-2004',
 '10-jahre-schweizerische-sozialhilfestatistik',
 '
Many more datasets - removed to keep the post readable
'zuzuge-pers', 'zvv-fahrplan-tram-und-bus', 'zwangsnutzungen', 'zweigstellen-der-musikschule-konservatorium-zurich-mkz', 'zweiradabstellplatze-in-der-stadt-zurich2', 'zweite-vornamen-neugeborener-madchen-und-knaben-mit-wohnsitz-in-der-stadt-zurich-seit-1993']
 

This is quite an extensive list. For this exercise we will take one from this list, called "bruttoinlandprodukt". Other examples could be: 'bevolkerung', 'elektroautos', or 'bevolkerungsdaten-im-zeitvergleich'.

In [40]:
# Specify the package you are interested in:
package = 'bruttoinlandprodukt'
 

Now let's download the package/dataset information. We need to take a few steps:

In [41]:
# Base url for package information. This is always the same.
base_url = 'https://opendata.swiss/api/3/action/package_show?id='

# Construct the url for the package of interest
package_information_url = base_url + package

# Make the HTTP request
package_information = requests.get(package_information_url)

# Use the json module to load CKAN's response into a dictionary
package_dict = json.loads(package_information.content)

# Check the contents of the response.
assert package_dict['success'] is True  # again make sure if response is OK
package_dict = package_dict['result']   # we only need the 'result' part from the dictionary
pprint.pprint(package_dict)             # pretty print the package information to screen
 
{'accrual_periodicity': 'http://purl.org/cld/freq/annual',
 'author': 'Statistisches Amt des Kantons Basel-Stadt',
 'author_email': None,
 'contact_points': [{'email': 'stata@bs.ch',
                     'name': 'Statistisches Amt des Kantons Basel-Stadt'},
                    {'email': 'Bjoern.Lietzke@bs.ch', 'name': 'Björn Lietzke'}],
 'creator_user_id': '0594d621-c92a-4a32-809c-98e281dc7944',
 'description': {'de': 'Das Bruttoinlandprodukt (BIP) ist ein Mass für die '
                       'wirtschaftliche Leistung der kantonalen Wirtschaft. Es '
                       'misst den Wert der im Kanton hergestellten Waren und '
                       'Dienstleistungen, soweit diese nicht als Vorleistungen '
                       'für die Produktion anderer Waren und Dienstleistungen '
                       'verwendet werden. Es wird das nominale BIP ausgewiesen '
                       '(zu laufenden Preisen). ',
                 'en': '',
                 'fr': '',
                 'it': ''},
 'display_name': {'de': 'Bruttoinlandprodukt', 'en': '', 'fr': '', 'it': ''},
 'extras': [{'key': 'guid',
             'value': '4323@statistisches-amt-kanton-basel-stadt'},
            {'key': 'uri',
             'value': 'http://statistisches-amt-kanton-basel-stadt/4323'}],
 'groups': [{'description': {'de': '', 'en': '', 'fr': '', 'it': ''},
             'display_name': {'de': 'Raum und Umwelt',
                              'en': 'Territory and environment',
                              'fr': 'Territoire et environnement',
                              'it': 'Territorio e ambiente'},
             'id': '27b314a5-57b6-4c4e-9c9f-6923365eaecc',
             'image_display_url': '',
             'name': 'territory',
             'title': {'de': 'Raum und Umwelt',
                       'en': 'Territory and environment',
                       'fr': 'Territoire et environnement',
                       'it': 'Territorio e ambiente'}}],
 'id': 'fd670617-d9e3-4793-8c4b-97f0641a7245',
 'identifier': '4323@statistisches-amt-kanton-basel-stadt',
 'isopen': False,
 'issued': '2018-05-03T02:00:00',
 'keywords': {'de': [], 'en': [], 'fr': [], 'it': []},
 'language': ['de'],
 'license_id': None,
 'license_title': None,
 'maintainer': 'Statistisches Amt des Kantons Basel-Stadt',
 'maintainer_email': 'stata@bs.ch',
 'metadata_created': '2018-05-07T14:37:07.105876',
 'metadata_modified': '2018-08-22T05:48:50.932423',
 'modified': '2018-05-03T02:00:00',
 'name': 'bruttoinlandprodukt',
 'num_resources': 1,
 'num_tags': 0,
 'organization': {'approval_status': 'approved',
                  'created': '2017-05-02T08:45:04.263609',
                  'description': {'de': 'Das Statistische Amt Basel-Stadt '
                                        'veröffentlicht Daten aus allen '
                                        'Bereichen des öffentlichen Lebens. Wo '
                                        'immer möglich werden diese auch '
                                        'kleinräumig (Gemeinde, Wohnviertel) '
                                        'ausgewertet und zielgerichtet für die '
                                        'Öffentlichkeit, Politik und '
                                        'Verwaltung aufbereitet. Wir arbeiten '
                                        'unabhängig und bieten eine hohe '
                                        'Datenqualität.',
                                  'en': 'The statistical office Basel-Stadt '
                                        'provides data on all core domains of '
                                        'public life. Where ever possible data '
                                        'is analysed on a small scale '
                                        '(municipal or ward level) and '
                                        'specifically processed for the '
                                        'general public, politics or '
                                        'authorities. We work independently '
                                        'and with a high quality standard.',
                                  'fr': 'L‘office de la statistique Bâle-Ville '
                                        'publie des données sur tous les '
                                        'domaines de la vie publique. Si '
                                        'possible les données sont exploitées '
                                        "à l'échelle communale ou locale et "
                                        'fonctionnalisées pour le grand '
                                        'public, la politique ou '
                                        'l’administration publique. Nous '
                                        'travaillons indépendamment assurant '
                                        'des données haut de gamme.',
                                  'it': 'L‘ufficio di statistica di Basilea '
                                        'Città fornisce dati su tutti i campi '
                                        'della vita pubblica. Nel limite del '
                                        'possibile i dati sono elaborati anche '
                                        'a livello locale (comuni e quartieri) '
                                        'e preparati in modo mirato per il '
                                        'pubblico, la politica e '
                                        'l’amministrazione. Lavoriamo '
                                        'indipendentemente e offriamo dati di '
                                        'alta qualità. '},
                  'display_name': {'de': 'Statistisches Amt Kanton Basel-Stadt',
                                   'en': 'Statistical Office Basel-Stadt',
                                   'fr': 'Office de la statistique Bâle-Ville',
                                   'it': 'Ufficio di statistica di Basilea '
                                         'Città'},
                  'groups': [{'capacity': 'public',
                              'name': 'kanton-basel-stadt'}],
                  'id': '9545d293-89b6-41e5-bbc0-0868a7b4e094',
                  'image_display_url': 'https://opendata.swiss/content/uploads/2017/05/Stata-BS.jpg',
                  'image_url': 'https://opendata.swiss/content/uploads/2017/05/Stata-BS.jpg',
                  'is_organization': True,
                  'name': 'statistisches-amt-kanton-basel-stadt',
                  'num_followers': 0,
                  'package_count': 174,
                  'political_level': 'canton',
                  'revision_id': '4b876fc0-4c25-4f8e-ba64-5df1a836c7aa',
                  'state': 'active',
                  'tags': [],
                  'title': {'de': 'Statistisches Amt Kanton Basel-Stadt',
                            'en': 'Statistical Office Basel-Stadt',
                            'fr': 'Office de la statistique Bâle-Ville',
                            'it': 'Ufficio di statistica di Basilea Città'},
                  'type': 'organization',
                  'url': 'http://www.statistik.bs.ch/ogd'},
 'owner_org': '9545d293-89b6-41e5-bbc0-0868a7b4e094',
 'private': False,
 'publishers': [{'label': 'Statistisches Amt des Kantons Basel-Stadt'}],
 'relations': {},
 'relationships_as_object': [],
 'relationships_as_subject': [],
 'resources': [{'byte_size': 0,
                'cache_last_updated': None,
                'cache_url': None,
                'created': '2018-05-07T14:37:07.110218',
                'datastore_active': False,
                'description': {'de': 'Umweltbericht beider Basel',
                                'en': '',
                                'fr': '',
                                'it': ''},
                'display_name': {'de': '', 'en': '', 'fr': '', 'it': ''},
                'download_url': 'https://raw.githubusercontent.com/StataBS/indikatoren/master/data/4323.tsv',
                'format': 'tab-separated-values',
                'hash': '',
                'id': '624534c0-38f1-4da8-aba5-287841035185',
                'identifier': 4323,
                'issued': '2018-05-03T02:00:00',
                'language': ['de'],
                'last_modified': None,
                'media_type': 'text/tab-separated-values',
                'mimetype': 'text/tab-separated-values',
                'mimetype_inner': None,
                'modified': '2018-05-03T02:00:00',
                'name': {'de': '', 'en': '', 'fr': '', 'it': ''},
                'package_id': 'fd670617-d9e3-4793-8c4b-97f0641a7245',
                'position': 0,
                'resource_type': None,
                'revision_id': 'c86b26d0-4c4a-4c78-918d-2084d0323df0',
                'rights': 'NonCommercialAllowed-CommercialAllowed-ReferenceRequired',
                'size': None,
                'state': 'active',
                'title': {'de': '', 'en': '', 'fr': '', 'it': ''},
                'uri': 'http://www.statistik.bs.ch/de/4323',
                'url': 'https://github.com/StataBS/indikatoren/tree/master/data/4323.tsv',
                'url_type': None}],
 'revision_id': 'ca50c2c9-c454-49bd-9c60-cebd8246a972',
 'see_alsos': {},
 'state': 'active',
 'tags': [],
 'temporals': [{'end_date': '2014-12-31T01:00:00',
                'start_date': '1980-01-01T01:00:00'}],
 'title': {'de': 'Bruttoinlandprodukt', 'en': '', 'fr': '', 'it': ''},
 'type': 'dataset',
 'url': 'https://statabs.github.io/indikatoren/chart-details.html?id=4323',
 'version': None}
 

Did you walk through the information above? Is this indeed the dataset you are interested in? In the description we can see that it is the GDP (or Bruttoinlandprodukt in German) of the canton of Basel-Stadt. This is important as it might be mixed up with the GDP of the country or an other canton or region. If yes, then you need to download the dataset. It is also important to know the format of the dataset, for next steps. This information is also listed in the package information above.

In [39]:
# Get the url for the data from the dictionary
data_url = package_dict['resources'][0]['url']
print('Data url:     ' + data_url)

# Print the data format
data_format = package_dict['resources'][0]['format']
print('Data format:  ' + data_format)
 
Data url:     https://www.bfs.admin.ch/bfsstatic/dam/assets/3442548/master
Data format:  XLS
 

Notice that this particular dataset is hosted at GitHub. When downloading from GitHub, it is better to request the raw data. We need to rewrite the URL a little bit to get there.

In [31]:
# If data is hosted at GitHub, always download the raw data
if data_url.startswith('https://github.com/'):
    data_url = data_url.replace('https://github.com/', 'https://raw.githubusercontent.com/')
    data_url = data_url.replace('tree/', '')
print('Data url:     ' + data_url)
 
Data url:     https://raw.githubusercontent.com/StataBS/indikatoren/master/data/4323.tsv
 

Feel free to take a look at the URL's above. It is good to take a sneak peak so you know what the data will look like. The dataset can come in different formats, so let's specify which ones we are willing to accept and load them into a Pandas DataFrame.

In [32]:
# List of formats we work with in this excercise
csv = ['comma-separated-values', 'CSV', 'csv']
tsv = ['tab-separated-values', 'TSV', 'tsv']
xls = ['XLS']

# Download the data to a Pandas DataFrame. Use seperate function calls, depending on the format of the dataset.
if any(s in data_format for s in csv):     # pd.read_csv()
    df = pd.read_csv(data_url)
elif any(s in data_format for s in tsv):   # pd.read_csv() and specify the delimiter
    df = pd.read_csv(data_url, sep='\t')
elif any(s in data_format for s in xls):   # pd.read_excel()
    df = pd.read_excel(data_url)
else:
    print('Sorry, the data format is not supported for this exercise')

# Print the first rows to the screen to inspect the dataset    
df.head(5)
Out[32]:
 
  Jahr Bruttoinlandprodukt DateTime
0 1980 8746.8  
1 1981 9304.6  
2 1982 10033.6  
3 1983 10468.1  
4 1984 11241.2  
 

As you can see, we need to make a few adjustments before we can continue. It is best to clean up the dataset before you start doing your analysis.

In [33]:
# Remove the column 'DateTime', because it is empty
df.drop('DateTime', axis=1, inplace=True)

# Make 'Jahr' the index
df.set_index('Jahr', inplace=True)
 

That's it! Now let's visualize the data with Pandas built-in plot functionality, which is based on 'matplotlib'. Voilà, here we see the trend of the GDP of Basel since 1980.

In [34]:
# Use IPython's "magic" in Jupyter Notebook to directly show the plot on the screen.
%matplotlib inline
df.plot()
Out[34]:
<matplotlib.axes._subplots.AxesSubplot at 0x11af99a90>
 
 

For Advanced Users

For advanced users only: download the dataset called "bevolkerungsdaten-im-zeitvergleich" and make a chart of the Swiss population over time. Please be aware this is an Excel file and you need to do a lot of cleaning to get to the final result. You probably need to print the DataFrame to the screen after every step to inspect if the operation was successful and what needs to be done next.

In [46]:
# Check the package information, as done above
package_information = requests.get('https://opendata.swiss/api/3/action/package_show?id=bevolkerungsdaten-im-zeitvergleich')
package_dict = json.loads(package_information.content)
package_dict = package_dict['result']   
data_url = package_dict['resources'][0]['url']

# Download the data into a Pandas DataFrame
df = pd.read_excel(data_url)

# Transpose the DataFrame, to get the years in a column
df = df.transpose()

# Drop the first (useless) line from the DataFrame. Use 'inplace=True' so it amends the existing DataFrame (not a copy)
df.drop(df.index[0], inplace=True)

# Keep only the columns of interest: year & residents
df = df.iloc[:, [2,6]]

# Rename the columns
df.columns = ['year', 'residents']

# Reset the index. Use 'drop=True' so no duplicate is made of the old index
df = df.reset_index(drop=True)

# Fix the year "2010" at row index 20
df.year[20] = 2010

# Convert all the years to integers so they can be transformed to DateTime later
df.year = df.year.apply(int)

# Convert the years to DateTime format
df.year = pd.to_datetime(df.year, format='%Y')

# Make the years the index of the DataFrame
df.set_index('year', inplace=True)

# Finally print the DataFrame to screen
print(df)
 
           residents
year                
1950-01-01      4717
1960-01-01      5360
1970-01-01      6193
1980-01-01      6335
1990-01-01      6751
1995-01-01      7062
1996-01-01      7081
1997-01-01      7096
1998-01-01   7123.54
1999-01-01      7164
2000-01-01      7204
2001-01-01      7256
2002-01-01      7314
2003-01-01      7364
2004-01-01    7415.1
2005-01-01   7459.13
2006-01-01   7508.74
2007-01-01   7593.49
2008-01-01   7701.86
2009-01-01   7785.81
2010-01-01      7870
2011-01-01   7954.66
2012-01-01   8039.06
2013-01-01   8139.63
2014-01-01      8238
2015-01-01      8327
2016-01-01   8419.55
In [47]:
# Last but not least: make a chart!
df.plot()
Out[47]:
<matplotlib.axes._subplots.AxesSubplot at 0x11b195da0>