Blog


Among open source relational databases, PostgreSQL is one of the most popular thanks to its rich feature set. That is why it is used in nearly every field where databases are involved. In this article, we will go through connecting to and using PostgreSQL from R. R is an open source language for statistical analysis and graphics, providing scientists, statisticians, and academics powerful tools...
What is Exploratory Data Analysis? Exploratory data analysis (EDA) is a powerful tool for a comprehensive study of the available information, providing answers to basic data analysis questions. What distinguishes it from traditional analysis based on testing a priori hypotheses is that EDA makes it possible to detect, by using various methods, all potential systematic correlations in the...
Introduction: Exploratory data analysis (EDA) is an approach to data analysis that summarizes the main characteristics of the data. It can be performed using various methods, among which data visualization plays a major role. The idea of EDA is to recognize what information the data can give us beyond the formal modeling or hypothesis testing task. In other words, if initially we have no, or not enough, a priori ideas about...
In the modern world, the flow of information a person faces is daunting. This has led to a rather abrupt change in the basic principles of data perception, and visualization is therefore becoming the main tool for presenting information. With the help of visualization, information is presented to the audience in a more accessible and clearer form. A properly chosen visualization method can make it possible to structure large data arrays,...
The more carefully you process the data and go into detail, the more valuable information you can extract. Data visualization is an efficient and handy tool for gaining insights from data. Moreover, visualization tools can make the data far more understandable, colorful, and pleasant to look at. As data changes every second, it is an urgent task to investigate it carefully and get insights as fast as...
Random Forest is a powerful ensemble learning method that can be applied to various prediction tasks, in particular classification and regression. The method uses an ensemble of decision trees as a basis and therefore has all the advantages of decision trees, such as high accuracy, ease of use, and no need to scale the data. Moreover, it has a very important additional benefit, namely resistance to overfitting (unlike...
In this Jupyter Notebook we will retrieve data from the European Central Bank (ECB). The ECB publishes through the European Open Data Portal, which we discussed in the previous tutorial. Before diving into the code, please take a quick look at the following websites to get a feel for what we will be dealing with. EU portal: https://data.europa.eu/euodp/en/data/publisher/ecb ECB SDMX 2.1 RESTful web...
A common and very challenging problem in machine learning is overfitting, and it comes in many different forms. Handling it is one of the major aspects of training a model. Overfitting occurs when the model captures too much noise in the training data set, which leads to poor prediction accuracy when the model is applied to new data. One way to avoid overfitting is regularization. In this tutorial, we...
Companies have a growing demand to visualize their data with business intelligence tools. We compared salaries across 10 European countries using Glassdoor, which offers self-reported salary information by location and employer, giving us some key insights into the salaries of people with “Business Intelligence” in their job title. Switzerland with the highest salary for Business Intelligence There...
The random forest algorithm is a combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest. It can be applied to different machine learning tasks, in particular classification and regression. Random Forest uses an ensemble of decision trees as a basis and therefore has all the advantages of decision trees, such as high accuracy,...
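To make the definition above concrete, here is a minimal sketch of the idea in plain Python. It is not the method from the article and not a full random forest: each "tree" is a depth-one stump, and the "random vector" is simply a bootstrap sample plus one randomly chosen feature. The names `fit_stump`, `fit_forest`, and `predict` are hypothetical and used only for illustration.

```python
import random
from collections import Counter


def fit_stump(rows, labels, feature):
    """Fit a depth-one tree on a single feature.

    Returns (feature, threshold, left_label, right_label).
    """
    values = sorted({row[feature] for row in rows})
    # Candidate thresholds are midpoints between consecutive distinct values.
    thresholds = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best_errors, best_stump = None, None
    for t in thresholds:
        left = [y for row, y in zip(rows, labels) if row[feature] <= t]
        right = [y for row, y in zip(rows, labels) if row[feature] > t]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0]
        errors = sum(y != left_label for y in left)
        errors += sum(y != right_label for y in right)
        if best_errors is None or errors < best_errors:
            best_errors = errors
            best_stump = (feature, t, left_label, right_label)
    if best_stump is None:
        # Only one distinct value in the sample: always predict the majority.
        majority = Counter(labels).most_common(1)[0][0]
        return (feature, float("inf"), majority, majority)
    return best_stump


def fit_forest(rows, labels, n_trees=25, seed=0):
    """Train n_trees stumps, each on its own bootstrap sample and feature."""
    rng = random.Random(seed)
    n_rows, n_features = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        # The per-tree randomness: a bootstrap sample and a random feature,
        # drawn independently and from the same distribution for every tree.
        sample = [rng.randrange(n_rows) for _ in range(n_rows)]
        feature = rng.randrange(n_features)
        forest.append(
            fit_stump([rows[i] for i in sample],
                      [labels[i] for i in sample],
                      feature)
        )
    return forest


def predict(forest, row):
    """Classify by majority vote over all stumps."""
    votes = Counter(
        left if row[feature] <= threshold else right
        for feature, threshold, left, right in forest
    )
    return votes.most_common(1)[0][0]


# Toy data (made up for this sketch): two well-separated classes.
rows = [(0, 0), (1, 0.5), (0.5, 1), (1, 1), (5, 5), (6, 5.5), (5.5, 6), (6, 6)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
forest = fit_forest(rows, labels)
```

Calling `predict(forest, (0.2, 0.3))` classifies a new point by letting every stump vote. A real implementation would grow deeper trees and sample a subset of features at each split rather than one feature per tree.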
For many machine learning problems with a large number of features or a low number of observations, a linear model tends to overfit and variable selection is tricky. Models that use shrinkage such as Lasso and Ridge can improve the prediction accuracy as they reduce the estimation variance while providing an interpretable final model. In this tutorial, we will examine Ridge and Lasso...
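The shrinkage mentioned above can be illustrated in the simplest possible setting. The sketch below assumes a single coefficient with unpenalized estimate `beta_ols` and the objectives `0.5*(beta_ols - b)**2 + 0.5*lam*b**2` for Ridge and `0.5*(beta_ols - b)**2 + lam*abs(b)` for Lasso. This parametrization and the function names are assumptions made for illustration, not taken from the tutorial. Under these assumptions both minimizers have closed forms.

```python
def ridge_shrink(beta_ols, lam):
    """Ridge solution in the one-coefficient case: proportional shrinkage."""
    return beta_ols / (1.0 + lam)


def lasso_shrink(beta_ols, lam):
    """Lasso solution in the one-coefficient case: soft thresholding."""
    magnitude = max(abs(beta_ols) - lam, 0.0)
    return magnitude if beta_ols >= 0 else -magnitude


# Compare how the two penalties treat a large and a small coefficient.
for beta in (2.0, -0.3):
    print(beta, ridge_shrink(beta, 0.5), lasso_shrink(beta, 0.5))
```

Ridge scales every coefficient toward zero without reaching it, while Lasso subtracts a fixed amount and sets small coefficients exactly to zero, which is why Lasso also performs variable selection.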
The EU Open Data Portal gives access to open data published by EU institutions, agencies and other bodies. Around 70 EU institutions, bodies or departments use the platform to make over 12,500 datasets available. In this Jupyter Notebook we will retrieve data from the open data portal "http://data.europa.eu/euodp/en/home". The portal is based on the open source project CKAN. CKAN stands for...