
There has never been so much data available for making decisions. According to EMC ('Digital Universe' report from 2014), the quantity of data is going to see a tenfold increase between 20. As such, decisions are increasingly based on insights, supported by adequate data, analytics and research methods & techniques. This is a positive development, but at the same time it brings with it a number of pitfalls. It is important to remember that decisions are not guaranteed to be successful just because they are based on data. Results from data and analysis can be deliberately or unwittingly misinterpreted. This can lead to the outcome of an analysis being mistakenly treated as the truth, and decisions made on the basis of that "truth" can subsequently turn out to have been incorrect.

The chief cause of such wrong decisions is what we call "bias": interference in the outcomes of research by predetermined ideas, prejudice or influence in a certain direction. Data can be biased, but so can the people who analyse the data. When data is biased, we mean that the sample is not representative of the entire population – for example, drawing conclusions about the entire population of the Netherlands based on research into 10 students (the sample). When the people who analyse data are biased, it means they want the outcomes of their analysis to go in a certain direction in advance. We have set out the 5 most common types of bias below.

1. Confirmation bias. This occurs when the person performing the data analysis wants to prove a predetermined assumption. They then keep looking in the data until this assumption can be proven, for example by intentionally excluding particular variables from the analysis. This often happens when data analysts are briefed in advance to support a particular conclusion. It is therefore advisable not to set out doggedly to prove a predefined conclusion, but rather to test presumed hypotheses in a targeted way.

2. Selection bias. This occurs when data is selected subjectively, whether deliberately or unwittingly. As a result, the sample used is not a good reflection of the population. Just look at opinion polls in elections: can it really be true that so many voters completely change their minds on the last day, or is it more likely that the sample on which the poll is based is not a good reflection of all the voters? There is frequently selection bias in customer panels too: the customers that you (easily) find willing to participate in a customer panel are far from being "average customers". So always ask what sort of sample has been used for the research. Avoid false extrapolation and make sure the results are applicable to the entire population.

3. Outliers. An outlier is an extreme data value: one that is much higher, or much lower, than almost all the other values. Just think of a customer with extreme spending habits, or a consumer with €10 million in their savings account. You can spot outliers by inspecting the data closely, particularly the distribution of values. Outliers can make it a dangerous business to base a decision on the "average": a single customer with extreme spending habits can have a huge effect on the average profit per customer. If someone presents you with average values, check whether they have been corrected for outliers, for example by basing the conclusions on the median – the middle value.

4. Overfitting and underfitting. Overfitting risks causing a certain assumption to be treated as the truth when in practice it is not the case. Underfitting means that a model gives an oversimplistic picture of reality. Always ask the data analyst what he or she has done to validate the model, and whether they have used a training and a test sample. If the answer is no, it is highly likely that the outcomes of the analysis will not be applicable to all customers. If the analyst looks at you with a rather glazed expression, there is a good chance that the outcomes of the analysis have not been validated and therefore might not apply to the whole database of customers.

5. Confounding variables. Failing to allow for confounding variables can result in assuming there is a cause-effect relationship between two variables when there is in fact another variable behind the phenomenon. If research results show that more people drown when more ice creams are sold, ask whether the analysts have checked for what are known as confounding variables. In this case, the confounding variable is the temperature: if the weather is hotter, people will eat more ice cream and more people will go swimming, which is likely to result in more drownings than on a cold day. A confounding variable is therefore a variable that is outside the scope of the existing analytical model but that influences both the explanatory variable (in this case, ice cream sales) and the dependent variable (the number of drownings). Bear in mind that a correlation is not the same thing as a cause-effect relationship.
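The danger that outliers pose to the "average" can be shown in a few lines of Python. The profit-per-customer figures below are entirely hypothetical; the point is only that one extreme customer drags the mean far away while the median barely moves:

```python
import statistics

# Hypothetical profit per customer in euros; the last value is an outlier.
profits = [40, 45, 50, 55, 60, 10_000]

mean = statistics.mean(profits)      # pulled far upwards by one customer
median = statistics.median(profits)  # the middle value, barely affected

print(f"mean:   {mean:.2f}")    # mean:   1708.33
print(f"median: {median:.2f}")  # median: 52.50
```

Reporting the median here gives a far more honest picture of the "typical" customer than the mean does.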

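The ice-cream-and-drownings effect of a confounding variable can also be simulated. The sketch below uses invented numbers purely for illustration: temperature drives both ice cream sales and drownings, so the two correlate strongly even though neither causes the other.

```python
import random

random.seed(7)  # fixed seed so the simulation is reproducible

# Hypothetical daily data: temperature is the confounding variable that
# drives BOTH ice cream sales and the number of drownings.
days = 365
temps = [random.uniform(0, 30) for _ in range(days)]
ice_cream_sales = [10 * t + random.gauss(0, 20) for t in temps]
drownings = [0.1 * t + random.gauss(0, 0.5) for t in temps]

def pearson(xs, ys):
    """Plain-Python Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Sales and drownings correlate strongly, yet neither causes the other:
# the shared driver is temperature.
print(pearson(ice_cream_sales, drownings))
```

A model that included temperature as a variable would reveal that the apparent sales-to-drownings link disappears once the confounder is allowed for.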