Data awareness: Honey produced in the US and people who drowned falling from fishing boats
Why are bees responsible for so many fishing boat related deaths? Answer: they’re not. Just because two datasets are correlated does not mean that they are linked or that one affects the other.
Politicians, activists and journalists often claim, wittingly or unwittingly, that things are linked when the link is false or far from proven.
Data should always be viewed critically and with caution.
Data can, intentionally or not, be presented misleadingly. Honey produced by bee colonies in the US is highly correlated with the number of people who drowned falling from fishing boats. Such a high degree of correlation between two datasets can indicate a causal link underlying the data. In this example, however, the two datasets are clearly unconnected.
Spurious correlations arise either purely by chance or because a third, lurking factor drives both datasets. While spurious correlations, such as the example above, can be amusing, there is a more serious side. Politicians, activists and journalists often present data in this misleading fashion. Much of the time, this is merely careless analysis. Occasionally, however, spurious correlations are deliberately presented as causal relationships in order to mislead people.
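To see how a lurking factor can manufacture a correlation, here is a minimal sketch in Python. It uses synthetic, made-up numbers (not real honey or drowning statistics) and assumes the hidden factor is simply a shared downward trend over time. The names `series_a`, `series_b`, `pearson` and `residuals` are illustrative only.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def residuals(ys, zs):
    """What is left of ys after subtracting its best-fit straight line in zs."""
    n = len(zs)
    mz, my = sum(zs) / n, sum(ys) / n
    slope = (sum((z - mz) * (y - my) for z, y in zip(zs, ys))
             / sum((z - mz) ** 2 for z in zs))
    return [(y - my) - slope * (z - mz) for z, y in zip(zs, ys)]

# Synthetic, made-up data: NOT real honey or drowning figures.
years = list(range(20))           # assumed lurking factor: the passage of time
wiggle_a = [1, -1] * 10           # year-to-year variation unrelated to wiggle_b
wiggle_b = [1, 1, -1, -1] * 5
series_a = [100 - 3 * t + 2 * w for t, w in zip(years, wiggle_a)]
series_b = [50 - 2 * t + 2 * w for t, w in zip(years, wiggle_b)]

# Correlation of the raw series, which both fall over time.
raw_r = pearson(series_a, series_b)

# Correlation once the shared time trend has been removed from each series.
detrended_r = pearson(residuals(series_a, years), residuals(series_b, years))

print(f"raw correlation:      {raw_r:.2f}")
print(f"after removing trend: {detrended_r:.2f}")
```

The raw correlation comes out close to 1, yet once the shared trend is removed it falls to roughly zero: the two series move together only because both decline over time, not because one influences the other. This sketch covers the lurking-factor case only; correlations that arise purely by chance would need a different illustration.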
It is essential to examine any data presented critically and to ask whether a plausible theory or logical connection actually links the datasets. The mantra that ‘correlation does not imply causation’ helps us resist our natural inclination to look for patterns and ascribe relationships and meaning to them.