Causal inference (CI) is an approach to data analysis that seeks to identify cause-and-effect relationships between phenomena. In simple terms, CI helps answer the question: “Did one event affect another?” For example, does a job retraining program increase a person’s chances of finding employment? Or does the introduction of a new drug reduce the likelihood of hospitalization?
To answer such questions, analyzing correlation alone is not sufficient. Two variables may be related without one causing the other, a point commonly summarized by the phrase “correlation does not imply causation.” Why is this the case?
One reason is that when we compute a correlation between two variables and read it as causal (that one variable causes the other), we often miss the influence of a third variable (an omitted variable, or confounder): something that we left out of the calculation but that in reality affects both variables.
A common illustration of a third variable is the example of ice cream and sunburn. If we compute the correlation between the amount of ice cream consumed and the number of sunburn cases, the coefficient indicates a fairly strong relationship. However, we cannot reasonably say that ice cream causes sunburn or vice versa. What we are missing is a third variable: the season of the year. Hot summer weather encourages people to buy ice cream and, at the same time, increases the likelihood of sunburn.
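The ice cream and sunburn example can be reproduced with a small simulation. The sketch below is only an illustration under stated assumptions: the data are synthetic, and the coefficients, noise levels, and helper names (`pearson`, `residuals`) are hypothetical choices, not taken from any real dataset. Temperature drives both variables, and neither affects the other. The raw correlation comes out strong. After the linear effect of temperature is removed from both series, the remaining correlation should be close to zero.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def residuals(ys, xs):
    """What is left of ys after a least-squares line on xs is subtracted."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [(y - my) - slope * (x - mx) for x, y in zip(xs, ys)]

rng = random.Random(0)  # fixed seed so the run is repeatable

# Synthetic daily data (all numbers are made up for illustration).
# Temperature is the omitted third variable that drives both outcomes.
temperature = [rng.uniform(0, 35) for _ in range(2000)]
ice_cream = [2.0 * t + rng.gauss(0, 10) for t in temperature]
sunburn = [0.5 * t + rng.gauss(0, 3) for t in temperature]

# Naive view: ice cream and sunburn look strongly related.
raw_corr = pearson(ice_cream, sunburn)

# Accounting for temperature: correlate what remains of each series
# once the part explained by temperature has been removed.
adj_corr = pearson(residuals(ice_cream, temperature),
                   residuals(sunburn, temperature))

print(f"raw correlation:      {raw_corr:.2f}")
print(f"after removing temp.: {adj_corr:.2f}")
```

Correlating residuals in this way is one simple form of adjustment (a partial correlation). It works here because the simulated relationships are linear. Real data usually need more careful modeling.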
Causal inference is widely used in economics, medicine, the social sciences, and business analytics to evaluate the effectiveness of policies, drugs, educational programs, and marketing strategies.
One example is a study by Eric Chyn (2018), in which the author examined the long-term effects of the forced relocation of children from high-crime neighborhoods to less disadvantaged ones following the demolition of public housing in Chicago. Using data on employment, income, and education, the author compared two groups with similar characteristics:
– Those who were forced to move (experimental group);
– Those who remained living in the same neighborhood (control group).
This quasi-experimental design made it possible to assess causal effects, because the relocation was not initiated by the residents themselves but was forced by the unsafe condition of the buildings. This reduces the likelihood of systematic differences between the groups and allows differences in their later outcomes to be interpreted as the result of relocation.
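A small simulation can show why it matters who decides on the move. This is a sketch on synthetic data, not a reproduction of the study: the effect size `TRUE_EFFECT`, the `background` variable, and the function names are all hypothetical. In one scenario the move is decided by something outside the family's control. In the other, families with a more favorable background are more likely to move. A plain difference in group means is close to the built-in effect only in the first scenario.

```python
import random

rng = random.Random(1)  # fixed seed so the run is repeatable
TRUE_EFFECT = 2.0       # hypothetical effect built into the simulation

def simulate(n, self_selected):
    """Return a list of (moved, outcome) pairs for n simulated people."""
    rows = []
    for _ in range(n):
        background = rng.gauss(0, 1)  # unobserved family background
        if self_selected:
            # Families with a better background are more likely to move.
            moved = background + rng.gauss(0, 1) > 0
        else:
            # The move is decided by something outside the family's control.
            moved = rng.random() < 0.5
        outcome = 10 + 3 * background + TRUE_EFFECT * moved + rng.gauss(0, 1)
        rows.append((moved, outcome))
    return rows

def difference_in_means(rows):
    """Mean outcome of movers minus mean outcome of non-movers."""
    movers = [y for moved, y in rows if moved]
    stayers = [y for moved, y in rows if not moved]
    return sum(movers) / len(movers) - sum(stayers) / len(stayers)

exogenous_estimate = difference_in_means(simulate(20000, self_selected=False))
self_selected_estimate = difference_in_means(simulate(20000, self_selected=True))

print(f"built-in effect:           {TRUE_EFFECT:.2f}")
print(f"estimate, forced move:     {exogenous_estimate:.2f}")
print(f"estimate, chosen move:     {self_selected_estimate:.2f}")
```

In the self-selected scenario the estimate is inflated, because movers differ from non-movers in background as well as in treatment. This is the kind of systematic difference that the forced relocation helps rule out.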
The figure shows the results of the comparison between the two groups, with those who moved further divided by age at the time of relocation (7–12 years old and 13–18 years old). The graph consists of two panels: the left panel shows employment, and the right panel shows income. According to the study, forced relocation had a positive effect, especially for the younger age group.
- Figure from Eric Chyn (2018): effects on employment and earnings as a function of measurement age. The X-axis is age and the Y-axis is the treatment effect.