Association Does Not Imply Causation, Except When It Does – A Causal Inference Perspective


Justin Belair

Biostatistician in Science & Tech | Consultant | Author of Causal Inference in Statistics


The Challenge of Causal Inference

Ever wondered why researchers are so cautious when saying “X causes Y” instead of just “X is associated with Y”? The difference isn’t just semantic—it’s at the heart of scientific rigor and the foundation of evidence-based decision-making.

In observational studies, we often find ourselves with data that shows an association between variables, but determining whether this association represents a true causal relationship requires careful consideration of both theory and methodology.

This content is based on Chapter 4 of my upcoming book, Causal Inference in Statistics, with Exercises, Practice Projects, and R Code Notebooks. Follow the link to download the first chapter for free.

If you want to begin your journey of learning causal inference and don’t know where to start, visit our Causal Inference Guide: Books, Courses, and More.

Understanding Counterfactuals

At the core of modern causal inference is the concept of counterfactuals. For each individual in our study, we can imagine two parallel worlds:

  • In one world, they receive the treatment/exposure
  • In another world, they don’t receive it

The difference between these potential outcomes represents the true causal effect. But we face a fundamental problem: for each individual, we can only observe one of these worlds—this is the “fundamental problem of causal inference.”

Formally, we denote the potential outcomes as \((Y_i(0),Y_i(1))\), where

  • \(Y_i(0)\) is the outcome for individual \(i\) if they don’t receive the treatment
  • \(Y_i(1)\) is the outcome for individual \(i\) if they do receive the treatment
  • The individual causal effect would be \(Y_i(1) - Y_i(0)\), but we can never observe both potential outcomes for the same unit.
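
To make the fundamental problem concrete, here is a minimal R simulation. All numbers are hypothetical: we generate both potential outcomes for each unit, but the treatment assignment reveals only one of them.

set.seed(42)
n <- 8

# Both potential outcomes exist for every unit, but we never see both
Y0 <- rbinom(n, 1, 0.35)   # hypothetical outcome without treatment
Y1 <- rbinom(n, 1, 0.25)   # hypothetical outcome with treatment
W  <- rbinom(n, 1, 0.5)    # treatment actually received

Y_obs <- ifelse(W == 1, Y1, Y0)   # only Y(W) is ever revealed

data.frame(W, Y0, Y1, Y_obs, individual_effect = Y1 - Y0)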

Associational vs. Causal Effect Measures

When analyzing observational data, we typically work with associational measures that can be computed directly from the observed data. Consider a simple 2×2 contingency table based on real data on tobacco smoking among pregnant mothers and infant mortality:

mortality_tobacco <- matrix(c(
  723, 16255,  # exposed, outcome present; exposed, outcome absent
  3049, 84155  # unexposed, outcome present; unexposed, outcome absent
), nrow = 2, byrow = TRUE)
Contingency Table of Exposure and Outcome

                               Outcome+ (Death)   Outcome- (Survival)
Exposed+ (Smoking Mother)                   723                 16255
Exposed- (Nonsmoking Mother)               3049                 84155

From this table, we can calculate the risks in the exposed and the unexposed, as follows:

risk_exposed <- mortality_tobacco[1,1] / sum(mortality_tobacco[1,])
risk_unexposed <- mortality_tobacco[2,1] / sum(mortality_tobacco[2,])
cat("Risk for the Exposed+ :", round(risk_exposed, 3), "\n")
cat("Risk for the (un)Exposed- :", round(risk_unexposed, 3))
## Risk for the Exposed+ : 0.043
## Risk for the (un)Exposed- : 0.035

Using these risks, we can compute the following measures of association.

1. Risk Difference (RD): The absolute difference in outcome probability between exposed and unexposed groups. Formally, \[\text{Associational RD} = P(Y^{\text{obs}}=1|W=1) - P(Y^{\text{obs}}=1|W=0).\]

We can compute this using our mortality 2×2 data as follows:

  RD <- risk_exposed - risk_unexposed
  cat("Risk Difference:", round(RD, 3))
## Risk Difference: 0.008

2. Risk Ratio (RR): The ratio of the outcome probability in the exposed group to that in the unexposed group. Formally, \[\text{Associational RR} = \frac{P(Y^{\text{obs}}=1|W=1)}{P(Y^{\text{obs}}=1|W=0)}.\]

We can compute this using our mortality 2×2 data as follows:

  RR <- risk_exposed / risk_unexposed
  cat("Risk Ratio:", round(RR, 3))
## Risk Ratio: 1.218

3. Odds Ratio (OR): The ratio of the odds of the outcome in the exposed group to the odds in the unexposed group. Formally,

\[\begin{align*} \text{Associational OR} &= \frac{\text{Odds}(Y^{\text{obs}}=1|W=1)}{\text{Odds}(Y^{\text{obs}}=1|W=0)} \\ &= \frac{P(Y^{\text{obs}}=1|W=1)/P(Y^{\text{obs}}=0|W=1)}{P(Y^{\text{obs}}=1|W=0)/P(Y^{\text{obs}}=0|W=0)}. \end{align*}\]

We can compute this using our mortality 2×2 data as follows:

# Cross-product formula: OR = (a * d) / (b * c)
OR <- (mortality_tobacco[1,1] * mortality_tobacco[2,2]) / 
      (mortality_tobacco[1,2] * mortality_tobacco[2,1])
cat("Odds Ratio:", round(OR, 3))
## Odds Ratio: 1.228

Note that the OR (1.228) is very close to the RR (1.218): when the outcome is rare, as infant mortality is here, the odds ratio approximates the risk ratio. These are all associational effect measures. They describe the statistical relationship between exposure and outcome in our observed data, but they don’t necessarily represent causal effects.

The Bridge: From Association to Causation

The causal versions of these measures are defined in terms of counterfactual risks. For the associational measures, we omit the \(\text{obs}\) superscript for clarity:

Comparison of Associational and Causal Effect Measures

Measure            Associational                        Causal
Risk Difference    P(Y=1|W=1) - P(Y=1|W=0)              P(Y(1)=1) - P(Y(0)=1)
Risk Ratio         P(Y=1|W=1) / P(Y=1|W=0)              P(Y(1)=1) / P(Y(0)=1)
Odds Ratio         Odds(Y=1|W=1) / Odds(Y=1|W=0)        Odds(Y(1)=1) / Odds(Y(0)=1)

To bridge the gap between associational and causal measures, we need three key assumptions:

  1. Consistency: The observed outcome under a given treatment value equals the potential outcome under that treatment. Formally, if \(W_i= t\), then \[Y_i^\text{obs}= Y_i(t),\] for all \(t\).
  2. Exchangeability (or unconfoundedness): The potential outcomes are independent of the treatment assignment, possibly conditional on covariates \(X\). Formally, conditional exchangeability is defined as \[Y(t) \perp W \, | \, X,\] for all treatment values \(t\).
  3. Positivity: Every unit has a non-zero probability of receiving each treatment level. Formally, \[0 < P(W_i = t | X = x) < 1,\] for all \(i\), all \(t\), and all \(x\) with \(P(X=x)>0\).

Indeed, when the counterfactuals are conditionally exchangeable (condition 2), we can compute a causal effect measure within every stratum defined by \(X\), provided consistency and positivity hold¹. This can be done using the conditional risks:

\[P(Y^{\text{obs}} = 1 | W = t, X = x) = P(Y(t) = 1 | X = x).\]
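
This identity is exactly where the assumptions do their work, as a short two-step derivation shows:

\[\begin{align*} P(Y(t) = 1 | X = x) &= P(Y(t) = 1 | W = t, X = x) && \text{(conditional exchangeability)} \\ &= P(Y^{\text{obs}} = 1 | W = t, X = x) && \text{(consistency)}, \end{align*}\]

with positivity ensuring that conditioning on \(W = t, X = x\) is well defined.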

Then, to obtain the unconditional counterfactual risk, we use the law of total probability, summing the conditional risks weighted by the probability of each level of \(X\):

\[\begin{align*} P(Y(t) = 1) = \sum_{x} P(Y(t) = 1 | X = x)\cdot P(X = x). \end{align*}\]

Thus, when these assumptions hold, we can use the observed risks to stand in for the true counterfactual risks, which are otherwise unobservable. The associational measures presented in the table above then equal their causal counterparts²,³. We say that the causal effect is identifiable.
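
To make the standardization formula concrete, here is a minimal R sketch with a single hypothetical binary covariate \(X\); the conditional risks and the distribution of \(X\) below are invented for illustration, not taken from the tobacco data.

# Hypothetical conditional risks P(Y = 1 | W = t, X = x) and distribution of X
risk_w1 <- c(x0 = 0.10, x1 = 0.30)   # risk under treatment, by stratum of X
risk_w0 <- c(x0 = 0.05, x1 = 0.25)   # risk under no treatment, by stratum of X
p_x     <- c(x0 = 0.60, x1 = 0.40)   # marginal distribution P(X = x)

# Law of total probability: P(Y(t) = 1) = sum_x P(Y = 1 | W = t, X = x) * P(X = x)
p_y1 <- sum(risk_w1 * p_x)   # counterfactual risk under treatment: 0.18
p_y0 <- sum(risk_w0 * p_x)   # counterfactual risk under no treatment: 0.13

cat("Causal RD:", p_y1 - p_y0, "\n")           # 0.05
cat("Causal RR:", round(p_y1 / p_y0, 3), "\n") # 1.385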

Of course, these assumptions are unverifiable. Consistency is abstract and untestable, but it is plausible if the treatment \(W\) is well defined, there is no hidden variation in treatment, and other important design conditions are met. Exchangeability is a strong assumption: even if we measure many covariates, there may always be hidden confounders that make our counterfactuals non-exchangeable. Positivity is often violated in practice, but unlike the other two assumptions, it can be partially checked in the data.
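
Since positivity is the one assumption we can partially probe, a quick diagnostic is to tabulate treatment against covariate strata and look for empty or nearly empty cells. Here is a minimal sketch on simulated data:

set.seed(1)
n <- 500
X <- sample(c("a", "b", "c"), n, replace = TRUE)
# Treatment probability depends on X; stratum "c" is almost never treated
W <- rbinom(n, 1, ifelse(X == "c", 0.01, 0.5))

table(X, W)                            # empty or near-empty cells are red flags
prop.table(table(X, W), margin = 1)    # estimated P(W = w | X = x) per stratum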

Despite their untestability, these assumptions provide a framework for thinking about causality and a guide for designing studies that can help us move from association to causation. Indeed, one of the major breakthroughs of modern causal inference is to pinpoint precisely which unverifiable assumptions one needs in order to speak rigorously about causation. Researchers can then turn their attention to making these assumptions plausible, notably by ruling out competing explanations for the observed associations that could vitiate them.

A Real-World Example: The Low Birth-Weight Paradox

Consider a study on the effect of maternal smoking on low birth weight (LBW) in newborns. Among LBW newborns, those whose mothers smoke tend to suffer lower mortality than those whose mothers do not. This paradoxical finding puzzled researchers for a while.

In Chapter 4 of my upcoming book, Causal Inference in Statistics, with Exercises, Practice Projects, and R Code Notebooks, I work through a complete case study of the low birth-weight paradox using real-world data, with R and Python code available when you purchase the book. In this data, if we restrict the analysis to low birth-weight babies, the associational measures are biased due to selection bias. This happens often: when we select samples or populations for study based on certain characteristics, we can inadvertently bias our causal estimates, sometimes to the point of making them useless.

The presence of such distorting biases can be diagnosed by postulating the causal structure linking the variables at play using Directed Acyclic Graphs (DAGs). These modern tools help researchers make the assumptions needed for causal inference transparent, allowing them to focus on the plausibility of these assumptions rather than debating poorly defined concepts. A full treatment of DAGs is beyond the scope of this blog post, but a small taste follows.
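
As a sketch, the dagitty R package can encode one commonly postulated structure for the paradox (this graph is an illustration of the collider-bias mechanism, not a claim about any particular dataset):

library(dagitty)

# A commonly postulated DAG: an unmeasured birth defect affects
# both low birth weight (LBW) and mortality
g <- dagitty("dag {
  Smoking -> LBW -> Mortality
  Smoking -> Mortality
  Defect -> LBW
  Defect -> Mortality
}")

# Marginally, Smoking and Defect are independent (d-separated) ...
dseparated(g, "Smoking", "Defect", list())   # TRUE
# ... but conditioning on the collider LBW connects them, inducing selection bias
dseparated(g, "Smoking", "Defect", "LBW")    # FALSE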

Conclusion

In this blog post, we’ve explored the difference between associational and causal effect measures, and how we can bridge the gap between them using the potential outcomes framework. We’ve also discussed the key assumptions needed to move from association to causation, and how these assumptions can guide the design of studies and the interpretation of results.

If you want to receive monthly insights about Causal Inference in Statistics, please consider subscribing to my newsletter. You will receive updates about my upcoming book, blog posts, and other exclusive resources to help you learn more about causal inference in statistics.

Join the stats nerds🤓!


  1. The reason we need consistency is to replace the observed \(Y^\text{obs}\) by the potential outcome \(Y(t)\), and positivity is needed to ensure that \(P(Y^{\text{obs}} = 1 | W = t, X = x)\) is well defined. Indeed, if there existed values of \(t\) and \(x\) such that \(P(W = t | X = x) = 0\), then \[P(Y^{\text{obs}} = 1 | W = t, X = x) = \frac{P(Y^{\text{obs}} = 1, W = t | X = x)}{P(W = t | X = x)},\] which would imply division by \(0\).↩︎

  2. The proof of this result is based on the law of total probability and the definition of conditional independence. For a detailed explanation, see the Appendix of my book Causal Inference in Statistics, with Exercises, Practice Projects, and R Code Notebooks.↩︎

  3. We could also use Inverse Probability Weighting (IPW) to estimate the counterfactual risk. This method involves weighting the observed data by the inverse of the probability of receiving the treatment level that was actually received. See Chapters 4 and 5 of my book Causal Inference in Statistics, with Exercises, Practice Projects, and R Code Notebooks for more details.↩︎

