Introduction
Directed Acyclic Graphs (DAGs) are powerful tools for visualizing and understanding causal relationships. In this blog post, we’ll explore common DAG structures that frequently appear in causal inference problems, simulate data according to these structures, and demonstrate how different analytical approaches can lead to correct or incorrect causal estimates. If you want to begin your journey of learning causal inference and don’t know where to start, visit our Causal Inference Guide: Books, Courses, and More.
If you’re interested in obtaining the R code for this blog post, consider purchasing my upcoming book, Causal Inference in Statistics, with Exercises, Practice Projects, and R Code Notebooks. Each chapter contains a complete case study with an extensive code notebook that you can use to grasp the principles using code. There are also exercises and practice projects to help you solidify your understanding of the material.
Let’s jump in!
Confounding
One of the most basic causal structures is confounding, where a third variable affects both the treatment and the outcome. Here, \(W\) is the treatment, \(Y\) is the outcome, and \(Z\) is a confounder that affects both \(W\) and \(Y\).
Let’s simulate 200 data points that follow this structure and see what happens when we analyze them. The true treatment effect will be set to 5. Because we know the ground truth, we can assess the bias of each method, i.e. the difference between its estimate and the true value of 5.
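The complete code notebook appears in the book; as a rough sketch, a data-generating process consistent with this structure could look like the following (the seed, the distribution of \(Z\), the intercept and the coefficient on \(Z\) in the outcome equation, and the logistic dependence of \(W\) on \(Z\) are all illustrative assumptions; only the treatment effect of 5 is fixed by the setup):
set.seed(42)                        # arbitrary seed for reproducibility
n <- 200
Z <- rnorm(n, mean = 5, sd = 1.5)   # confounder (assumed distribution)
W <- rbinom(n, 1, plogis(Z - 5))    # treatment probability increases with Z
Y <- 3 + 5 * W + 20 * Z + rnorm(n)  # outcome; true treatment effect is 5
dat <- data.frame(W, Z, Y)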
Here is a snapshot of what the dataset looks like.
## W Z Y
## 1 1 4.782948 102.18330
## 2 1 6.872517 145.53130
## 3 0 5.750960 113.37947
## 4 1 6.264373 132.30732
## 5 0 3.044272 64.73679
## 6 1 6.677193 143.98164
Now, let’s fit two different models to this data and compare the results (a code sketch follows the list).
- Model 1: We fit a simple linear regression model of the outcome on the treatment, without adjusting for the confounder: \[Y \sim W\]
- Model 2: We fit a linear regression model of the outcome on the treatment, adjusting for the confounder by adding it as a covariate: \[Y \sim W + Z\]
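In R, the two fits might look like this (a minimal sketch, where dat is the data frame simulated above):
m1 <- lm(Y ~ W, data = dat)           # unadjusted model
m2 <- lm(Y ~ W + Z, data = dat)       # adjusted for the confounder
rbind(`Y ~ W` = c(coef(m1), Z = NA),  # align the coefficients for comparison
      `Y ~ W + Z` = coef(m2))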
## Intercept W Z
## Y ~ W 90.023053 29.332428 NA
## Y ~ W + Z 2.943876 4.945199 19.74877
We see that when we correctly specify the model, the \(W\) coefficient is close to the true treatment effect of 5. It is not exactly 5 due to sampling variability. However, when we fail to adjust for the confounder, we get a biased estimate.
It is not possible to determine beforehand the direction and magnitude of the bias based solely on the DAG. However, the DAG structure can help us identify the presence of bias and guide us in the right direction. Further structural knowledge about the relationship between the confounder and the treatment/outcome variables can help us assess the magnitude and direction of the bias if we were to omit adjusting for a confounder, e.g. if we did not measure it.
Collider Bias
Another important structure is the collider, where a variable is influenced by both the treatment and the outcome. Formally, a collider has a definition that can be a bit tricky1. Informally, a collider is a variable that has two arrows pointing into it (see illustration below, where \(Z\) is now a collider).
Different selection bias mechanisms, such as differential loss to follow-up, convenience sampling, and so on, can all be represented as bias induced by conditioning on a collider (or one of its descendants) in a DAG2. One such example that is very common and not always easy to identify arises when a sample is selected based on some of its characteristics. For example, when assessing the correlation between athletic ability and intellectual ability, selecting a sample of students from highly selective universities could induce a spurious correlation, leading to the false belief that intellectual ability leads students to achieve higher athletic ability, or vice versa. See my previous blog post on selection bias for a detailed illustration of this example.
Let’s simulate data and see what happens when we condition on a collider.
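A rough sketch of one way to generate such data (the coefficients in the collider equation and the noise scales are illustrative assumptions; the treatment effect of \(W\) on \(Y\) is again set to 5):
set.seed(42)                             # arbitrary seed
n <- 200
W <- rbinom(n, 1, 0.5)                   # randomized treatment
Y <- 2 + 5 * W + rnorm(n, sd = 2)        # outcome; true treatment effect is 5
Z <- 15 * W + 20 * Y + rnorm(n, sd = 3)  # collider: caused by both W and Y
dat <- data.frame(W, Z, Y)
The data looks like this.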
## W Z Y
## 1 0 81.94203 3.847860
## 2 0 145.72867 7.258959
## 3 1 68.89566 2.530837
## 4 1 128.56243 5.526076
## 5 1 176.21011 7.905835
## 6 0 -43.43318 -2.080502
We then fit two models (sketched in code below):
- Model 1: \(Y \sim W\), correctly ignoring the collider
- Model 2: \(Y \sim W + Z\), erroneously adjusting for the collider
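Again as a minimal sketch, reusing the simulated dat:
m1 <- lm(Y ~ W, data = dat)           # ignores the collider
m2 <- lm(Y ~ W + Z, data = dat)       # conditions on the collider
rbind(`Y ~ W` = c(coef(m1), Z = NA),
      `Y ~ W + Z` = coef(m2))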
## Intercept W Z
## Y ~ W 2.00674769 4.5691843 NA
## Y ~ W + Z 0.01273163 -0.9853293 0.04979243
Looking at these results, we see that the effect estimate from the model that does not adjust for \(Z\) is close to 5, as expected, whereas the model that adjusts for \(Z\) gives a biased estimate. This is because conditioning on a collider can introduce bias in our treatment effect estimate. This can be counterintuitive: controlling for more variables doesn’t always improve your analysis!
Mediators
A mediator is a variable that lies on the causal pathway between exposure and outcome, such as \(M\) in the DAG below.
When working with mediators, we can decompose the total effect into direct and indirect effects. When the model is linear (as we have assumed in this example), these effects combine additively along distinct paths. That is, \[\text{Total Effect} = \text{Direct Effect} + \text{Indirect Effect}.\]
In this example, the direct effect of \(W\) on \(Y\) is 5, the effect of \(W\) on \(M\) is 2, and the effect of \(M\) on \(Y\) is 3. The indirect effect works multiplicatively along the path \(W \rightarrow M \rightarrow Y\)3. Thus, the total effect is given by \[\begin{align*} \text{Total Effect} &= \text{Direct Effect} + \text{Indirect Effect} \\ &= 5 + 2 \times 3 \\ &= 11. \end{align*}\]
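A rough sketch of a matching data-generating process (the noise scales and the outcome intercept are illustrative assumptions; the effects of 5, 2, and 3 are those stated above):
set.seed(42)                               # arbitrary seed
n <- 200
W <- rbinom(n, 1, 0.5)                     # randomized treatment
M <- 2 * W + rnorm(n, sd = 1.5)            # mediator: effect of W on M is 2
Y <- 2 + 5 * W + 3 * M + rnorm(n, sd = 2)  # direct effect 5; M -> Y effect 3
dat <- data.frame(W, M, Y)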
The data looks like this.
## W M Y
## 1 1 0.08795477 4.523313
## 2 1 2.24083179 13.661702
## 3 1 5.33561097 20.571514
## 4 0 1.56165450 4.252607
## 5 0 0.27217363 3.801606
## 6 1 -1.28309216 8.492330
We then fit three models (sketched in code below):
- Model 1: \(Y \sim W\), ignoring the mediation component
- Model 2: \(Y \sim W + M\), incorporating an adjustment for the mediator
- Model 3: \(M \sim W\), the mediation model, where we model the relationship between the mediator and the treatment indicator
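In R, the three fits and the product-of-coefficients estimate of the indirect effect might look like this (a minimal sketch; object names are illustrative):
m_total  <- lm(Y ~ W, data = dat)      # total effect of W on Y
m_direct <- lm(Y ~ W + M, data = dat)  # direct effect of W; the M coefficient estimates the M -> Y effect
m_med    <- lm(M ~ W, data = dat)      # effect of W on M
# indirect effect: product of the effects along the path W -> M -> Y
unname(coef(m_med)["W"] * coef(m_direct)["M"])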
## Intercept W M
## Y ~ W 2.14594107 10.493410 NA
## Y ~ W + M 2.18439123 5.342696 2.830629
## M ~ W -0.01358361 1.819635 NA
We see that when we regress \(Y\) on \(W\), we get an estimate close to 11, the total effect, as expected. The direct effect of \(W\) on \(Y\) can be obtained from the regression that adjusts for \(M\), which blocks the part of the effect that passes through the mediator; we obtain an estimate close to 5, as expected. The effect of \(W\) on \(M\) is close to 2, as given by the coefficient of \(W\) in the \(M \sim W\) regression, and the effect of \(M\) on \(Y\) is close to 3, as given by the \(M\) coefficient in the \(Y \sim W + M\) regression. Multiplying the latter two effects, we obtain an indirect-effect estimate close to 6, as expected.
Conclusion
Understanding these common DAG structures is crucial for accurate causal inference:
- Confounding: Requires adjustment for common causes of treatment and outcome
- Collider bias: Avoid adjusting for variables affected by both treatment and outcome
- Mediation: Be clear about whether you’re estimating direct, indirect, or total effects. In cases with linear models, path analysis rules can be used to quickly decompose the total effect into direct and indirect effects
DAGs provide a powerful visual language for communicating causal assumptions and guiding proper statistical analysis. By understanding these common structures, researchers can better design studies, analyze data, and interpret results.
If you want to receive monthly insights about Causal Inference in Statistics, please consider subscribing to my newsletter. You will receive updates about my upcoming book, blog posts, and other exclusive resources to help you learn more about causal inference in statistics.
Join the stats nerds🤓!
See Pearl, J. (2009), Causality: Models, Reasoning, and Inference, Cambridge University Press, for a formalization of a collider. The previous link is an affiliate link and we may earn a small commission on a purchase. I also discuss this idea in detail with examples, exercises, data, and code in my upcoming book Causal Inference in Statistics, with Exercises, Practice Projects, and R Code Notebooks↩︎
See Hernán, M. A., Hernández-Díaz, S., & Robins, J. M. (2004). A structural approach to selection bias. Epidemiology, 15(5), 615-625.↩︎
This technique is known as Path Analysis. I discuss it in detail in Part II of my upcoming book Causal Inference in Statistics, with Exercises, Practice Projects, and R Code Notebooks.↩︎