When should you code your variables as levels of a factor, and when should you enter them as binary variables in a model? As it turns out, the question isn’t necessarily straightforward to answer and, as cliché as it sounds, it depends on your research question.
Using examples from a research project studying the associations between an ML-estimated metric of aging and bipolar disorder, we’ll discuss:
- Factors to consider when deciding whether to recode your variables into multiple levels of a factor
- Implications of making such a decision
Background
For some context, bipolar disorder (BD) is known to be associated with both clinical and biological markers of premature aging, as is the case with other brain disorders such as schizophrenia and traumatic brain injury. Studies have shown that individuals with BD exhibit signs of advanced or accelerated aging compared to healthy individuals. The following are a subset of reasons for studying aging in bipolar disorder:
- Age-related Comorbidities: Patients with BD frequently present with age-related health issues earlier than expected, such as cardiovascular diseases, diabetes, and neurocognitive decline. These conditions contribute to the overall burden of the illness and impact the quality of life and longevity of affected individuals (Wrigglesworth et al., 2021).
- Cognitive Decline: Cognitive deficits are common in BD and often worsen with age. Studies have reported that people with BD show more significant cognitive decline over time compared to healthy individuals, indicating that BD may contribute to premature brain aging (Lewandowski et al., 2014).
- Neuroimaging Findings: Neuroimaging studies have shown that individuals with BD exhibit structural brain changes, such as reduced grey matter volume, white matter hyperintensities, and hippocampal atrophy, which are often observed in aging populations. These findings support the hypothesis that BD is linked to accelerated brain aging (Kaufmann et al., 2019).
Brain age models are machine learning models trained on neuroimaging data, such as MRI or PET scans, to predict an individual’s chronological age. Such models have been applied to various neuropsychiatric disorders (Baecker et al., 2021), and the difference between the predicted age and the chronological age is called the Brain-Predicted Age Difference (Brain-PAD). Evidence from previous studies also suggests that Brain-PAD represents traits that are genetically influenced, and that genetic variants associated with Brain-PAD in healthy controls (HCs) overlap partially with those observed in Alzheimer’s disease (AD), autism spectrum disorder (ASD), attention-deficit/hyperactivity disorder (ADHD), major depressive disorder (MDD), schizophrenia (SZ), and BD (Kaufmann et al., 2019).
- Multivariate Measure of Aging: The predicted age and the resulting Brain-PAD are considered multivariate measures because they are derived from multiple neuroimaging features, such as regional subcortical and lateral ventricle volumes, cortical thickness, and surface area. This allows researchers to measure the impact of BD on cognitive functioning in a more holistic manner.
- Investigating Clinical Relationships: Predicted age or brain-PAD can also be correlated with various clinical and demographic factors, such as illness duration, severity, subtype of BD, and lifestyle factors. This helps in understanding the impact of BD on brain health and aging.
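As a minimal sketch of the definition above, Brain-PAD is simply the predicted age minus the chronological age. The ages below are made up for illustration and the function name `brain_pad` is hypothetical; in practice the predicted ages would come from a trained brain age model.

```python
# Hypothetical values for illustration only; not data from the study.
chronological_age = [25.0, 40.0, 60.0]
predicted_age = [31.5, 41.0, 57.0]  # assumed output of some brain age model


def brain_pad(predicted, chronological):
    """Brain-PAD = predicted brain age minus chronological age."""
    return [p - c for p, c in zip(predicted, chronological)]


pads = brain_pad(predicted_age, chronological_age)
for age, pad in zip(chronological_age, pads):
    # A positive value means the brain is predicted to be older than the person is.
    direction = "older" if pad > 0 else "younger"
    print(f"age {age:.0f}: Brain-PAD = {pad:+.1f} (brain looks {direction} than expected)")
```

A positive Brain-PAD is read as more advanced brain aging, and a negative one as less advanced brain aging.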
In this study, we’re primarily interested in two questions, and the focus of this article is on the second question:
- How does brain aging, as estimated by a machine learning model through the Brain-Predicted Age Difference (Brain-PAD), differ in people with bipolar disorder (BD) compared to healthy individuals?
- Among people with BD, how does the use of different medications (antiepileptics, second-generation antipsychotics, lithium) relate to Brain-PAD? Specifically, is there evidence that antiepileptics and second-generation antipsychotics are associated with more advanced brain aging, and is lithium use associated with less advanced brain aging?
Impact of medications on Brain-Predicted Age Difference
Prior work has suggested that lithium is associated with better brain integrity measures from neuroimaging data (Abé et al., 2022). Second-generation antipsychotics (SGAs) are often combined with mood stabilizers to manage BD mania and depression. However, previous studies focused on changes in cortical thickness, volume, and area have shown mixed results. In contrast, there is considerable evidence of a negative impact of anti-epileptics (AEDs) on the brain. While the negative side effects of these medications are known, their use is justified by their effectiveness in managing psychiatric symptoms and improving patients’ immediate quality of life.
In the original dataset, SGA, AED, first-generation antipsychotics (FGA), lithium, and antidepressants were each coded as a binary variable. However, we decided to recode the medication variables into a single factor with 8 levels, one for each combination of SGA, AED, and lithium use (including use of none of the three). FGA and antidepressants were excluded from the new medication variable: only 5% of the BD sample (N ≈ 1,500) were on FGA, whereas each of the other medications was used by roughly 30% or more of the sample, and antidepressant use was found to be stable in recent prescription practices. Antipsychotics, anti-epileptics, and lithium are commonly involved in polypharmacy for BD.
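The recoding step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the level labels (such as "Li+SGA"), the helper `combo_label`, and the six example patients are all hypothetical and are not taken from the study's dataset or code.

```python
from collections import Counter
from itertools import product

# Lithium, anti-epileptics, second-generation antipsychotics (labels are assumptions).
MEDS = ("Li", "AED", "SGA")


def combo_label(lithium, aed, sga):
    """Map three 0/1 indicators to a single factor level such as 'Li+SGA'."""
    taken = [name for name, flag in zip(MEDS, (lithium, aed, sga)) if flag]
    return "+".join(taken) if taken else "None"


# All 2**3 = 8 possible levels of the recoded factor.
levels = [combo_label(*flags) for flags in product((0, 1), repeat=3)]
print(levels)

# Hypothetical patients as (lithium, AED, SGA) indicators; not data from the study.
patients = [(1, 0, 0), (0, 1, 1), (1, 0, 1), (0, 0, 0), (0, 1, 1), (1, 1, 1)]
cell_sizes = Counter(combo_label(*p) for p in patients)
print(cell_sizes)
```

Counting the group sizes right after recoding is a useful habit, because every level of the new factor needs enough observations to support its own estimate.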
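The recoding results in the following model, in which we control for the covariates chronological age and sex. We include age to account for a well-known regression-to-the-mean effect, where younger individuals tend to be predicted as older than they are and older individuals as younger. Sex is included due to known developmental differences. In the formula, S_j denotes the term for site j and ε_ij the residual for individual i at site j.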
\[ \text{BrainPAD} \sim \text{Sex} + \text{Age} + \text{Mood Stabilizer (8 levels)} + S_j + \varepsilon_{ij} \]
By doing so, we:
- Capture interactions implicitly: A single factor with multiple levels directly captures the interactions between medications, because each of the seven possible combinations of the three medications gets its own level.
- Simplify the model: By using one factor, we avoid the need to specify and test multiple interaction terms. Otherwise, interpreting interaction terms would require us to consider how the effect of one medication changes with the presence or absence of another medication. This can be less intuitive and harder to communicate than comparing specific combinations of medications directly.
- Avoid multicollinearity: Given that AED and SGA are commonly prescribed together, including them as separate variables in the model can lead to multicollinearity, which might inflate the standard errors of the estimates and make it difficult to determine the individual effects of each medication.
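To make the first point concrete, the relationship between the two codings can be sketched as follows. This assumes treatment (dummy) coding with the no-medication group as the reference level and leaves out the covariates and site term; the symbols β and γ are introduced here purely for illustration. A binary model that includes every interaction among lithium (Li), AED, and SGA has the mean structure

\[ \mathbb{E}[\text{BrainPAD}] = \beta_0 + \beta_{Li}\,\text{Li} + \beta_{A}\,\text{AED} + \beta_{S}\,\text{SGA} + \beta_{LiA}\,\text{Li}\cdot\text{AED} + \beta_{LiS}\,\text{Li}\cdot\text{SGA} + \beta_{AS}\,\text{AED}\cdot\text{SGA} + \beta_{LiAS}\,\text{Li}\cdot\text{AED}\cdot\text{SGA} \]

and the coefficient γ of each factor level, measured against the reference group, is a sum of those terms, for example

\[ \gamma_{\text{AED}} = \beta_{A}, \qquad \gamma_{\text{AED+SGA}} = \beta_{A} + \beta_{S} + \beta_{AS}, \qquad \gamma_{\text{Li+AED+SGA}} = \beta_{Li} + \beta_{A} + \beta_{S} + \beta_{LiA} + \beta_{LiS} + \beta_{AS} + \beta_{LiAS} \]

Under these assumptions, both parameterizations describe the same eight group means. The factor therefore carries seven medication parameters, whereas a binary model without interaction terms has only three and constrains every interaction to zero.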
However, the downside of coding the medications as a single factor is that we lose statistical power, because the observations in which a given medication is present are split across several smaller groups rather than analyzed together. For instance, we could not include FGA in our analyses: it already has a very small sample size by itself, and splitting it further into combinations with other medications would make its polypharmacy groups even smaller.
Additionally, we are unable to estimate the main effect of each medication independently of the others. When we include a medication as a binary variable, the model provides an average effect size for that medication, say lithium, across the entire sample, regardless of other medication combinations. Our decision to model the medications as a single factor instead lets us compare each combination as a separate category and provides an effect size for each combination. For comparison, the model with the three medications entered as binary variables is:
\[ \text{BrainPAD} \sim \text{Sex} + \text{Age} + \text{Lithium} + \text{AED}+ \text{SGA}+ S_j + \varepsilon_{ij} \]
Table 1. Summary of model with mood stabilizers coded as binary variables
Table 2. Summary of model with different combinations of mood stabilizers coded as levels of a factor
Examining the two models above (Model 1 in Table 1, with binary medication variables; Model 2 in Table 2, with medication combinations as factor levels), we find that:
- The significant predictors in both models include age, BDI, and site.
- Lithium is significant with a negative estimate in both Model 1 and Model 2.
- AED is significant with a positive estimate in both Model 1 and Model 2.
- There is no significant interaction between AED and SGA in Model 1, but when they are combined as a single level in Model 2, there is a significant effect on Brain-PAD.
Summary
In conclusion, coding the medication variable with different combinations as levels (Model 2) provides a more intuitive understanding of specific group effects compared to using interaction terms (Model 1). The interaction terms primarily assess how the effect of one variable changes in the presence of another, which might not always capture significant combined effects. For example, the combination of AED and SGA is significant when treated as a distinct level (Model 2), indicating an average effect of this combination on brain-PAD.
However, when the medications are coded as binary variables (Model 1), the interaction term between AED and SGA is not significant, suggesting that there is no meaningful change in the effect of SGA when AED is also present. This highlights the importance of choosing the appropriate model structure to capture and interpret the effects of medication combinations on brain-PAD.
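The distinction between a combined effect and an interaction can be illustrated with group means. The sketch below uses made-up Brain-PAD values, ignores the covariates and the site term, and deliberately picks numbers that are exactly additive; none of it comes from the study's data or results.

```python
from statistics import mean

# Hypothetical Brain-PAD values (years) keyed by (aed, sga) indicators.
# Not study data; covariates and site are ignored for simplicity.
brain_pad = {
    (0, 0): [-0.5, 0.0, 0.5],
    (1, 0): [0.5, 1.0, 1.5],
    (0, 1): [0.5, 1.0, 1.5],
    (1, 1): [1.5, 2.0, 2.5],
}
group_mean = {cell: mean(values) for cell, values in brain_pad.items()}

# Factor coding: the AED+SGA level is compared with the reference group (neither drug).
combined_effect = group_mean[(1, 1)] - group_mean[(0, 0)]

# Binary coding with an interaction term: the interaction measures the departure
# from additivity, i.e. how much the SGA effect changes when AED is also present.
interaction = (
    group_mean[(1, 1)] - group_mean[(1, 0)] - group_mean[(0, 1)] + group_mean[(0, 0)]
)

print(f"AED+SGA vs. neither: {combined_effect:.2f}")
print(f"AED x SGA interaction: {interaction:.2f}")
```

In this toy example the combined group differs clearly from the reference group even though the interaction is zero, because each medication simply adds its own effect. A non-significant interaction and a significant combination level are therefore answers to two different questions rather than contradictory results.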
The key takeaway is that a clear research question helps us arrive at the appropriate model structure for interpreting how medication relates to brain aging as represented by brain-PAD. Clear and accurate interpretation of how medication combinations affect brain-PAD is essential for making informed clinical decisions. Understanding whether a specific combination has a significant effect can motivate further study in clinical trials to guide future clinical practice, potentially leading to better patient outcomes. While we did not explore the effects of medication on subgroups of individuals with BD, understanding the combined effects of medications on particular groups of individuals (e.g., BD I vs. others) can also help clinicians move towards more personalized and precise treatment plans, especially when managing multiple medications.
References
Baecker, L., Dafflon, J., da Costa, P. F., Garcia-Dias, R., Vieira, S., Scarpazza, C., Calhoun, V. D., Sato, J. R., Mechelli, A., & Pinaya, W. H. L. (2021). Brain age prediction: A comparison between machine learning models using region- and voxel-based morphometric data. Human Brain Mapping, 42(8), 2332–2346.
Kaufmann, T., van der Meer, D., Doan, N. T., Schwarz, E., Lund, M. J., Agartz, I., Alnæs, D., Barch, D. M., Baur-Streubel, R., Bertolino, A., Bettella, F., Beyer, M. K., Bøen, E., Borgwardt, S., Brandt, C. L., Buitelaar, J., Celius, E. G., Cervenka, S., Conzelmann, A., … Westlye, L. T. (2019). Common brain disorders are associated with heritable patterns of apparent aging of the brain. Nature Neuroscience, 22(10), 1617–1623.
Lewandowski, K. E., Sperry, S. H., Malloy, M. C., & Forester, B. P. (2014). Age as a Predictor of Cognitive Decline in Bipolar Disorder. The American Journal of Geriatric Psychiatry, 22(12), 1462–1468.
Wrigglesworth, J., Ward, P., Harding, I. H., Nilaweera, D., Wu, Z., Woods, R. L., & Ryan, J. (2021). Factors associated with brain ageing—A systematic review. BMC Neurology, 21(1), 312.