The Biostatistics Roadmap or How to Become a Great Biostatistician

Justin Belair

Biostatistician in Science & Tech | Consultant | Causal Inference Specialist | Founder & Editor @ biostatistics.ca | Click on author's name to sign up for his FREE monthly newsletter

Budding biostatisticians who are curious and eager to learn often ask me for advice on a developmental roadmap.

Here is what I think you need to become a great applied biostatistician, which in itself is more about the journey than the destination.

Below is a list of topics, with book recommendations, loosely ordered by conceptual difficulty.

All book links are Amazon affiliate links and help support biostatistics.ca. Thank you!

Foundations of Statistics

It is important to learn the foundational concepts of statistics and probability theory. I’ve also listed some mathematics, since a lot of advanced concepts rely on a solid understanding of linear algebra and calculus.

Mathematical Foundations

These will allow you to develop the mathematical theory needed to understand more advanced statistics.

  • Intro to linear algebra
  • Intro to calculus

Advanced Statistics

Once you have learned the foundational tools, here are some advanced topics to master to become proficient as a statistician.

  • Longitudinal data
  • Experimental Design
  • Randomized Controlled Trials
  • Resampling methods (permutation tests, bootstrap, etc.)
  • Missing Data and Data Imputation
  • Sampling Theory
  • Advanced Experimental Design (complex randomization schemes)
  • Survival Analysis
  • Bayesian Statistics
  • Mathematical Statistics and Statistical Inference

This book covers hundreds of statistical tests in depth. It assumes some statistical maturity and goes into detail about each test covered. A great resource.

Modelling

  • Linear Regression
  • Categorical Data (Logistic Regression, Multinomial Regression, Ordinal Regression)
    • A masterpiece on logistic regression. Today, there are many misconceptions around logistic regression, as it is often used to build classification procedures. It is less well known that logistic regression can be used for statistical inference with binary data, in the same way as any other Generalized Linear Model (GLM). Applied Logistic Regression by David W. Hosmer, Stanley Lemeshow
  • Generalized Linear Models (GLM, Linear, Binomial, Poisson, Negative Binomial)
  • Linear Mixed Models (LMM)
  • Generalized Additive Models (GAM), Smoothing and non-parametric regression
  • Advanced modelling (Nonlinear mixed models, Generalized Estimating Equations, etc.)
  • Multivariate Data Methods (PCA, LDA, Clustering, etc.)

This book is hands down the best one to learn how to apply regression models in scientific settings, where the statistical properties of the estimators and the inferences matter. Most books on regression modelling focus on building predictive systems, and can sometimes lead researchers astray. Regression Modeling Strategies: With Applications to Linear Models, Logistic and Ordinal Regression, and Survival Analysis by Frank E. Harrell, Jr.
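To make the point about logistic regression as an inferential tool concrete, here is a minimal sketch in plain Python. It fits a logistic regression with one binary covariate by Newton-Raphson and reports an odds ratio with a Wald confidence interval and p-value, rather than a classification rule. The data are made up purely for illustration (a hypothetical two-arm study), and the function names are my own; in practice you would use a GLM routine from R or another statistical package.

```python
import math

def score_and_information(b0, b1, x, y):
    """Score vector and Fisher information for logit P(y=1) = b0 + b1*x."""
    g0 = g1 = i00 = i01 = i11 = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        w = p * (1.0 - p)          # variance of a Bernoulli(p) outcome
        g0 += yi - p
        g1 += (yi - p) * xi
        i00 += w
        i01 += w * xi
        i11 += w * xi * xi
    return (g0, g1), (i00, i01, i11)

def fit_logistic(x, y, n_iter=25):
    """Maximum likelihood fit by Newton-Raphson; returns estimates and Wald SEs."""
    b0 = b1 = 0.0
    for _ in range(n_iter):
        (g0, g1), (i00, i01, i11) = score_and_information(b0, b1, x, y)
        det = i00 * i11 - i01 * i01
        # Newton step: beta <- beta + I^{-1} * score (2x2 inverse written out)
        b0 += (i11 * g0 - i01 * g1) / det
        b1 += (i00 * g1 - i01 * g0) / det
    # Standard errors from the inverse information at the final estimate
    _, (i00, i01, i11) = score_and_information(b0, b1, x, y)
    det = i00 * i11 - i01 * i01
    return (b0, math.sqrt(i11 / det)), (b1, math.sqrt(i00 / det))

# Made-up illustrative data: 100 controls (x=0) with 30 events,
# 100 treated (x=1) with 45 events.
x = [0] * 100 + [1] * 100
y = [1] * 30 + [0] * 70 + [1] * 45 + [0] * 55

(b0, se0), (b1, se1) = fit_logistic(x, y)

z = b1 / se1
p_value = math.erfc(abs(z) / math.sqrt(2))   # two-sided normal p-value
ci_low = math.exp(b1 - 1.96 * se1)
ci_high = math.exp(b1 + 1.96 * se1)

print(f"Odds ratio: {math.exp(b1):.3f}")
print(f"95% Wald CI: ({ci_low:.3f}, {ci_high:.3f})")
print(f"Wald z = {z:.2f}, p = {p_value:.4f}")
```

With a single binary covariate, the fitted slope coincides with the log odds ratio of the 2x2 table, which is a handy sanity check that the model is estimating an interpretable effect and its uncertainty, not just producing predicted classes.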

Causal Inference

Causal inference is rapidly becoming an indispensable part of the statistician's toolkit. It is a very large and complex field, and multiple specialized subfields are emerging. Here is an overview of important topics to master:

  • Potential Outcomes Model
  • DAGs
  • Propensity Scores
  • Instrumental Variables
  • Mediation and Interaction
  • SEMs and Path Analysis
  • Difference-in-differences
  • Regression Discontinuity Design
  • Time-Varying Confounding
  • Emulating a Target Trial
  • Targeted Learning and Causal ML

Here is a list of textbooks that cover a wide range of causal inference concepts.

Related Disciplines (Epidemiology, Bioinformatics, Psychology, Coding, Data Science)

BONUS: For those who wish to become independent consultants

  • Intro to statistical consulting (real-world consulting projects, either as a consultant or even as an intern, preferably with R)
  • Basic business principles (contracts, marketing and branding, accounting, digital presence, structuring and closing deals)

Conclusion

How much time and focus to devote to each subject depends on your personal idiosyncrasies, experiences, and other factors.

This path is highly nonlinear by design and should keep you busy for a good 5 years of learning, if not 10.

Don’t hesitate to reach out to the author if you have any questions.

You can also subscribe to his FREE monthly newsletter Causal Inference in (Bio)statistics.