Simulating The Distribution of P-Values (With Downloadable R Code Notebook)

Justin Belair

Biostatistician in Science & Tech | Consultant | Causal Inference Specialist

Introduction

In this simulation, we investigate the distribution of p-values, both when the null hypothesis is true and when it is false. The idea is simply to simulate samples of size \(n\) from normal distributions with standard deviation 1 whose mean is progressively shifted away from 0 (feel free to modify the simulation parameters and rerun the simulations yourself by downloading the free R Code Notebook here!). Then, for each sample, we test the mean against \(H_0 : \mu = 0\) using a t-test.
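For reference, the test can be written out explicitly. This is the standard textbook form of the one-sample t-test, assuming the two-sided alternative (the default of R's `t.test`); the symbols \(\bar{X}\) and \(S\) for the sample mean and sample standard deviation are notation introduced here, not taken from the notebook:

\[
T = \frac{\bar{X} - 0}{S / \sqrt{n}}, \qquad p = 2 \, P\left(t_{n-1} \ge |T|\right),
\]

where \(t_{n-1}\) denotes a Student t random variable with \(n - 1\) degrees of freedom. Under \(H_0\), \(T\) itself follows this distribution, which is what makes the p-value behave so predictably in the simulations.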

Simulating P-values Distribution

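# Number of simulated experiments per value of the true mean
n_experiment <- 25000

# Simulation parameters
n <- 20    # sample size of each experiment
sd <- 1    # standard deviation of the normal distributions
means <- c(0, 0.1, 0.2, 0.3, 0.4, 0.5, 1, 1.25, 1.5)    # true means (0 = null hypothesis is true)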

We then run the simulation and plot the results. The code below will generate both histograms and empirical cumulative distribution function (ecdf) plots to visualize the distribution of the p-values.
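The author's full code is in the downloadable notebook. As a rough idea of what such a simulation could look like, here is a minimal sketch, not the notebook's actual code. The helper `simulate_p_value`, the name `sigma`, the seed, and the reduced number of experiments (2000 instead of 25000, so that it runs in a few seconds) are all assumptions made for illustration.

```r
set.seed(1)  # arbitrary seed, only so this sketch is reproducible

n_experiment <- 2000  # assumption: reduced from 25000 to keep the sketch fast
n <- 20
sigma <- 1            # standard deviation (named sigma here to avoid masking sd())
means <- c(0, 0.1, 0.2, 0.3, 0.4, 0.5, 1, 1.25, 1.5)

# Hypothetical helper: draw one sample of size n with true mean mu and
# return the two-sided one-sample t-test p-value for H0: mu = 0
simulate_p_value <- function(mu, n, sigma) {
  x <- rnorm(n, mean = mu, sd = sigma)
  t.test(x, mu = 0)$p.value
}

# Matrix of p-values: one row per experiment, one column per true mean
p_values <- sapply(means, function(mu) {
  replicate(n_experiment, simulate_p_value(mu, n, sigma))
})
colnames(p_values) <- paste0("mean = ", means)

# Histograms of the p-values, one panel per true mean
op <- par(mfrow = c(3, 3))
for (j in seq_along(means)) {
  hist(p_values[, j], breaks = seq(0, 1, by = 0.05),
       main = colnames(p_values)[j], xlab = "p-value")
}

# Empirical cumulative distribution functions, one panel per true mean
for (j in seq_along(means)) {
  plot(ecdf(p_values[, j]), main = colnames(p_values)[j],
       xlab = "p-value", ylab = "ECDF")
}
par(op)
```

Storing the p-values in a matrix with one column per mean keeps the plotting loops short. Setting `n_experiment` back to 25000 reproduces the scale of the original simulation at the cost of a longer run time.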

Computer Simulation Results

Notice that when the null hypothesis is true, the p-value is uniformly distributed between 0 and 1. This might be surprising at first, but it makes a lot of sense. If the null hypothesis is true, what is the probability that \(p<0.05\)? It is 0.05 by construction: there is no effect under the null hypothesis, so a test at the 5% level declares significance erroneously about 5 times out of 100. The same holds for any threshold: the probability that \(p<p_0\) is exactly \(p_0\) for any \(p_0\) between 0 and 1, which is precisely what characterizes a uniform distribution on \([0, 1]\).

Cool, huh?
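The uniformity claim is easy to check numerically. The sketch below simulates p-values under the null hypothesis and compares the empirical proportion of p-values below several thresholds with the thresholds themselves. The number of simulations, the seed, and the chosen thresholds are arbitrary assumptions.

```r
set.seed(123)  # arbitrary seed for reproducibility

n_sim <- 5000  # assumption: a modest number of simulated experiments
n <- 20

# p-values of one-sample t-tests on data generated under H0 (true mean = 0)
p_null <- replicate(n_sim, t.test(rnorm(n, mean = 0, sd = 1), mu = 0)$p.value)

# Compare the empirical P(p < p0) with p0 for a few thresholds
p0 <- c(0.01, 0.05, 0.10, 0.25, 0.50, 0.75)
empirical <- sapply(p0, function(threshold) mean(p_null < threshold))

print(data.frame(p0 = p0, empirical = empirical))
```

Up to simulation noise, the two columns should agree, which is the empirical counterpart of the statement that the probability of \(p<p_0\) equals \(p_0\).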

Then, as the true mean moves away from 0, the p-value distribution shifts to the left (towards 0), and the probability of getting a significant result increases. This is because the t-test is more likely to reject the null hypothesis the farther the true mean is from 0.
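The probability of a significant result (the power of the test) can also be computed directly, without simulation, from the noncentral t distribution. The sketch below does this for the same means, assuming a two-sided test at the 5% level; the variable names are illustrative and not taken from the notebook.

```r
n <- 20
sigma <- 1
alpha <- 0.05  # assumption: two-sided test at the 5% significance level
means <- c(0, 0.1, 0.2, 0.3, 0.4, 0.5, 1, 1.25, 1.5)

dof <- n - 1                         # degrees of freedom of the t statistic
crit <- qt(1 - alpha / 2, df = dof)  # two-sided critical value
ncp <- means * sqrt(n) / sigma       # noncentrality parameter for each true mean

# P(|T| > crit) when T follows a noncentral t distribution
power <- pt(-crit, df = dof, ncp = ncp) +
  pt(crit, df = dof, ncp = ncp, lower.tail = FALSE)

print(data.frame(mean = means, power = round(power, 3)))
```

For a true mean of 0 this returns the significance level itself, and the value grows towards 1 as the mean increases. The proportion of simulated p-values below 0.05 for each mean should be close to these numbers.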

See the results below. To see the simulation code, download the free R Code Notebook here!

Conclusion

Under the null hypothesis, the p-value has a uniform distribution by construction. As we move away from the null hypothesis, the distribution of the p-value skews towards smaller and smaller values. This is desirable: we want the p-value to help us detect effects when they are present, in spite of sampling uncertainty.

If you want to learn more about p-values and statistical inference, how to run simulations like this, and how to do statistics the right way, consider joining my Introduction to Biostatistics : Learn Statistics the Right Way! Course where I teach you everything you need to know about statistics and data analysis.
