Introduction
In this simulation, we will investigate the distribution of p-values, both when the null hypothesis is true and when it is false. The idea is simply to draw samples of size \(n\) from normal distributions with standard deviation 1 whose mean is progressively shifted away from 0 (feel free to modify the simulation parameters and rerun the simulations yourself by downloading the free R Code Notebook). Then, we test the mean of each sample against \(H_0 : \mu = 0\) using a t-test.
Simulating P-values Distribution
#number of simulations
n_experiment <- 25000
#simulation parameters
n <- 20
sd <- 1
means <- c(0, 0.1, 0.2, 0.3, 0.4, 0.5, 1, 1.25, 1.5)
We then run the simulation and plot the results. The code below will generate both histograms and empirical cumulative distribution function (ecdf) plots to visualize the distribution of the p-values.
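The notebook code is not reproduced in this post, so what follows is only a minimal sketch of what such a simulation could look like. It assumes a one-sample, two-sided `t.test` and base R graphics. The helper name `simulate_pvalues`, the seed, and the reduced number of experiments are illustrative choices, not taken from the original notebook.

```r
# Hypothetical sketch: simulate one-sample t-test p-values for several true means.
set.seed(1)  # arbitrary seed, for reproducibility only

n_experiment <- 2000  # reduced from 25000 so the sketch runs quickly
n <- 20
sd <- 1
means <- c(0, 0.1, 0.2, 0.3, 0.4, 0.5, 1, 1.25, 1.5)

# Returns a vector of n_experiment p-values for a given true mean (assumed helper)
simulate_pvalues <- function(mu, n_experiment, n, sd) {
  replicate(n_experiment, t.test(rnorm(n, mean = mu, sd = sd), mu = 0)$p.value)
}

# Matrix with one column of p-values per true mean
pvals <- sapply(means, simulate_pvalues, n_experiment = n_experiment, n = n, sd = sd)
colnames(pvals) <- paste0("mean = ", means)

# Histograms, one panel per true mean
old_par <- par(mfrow = c(3, 3))
for (j in seq_along(means)) {
  hist(pvals[, j], breaks = 20, xlim = c(0, 1),
       main = colnames(pvals)[j], xlab = "p-value")
}

# Empirical cumulative distribution functions, one panel per true mean
for (j in seq_along(means)) {
  plot(ecdf(pvals[, j]), xlim = c(0, 1),
       main = colnames(pvals)[j], xlab = "p-value", ylab = "ECDF")
}
par(old_par)
```

Setting `n_experiment` back to 25000 should give smoother histograms at the cost of a longer run time.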
Computer Simulation Results
Notice how, when the null hypothesis is true, the p-value is uniformly distributed between 0 and 1. This might be surprising at first, but it makes a lot of sense. If the null hypothesis is true, what is the probability that \(p<0.05\)? By construction it is 0.05: there is no effect under the null hypothesis, so we would erroneously declare significance about 5 times out of 100. The same holds for any threshold: the probability that \(p<p_0\) is exactly \(p_0\) for every \(p_0\) between 0 and 1, which is precisely what characterizes a uniform distribution on that interval.
Cool, huh?
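The claim that \(P(p<p_0) = p_0\) under the null can be checked numerically. The snippet below is a rough sketch, assuming the same one-sample t-test setup with \(n = 20\); the seed, the thresholds, and the use of a Kolmogorov-Smirnov test as an extra check are my own illustrative choices.

```r
# Hypothetical check: under H0, the share of p-values below p0 should be close to p0.
set.seed(42)  # arbitrary seed

n_experiment <- 5000
n <- 20

# p-values when the true mean really is 0
p_null <- replicate(n_experiment, t.test(rnorm(n, mean = 0, sd = 1), mu = 0)$p.value)

# Compare each threshold with the observed proportion of p-values below it
thresholds <- c(0.01, 0.05, 0.10, 0.25, 0.50, 0.90)
observed <- sapply(thresholds, function(p0) mean(p_null < p0))
print(data.frame(p0 = thresholds, proportion_below = observed))

# Optional: Kolmogorov-Smirnov test of the p-values against Uniform(0, 1)
ks_result <- ks.test(p_null, "punif")
print(ks_result$p.value)
```

The two columns of the printed table should agree up to simulation noise, which shrinks as `n_experiment` grows.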
Then, as the true mean moves away from 0, the p-value distribution concentrates toward 0, and the probability of getting a significant result (the power of the test) increases. This is because the t-test is more likely to reject the null hypothesis the further the true mean is from 0.
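To put numbers on this, one can estimate the probability of \(p<0.05\) for each shifted mean and compare it with the analytical power from `power.t.test` in base R. This is a sketch under the same assumptions as before (one-sample, two-sided test, \(n = 20\), standard deviation 1); the seed and the subset of means are illustrative.

```r
# Hypothetical sketch: simulated vs. analytical power of the one-sample t-test.
set.seed(7)  # arbitrary seed

n_experiment <- 2000
n <- 20
alpha <- 0.05
means <- c(0.1, 0.2, 0.3, 0.4, 0.5, 1)

# Share of simulated p-values below alpha for each true mean
simulated_power <- sapply(means, function(mu) {
  p <- replicate(n_experiment, t.test(rnorm(n, mean = mu, sd = 1), mu = 0)$p.value)
  mean(p < alpha)
})

# Analytical power; strict = TRUE counts rejections in both tails
theoretical_power <- sapply(means, function(mu) {
  power.t.test(n = n, delta = mu, sd = 1, sig.level = alpha,
               type = "one.sample", alternative = "two.sided",
               strict = TRUE)$power
})

print(data.frame(mean = means,
                 simulated = simulated_power,
                 theoretical = theoretical_power))
```

Both columns should rise together as the mean grows, which is the same shift toward small p-values seen in the plots.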
See the results below. To see the simulation code, download the free R Code Notebook.
Conclusion
Under the null hypothesis, the p-value has, by construction, a uniform distribution. As we move away from the null hypothesis, the p-value distribution skews towards smaller and smaller values. This is desirable: we want the p-value to help us detect effects when they are present, in spite of sampling uncertainty.
If you want to learn more about p-values and statistical inference, how to run simulations like this, and how to do statistics the right way, consider joining my course, Introduction to Biostatistics: Learn Statistics the Right Way!, where I teach you everything you need to know about statistics and data analysis.