The \(p\) value is often used in applications to determine whether some improvement is significant or just due to random chance. Loosely speaking, it measures how likely it is, under the assumption of a hypothesis \(H_0\), to get a result at least as extreme as the one observed.
This is interesting for studies in medicine and similar scenarios: you have a drug and a placebo, and you want to figure out whether the drug is better than the placebo.
Hypothesis testing
In statistics, hypothesis testing works as follows:
- Define a statistical model: This means you have a sample space \(\mathfrak{X}\) and an assumption about the distribution of the data. For example, \(\mathfrak{X} = \mathbb{R}^n\), where \(n \in \mathbb{N}\) is your sample size and \(X_1, \dots, X_n \stackrel{iid}{\sim} \mathcal{N}(\mu, \sigma^2)\) with known variance \(\sigma^2\).
- Define a hypothesis and an alternative: For example, \(H_0: \mu = 100\) and \(H_1: \mu > 100\) might be hypotheses if you want to check whether a drug increases IQ.
- Control errors: You can make two errors: either \(H_0\) is true and you reject it (a type I error), or \(H_1\) is true but you don't reject \(H_0\) (a type II error). In statistics, the type I error is usually the one that is controlled, so the test is constructed in such a way that the probability of a type I error is at most some \(\alpha \in (0, 1)\). Usually \(\alpha = 0.05\) or \(\alpha = 0.01\), or even lower.
- Test statistic: You compute a value from the sample that carries information about the parameter in the hypotheses, for example the sample mean \(T(X_1, \dots, X_n) = \frac{1}{n}\sum_{i=1}^{n} X_i\).
- Test statistic distribution: \(T\) itself is a random variable and you can calculate its distribution under \(H_0\) (in the example above, \(T \stackrel{H_0}{\sim} \mathcal{N}(100, \frac{\sigma^2}{n})\)).
- Calculate the test decision: Hence you can calculate a \(c \in \mathbb{R}\) such that \(P_{H_0}(H_0 \text{ is rejected}) = P_{H_0}(T \geq c) \leq \alpha\). The rejection region does not have to be of the form \(T \geq c\), but for a one-sided alternative like \(H_1: \mu > 100\) it typically is. A minimal numerical sketch follows this list.
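As a minimal sketch of the last step (assuming the known-variance model above; the concrete numbers \(\sigma = 15\), \(n = 25\) and \(\alpha = 0.05\) are made up for illustration), the critical value \(c\) is just a quantile of the distribution of \(T\) under \(H_0\):

```python
from scipy.stats import norm

# Made-up parameters for the IQ example:
mu_0 = 100    # mean under H_0
sigma = 15    # known standard deviation
n = 25        # sample size
alpha = 0.05  # bound on the type I error

# Under H_0, the sample mean T is N(mu_0, sigma^2 / n).
# Reject H_0 if T >= c, where c is chosen such that
# P_{H_0}(T >= c) = alpha.
c = norm.ppf(1 - alpha, loc=mu_0, scale=sigma / n**0.5)
print(f"Reject H_0 if the sample mean is at least c = {c:.2f}")
# -> c is approximately 104.93
```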
The p value
Now note that you could also go the other way round. Instead of fixing \(\alpha\) and deriving \(c\), you can calculate

\(p^* = P_{H_0}(T \geq T(x_1, \dots, x_n))\),

the probability, under \(H_0\), of getting a test statistic at least as extreme as the one actually observed. If \(p^* \leq \alpha\), then \(H_0\) can be rejected at level \(\alpha\).
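Continuing the made-up IQ example, \(p^*\) is just the upper-tail probability of the observed sample mean under \(H_0\) (the observed value \(105.5\) below is hypothetical):

```python
from scipy.stats import norm

mu_0, sigma, n = 100, 15, 25  # same made-up parameters as above
t_obs = 105.5                 # hypothetical observed sample mean

# p* = P_{H_0}(T >= t_obs), where T ~ N(mu_0, sigma^2 / n) under H_0
p_star = norm.sf(t_obs, loc=mu_0, scale=sigma / n**0.5)
print(f"p* = {p_star:.4f}")  # about 0.0334 -> reject at alpha = 0.05
```

Both routes give the same decision: \(p^* \leq \alpha\) holds exactly when the observed sample mean is at least the critical value \(c\) from above.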
Interesting statements
I just came across the following statements, which I think are interesting enough to share. All of them are wrong.
The following was taken from a German statistics exam by Dr. Klar (KIT, WS 2013/2014).
Assume for the following that an experiment resulted in a \(p\) value of \(0.01\).
1. \(H_0\) is certainly false.
2. \(H_0\) is false with probability \(0.01\).
3. \(H_1\) is certainly correct.
4. You can calculate the probability that \(H_1\) is correct from the \(p\) value.
5. If one decides to reject \(H_0\), then the \(p\) value is the probability of making the wrong decision.
6. The experimental result is reliable, meaning that if the experiment were repeated often, one would get a significant result in 99% of the cases.
I'm not too sure whether (5) is really wrong.
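One way to convince yourself that it is: among all experiments in which \(H_0\) gets rejected, the fraction of wrong decisions depends on how often \(H_0\) is actually true, and the \(p\) value knows nothing about that. Here is a quick simulation sketch; all numbers (the base rate \(0.5\) and the true effect \(\mu = 103\)) are made up for illustration:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Made-up setup: in half of all experiments H_0 (mu = 100) is true,
# in the other half the drug really works (mu = 103).
n, sigma, alpha = 25, 15, 0.01
n_experiments = 100_000
h0_true = rng.random(n_experiments) < 0.5
mu = np.where(h0_true, 100.0, 103.0)

# Each experiment yields a sample mean T ~ N(mu, sigma^2 / n).
t = rng.normal(loc=mu, scale=sigma / np.sqrt(n))

# One-sided p value: P_{H_0}(T >= t_obs)
p = norm.sf(t, loc=100.0, scale=sigma / np.sqrt(n))

rejected = p <= alpha
wrong = rejected & h0_true
print(f"Fraction of rejections that are wrong: {wrong.sum() / rejected.sum():.2f}")
# Comes out around 0.10 here, not 0.01.
```

So in this scenario roughly one in ten rejections is a wrong decision, even though every single one of them has \(p \leq 0.01\); the fraction depends on the base rate and the power of the test, not on the \(p\) value alone.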