8  Statistical testing

Last modified on 11. December 2025 at 20:23:05

“A quote.” — Dan Meyer

8.1 General background

“Statistik ist: Wenn der Jäger am Hasen einmal links und einmal rechts vorbeischießt, dann ist der Hase im Durchschnitt tot.” — Mike Krüger, German comedian

Translation: Statistics means that when the hunter misses the hare once to the left and once to the right, then on average the hare is dead.

Figure 8.1: foo
Figure 8.2: foo
Figure 8.3: foo
Figure 8.4: foo
Figure 8.5: foo
Figure 8.6: foo

8.2 Theoretical background

8.2.1 Fisher’s approach: The ‘significance test’

Fisher saw statistics as a tool for inductive reasoning (learning from data for science).

  • Only ONE hypothesis: There is only the null hypothesis (\(H_0\), e.g. ‘no effect’). An alternative hypothesis is not formally specified.
  • The measure (p-value): The p-value is a continuous measure of the strength of evidence against the null hypothesis \(H_0\): the smaller the p-value, the stronger the evidence.
    • \(p = 0.01\): Strong evidence against the null hypothesis.
    • \(p = 0.20\): Little or no evidence against the null hypothesis.
  • The result: One either rejects the null hypothesis \(H_0\) or suspends judgement. One never ‘accepts’ the null hypothesis; one has simply not found enough evidence to reject it.
  • Objective: Gain knowledge through individual experiments.
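To make the idea of a p-value as a continuous measure of evidence concrete, here is a minimal sketch in Python (standard library only). It assumes a hypothetical coin-tossing experiment with \(H_0\): the coin is fair. The function name `binomial_p_value` and the data (16 or 12 heads in 20 tosses) are illustrative assumptions, not taken from this chapter.

```python
from math import comb


def binomial_p_value(successes: int, trials: int, p_null: float = 0.5) -> float:
    """One-sided exact p-value P(X >= successes) under H0: X ~ Binomial(trials, p_null)."""
    return sum(
        comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Hypothetical data: two separate experiments with 20 tosses each.
p_strong = binomial_p_value(16, 20)  # 16 heads: far from what H0 predicts
p_weak = binomial_p_value(12, 20)    # 12 heads: close to what H0 predicts

print(f"16 of 20 heads: p = {p_strong:.4f}")
print(f"12 of 20 heads: p = {p_weak:.4f}")
```

In Fisher's reading, the first result counts as strong evidence against \(H_0\), while the second gives no reason to reject it. No alternative hypothesis enters the calculation at any point.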

8.2.2 Neyman-Pearson’s approach: The ‘hypothesis test’

Neyman and Pearson sharply criticised Fisher. Their objection, in essence: one cannot reject a hypothesis without knowing what to accept instead. They saw statistics as a decision-making procedure (Neyman spoke of ‘inductive behaviour’).

  • TWO hypotheses: There is the null hypothesis (\(H_0\)) AND a specific alternative hypothesis (\(H_A\)).
  • Type I and Type II errors: Before the experiment begins, two error rates are fixed:
    • \(\alpha\) (alpha): How often am I allowed to find an effect that does not exist (Type I error)? (e.g. 5%)
    • \(\beta\) (beta): How often am I allowed to overlook a real effect (Type II error)? The power (test strength) is \(1 - \beta\).
  • The result: A hard, binary decision: ‘Accept \(H_0\)’ or ‘Reject \(H_0\)’ (that is, accept \(H_A\)).
  • Goal: Minimisation of losses over many repeated experiments (as in industrial production).

Neyman’s philosophy: We are not looking for the ‘truth’ in any individual case; rather, we behave in such a way that we are wrong as rarely as possible over, say, 1000 decisions.
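The long-run view can be sketched as follows, again in Python with the standard library only. The design is a hypothetical assumption for illustration: 50 coin tosses, \(H_0\): \(p = 0.5\), a specific alternative \(H_A\): \(p = 0.7\), and \(\alpha \le 5\%\). The decision rule is fixed before any data exist, and the two error rates are then checked over 1000 simulated decisions each.

```python
import random
from math import comb


def upper_tail(c: int, n: int, p: float) -> float:
    """P(X >= c) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(c, n + 1))


# Hypothetical design, fixed BEFORE the experiment (all values are assumptions).
n = 50            # tosses per experiment
p_null = 0.5      # H0
p_alt = 0.7       # specific alternative HA
alpha_max = 0.05  # tolerated Type I error rate

# Decision rule: reject H0 if the number of heads is >= critical_value.
# Choose the smallest critical value whose Type I error rate is <= alpha_max.
critical_value = next(
    c for c in range(n + 2) if upper_tail(c, n, p_null) <= alpha_max
)
alpha = upper_tail(critical_value, n, p_null)  # actual Type I error rate
power = upper_tail(critical_value, n, p_alt)   # P(reject H0 | HA is true)
beta = 1 - power                               # Type II error rate


def rejection_rate(p_true: float, n_experiments: int = 1000, seed: int = 1) -> float:
    """Share of simulated experiments in which the fixed rule rejects H0."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_experiments):
        heads = sum(rng.random() < p_true for _ in range(n))
        if heads >= critical_value:
            rejections += 1
    return rejections / n_experiments


type_1_rate = rejection_rate(p_null)             # H0 true, wrongly rejected
type_2_rate = 1 - rejection_rate(p_alt, seed=2)  # HA true, wrongly accepted H0

print(f"critical value: {critical_value}, alpha = {alpha:.3f}, beta = {beta:.3f}")
print(f"simulated Type I rate:  {type_1_rate:.3f}")
print(f"simulated Type II rate: {type_2_rate:.3f}")
```

No p-value is reported here. Each experiment ends in one of two actions, and the quality of the procedure is judged only by how often those actions are wrong in the long run.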

8.2.3 Today’s ‘hybrid chaos’

Modern textbooks and software (such as SPSS or R) often use a hybrid that historically makes no sense:

  • We define \(\alpha = 5\%\) (Neyman-Pearson).
  • We calculate an exact p-value (Fisher).
  • We report the p-value as evidence (Fisher), but use it for a hard yes/no decision (Neyman-Pearson).
  • We talk about ‘power’ (Neyman-Pearson), but often only test against a non-specific alternative.

This mishmash often leads to misunderstandings, such as the belief that \(p = 0.001\) indicates a ‘stronger effect’ than \(p = 0.049\). In Fisher’s reading, the smaller p-value is stronger evidence against \(H_0\), which is not the same as a larger effect. In Neyman-Pearson logic at \(\alpha = 5\%\), one would have to make exactly the same decision in both cases (‘Reject \(H_0\)’).
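A small sketch in Python (standard library only) illustrates both points. It assumes a two-sided z-test with known standard deviation. The function names `z_test_p_value` and `decide` and all numbers are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def z_test_p_value(mean_diff: float, sd: float, n: int) -> float:
    """Two-sided p-value of a z-test for H0: true mean difference = 0 (sd known)."""
    z = mean_diff / (sd / sqrt(n))
    return 2 * (1 - NormalDist().cdf(abs(z)))


def decide(p_value: float, alpha: float = 0.05) -> str:
    """Neyman-Pearson style decision: only the comparison with alpha matters."""
    return "Reject H0" if p_value < alpha else "Accept H0"


# Hypothetical studies: a small effect with many observations,
# and a large effect with very few observations.
p_small_effect = z_test_p_value(mean_diff=0.1, sd=1.0, n=1100)
p_large_effect = z_test_p_value(mean_diff=0.8, sd=1.0, n=7)

print(f"small effect, n = 1100: p = {p_small_effect:.4f} -> {decide(p_small_effect)}")
print(f"large effect, n = 7:    p = {p_large_effect:.4f} -> {decide(p_large_effect)}")

# The two p-values from the text lead to the same decision at alpha = 5%.
print(decide(0.001), "|", decide(0.049))
```

The study with the much smaller effect produces the smaller p-value, simply because it has far more observations. The p-value therefore cannot be read as a measure of effect size, and the decision at \(\alpha = 5\%\) is identical for both studies.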

8.3 R packages used

8.4 Data

8.5 Alternatives

Further tutorials and R packages on XXX

8.6 Glossary

term

what does it mean.

8.7 The meaning of “Models of Reality” in this chapter.

  • itemize with max. 5-6 words

8.8 Summary

References