Last modified on 05 June 2025 at 09:39:26

Transformation

Square root for moderately skewed measurements:

  • sqrt(y) for positively skewed measurements
  • sqrt(max(y+1) - y) for negatively skewed measurements

Logarithm for strongly skewed measurements:

  • log10(y) for positively skewed measurements
  • log10(max(y+1) - y) for negatively skewed measurements

The inverse for extremely skewed measurements:

  • 1/y for positively skewed measurements
  • 1/(max(y+1) - y) for negatively skewed measurements
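The transformations listed above can be tried out on simulated data. The following is only a minimal sketch: the vectors y_pos and y_neg are hypothetical example data, and the helper reflect() is introduced here purely for illustration; it implements the term max(y+1) - y from the list.

```r
# Hypothetical example data, not part of the original chapter
set.seed(1)
y_pos <- rlnorm(50, meanlog = 1, sdlog = 0.8)  # positively (right) skewed, all values > 0
y_neg <- max(y_pos) - y_pos                    # mirrored copy, negatively (left) skewed

# Illustrative helper: reflect the values so that the smallest reflected value is 1
reflect <- function(y) max(y + 1) - y

transformed <- data.frame(
  sqrt_pos = sqrt(y_pos),
  log_pos  = log10(y_pos),
  inv_pos  = 1 / y_pos,
  sqrt_neg = sqrt(reflect(y_neg)),
  log_neg  = log10(reflect(y_neg)),
  inv_neg  = 1 / reflect(y_neg)
)

summary(transformed)
```

The reflection reverses the order of the values, so large original values become small reflected values. This matters when interpreting the direction of an effect on the transformed scale.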

The rank transformation with rank()

Automated transformations

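# Square-root transformation of hatch_time
sqrt_tbl <- fac2_tbl |> 
  mutate(sqrt_hatch_time = sqrt(hatch_time))

# Log transformation of hatch_time; log() is the natural logarithm,
# which differs from the log10() in the list above only by a constant factor
log_tbl <- fac2_tbl |> 
  mutate(log_hatch_time = log(hatch_time))

# Inverse transformation of hatch_time
inverse_tbl <- fac2_tbl |> 
  mutate(inverse_hatch_time = 1/hatch_time)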

Ranked t-test

Why don't we simply run a nonparametric test?

“There are reasons to see nonparametric methods as the way out of non-normality. In certain fields of application that may well be true. If you are really not interested in an effect measure and only need a p-value, then nonparametric methods are a solution. Nowadays, however, we always also want to know whether the effect is relevant. And I consider it impossible to answer that question without an effect measure.” Jochen Kruppa-Scheetz, in my humble opinion.

All of this will later move into the chapter on statistical testing in R and into the corresponding chapters.

Beware the Friedman test!

Equivalent to Welch’s t-test in GLS framework

Common statistical tests are linear models

Are parametric tests on rank transformed data equivalent to non-parametric test on raw data?

Conover & Iman (1981), Rank Transformations as a Bridge Between Parametric and Nonparametric Statistics

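# Simulated example data: three groups with seven observations each,
# a log-normal and a normal response, each with its ranks across all 21 values
set.seed(20250345)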
ranked_tbl <- tibble(grp = gl(3, 7, labels = c("cat", "dog", "fox")),
                     rsp_lognormal = c(round(rlnorm(7, 4, 1), 2),
                                       round(rlnorm(7, 4, 1), 2),
                                       round(rlnorm(7, 4, 1), 2)),
                     ranked_lognormal = rank(rsp_lognormal),
                     rsp_normal = c(round(rnorm(7, 4, 1), 2),
                                    round(rnorm(7, 5, 1), 2),
                                    round(rnorm(7, 7, 1), 2)),
                     ranked_normal = rank(rsp_normal)) 

# Small hand-made two-group example with the raw values and their ranks
rank_tbl <- tibble(id = 1:14, 
                   trt = gl(2, 7, labels = c("A", "B")),
                   unranked = c(c(1.2, 2.1, 3.5, 4.1, 6.2, 6.5, 7.1), 
                               c(4.7, 6.3, 6.8, 7.3, 8.2, 9.1, 10.3)),
                   ranked = rank(unranked))
# Welch t-test on the ranked values
t.test(ranked ~ trt, data = rank_tbl)

    Welch Two Sample t-test

data:  ranked by trt
t = -3.0077, df = 12, p-value = 0.01091
alternative hypothesis: true difference in means between group A and group B is not equal to 0
95 percent confidence interval:
 -9.114748 -1.456681
sample estimates:
mean in group A mean in group B 
       4.857143       10.142857 
# Wilcoxon rank-sum test on the raw values, for comparison
wilcox.test(unranked ~ trt, data = rank_tbl)

    Wilcoxon rank sum exact test

data:  unranked by trt
W = 6, p-value = 0.01748
alternative hypothesis: true location shift is not equal to 0
# Mean and standard deviation of the raw and the ranked response for cat and dog
ranked_tbl |> 
  filter(grp != "fox") |> 
  group_by(grp) |> 
  summarise(mean(rsp_normal), sd(rsp_normal), mean(ranked_normal), sd(ranked_normal)) |> 
  mutate_if(is.numeric, round, 2) |> 
  set_names(c("Group", "$\\bar{y}_{normal}$", "$s_{normal}$", "$\\bar{y}_{ranked}$", "$s_{ranked}$")) |> 
  tt(width = 1, align = "c", theme = "striped")

Group  $\bar{y}_{normal}$  $s_{normal}$  $\bar{y}_{ranked}$  $s_{ranked}$
cat    3.44                1.02          4.86                3.44
dog    5.26                1.31          10.86               4.45
# Welch t-test on the ranks of cat and dog (ranks were computed across all three groups); p-value only
t.test(ranked_normal ~ grp, data = filter(ranked_tbl, grp != "fox")) |> 
  tidy() |> 
  select(p.value)
# A tibble: 1 × 1
  p.value
    <dbl>
1  0.0162
# Wilcoxon rank-sum test on the raw values of cat and dog; p-value only
wilcox.test(rsp_normal ~ grp, data = filter(ranked_tbl, grp != "fox")) |> 
  tidy() |> 
  select(p.value)
# A tibble: 1 × 1
  p.value
    <dbl>
1  0.0175
# One-way ANOVA on the ranks of all three groups
aov(ranked_normal ~ grp, data = ranked_tbl) |> 
  tidy() 
# A tibble: 2 × 6
  term         df sumsq meansq statistic    p.value
  <chr>     <dbl> <dbl>  <dbl>     <dbl>      <dbl>
1 grp           2  541.  270.       21.3  0.0000181
2 Residuals    18  229.   12.7      NA   NA        
# Kruskal-Wallis test; kruskal.test() ranks internally, so passing ranked_normal
# gives the same result as passing rsp_normal
kruskal.test(ranked_normal ~ grp, data = ranked_tbl) |> 
  tidy() 
# A tibble: 1 × 4
  statistic  p.value parameter method                      
      <dbl>    <dbl>     <int> <chr>                       
1      14.1 0.000886         2 Kruskal-Wallis rank sum test
# Signed rank: rank the absolute values, then restore the original sign
signed_rank <- function(x) sign(x) * rank(abs(x))
# Ordinary ranks: the negative value gets the lowest rank
rank(c(3.6, 3.4, -5.0, 8.2))
[1] 3 2 1 4
# Signed ranks: -5.0 has the third-largest absolute value and keeps its negative sign
signed_rank(c(3.6, 3.4, -5.0, 8.2))
[1]  2  1 -3  4
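As a sketch of what the signed ranks are good for: a one-sample t-test on the signed ranks approximates the Wilcoxon signed-rank test, in the spirit of the "Common statistical tests are linear models" reference listed above. The vector x is hypothetical simulated data and not part of the chapter's example.

```r
# Same helper as above, repeated so the example is self-contained
signed_rank <- function(x) sign(x) * rank(abs(x))

# Hypothetical one-sample data with a location shift away from zero
set.seed(2)
x <- rnorm(30, mean = 0.5, sd = 1)

# Wilcoxon signed-rank test on the raw values
p_wilcox <- wilcox.test(x)$p.value

# One-sample t-test on the signed ranks
p_ttest <- t.test(signed_rank(x))$p.value

c(wilcoxon = p_wilcox, t_on_signed_ranks = p_ttest)
```

The two p-values are not expected to be identical, only close, and the approximation generally improves with larger samples.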

Welch t-test and GLS

# Welch t-test: the t-statistic does not assume equal variances
t.test(rsp_normal ~ grp, data = filter(ranked_tbl, grp != "fox"), var.equal = FALSE)

    Welch Two Sample t-test

data:  rsp_normal by grp
t = -2.9197, df = 11.318, p-value = 0.01357
alternative hypothesis: true difference in means between group cat and group dog is not equal to 0
95 percent confidence interval:
 -3.1998081 -0.4544776
sample estimates:
mean in group cat mean in group dog 
         3.435714          5.262857 
# The same comparison as a GLS model with a separate residual standard deviation per group
library(nlme)
summary(gls(rsp_normal ~ grp, data = filter(ranked_tbl, grp != "fox"), 
            weights = varIdent(form = ~ 1 | grp)))
Generalized least squares fit by REML
  Model: rsp_normal ~ grp 
  Data: filter(ranked_tbl, grp != "fox") 
       AIC      BIC    logLik
  49.35749 51.29711 -20.67874

Variance function:
 Structure: Different standard deviations per stratum
 Formula: ~1 | grp 
 Parameter estimates:
     cat      dog 
1.000000 1.284679 

Coefficients:
               Value Std.Error  t-value p-value
(Intercept) 3.435714 0.3843972 8.937927  0.0000
grpdog      1.827143 0.6258007 2.919688  0.0128

 Correlation: 
       (Intr)
grpdog -0.614

Standardized residuals:
       Min         Q1        Med         Q3        Max 
-1.3645596 -0.6355371 -0.1011953  0.4646938  1.7581826 

Residual standard error: 1.017019 
Degrees of freedom: 14 total; 12 residual
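In the output above the Welch test and the GLS fit give the same t-value in absolute terms (2.9197), but slightly different p-values (0.01357 versus 0.0128). The sketch below, on hypothetical simulated data named dat, illustrates where the difference comes from: the Welch test uses the Satterthwaite degrees of freedom, whereas gls() uses the residual degrees of freedom N - p. The sign differs because t.test() compares cat minus dog, while the GLS coefficient is dog minus cat.

```r
library(nlme)

# Hypothetical two-group data with unequal variances
set.seed(3)
dat <- data.frame(
  grp = gl(2, 7, labels = c("cat", "dog")),
  rsp = c(rnorm(7, 4, 1), rnorm(7, 5, 2))
)

welch <- t.test(rsp ~ grp, data = dat, var.equal = FALSE)
fit   <- gls(rsp ~ grp, data = dat, weights = varIdent(form = ~ 1 | grp))

# Absolute t-statistics agree up to numerical tolerance
t_welch <- unname(welch$statistic)
t_gls   <- summary(fit)$tTable["grpdog", "t-value"]
c(welch = abs(t_welch), gls = abs(t_gls))

# Degrees of freedom differ: Satterthwaite approximation versus N - p
c(welch_df = unname(welch$parameter), gls_df = fit$dims$N - fit$dims$p)
```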

Chi-squared test and GLM

# Resolve the name clash in favour of stats::chisq.test (conflicted package)
conflicts_prefer(stats::chisq.test)
[conflicted] Will prefer stats::chisq.test over any other package.
# Observed counts per mood category, tested against equal probabilities
D <- data.frame(mood = c('happy', 'sad', 'meh'),
               counts = c(60, 90, 70))
chisq.test(D$counts)

    Chi-squared test for given probabilities

data:  D$counts
X-squared = 6.3636, df = 2, p-value = 0.04151

Score test or Rao statistic

# Poisson GLM of the counts; the Rao score test for mood reproduces the chi-squared statistic above
glm(counts ~ mood, data = D, family = poisson()) |> 
  anova(test = 'Rao')
Analysis of Deviance Table

Model: poisson, link: log

Response: counts

Terms added sequentially (first to last)

     Df Deviance Resid. Df Resid. Dev    Rao Pr(>Chi)  
NULL                     2     6.2697                  
mood  2   6.2697         0     0.0000 6.3636  0.04151 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
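The value 6.3636 reported by both chisq.test() and the Rao score test can be reproduced by hand. The sketch below computes the Pearson statistic from the observed counts and the expected counts under equal probabilities; the object names are chosen here for illustration.

```r
# Observed counts from the example above
counts <- c(happy = 60, sad = 90, meh = 70)

# Expected counts under the null hypothesis of equal probabilities
expected <- rep(mean(counts), length(counts))

# Pearson chi-squared statistic and its p-value with k - 1 degrees of freedom
x2 <- sum((counts - expected)^2 / expected)
p  <- pchisq(x2, df = length(counts) - 1, lower.tail = FALSE)

round(c(X2 = x2, p = p), 5)
```

Under the intercept-only Poisson model the fitted mean is the same for every category, which is why the score statistic of the GLM coincides with this Pearson statistic.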

Now we need a two-way table, that is, two columns…

# Counts by mood and sex in long format, then a log-linear independence model
D <- data.frame(
  mood = c('happy', 'happy', 'meh', 'meh', 'sad', 'sad'),
  sex = c('male', 'female', 'male', 'female', 'male', 'female'),
  Freq = c(100, 70, 30, 32, 110, 120)
)

MASS::loglm(Freq ~ mood + sex, D)
Call:
MASS::loglm(formula = Freq ~ mood + sex, data = D)

Statistics:
                      X^2 df   P(> X^2)
Likelihood Ratio 5.119915  2 0.07730804
Pearson          5.099859  2 0.07808717
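Both statistics in the loglm() output can also be obtained with base R. The following sketch rebuilds the same data, runs chisq.test() on the cross-table for the Pearson statistic, and reads the likelihood-ratio statistic off the residual deviance of a Poisson GLM with main effects only; the object names tab, res, and fit are chosen here for illustration.

```r
# Same counts as above, repeated so the example is self-contained
D <- data.frame(
  mood = c("happy", "happy", "meh", "meh", "sad", "sad"),
  sex  = c("male", "female", "male", "female", "male", "female"),
  Freq = c(100, 70, 30, 32, 110, 120)
)

# Pearson chi-squared test of independence on the 3x2 cross-table
tab <- xtabs(Freq ~ mood + sex, data = D)
res <- chisq.test(tab)
c(pearson = unname(res$statistic), p = res$p.value)

# Likelihood-ratio statistic: residual deviance of the independence model
fit <- glm(Freq ~ mood + sex, data = D, family = poisson())
c(likelihood_ratio = deviance(fit), df = df.residual(fit))
```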
Conover, W. J., & Iman, R. L. (1981). Rank transformations as a bridge between parametric and nonparametric statistics. The American Statistician, 35(3), 124–129.