Letzte Änderung am 22. August 2025 um 08:47:22

Stand des Kapitels: Konstruktion (seit 07.2025)

Dieses Kapitel wird in den nächsten Wochen geschrieben. Ich plane zum Beginn des WiSe 2025/26 eine neue Version des Kapitels erstellt zu haben. Während das Kapitel entsteht, funktioniert so manches dann nicht so wie es soll.

Abbildung 84.1— foo. [Zum Vergrößern anklicken]
Abbildung 84.2— foo. [Zum Vergrößern anklicken]

“Aktuell wird gerade an den Marginal effects models gebaut… das wird jedenfalls so seine Zeit noch brauchen, bis ich hier einen Strang und Grund drin habe.” — Jochen Kruppa-Scheetz, meiner bescheidener Meinung nach.

Abbildung 84.3— “foo” Quelle: https://easystats.github.io/modelbased
  1. Draw what you want to visualize
  2. Make models for it
  3. Select the best model
  4. Visualize the best model
  5. Investigate its parameters

84.1 Genutzte R Pakete

Wir wollen folgende R Pakete in diesem Kapitel nutzen.

R Code [zeigen / verbergen]
pacman::p_load(tidyverse, gtsummary, marginaleffects, emmeans, scales,
               ggpmisc, readxl, conflicted)
conflicts_prefer(dplyr::mutate)
conflicts_prefer(dplyr::summarize)
conflicts_prefer(dplyr::filter)
conflicts_prefer(ggplot2::annotate)
cb_pal <- c("#000000", "#E69F00", "#56B4E9", 
            "#009E73", "#F0E442", "#F5C710", 
            "#0072B2", "#D55E00", "#CC79A7")
## 
nice_number <- label_number(style_negative = "minus", accuracy = 0.01)
nice_p <- label_pvalue(prefix = c("p < ", "p = ", "p > "))
find_intercept <- function(x1, y1, slope) {
  intercept <- slope * (-x1) + y1
  return(intercept)
}

An der Seite des Kapitels findest du den Link Quellcode anzeigen, über den du Zugang zum gesamten R-Code dieses Kapitels erhältst.

84.2 Daten

R Code [zeigen / verbergen]
flea_model_tbl <- read_excel("data/fleas_model_data.xlsx") |> 
  mutate(feeding = as_factor(feeding),
         stage = as_factor(stage),
         bonitur = as.numeric(bonitur),
         infected = factor(infected, labels = c("healthy", "infected"))) |> 
  select(feeding, stage, jump_length, weight, hatched, count_leg,  bonitur, infected)

84.3 Hypothesen

\[ Z=\frac{h(\hat{\theta})-H_0}{\sqrt{\hat{V}[h(\hat{\theta})]}} \]

R Code [zeigen / verbergen]
feeding_fit = lm(jump_length ~ 0 + feeding, data = flea_model_tbl)
coef(feeding_fit)
feedingsugar_water       feedingblood     feedingketchup 
          70.00562          107.87063          101.69437 
R Code [zeigen / verbergen]
summary(feeding_fit) |> coef()
                    Estimate Std. Error   t value     Pr(>|t|)
feedingsugar_water  70.00562   7.516627  9.313436 4.598588e-12
feedingblood       107.87063   7.516627 14.350934 2.091381e-18
feedingketchup     101.69437   7.516627 13.529256 1.821420e-17

\[ Z = \cfrac{\hat{\beta_1}-H_0}{\sqrt{\widehat{V}[\beta_1]}} = \cfrac{75.43938 - 0}{4.791469} = 15.74 \]

R Code [zeigen / verbergen]
2*pt(15.74452, df = 45, lower.tail = FALSE)
[1] 6.366696e-20
R Code [zeigen / verbergen]
feeding_fit <-  lm(jump_length ~ feeding * stage, data = flea_model_tbl)
coef(feeding_fit)
                 (Intercept)                 feedingblood 
                    78.97750                     20.73500 
              feedingketchup                stagejuvenile 
                    13.17125                    -17.94375 
  feedingblood:stagejuvenile feedingketchup:stagejuvenile 
                    34.26000                     37.03500 

84.4 Prädiktion

R Code [zeigen / verbergen]
simple_tbl <- flea_model_tbl |> 
  filter(stage == "adult")
simple_fit <- lm(jump_length ~ feeding, simple_tbl)
coef(simple_fit)
   (Intercept)   feedingblood feedingketchup 
      78.97750       20.73500       13.17125 
R Code [zeigen / verbergen]
predictions(simple_fit)

 Estimate Std. Error     z Pr(>|z|)     S 2.5 % 97.5 %
     79.0       8.45  9.35   <0.001  66.6  62.4   95.5
     79.0       8.45  9.35   <0.001  66.6  62.4   95.5
     79.0       8.45  9.35   <0.001  66.6  62.4   95.5
     79.0       8.45  9.35   <0.001  66.6  62.4   95.5
     79.0       8.45  9.35   <0.001  66.6  62.4   95.5
     79.0       8.45  9.35   <0.001  66.6  62.4   95.5
     79.0       8.45  9.35   <0.001  66.6  62.4   95.5
     79.0       8.45  9.35   <0.001  66.6  62.4   95.5
     99.7       8.45 11.81   <0.001 104.4  83.2  116.3
     99.7       8.45 11.81   <0.001 104.4  83.2  116.3
     99.7       8.45 11.81   <0.001 104.4  83.2  116.3
     99.7       8.45 11.81   <0.001 104.4  83.2  116.3
     99.7       8.45 11.81   <0.001 104.4  83.2  116.3
     99.7       8.45 11.81   <0.001 104.4  83.2  116.3
     99.7       8.45 11.81   <0.001 104.4  83.2  116.3
     99.7       8.45 11.81   <0.001 104.4  83.2  116.3
     92.1       8.45 10.91   <0.001  89.7  75.6  108.7
     92.1       8.45 10.91   <0.001  89.7  75.6  108.7
     92.1       8.45 10.91   <0.001  89.7  75.6  108.7
     92.1       8.45 10.91   <0.001  89.7  75.6  108.7
     92.1       8.45 10.91   <0.001  89.7  75.6  108.7
     92.1       8.45 10.91   <0.001  89.7  75.6  108.7
     92.1       8.45 10.91   <0.001  89.7  75.6  108.7
     92.1       8.45 10.91   <0.001  89.7  75.6  108.7

Type: response
R Code [zeigen / verbergen]
avg_predictions(simple_fit)

 Estimate Std. Error    z Pr(>|z|)     S 2.5 % 97.5 %
     90.3       4.88 18.5   <0.001 251.8  80.7   99.8

Type: response
R Code [zeigen / verbergen]
plot_predictions(simple_fit, by = "feeding")

R Code [zeigen / verbergen]
simple_2_tbl <- flea_model_tbl |> 
  filter(stage == "adult")
simple_2_fit <- lm(jump_length ~ weight*feeding, simple_2_tbl)
coef(simple_2_fit)
          (Intercept)                weight          feedingblood 
          43.21968178            2.11178610           29.67377681 
       feedingketchup   weight:feedingblood weight:feedingketchup 
          43.24721935           -0.03097599           -1.65129790 
R Code [zeigen / verbergen]
simple_2_fit |> broom::augment()
predictions(simple_2_fit)
avg_predictions(simple_2_fit, by = "feeding")

tibble(weight = simple_2_tbl$weight,
       jump_length = simple_2_tbl$jump_length, 
       feeding = simple_2_tbl$feeding,
       estimate = predictions(simple_2_fit)$estimate) |> 
  ggplot(aes(weight, estimate, shape = feeding)) +
  geom_point(aes(weight, jump_length), color = "blue") +
  geom_point(color = "red") +
  geom_line(aes(y = predict(simple_2_fit))) +
  geom_hline(yintercept = avg_predictions(simple_2_fit, by = "feeding")$estimate)
R Code [zeigen / verbergen]
plot_predictions(simple_2_fit, by = c("weight", "feeding"))

84.5 Counterfactual

R Code [zeigen / verbergen]
avg_comparisons(feeding_fit,
    by = "stage",
    variables = list("feeding" = "pairwise"),
    vcov = "HC3")

              Contrast    stage Estimate Std. Error      z Pr(>|z|)    S  2.5 %
 blood - sugar_water   adult       20.73       11.6  1.794   0.0728  3.8  -1.92
 ketchup - blood       adult       -7.56       15.0 -0.503   0.6148  0.7 -37.02
 ketchup - sugar_water adult       13.17       11.4  1.157   0.2472  2.0  -9.14
 blood - sugar_water   juvenile    54.99       17.7  3.106   0.0019  9.0  20.29
 ketchup - blood       juvenile    -4.79       22.3 -0.215   0.8301  0.3 -48.52
 ketchup - sugar_water juvenile    50.21       14.3  3.512   <0.001 11.1  22.19
 97.5 %
   43.4
   21.9
   35.5
   89.7
   38.9
   78.2

Term: feeding
Type: response

84.6 Weitere R Pakete

Das R Paket {modelbased}

84.7 Marginal effects models

84.7.1 Kontrafaktische Vergleiche

Kontrafaktische Vergleiche (eng. counterfactual)

Abbildung 84.4— foo. Modifiziert nach Heiss (2022)
R Code [zeigen / verbergen]
model_grp_sq <- lm(jump_length ~ weight*I(weight^2)*feeding + count_leg,
                   data = flea_model_tbl)
tidy(model_grp_sq)
# A tibble: 13 × 5
   term                                 estimate std.error statistic  p.value
   <chr>                                   <dbl>     <dbl>     <dbl>    <dbl>
 1 (Intercept)                        49.8         36.5      1.37    1.80e- 1
 2 weight                              1.11        11.7      0.0946  9.25e- 1
 3 I(weight^2)                         0.0615       0.950    0.0647  9.49e- 1
 4 feedingblood                      -13.1         92.9     -0.141   8.89e- 1
 5 feedingketchup                     -7.07        68.1     -0.104   9.18e- 1
 6 count_leg                           0.0973       0.0113   8.59    3.84e-10
 7 weight:I(weight^2)                 -0.00305      0.0223  -0.137   8.92e- 1
 8 weight:feedingblood                 7.02        27.8      0.253   8.02e- 1
 9 weight:feedingketchup               0.517       20.7      0.0250  9.80e- 1
10 I(weight^2):feedingblood           -0.562        2.30    -0.244   8.09e- 1
11 I(weight^2):feedingketchup          0.0565       1.80     0.0313  9.75e- 1
12 weight:I(weight^2):feedingblood     0.0142       0.0536   0.266   7.92e- 1
13 weight:I(weight^2):feedingketchup   0.0000836    0.0468   0.00179 9.99e- 1
R Code [zeigen / verbergen]
cfct_data <- datagrid(model = model_grp_sq,
                      weight = c(5, 15),
                      grid_type = "counterfactual")
R Code [zeigen / verbergen]
nrow(flea_model_tbl)
[1] 48
R Code [zeigen / verbergen]
nrow(cfct_data)
[1] 96
R Code [zeigen / verbergen]
mfx_cfct <- model_grp_sq |> 
  slopes(datagrid(weight = c(5, 15),
                  grid_type = "counterfactual"),
         variables = "weight")

rbind(head(mfx_cfct), tail(mfx_cfct))

 Estimate Std. Error     z Pr(>|z|)   S 2.5 % 97.5 %
     1.49       4.01 0.372    0.710 0.5 -6.37   9.36
     1.49       4.01 0.372    0.710 0.5 -6.37   9.36
     1.49       4.02 0.372    0.710 0.5 -6.38   9.36
     1.49       4.02 0.372    0.710 0.5 -6.38   9.36
     1.49       4.02 0.372    0.710 0.5 -6.38   9.36
     1.49       4.01 0.372    0.710 0.5 -6.37   9.36
     3.16       2.20 1.433    0.152 2.7 -1.16   7.48
     3.16       2.21 1.426    0.154 2.7 -1.18   7.50
     3.16       2.21 1.426    0.154 2.7 -1.18   7.50
     3.16       2.21 1.426    0.154 2.7 -1.18   7.50
     3.16       2.20 1.433    0.152 2.7 -1.16   7.48
     3.16       2.20 1.433    0.152 2.7 -1.16   7.48

Term: weight
Type: response
Comparison: dY/dX
R Code [zeigen / verbergen]
mfx_cfct |> 
  group_by(weight) |> 
  summarize(avg_slope = mean(estimate))
# A tibble: 2 × 2
  weight avg_slope
   <dbl>     <dbl>
1      5      2.68
2     15      1.57
R Code [zeigen / verbergen]
model_grp_sq |> 
  avg_slopes(newdata = datagrid(weight = c(5, 20),
                                grid_type = "counterfactual"),
             variables = "weight",
             by = c("weight", "feeding"))

 weight     feeding Estimate Std. Error       z Pr(>|z|)   S  2.5 % 97.5 %
      5 sugar_water   1.4930       4.01  0.3721    0.710 0.5  -6.37   9.36
      5 blood         3.9624       8.00  0.4955    0.620 0.7 -11.71  19.64
      5 ketchup       2.5814       5.16  0.4999    0.617 0.7  -7.54  12.70
     20 sugar_water  -0.0983       2.13 -0.0462    0.963 0.1  -4.27   4.07
     20 blood         1.5321       1.68  0.9140    0.361 1.5  -1.75   4.82
     20 ketchup       2.7790       5.72  0.4856    0.627 0.7  -8.44  14.00

Term: weight
Type: response
Comparison: dY/dX

84.8 Analyse von Zeitreihen

Hier nochmal {mgcv} und {gamm4}

Introduction to Generalized Additive Mixed Models

s() und Interaktion mit s(x_1, by = f_1)

Referenzen

Heiss, A. (2022, Mai 20). Marginalia: A Guide to Figuring Out What the Heck Marginal Effects, Marginal Slopes, Average Marginal Effects, Marginal Effects at the Mean, and All These Other Marginal Things Are. https://doi.org/10.59350/40xaj-4e562