Letzte Änderung am 09. July 2025 um 11:10:44

Stand des Kapitels: Konstruktion (seit 07.2025)

Dieses Kapitel wird in den nächsten Wochen geschrieben. Ich plane zum Beginn des WiSe 2025/26 eine neue Version des Kapitels erstellt zu haben. Während das Kapitel entsteht, funktioniert so manches dann nicht so wie es soll.

“Aktuell wird gerade an den Marginal effects models gebaut… das wird jedenfalls so seine Zeit noch brauchen, bis ich hier einen Strang und Grund drin habe.” — Jochen Kruppa-Scheetz, meiner bescheidener Meinung nach.

Abbildung 82.1— “foo” Quelle: https://easystats.github.io/modelbased
  1. Draw what you want to visualize
  2. Make models for it
  3. Select the best model
  4. Visualize the best model
  5. Investigate its parameters

82.1 Genutzte R Pakete

Wir wollen folgende R Pakete in diesem Kapitel nutzen.

R Code [zeigen / verbergen]
pacman::p_load(tidyverse, gtsummary, marginaleffects, emmeans, scales,
               ggpmisc, readxl, conflicted)
conflicts_prefer(dplyr::mutate)
conflicts_prefer(dplyr::summarize)
conflicts_prefer(dplyr::filter)
conflicts_prefer(ggplot2::annotate)
cb_pal <- c("#000000", "#E69F00", "#56B4E9", 
            "#009E73", "#F0E442", "#F5C710", 
            "#0072B2", "#D55E00", "#CC79A7")
## 
nice_number <- label_number(style_negative = "minus", accuracy = 0.01)
nice_p <- label_pvalue(prefix = c("p < ", "p = ", "p > "))
find_intercept <- function(x1, y1, slope) {
  intercept <- slope * (-x1) + y1
  return(intercept)
}

An der Seite des Kapitels findest du den Link Quellcode anzeigen, über den du Zugang zum gesamten R-Code dieses Kapitels erhältst.

82.2 Daten

R Code [zeigen / verbergen]
flea_model_tbl <- read_excel("data/fleas_model_data.xlsx") |> 
  mutate(feeding = as_factor(feeding),
         stage = as_factor(stage),
         bonitur = as.numeric(bonitur),
         infected = factor(infected, labels = c("healthy", "infected"))) |> 
  select(feeding, stage, jump_length, weight, hatched, count_leg,  bonitur, infected)

82.3 Hypothesen

\[ Z=\frac{h(\hat{\theta})-H_0}{\sqrt{\hat{V}[h(\hat{\theta})]}} \]

R Code [zeigen / verbergen]
feeding_fit = lm(jump_length ~ 0 + feeding, data = flea_model_tbl)
coef(feeding_fit)
feedingsugar_water       feedingblood     feedingketchup 
          63.08062           75.43938           65.71313 
R Code [zeigen / verbergen]
summary(feeding_fit) |> coef()
                   Estimate Std. Error  t value     Pr(>|t|)
feedingsugar_water 63.08062   4.791469 13.16519 4.878067e-17
feedingblood       75.43938   4.791469 15.74452 6.366735e-20
feedingketchup     65.71313   4.791469 13.71461 1.109930e-17

\[ Z = \cfrac{\hat{\beta_1}-H_0}{\sqrt{\widehat{V}[\beta_1]}} = \cfrac{75.43938 - 0}{4.791469} = 15.74 \]

R Code [zeigen / verbergen]
2*pt(15.74452, df = 45, lower.tail = FALSE)
[1] 6.366696e-20
R Code [zeigen / verbergen]
feeding_fit <-  lm(jump_length ~ feeding * stage, data = flea_model_tbl)
coef(feeding_fit)
                 (Intercept)                 feedingblood 
                    68.89000                     13.18500 
              feedingketchup                stagejuvenile 
                    12.24625                    -11.61875 
  feedingblood:stagejuvenile feedingketchup:stagejuvenile 
                    -1.65250                    -19.22750 

82.4 Prädiktion

R Code [zeigen / verbergen]
simple_tbl <- flea_model_tbl |> 
  filter(stage == "adult")
simple_fit <- lm(jump_length ~ feeding, simple_tbl)
coef(simple_fit)
   (Intercept)   feedingblood feedingketchup 
      68.89000       13.18500       12.24625 
R Code [zeigen / verbergen]
predictions(simple_fit)

 Estimate Std. Error     z Pr(>|z|)     S 2.5 % 97.5 %
     68.9       6.99  9.86   <0.001  73.7  55.2   82.6
     68.9       6.99  9.86   <0.001  73.7  55.2   82.6
     68.9       6.99  9.86   <0.001  73.7  55.2   82.6
     68.9       6.99  9.86   <0.001  73.7  55.2   82.6
     68.9       6.99  9.86   <0.001  73.7  55.2   82.6
     68.9       6.99  9.86   <0.001  73.7  55.2   82.6
     68.9       6.99  9.86   <0.001  73.7  55.2   82.6
     68.9       6.99  9.86   <0.001  73.7  55.2   82.6
     82.1       6.99 11.74   <0.001 103.4  68.4   95.8
     82.1       6.99 11.74   <0.001 103.4  68.4   95.8
     82.1       6.99 11.74   <0.001 103.4  68.4   95.8
     82.1       6.99 11.74   <0.001 103.4  68.4   95.8
     82.1       6.99 11.74   <0.001 103.4  68.4   95.8
     82.1       6.99 11.74   <0.001 103.4  68.4   95.8
     82.1       6.99 11.74   <0.001 103.4  68.4   95.8
     82.1       6.99 11.74   <0.001 103.4  68.4   95.8
     81.1       6.99 11.61   <0.001 101.1  67.4   94.8
     81.1       6.99 11.61   <0.001 101.1  67.4   94.8
     81.1       6.99 11.61   <0.001 101.1  67.4   94.8
     81.1       6.99 11.61   <0.001 101.1  67.4   94.8
     81.1       6.99 11.61   <0.001 101.1  67.4   94.8
     81.1       6.99 11.61   <0.001 101.1  67.4   94.8
     81.1       6.99 11.61   <0.001 101.1  67.4   94.8
     81.1       6.99 11.61   <0.001 101.1  67.4   94.8

Type: response
R Code [zeigen / verbergen]
avg_predictions(simple_fit)

 Estimate Std. Error    z Pr(>|z|)     S 2.5 % 97.5 %
     77.4       4.04 19.2   <0.001 269.8  69.5   85.3

Type: response
R Code [zeigen / verbergen]
plot_predictions(simple_fit, by = "feeding")

R Code [zeigen / verbergen]
simple_2_tbl <- flea_model_tbl |> 
  filter(stage == "adult")
simple_2_fit <- lm(jump_length ~ weight*feeding, simple_2_tbl)
coef(simple_2_fit)
          (Intercept)                weight          feedingblood 
           53.9050671             0.8849805             3.4948660 
       feedingketchup   weight:feedingblood weight:feedingketchup 
           10.7610352             1.0294849             0.4498506 
R Code [zeigen / verbergen]
simple_2_fit |> broom::augment()
predictions(simple_2_fit)
avg_predictions(simple_2_fit, by = "feeding")

tibble(weight = simple_2_tbl$weight,
       jump_length = simple_2_tbl$jump_length, 
       feeding = simple_2_tbl$feeding,
       estimate = predictions(simple_2_fit)$estimate) |> 
  ggplot(aes(weight, estimate, shape = feeding)) +
  geom_point(aes(weight, jump_length), color = "blue") +
  geom_point(color = "red") +
  geom_line(aes(y = predict(simple_2_fit))) +
  geom_hline(yintercept = avg_predictions(simple_2_fit, by = "feeding")$estimate)
R Code [zeigen / verbergen]
plot_predictions(simple_2_fit, by = c("weight", "feeding"))

82.5 Counterfactual

R Code [zeigen / verbergen]
avg_comparisons(feeding_fit,
    by = "stage",
    variables = list("feeding" = "pairwise"),
    vcov = "HC3")

              Contrast    stage Estimate Std. Error       z Pr(>|z|)   S  2.5 %
 blood - sugar_water   adult      13.185       7.79  1.6934   0.0904 3.5  -2.08
 ketchup - blood       adult      -0.939      12.29 -0.0764   0.9391 0.1 -25.03
 ketchup - sugar_water adult      12.246      11.10  1.1029   0.2701 1.9  -9.52
 blood - sugar_water   juvenile   11.532       7.86  1.4679   0.1421 2.8  -3.87
 ketchup - blood       juvenile  -18.514       7.53 -2.4579   0.0140 6.2 -33.28
 ketchup - sugar_water juvenile   -6.981       3.89 -1.7946   0.0727 3.8 -14.61
 97.5 %
 28.445
 23.149
 34.010
 26.931
 -3.751
  0.643

Term: feeding
Type: response

82.6 Weitere R Pakete

Das R Paket {modelbased}

82.7 Marginal effects models

82.7.1 Kontrafaktische Vergleiche

Kontrafaktische Vergleiche (eng. counterfactual)

Abbildung 82.2— foo. Modifiziert nach Heiss (2022)
R Code [zeigen / verbergen]
model_grp_sq <- lm(jump_length ~ weight*I(weight^2)*feeding + count_leg,
                   data = flea_model_tbl)
tidy(model_grp_sq)
# A tibble: 13 × 5
   term                                estimate std.error statistic p.value
   <chr>                                  <dbl>     <dbl>     <dbl>   <dbl>
 1 (Intercept)                        49.9        36.5      1.37      0.180
 2 weight                              1.10       11.7      0.0939    0.926
 3 I(weight^2)                         0.0620      0.950    0.0653    0.948
 4 feedingblood                      -13.2        92.9     -0.142     0.888
 5 feedingketchup                     -7.04       68.1     -0.103     0.918
 6 count_leg                          -0.00280     0.0113  -0.248     0.806
 7 weight:I(weight^2)                 -0.00306     0.0223  -0.137     0.891
 8 weight:feedingblood                 7.05       27.8      0.254     0.801
 9 weight:feedingketchup               0.527      20.7      0.0255    0.980
10 I(weight^2):feedingblood           -0.565       2.30    -0.245     0.808
11 I(weight^2):feedingketchup          0.0551      1.80     0.0305    0.976
12 weight:I(weight^2):feedingblood     0.0143      0.0536   0.267     0.791
13 weight:I(weight^2):feedingketchup   0.000125    0.0468   0.00268   0.998
R Code [zeigen / verbergen]
cfct_data <- datagrid(model = model_grp_sq,
                      weight = c(5, 15),
                      grid_type = "counterfactual")
R Code [zeigen / verbergen]
nrow(flea_model_tbl)
[1] 48
R Code [zeigen / verbergen]
nrow(cfct_data)
[1] 96
R Code [zeigen / verbergen]
mfx_cfct <- model_grp_sq |> 
  slopes(datagrid(weight = c(5, 15),
                  grid_type = "counterfactual"),
         variables = "weight")

rbind(head(mfx_cfct), tail(mfx_cfct))

 Estimate Std. Error     z Pr(>|z|)   S 2.5 % 97.5 %
     1.49       4.01 0.372    0.710 0.5 -6.37   9.35
     1.49       4.01 0.372    0.710 0.5 -6.37   9.35
     1.49       4.01 0.372    0.710 0.5 -6.37   9.35
     1.49       4.01 0.372    0.710 0.5 -6.37   9.35
     1.49       4.01 0.372    0.710 0.5 -6.37   9.35
     1.49       4.01 0.372    0.710 0.5 -6.37   9.35
     3.16       2.20 1.436    0.151 2.7 -1.15   7.46
     3.16       2.20 1.436    0.151 2.7 -1.15   7.46
     3.16       2.20 1.436    0.151 2.7 -1.15   7.46
     3.16       2.20 1.436    0.151 2.7 -1.15   7.46
     3.16       2.20 1.436    0.151 2.7 -1.15   7.46
     3.16       2.20 1.436    0.151 2.7 -1.15   7.46

Term: weight
Type: response
Comparison: dY/dX
R Code [zeigen / verbergen]
mfx_cfct |> 
  group_by(weight) |> 
  summarize(avg_slope = mean(estimate))
# A tibble: 2 × 2
  weight avg_slope
   <dbl>     <dbl>
1      5      2.68
2     15      1.57
R Code [zeigen / verbergen]
model_grp_sq |> 
  avg_slopes(newdata = datagrid(weight = c(5, 20),
                                grid_type = "counterfactual"),
             variables = "weight",
             by = c("weight", "feeding"))

 weight     feeding Estimate Std. Error       z Pr(>|z|)   S  2.5 % 97.5 %
      5 sugar_water    1.490       4.01  0.3716    0.710 0.5  -6.37   9.35
      5 blood          3.965       7.99  0.4959    0.620 0.7 -11.70  19.63
      5 ketchup        2.578       5.16  0.4991    0.618 0.7  -7.54  12.70
     20 sugar_water   -0.097       2.13 -0.0456    0.964 0.1  -4.26   4.07
     20 blood          1.532       1.68  0.9136    0.361 1.5  -1.75   4.82
     20 ketchup        2.784       5.72  0.4870    0.626 0.7  -8.42  13.99

Term: weight
Type: response
Comparison: dY/dX

82.8 Analyse von Zeitreihen

Hier nochmal {mgcv} und {gamm4}

Introduction to Generalized Additive Mixed Models

s() und Interaktion mit s(x_1, by = f_1)

Referenzen

Heiss, A. (2022, Mai 20). Marginalia: A Guide to Figuring Out What the Heck Marginal Effects, Marginal Slopes, Average Marginal Effects, Marginal Effects at the Mean, and All These Other Marginal Things Are. https://doi.org/10.59350/40xaj-4e562