
Multilevel analyses with brms

Tutorial using TIMSS2019 data (FL)

Sven De Maeyer

2021-02-08

1 / 42

Aim



Get you familiar with a basic workflow for a Bayesian analysis making use of brms and other great packages...

Not: a technical introduction to Bayesian analysis


Side-effect: get you in touch with Bayesian inference

2 / 42

What's Bayesian about ...

3 / 42

The foundation: Bayes' theorem

$$P(\theta \mid \text{data}) = \frac{P(\text{data} \mid \theta)\, P(\theta)}{P(\text{data})}$$

with

  • P(θ|data) : the posterior, our updated belief about the model parameters θ after seeing the data
  • P(data|θ) : the likelihood of the data given our model parameters θ
  • P(θ) : our prior belief about the model parameters
  • P(data) : the marginal likelihood of the data (a normalizing constant)
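
A concrete one-line illustration (not from the deck, but the standard conjugate example): with a uniform Beta(1,1) prior on a proportion θ and 7 successes in 10 trials, Bayes' theorem yields

$$\theta \sim Beta(1,1),\quad y = 7 \text{ successes in } n = 10 \text{ trials} \;\Rightarrow\; \theta \mid y \sim Beta(1+7,\; 1+3) = Beta(8,4)$$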


We will get back to these terms during the presentation.

4 / 42

Advantages & Disadvantages of Bayesian analyses

Advantages:

  • Natural approach to express uncertainty
  • Ability to incorporate prior knowledge
  • Increased model flexibility
  • Full posterior distribution of the parameters
  • Natural propagation of uncertainty

Disadvantage:

  • Slow speed of model estimation

[*] Slide from Paul Bürkner's presentation available on YouTube [https://www.youtube.com/watch?v=FRs1iribZME]

5 / 42

Software

6 / 42

Software

  • different dedicated software/packages are available: JAGS / BUGS / Stan

  • the most powerful is Stan! Specifically, its Hamiltonian Monte Carlo algorithm makes it the best choice at the moment

  • Stan is actually a probabilistic programming language whose programs are translated to C++ and compiled

7 / 42

Example of Stan code

## // generated with brms 2.14.4
## functions {
## }
## data {
##   int<lower=1> N;  // total number of observations
##   vector[N] Y;  // response variable
##   int prior_only;  // should the likelihood be ignored?
## }
## transformed data {
## }
## parameters {
##   real Intercept;  // temporary intercept for centered predictors
##   real<lower=0> sigma;  // residual SD
## }
## transformed parameters {
## }
## model {
##   // likelihood including all constants
##   if (!prior_only) {
##     // initialize linear predictor term
##     vector[N] mu = Intercept + rep_vector(0.0, N);
##     target += normal_lpdf(Y | mu, sigma);
##   }
##   // priors including all constants
##   target += normal_lpdf(Intercept | 500, 50);
##   target += student_t_lpdf(sigma | 3, 0, 68.8)
##     - 1 * student_t_lccdf(0 | 3, 0, 68.8);
## }
## generated quantities {
##   // actual population-level intercept
##   real b_Intercept = Intercept;
## }
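
This Stan code was generated by brms, not written by hand. As a sketch (reusing the priorMath prior defined on slide 15), make_stancode() prints the generated code without compiling or fitting anything:

library(brms)

# Print the Stan code brms would generate for the intercept-only model,
# without compiling or sampling
make_stancode(math ~ 1,
              data = Student_FL19,
              family = "gaussian",
              prior = priorMath)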
8 / 42


9 / 42

TIMSS 2019 - Belgium (Fl)

10 / 42

To illustrate brms and other packages we will do some analyses on TIMSS data...

library(haven)
library(tidyverse) # for %>%, select(), pivot_longer(), ggplot(), ...

## Student achievement data (we select some general variables
## and plausible values)
Student_FL19 <- read_sav("asgbflm7.sav") %>%
  select(IDSCHOOL, IDCLASS, IDSTUD, ITSEX,
         ASDAGE, TOTWGT, HOUWGT, SENWGT,
         ASMMAT01, ASSSCI01,
         ASMNUM01, ASMGEO01, ASMDAT01,
         ASSLIF01, ASSPHY01, ASSEAR01,
         ASMKNO01, ASMAPP01, ASMREA01,
         ASSKNO01, ASSAPP01, ASSREA01) %>%
  zap_labels()

## School context data (we select schools' socio-economic
## composition)
School_contextFL19 <- read_sav("acgbflm7.sav") %>%
  select(IDSCHOOL, ACDGSBC)
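
The models below use variables named math, Girl and SSC, so the two data sets have to be merged and recoded along the way. A minimal sketch of that step (the exact recodings are assumptions, as they are not shown in the deck):

# Hypothetical merge/recoding step (not shown in the original deck)
Student_FL19 <- Student_FL19 %>%
  left_join(School_contextFL19, by = "IDSCHOOL") %>%
  mutate(math = ASMMAT01,                  # first plausible value mathematics
         Girl = ifelse(ITSEX == 1, 1, 0),  # assumed coding: 1 = girl
         SSC  = factor(ACDGSBC))           # socio-economic composition (3 levels)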
11 / 42

A quick exploration of the data...


12 / 42

Model 1: pretty naive...

13 / 42

To get started, we will introduce an overly simple model.
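
Concretely, every pupil's score is modelled as a draw from one normal distribution, ignoring any school structure. With the prior used on the next slide this can be written as (σ_e gets a default prior from brms, visible in the Stan code on slide 8):

$$y_i \sim N(\mu, \sigma_e)$$
$$\mu = \beta_0, \qquad \beta_0 \sim N(500, 50)$$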

14 / 42

How to fit in brms

# If necessary define (some of the) priors yourself
priorMath <- c(set_prior("normal(500,50)", class = "Intercept"))

Model_math_naive <- brm(
  math ~ 1,              # Typical R formula style
  data = Student_FL19,
  family = "gaussian",   # Set the type of dependent variable
  prior = priorMath      # Use our customized prior definition
)
15 / 42
summary(Model_math_naive)
## Family: gaussian
## Links: mu = identity; sigma = identity
## Formula: math ~ 1
## Data: Student_FL19 (Number of observations: 4655)
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup samples = 4000
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 532.14 0.99 530.23 534.10 1.00 3556 2471
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma 67.34 0.68 66.01 68.72 1.00 3571 2185
##
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
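
Rhat close to 1 and large effective sample sizes indicate converged chains. Convergence can also be inspected visually; a quick sketch using brms' mcmc_plot():

# Traceplots of the post-warmup chains for all parameters
mcmc_plot(Model_math_naive, type = "trace")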
16 / 42

4000 posterior draws of the parameters

head(
posterior_samples(Model_math_naive), 10
)
## b_Intercept sigma lp__
## 1 531.7451 67.96813 -26208.14
## 2 530.8633 67.74419 -26208.66
## 3 533.0500 66.81828 -26208.38
## 4 531.3245 68.11200 -26208.59
## 5 531.4971 67.79558 -26208.08
## 6 531.0264 67.99846 -26208.72
## 7 531.2763 67.44605 -26208.06
## 8 532.9161 67.38107 -26207.97
## 9 532.0318 67.47086 -26207.69
## 10 532.5863 67.43820 -26207.78
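
Because each row is one draw from the joint posterior, probability statements reduce to simple proportions over the draws. A small sketch (the threshold of 530 is just an illustrative value):

# Posterior probability that the average math score exceeds 530
draws <- posterior_samples(Model_math_naive)
mean(draws$b_Intercept > 530)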
17 / 42

Time for plots...

Let's set a theme:

theme_set(theme_linedraw() +
            theme(text = element_text(family = "Times"),
                  panel.grid = element_blank()))
18 / 42

The package tidybayes wraps functions from ggdist and opens up a lot of plotting options

library(tidybayes)

posterior_samples(Model_math_naive) %>% # Get draws of the posterior
  select(b_Intercept) %>%               # Only select our Intercept
  pivot_longer(everything()) %>%        # Rearrange the result to plot
  ggplot(aes(x = value, y = name)) +    # Let's plot...
  stat_pointinterval(.width = .89) +    # 89% credible interval
  xlab("marginal posterior")            # Set our x-axis label

19 / 42

Let's plot the residual standard deviation (sigma) making use of intervals...

posterior_samples(Model_math_naive) %>%   # Get draws of the posterior
  select(sigma) %>%                       # Select our residual SD
  median_qi(.width = c(.95, .89, .5)) %>% # Choose the intervals we want to plot
  ggplot(aes(x = sigma)) +                # Let's plot...
  geom_interval(aes(xmin = .lower, xmax = .upper)) +
  xlab("marginal posterior") +            # Set our x-axis label
  scale_color_brewer()                    # Choose a color scale

20 / 42

Or let's combine some things...

names <- c("math average","sd pupillevel")
posterior_samples(Model_math_naive) %>%
select(b_Intercept,sigma) %>%
set_names(names) %>%
pivot_longer(everything()) %>%
mutate(name = factor(name,
levels = c("math average",
"sd pupillevel"))) %>%
ggplot(aes(x = value, y = 0)) +
stat_halfeye(.width = .89, normalize = "panels") +
scale_y_continuous(NULL, breaks = NULL) +
xlab("marginal posterior") +
facet_wrap(~name, scales = "free")

21 / 42

Model 2: let's introduce schools

22 / 42

A more complex model

Let's assume that part of the variance can be ascribed to schools: a typical null model in multilevel analysis.

$$y_{ij} \sim N(\mu_{ij}, \sigma_e)$$
$$\mu_{ij} = \beta_0 + u_j$$
$$u_j \sim N(0, \sigma_u)$$

23 / 42

Let's fit in brms

# If necessary define (some of the) priors yourself
priorMath <- c(set_prior("normal(500,50)", class = "Intercept"))

Model_math <- brm(
  math ~ 1 + (1 | IDSCHOOL),  # Recognise the lme4-style notation
  data = Student_FL19,
  family = "gaussian",
  prior = priorMath,          # Set our prior
  cores = 4,                  # Use 4 cores of the pc for estimation
  seed = 2021,                # Make our analyses reproducible
  sample_prior = TRUE         # Also save samples of the prior
)
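
Because sample_prior = TRUE stores draws from the priors alongside the posterior, we can inspect what the prior alone implies. A minimal sketch:

# Draws from the priors (available thanks to sample_prior = TRUE)
head(prior_samples(Model_math))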
24 / 42
summary(Model_math)
## Family: gaussian
## Links: mu = identity; sigma = identity
## Formula: math ~ 1 + (1 | IDSCHOOL)
## Data: Student_FL19 (Number of observations: 4655)
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup samples = 4000
##
## Group-Level Effects:
## ~IDSCHOOL (Number of levels: 147)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 27.75 1.95 24.15 31.78 1.00 1299 2376
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 531.71 2.50 526.82 536.50 1.00 900 1691
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma 62.13 0.65 60.86 63.43 1.00 9331 3066
##
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
25 / 42

Let's see the samples and calculate the ICC each time:

head(
  posterior_samples(Model_math) %>%
    # Note: this ratio is computed on the SD scale; the conventional ICC
    # uses variances, sd^2 / (sd^2 + sigma^2), which gives ~.17 here (as
    # tab_model reports below) rather than the ~.30 shown in this output
    mutate(ICC = sd_IDSCHOOL__Intercept /
             (sd_IDSCHOOL__Intercept + sigma)) %>%
    select(b_Intercept, sigma, sd_IDSCHOOL__Intercept, ICC),
  10
)
## b_Intercept sigma sd_IDSCHOOL__Intercept ICC
## 1 530.2516 63.61148 27.42303 0.3012378
## 2 533.3364 60.70196 28.55842 0.3199451
## 3 528.1693 63.56052 28.95115 0.3129459
## 4 528.9997 60.86778 29.10308 0.3234723
## 5 528.6217 61.42548 31.29036 0.3374866
## 6 532.3404 62.69834 27.04696 0.3013747
## 7 530.8171 60.81000 29.01396 0.3230091
## 8 529.1805 62.00777 26.48913 0.2993227
## 9 531.2647 62.03283 27.28977 0.3055192
## 10 528.6474 62.19574 31.86373 0.3387615
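
To summarise the ICC posterior in one line (here on the conventional variance scale), a sketch with tidybayes' median_qi():

# Posterior median and 89% interval of the variance-based ICC
posterior_samples(Model_math) %>%
  mutate(ICC = sd_IDSCHOOL__Intercept^2 /
           (sd_IDSCHOOL__Intercept^2 + sigma^2)) %>%
  median_qi(ICC, .width = .89)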
26 / 42

Let's plot our parameter estimates including ICC & make use of stat_dotsinterval()

names <- c("math average","sd pupillevel","sd schoollevel", "ICC")
posterior_samples(Model_math) %>%
mutate(ICC = (sd_IDSCHOOL__Intercept/(sd_IDSCHOOL__Intercept + sigma))) %>%
select(b_Intercept,sigma,sd_IDSCHOOL__Intercept, ICC) %>%
set_names(names) %>%
pivot_longer(everything()) %>%
mutate(name = factor(name,
levels = c("math average",
"sd schoollevel",
"sd pupillevel",
"ICC"))) %>%
ggplot(aes(x = value, y = 0)) +
stat_dotsinterval(.width = .89, normalize = "panels", quantiles = 100) +
scale_y_continuous(NULL, breaks = NULL) +
xlab("marginal posterior") +
facet_wrap(~name, scales = "free")
27 / 42

28 / 42
posterior_samples(Model_math) %>%
  pivot_longer(starts_with("r_IDSCHOOL")) %>%
  mutate(sigma_i = value) %>%
  ggplot(aes(x = sigma_i,
             y = reorder(name, sigma_i))) +
  stat_pointinterval(point_interval = mean_qi,
                     .width = .89,
                     size = 1/6) +
  scale_y_discrete(expression(italic(j)),
                   breaks = NULL) +
  labs(subtitle = "School-level residuals and 89% CI",
       x = expression(italic(u)[italic(j)]))

29 / 42
library(sjPlot)
tab_model(Model_math)
                               1st Plausible Value Mathematics
Predictors                     Estimates       CI (95%)
Intercept                      531.72          526.82 – 536.50

Random Effects
σ²                             3860.03
τ00 IDSCHOOL                   770.05
ICC                            0.17
N IDSCHOOL                     147
Observations                   4655
Marginal R² / Conditional R²   0.000 / 0.149
30 / 42

Model 3: time to predict

31 / 42

Let's introduce 2 explanatory variables:

  • Girl
  • Schools' Socioeconomic Composition (SSC)

$$y_{ij} \sim N(\mu_{ij}, \sigma_e)$$
$$\mu_{ij} = \beta_0 + \beta_1 \text{Girl}_{ij} + \beta_2 \text{SSC}_j + u_j$$
$$u_j \sim N(0, \sigma_u)$$

32 / 42

Our prior knowledge

  • Girl: we expect girls to score lower than boys
    • in 2015: girls scored 6 points lower
    • in 2011: girls scored 8 points lower


  • SSC: we expect that
    • schools with "More Affluent" composition score higher than
    • schools with "Neither more Affluent nor more Disadvantaged" composition, who will score higher than
    • schools with "More Disadvantaged" composotion
33 / 42

Let's fit in brms

priorMathModel2 <- c(
  set_prior("normal(500,50)", class = "Intercept"),
  set_prior("normal(-7,5)",   class = "b", coef = "Girl"),
  set_prior("normal(-10,10)", class = "b", coef = "SSC2"),
  set_prior("normal(-20,10)", class = "b", coef = "SSC3")
)

Model2_math <- brm(
  math ~ 1 + Girl + SSC + (1 | IDSCHOOL),
  data = Student_FL19,
  family = "gaussian",
  prior = priorMathModel2,
  cores = 4,
  seed = 2021
)
34 / 42
tab_model(Model_math, Model2_math,
          dv.labels = c("Model 1 (null model)", "Model 2"))

                               Model 1 (null model)         Model 2
Predictors                     Estimates  CI (95%)          Estimates  CI (95%)
Intercept                      531.72     526.82 – 536.50   548.82     543.79 – 554.06
Girl                                                        -12.71     -16.31 – -9.04
SSC: SSC2                                                   -16.53     -25.70 – -7.25
SSC: SSC3                                                   -40.74     -49.73 – -30.86

Random Effects
σ²                             3860.03                      3865.48
τ00 IDSCHOOL                   770.05                       426.84
ICC                            0.17                         0.10
N IDSCHOOL                     147                          133
Observations                   4655                         4257
Marginal R² / Conditional R²   0.000 / 0.149                0.062 / 0.154
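
Given our directional expectation for Girl, the posterior evidence for a one-sided claim can be summarised with brms' hypothesis() (a sketch, not in the original deck):

# Posterior probability and evidence ratio that girls score lower than boys
hypothesis(Model2_math, "Girl < 0")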
35 / 42
names <- c("More Affluent",
"Neither more Affluent nor more Disadvantaged",
"More Disadvantaged")
posterior_samples(Model2_math) %>%
mutate(Affluent = b_Intercept,
Neutral = (b_Intercept + b_SSC2) ,
Disadvantaged = (b_Intercept + b_SSC3)) %>%
select(Affluent, Neutral, Disadvantaged) %>%
set_names(names) %>%
pivot_longer(everything()) %>%
mutate(name = factor(name,
levels = c("More Disadvantaged",
"Neither more Affluent nor more Disadvantaged",
"More Affluent"))) %>%
ggplot(aes(x = value, y = 0)) +
stat_dotsinterval(.width = .89,
normalize = "panels",
quantiles = 100) +
scale_y_continuous(NULL, breaks = NULL) +
xlab("marginal posterior") +
facet_wrap(~name, scales = "free") +
ggtitle(label = "Average math scores according to SSC",
subtitle = "Posterior distribution and 89% CI")

36 / 42

Let's compare models

37 / 42

Visual posterior predictive check

p1 <- pp_check(Model_math_naive,
               type = "dens_overlay",
               nsamples = 20) +
  ggtitle("Model_math_naive")

p2 <- pp_check(Model_math,
               type = "dens_overlay",
               nsamples = 20) +
  ggtitle("Model_math")

p3 <- pp_check(Model2_math,
               type = "dens_overlay",
               nsamples = 20) +
  ggtitle("Model2_math")

library(patchwork)
p1 + p2 + p3 +
  plot_layout(nrow = 3, byrow = FALSE) +
  plot_annotation(title = 'Posterior Predictive Checks')

38 / 42

Leave One Out - Cross Validation (loo)

loo_Model_math_naive <- loo(Model_math_naive)
loo_Model_math <- loo(Model_math)
loo_Model2_math <- loo(Model2_math)
loo_Model_math_naive
##
## Computed from 4000 by 4655 log-likelihood matrix
##
## Estimate SE
## elpd_loo -26203.7 46.5
## p_loo 1.9 0.1
## looic 52407.4 93.0
## ------
## Monte Carlo SE of elpd_loo is 0.0.
##
## All Pareto k estimates are good (k < 0.5).
## See help('pareto-k-diagnostic') for details.
loo_Model_math
##
## Computed from 4000 by 4655 log-likelihood matrix
##
## Estimate SE
## elpd_loo -25891.6 46.3
## p_loo 125.7 2.8
## looic 51783.1 92.7
## ------
## Monte Carlo SE of elpd_loo is 0.1.
##
## All Pareto k estimates are good (k < 0.5).
## See help('pareto-k-diagnostic') for details.
loo_Model2_math
##
## Computed from 4000 by 4257 log-likelihood matrix
##
## Estimate SE
## elpd_loo -23674.0 44.5
## p_loo 103.6 2.4
## looic 47348.0 89.1
## ------
## Monte Carlo SE of elpd_loo is 0.1.
##
## All Pareto k estimates are good (k < 0.5).
## See help('pareto-k-diagnostic') for details.
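
To rank the models formally, loo_compare() can be used. Note that Model2_math was fit on fewer observations (4257 instead of 4655, due to missingness on the predictors), so a loo comparison is only strictly valid between models fit to the same data. A minimal sketch for the two intercept models:

# Compare expected out-of-sample fit; the best model appears in the first
# row with elpd_diff = 0, the other with its (negative) difference and SE
loo_compare(loo_Model_math_naive, loo_Model_math)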
39 / 42

To conclude

  • Bayesian modelling is made simple by brms
  • Great add-ons are tidybayes and bayesplot

What's next

  • brms is more versatile than lme4, also allowing you to estimate (see the hypothetical sketches below)

    • multivariate models
    • finite mixture models
    • generalized models
    • splines
    • ...
  • we should all consider doing Bayesian modelling in the (near) future
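
As a flavour, two hypothetical sketches of such extensions (not fit in this deck; science stands for a science plausible value such as ASSSCI01):

# Multivariate model: maths and science jointly (hypothetical sketch)
Model_mv <- brm(
  mvbind(math, science) ~ 1 + (1 | IDSCHOOL),
  data = Student_FL19,
  family = "gaussian"
)

# Smooth (spline) effect of pupil age (hypothetical sketch)
Model_spline <- brm(
  math ~ s(ASDAGE) + (1 | IDSCHOOL),
  data = Student_FL19,
  family = "gaussian"
)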

40 / 42

My ambition

I'll try to blog about Bayesian analyses in the coming months [https://svendemaeyer.netlify.app/]

Some great sources

41 / 42

Questions?

42 / 42
