
Multilevel analyses with brms

Tutorial using TIMSS2019 data (FL)

Sven De Maeyer

2021-02-08

1 / 42

Aim



Get you familiar with a basic workflow for a Bayesian analysis making use of brms and other great packages...

Not: a technical introduction to Bayesian analysis


Side-effect: get you in touch with Bayesian inference

2 / 42

What's Bayesian about ...

3 / 42

The foundation: Bayes' theorem

$$P(\theta \mid \text{data}) = \frac{P(\text{data} \mid \theta)\, P(\theta)}{P(\text{data})}$$

with

  • P(θ|data) : the posterior, our updated belief about the model parameters θ after seeing the data
  • P(data|θ) : the likelihood of the data given our model parameters θ
  • P(θ) : our prior belief about the model parameters
  • P(data) : the marginal likelihood of the data (a normalizing constant)
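
A concrete one-line illustration (not from the deck, but the standard conjugate example): with a uniform Beta(1,1) prior on a proportion θ and 7 successes in 10 trials, Bayes' theorem yields

$$\theta \sim Beta(1,1),\quad y = 7 \text{ successes in } n = 10 \text{ trials} \;\Rightarrow\; \theta \mid y \sim Beta(1+7,\; 1+3) = Beta(8,4)$$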


We will get back to these terms during the presentation.

4 / 42

Advantages & Disadvantages of Bayesian analyses

Advantages:

  • Natural approach to express uncertainty
  • Ability to incorporate prior knowledge
  • Increased model flexibility
  • Full posterior distribution of the parameters
  • Natural propagation of uncertainty

Disadvantage:

  • Slow speed of model estimation

[*] Slide from Paul Bürkner's presentation available on YouTube [https://www.youtube.com/watch?v=FRs1iribZME]

5 / 42

Software

6 / 42

Software

  • different dedicated software/packages are available: JAGS / BUGS / Stan

  • the most powerful is Stan! Specifically, its Hamiltonian Monte Carlo algorithm makes it the best choice at the moment

  • Stan is actually a probabilistic programming language whose programs are translated to C++ and compiled

7 / 42

Example of Stan code

## // generated with brms 2.14.4
## functions {
## }
## data {
##   int<lower=1> N;  // total number of observations
##   vector[N] Y;  // response variable
##   int prior_only;  // should the likelihood be ignored?
## }
## transformed data {
## }
## parameters {
##   real Intercept;  // temporary intercept for centered predictors
##   real<lower=0> sigma;  // residual SD
## }
## transformed parameters {
## }
## model {
##   // likelihood including all constants
##   if (!prior_only) {
##     // initialize linear predictor term
##     vector[N] mu = Intercept + rep_vector(0.0, N);
##     target += normal_lpdf(Y | mu, sigma);
##   }
##   // priors including all constants
##   target += normal_lpdf(Intercept | 500, 50);
##   target += student_t_lpdf(sigma | 3, 0, 68.8)
##     - 1 * student_t_lccdf(0 | 3, 0, 68.8);
## }
## generated quantities {
##   // actual population-level intercept
##   real b_Intercept = Intercept;
## }
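
This Stan code was generated by brms, not written by hand. As a sketch (reusing the priorMath prior defined on slide 15), make_stancode() prints the generated code without compiling or fitting anything:

library(brms)

# Print the Stan code brms would generate for the intercept-only model,
# without compiling or sampling
make_stancode(math ~ 1,
              data = Student_FL19,
              family = "gaussian",
              prior = priorMath)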
8 / 42


9 / 42

TIMSS 2019 - Belgium (Fl)

10 / 42

To illustrate brms and other packages we will do some analyses on TIMSS data...

library(haven)
library(tidyverse) # for %>%, select(), pivot_longer(), ggplot(), ...

## Student achievement data (we select some general variables
## and plausible values)
Student_FL19 <- read_sav("asgbflm7.sav") %>%
  select(IDSCHOOL, IDCLASS, IDSTUD, ITSEX,
         ASDAGE, TOTWGT, HOUWGT, SENWGT,
         ASMMAT01, ASSSCI01,
         ASMNUM01, ASMGEO01, ASMDAT01,
         ASSLIF01, ASSPHY01, ASSEAR01,
         ASMKNO01, ASMAPP01, ASMREA01,
         ASSKNO01, ASSAPP01, ASSREA01) %>%
  zap_labels()

## School context data (we select schools' socio-economic
## composition)
School_contextFL19 <- read_sav("acgbflm7.sav") %>%
  select(IDSCHOOL, ACDGSBC)
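
The models below use variables named math, Girl and SSC, so the two data sets have to be merged and recoded along the way. A minimal sketch of that step (the exact recodings are assumptions, as they are not shown in the deck):

# Hypothetical merge/recoding step (not shown in the original deck)
Student_FL19 <- Student_FL19 %>%
  left_join(School_contextFL19, by = "IDSCHOOL") %>%
  mutate(math = ASMMAT01,                  # first plausible value mathematics
         Girl = ifelse(ITSEX == 1, 1, 0),  # assumed coding: 1 = girl
         SSC  = factor(ACDGSBC))           # socio-economic composition (3 levels)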
11 / 42

A quick exploration of the data...


12 / 42

Model 1: pretty naive...

13 / 42

To get started, we will introduce an overly simple model.
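
Concretely, every pupil's score is modelled as a draw from one normal distribution, ignoring any school structure. With the prior used on the next slide this can be written as (σ_e gets a default prior from brms, visible in the Stan code on slide 8):

$$y_i \sim N(\mu, \sigma_e)$$
$$\mu = \beta_0, \qquad \beta_0 \sim N(500, 50)$$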

14 / 42

How to fit in brms

# If necessary define (some of the) priors yourself
priorMath <- c(set_prior("normal(500,50)", class = "Intercept"))

Model_math_naive <- brm(
  math ~ 1,              # Typical R formula style
  data = Student_FL19,
  family = "gaussian",   # Set the type of dependent variable
  prior = priorMath      # Use our customized prior definition
)
15 / 42
summary(Model_math_naive)
## Family: gaussian
## Links: mu = identity; sigma = identity
## Formula: math ~ 1
## Data: Student_FL19 (Number of observations: 4655)
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup samples = 4000
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 532.14 0.99 530.23 534.10 1.00 3556 2471
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma 67.34 0.68 66.01 68.72 1.00 3571 2185
##
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
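
Rhat close to 1 and large effective sample sizes indicate converged chains. Convergence can also be inspected visually; a quick sketch using brms' mcmc_plot():

# Traceplots of the post-warmup chains for all parameters
mcmc_plot(Model_math_naive, type = "trace")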
16 / 42

4000 posterior draws of the parameters

head(
posterior_samples(Model_math_naive), 10
)
## b_Intercept sigma lp__
## 1 531.7451 67.96813 -26208.14
## 2 530.8633 67.74419 -26208.66
## 3 533.0500 66.81828 -26208.38
## 4 531.3245 68.11200 -26208.59
## 5 531.4971 67.79558 -26208.08
## 6 531.0264 67.99846 -26208.72
## 7 531.2763 67.44605 -26208.06
## 8 532.9161 67.38107 -26207.97
## 9 532.0318 67.47086 -26207.69
## 10 532.5863 67.43820 -26207.78
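
Because each row is one draw from the joint posterior, probability statements reduce to simple proportions over the draws. A small sketch (the threshold of 530 is just an illustrative value):

# Posterior probability that the average math score exceeds 530
draws <- posterior_samples(Model_math_naive)
mean(draws$b_Intercept > 530)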
17 / 42

Time for plots...

Let's set a theme:

theme_set(theme_linedraw() +
            theme(text = element_text(family = "Times"),
                  panel.grid = element_blank()))
18 / 42

The package tidybayes wraps functions from ggdist and opens up a lot of plotting options

library(tidybayes)

posterior_samples(Model_math_naive) %>% # Get draws of the posterior
  select(b_Intercept) %>%               # Only select our Intercept
  pivot_longer(everything()) %>%        # Rearrange the result to plot
  ggplot(aes(x = value, y = name)) +    # Let's plot...
  stat_pointinterval(.width = .89) +    # 89% credible interval
  xlab("marginal posterior")            # Set our x-axis label

19 / 42

Let's plot the residual standard deviation (sigma) making use of intervals...

posterior_samples(Model_math_naive) %>%   # Get draws of the posterior
  select(sigma) %>%                       # Select our residual SD
  median_qi(.width = c(.95, .89, .5)) %>% # Choose the intervals we want to plot
  ggplot(aes(x = sigma)) +                # Let's plot...
  geom_interval(aes(xmin = .lower, xmax = .upper)) +
  xlab("marginal posterior") +            # Set our x-axis label
  scale_color_brewer()                    # Choose a color scale

20 / 42

Or let's combine some things...

names <- c("math average","sd pupillevel")
posterior_samples(Model_math_naive) %>%
select(b_Intercept,sigma) %>%
set_names(names) %>%
pivot_longer(everything()) %>%
mutate(name = factor(name,
levels = c("math average",
"sd pupillevel"))) %>%
ggplot(aes(x = value, y = 0)) +
stat_halfeye(.width = .89, normalize = "panels") +
scale_y_continuous(NULL, breaks = NULL) +
xlab("marginal posterior") +
facet_wrap(~name, scales = "free")

21 / 42

Model 2: let's introduce schools

22 / 42

A more complex model

Let's assume that part of the variance can be ascribed to schools: a typical null model in multilevel analysis.

$$y_{ij} \sim N(\mu_{ij}, \sigma_e)$$
$$\mu_{ij} = \beta_0 + u_j$$
$$u_j \sim N(0, \sigma_u)$$

23 / 42

Let's fit in brms

# If necessary define (some of the) priors yourself
priorMath <- c(set_prior("normal(500,50)", class = "Intercept"))

Model_math <- brm(
  math ~ 1 + (1 | IDSCHOOL),  # Recognise the lme4-style notation
  data = Student_FL19,
  family = "gaussian",
  prior = priorMath,          # Set our prior
  cores = 4,                  # Use 4 cores of the pc for estimation
  seed = 2021,                # Make our analyses reproducible
  sample_prior = TRUE         # Also save samples of the prior
)
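
Because sample_prior = TRUE stores draws from the priors alongside the posterior, we can inspect what the prior alone implies. A minimal sketch:

# Draws from the priors (available thanks to sample_prior = TRUE)
head(prior_samples(Model_math))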
24 / 42
summary(Model_math)
## Family: gaussian
## Links: mu = identity; sigma = identity
## Formula: math ~ 1 + (1 | IDSCHOOL)
## Data: Student_FL19 (Number of observations: 4655)
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup samples = 4000
##
## Group-Level Effects:
## ~IDSCHOOL (Number of levels: 147)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 27.75 1.95 24.15 31.78 1.00 1299 2376
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 531.71 2.50 526.82 536.50 1.00 900 1691
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma 62.13 0.65 60.86 63.43 1.00 9331 3066
##
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
25 / 42

Let's see the samples and calculate the ICC each time:

head(
  posterior_samples(Model_math) %>%
    # Note: this ratio is computed on the SD scale; the conventional ICC
    # uses variances, sd^2 / (sd^2 + sigma^2), which gives ~.17 here (as
    # tab_model reports below) rather than the ~.30 shown in this output
    mutate(ICC = sd_IDSCHOOL__Intercept /
             (sd_IDSCHOOL__Intercept + sigma)) %>%
    select(b_Intercept, sigma, sd_IDSCHOOL__Intercept, ICC),
  10
)
## b_Intercept sigma sd_IDSCHOOL__Intercept ICC
## 1 530.2516 63.61148 27.42303 0.3012378
## 2 533.3364 60.70196 28.55842 0.3199451
## 3 528.1693 63.56052 28.95115 0.3129459
## 4 528.9997 60.86778 29.10308 0.3234723
## 5 528.6217 61.42548 31.29036 0.3374866
## 6 532.3404 62.69834 27.04696 0.3013747
## 7 530.8171 60.81000 29.01396 0.3230091
## 8 529.1805 62.00777 26.48913 0.2993227
## 9 531.2647 62.03283 27.28977 0.3055192
## 10 528.6474 62.19574 31.86373 0.3387615
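
To summarise the ICC posterior in one line (here on the conventional variance scale), a sketch with tidybayes' median_qi():

# Posterior median and 89% interval of the variance-based ICC
posterior_samples(Model_math) %>%
  mutate(ICC = sd_IDSCHOOL__Intercept^2 /
           (sd_IDSCHOOL__Intercept^2 + sigma^2)) %>%
  median_qi(ICC, .width = .89)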
26 / 42

Let's plot our parameter estimates including ICC & make use of stat_dotsinterval()

names <- c("math average","sd pupillevel","sd schoollevel", "ICC")
posterior_samples(Model_math) %>%
mutate(ICC = (sd_IDSCHOOL__Intercept/(sd_IDSCHOOL__Intercept + sigma))) %>%
select(b_Intercept,sigma,sd_IDSCHOOL__Intercept, ICC) %>%
set_names(names) %>%
pivot_longer(everything()) %>%
mutate(name = factor(name,
levels = c("math average",
"sd schoollevel",
"sd pupillevel",
"ICC"))) %>%
ggplot(aes(x = value, y = 0)) +
stat_dotsinterval(.width = .89, normalize = "panels", quantiles = 100) +
scale_y_continuous(NULL, breaks = NULL) +
xlab("marginal posterior") +
facet_wrap(~name, scales = "free")
27 / 42

28 / 42
posterior_samples(Model_math) %>%
  pivot_longer(starts_with("r_IDSCHOOL")) %>%
  mutate(sigma_i = value) %>%
  ggplot(aes(x = sigma_i,
             y = reorder(name, sigma_i))) +
  stat_pointinterval(point_interval = mean_qi,
                     .width = .89,
                     size = 1/6) +
  scale_y_discrete(expression(italic(j)),
                   breaks = NULL) +
  labs(subtitle = "School-level residuals and 89% CI",
       x = expression(italic(u)[italic(j)]))

29 / 42
library(sjPlot)
tab_model(Model_math)
                               1st Plausible Value Mathematics
Predictors                     Estimates       CI (95%)
Intercept                      531.72          526.82 – 536.50

Random Effects
σ²                             3860.03
τ00 IDSCHOOL                   770.05
ICC                            0.17
N IDSCHOOL                     147
Observations                   4655
Marginal R² / Conditional R²   0.000 / 0.149
30 / 42

Model 3: time to predict

31 / 42

Let's introduce 2 explanatory variables:

  • Girl
  • Schools' Socioeconomic Composition (SSC)

$$y_{ij} \sim N(\mu_{ij}, \sigma_e)$$
$$\mu_{ij} = \beta_0 + \beta_1 \text{Girl}_{ij} + \beta_2 \text{SSC}_j + u_j$$
$$u_j \sim N(0, \sigma_u)$$

32 / 42

Our prior knowledge

  • Girl: we expect girls to score lower than boys
    • in 2015: girls scored 6 points lower
    • in 2011: girls scored 8 points lower


  • SSC: we expect that
    • schools with "More Affluent" composition score higher than
    • schools with "Neither more Affluent nor more Disadvantaged" composition, who will score higher than
    • schools with "More Disadvantaged" composotion
33 / 42

Let's fit in brms

priorMathModel2 <- c(
  set_prior("normal(500,50)", class = "Intercept"),
  set_prior("normal(-7,5)",   class = "b", coef = "Girl"),
  set_prior("normal(-10,10)", class = "b", coef = "SSC2"),
  set_prior("normal(-20,10)", class = "b", coef = "SSC3")
)

Model2_math <- brm(
  math ~ 1 + Girl + SSC + (1 | IDSCHOOL),
  data = Student_FL19,
  family = "gaussian",
  prior = priorMathModel2,
  cores = 4,
  seed = 2021
)
34 / 42
tab_model(Model_math, Model2_math,
          dv.labels = c("Model 1 (null model)", "Model 2"))

                               Model 1 (null model)         Model 2
Predictors                     Estimates  CI (95%)          Estimates  CI (95%)
Intercept                      531.72     526.82 – 536.50   548.82     543.79 – 554.06
Girl                                                        -12.71     -16.31 – -9.04
SSC: SSC2                                                   -16.53     -25.70 – -7.25
SSC: SSC3                                                   -40.74     -49.73 – -30.86

Random Effects
σ²                             3860.03                      3865.48
τ00 IDSCHOOL                   770.05                       426.84
ICC                            0.17                         0.10
N IDSCHOOL                     147                          133
Observations                   4655                         4257
Marginal R² / Conditional R²   0.000 / 0.149                0.062 / 0.154
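
Given our directional expectation for Girl, the posterior evidence for a one-sided claim can be summarised with brms' hypothesis() (a sketch, not in the original deck):

# Posterior probability and evidence ratio that girls score lower than boys
hypothesis(Model2_math, "Girl < 0")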
35 / 42
names <- c("More Affluent",
"Neither more Affluent nor more Disadvantaged",
"More Disadvantaged")
posterior_samples(Model2_math) %>%
mutate(Affluent = b_Intercept,
Neutral = (b_Intercept + b_SSC2) ,
Disadvantaged = (b_Intercept + b_SSC3)) %>%
select(Affluent, Neutral, Disadvantaged) %>%
set_names(names) %>%
pivot_longer(everything()) %>%
mutate(name = factor(name,
levels = c("More Disadvantaged",
"Neither more Affluent nor more Disadvantaged",
"More Affluent"))) %>%
ggplot(aes(x = value, y = 0)) +
stat_dotsinterval(.width = .89,
normalize = "panels",
quantiles = 100) +
scale_y_continuous(NULL, breaks = NULL) +
xlab("marginal posterior") +
facet_wrap(~name, scales = "free") +
ggtitle(label = "Average math scores according to SSC",
subtitle = "Posterior distribution and 89% CI")

36 / 42

Let's compare models

37 / 42

Visual posterior predictive check

p1 <- pp_check(Model_math_naive,
               type = "dens_overlay",
               nsamples = 20) +
  ggtitle("Model_math_naive")

p2 <- pp_check(Model_math,
               type = "dens_overlay",
               nsamples = 20) +
  ggtitle("Model_math")

p3 <- pp_check(Model2_math,
               type = "dens_overlay",
               nsamples = 20) +
  ggtitle("Model2_math")

library(patchwork)
p1 + p2 + p3 +
  plot_layout(nrow = 3, byrow = FALSE) +
  plot_annotation(title = 'Posterior Predictive Checks')

38 / 42

Leave One Out - Cross Validation (loo)

loo_Model_math_naive <- loo(Model_math_naive)
loo_Model_math <- loo(Model_math)
loo_Model2_math <- loo(Model2_math)
loo_Model_math_naive
##
## Computed from 4000 by 4655 log-likelihood matrix
##
## Estimate SE
## elpd_loo -26203.7 46.5
## p_loo 1.9 0.1
## looic 52407.4 93.0
## ------
## Monte Carlo SE of elpd_loo is 0.0.
##
## All Pareto k estimates are good (k < 0.5).
## See help('pareto-k-diagnostic') for details.
loo_Model_math
##
## Computed from 4000 by 4655 log-likelihood matrix
##
## Estimate SE
## elpd_loo -25891.6 46.3
## p_loo 125.7 2.8
## looic 51783.1 92.7
## ------
## Monte Carlo SE of elpd_loo is 0.1.
##
## All Pareto k estimates are good (k < 0.5).
## See help('pareto-k-diagnostic') for details.
loo_Model2_math
##
## Computed from 4000 by 4257 log-likelihood matrix
##
## Estimate SE
## elpd_loo -23674.0 44.5
## p_loo 103.6 2.4
## looic 47348.0 89.1
## ------
## Monte Carlo SE of elpd_loo is 0.1.
##
## All Pareto k estimates are good (k < 0.5).
## See help('pareto-k-diagnostic') for details.
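
To rank the models formally, loo_compare() can be used. Note that Model2_math was fit on fewer observations (4257 instead of 4655, due to missingness on the predictors), so a loo comparison is only strictly valid between models fit to the same data. A minimal sketch for the two intercept models:

# Compare expected out-of-sample fit; the best model appears in the first
# row with elpd_diff = 0, the other with its (negative) difference and SE
loo_compare(loo_Model_math_naive, loo_Model_math)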
39 / 42

To conclude

  • Bayesian modelling is made simple by brms
  • Great add-ons are tidybayes and bayesplot

What's next

  • brms is more versatile than lme4, also allowing you to estimate (see the hypothetical sketches below)

    • multivariate models
    • finite mixture models
    • generalized models
    • splines
    • ...
  • we should all consider doing Bayesian modelling in the (near) future
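
As a flavour, two hypothetical sketches of such extensions (not fit in this deck; science stands for a science plausible value such as ASSSCI01):

# Multivariate model: maths and science jointly (hypothetical sketch)
Model_mv <- brm(
  mvbind(math, science) ~ 1 + (1 | IDSCHOOL),
  data = Student_FL19,
  family = "gaussian"
)

# Smooth (spline) effect of pupil age (hypothetical sketch)
Model_spline <- brm(
  math ~ s(ASDAGE) + (1 | IDSCHOOL),
  data = Student_FL19,
  family = "gaussian"
)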

40 / 42

My ambition

I'll try to blog about Bayesian analyses in the coming months [https://svendemaeyer.netlify.app/]

Some great sources

41 / 42

Questions?

42 / 42
