Sven De Maeyer
[University of Antwerp / Faculty of Social Sciences]
13/09/2021
University of Antwerp; Faculty of Social Sciences; dept. Training and Education Sciences
phd on methodological issues in research on School Effectiveness
research on a broad range of topics (e.g. psychometrics, writing research, learning strategies, education for sustainable development, linguistics, speech development of impaired children, ...)
my main focus shifted recently to Comparative Judgement and Learning from comparisons
since a year or 2 discovering Bayesian Statistics & brms
Blog: https://svendemaeyer.netlify.app/
Twitter: @svawa
github: https://github.com/Sdemaeyer2
Who has some experience with Bayesian analyses?
... with Stan
?
... with brms
?
Aims of the workshop & some practical stuff (let's go there)
What's Bayesian? (let's go there)
Why brms
? (let's go there)
brms
basic example (let's go there)
Let's get practical: the example data (let's go there)
Let's slightly increase complexity (let's go there)
brms
and other great packagesR code
that may inspireOn Github there is a dedicated repository where you'll find:
the Rmarkdown file for the slides
saved models from the slides (some take too long to estimate 'live')
data used for the exercises
The Github repository can be accessed using the following link:
[https://github.com/Sdemaeyer2/Turku_brms_2021]
The slides are on my Talks site [https://slides-sdemaeyer.netlify.app/]
Recordings
Please ask questions and interfere! Just give a shout!
Stuck with some exercises?
→ Share your screen and I will try to help
Day 2:
CmdStan
& cmdstanr
up and running (see https://mc-stan.org/cmdstanr/articles/cmdstanr.html)brms
installedtidyverse
installedI will often use the functional programming style making use of the pipe operator (works if tidyverse
is loaded):
A %>% B
This reads as take A and then do B
An example:
c(1,2,3,4,5) %>% mean()
reads as create a vector of the values 1, 2, 3, 4 and 5 and then calculate the mean
Want to learn dplyr by nice GIFS: hop to this great tweet feed
What is statistical inference about?
In your definition, what is a statistical model?
What are parameters?
prof. dr. Von Helsing (a famous Vampire Hunter)
How many Vampires are out there?
Time for his famous Vampire Test
Sample of 200 citizens
Result: 10.5% = Vampire
Ok, but what about the population?
Make a Confidence Interval for the 10.5%:
Estimate the sampling error:
Calculate upper and lower limit of the 95% CI:
Inference:
In 95% of the samples (of sample size 200) we would get a percentage of vampires varying somewhere between 6.25% and 14.75%
Before Von Helsing gathers data he had no knowledge at all!
If we visualize this 'prior knowledge' it could look like this.
Von Helsing starts testing ...
First tested person is NOT a Vampire!
So, 100% of Vampires is already less probable.
Time to adjust our knowledge
After testing the whole sample of 200 citizens, Von Helsing's knowledge is updated:
We get a probability for each possible percentage of vampires.
plot it in a probability density plot
summarize: e.g. 90% most probable values are situated between 7.5% and 14.8%
Advantages:
Disadvantage:
[*] Slight adaptation of a slide from Paul Bürkner's presentation available on YouTube
[https://www.youtube.com/watch?v=FRs1iribZME]
P(θ|data)=P(data|θ)P(θ)P(data)
with
meme from https://twitter.com/ChelseaParlett/status/1421291716229746689?s=20
P(data|θ)
sometimes also written as
L(θ|data)
Actually this is our model part
with θ being the model and all parameters in the model
Example:
What if we want to model how fast people can run a 1OK?
Data = 10 observations (Running times for 10K in minutes)
## [1] 52 54 58 48 41 49 72 53 64 62
The model (normal distribution):
yi∽N(μ,σ)
The parameter values that maximize the likelihood of our data:
Mean_RT <- mean(RT, na.rm = T)Mean_RT
## [1] 55.3
Sd_RT <- sd(RT, na.rm = T)Sd_RT
## [1] 8.957306
Expression of our prior knowledge (belief) about probable parameter values as a probability density function
"For Bayesians, the data are treated as fixed and the parameters vary. [...] Bayes' rule tells us that to calculate the posterior probability distribution we must combine a likelihood with a prior probability distribution over parameter values."
(Lambert, 2018, p.88)
Example:
What if we want to model how fast people can run a 1OK?
yi∽N(μ,σ)
Express our prior beliefs about
μ
and
σ
🤔 Please share your thoughts? What are possible values of the population average and standard deviation for our example?
When objectivity is crucial and you want let the data speak for itself...
When including significant information is crucial
Example:
What if we want to model how fast people can run a 1OK?
Vague priors for μ
par(mfrow=c(2,2))curve( dnorm( x , 50 , 20 ) , from=1 , to=200 ,xlab="mu", main="Prior for mu")curve( dunif( x , 1 , 40 ) , from=-10 , to=50 ,xlab="sigma", main="Prior for sigma")
# sample some values for mu and sigmamu.list <- seq(from = 30, to = 90, length.out=200)sigma.list <- seq(from = 2, to = 30, length.out = 200)post <- expand.grid(mu = mu.list, sigma = sigma.list)# Calculate the loglikelihood of the data# for each parameter valuepost$LL <- sapply(1:nrow(post), function(i) sum( dnorm( RT , mean=post$mu[i] , sd=post$sigma[i] , log=TRUE ) ) )
# Calculate posterior as product # of LL and Prior# but you see a '+ sign' # because we put everything on the # log scale # (to avoid getting zero's # due to rounding in R)post$prod <- post$LL + dnorm(post$mu, 50 , 10 , TRUE) + dunif(post$sigma , 1 , 40 , TRUE)# Re-scale the posterior # to the probability scalepost$prob <- exp(post$prod - max(post$prod) )
Make a contour plot
# Sampling going on# So for reproducibilityset.seed(1975)post %>% # sample 10000 rows # with replacement # higher prob higher prob to # be sampled sample_n(size = 10000, replace = TRUE, weight = prob) %>% # create the plot ggplot(aes(x = mu, y = sigma)) + geom_density_2d_filled() + theme_minimal()
Or sample from posterior and plot simple density plots
sample.rows <- sample(1:nrow(post), size=10000, replace=TRUE, prob=post$prob)sample.mu <- post$mu[sample.rows]sample.sigma <- post$sigma[sample.rows]
plot(density(sample.mu),main="mu")
plot(density(sample.sigma),main="sigma")
A 'simple' linear model
RTi∼N(μ,σei)μ=β0+β1∗Weigthi+β2∗Heighti+β3∗Agei+β4∗WeeklyTrainingHoursi+β5∗Genderi
So you can get a REALLY LARGE number of parameters!
Complex models → Large number of parameters → exponentional number of combinations!
Posterior gets unsolvable by grid approximation
We will approximate the 'joint posterior' by 'smart' sampling
Samples of combinations of parametervalues are drawn
BUT: samples will not be random!
Following link brings you to an interactive tool that let's you get the basic idea behind MCMC sampling:
https://chi-feng.github.io/mcmc-demo/app.html#HamiltonianMC,standard
Some prior probability distributions can be 'more easely' combined with a certain likelihood!
These are called conjugate priors
PRO:
You can actually analytically calculate the posterior. So: no need for all the sampling and approximation of the posterior.
A list of conjugate likelihood-prior pairs: https://en.wikipedia.org/wiki/Conjugate_prior
BUT:
"Simplicity comes with a cost! In most real-life examples of inference, the contraint of choosing likelihood-prior conjugate pairs is too restrictive and can lead us to use models that inadequately capture the variability in the data. (Lambert, 2018, p. 209)"
## // generated with brms 2.14.4## functions {## }## data {## int<lower=1> N; // total number of observations## vector[N] Y; // response variable## int prior_only; // should the likelihood be ignored?## }## transformed data {## }## parameters {## real Intercept; // temporary intercept for centered predictors## real<lower=0> sigma; // residual SD## }## transformed parameters {## }## model {## // likelihood including all constants## if (!prior_only) {## // initialize linear predictor term## vector[N] mu = Intercept + rep_vector(0.0, N);## target += normal_lpdf(Y | mu, sigma);## }## // priors including all constants## target += normal_lpdf(Intercept | 500,50);## target += student_t_lpdf(sigma | 3, 0, 68.8)## - 1 * student_t_lccdf(0 | 3, 0, 68.8);## }## generated quantities {## // actual population-level intercept## real b_Intercept = Intercept;## }
brms
syntaxVery very similar to lme4
and in line with typical R-style writing up of a model ...
lme4
Model <- lmer( y ~ x1 + x2 + (1|Group), data = Data, ...)
brms
Model <- brm( y ~ x1 + x2 + (1|Group), data = Data, family = "gaussian", backend = "cmdstanr" ...)
Notice:
family = "gaussian"
indicates the "likelihood" function we will usebackend = "cmdstanr"
indicates the way we want to interact with Stan and C++Example:
What if we want to model how fast people can run a 1OK?
The simplest model looked like:
RTi∼N(μ,σe)
In brms
this model is:
RT <- c( 52, 54, 58, 48, 41, 49, 72, 53, 64, 62 )# First make a dataset from our RT vectorDataRT <- data_frame(RT)Mod_RT1 <- brm( RT ~ 1, # We only model an intercept data = DataRT, backend = "cmdstanr", seed = 1975)
🏃 Try it yourself and run the model ... (don't forget to load the necessary packages: brms
& tidyverse
)
For the workshop all models are in the models folder
You can normally read model information with readRDS
Here's an example:
Mod_RT1 <- readRDS(here("models", "Mod_RT1.RDS"))
This model information was saved with the saveRDS
function:
saveRDS(Mod_RT1, here("models", "Mod_RT1.RDS"))
Notice that I use the package here
allowing me to access files anywhere in the RStudio Projects folder using the here( )
commando (no fuzz with paths on your computer)
MCMC results will differ each time run
Of course, not a lot if your model is good!
So, what if you want to be reproducible?
Mod_RT1 <- brm( RT ~ 1, # We only model an intercept data = DataRT, backend = "cmdstanr", seed = 1975 # make sure we get the same results each time :-))
summary( )
functionsummary(Mod_RT1)
## Family: gaussian ## Links: mu = identity; sigma = identity ## Formula: RT ~ 1 ## Data: DataRT (Number of observations: 10) ## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;## total post-warmup samples = 4000## ## Population-Level Effects: ## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS## Intercept 55.10 2.82 49.68 60.90 1.00 2000 2180## ## Family Specific Parameters: ## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS## sigma 9.47 2.32 6.17 15.08 1.00 2445 2318## ## Samples were drawn using sample(hmc). For each parameter, Bulk_ESS## and Tail_ESS are effective sample size measures, and Rhat is the potential## scale reduction factor on split chains (at convergence, Rhat = 1).
## Family: gaussian ## Links: mu = identity; sigma = identity ## Formula: RT ~ 1 ## Data: DataRT (Number of observations: 10) ## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;## total post-warmup samples = 4000## ## Population-Level Effects: ## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS## Intercept 55.10 2.82 49.68 60.90 1.00 2000 2180## ## Family Specific Parameters: ## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS## sigma 9.47 2.32 6.17 15.08 1.00 2445 2318## ## Samples were drawn using sample(hmc). For each parameter, Bulk_ESS## and Tail_ESS are effective sample size measures, and Rhat is the potential## scale reduction factor on split chains (at convergence, Rhat = 1).
By default brms
sets 4 chains of 2000 iterations of which 1000 iterations/chain are warm-up
plot( )
functionplot(Mod_RT1)
Left panel
the posterior distributions (later we will see other more informative ways to plot the information in the posterior)
Right panel
the convergence of the each parameter
→ should look like a caterpillar
ˆR < 1.015 for each parameter estimate
at least 4 chains are recommended
Effective Sample Size (ESS) > 400 to rely on ˆR
## Family: gaussian ## Links: mu = identity; sigma = identity ## Formula: RT ~ 1 ## Data: DataRT (Number of observations: 10) ## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;## total post-warmup samples = 4000## ## Population-Level Effects: ## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS## Intercept 55.10 2.82 49.68 60.90 1.00 2000 2180## ## Family Specific Parameters: ## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS## sigma 9.47 2.32 6.17 15.08 1.00 2445 2318## ## Samples were drawn using sample(hmc). For each parameter, Bulk_ESS## and Tail_ESS are effective sample size measures, and Rhat is the potential## scale reduction factor on split chains (at convergence, Rhat = 1).
26 high school teachers (Dutch) voluntary participated
each did 10 comparisons of 2 argumentative texts from 10th graders
3 batches of comparisons with random allocation of judges to one of the batches
all batches similar composition of comparisons regarding the characteristics of the pairs; the pairs, however not the same
Tobii TX300 dark pupil eye-tracker with a 23-inch TFT monitor (max. resolution of 1920 x 1080 pixels)
data sampled binocularly at 300 Hz
Partic | Compar | AOI_Type | Dur_Seconds | Dur_Log | |
---|---|---|---|---|---|
9 | 20 | DLL 112-103 | AOI_RIGHT_TEXT | 0.13 | 4.868 |
10 | 20 | DMH 803-102 | AOI_LEFT_TEXT | 0.32 | 5.768 |
11 | 20 | DMH 803-102 | AOI_LEFT_TEXT | 0.723 | 6.583 |
12 | 20 | DMH 803-102 | AOI_LEFT_TEXT | 0.09 | 4.500 |
13 | 20 | DMH 803-102 | AOI_LEFT_TEXT | 0.06 | 4.094 |
14 | 20 | DMH 803-102 | AOI_LEFT_TEXT | 0.083 | 4.419 |
15 | 20 | DMH 803-102 | AOI_LEFT_TEXT | 0.557 | 6.323 |
16 | 20 | DMH 803-102 | AOI_LEFT_TEXT | 8.586 | 9.058 |
How much time do people spend in a text (in total) when comparing two texts?
A simple model could look like:
log(Timei)∼N(μ,σe)
with Timei being the total time for each combination of participant, comparison and AOI_type
Durations_CJ %>% group_by(Partic, Compar, AOI_Type) %>% summarize( Dur_Seconds = mean(Dur_Seconds), Dur_Log = mean(Dur_Log) ) %>% head(5)
## # A tibble: 5 x 5## # Groups: Partic, Compar [3]## Partic Compar AOI_Type Dur_Seconds Dur_Log## <dbl> <chr> <chr> <dbl> <dbl>## 1 1 DHH 214-425 AOI_LEFT_TEXT 7.12 6.61## 2 1 DHH 214-425 AOI_RIGHT_TEXT 15.3 8.55## 3 1 DLL 508- 51 AOI_LEFT_TEXT 4.61 7.31## 4 1 DLL 508- 51 AOI_RIGHT_TEXT 3.28 6.63## 5 1 DMH 414-609 AOI_LEFT_TEXT 1.93 6.82
How much time do people spend in a text (in total) when comparing two texts?
A simple model could look like:
log(Timei)∼N(μ,σe)
🏃
First we make a dataset
Data_CJ1 <- Durations_CJ %>% group_by(Partic, Compar, AOI_Type) %>% summarize( Dur_Seconds = mean(Dur_Seconds), Dur_Log = mean(Dur_Log) )
Then we can run the model
Model1_CJ <- brm( Dur_Log ~ 1, data = Data_CJ1, family = "gaussian", backend = "cmdstanr", seed = 1975)
Model1_CJ <- readRDS(here("models", "Model1_CJ.RDS"))summary(Model1_CJ)
## Family: gaussian ## Links: mu = identity; sigma = identity ## Formula: Dur_Log ~ 1 ## Data: Data_CJ1 (Number of observations: 499) ## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;## total post-warmup samples = 4000## ## Population-Level Effects: ## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS## Intercept 7.65 0.06 7.54 7.76 1.00 3411 2340## ## Family Specific Parameters: ## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS## sigma 1.23 0.04 1.15 1.31 1.00 3446 2663## ## Samples were drawn using sample(hmc). For each parameter, Bulk_ESS## and Tail_ESS are effective sample size measures, and Rhat is the potential## scale reduction factor on split chains (at convergence, Rhat = 1).
Convergence is clearly ok!
plot(Model1_CJ)
Pull the 4000 samples of our parameter values from the posterior with
posterior_samples()
posterior_samples(Model1_CJ) %>% head(10)
## b_Intercept sigma Intercept lp__## 1 7.64887 1.15629 7.64887 -812.378## 2 7.63948 1.21292 7.63948 -810.850## 3 7.67498 1.22842 7.67498 -810.910## 4 7.59150 1.23610 7.59150 -811.445## 5 7.68766 1.24512 7.68766 -811.191## 6 7.59683 1.22045 7.59683 -811.293## 7 7.69482 1.24441 7.69482 -811.274## 8 7.61122 1.19426 7.61122 -811.343## 9 7.62011 1.16751 7.62011 -812.042## 10 7.61525 1.17405 7.61525 -811.850
*with the head()
function we ask to only show the first 10 rows
ggplot2
First we set a custom theme that makes our plots look 'better'
theme_set(theme_linedraw() + theme(text = element_text(family = "Times"), panel.grid = element_blank()))
ggplot2
Package tidybayes
is a wrapper including functions from ggdist
that open a lot of plotting options
posterior_samples(Model1_CJ) %>% # Get draws of the posterior select(b_Intercept) %>% # Only select our Intercept pivot_longer(everything()) %>% # Rearrange the result to plot ggplot(aes(x = value, y = name)) + # Let's plot... stat_pointinterval(.width = .89) + # 89% CI xlab("marginal posterior") # Set our x-axis label
🏃
posterior_samples(Mode1l_CJ) %>% # Get draws of the posterior select(sigma) %>% # Only select sigma pivot_longer(everything()) %>% # Rearrange the result to plot ggplot(aes(x = value, y = name)) + # Let's plot... stat_pointinterval(.width = .80) + # 80% CI xlab("marginal posterior") # Set our x-axis label
Let's use stat_interval()
from the ggdist
package
posterior_samples(Model1_CJ) %>% select(b_Intercept) %>% ggplot(aes(x = b_Intercept, y = 0)) + stat_interval( .width = c(.95,.90,.50) # The CI's to plot ) + xlab("marginal posterior") + scale_y_continuous( # No y-axis information NULL, breaks = NULL ) + scale_color_brewer()
🏃
posterior_samples(Model1_CJ) %>% select(sigma) %>% ggplot(aes(x = sigma, y = 0)) + stat_interval( .width = c(.95, .89,.80,.50) # The CI's to plot ) + xlab("marginal posterior") + scale_y_continuous( NULL, breaks = NULL ) + scale_color_brewer()
And we introduce the stat_halfeye()
function of ggdist
names <- c("mu","sigma")posterior_samples(Model1_CJ) %>% select(b_Intercept,sigma) %>% set_names(names) %>% pivot_longer(everything()) %>% ggplot(aes(x = value, y = 0)) + stat_halfeye(.width = .89, normalize = "panels") + scale_y_continuous(NULL, breaks = NULL) + xlab("marginal posterior") + facet_wrap(~name, scales = "free")
🏃
names <- c("mu","sigma")posterior_samples(Model1_CJ) %>% select(b_Intercept,sigma) %>% set_names(names) %>% pivot_longer(everything()) %>% ggplot(aes(x = value, y = 0)) + stat_halfeye( .width = c(.50, .90, .95), normalize = "panels") + scale_y_continuous(NULL, breaks = NULL) + xlab("marginal posterior") + facet_wrap(~name, scales = "free")
We can simply do calculations with our posterior samples! So, we can just use exp()
:
names <- c("mu_log_scale","mu_sec", "sigma")posterior_samples(Model1_CJ) %>% select(b_Intercept,sigma) %>% mutate( mu_sec = exp(b_Intercept) ) %>% # we do the select() again to reorder the columns # this is not super necessary, but then look out # with setting the names of the columns! select(b_Intercept, mu_sec, sigma) %>% set_names(names) %>% pivot_longer(everything()) %>% ggplot(aes(x = value, y = 0)) + stat_halfeye(.width = c(.5, .9, .95), normalize = "panels") + scale_y_continuous(NULL, breaks = NULL) + xlab("marginal posterior") + facet_wrap(~name, scales = "free")
Does the time that people spend reading a text differs for both texts if we control for the length of the text?
A model could look like:
log(Timei)∼N(μ,σei)μ=β0+β1∗Right_Texti+β2∗Text_Lengthi
load(here("data", "Data_CJ2.Rdata"))head(Data_CJ2, 6)
## # A tibble: 6 x 7## # Groups: Partic, Compar [3]## Partic Compar AOI_Type Dur_Seconds Dur_Log Right_Text Text_Length## <dbl> <chr> <chr> <dbl> <dbl> <dbl> <dbl>## 1 1 DHH 214-425 AOI_LEFT_TEXT 7.12 6.61 0 2.72## 2 1 DHH 214-425 AOI_RIGHT_TEXT 15.3 8.55 1 2.48## 3 1 DLL 508- 51 AOI_LEFT_TEXT 4.61 7.31 0 3.11## 4 1 DLL 508- 51 AOI_RIGHT_TEXT 3.28 6.63 1 3.13## 5 1 DMH 414-609 AOI_LEFT_TEXT 1.93 6.82 0 1.78## 6 1 DMH 414-609 AOI_RIGHT_TEXT 1.53 6.33 1 3.89
🏃
brms
Model2_CJ <- brm( Dur_Log ~ 1 + Text_Length + Right_Text, data = Data_CJ2, family = "gaussian", backend = "cmdstanr", seed = 1975)
Visual checking of convergence
plot(Model2_CJ)
summary(Model2_CJ)
## Family: gaussian ## Links: mu = identity; sigma = identity ## Formula: Dur_Log ~ 1 + Text_Length + Right_Text ## Data: Data_CJ2 (Number of observations: 499) ## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;## total post-warmup samples = 4000## ## Population-Level Effects: ## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS## Intercept 6.34 0.16 6.02 6.66 1.00 4980 3109## Text_Length 0.50 0.05 0.40 0.59 1.00 4968 3327## Right_Text -0.33 0.10 -0.52 -0.14 1.00 4424 2890## ## Family Specific Parameters: ## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS## sigma 1.11 0.04 1.04 1.18 1.00 4759 3210## ## Samples were drawn using sample(hmc). For each parameter, Bulk_ESS## and Tail_ESS are effective sample size measures, and Rhat is the potential## scale reduction factor on split chains (at convergence, Rhat = 1).
posterior_samples(Model2_CJ) %>% select( b_Intercept, b_Text_Length, b_Right_Text ) %>% pivot_longer(everything()) %>% ggplot(aes(x = value, y = 0)) + stat_halfeye( .width = c(.5, .9, .95), normalize = "panels") + scale_y_continuous(NULL, breaks = NULL) + xlab("marginal posterior") + facet_wrap(~name, scales = "free")
mcmc_areas()
from the bayesplot
mcmc_areas( Model2_CJ, pars = c("b_Text_Length", "b_Right_Text" ), prob = 0.89)
mcmc_areas_ridges
from bayesplot
mcmc_areas_ridges( Model2_CJ, pars = c("b_Text_Length", "b_Right_Text" ), prob = 0.89)
mcmc_intervals
from bayesplot
& let's play around with the color_scheme_set()
that works with all bayesplot
functions...
color_scheme_set("red")mcmc_intervals( Model2_CJ, pars = c("b_Text_Length", "b_Right_Text"), prob = c(0.5, 0.89, 0.95))
ROPE : Region Of Practical Equivalence
Set a region of parameter values that can be considered equivalent to having no effect
in standard effect sizes the advised default is a range of -0.1 to 0.1
this equals half of a small effect size (Cohen, 1988)
all parameter values in that range are set equal to no effect
ROPE + HDI rule
95% of HDI out of ROPE → H0 rejected
95% of HDI all in ROPE → H0 not rejected
95% of HDI partially out of ROPE → undecided
brms
resultsWe can use the equivalence_test()
function of the bayestestR
package
equivalence_test(Model2_CJ)
## # Test for Practical Equivalence## ## ROPE: [-0.12 0.12]## ## Parameter | H0 | inside ROPE | 95% HDI## ----------------------------------------------------## Intercept | Rejected | 0.00 % | [ 5.99 6.63]## Text_Length | Rejected | 0.00 % | [ 0.41 0.59]## Right_Text | Rejected | 0.00 % | [-0.52 -0.14]
parameters
packageThis package has the nice flexible model_parameters()
function!
posterior_samples(Model2_CJ) %>% model_parameters(rope_range=c(-0.12,0.12))
## # Fixed Effects## ## Parameter | Median | 89% CI | pd | % in ROPE## -----------------------------------------------------------------## b_Intercept | 6.34 | [ 6.07, 6.59] | 100% | 0%## b_Text_Length | 0.50 | [ 0.42, 0.57] | 100% | 0%## b_Right_Text | -0.34 | [ -0.49, -0.17] | 99.95% | 1.43%## sigma | 1.11 | [ 1.05, 1.16] | 100% | 0%## Intercept | 7.65 | [ 7.57, 7.73] | 100% | 0%## lp__ | -761.62 | [-763.69, -760.08] | 100% | 0%
pd
stands for probability of direction
We can combine model_parameters()
with plot()
because the see
package deals with the output!
posterior_samples(Model2_CJ) %>% select( b_Right_Text, b_Text_Length ) %>% model_parameters( ) %>% plot(show_labels = TRUE, size_text = 3)
And another one ...
We can combine equivalence_test
with plot()
as well thx to the see
package!
equivalence_test(Model2_CJ) %>% plot( )
"Hey! You promised mixed effects model"
Does the time that people spend reading a text differs for both texts if we control for the length of the text?
In the previous analyses we neglected that we have multiple observations per person!
A mixed effects model could look like:
log(Timeij)∼N(μ,σeij)μ=β0+β1∗Right_Texti+β2∗Text_Lengthi+μjμj∼N(0,σuj)
Model3_CJ <- brm( Dur_Log ~ Text_Length + Right_Text + (1|Partic), data = Data_CJ2, family = "gaussian", backend = "cmdstanr", seed = 1975)
🏃
Model3_CJ <- readRDS(here("models", "Model3_CJ.RDS"))summary(Model3_CJ)
## Family: gaussian ## Links: mu = identity; sigma = identity ## Formula: Dur_Log ~ 1 + Text_Length + Right_Text + (1 | Partic) ## Data: Data_CJ2 (Number of observations: 499) ## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;## total post-warmup samples = 4000## ## Group-Level Effects: ## ~Partic (Number of levels: 26) ## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS## sd(Intercept) 0.62 0.11 0.45 0.87 1.00 1014 1218## ## Population-Level Effects: ## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS## Intercept 6.63 0.19 6.25 7.01 1.00 1543 2156## Text_Length 0.39 0.04 0.31 0.48 1.00 6940 2934## Right_Text -0.35 0.08 -0.52 -0.19 1.00 6448 2957## ## Family Specific Parameters: ## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS## sigma 0.95 0.03 0.89 1.01 1.00 5550 2644## ## Samples were drawn using sample(hmc). For each parameter, Bulk_ESS## and Tail_ESS are effective sample size measures, and Rhat is the potential## scale reduction factor on split chains (at convergence, Rhat = 1).
If we want to plot information on the variances (or sd) we have to know the names of elements
posterior_samples(Model3_CJ) %>% str()
## 'data.frame': 4000 obs. of 59 variables:## $ b_Intercept : num 6.63 6.79 6.66 6.74 6.57 ...## $ b_Text_Length : num 0.419 0.396 0.404 0.347 0.373 ...## $ b_Right_Text : num -0.309 -0.403 -0.288 -0.258 -0.286 ...## $ sd_Partic__Intercept : num 0.505 0.531 0.537 0.637 0.589 ...## $ sigma : num 0.928 0.937 0.939 0.936 0.874 ...## $ Intercept : num 7.73 7.78 7.72 7.65 7.54 ...## $ r_Partic[1,Intercept] : num -0.141 -0.673 -0.112 0.07 -0.57 ...## $ r_Partic[2,Intercept] : num 0.2899 -0.0303 0.5392 0.2669 0.5086 ...## $ r_Partic[3,Intercept] : num 0.619 0.767 0.996 1.152 0.998 ...## $ r_Partic[4,Intercept] : num -0.3145 0.0216 -0.3654 0.1749 0.0874 ...## $ r_Partic[5,Intercept] : num -0.3268 -0.0846 0.1697 -0.0618 -0.1935 ...## $ r_Partic[6,Intercept] : num 0.492 0.142 0.256 0.332 0.393 ...## $ r_Partic[7,Intercept] : num -0.617 -0.777 -0.576 -0.444 -0.248 ...## $ r_Partic[8,Intercept] : num 0.1582 -0.1739 -0.0464 0.0666 0.1937 ...## $ r_Partic[9,Intercept] : num 0.1173 -0.4262 -0.2635 -0.4233 0.0362 ...## $ r_Partic[10,Intercept]: num -0.0433 0.2303 -0.0641 0.3779 0.061 ...## $ r_Partic[11,Intercept]: num 0.0683 0.4425 0.1351 0.0419 0.4656 ...## $ r_Partic[12,Intercept]: num -0.0778 0.5016 0.1428 0.1423 0.1359 ...## $ r_Partic[13,Intercept]: num 0.533 0.211 0.403 0.8 0.465 ...## $ r_Partic[14,Intercept]: num -0.0189 0.7486 0.2406 0.8047 0.8817 ...## $ r_Partic[15,Intercept]: num 0.552 0.526 0.492 0.843 0.495 ...## $ r_Partic[16,Intercept]: num -1.06 -1.61 -1.61 -1.53 -1.62 ...## $ r_Partic[17,Intercept]: num 0.293 -0.11 -0.366 0.151 0.181 ...## $ r_Partic[18,Intercept]: num 0.358 0.562 0.7 0.415 0.357 ...## $ r_Partic[19,Intercept]: num -0.898 -0.246 -0.3 -0.456 -0.226 ...## $ r_Partic[20,Intercept]: num -0.892 -0.2 -0.697 -0.315 -0.5 ...## $ r_Partic[21,Intercept]: num 0.3688 -0.1558 0.099 0.6076 0.0982 ...## $ r_Partic[22,Intercept]: num 0.2282 -0.4344 -0.1846 -0.0366 -0.0649 ...## $ r_Partic[23,Intercept]: num 0.749 0.76 0.743 0.904 0.707 ...## $ r_Partic[24,Intercept]: num -0.792 -0.671 -0.647 -0.726 -0.701 ...## $ r_Partic[25,Intercept]: num -0.302 -0.384 -0.14 -0.312 -0.306 ...## $ r_Partic[26,Intercept]: num -0.599 -1.12 -0.663 -1.047 -1.045 ...## $ lp__ : num -726 -730 -726 -722 -725 ...## $ z_1[1,1] : num -0.28 -1.267 -0.208 0.11 -0.968 ...## $ z_1[1,2] : num 0.5736 -0.0571 1.0042 0.4188 0.8632 ...## $ z_1[1,3] : num 1.22 1.44 1.86 1.81 1.69 ...## $ z_1[1,4] : num -0.6223 0.0406 -0.6806 0.2744 0.1483 ...## $ z_1[1,5] : num -0.647 -0.159 0.316 -0.097 -0.329 ...## $ z_1[1,6] : num 0.974 0.267 0.476 0.52 0.667 ...## $ z_1[1,7] : num -1.221 -1.464 -1.072 -0.697 -0.421 ...## $ z_1[1,8] : num 0.3131 -0.3275 -0.0864 0.1046 0.3287 ...## $ z_1[1,9] : num 0.2322 -0.8027 -0.4908 -0.6641 0.0614 ...## $ z_1[1,10] : num -0.0858 0.4337 -0.1193 0.5929 0.1036 ...## $ z_1[1,11] : num 0.1351 0.8334 0.2516 0.0658 0.7903 ...## $ z_1[1,12] : num -0.154 0.945 0.266 0.223 0.231 ...## $ z_1[1,13] : num 1.055 0.398 0.751 1.255 0.788 ...## $ z_1[1,14] : num -0.0374 1.4099 0.4482 1.2623 1.4965 ...## $ z_1[1,15] : num 1.091 0.99 0.916 1.323 0.84 ...## $ z_1[1,16] : num -2.1 -3.02 -3 -2.4 -2.75 ...## $ z_1[1,17] : num 0.579 -0.208 -0.682 0.237 0.307 ...## $ z_1[1,18] : num 0.708 1.059 1.304 0.651 0.606 ...## $ z_1[1,19] : num -1.777 -0.463 -0.559 -0.715 -0.384 ...## $ z_1[1,20] : num -1.765 -0.376 -1.298 -0.494 -0.848 ...## $ z_1[1,21] : num 0.73 -0.293 0.184 0.953 0.167 ...## $ z_1[1,22] : num 0.4516 -0.818 -0.3438 -0.0574 -0.1101 ...## $ z_1[1,23] : num 1.48 1.43 1.38 1.42 1.2 ...## $ z_1[1,24] : num -1.57 -1.26 -1.21 -1.14 -1.19 ...## $ z_1[1,25] : num -0.598 -0.723 -0.261 -0.49 -0.52 ...## $ z_1[1,26] : num -1.19 -2.11 -1.23 -1.64 -1.77 ...
🏃
posterior_samples(Model3_CJ) %>% select( sigma, sd_Partic__Intercept ) %>% model_parameters( ) %>% plot(show_labels = TRUE, size_text = 3)
We can also make plots for all mu_j's (aka: the specific residuals for each participant)
posterior_samples(Model3_CJ) %>% select( starts_with("r_Partic") ) %>% pivot_longer( starts_with("r_Partic") ) %>% mutate( mu_j = value ) %>% ggplot( aes(x = mu_j, y = reorder(name, mu_j)) ) + stat_pointinterval( point_interval = mean_qi, .width = .89, size = 1/6 ) + scale_y_discrete("")
Remember! We can simply calculate with the posterior! So, we can plot the differences between our participants by using mu_j and the Intercept estimate (6.63). Sum them together and take the exp()
posterior_samples(Model3_CJ) %>% select( starts_with("r_Partic") ) %>% pivot_longer( starts_with("r_Partic") ) %>% mutate( mu_j = exp(6.63 + value) ) %>% ggplot( aes(x = mu_j, y = reorder(name, mu_j)) ) + stat_pointinterval( point_interval = mean_qi, .width = .89, size = 1/6 ) + scale_y_discrete("") + scale_x_continuous("Reading time (in seconds)")
sjPlots
library(sjPlot)tab_model(Model3_CJ)
Dur Log | ||
---|---|---|
Predictors | Estimates | CI (95%) |
Intercept | 6.63 | 6.25 – 7.01 |
Text Length | 0.39 | 0.31 – 0.48 |
Right Text | -0.34 | -0.52 – -0.19 |
Random Effects | ||
σ2 | 0.90 | |
τ00 Partic | 0.39 | |
ICC | 0.30 | |
N Partic | 26 | |
Observations | 499 | |
Marginal R2 / Conditional R2 | 0.133 / 0.405 |
tab_model(Model2_CJ, Model3_CJ)
Dur Log | Dur Log | |||
---|---|---|---|---|
Predictors | Estimates | CI (95%) | Estimates | CI (95%) |
Intercept | 6.34 | 6.02 – 6.66 | 6.63 | 6.25 – 7.01 |
Text Length | 0.50 | 0.40 – 0.59 | 0.39 | 0.31 – 0.48 |
Right Text | -0.34 | -0.52 – -0.14 | -0.34 | -0.52 – -0.19 |
Random Effects | ||||
σ2 | 0.90 | |||
τ00 | 0.39 Partic | |||
ICC | 0.30 | |||
N | 26 Partic | |||
Observations | 499 | 499 | ||
R2 Bayes | 0.186 | 0.133 / 0.405 |
Set the number of chains; the number of iterations; the number of warm-up
Model3_CJ <- brm( Dur_Log ~ Text_Length + Right_Text + (1|Partic), data = Data_CJ2, family = "gaussian", backend = "cmdstanr", iter = 3000, chains = 8, warmup = 1500, seed = 1975)
Tip: it's better to run less chains and more iterations per chain, so you get more precise parameter estimation
Increase efficiency
Model3_CJ <- brm( Dur_Log ~ Text_Length + Right_Text + (1|Partic), data = Data_CJ2, family = "gaussian", backend = "cmdstanr", iter = 3000, chains = 8, warmup = 1500, cores = 4, seed = 1975)
Modern computers have multiple cores → let them work simultaneously
Hey! It's still the same question!
Does the time that people spend reading a text differs for both texts if we control for the length of the text?
In the previous analyses we neglected that we have multiple observations per pair of texts!
🏃
Model4_CJ <- brm( Dur_Log ~ 1 + Text_Length + Right_Text + (1|Partic) + (1|Compar), data = Data_CJ2, family = "gaussian", backend = "cmdstanr", cores = 4, seed = 1975)
Model4_CJ <- readRDS(here("models", "Model4_CJ.RDS"))summary(Model4_CJ)
## Family: gaussian ## Links: mu = identity; sigma = identity ## Formula: Dur_Log ~ 1 + Text_Length + Right_Text + (1 | Partic) + (1 | Compar) ## Data: Data_CJ2 (Number of observations: 499) ## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;## total post-warmup samples = 4000## ## Group-Level Effects: ## ~Compar (Number of levels: 30) ## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS## sd(Intercept) 0.14 0.07 0.01 0.28 1.00 1179 1770## ## ~Partic (Number of levels: 26) ## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS## sd(Intercept) 0.63 0.11 0.45 0.88 1.00 967 1194## ## Population-Level Effects: ## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS## Intercept 6.63 0.19 6.23 6.99 1.00 1304 2184## Text_Length 0.39 0.04 0.31 0.48 1.00 4341 3143## Right_Text -0.35 0.09 -0.52 -0.18 1.00 4095 2984## ## Family Specific Parameters: ## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS## sigma 0.94 0.03 0.88 1.00 1.00 3609 2829## ## Samples were drawn using sample(hmc). For each parameter, Bulk_ESS## and Tail_ESS are effective sample size measures, and Rhat is the potential## scale reduction factor on split chains (at convergence, Rhat = 1).
Two complementary procedures:
posterior-predictive check
compare models with leave one out cross-validation
A visual check that can be performed with pp_check()
from brms
pp_check(Model4_CJ)
∼ AIC or BIC in Frequentist statistics
ˆelpd: "expected log predictive density" (higher ˆelpd implies better model fit without being sensitive for overfitting!)
loo_Model2 <- loo(Model2_CJ)loo_Model3 <- loo(Model3_CJ)loo_Model4 <- loo(Model4_CJ)Comparison<- loo_compare(loo_Model2, loo_Model3, loo_Model4)print(Comparison, simplify = F)
## elpd_diff se_diff elpd_loo se_elpd_loo p_loo se_p_loo looic ## Model4_CJ 0.0 0.0 -693.8 17.7 33.5 2.4 1387.7## Model3_CJ -0.4 1.5 -694.3 17.8 25.7 1.9 1388.5## Model2_CJ -67.0 11.2 -760.8 15.7 4.1 0.3 1521.7## se_looic## Model4_CJ 35.3 ## Model3_CJ 35.6 ## Model2_CJ 31.3
brms
has some default priors programmed (Uninformative priors)!
But: WAMBS (when to Worry and how to Avoid Misuse of Bayesian Statistics) (Rens van de Schoot et al. (2021))
"Ensure the prior distributions and the model or likelihood are well understood and described in detail in the text. Prior-predictive checking can help identify any prior-data conflict"
Let's first see how to get to know which priors are set by brms
in the model we defined.
get_prior()
function
get_prior( Dur_Log ~ 1 + Text_Length + Right_Text + (1|Partic) + (1|Compar), data = Data_CJ2, family = "gaussian")
## prior class coef group resp dpar nlpar bound## (flat) b ## (flat) b Right_Text ## (flat) b Text_Length ## student_t(3, 7.5, 2.5) Intercept ## student_t(3, 0, 2.5) sd ## student_t(3, 0, 2.5) sd Compar ## student_t(3, 0, 2.5) sd Intercept Compar ## student_t(3, 0, 2.5) sd Partic ## student_t(3, 0, 2.5) sd Intercept Partic ## student_t(3, 0, 2.5) sigma ## source## default## (vectorized)## (vectorized)## default## default## (vectorized)## (vectorized)## (vectorized)## (vectorized)## default
Imagine you want to visualise the prior for the Intercept
& also visualise an alternative prior: N(6,2)
library(metRology) # to use dt.scaled()# Create a vector with possible Dur_log values# and call this vector xx <- seq(1,20,by=.1) # probability density for our own prior# saved in the vector customcustom <- dnorm(x,6,2)# probability density for brms prior# saved in the vector brms_defaultbrms_default <- dt.scaled(x,3,7.5,2.5)# Plot bothdata_frame(x,custom,brms_default) %>% pivot_longer( c(custom,brms_default), names_to = "Prior") %>% ggplot( aes(x = x, y = value, lty = Prior)) + geom_line() + scale_x_continuous( "Dur_log",limits = c(1,20)) + scale_y_continuous( "probability density")
brms
We use set_prior()
to define priors and prior = ...
in the brm()
part
# Create an object with the definition of the priors you want to define manuallypriors_M4 <- c( set_prior("normal(6,2)", class = "Intercept"))Model4b_CJ <- brm( Dur_Log ~ 1 + Text_Length + Right_Text + (1|Partic) + (1|Compar), data = Data_CJ2, family = "gaussian", backend = "cmdstanr", prior = priors_M4, cores = 4, seed = 1975)
if we want to add other manually defined priors as well, we can by doing something like:
priors_Manually <- c( set_prior("normal(6,2)", class = "Intercept"), set_prior("normal(0,3)", class = "b", coef = "Right_Text"), ...)
Bürkner:
The default prior for population-level effects (including monotonic and category specific effects) is an improper flat prior over the reals. https://rdrr.io/cran/brms/man/set_prior.html
If you know that sd for Dur_Log
equals 1.22, what could be a good prior (not to informative) for the effect of:
Text_Length
Right_Text
Option 1: N(0,2) or Option 2: N(0,4)
x <- seq(-8,8,by=.1) # probability densities for bothoption1 <- dnorm(x,0,2)option2 <- dnorm(x,0,4)# Plot bothdata_frame(x, option1, option2) %>% pivot_longer( c(option1, option2), names_to = "Prior") %>% ggplot( aes(x = x, y = value, lty = Prior)) + geom_line() + scale_x_continuous( "Dur_log",limits = c(-8,8)) + scale_y_continuous( "probability density")
🏃
tab_model()
to summarize both models: this model and the model with the brms
default priorsRemember:
# Set the priorspriors_M4b <- c( set_prior("normal(0,2)", class = "b", coef = "Right_Text"), set_prior("normal(0,2)", class = "b", coef = "Text_Length"))# Estimate the modelModel4b_CJ <- brm( Dur_Log ~ 1 + Text_Length + Right_Text + (1|Partic) + (1|Compar), data = Data_CJ2, family = "gaussian", prior = priors_M4b, backend = "cmdstanr", cores = 4, seed = 1975)# Summarize both models side by sidetab_model(Model4_CJ, Model4b_CJ)
Dur Log | Dur Log | |||
---|---|---|---|---|
Predictors | Estimates | CI (95%) | Estimates | CI (95%) |
Intercept | 6.63 | 6.23 – 6.99 | 6.64 | 6.25 – 7.01 |
Text Length | 0.39 | 0.31 – 0.48 | 0.39 | 0.30 – 0.48 |
Right Text | -0.35 | -0.52 – -0.18 | -0.35 | -0.51 – -0.18 |
Random Effects | ||||
σ2 | 0.88 | 0.88 | ||
τ00 | 0.02 Compar | 0.02 Compar | ||
0.39 Partic | 0.39 Partic | |||
ICC | 0.32 | 0.32 | ||
N | 26 Partic | 26 Partic | ||
30 Compar | 30 Compar | |||
Observations | 499 | 499 | ||
Marginal R2 / Conditional R2 | 0.133 / 0.418 | 0.132 / 0.417 |
Now that you're ready to set your steps in Bayesian Mixed Effects Models there is a high probability that this will happen sometime:
This is a great source to learn more about Stan
; brms
& Hamiltonian MCMC
:
https://discourse.mc-stan.org/t/divergent-transitions-a-primer/17099
brms
?brms
is more versatile than lme4
allowing also to estimate
multivariate models
finite mixture models
generalized mixed effects models
splines
...
brms
as stepping-stone to learn Stan
https://www.bayesrulesbook.com/
https://vasishth.github.io/bayescogsci/book/
In Nature Reviews
If you like running - like I do - this could be a great companion on your run!
There are many blogs and websites that you can consult if you want to find out more about making graphs.
One that I often fall back to is:
Do not hesitate to contact me!
sven.demaeyer@uantwerpen.be
University of Antwerp; Faculty of Social Sciences; dept. Training and Education Sciences
phd on methodological issues in research on School Effectiveness
research on a broad range of topics (e.g. psychometrics, writing research, learning strategies, education for sustainable development, linguistics, speech development of impaired children, ...)
my main focus shifted recently to Comparative Judgement and Learning from comparisons
since a year or 2 discovering Bayesian Statistics & brms
Blog: https://svendemaeyer.netlify.app/
Twitter: @svawa
github: https://github.com/Sdemaeyer2
Keyboard shortcuts
↑, ←, Pg Up, k | Go to previous slide |
↓, →, Pg Dn, Space, j | Go to next slide |
Home | Go to first slide |
End | Go to last slide |
Number + Return | Go to specific slide |
b / m / f | Toggle blackout / mirrored / fullscreen mode |
c | Clone slideshow |
p | Toggle presenter mode |
t | Restart the presentation timer |
?, h | Toggle this help |
Esc | Back to slideshow |