+ - 0:00:00
Notes for current slide
Notes for next slide

Workshop: Bayesian analyses in R

Sven De Maeyer

[University of Antwerp / Faculty of Social Sciences]

13/09/2021

1 / 133

About me

  • University of Antwerp; Faculty of Social Sciences; dept. Training and Education Sciences

  • phd on methodological issues in research on School Effectiveness

  • research on a broad range of topics (e.g. psychometrics, writing research, learning strategies, education for sustainable development, linguistics, speech development of impaired children, ...)

  • my main focus shifted recently to Comparative Judgement and Learning from comparisons

  • since a year or 2 discovering Bayesian Statistics & brms




Blog: https://svendemaeyer.netlify.app/

Twitter: @svawa

github: https://github.com/Sdemaeyer2

2 / 133

What about you?

Who has some experience with Bayesian analyses?

... with Stan?

... with brms?




3 / 133

Topics

  1. Aims of the workshop & some practical stuff (let's go there)

  2. What's Bayesian? (let's go there)

  3. Why brms? (let's go there)

  4. brms basic example (let's go there)

  5. Let's get practical: the example data (let's go there)

  6. Let's slightly increase complexity (let's go there)

4 / 133

1. Aims & practical stuff

get me back to the topic slide

5 / 133

My aims are ...



  • to give some background on Bayesian statistics without getting too technical (I hope)
  • to get you familiar with a basic workflow for a Bayesian analysis making use of brms and other great packages
  • to let you apply what's learned on your data (so it's hands-on)
  • to supply you with R code that may inspire
  • to share some great sources
6 / 133

All material is shared...


On Github there is a dedicated repository where you'll find:

  • the Rmarkdown file for the slides

  • saved models from the slides (some take too long to estimate 'live')

  • data used for the exercises


The Github repository can be accessed using the following link:

[https://github.com/Sdemaeyer2/Turku_brms_2021]


The slides are on my Talks site [https://slides-sdemaeyer.netlify.app/]

7 / 133

Zoom


Recordings


Please ask questions and interfere! Just give a shout!


Stuck with some exercises?

Share your screen and I will try to help

8 / 133

Your own data

Day 2:

  • some dedicated time to apply stuff to your own data
  • be prepared:
    • think about some (simple) models (and possible priors...)
    • prepare your data to avoid a lot of data issues on day 2
    • for the brave: try some modelling at home before day 2
9 / 133

Prerequisites

I will often use the functional programming style making use of the pipe operator (works if tidyverse is loaded):

A %>% B

This reads as take A and then do B

An example:

c(1,2,3,4,5) %>% mean()

reads as create a vector of the values 1, 2, 3, 4 and 5 and then calculate the mean

Want to learn dplyr by nice GIFS: hop to this great tweet feed

10 / 133

What's Bayesian inference about?

get me back to the topic slide

11 / 133

First things first...



What is statistical inference about?



In your definition, what is a statistical model?



What are parameters?

12 / 133

Frequentist vs. Bayesian inference

13 / 133

To get us started...

  • prof. dr. Von Helsing (a famous Vampire Hunter)

  • How many Vampires are out there?

  • Time for his famous Vampire Test

  • Sample of 200 citizens

  • Result: 10.5% = Vampire

  • Ok, but what about the population?

14 / 133

What a Frequentist approach can tell us

Make a Confidence Interval for the 10.5%:

  • Estimate the sampling error:

    • sampling error: 0.0217
  • Calculate upper and lower limit of the 95% CI:

    • lower: 0.0625
    • upper: 0.1475
  • Inference:

In 95% of the samples (of sample size 200) we would get a percentage of vampires varying somewhere between 6.25% and 14.75%

15 / 133

The Bayesian approach

Before Von Helsing gathers data he had no knowledge at all!

If we visualize this 'prior knowledge' it could look like this.

16 / 133

The Bayesian approach

Von Helsing starts testing ...

First tested person is NOT a Vampire!

So, 100% of Vampires is already less probable.

Time to adjust our knowledge

17 / 133

The Bayesian approach

After testing the whole sample of 200 citizens, Von Helsing's knowledge is updated:

18 / 133

The Bayesian approach

We get a probability for each possible percentage of vampires.

  • plot it in a probability density plot

  • summarize: e.g. 90% most probable values are situated between 7.5% and 14.8%

19 / 133

Advantages & Disadvantages of Bayesian analyses


Advantages:

  • Natural approach to express uncertainty
  • Ability to incorporate prior knowledge
  • Increased model flexibility
  • Full posterior distribution of the parameters
  • Natural propagation of uncertainty


Disadvantage:

  • Slow speed of model estimation
  • Some reviewers don't understand you ("give me the p-value")

[*] Slight adaptation of a slide from Paul Bürkner's presentation available on YouTube
[https://www.youtube.com/watch?v=FRs1iribZME]

20 / 133

Bayesian Theorem

P(θ|data)=P(data|θ)P(θ)P(data)

with

  • P(data|θ) : the likelihood of the data given our model θ
  • P(θ) : our prior belief about model parameters
  • P(θ|data): the posterior probability about model parameters

meme from https://twitter.com/ChelseaParlett/status/1421291716229746689?s=20

21 / 133

Likelihood

P(data|θ)

sometimes also written as

L(θ|data)


Actually this is our model part


with θ being the model and all parameters in the model

22 / 133

Likelihood

Example:

What if we want to model how fast people can run a 1OK?

Data = 10 observations (Running times for 10K in minutes)

## [1] 52 54 58 48 41 49 72 53 64 62

The model (normal distribution):

yiN(μ,σ)

The parameter values that maximize the likelihood of our data:

Mean_RT <- mean(RT, na.rm = T)
Mean_RT
## [1] 55.3
Sd_RT <- sd(RT, na.rm = T)
Sd_RT
## [1] 8.957306
23 / 133

Prior

Expression of our prior knowledge (belief) about probable parameter values as a probability density function



"For Bayesians, the data are treated as fixed and the parameters vary. [...] Bayes' rule tells us that to calculate the posterior probability distribution we must combine a likelihood with a prior probability distribution over parameter values."
(Lambert, 2018, p.88)

24 / 133

Prior

Example:

What if we want to model how fast people can run a 1OK?

yiN(μ,σ)

Express our prior beliefs about

μ

and

σ

🤔 Please share your thoughts? What are possible values of the population average and standard deviation for our example?

25 / 133

Prior

Uninformative / Vague

When objectivity is crucial and you want let the data speak for itself...

Informative

When including significant information is crucial

  • previously collected data
  • results from former research/analyses
  • data of another source
  • theoretical considerations
26 / 133

Prior

Example:

What if we want to model how fast people can run a 1OK?

Vague priors for μ

27 / 133

Say we have following priors

par(mfrow=c(2,2))
curve( dnorm( x , 50 , 20 ) , from=1 , to=200 ,xlab="mu", main="Prior for mu")
curve( dunif( x , 1 , 40 ) , from=-10 , to=50 ,xlab="sigma", main="Prior for sigma")

28 / 133

Calculate the Posterior by hand (aka grid approximation)

# sample some values for mu and sigma
mu.list <- seq(from = 30,
to = 90,
length.out=200)
sigma.list <- seq(from = 2,
to = 30,
length.out = 200)
post <- expand.grid(mu = mu.list,
sigma = sigma.list)
# Calculate the loglikelihood of the data
# for each parameter value
post$LL <-
sapply(1:nrow(post),
function(i) sum(
dnorm(
RT ,
mean=post$mu[i] ,
sd=post$sigma[i] ,
log=TRUE ) ) )
# Calculate posterior as product
# of LL and Prior
# but you see a '+ sign'
# because we put everything on the
# log scale
# (to avoid getting zero's
# due to rounding in R)
post$prod <- post$LL +
dnorm(post$mu, 50 , 10 , TRUE) +
dunif(post$sigma , 1 , 40 , TRUE)
# Re-scale the posterior
# to the probability scale
post$prob <- exp(post$prod -
max(post$prod)
)
29 / 133

Visualise the posterior distribution

Make a contour plot

# Sampling going on
# So for reproducibility
set.seed(1975)
post %>%
# sample 10000 rows
# with replacement
# higher prob higher prob to
# be sampled
sample_n(size = 10000,
replace = TRUE,
weight = prob) %>%
# create the plot
ggplot(aes(x = mu, y = sigma)) +
geom_density_2d_filled() +
theme_minimal()

30 / 133

Visualise the posterior distributions

Or sample from posterior and plot simple density plots

sample.rows <- sample(1:nrow(post), size=10000, replace=TRUE, prob=post$prob)
sample.mu <- post$mu[sample.rows]
sample.sigma <- post$sigma[sample.rows]
plot(density(sample.mu),main="mu")

plot(density(sample.sigma),main="sigma")

31 / 133

Imagine

A 'simple' linear model


RTiN(μ,σei)μ=β0+β1Weigthi+β2Heighti+β3Agei+β4WeeklyTrainingHoursi+β5Genderi


So you can get a REALLY LARGE number of parameters!

33 / 133

Markov Chain Monte Carlo - Why?

Complex models Large number of parameters exponentional number of combinations!


Posterior gets unsolvable by grid approximation


We will approximate the 'joint posterior' by 'smart' sampling


Samples of combinations of parametervalues are drawn


BUT: samples will not be random!

34 / 133

MCMC - demonstrated

Following link brings you to an interactive tool that let's you get the basic idea behind MCMC sampling:

https://chi-feng.github.io/mcmc-demo/app.html#HamiltonianMC,standard

35 / 133

Side-step: conjugate priors

Some prior probability distributions can be 'more easely' combined with a certain likelihood!

These are called conjugate priors

PRO:

You can actually analytically calculate the posterior. So: no need for all the sampling and approximation of the posterior.

A list of conjugate likelihood-prior pairs: https://en.wikipedia.org/wiki/Conjugate_prior

BUT:

"Simplicity comes with a cost! In most real-life examples of inference, the contraint of choosing likelihood-prior conjugate pairs is too restrictive and can lead us to use models that inadequately capture the variability in the data. (Lambert, 2018, p. 209)"

36 / 133

Software


  • different dedicated software/packages are available: JAGS / BUGS / Stan


  • most powerful is Stan! Specifically the Hamiltonian Monte Carlo algorithm makes it the best choice at the moment


  • Stan is a probabilistic programming language that uses C++
37 / 133

Example of Stan code

## // generated with brms 2.14.4
## functions {
## }
## data {
## int<lower=1> N; // total number of observations
## vector[N] Y; // response variable
## int prior_only; // should the likelihood be ignored?
## }
## transformed data {
## }
## parameters {
## real Intercept; // temporary intercept for centered predictors
## real<lower=0> sigma; // residual SD
## }
## transformed parameters {
## }
## model {
## // likelihood including all constants
## if (!prior_only) {
## // initialize linear predictor term
## vector[N] mu = Intercept + rep_vector(0.0, N);
## target += normal_lpdf(Y | mu, sigma);
## }
## // priors including all constants
## target += normal_lpdf(Intercept | 500,50);
## target += student_t_lpdf(sigma | 3, 0, 68.8)
## - 1 * student_t_lccdf(0 | 3, 0, 68.8);
## }
## generated quantities {
## // actual population-level intercept
## real b_Intercept = Intercept;
## }
38 / 133

alt text

39 / 133

4. brms basic example

get me back to the topic slide

40 / 133

brms syntax

Very very similar to lme4 and in line with typical R-style writing up of a model ...

lme4

Model <- lmer(
y ~ x1 + x2 + (1|Group),
data = Data,
...
)

brms

Model <- brm(
y ~ x1 + x2 + (1|Group),
data = Data,
family = "gaussian",
backend = "cmdstanr"
...
)

Notice:

  • family = "gaussian" indicates the "likelihood" function we will use
  • backend = "cmdstanr" indicates the way we want to interact with Stan and C++
41 / 133

Let's retake the example on running

Example:

What if we want to model how fast people can run a 1OK?

The simplest model looked like:

RTiN(μ,σe)

In brms this model is:

RT <- c(
52, 54, 58, 48, 41, 49, 72, 53, 64, 62
)
# First make a dataset from our RT vector
DataRT <- data_frame(RT)
Mod_RT1 <- brm(
RT ~ 1, # We only model an intercept
data = DataRT,
backend = "cmdstanr",
seed = 1975
)

🏃 Try it yourself and run the model ... (don't forget to load the necessary packages: brms & tidyverse)

42 / 133

43 / 133

But ...

44 / 133

So

For the workshop all models are in the models folder

You can normally read model information with readRDS

Here's an example:

Mod_RT1 <- readRDS(here("models", "Mod_RT1.RDS"))

This model information was saved with the saveRDS function:

saveRDS(Mod_RT1, here("models", "Mod_RT1.RDS"))

Notice that I use the package here allowing me to access files anywhere in the RStudio Projects folder using the here( ) commando (no fuzz with paths on your computer)

45 / 133

MCMC = sampling

MCMC results will differ each time run


Of course, not a lot if your model is good!


So, what if you want to be reproducible?

Mod_RT1 <- brm(
RT ~ 1, # We only model an intercept
data = DataRT,
backend = "cmdstanr",
seed = 1975 # make sure we get the same results each time :-)
)
46 / 133

Good old summary( ) function

summary(Mod_RT1)
## Family: gaussian
## Links: mu = identity; sigma = identity
## Formula: RT ~ 1
## Data: DataRT (Number of observations: 10)
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup samples = 4000
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 55.10 2.82 49.68 60.90 1.00 2000 2180
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma 9.47 2.32 6.17 15.08 1.00 2445 2318
##
## Samples were drawn using sample(hmc). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
47 / 133

Samples & chains?

## Family: gaussian
## Links: mu = identity; sigma = identity
## Formula: RT ~ 1
## Data: DataRT (Number of observations: 10)
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup samples = 4000
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 55.10 2.82 49.68 60.90 1.00 2000 2180
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma 9.47 2.32 6.17 15.08 1.00 2445 2318
##
## Samples were drawn using sample(hmc). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

By default brms sets 4 chains of 2000 iterations of which 1000 iterations/chain are warm-up

48 / 133

Samples & chains: what are 'burn-in' iterations

49 / 133

Samples & chains: what are 'burn-in' iterations

50 / 133

Samples & chains: what are 'burn-in' iterations

51 / 133

Good ald plot( ) function

plot(Mod_RT1)

Left panel

the posterior distributions (later we will see other more informative ways to plot the information in the posterior)

Right panel

the convergence of the each parameter

should look like a caterpillar

52 / 133

Model Convergence

  • R^ < 1.015 for each parameter estimate

  • at least 4 chains are recommended

  • Effective Sample Size (ESS) > 400 to rely on R^

53 / 133

Let's inspect the output again

## Family: gaussian
## Links: mu = identity; sigma = identity
## Formula: RT ~ 1
## Data: DataRT (Number of observations: 10)
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup samples = 4000
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 55.10 2.82 49.68 60.90 1.00 2000 2180
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma 9.47 2.32 6.17 15.08 1.00 2445 2318
##
## Samples were drawn using sample(hmc). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
54 / 133

5. Let's get practical: the example dataset

get me back to the topic slide

55 / 133

Judging 'argumentative texts'

alt text

56 / 133

Procedure

  • 26 high school teachers (Dutch) voluntary participated

  • each did 10 comparisons of 2 argumentative texts from 10th graders

  • 3 batches of comparisons with random allocation of judges to one of the batches

  • all batches similar composition of comparisons regarding the characteristics of the pairs; the pairs, however not the same

  • Tobii TX300 dark pupil eye-tracker with a 23-inch TFT monitor (max. resolution of 1920 x 1080 pixels)

  • data sampled binocularly at 300 Hz

57 / 133

AOI's

alt text

58 / 133

The data

59 / 133

6. Let's slightly increase complexity

get me back to the topic slide

60 / 133

Question 1

How much time do people spend in a text (in total) when comparing two texts?

A simple model could look like:

log(Timei)N(μ,σe)

with Timei being the total time for each combination of participant, comparison and AOI_type

Durations_CJ %>%
group_by(Partic, Compar, AOI_Type) %>%
summarize(
Dur_Seconds = mean(Dur_Seconds),
Dur_Log = mean(Dur_Log)
) %>% head(5)
## # A tibble: 5 x 5
## # Groups: Partic, Compar [3]
## Partic Compar AOI_Type Dur_Seconds Dur_Log
## <dbl> <chr> <chr> <dbl> <dbl>
## 1 1 DHH 214-425 AOI_LEFT_TEXT 7.12 6.61
## 2 1 DHH 214-425 AOI_RIGHT_TEXT 15.3 8.55
## 3 1 DLL 508- 51 AOI_LEFT_TEXT 4.61 7.31
## 4 1 DLL 508- 51 AOI_RIGHT_TEXT 3.28 6.63
## 5 1 DMH 414-609 AOI_LEFT_TEXT 1.93 6.82
61 / 133

Question 1

How much time do people spend in a text (in total) when comparing two texts?

A simple model could look like:

log(Timei)N(μ,σe)


🏃

  • Try it yourself and run the model
  • Inspect convergence
  • Interpret the results
62 / 133

Question 1

First we make a dataset

Data_CJ1 <- Durations_CJ %>%
group_by(Partic, Compar, AOI_Type) %>%
summarize(
Dur_Seconds = mean(Dur_Seconds),
Dur_Log = mean(Dur_Log)
)

Then we can run the model

Model1_CJ <- brm(
Dur_Log ~ 1,
data = Data_CJ1,
family = "gaussian",
backend = "cmdstanr",
seed = 1975
)
63 / 133

Question 1

Model1_CJ <- readRDS(here("models", "Model1_CJ.RDS"))
summary(Model1_CJ)
## Family: gaussian
## Links: mu = identity; sigma = identity
## Formula: Dur_Log ~ 1
## Data: Data_CJ1 (Number of observations: 499)
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup samples = 4000
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 7.65 0.06 7.54 7.76 1.00 3411 2340
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma 1.23 0.04 1.15 1.31 1.00 3446 2663
##
## Samples were drawn using sample(hmc). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
64 / 133

Convergence is clearly ok!

Question 1

plot(Model1_CJ)

65 / 133

Further exploration of the posterior

Pull the 4000 samples of our parameter values from the posterior with

posterior_samples()

posterior_samples(Model1_CJ) %>%
head(10)
## b_Intercept sigma Intercept lp__
## 1 7.64887 1.15629 7.64887 -812.378
## 2 7.63948 1.21292 7.63948 -810.850
## 3 7.67498 1.22842 7.67498 -810.910
## 4 7.59150 1.23610 7.59150 -811.445
## 5 7.68766 1.24512 7.68766 -811.191
## 6 7.59683 1.22045 7.59683 -811.293
## 7 7.69482 1.24441 7.69482 -811.274
## 8 7.61122 1.19426 7.61122 -811.343
## 9 7.62011 1.16751 7.62011 -812.042
## 10 7.61525 1.17405 7.61525 -811.850

*with the head() function we ask to only show the first 10 rows

66 / 133

Time to unpack the plotting power of ggplot2

First we set a custom theme that makes our plots look 'better'

theme_set(theme_linedraw() +
theme(text = element_text(family = "Times"),
panel.grid = element_blank()))
67 / 133

Time to unpack the plotting power of ggplot2

Package tidybayes is a wrapper including functions from ggdist that open a lot of plotting options

posterior_samples(Model1_CJ) %>% # Get draws of the posterior
select(b_Intercept) %>% # Only select our Intercept
pivot_longer(everything()) %>% # Rearrange the result to plot
ggplot(aes(x = value, y = name)) + # Let's plot...
stat_pointinterval(.width = .89) + # 89% CI
xlab("marginal posterior") # Set our x-axis label

68 / 133

Your turn

🏃

  • Make a similar plot for the estimate of the standard deviation
  • With a CI of 80%
  • Interpret the results
69 / 133

Solution...

posterior_samples(Mode1l_CJ) %>% # Get draws of the posterior
select(sigma) %>% # Only select sigma
pivot_longer(everything()) %>% # Rearrange the result to plot
ggplot(aes(x = value, y = name)) + # Let's plot...
stat_pointinterval(.width = .80) + # 80% CI
xlab("marginal posterior") # Set our x-axis label

70 / 133

Another type of plot with some more information in it

Let's use stat_interval() from the ggdist package

posterior_samples(Model1_CJ) %>%
select(b_Intercept) %>%
ggplot(aes(x = b_Intercept, y = 0)) +
stat_interval(
.width = c(.95,.90,.50) # The CI's to plot
) +
xlab("marginal posterior") +
scale_y_continuous( # No y-axis information
NULL,
breaks = NULL
) +
scale_color_brewer()

71 / 133

Your turn

🏃

  • Make a similar plot for the estimate of the standard deviation
  • With a CI of 50%, 80%, 89%, 95%
72 / 133

Solution

posterior_samples(Model1_CJ) %>%
select(sigma) %>%
ggplot(aes(x = sigma, y = 0)) +
stat_interval(
.width = c(.95, .89,.80,.50) # The CI's to plot
) +
xlab("marginal posterior") +
scale_y_continuous(
NULL,
breaks = NULL
) +
scale_color_brewer()

73 / 133

We can also integrate plots for mu and sigma

And we introduce the stat_halfeye() function of ggdist

names <- c("mu","sigma")
posterior_samples(Model1_CJ) %>%
select(b_Intercept,sigma) %>%
set_names(names) %>%
pivot_longer(everything()) %>%
ggplot(aes(x = value, y = 0)) +
stat_halfeye(.width = .89, normalize = "panels") +
scale_y_continuous(NULL, breaks = NULL) +
xlab("marginal posterior") +
facet_wrap(~name, scales = "free")

74 / 133

Your turn

🏃

  • Try to replicate the graph
  • But with other CI's (e.g. for 50%, 90% and 95%)
75 / 133

Solution

names <- c("mu","sigma")
posterior_samples(Model1_CJ) %>%
select(b_Intercept,sigma) %>%
set_names(names) %>%
pivot_longer(everything()) %>%
ggplot(aes(x = value, y = 0)) +
stat_halfeye(
.width = c(.50, .90, .95), normalize = "panels") +
scale_y_continuous(NULL, breaks = NULL) +
xlab("marginal posterior") +
facet_wrap(~name, scales = "free")

76 / 133

How to get rid of the log?

We can simply do calculations with our posterior samples! So, we can just use exp():

names <- c("mu_log_scale","mu_sec", "sigma")
posterior_samples(Model1_CJ) %>%
select(b_Intercept,sigma) %>%
mutate(
mu_sec = exp(b_Intercept)
) %>%
# we do the select() again to reorder the columns
# this is not super necessary, but then look out
# with setting the names of the columns!
select(b_Intercept, mu_sec, sigma) %>%
set_names(names) %>%
pivot_longer(everything()) %>%
ggplot(aes(x = value, y = 0)) +
stat_halfeye(.width = c(.5, .9, .95), normalize = "panels") +
scale_y_continuous(NULL, breaks = NULL) +
xlab("marginal posterior") +
facet_wrap(~name, scales = "free")

77 / 133

Question 2

Does the time that people spend reading a text differs for both texts if we control for the length of the text?

A model could look like:

log(Timei)N(μ,σei)μ=β0+β1Right_Texti+β2Text_Lengthi

load(here("data", "Data_CJ2.Rdata"))
head(Data_CJ2, 6)
## # A tibble: 6 x 7
## # Groups: Partic, Compar [3]
## Partic Compar AOI_Type Dur_Seconds Dur_Log Right_Text Text_Length
## <dbl> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 1 DHH 214-425 AOI_LEFT_TEXT 7.12 6.61 0 2.72
## 2 1 DHH 214-425 AOI_RIGHT_TEXT 15.3 8.55 1 2.48
## 3 1 DLL 508- 51 AOI_LEFT_TEXT 4.61 7.31 0 3.11
## 4 1 DLL 508- 51 AOI_RIGHT_TEXT 3.28 6.63 1 3.13
## 5 1 DMH 414-609 AOI_LEFT_TEXT 1.93 6.82 0 1.78
## 6 1 DMH 414-609 AOI_RIGHT_TEXT 1.53 6.33 1 3.89
78 / 133

Your turn

🏃

  • Try to run the model with brms
  • Check model convergence
  • Make one plot to visualise the posteriors for the regression parameters
79 / 133

Solution

Model2_CJ <- brm(
Dur_Log ~ 1 + Text_Length + Right_Text,
data = Data_CJ2,
family = "gaussian",
backend = "cmdstanr",
seed = 1975
)
80 / 133

Solution

Visual checking of convergence

plot(Model2_CJ)

81 / 133

Solution: check output

summary(Model2_CJ)
## Family: gaussian
## Links: mu = identity; sigma = identity
## Formula: Dur_Log ~ 1 + Text_Length + Right_Text
## Data: Data_CJ2 (Number of observations: 499)
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup samples = 4000
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 6.34 0.16 6.02 6.66 1.00 4980 3109
## Text_Length 0.50 0.05 0.40 0.59 1.00 4968 3327
## Right_Text -0.33 0.10 -0.52 -0.14 1.00 4424 2890
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma 1.11 0.04 1.04 1.18 1.00 4759 3210
##
## Samples were drawn using sample(hmc). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
82 / 133

Solution: create plot on posteriors for beta's

posterior_samples(Model2_CJ) %>%
select(
b_Intercept,
b_Text_Length,
b_Right_Text
) %>%
pivot_longer(everything()) %>%
ggplot(aes(x = value, y = 0)) +
stat_halfeye(
.width = c(.5, .9, .95),
normalize = "panels") +
scale_y_continuous(NULL, breaks = NULL) +
xlab("marginal posterior") +
facet_wrap(~name, scales = "free")

83 / 133

Let's introduce mcmc_areas() from the bayesplot

mcmc_areas(
Model2_CJ,
pars = c("b_Text_Length",
"b_Right_Text"
),
prob = 0.89
)

84 / 133

or mcmc_areas_ridges from bayesplot

mcmc_areas_ridges(
Model2_CJ,
pars = c("b_Text_Length",
"b_Right_Text"
),
prob = 0.89
)

85 / 133

or mcmc_intervals from bayesplot

& let's play around with the color_scheme_set() that works with all bayesplot functions...

color_scheme_set("red")
mcmc_intervals(
Model2_CJ,
pars = c("b_Text_Length",
"b_Right_Text"),
prob = c(0.5, 0.89, 0.95)
)

86 / 133

CI's (ETI's) vs. HDI's

87 / 133

Alternative for p-values?

88 / 133

Alternative for p-values?

ROPE : Region Of Practical Equivalence

Set a region of parameter values that can be considered equivalent to having no effect

  • in standard effect sizes the advised default is a range of -0.1 to 0.1

  • this equals half of a small effect size (Cohen, 1988)

  • all parameter values in that range are set equal to no effect

89 / 133

Alternative for p-values?

ROPE + HDI rule

  • 95% of HDI out of ROPE H0 rejected

  • 95% of HDI all in ROPE H0 not rejected

  • 95% of HDI partially out of ROPE undecided

90 / 133

HDI + ROPE with brms results

We can use the equivalence_test() function of the bayestestR package

equivalence_test(Model2_CJ)
## # Test for Practical Equivalence
##
## ROPE: [-0.12 0.12]
##
## Parameter | H0 | inside ROPE | 95% HDI
## ----------------------------------------------------
## Intercept | Rejected | 0.00 % | [ 5.99 6.63]
## Text_Length | Rejected | 0.00 % | [ 0.41 0.59]
## Right_Text | Rejected | 0.00 % | [-0.52 -0.14]
91 / 133

Or the parameters package

This package has the nice flexible model_parameters() function!

posterior_samples(Model2_CJ) %>%
model_parameters(rope_range=c(-0.12,0.12))
## # Fixed Effects
##
## Parameter | Median | 89% CI | pd | % in ROPE
## -----------------------------------------------------------------
## b_Intercept | 6.34 | [ 6.07, 6.59] | 100% | 0%
## b_Text_Length | 0.50 | [ 0.42, 0.57] | 100% | 0%
## b_Right_Text | -0.34 | [ -0.49, -0.17] | 99.95% | 1.43%
## sigma | 1.11 | [ 1.05, 1.16] | 100% | 0%
## Intercept | 7.65 | [ 7.57, 7.73] | 100% | 0%
## lp__ | -761.62 | [-763.69, -760.08] | 100% | 0%

pd stands for probability of direction

92 / 133

Now it's getting more fun

We can combine model_parameters() with plot() because the see package deals with the output!

posterior_samples(Model2_CJ) %>%
select(
b_Right_Text,
b_Text_Length
) %>%
model_parameters(
) %>%
plot(show_labels = TRUE,
size_text = 3)

93 / 133

Now it's getting more fun

And another one ...

We can combine equivalence_test with plot() as well thx to the see package!

equivalence_test(Model2_CJ) %>%
plot( )

94 / 133

"Hey! You promised mixed effects model"

Question 3

Does the time that people spend reading a text differs for both texts if we control for the length of the text?

In the previous analyses we neglected that we have multiple observations per person!

A mixed effects model could look like:

log(Timeij)N(μ,σeij)μ=β0+β1Right_Texti+β2Text_Lengthi+μjμjN(0,σuj)

95 / 133

Adding random term to the model

Model3_CJ <- brm(
Dur_Log ~ Text_Length + Right_Text + (1|Partic),
data = Data_CJ2,
family = "gaussian",
backend = "cmdstanr",
seed = 1975
)

🏃

  • Estimate the model
  • Check the output
  • Interpret the results
96 / 133

Solution

Model3_CJ <- readRDS(here("models", "Model3_CJ.RDS"))
summary(Model3_CJ)
## Family: gaussian
## Links: mu = identity; sigma = identity
## Formula: Dur_Log ~ 1 + Text_Length + Right_Text + (1 | Partic)
## Data: Data_CJ2 (Number of observations: 499)
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup samples = 4000
##
## Group-Level Effects:
## ~Partic (Number of levels: 26)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 0.62 0.11 0.45 0.87 1.00 1014 1218
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 6.63 0.19 6.25 7.01 1.00 1543 2156
## Text_Length 0.39 0.04 0.31 0.48 1.00 6940 2934
## Right_Text -0.35 0.08 -0.52 -0.19 1.00 6448 2957
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma 0.95 0.03 0.89 1.01 1.00 5550 2644
##
## Samples were drawn using sample(hmc). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
97 / 133

What do we get?

If we want to plot information on the variances (or sd) we have to know the names of elements

posterior_samples(Model3_CJ) %>%
str()
## 'data.frame': 4000 obs. of 59 variables:
## $ b_Intercept : num 6.63 6.79 6.66 6.74 6.57 ...
## $ b_Text_Length : num 0.419 0.396 0.404 0.347 0.373 ...
## $ b_Right_Text : num -0.309 -0.403 -0.288 -0.258 -0.286 ...
## $ sd_Partic__Intercept : num 0.505 0.531 0.537 0.637 0.589 ...
## $ sigma : num 0.928 0.937 0.939 0.936 0.874 ...
## $ Intercept : num 7.73 7.78 7.72 7.65 7.54 ...
## $ r_Partic[1,Intercept] : num -0.141 -0.673 -0.112 0.07 -0.57 ...
## $ r_Partic[2,Intercept] : num 0.2899 -0.0303 0.5392 0.2669 0.5086 ...
## $ r_Partic[3,Intercept] : num 0.619 0.767 0.996 1.152 0.998 ...
## $ r_Partic[4,Intercept] : num -0.3145 0.0216 -0.3654 0.1749 0.0874 ...
## $ r_Partic[5,Intercept] : num -0.3268 -0.0846 0.1697 -0.0618 -0.1935 ...
## $ r_Partic[6,Intercept] : num 0.492 0.142 0.256 0.332 0.393 ...
## $ r_Partic[7,Intercept] : num -0.617 -0.777 -0.576 -0.444 -0.248 ...
## $ r_Partic[8,Intercept] : num 0.1582 -0.1739 -0.0464 0.0666 0.1937 ...
## $ r_Partic[9,Intercept] : num 0.1173 -0.4262 -0.2635 -0.4233 0.0362 ...
## $ r_Partic[10,Intercept]: num -0.0433 0.2303 -0.0641 0.3779 0.061 ...
## $ r_Partic[11,Intercept]: num 0.0683 0.4425 0.1351 0.0419 0.4656 ...
## $ r_Partic[12,Intercept]: num -0.0778 0.5016 0.1428 0.1423 0.1359 ...
## $ r_Partic[13,Intercept]: num 0.533 0.211 0.403 0.8 0.465 ...
## $ r_Partic[14,Intercept]: num -0.0189 0.7486 0.2406 0.8047 0.8817 ...
## $ r_Partic[15,Intercept]: num 0.552 0.526 0.492 0.843 0.495 ...
## $ r_Partic[16,Intercept]: num -1.06 -1.61 -1.61 -1.53 -1.62 ...
## $ r_Partic[17,Intercept]: num 0.293 -0.11 -0.366 0.151 0.181 ...
## $ r_Partic[18,Intercept]: num 0.358 0.562 0.7 0.415 0.357 ...
## $ r_Partic[19,Intercept]: num -0.898 -0.246 -0.3 -0.456 -0.226 ...
## $ r_Partic[20,Intercept]: num -0.892 -0.2 -0.697 -0.315 -0.5 ...
## $ r_Partic[21,Intercept]: num 0.3688 -0.1558 0.099 0.6076 0.0982 ...
## $ r_Partic[22,Intercept]: num 0.2282 -0.4344 -0.1846 -0.0366 -0.0649 ...
## $ r_Partic[23,Intercept]: num 0.749 0.76 0.743 0.904 0.707 ...
## $ r_Partic[24,Intercept]: num -0.792 -0.671 -0.647 -0.726 -0.701 ...
## $ r_Partic[25,Intercept]: num -0.302 -0.384 -0.14 -0.312 -0.306 ...
## $ r_Partic[26,Intercept]: num -0.599 -1.12 -0.663 -1.047 -1.045 ...
## $ lp__ : num -726 -730 -726 -722 -725 ...
## $ z_1[1,1] : num -0.28 -1.267 -0.208 0.11 -0.968 ...
## $ z_1[1,2] : num 0.5736 -0.0571 1.0042 0.4188 0.8632 ...
## $ z_1[1,3] : num 1.22 1.44 1.86 1.81 1.69 ...
## $ z_1[1,4] : num -0.6223 0.0406 -0.6806 0.2744 0.1483 ...
## $ z_1[1,5] : num -0.647 -0.159 0.316 -0.097 -0.329 ...
## $ z_1[1,6] : num 0.974 0.267 0.476 0.52 0.667 ...
## $ z_1[1,7] : num -1.221 -1.464 -1.072 -0.697 -0.421 ...
## $ z_1[1,8] : num 0.3131 -0.3275 -0.0864 0.1046 0.3287 ...
## $ z_1[1,9] : num 0.2322 -0.8027 -0.4908 -0.6641 0.0614 ...
## $ z_1[1,10] : num -0.0858 0.4337 -0.1193 0.5929 0.1036 ...
## $ z_1[1,11] : num 0.1351 0.8334 0.2516 0.0658 0.7903 ...
## $ z_1[1,12] : num -0.154 0.945 0.266 0.223 0.231 ...
## $ z_1[1,13] : num 1.055 0.398 0.751 1.255 0.788 ...
## $ z_1[1,14] : num -0.0374 1.4099 0.4482 1.2623 1.4965 ...
## $ z_1[1,15] : num 1.091 0.99 0.916 1.323 0.84 ...
## $ z_1[1,16] : num -2.1 -3.02 -3 -2.4 -2.75 ...
## $ z_1[1,17] : num 0.579 -0.208 -0.682 0.237 0.307 ...
## $ z_1[1,18] : num 0.708 1.059 1.304 0.651 0.606 ...
## $ z_1[1,19] : num -1.777 -0.463 -0.559 -0.715 -0.384 ...
## $ z_1[1,20] : num -1.765 -0.376 -1.298 -0.494 -0.848 ...
## $ z_1[1,21] : num 0.73 -0.293 0.184 0.953 0.167 ...
## $ z_1[1,22] : num 0.4516 -0.818 -0.3438 -0.0574 -0.1101 ...
## $ z_1[1,23] : num 1.48 1.43 1.38 1.42 1.2 ...
## $ z_1[1,24] : num -1.57 -1.26 -1.21 -1.14 -1.19 ...
## $ z_1[1,25] : num -0.598 -0.723 -0.261 -0.49 -0.52 ...
## $ z_1[1,26] : num -1.19 -2.11 -1.23 -1.64 -1.77 ...
98 / 133

Your turn!

🏃

  • Make a plot to summarise the posterior for sigma & the sd for participants
  • If your quick! Play around with different options we saw
99 / 133

A possible solution:

posterior_samples(Model3_CJ) %>%
select(
sigma,
sd_Partic__Intercept
) %>%
model_parameters(
) %>%
plot(show_labels = TRUE,
size_text = 3)

100 / 133

Make plot for random effects

We can also make plots for all mu_j's (aka: the specific residuals for each participant)

posterior_samples(Model3_CJ) %>%
select(
starts_with("r_Partic")
) %>%
pivot_longer(
starts_with("r_Partic")
) %>%
mutate(
mu_j = value
) %>%
ggplot(
aes(x = mu_j, y = reorder(name, mu_j))
) +
stat_pointinterval(
point_interval = mean_qi,
.width = .89,
size = 1/6
) +
scale_y_discrete("")

101 / 133

Make plot for random effects

Remember! We can simply calculate with the posterior! So, we can plot the differences between our participants by using mu_j and the Intercept estimate (6.63). Sum them together and take the exp()

posterior_samples(Model3_CJ) %>%
select(
starts_with("r_Partic")
) %>%
pivot_longer(
starts_with("r_Partic")
) %>%
mutate(
mu_j = exp(6.63 + value)
) %>%
ggplot(
aes(x = mu_j, y = reorder(name, mu_j))
) +
stat_pointinterval(
point_interval = mean_qi,
.width = .89,
size = 1/6
) +
scale_y_discrete("") +
scale_x_continuous("Reading time (in seconds)")

102 / 133

Another great package sjPlots

library(sjPlot)
tab_model(Model3_CJ)
  Dur Log
Predictors Estimates CI (95%)
Intercept 6.63 6.25 – 7.01
Text Length 0.39 0.31 – 0.48
Right Text -0.34 -0.52 – -0.19
Random Effects
σ2 0.90
τ00 Partic 0.39
ICC 0.30
N Partic 26
Observations 499
Marginal R2 / Conditional R2 0.133 / 0.405
103 / 133

Combine models output in a table

tab_model(Model2_CJ, Model3_CJ)
  Dur Log Dur Log
Predictors Estimates CI (95%) Estimates CI (95%)
Intercept 6.34 6.02 – 6.66 6.63 6.25 – 7.01
Text Length 0.50 0.40 – 0.59 0.39 0.31 – 0.48
Right Text -0.34 -0.52 – -0.14 -0.34 -0.52 – -0.19
Random Effects
σ2   0.90
τ00   0.39 Partic
ICC   0.30
N   26 Partic
Observations 499 499
R2 Bayes 0.186 0.133 / 0.405
104 / 133

How to tweak the estimation?

Set the number of chains; the number of iterations; the number of warm-up

Model3_CJ <- brm(
Dur_Log ~ Text_Length + Right_Text + (1|Partic),
data = Data_CJ2,
family = "gaussian",
backend = "cmdstanr",
iter = 3000,
chains = 8,
warmup = 1500,
seed = 1975
)

Tip: it's better to run less chains and more iterations per chain, so you get more precise parameter estimation

105 / 133

How to tweak the estimation?

Increase efficiency

Model3_CJ <- brm(
Dur_Log ~ Text_Length + Right_Text + (1|Partic),
data = Data_CJ2,
family = "gaussian",
backend = "cmdstanr",
iter = 3000,
chains = 8,
warmup = 1500,
cores = 4,
seed = 1975
)

Modern computers have multiple cores let them work simultaneously

106 / 133

Question 4

Hey! It's still the same question!

Does the time that people spend reading a text differs for both texts if we control for the length of the text?

In the previous analyses we neglected that we have multiple observations per pair of texts!

Your turn! You have the power!

🏃

  • Do the necessary analyses
  • Make a plot to summarise the posterior for sigma & the sd's
  • If your quick! Play around with different options we saw...
107 / 133

Solution:

Model4_CJ <- brm(
Dur_Log ~ 1 + Text_Length + Right_Text + (1|Partic) + (1|Compar),
data = Data_CJ2,
family = "gaussian",
backend = "cmdstanr",
cores = 4,
seed = 1975
)
108 / 133
Model4_CJ <- readRDS(here("models", "Model4_CJ.RDS"))
summary(Model4_CJ)
## Family: gaussian
## Links: mu = identity; sigma = identity
## Formula: Dur_Log ~ 1 + Text_Length + Right_Text + (1 | Partic) + (1 | Compar)
## Data: Data_CJ2 (Number of observations: 499)
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup samples = 4000
##
## Group-Level Effects:
## ~Compar (Number of levels: 30)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 0.14 0.07 0.01 0.28 1.00 1179 1770
##
## ~Partic (Number of levels: 26)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 0.63 0.11 0.45 0.88 1.00 967 1194
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 6.63 0.19 6.23 6.99 1.00 1304 2184
## Text_Length 0.39 0.04 0.31 0.48 1.00 4341 3143
## Right_Text -0.35 0.09 -0.52 -0.18 1.00 4095 2984
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma 0.94 0.03 0.88 1.00 1.00 3609 2829
##
## Samples were drawn using sample(hmc). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
109 / 133

But is it a good model?

Two complementary procedures:

  • posterior-predictive check

  • compare models with leave one out cross-validation

110 / 133

Posterior-predictive check

A visual check that can be performed with pp_check() from brms

pp_check(Model4_CJ)

111 / 133

Model comparison with loo cross-validation

AIC or BIC in Frequentist statistics

elpd^: "expected log predictive density" (higher elpd^ implies better model fit without being sensitive for overfitting!)

loo_Model2 <- loo(Model2_CJ)
loo_Model3 <- loo(Model3_CJ)
loo_Model4 <- loo(Model4_CJ)
Comparison<-
loo_compare(loo_Model2,
loo_Model3,
loo_Model4)
print(Comparison, simplify = F)
## elpd_diff se_diff elpd_loo se_elpd_loo p_loo se_p_loo looic
## Model4_CJ 0.0 0.0 -693.8 17.7 33.5 2.4 1387.7
## Model3_CJ -0.4 1.5 -694.3 17.8 25.7 1.9 1388.5
## Model2_CJ -67.0 11.2 -760.8 15.7 4.1 0.3 1521.7
## se_looic
## Model4_CJ 35.3
## Model3_CJ 35.6
## Model2_CJ 31.3
112 / 133

What about setting your own priors?

brms has some default priors programmed (Uninformative priors)!

But: WAMBS (when to Worry and how to Avoid Misuse of Bayesian Statistics) (Rens van de Schoot et al. (2021))

"Ensure the prior distributions and the model or likelihood are well understood and described in detail in the text. Prior-predictive checking can help identify any prior-data conflict"

113 / 133

What about setting your own priors?

Let's first see how to get to know which priors are set by brms in the model we defined.

get_prior() function

get_prior(
Dur_Log ~ 1 + Text_Length + Right_Text + (1|Partic) + (1|Compar),
data = Data_CJ2,
family = "gaussian")
## prior class coef group resp dpar nlpar bound
## (flat) b
## (flat) b Right_Text
## (flat) b Text_Length
## student_t(3, 7.5, 2.5) Intercept
## student_t(3, 0, 2.5) sd
## student_t(3, 0, 2.5) sd Compar
## student_t(3, 0, 2.5) sd Intercept Compar
## student_t(3, 0, 2.5) sd Partic
## student_t(3, 0, 2.5) sd Intercept Partic
## student_t(3, 0, 2.5) sigma
## source
## default
## (vectorized)
## (vectorized)
## default
## default
## (vectorized)
## (vectorized)
## (vectorized)
## (vectorized)
## default
114 / 133

Visualise priors

Imagine you want to visualise the prior for the Intercept & also visualise an alternative prior: N(6,2)

library(metRology) # to use dt.scaled()
# Create a vector with possible Dur_log values
# and call this vector x
x <- seq(1,20,by=.1)
# probability density for our own prior
# saved in the vector custom
custom <- dnorm(x,6,2)
# probability density for brms prior
# saved in the vector brms_default
brms_default <- dt.scaled(x,3,7.5,2.5)
# Plot both
data_frame(x,custom,brms_default) %>%
pivot_longer(
c(custom,brms_default),
names_to = "Prior") %>%
ggplot(
aes(x = x,
y = value, lty = Prior)) +
geom_line() +
scale_x_continuous(
"Dur_log",limits = c(1,20)) +
scale_y_continuous(
"probability density")

115 / 133

Change priors in brms

We use set_prior() to define priors and prior = ... in the brm() part

# Create an object with the definition of the priors you want to define manually
priors_M4 <- c(
set_prior("normal(6,2)", class = "Intercept"))
Model4b_CJ <- brm(
Dur_Log ~ 1 + Text_Length + Right_Text + (1|Partic) + (1|Compar),
data = Data_CJ2,
family = "gaussian",
backend = "cmdstanr",
prior = priors_M4,
cores = 4,
seed = 1975
)

if we want to add other manually defined priors as well, we can by doing something like:

priors_Manually <- c(
set_prior("normal(6,2)", class = "Intercept"),
set_prior("normal(0,3)", class = "b", coef = "Right_Text"),
...
)
116 / 133

Let's think about the effects we expect

Bürkner:

The default prior for population-level effects (including monotonic and category specific effects) is an improper flat prior over the reals. https://rdrr.io/cran/brms/man/set_prior.html

If you know that sd for Dur_Log equals 1.22, what could be a good prior (not to informative) for the effect of:

  • Text_Length

  • Right_Text

117 / 133

Some possibilities

Option 1: N(0,2) or Option 2: N(0,4)

x <- seq(-8,8,by=.1)
# probability densities for both
option1 <- dnorm(x,0,2)
option2 <- dnorm(x,0,4)
# Plot both
data_frame(x, option1, option2) %>%
pivot_longer(
c(option1, option2),
names_to = "Prior") %>%
ggplot(
aes(x = x,
y = value, lty = Prior)) +
geom_line() +
scale_x_continuous(
"Dur_log",limits = c(-8,8)) +
scale_y_continuous(
"probability density")

118 / 133

Your turn!

🏃

  • Re-estimate the model 4 with a custom prior for both regression effects
  • Use tab_model() to summarize both models: this model and the model with the brms default priors
  • Big changes?

Remember:

119 / 133

Possible solution

# Set the priors
priors_M4b <- c(
set_prior("normal(0,2)", class = "b", coef = "Right_Text"),
set_prior("normal(0,2)", class = "b", coef = "Text_Length")
)
# Estimate the model
Model4b_CJ <- brm(
Dur_Log ~ 1 + Text_Length + Right_Text + (1|Partic) + (1|Compar),
data = Data_CJ2,
family = "gaussian",
prior = priors_M4b,
backend = "cmdstanr",
cores = 4,
seed = 1975
)
# Summarize both models side by side
tab_model(Model4_CJ, Model4b_CJ)
120 / 133

Possible solution

  Dur Log Dur Log
Predictors Estimates CI (95%) Estimates CI (95%)
Intercept 6.63 6.23 – 6.99 6.64 6.25 – 7.01
Text Length 0.39 0.31 – 0.48 0.39 0.30 – 0.48
Right Text -0.35 -0.52 – -0.18 -0.35 -0.51 – -0.18
Random Effects
σ2 0.88 0.88
τ00 0.02 Compar 0.02 Compar
0.39 Partic 0.39 Partic
ICC 0.32 0.32
N 26 Partic 26 Partic
30 Compar 30 Compar
Observations 499 499
Marginal R2 / Conditional R2 0.133 / 0.418 0.132 / 0.417
121 / 133

Where to get help?

Now that you're ready to set your steps in Bayesian Mixed Effects Models there is a high probability that this will happen sometime:

This is a great source to learn more about Stan; brms & Hamiltonian MCMC:

https://discourse.mc-stan.org/t/divergent-transitions-a-primer/17099

122 / 133

To wrap-up: why brms ?

  • brms is more versatile than lme4 allowing also to estimate

    • multivariate models

    • finite mixture models

    • generalized mixed effects models

    • splines

    • ...

  • brms as stepping-stone to learn Stan

123 / 133

Some references & appendices

get me back to the topic slide

124 / 133

Some books

125 / 133

Some books

126 / 133

Some free online books

  • Bayes Rules!:

https://www.bayesrulesbook.com/

  • Last Friday I learned about this book:

https://vasishth.github.io/bayescogsci/book/

127 / 133

Rens van de Schoot

In Nature Reviews

128 / 133

THE Podcast

If you like running - like I do - this could be a great companion on your run!

https://www.learnbayesstats.com/

129 / 133

Site on creating the graphs

There are many blogs and websites that you can consult if you want to find out more about making graphs.


One that I often fall back to is:


http://mjskay.github.io/tidybayes/

130 / 133

Examplary paper

with an associated 'Walkthrough in R': https://rpubs.com/jensroes/765467

131 / 133

Questions?

Do not hesitate to contact me!

sven.demaeyer@uantwerpen.be

132 / 133

About me

  • University of Antwerp; Faculty of Social Sciences; dept. Training and Education Sciences

  • phd on methodological issues in research on School Effectiveness

  • research on a broad range of topics (e.g. psychometrics, writing research, learning strategies, education for sustainable development, linguistics, speech development of impaired children, ...)

  • my main focus shifted recently to Comparative Judgement and Learning from comparisons

  • since a year or 2 discovering Bayesian Statistics & brms




Blog: https://svendemaeyer.netlify.app/

Twitter: @svawa

github: https://github.com/Sdemaeyer2

2 / 133
Paused

Help

Keyboard shortcuts

, , Pg Up, k Go to previous slide
, , Pg Dn, Space, j Go to next slide
Home Go to first slide
End Go to last slide
Number + Return Go to specific slide
b / m / f Toggle blackout / mirrored / fullscreen mode
c Clone slideshow
p Toggle presenter mode
t Restart the presentation timer
?, h Toggle this help
Esc Back to slideshow