Workshop: Bayesian analyses in R

Sven De Maeyer

[University of Antwerp / Faculty of Social Sciences]

13/09/2021

1 / 133

About me

University of Antwerp; Faculty of Social Sciences; dept. Training and Education Sciences
phd on methodological issues in research on School Effectiveness
research on a broad range of topics (e.g. psychometrics, writing research, learning strategies, education for sustainable development, linguistics, speech development of impaired children, ...)
my main focus shifted recently to Comparative Judgement and Learning from comparisons
since a year or 2 discovering Bayesian Statistics & brms

Blog: https://svendemaeyer.netlify.app/

Twitter: @svawa

github: https://github.com/Sdemaeyer2

2 / 133

What about you?

Who has some experience with Bayesian analyses?

... with Stan?

... with brms?

3 / 133

Topics

Aims of the workshop & some practical stuff (let's go there)
What's Bayesian? (let's go there)
Why brms? (let's go there)
brms basic example (let's go there)
Let's get practical: the example data (let's go there)
Let's slightly increase complexity (let's go there)

4 / 133

1. Aims & practical stuff

get me back to the topic slide

5 / 133

My aims are ...

to give some background on Bayesian statistics without getting too technical (I hope)
to get you familiar with a basic workflow for a Bayesian analysis making use of brms and other great packages
to let you apply what's learned on your data (so it's hands-on)
to supply you with R code that may inspire
to share some great sources

6 / 133

All material is shared...

On Github there is a dedicated repository where you'll find:

the Rmarkdown file for the slides
saved models from the slides (some take too long to estimate 'live')
data used for the exercises

The Github repository can be accessed using the following link:

[https://github.com/Sdemaeyer2/Turku_brms_2021]

The slides are on my Talks site [https://slides-sdemaeyer.netlify.app/]

7 / 133

Zoom

Recordings

Please ask questions and interfere! Just give a shout!

Stuck with some exercises?

$\to$ Share your screen and I will try to help

8 / 133

Your own data

Day 2:

some dedicated time to apply stuff to your own data
be prepared:
- think about some (simple) models (and possible priors...)
- prepare your data to avoid a lot of data issues on day 2
- for the brave: try some modelling at home before day 2

9 / 133

Prerequisites

be familiar with R
have CmdStan & cmdstanr up and running (see https://mc-stan.org/cmdstanr/articles/cmdstanr.html)
have brms installed
have tidyverse installed

I will often use the functional programming style making use of the pipe operator (works if tidyverse is loaded):

A %>% B

This reads as take A and then do B

An example:

c(1,2,3,4,5) %>% mean()

reads as create a vector of the values 1, 2, 3, 4 and 5 and then calculate the mean

Want to learn dplyr by nice GIFS: hop to this great tweet feed

10 / 133

What's Bayesian inference about?

get me back to the topic slide

11 / 133

First things first...

What is statistical inference about?

In your definition, what is a statistical model?

What are parameters?

12 / 133

Frequentist vs. Bayesian inference

13 / 133

To get us started...

prof. dr. Von Helsing (a famous Vampire Hunter)
How many Vampires are out there?
Time for his famous Vampire Test
Sample of 200 citizens
Result: 10.5% = Vampire
Ok, but what about the population?

14 / 133

What a Frequentist approach can tell us

Make a Confidence Interval for the 10.5%:

Estimate the sampling error:
- sampling error: 0.0217
Calculate upper and lower limit of the 95% CI:
- lower: 0.0625
- upper: 0.1475
Inference:

In 95% of the samples (of sample size 200) we would get a percentage of vampires varying somewhere between 6.25% and 14.75%

15 / 133

The Bayesian approach

Before Von Helsing gathers data he had no knowledge at all!

If we visualize this 'prior knowledge' it could look like this.

16 / 133

The Bayesian approach

Von Helsing starts testing ...

First tested person is NOT a Vampire!

So, 100% of Vampires is already less probable.

Time to adjust our knowledge

17 / 133

The Bayesian approach

After testing the whole sample of 200 citizens, Von Helsing's knowledge is updated:

18 / 133

The Bayesian approach

We get a probability for each possible percentage of vampires.

plot it in a probability density plot
summarize: e.g. 90% most probable values are situated between 7.5% and 14.8%

19 / 133

Advantages & Disadvantages of Bayesian analyses

Advantages:

Natural approach to express uncertainty
Ability to incorporate prior knowledge
Increased model flexibility
Full posterior distribution of the parameters
Natural propagation of uncertainty

Disadvantage:

Slow speed of model estimation
Some reviewers don't understand you ("give me the p-value")

[*] Slight adaptation of a slide from Paul Bürkner's presentation available on YouTube
[https://www.youtube.com/watch?v=FRs1iribZME]

20 / 133

Bayesian Theorem

$P (θ | d a t a) = \frac{P (d a t a | θ) P (θ)}{P (d a t a)}$

with

$P (d a t a | θ)$ : the likelihood of the data given our model $θ$
$P (θ)$ : our prior belief about model parameters
$P (θ | d a t a)$ : the posterior probability about model parameters

meme from https://twitter.com/ChelseaParlett/status/1421291716229746689?s=20

21 / 133

Likelihood

$P (d a t a | θ)$

sometimes also written as

$L (θ | d a t a)$

Actually this is our model part

with $θ$ being the model and all parameters in the model

22 / 133

Likelihood

Example:

What if we want to model how fast people can run a 1OK?

Data = 10 observations (Running times for 10K in minutes)

##  [1] 52 54 58 48 41 49 72 53 64 62

The model (normal distribution):

$y_{i} ∽ N (μ, σ)$

The parameter values that maximize the likelihood of our data:

Mean_RT <- mean(RT, na.rm = T)
Mean_RT

## [1] 55.3

Sd_RT   <- sd(RT, na.rm = T)
Sd_RT

## [1] 8.957306

23 / 133

Prior

Expression of our prior knowledge (belief) about probable parameter values as a probability density function

"For Bayesians, the data are treated as fixed and the parameters vary. [...] Bayes' rule tells us that to calculate the posterior probability distribution we must combine a likelihood with a prior probability distribution over parameter values."
(Lambert, 2018, p.88)

24 / 133

Prior

Example:

What if we want to model how fast people can run a 1OK?

$y_{i} ∽ N (μ, σ)$

Express our prior beliefs about

$μ$

and

$σ$

🤔 Please share your thoughts? What are possible values of the population average and standard deviation for our example?

25 / 133

Prior

Uninformative / Vague

When objectivity is crucial and you want let the data speak for itself...

Informative

When including significant information is crucial

previously collected data
results from former research/analyses
data of another source
theoretical considerations

26 / 133

Prior

Example:

What if we want to model how fast people can run a 1OK?

Vague priors for $μ$

27 / 133

Say we have following priors

par(mfrow=c(2,2))
curve( dnorm( x , 50 , 20 ) , from=1 , to=200 ,xlab="mu", main="Prior for mu")
curve( dunif( x , 1 , 40 ) , from=-10 , to=50 ,xlab="sigma", main="Prior for sigma")

28 / 133

Calculate the Posterior by hand (aka grid approximation)# sample some values for mu and sigma
mu.list <- seq(from = 30, 
               to = 90, 
               length.out=200)
sigma.list <- seq(from = 2, 
                  to = 30, 
                  length.out = 200)
post <- expand.grid(mu = mu.list, 
                    sigma = sigma.list)
# Calculate the loglikelihood of the data
# for each parameter value
post$LL <- 
  sapply(1:nrow(post), 
    function(i) sum(
      dnorm(
        RT ,
        mean=post$mu[i] ,
        sd=post$sigma[i] ,
        log=TRUE ) ) )

# Calculate posterior as product 
# of LL and Prior
# but you see a '+ sign' 
# because we put everything on the 
# log scale 
# (to avoid getting zero's 
#  due to rounding in R)
post$prod <- post$LL + 
  dnorm(post$mu, 50 , 10 , TRUE) + 
  dunif(post$sigma , 1 , 40 , TRUE)
# Re-scale the posterior 
# to the probability scale
post$prob <- exp(post$prod - 
                   max(post$prod)
                 )

29 / 133

Visualise the posterior distribution

Make a contour plot

# Sampling going on
# So for reproducibility
set.seed(1975)
post %>%
  # sample 10000 rows
  # with replacement
  # higher prob higher prob to 
  # be sampled
  sample_n(size = 10000, 
           replace = TRUE, 
           weight = prob) %>%
  # create the plot
  ggplot(aes(x = mu, y = sigma)) +
  geom_density_2d_filled() + 
  theme_minimal()

30 / 133

Visualise the posterior distributions

Or sample from posterior and plot simple density plots

sample.rows <- sample(1:nrow(post), size=10000, replace=TRUE, prob=post$prob)
sample.mu <- post$mu[sample.rows]
sample.sigma <- post$sigma[sample.rows]

plot(density(sample.mu),main="mu")

plot(density(sample.sigma),main="sigma")

31 / 133

3. Why `brms`?

get me back to the topic slide

32 / 133

Imagine

A 'simple' linear model

$\begin{aligned} R T_{i} \sim N (μ, σ_{e_{i}}) \\ μ = β_{0} + β_{1} * {Weigth}_{i} + β_{2} * {Height}_{i} + β_{3} * {Age}_{i} + β_{4} * {WeeklyTrainingHours}_{i} + β_{5} * {Gender}_{i} \end{aligned}$

So you can get a REALLY LARGE number of parameters!

33 / 133

Markov Chain Monte Carlo - Why?

Complex models $\to$ Large number of parameters $\to$ exponentional number of combinations!

Posterior gets unsolvable by grid approximation

We will approximate the 'joint posterior' by 'smart' sampling

Samples of combinations of parametervalues are drawn

BUT: samples will not be random!

34 / 133

MCMC - demonstrated

Following link brings you to an interactive tool that let's you get the basic idea behind MCMC sampling:

https://chi-feng.github.io/mcmc-demo/app.html#HamiltonianMC,standard

35 / 133

Side-step: conjugate priors

Some prior probability distributions can be 'more easely' combined with a certain likelihood!

These are called conjugate priors

PRO:

You can actually analytically calculate the posterior. So: no need for all the sampling and approximation of the posterior.

A list of conjugate likelihood-prior pairs: https://en.wikipedia.org/wiki/Conjugate_prior

BUT:

"Simplicity comes with a cost! In most real-life examples of inference, the contraint of choosing likelihood-prior conjugate pairs is too restrictive and can lead us to use models that inadequately capture the variability in the data. (Lambert, 2018, p. 209)"

36 / 133

Software

different dedicated software/packages are available: JAGS / BUGS / Stan

most powerful is Stan! Specifically the Hamiltonian Monte Carlo algorithm makes it the best choice at the moment

Stan is a probabilistic programming language that uses C++

37 / 133

Example of Stan code## // generated with brms 2.14.4
## functions {
## }
## data {
##   int<lower=1> N;  // total number of observations
##   vector[N] Y;  // response variable
##   int prior_only;  // should the likelihood be ignored?
## }
## transformed data {
## }
## parameters {
##   real Intercept;  // temporary intercept for centered predictors
##   real<lower=0> sigma;  // residual SD
## }
## transformed parameters {
## }
## model {
##   // likelihood including all constants
##   if (!prior_only) {
##     // initialize linear predictor term
##     vector[N] mu = Intercept + rep_vector(0.0, N);
##     target += normal_lpdf(Y | mu, sigma);
##   }
##   // priors including all constants
##   target += normal_lpdf(Intercept | 500,50);
##   target += student_t_lpdf(sigma | 3, 0, 68.8)
##     - 1 * student_t_lccdf(0 | 3, 0, 68.8);
## }
## generated quantities {
##   // actual population-level intercept
##   real b_Intercept = Intercept;
## }
38 / 133

alt text

39 / 133

4. `brms` basic example

get me back to the topic slide

40 / 133

`brms` syntax

Very very similar to lme4 and in line with typical R-style writing up of a model ...

lme4

Model <- lmer(
  y ~ x1 + x2 + (1|Group),
  data = Data,
  ...
)

brms

Model <- brm(
  y ~ x1 + x2 + (1|Group),
  data = Data,
  family = "gaussian",
  backend = "cmdstanr"
  ...
)

Notice:

family = "gaussian" indicates the "likelihood" function we will use
backend = "cmdstanr" indicates the way we want to interact with Stan and C++

41 / 133

Let's retake the example on running

Example:

What if we want to model how fast people can run a 1OK?

The simplest model looked like:

$R T_{i} \sim N (μ, σ_{e})$

In brms this model is:

RT <- c(
  52, 54, 58, 48, 41, 49, 72, 53, 64, 62
  )
# First make a dataset from our RT vector
DataRT <- data_frame(RT)
Mod_RT1 <- brm(
  RT ~ 1, # We only model an intercept
  data = DataRT,
  backend = "cmdstanr",
  seed = 1975
)

🏃 Try it yourself and run the model ... (don't forget to load the necessary packages: brms & tidyverse)

42 / 133

43 / 133

But ...

44 / 133

So

For the workshop all models are in the models folder

You can normally read model information with readRDS

Here's an example:

Mod_RT1 <- readRDS(here("models", "Mod_RT1.RDS"))

This model information was saved with the saveRDS function:

saveRDS(Mod_RT1, here("models", "Mod_RT1.RDS"))

Notice that I use the package here allowing me to access files anywhere in the RStudio Projects folder using the here( ) commando (no fuzz with paths on your computer)

45 / 133

MCMC = sampling

MCMC results will differ each time run

Of course, not a lot if your model is good!

So, what if you want to be reproducible?

Mod_RT1 <- brm(
  RT ~ 1, # We only model an intercept
  data = DataRT,
  backend = "cmdstanr",
  seed = 1975 # make sure we get the same results each time :-)
)

46 / 133

Good old `summary( )` function

summary(Mod_RT1)

##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: RT ~ 1 
##    Data: DataRT (Number of observations: 10) 
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 4000
## 
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept    55.10      2.82    49.68    60.90 1.00     2000     2180
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     9.47      2.32     6.17    15.08 1.00     2445     2318
## 
## Samples were drawn using sample(hmc). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

47 / 133

Samples & chains?

##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: RT ~ 1 
##    Data: DataRT (Number of observations: 10) 
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 4000
## 
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept    55.10      2.82    49.68    60.90 1.00     2000     2180
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     9.47      2.32     6.17    15.08 1.00     2445     2318
## 
## Samples were drawn using sample(hmc). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

By default brms sets 4 chains of 2000 iterations of which 1000 iterations/chain are warm-up

48 / 133

Samples & chains: what are 'burn-in' iterations

49 / 133

Samples & chains: what are 'burn-in' iterations

50 / 133

Samples & chains: what are 'burn-in' iterations

51 / 133

Good ald `plot( )` function

plot(Mod_RT1)

Left panel

the posterior distributions (later we will see other more informative ways to plot the information in the posterior)

Right panel

the convergence of the each parameter

$\to$ should look like a caterpillar

52 / 133

Model Convergence

$\hat{R}$ < 1.015 for each parameter estimate
at least 4 chains are recommended
Effective Sample Size (ESS) > 400 to rely on $\hat{R}$

53 / 133

Let's inspect the output again

##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: RT ~ 1 
##    Data: DataRT (Number of observations: 10) 
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 4000
## 
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept    55.10      2.82    49.68    60.90 1.00     2000     2180
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     9.47      2.32     6.17    15.08 1.00     2445     2318
## 
## Samples were drawn using sample(hmc). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

54 / 133

5. Let's get practical: the example dataset

get me back to the topic slide

55 / 133

Judging 'argumentative texts'

alt text

56 / 133

Procedure

26 high school teachers (Dutch) voluntary participated
each did 10 comparisons of 2 argumentative texts from 10th graders
3 batches of comparisons with random allocation of judges to one of the batches
all batches similar composition of comparisons regarding the characteristics of the pairs; the pairs, however not the same
Tobii TX300 dark pupil eye-tracker with a 23-inch TFT monitor (max. resolution of 1920 x 1080 pixels)
data sampled binocularly at 300 Hz

57 / 133

AOI's

alt text

58 / 133

The data

Show entries

Search:

	Partic	Compar	AOI_Type	Dur_Seconds	Dur_Log
9	20	DLL 112-103	AOI_RIGHT_TEXT	0.13	4.868
10	20	DMH 803-102	AOI_LEFT_TEXT	0.32	5.768
11	20	DMH 803-102	AOI_LEFT_TEXT	0.723	6.583
12	20	DMH 803-102	AOI_LEFT_TEXT	0.09	4.500
13	20	DMH 803-102	AOI_LEFT_TEXT	0.06	4.094
14	20	DMH 803-102	AOI_LEFT_TEXT	0.083	4.419
15	20	DMH 803-102	AOI_LEFT_TEXT	0.557	6.323
16	20	DMH 803-102	AOI_LEFT_TEXT	8.586	9.058

Showing 9 to 16 of 2,914 entries

Previous1 2 3 4 5…365Next

59 / 133

6. Let's slightly increase complexity

get me back to the topic slide

60 / 133

Question 1

How much time do people spend in a text (in total) when comparing two texts?

A simple model could look like:

$l o g (T i m e_{i}) \sim N (μ, σ_{e})$

with $T i m e_{i}$ being the total time for each combination of participant, comparison and AOI_type

Durations_CJ %>%
  group_by(Partic, Compar, AOI_Type) %>%
  summarize(
    Dur_Seconds = mean(Dur_Seconds),
    Dur_Log = mean(Dur_Log)
  ) %>% head(5)

## # A tibble: 5 x 5
## # Groups:   Partic, Compar [3]
##   Partic Compar      AOI_Type       Dur_Seconds Dur_Log
##    <dbl> <chr>       <chr>                <dbl>   <dbl>
## 1      1 DHH 214-425 AOI_LEFT_TEXT         7.12    6.61
## 2      1 DHH 214-425 AOI_RIGHT_TEXT       15.3     8.55
## 3      1 DLL 508- 51 AOI_LEFT_TEXT         4.61    7.31
## 4      1 DLL 508- 51 AOI_RIGHT_TEXT        3.28    6.63
## 5      1 DMH 414-609 AOI_LEFT_TEXT         1.93    6.82

61 / 133

Question 1

How much time do people spend in a text (in total) when comparing two texts?

A simple model could look like:

$l o g (T i m e_{i}) \sim N (μ, σ_{e})$

🏃

Try it yourself and run the model

Inspect convergence

Interpret the results

62 / 133

Question 1

First we make a dataset

Data_CJ1 <- Durations_CJ %>%
  group_by(Partic, Compar, AOI_Type) %>%
  summarize(
    Dur_Seconds = mean(Dur_Seconds),
    Dur_Log = mean(Dur_Log)
  )

Then we can run the model

Model1_CJ <- brm(
  Dur_Log ~ 1,
  data = Data_CJ1,
  family = "gaussian",
  backend = "cmdstanr",
  seed = 1975
)

63 / 133

Question 1

Model1_CJ <- readRDS(here("models", "Model1_CJ.RDS"))
summary(Model1_CJ)

##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: Dur_Log ~ 1 
##    Data: Data_CJ1 (Number of observations: 499) 
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 4000
## 
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept     7.65      0.06     7.54     7.76 1.00     3411     2340
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     1.23      0.04     1.15     1.31 1.00     3446     2663
## 
## Samples were drawn using sample(hmc). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

64 / 133

Convergence is clearly ok!

Question 1

plot(Model1_CJ)

65 / 133

Further exploration of the posterior

Pull the 4000 samples of our parameter values from the posterior with

posterior_samples()

posterior_samples(Model1_CJ) %>%
  head(10)

##    b_Intercept   sigma Intercept     lp__
## 1      7.64887 1.15629   7.64887 -812.378
## 2      7.63948 1.21292   7.63948 -810.850
## 3      7.67498 1.22842   7.67498 -810.910
## 4      7.59150 1.23610   7.59150 -811.445
## 5      7.68766 1.24512   7.68766 -811.191
## 6      7.59683 1.22045   7.59683 -811.293
## 7      7.69482 1.24441   7.69482 -811.274
## 8      7.61122 1.19426   7.61122 -811.343
## 9      7.62011 1.16751   7.62011 -812.042
## 10     7.61525 1.17405   7.61525 -811.850

*with the head() function we ask to only show the first 10 rows

66 / 133

Time to unpack the plotting power of `ggplot2`

First we set a custom theme that makes our plots look 'better'

theme_set(theme_linedraw() +
            theme(text = element_text(family = "Times"),
                  panel.grid = element_blank()))

67 / 133

Time to unpack the plotting power of `ggplot2`

Package tidybayes is a wrapper including functions from ggdist that open a lot of plotting options

posterior_samples(Model1_CJ) %>%         # Get draws of the posterior
  select(b_Intercept) %>%                # Only select our Intercept
  pivot_longer(everything()) %>%         # Rearrange the result to plot
  ggplot(aes(x = value, y = name)) +     # Let's plot...
    stat_pointinterval(.width = .89) +   # 89% CI
    xlab("marginal posterior")           # Set our x-axis label

68 / 133

Your turn

🏃

Make a similar plot for the estimate of the standard deviation

With a CI of 80%

Interpret the results

69 / 133

Solution...

posterior_samples(Mode1l_CJ) %>%         # Get draws of the posterior
  select(sigma) %>%                      # Only select sigma
  pivot_longer(everything()) %>%         # Rearrange the result to plot
  ggplot(aes(x = value, y = name)) +     # Let's plot...
    stat_pointinterval(.width = .80) +   # 80% CI
    xlab("marginal posterior")           # Set our x-axis label

70 / 133

Another type of plot with some more information in it

Let's use stat_interval() from the ggdist package

posterior_samples(Model1_CJ) %>%  
  select(b_Intercept) %>%               
  ggplot(aes(x = b_Intercept, y = 0)) +             
   stat_interval(
     .width = c(.95,.90,.50)   # The CI's to plot 
     ) + 
   xlab("marginal posterior") + 
   scale_y_continuous(   # No y-axis information
     NULL, 
     breaks = NULL
     ) +
   scale_color_brewer()

71 / 133

Your turn

🏃

Make a similar plot for the estimate of the standard deviation

With a CI of 50%, 80%, 89%, 95%

72 / 133

Solution

posterior_samples(Model1_CJ) %>%  
  select(sigma) %>%               
  ggplot(aes(x = sigma, y = 0)) +             
   stat_interval(
     .width = c(.95, .89,.80,.50)   # The CI's to plot 
     ) + 
   xlab("marginal posterior") + 
   scale_y_continuous(  
     NULL, 
     breaks = NULL
     ) +
   scale_color_brewer()

73 / 133

We can also integrate plots for mu and sigma

And we introduce the stat_halfeye() function of ggdist

names <- c("mu","sigma")
posterior_samples(Model1_CJ) %>%
  select(b_Intercept,sigma) %>%
  set_names(names) %>%
  pivot_longer(everything()) %>%
  ggplot(aes(x = value, y = 0)) +
   stat_halfeye(.width = .89, normalize = "panels") +
   scale_y_continuous(NULL, breaks = NULL) +
   xlab("marginal posterior") +
   facet_wrap(~name, scales = "free")

74 / 133

Your turn

🏃

Try to replicate the graph

But with other CI's (e.g. for 50%, 90% and 95%)

75 / 133

Solution

names <- c("mu","sigma")
posterior_samples(Model1_CJ) %>%
  select(b_Intercept,sigma) %>%
  set_names(names) %>%
  pivot_longer(everything()) %>%
  ggplot(aes(x = value, y = 0)) +
   stat_halfeye(
    .width = c(.50, .90, .95), normalize = "panels") +
   scale_y_continuous(NULL, breaks = NULL) +
   xlab("marginal posterior") +
   facet_wrap(~name, scales = "free")

76 / 133

How to get rid of the log?

We can simply do calculations with our posterior samples! So, we can just use exp():

names <- c("mu_log_scale","mu_sec", "sigma")
posterior_samples(Model1_CJ) %>%
  select(b_Intercept,sigma) %>%
  mutate(
      mu_sec = exp(b_Intercept)
  ) %>% 
  # we do the select() again to reorder the columns
  # this is not super necessary, but then look out
  # with setting the names of the columns!
  select(b_Intercept, mu_sec, sigma) %>%
  set_names(names) %>%
  pivot_longer(everything()) %>%
  ggplot(aes(x = value, y = 0)) +
  stat_halfeye(.width = c(.5, .9, .95), normalize = "panels") +
  scale_y_continuous(NULL, breaks = NULL) +
  xlab("marginal posterior") +
  facet_wrap(~name, scales = "free")

77 / 133

Question 2

Does the time that people spend reading a text differs for both texts if we control for the length of the text?

A model could look like:

$\begin{aligned} l o g (T i m e_{i}) \sim N (μ, σ_{e_{i}}) \\ μ = β_{0} + β_{1} * {Right_Text}_{i} + β_{2} * {Text_Length}_{i} \end{aligned}$

load(here("data", "Data_CJ2.Rdata"))
head(Data_CJ2, 6)

## # A tibble: 6 x 7
## # Groups:   Partic, Compar [3]
##   Partic Compar      AOI_Type       Dur_Seconds Dur_Log Right_Text Text_Length
##    <dbl> <chr>       <chr>                <dbl>   <dbl>      <dbl>       <dbl>
## 1      1 DHH 214-425 AOI_LEFT_TEXT         7.12    6.61          0        2.72
## 2      1 DHH 214-425 AOI_RIGHT_TEXT       15.3     8.55          1        2.48
## 3      1 DLL 508- 51 AOI_LEFT_TEXT         4.61    7.31          0        3.11
## 4      1 DLL 508- 51 AOI_RIGHT_TEXT        3.28    6.63          1        3.13
## 5      1 DMH 414-609 AOI_LEFT_TEXT         1.93    6.82          0        1.78
## 6      1 DMH 414-609 AOI_RIGHT_TEXT        1.53    6.33          1        3.89

78 / 133

Your turn

🏃

Try to run the model with brms

Check model convergence

Make one plot to visualise the posteriors for the regression parameters

79 / 133

Solution

Model2_CJ <- brm(
  Dur_Log ~ 1 + Text_Length + Right_Text,
  data = Data_CJ2,
  family = "gaussian",
  backend = "cmdstanr",
  seed = 1975
)

80 / 133

Solution

Visual checking of convergence

plot(Model2_CJ)

81 / 133

Solution: check outputsummary(Model2_CJ)

##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: Dur_Log ~ 1 + Text_Length + Right_Text 
##    Data: Data_CJ2 (Number of observations: 499) 
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 4000
## 
## Population-Level Effects: 
##             Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept       6.34      0.16     6.02     6.66 1.00     4980     3109
## Text_Length     0.50      0.05     0.40     0.59 1.00     4968     3327
## Right_Text     -0.33      0.10    -0.52    -0.14 1.00     4424     2890
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     1.11      0.04     1.04     1.18 1.00     4759     3210
## 
## Samples were drawn using sample(hmc). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
82 / 133

Solution: create plot on posteriors for beta's

posterior_samples(Model2_CJ) %>%
  select(
    b_Intercept,
    b_Text_Length,
    b_Right_Text
  ) %>%
  pivot_longer(everything()) %>%
  ggplot(aes(x = value, y = 0)) +
  stat_halfeye(
    .width = c(.5, .9, .95), 
    normalize = "panels") +
  scale_y_continuous(NULL, breaks = NULL) +
  xlab("marginal posterior") +
  facet_wrap(~name, scales = "free")

83 / 133

Let's introduce `mcmc_areas()` from the `bayesplot`

mcmc_areas(
  Model2_CJ,
  pars = c("b_Text_Length",
           "b_Right_Text"
           ),
  prob = 0.89
)

84 / 133

or `mcmc_areas_ridges` from `bayesplot`

mcmc_areas_ridges(
  Model2_CJ,
  pars = c("b_Text_Length",
           "b_Right_Text"
           ),
  prob = 0.89
)

85 / 133

or `mcmc_intervals` from `bayesplot`

& let's play around with the color_scheme_set() that works with all bayesplot functions...

color_scheme_set("red")
mcmc_intervals(
  Model2_CJ,
  pars = c("b_Text_Length",
           "b_Right_Text"),
  prob = c(0.5, 0.89, 0.95)
)

86 / 133

CI's (ETI's) vs. HDI's

87 / 133

Alternative for p-values?88 / 133

Alternative for p-values?

ROPE : Region Of Practical Equivalence

Set a region of parameter values that can be considered equivalent to having no effect

in standard effect sizes the advised default is a range of -0.1 to 0.1
this equals half of a small effect size (Cohen, 1988)
all parameter values in that range are set equal to no effect

89 / 133

Alternative for p-values?

ROPE + HDI rule

95% of HDI out of ROPE $\to$ $H_{0}$ rejected
95% of HDI all in ROPE $\to$ $H_{0}$ not rejected
95% of HDI partially out of ROPE $\to$ undecided

90 / 133

HDI + ROPE with `brms` results

We can use the equivalence_test() function of the bayestestR package

equivalence_test(Model2_CJ)

## # Test for Practical Equivalence
## 
##   ROPE: [-0.12 0.12]
## 
## Parameter   |       H0 | inside ROPE |       95% HDI
## ----------------------------------------------------
## Intercept   | Rejected |      0.00 % | [ 5.99  6.63]
## Text_Length | Rejected |      0.00 % | [ 0.41  0.59]
## Right_Text  | Rejected |      0.00 % | [-0.52 -0.14]

91 / 133

Or the `parameters` package

This package has the nice flexible model_parameters() function!

posterior_samples(Model2_CJ) %>%
  model_parameters(rope_range=c(-0.12,0.12))

## # Fixed Effects
## 
## Parameter     |  Median |             89% CI |     pd | % in ROPE
## -----------------------------------------------------------------
## b_Intercept   |    6.34 | [   6.07,    6.59] |   100% |        0%
## b_Text_Length |    0.50 | [   0.42,    0.57] |   100% |        0%
## b_Right_Text  |   -0.34 | [  -0.49,   -0.17] | 99.95% |     1.43%
## sigma         |    1.11 | [   1.05,    1.16] |   100% |        0%
## Intercept     |    7.65 | [   7.57,    7.73] |   100% |        0%
## lp__          | -761.62 | [-763.69, -760.08] |   100% |        0%

pd stands for probability of direction

92 / 133

Now it's getting more fun

We can combine model_parameters() with plot() because the see package deals with the output!

posterior_samples(Model2_CJ) %>%
  select(
    b_Right_Text,
    b_Text_Length
  ) %>%
  model_parameters(
    ) %>%
  plot(show_labels = TRUE, 
       size_text = 3)

93 / 133

Now it's getting more fun

And another one ...

We can combine equivalence_test with plot() as well thx to the see package!

equivalence_test(Model2_CJ) %>%
  plot( )

94 / 133

"Hey! You promised mixed effects model"

Question 3

Does the time that people spend reading a text differs for both texts if we control for the length of the text?

In the previous analyses we neglected that we have multiple observations per person!

A mixed effects model could look like:

$\begin{aligned} l o g (T i m e_{i j}) \sim N (μ, σ_{e_{i j}}) \\ μ = β_{0} + β_{1} * {Right_Text}_{i} + β_{2} * {Text_Length}_{i} + μ_{j} \\ μ_{j} \sim N (0, σ_{u_{j}}) \end{aligned}$

95 / 133

Adding random term to the model

Model3_CJ <- brm(
  Dur_Log ~ Text_Length + Right_Text + (1|Partic),
  data = Data_CJ2,
  family = "gaussian",
  backend = "cmdstanr",
  seed = 1975
)

🏃

Estimate the model

Check the output

Interpret the results

96 / 133

SolutionModel3_CJ <- readRDS(here("models", "Model3_CJ.RDS"))
summary(Model3_CJ)

##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: Dur_Log ~ 1 + Text_Length + Right_Text + (1 | Partic) 
##    Data: Data_CJ2 (Number of observations: 499) 
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 4000
## 
## Group-Level Effects: 
## ~Partic (Number of levels: 26) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     0.62      0.11     0.45     0.87 1.00     1014     1218
## 
## Population-Level Effects: 
##             Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept       6.63      0.19     6.25     7.01 1.00     1543     2156
## Text_Length     0.39      0.04     0.31     0.48 1.00     6940     2934
## Right_Text     -0.35      0.08    -0.52    -0.19 1.00     6448     2957
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     0.95      0.03     0.89     1.01 1.00     5550     2644
## 
## Samples were drawn using sample(hmc). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
97 / 133

What do we get?

If we want to plot information on the variances (or sd) we have to know the names of elements

posterior_samples(Model3_CJ) %>%
  str()

## 'data.frame':    4000 obs. of  59 variables:
##  $ b_Intercept           : num  6.63 6.79 6.66 6.74 6.57 ...
##  $ b_Text_Length         : num  0.419 0.396 0.404 0.347 0.373 ...
##  $ b_Right_Text          : num  -0.309 -0.403 -0.288 -0.258 -0.286 ...
##  $ sd_Partic__Intercept  : num  0.505 0.531 0.537 0.637 0.589 ...
##  $ sigma                 : num  0.928 0.937 0.939 0.936 0.874 ...
##  $ Intercept             : num  7.73 7.78 7.72 7.65 7.54 ...
##  $ r_Partic[1,Intercept] : num  -0.141 -0.673 -0.112 0.07 -0.57 ...
##  $ r_Partic[2,Intercept] : num  0.2899 -0.0303 0.5392 0.2669 0.5086 ...
##  $ r_Partic[3,Intercept] : num  0.619 0.767 0.996 1.152 0.998 ...
##  $ r_Partic[4,Intercept] : num  -0.3145 0.0216 -0.3654 0.1749 0.0874 ...
##  $ r_Partic[5,Intercept] : num  -0.3268 -0.0846 0.1697 -0.0618 -0.1935 ...
##  $ r_Partic[6,Intercept] : num  0.492 0.142 0.256 0.332 0.393 ...
##  $ r_Partic[7,Intercept] : num  -0.617 -0.777 -0.576 -0.444 -0.248 ...
##  $ r_Partic[8,Intercept] : num  0.1582 -0.1739 -0.0464 0.0666 0.1937 ...
##  $ r_Partic[9,Intercept] : num  0.1173 -0.4262 -0.2635 -0.4233 0.0362 ...
##  $ r_Partic[10,Intercept]: num  -0.0433 0.2303 -0.0641 0.3779 0.061 ...
##  $ r_Partic[11,Intercept]: num  0.0683 0.4425 0.1351 0.0419 0.4656 ...
##  $ r_Partic[12,Intercept]: num  -0.0778 0.5016 0.1428 0.1423 0.1359 ...
##  $ r_Partic[13,Intercept]: num  0.533 0.211 0.403 0.8 0.465 ...
##  $ r_Partic[14,Intercept]: num  -0.0189 0.7486 0.2406 0.8047 0.8817 ...
##  $ r_Partic[15,Intercept]: num  0.552 0.526 0.492 0.843 0.495 ...
##  $ r_Partic[16,Intercept]: num  -1.06 -1.61 -1.61 -1.53 -1.62 ...
##  $ r_Partic[17,Intercept]: num  0.293 -0.11 -0.366 0.151 0.181 ...
##  $ r_Partic[18,Intercept]: num  0.358 0.562 0.7 0.415 0.357 ...
##  $ r_Partic[19,Intercept]: num  -0.898 -0.246 -0.3 -0.456 -0.226 ...
##  $ r_Partic[20,Intercept]: num  -0.892 -0.2 -0.697 -0.315 -0.5 ...
##  $ r_Partic[21,Intercept]: num  0.3688 -0.1558 0.099 0.6076 0.0982 ...
##  $ r_Partic[22,Intercept]: num  0.2282 -0.4344 -0.1846 -0.0366 -0.0649 ...
##  $ r_Partic[23,Intercept]: num  0.749 0.76 0.743 0.904 0.707 ...
##  $ r_Partic[24,Intercept]: num  -0.792 -0.671 -0.647 -0.726 -0.701 ...
##  $ r_Partic[25,Intercept]: num  -0.302 -0.384 -0.14 -0.312 -0.306 ...
##  $ r_Partic[26,Intercept]: num  -0.599 -1.12 -0.663 -1.047 -1.045 ...
##  $ lp__                  : num  -726 -730 -726 -722 -725 ...
##  $ z_1[1,1]              : num  -0.28 -1.267 -0.208 0.11 -0.968 ...
##  $ z_1[1,2]              : num  0.5736 -0.0571 1.0042 0.4188 0.8632 ...
##  $ z_1[1,3]              : num  1.22 1.44 1.86 1.81 1.69 ...
##  $ z_1[1,4]              : num  -0.6223 0.0406 -0.6806 0.2744 0.1483 ...
##  $ z_1[1,5]              : num  -0.647 -0.159 0.316 -0.097 -0.329 ...
##  $ z_1[1,6]              : num  0.974 0.267 0.476 0.52 0.667 ...
##  $ z_1[1,7]              : num  -1.221 -1.464 -1.072 -0.697 -0.421 ...
##  $ z_1[1,8]              : num  0.3131 -0.3275 -0.0864 0.1046 0.3287 ...
##  $ z_1[1,9]              : num  0.2322 -0.8027 -0.4908 -0.6641 0.0614 ...
##  $ z_1[1,10]             : num  -0.0858 0.4337 -0.1193 0.5929 0.1036 ...
##  $ z_1[1,11]             : num  0.1351 0.8334 0.2516 0.0658 0.7903 ...
##  $ z_1[1,12]             : num  -0.154 0.945 0.266 0.223 0.231 ...
##  $ z_1[1,13]             : num  1.055 0.398 0.751 1.255 0.788 ...
##  $ z_1[1,14]             : num  -0.0374 1.4099 0.4482 1.2623 1.4965 ...
##  $ z_1[1,15]             : num  1.091 0.99 0.916 1.323 0.84 ...
##  $ z_1[1,16]             : num  -2.1 -3.02 -3 -2.4 -2.75 ...
##  $ z_1[1,17]             : num  0.579 -0.208 -0.682 0.237 0.307 ...
##  $ z_1[1,18]             : num  0.708 1.059 1.304 0.651 0.606 ...
##  $ z_1[1,19]             : num  -1.777 -0.463 -0.559 -0.715 -0.384 ...
##  $ z_1[1,20]             : num  -1.765 -0.376 -1.298 -0.494 -0.848 ...
##  $ z_1[1,21]             : num  0.73 -0.293 0.184 0.953 0.167 ...
##  $ z_1[1,22]             : num  0.4516 -0.818 -0.3438 -0.0574 -0.1101 ...
##  $ z_1[1,23]             : num  1.48 1.43 1.38 1.42 1.2 ...
##  $ z_1[1,24]             : num  -1.57 -1.26 -1.21 -1.14 -1.19 ...
##  $ z_1[1,25]             : num  -0.598 -0.723 -0.261 -0.49 -0.52 ...
##  $ z_1[1,26]             : num  -1.19 -2.11 -1.23 -1.64 -1.77 ...

98 / 133

Your turn!

🏃

Make a plot to summarise the posterior for sigma & the sd for participants

If your quick! Play around with different options we saw

99 / 133

A possible solution:

posterior_samples(Model3_CJ) %>%
  select(
    sigma,
    sd_Partic__Intercept
  ) %>%
  model_parameters(
    ) %>%
  plot(show_labels = TRUE, 
       size_text = 3)

100 / 133

Make plot for random effects

We can also make plots for all mu_j's (aka: the specific residuals for each participant)

posterior_samples(Model3_CJ) %>%
  select(
    starts_with("r_Partic")
    ) %>%
  pivot_longer(
    starts_with("r_Partic")
    ) %>% 
  mutate(
    mu_j = value
    ) %>%
  ggplot(
    aes(x = mu_j, y = reorder(name, mu_j))
    ) +
    stat_pointinterval(
      point_interval = mean_qi,
      .width = .89, 
      size = 1/6
      ) +
  scale_y_discrete("")

101 / 133

Make plot for random effects

Remember! We can simply calculate with the posterior! So, we can plot the differences between our participants by using mu_j and the Intercept estimate (6.63). Sum them together and take the exp()

posterior_samples(Model3_CJ) %>%
  select(
    starts_with("r_Partic")
    ) %>%
  pivot_longer(
    starts_with("r_Partic")
    ) %>% 
  mutate(
    mu_j = exp(6.63 + value)
    ) %>%
  ggplot(
    aes(x = mu_j, y = reorder(name, mu_j))
    ) +
    stat_pointinterval(
      point_interval = mean_qi,
      .width = .89, 
      size = 1/6
      ) +
  scale_y_discrete("") +
  scale_x_continuous("Reading time (in seconds)")

102 / 133

Another great package sjPlotslibrary(sjPlot)
tab_model(Model3_CJ)



 
Dur Log

Predictors
Estimates
CI (95%)

Intercept
6.63
6.25 – 7.01

Text Length
0.39
0.31 – 0.48

Right Text
-0.34
-0.52 – -0.19

Random Effects

σ2
0.90


τ00 Partic
0.39


ICC
0.30


N Partic
26

Observations
499

Marginal R2 / Conditional R2
0.133 / 0.405


103 / 133

	Dur Log
Predictors	Estimates	CI (95%)
Intercept	6.63	6.25 – 7.01
Text Length	0.39	0.31 – 0.48
Right Text	-0.34	-0.52 – -0.19
Random Effects
σ²	0.90
τ₀₀ _Partic	0.39
ICC	0.30
N _Partic	26
Observations	499
Marginal R² / Conditional R²	0.133 / 0.405

Combine models output in a table

tab_model(Model2_CJ, Model3_CJ)

	Dur Log		Dur Log
Predictors	Estimates	CI (95%)	Estimates	CI (95%)
Intercept	6.34	6.02 – 6.66	6.63	6.25 – 7.01
Text Length	0.50	0.40 – 0.59	0.39	0.31 – 0.48
Right Text	-0.34	-0.52 – -0.14	-0.34	-0.52 – -0.19
Random Effects
σ²			0.90
τ₀₀			0.39 _Partic
ICC			0.30
N			26 _Partic
Observations	499		499
R² Bayes	0.186		0.133 / 0.405

104 / 133

How to tweak the estimation?

Set the number of chains; the number of iterations; the number of warm-up

Model3_CJ <- brm(
  Dur_Log ~ Text_Length + Right_Text + (1|Partic),
  data = Data_CJ2,
  family = "gaussian",
  backend = "cmdstanr",
  iter = 3000,
  chains = 8,
  warmup = 1500,
  seed = 1975
)

Tip: it's better to run less chains and more iterations per chain, so you get more precise parameter estimation

105 / 133

How to tweak the estimation?

Increase efficiency

Model3_CJ <- brm(
  Dur_Log ~ Text_Length + Right_Text + (1|Partic),
  data = Data_CJ2,
  family = "gaussian",
  backend = "cmdstanr",
  iter = 3000,
  chains = 8,
  warmup = 1500,
  cores = 4,
  seed = 1975
)

Modern computers have multiple cores $\to$ let them work simultaneously

106 / 133

Question 4

Hey! It's still the same question!

Does the time that people spend reading a text differs for both texts if we control for the length of the text?

In the previous analyses we neglected that we have multiple observations per pair of texts!

Your turn! You have the power!

🏃

Do the necessary analyses

Make a plot to summarise the posterior for sigma & the sd's

If your quick! Play around with different options we saw...

107 / 133

Solution:

Model4_CJ <- brm(
  Dur_Log ~ 1 + Text_Length + Right_Text + (1|Partic) + (1|Compar),
  data = Data_CJ2,
  family = "gaussian",
  backend = "cmdstanr",
  cores = 4,
  seed = 1975
)

108 / 133

Model4_CJ <- readRDS(here("models", "Model4_CJ.RDS"))
summary(Model4_CJ)

##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: Dur_Log ~ 1 + Text_Length + Right_Text + (1 | Partic) + (1 | Compar) 
##    Data: Data_CJ2 (Number of observations: 499) 
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 4000
## 
## Group-Level Effects: 
## ~Compar (Number of levels: 30) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     0.14      0.07     0.01     0.28 1.00     1179     1770
## 
## ~Partic (Number of levels: 26) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     0.63      0.11     0.45     0.88 1.00      967     1194
## 
## Population-Level Effects: 
##             Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept       6.63      0.19     6.23     6.99 1.00     1304     2184
## Text_Length     0.39      0.04     0.31     0.48 1.00     4341     3143
## Right_Text     -0.35      0.09    -0.52    -0.18 1.00     4095     2984
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     0.94      0.03     0.88     1.00 1.00     3609     2829
## 
## Samples were drawn using sample(hmc). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
109 / 133

But is it a good model?

Two complementary procedures:

posterior-predictive check
compare models with leave one out cross-validation

110 / 133

Posterior-predictive check

A visual check that can be performed with pp_check() from brms

pp_check(Model4_CJ)

111 / 133

Model comparison with loo cross-validation

$\sim$ AIC or BIC in Frequentist statistics

$\hat{e l p d}$ : "expected log predictive density" (higher $\hat{e l p d}$ implies better model fit without being sensitive for overfitting!)

loo_Model2 <- loo(Model2_CJ)
loo_Model3 <- loo(Model3_CJ)
loo_Model4 <- loo(Model4_CJ)
Comparison<- 
  loo_compare(loo_Model2, 
            loo_Model3, 
            loo_Model4)
print(Comparison, simplify = F)

##           elpd_diff se_diff elpd_loo se_elpd_loo p_loo  se_p_loo looic 
## Model4_CJ    0.0       0.0  -693.8     17.7        33.5    2.4   1387.7
## Model3_CJ   -0.4       1.5  -694.3     17.8        25.7    1.9   1388.5
## Model2_CJ  -67.0      11.2  -760.8     15.7         4.1    0.3   1521.7
##           se_looic
## Model4_CJ   35.3  
## Model3_CJ   35.6  
## Model2_CJ   31.3

112 / 133

What about setting your own priors?

brms has some default priors programmed (Uninformative priors)!

But: WAMBS (when to Worry and how to Avoid Misuse of Bayesian Statistics) (Rens van de Schoot et al. (2021))

"Ensure the prior distributions and the model or likelihood are well understood and described in detail in the text. Prior-predictive checking can help identify any prior-data conflict"

113 / 133

What about setting your own priors?

Let's first see how to get to know which priors are set by brms in the model we defined.

get_prior() function

get_prior(  
  Dur_Log ~ 1 + Text_Length + Right_Text + (1|Partic) + (1|Compar),
  data = Data_CJ2,
  family = "gaussian")

##                   prior     class        coef  group resp dpar nlpar bound
##                  (flat)         b                                         
##                  (flat)         b  Right_Text                             
##                  (flat)         b Text_Length                             
##  student_t(3, 7.5, 2.5) Intercept                                         
##    student_t(3, 0, 2.5)        sd                                         
##    student_t(3, 0, 2.5)        sd             Compar                      
##    student_t(3, 0, 2.5)        sd   Intercept Compar                      
##    student_t(3, 0, 2.5)        sd             Partic                      
##    student_t(3, 0, 2.5)        sd   Intercept Partic                      
##    student_t(3, 0, 2.5)     sigma                                         
##        source
##       default
##  (vectorized)
##  (vectorized)
##       default
##       default
##  (vectorized)
##  (vectorized)
##  (vectorized)
##  (vectorized)
##       default

114 / 133

Visualise priors

Imagine you want to visualise the prior for the Intercept & also visualise an alternative prior: $N (6, 2)$

library(metRology) # to use dt.scaled()
# Create a vector with possible Dur_log values
# and call this vector x
x <- seq(1,20,by=.1) 
# probability density for our own prior
# saved in the vector custom
custom       <- dnorm(x,6,2)
# probability density for brms prior
# saved in the vector brms_default
brms_default <- dt.scaled(x,3,7.5,2.5)
# Plot both
data_frame(x,custom,brms_default) %>%
  pivot_longer(
    c(custom,brms_default),
    names_to = "Prior") %>%
  ggplot(
    aes(x = x, 
        y = value, lty = Prior)) + 
  geom_line() + 
  scale_x_continuous(
    "Dur_log",limits = c(1,20)) +
  scale_y_continuous(
    "probability density")

115 / 133

Change priors in `brms`

We use set_prior() to define priors and prior = ... in the brm() part

# Create an object with the definition of the priors you want to define manually
priors_M4 <- c(
  set_prior("normal(6,2)", class = "Intercept"))
Model4b_CJ <- brm(
  Dur_Log ~ 1 + Text_Length + Right_Text + (1|Partic) + (1|Compar),
  data = Data_CJ2,
  family = "gaussian",
  backend = "cmdstanr",
  prior = priors_M4,
  cores = 4,
  seed = 1975
)

if we want to add other manually defined priors as well, we can by doing something like:

priors_Manually <- c(
  set_prior("normal(6,2)", class = "Intercept"),
  set_prior("normal(0,3)", class = "b", coef = "Right_Text"),
  ...
)

116 / 133

Let's think about the effects we expect

Bürkner:

The default prior for population-level effects (including monotonic and category specific effects) is an improper flat prior over the reals. https://rdrr.io/cran/brms/man/set_prior.html

If you know that sd for Dur_Log equals 1.22, what could be a good prior (not to informative) for the effect of:

Text_Length
Right_Text

117 / 133

Some possibilities

Option 1: $N (0, 2)$ or Option 2: $N (0, 4)$

x <- seq(-8,8,by=.1) 
# probability densities for both
option1 <- dnorm(x,0,2)
option2 <- dnorm(x,0,4)
# Plot both
data_frame(x, option1, option2) %>%
  pivot_longer(
    c(option1, option2),
    names_to = "Prior") %>%
  ggplot(
    aes(x = x, 
        y = value, lty = Prior)) + 
  geom_line() + 
  scale_x_continuous(
    "Dur_log",limits = c(-8,8)) +
  scale_y_continuous(
    "probability density")

118 / 133

Your turn!

🏃

Re-estimate the model 4 with a custom prior for both regression effects

Use tab_model() to summarize both models: this model and the model with the brms default priors

Big changes?

Remember:

119 / 133

Possible solution

# Set the priors
priors_M4b <- c(
  set_prior("normal(0,2)", class = "b", coef = "Right_Text"),
  set_prior("normal(0,2)", class = "b", coef = "Text_Length")
)
# Estimate the model
Model4b_CJ <- brm(
  Dur_Log ~ 1 + Text_Length + Right_Text + (1|Partic) + (1|Compar),
  data = Data_CJ2,
  family = "gaussian",
  prior = priors_M4b,
  backend = "cmdstanr",
  cores = 4,
  seed = 1975
)
# Summarize both models side by side
tab_model(Model4_CJ, Model4b_CJ)

120 / 133

Possible solution

 
Dur Log
Dur Log

Predictors
Estimates
CI (95%)
Estimates
CI (95%)

Intercept
6.63
6.23 – 6.99
6.64
6.25 – 7.01

Text Length
0.39
0.31 – 0.48
0.39
0.30 – 0.48

Right Text
-0.35
-0.52 – -0.18
-0.35
-0.51 – -0.18

Random Effects

σ2
0.88
0.88


τ00
0.02 Compar
0.02 Compar



0.39 Partic
0.39 Partic


ICC
0.32
0.32


N
26 Partic
26 Partic



30 Compar
30 Compar

Observations
499
499

Marginal R2 / Conditional R2
0.133 / 0.418
0.132 / 0.417
121 / 133

Where to get help?

Now that you're ready to set your steps in Bayesian Mixed Effects Models there is a high probability that this will happen sometime:

This is a great source to learn more about Stan; brms & Hamiltonian MCMC:

https://discourse.mc-stan.org/t/divergent-transitions-a-primer/17099

122 / 133

To wrap-up: why `brms` ?

brms is more versatile than lme4 allowing also to estimate
- multivariate models
- finite mixture models
- generalized mixed effects models
- splines
- ...
brms as stepping-stone to learn Stan

123 / 133

Some references & appendices

get me back to the topic slide

124 / 133

Some books

125 / 133

Some books

126 / 133

Some free online books

Bayes Rules!:

https://www.bayesrulesbook.com/

Last Friday I learned about this book:

https://vasishth.github.io/bayescogsci/book/

127 / 133

Rens van de Schoot

In Nature Reviews

128 / 133

THE Podcast

If you like running - like I do - this could be a great companion on your run!

https://www.learnbayesstats.com/

129 / 133

Site on creating the graphs

There are many blogs and websites that you can consult if you want to find out more about making graphs.

One that I often fall back to is:

http://mjskay.github.io/tidybayes/

130 / 133

Examplary paper

with an associated 'Walkthrough in R': https://rpubs.com/jensroes/765467

131 / 133

Questions?

Do not hesitate to contact me!

sven.demaeyer@uantwerpen.be

132 / 133

back to Prerequisites

133 / 133

About me

University of Antwerp; Faculty of Social Sciences; dept. Training and Education Sciences

phd on methodological issues in research on School Effectiveness

research on a broad range of topics (e.g. psychometrics, writing research, learning strategies, education for sustainable development, linguistics, speech development of impaired children, ...)

my main focus shifted recently to Comparative Judgement and Learning from comparisons

since a year or 2 discovering Bayesian Statistics & brms

Twitter: @svawa

2 / 133

Help

Keyboard shortcuts

↑, ←, Pg Up, k

Go to previous slide

↓, →, Pg Dn, Space, j

Go to next slide

Home

Go to first slide

End

Go to last slide

Number + Return

Go to specific slide

b / m / f

Toggle blackout / mirrored / fullscreen mode

Clone slideshow

Toggle presenter mode

Restart the presentation timer

?, h

Toggle this help

Workshop: Bayesian analyses in R

About me

What about you?

Topics

1. Aims & practical stuff

My aims are ...

All material is shared...

Zoom

Your own data

Prerequisites

What's Bayesian inference about?

First things first...

Frequentist vs. Bayesian inference

To get us started...

What a Frequentist approach can tell us

The Bayesian approach

The Bayesian approach

The Bayesian approach

The Bayesian approach

Advantages & Disadvantages of Bayesian analyses

Bayesian Theorem

Likelihood

Likelihood

Prior

Prior

Prior

Uninformative / Vague

Informative

Prior

Say we have following priors

Calculate the Posterior by hand (aka grid approximation)

Visualise the posterior distribution

Visualise the posterior distributions

3. Why brms?

Imagine

Markov Chain Monte Carlo - Why?

MCMC - demonstrated

Side-step: conjugate priors

Software

Example of Stan code

4. brms basic example

brms syntax

Let's retake the example on running

But ...

So

MCMC = sampling

Good old summary( ) function

Samples & chains?

Samples & chains: what are 'burn-in' iterations

Samples & chains: what are 'burn-in' iterations

Samples & chains: what are 'burn-in' iterations

Good ald plot( ) function

Model Convergence

Let's inspect the output again

5. Let's get practical: the example dataset

Judging 'argumentative texts'

Procedure

AOI's

The data

6. Let's slightly increase complexity

Question 1

Question 1

Question 1

Question 1

Question 1

Further exploration of the posterior

Time to unpack the plotting power of ggplot2

Time to unpack the plotting power of ggplot2

Your turn

Solution...

Another type of plot with some more information in it

Your turn

Solution

We can also integrate plots for mu and sigma

Your turn

Solution

How to get rid of the log?

Question 2

Your turn

Solution

3. Why `brms`?

4. `brms` basic example

`brms` syntax

Good old `summary( )` function

Good ald `plot( )` function

Time to unpack the plotting power of `ggplot2`

Time to unpack the plotting power of `ggplot2`

Let's introduce `mcmc_areas()` from the `bayesplot`

or `mcmc_areas_ridges` from `bayesplot`

or `mcmc_intervals` from `bayesplot`

HDI + ROPE with `brms` results

Or the `parameters` package

Another great package `sjPlots`

Change priors in `brms`

To wrap-up: why `brms` ?