class: center, middle, inverse, title-slide

# Modelling eye-tracking data in comparative judgement

## ⚔
### EARLI SIG27 - 2020

### Sven De Maeyer; Marije Lesterhuis; Marijn Gijsen

### University of Antwerp

### 2020-11-25

---
name: introduction
class: center, inverse, bottom
background-image: url(sharon-mccutcheon-NeRKgBUUDjM-unsplash.jpg)
background-size: contain

# Introduction

???

These slides were made using the Xaringan package in R. Great stuff!

---

## Judging 'argumentative texts'



---

## Two types of processes?

<http://www.youtube.com/watch?v=z2f_Ue45KWM>

<iframe width="640" height="480" src="https://www.youtube.com/embed/z2f_Ue45KWM" frameborder="0" allow="autoplay; encrypted-media" allowfullscreen></iframe>

---

## So...

Gaze duration data might result from different cognitive processes:

`\(\rightarrow\)` .pink[shorter visits] in both texts (reading to build a first 'mental model' / .pink[scanning])

`\(\rightarrow\)` .pink[longer visits] in one of both texts (reading for .pink[text comprehension])

---

## But...

How can we statistically model the resulting duration data?

... .pink[without **categorisation**] (setting thresholds to distinguish scanning from text comprehension)

... and .pink[avoiding **aggregation**]

Goal of this research:

> build and test statistical models that acknowledge the data-generating cognitive processes and that do not make use of 'trimming', 'setting thresholds' or 'aggregation'

---
class: inverse, center
background-image: url(wyron-a-Lhb1DyyNr7U-unsplash.jpg)
background-size: contain

# Methodology

---

## Procedure

- 26 high school teachers (Dutch) participated voluntarily
- each made 10 comparisons of 2 argumentative texts written by 10th graders
- 3 batches of comparisons, with judges randomly allocated to one of the batches
- all batches had a similar composition of comparisons regarding the characteristics of the pairs; the pairs themselves, however, were not the same
- Tobii TX300 dark-pupil eye-tracker with a 23-inch TFT monitor (max. resolution of 1920 x 1080 pixels)
- data sampled binocularly at 300 Hz

---

## AOIs



---

## The data
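To make the structure of these data concrete: each row is one gaze event (a visit to an AOI/text), nested within a judge and a comparison. A purely hypothetical simulation sketch (not the study's data; the mixing proportion and lognormal parameters below are illustrative assumptions, loosely chosen so that the two processes produce short vs. long visits):

```python
import random

random.seed(27)

# Hypothetical simulation of gaze event durations (in seconds), assuming the
# two data-generating processes hypothesised above: many short 'scanning'
# visits and fewer, longer 'text comprehension' visits.
THETA = 0.7  # assumed mixing proportion: share of visits that are 'scanning'

def simulate_visit():
    """Draw one gaze event duration from the assumed two-component mixture."""
    if random.random() < THETA:
        return random.lognormvariate(-1.0, 0.5)  # 'scanning': short visits
    return random.lognormvariate(2.0, 0.6)       # 'comprehension': long visits

rows = [
    {"judge": j, "comparison": c, "duration": round(simulate_visit(), 2)}
    for j in range(1, 4)   # a few judges (26 in the real study)
    for c in range(1, 4)   # a few comparisons (10 per judge in the real study)
]
for r in rows[:3]:
    print(r)
```

Note that nothing in a single row says which process generated it; that is exactly what the mixture models on the next slides leave latent.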
---

## Statistical model 1

A simple .pink[mixed effects model]: 'gaze event durations' of AOI visits are nested within the combination of judges and comparisons

`$$y_{i(jk)}=\beta_{0} + \mu_{0j} + \nu_{0k} + \epsilon_{i(jk)}$$`

with:

- `\(y_{i(jk)}\)` = gaze event duration of a visit to an AOI (text);
- `\(\beta_0\)` = the intercept (overall average duration);
- `\(\mu_{0j}\)` = unique effect of judge `\(j\)`;
- `\(\nu_{0k}\)` = unique effect of comparison `\(k\)`;
- `\(\epsilon_{i(jk)}\)` = residual

---

## Statistical model 2

The first mixed effects <span style="color: rgb(249, 38, 114)">finite mixture</span> model:

Model 1 **+ assuming two data-generating processes + gaze event durations from 'text comprehension' differ for judges and comparisons**

`$$\begin{aligned} y_{i(jk)} = & \theta \times (\beta_{1} + \epsilon_{1i(jk)}) + \\ & (1-\theta) \times (\beta_{2} + \mu_{2j} + \nu_{2k} + \epsilon_{2i(jk)}) \end{aligned}$$`

with:

- `\(\beta_1\)` = intercept 1, overall average duration of process 1 `\(\rightarrow\)` 'scanning';
- `\(\beta_2\)` = intercept 2, overall average duration of process 2 `\(\rightarrow\)` 'text comprehension reading';
- `\(\mu_{2j}\)` = unique effect of judge `\(j\)` on 'text comprehension reading';
- `\(\nu_{2k}\)` = unique effect of comparison `\(k\)` on 'text comprehension reading';
- `\(\epsilon_{1i(jk)}\)` & `\(\epsilon_{2i(jk)}\)` = residuals;
- `\(\theta\)` = mixing proportion (weight)

---

## Statistical model 3

Model 1 **+ assuming two data-generating processes + gaze event durations from 'scanning' differ for judges and comparisons**

$$
`\begin{aligned} y_{i(jk)} = & \theta \times (\beta_{1} + \mu_{1j} + \nu_{1k} + \epsilon_{1i(jk)}) + \\ & (1-\theta) \times (\beta_{2} + \epsilon_{2i(jk)}) \end{aligned}`
$$

with:

- `\(\beta_1\)` = intercept 1, overall average duration of process 1 `\(\rightarrow\)` 'scanning';
- `\(\beta_2\)` = intercept 2, overall average duration of process 2 `\(\rightarrow\)` 'text comprehension reading';
- `\(\mu_{1j}\)` = unique effect of judge `\(j\)` on 'scanning';
- `\(\nu_{1k}\)` = unique effect of comparison `\(k\)` on 'scanning';
- `\(\epsilon_{1i(jk)}\)` & `\(\epsilon_{2i(jk)}\)` = residuals;
- `\(\theta\)` = mixing proportion (weight)

---

## Statistical model 4

Model 1 + two data-generating processes + **gaze event durations from both processes differ for judges and comparisons**

$$
`\begin{aligned} y_{i(jk)}= & \theta \times (\beta_{1} + \mu_{1j} + \nu_{1k} + \epsilon_{1i(jk)}) + \\ & (1-\theta) \times (\beta_{2} + \mu_{2j} + \nu_{2k} + \epsilon_{2i(jk)}) \end{aligned}`
$$

with:

- `\(\beta_1\)` = intercept 1, overall average duration of process 1 `\(\rightarrow\)` 'scanning';
- `\(\beta_2\)` = intercept 2, overall average duration of process 2 `\(\rightarrow\)` 'text comprehension reading';
- `\(\mu_{1j}\)` & `\(\mu_{2j}\)` = unique effects of judge `\(j\)` on 'scanning' & 'text comprehension reading';
- `\(\nu_{1k}\)` & `\(\nu_{2k}\)` = unique effects of comparison `\(k\)` on 'scanning' & 'text comprehension reading';
- `\(\epsilon_{1i(jk)}\)` & `\(\epsilon_{2i(jk)}\)` = residuals;
- `\(\theta\)` = mixing proportion (weight)

---

## Analysis (Bayesian estimation)

- <span style="color: rgb(249, 38, 114)">brms</span> (a wrapper around Stan) to estimate the models using MCMC in R
- <span style="color: rgb(249, 38, 114)">flat</span> (uninformative) priors
- <span style="color: rgb(249, 38, 114)">4 chains</span> of <span style="color: rgb(249, 38, 114)">15000 iterations</span> (with 1000 burn-in iterations)
- compare the models with a 'leave-one-out cross-validation' approach
- summarize the best model (interpret the posterior distribution)

---
class: inverse, center
background-image: url(markus-winkler-8-X2_qeTdlQ-unsplash.jpg)
background-size: contain

# Results

---

## Comparison of the models

<table class=" lightable-minimal" style='font-family: "Trebuchet MS", verdana, sans-serif; width: auto !important; margin-left: auto; margin-right: auto;'>
<caption>Model comparison expressed as expected log predictive density, with standard errors between brackets</caption>
<thead>
<tr>
<th style="text-align:left;"> </th>
<th style="text-align:right;"> `\(\Delta\widehat{elpd}\)` </th>
<th style="text-align:right;"> `\(\widehat{elpd}\)` </th>
</tr>
</thead>
<tbody>
<tr> <td style="text-align:left;"> Model 4 </td> <td style="text-align:right;"> 0.0 (0.0) </td> <td style="text-align:right;"> -5548.8 (29.5) </td> </tr>
<tr> <td style="text-align:left;"> Model 2 </td> <td style="text-align:right;"> -56.4 (10.3) </td> <td style="text-align:right;"> -5604.8 (29.3) </td> </tr>
<tr> <td style="text-align:left;"> Model 3 </td> <td style="text-align:right;"> -121.4 (20.1) </td> <td style="text-align:right;"> -5669.9 (28.4) </td> </tr>
<tr> <td style="text-align:left;"> Model 1 </td> <td style="text-align:right;"> -239.7 (19.6) </td> <td style="text-align:right;"> -5788.1 (30.5) </td> </tr>
</tbody>
</table>

`\(\rightarrow\)` **.pink[Model 4] fits best**

- 2 data-generating processes
- differences between judges AND comparisons

---

### Posterior distribution of .pink[fixed effects]

.left-column[
<br/>
_average duration of visits coming from ..._

- 'scanning': +/- .44 secs
- 'text comprehension': +/- 8.10 secs
]

.right-column[
.right[
<!-- -->
]
]

---

### Posterior distribution of .pink[theta & residual variances]

.left-column[
<br/>
- more visits from 'scanning' than from 'text comprehension'
- residual variance is larger for 'text comprehension'
]

.right-column[
.right[
<!-- -->
]]

---

### Posterior distribution of .pink[random effects]

.left-column[
<br/>
- judges have the stronger impact on both types of durations
- biggest differences for durations from 'text comprehension'
]

.right-column[
.right[
<!-- -->
]]

---

### Diff. between judges in reading for text comprehension

.left-column[
<br/>
_This plot shows how .pink[judges] differ in durations coming from 'text comprehension'_
]

.right-column[
.right[
<!-- -->
]]

---

### Diff. between comparisons in reading for text comprehension

.left-column[
<br/>
_This plot shows how .pink[comparisons] result in different durations for 'text comprehension'_

`\(\rightarrow\)` _clearly more alike than judges_
]

.right-column[
.right[
<!-- -->
]]

---
class: inverse, center, bottom
background-image: url(felicia-buitenwerf-Qs_Zkak27Jk-unsplash.jpg)
background-size: contain

# Conclusion & Discussion

---

## Conclusion

- the model acknowledging that gaze event durations come from different cognitive processes is the most likely one
- judges and comparisons result in different gaze event durations
- mixed effects finite mixture models are promising for this kind of data
- _no need to decide which fixation durations point to scanning and which to text comprehension_
- _opens up many possibilities for follow-up research questions and analyses_

---

## Discussion

- what about more than 2 processes?
- more informed priors
- a need for .pink[triangulation] to understand the two types of processes
- what if the representations are not texts; how .pink[task-specific] are these models?
- other applications of .pink[(mixed effects) finite mixture models] when modelling process data?
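---

## Extra: classifying a single visit

Although the approach needs no thresholds, the fitted mixture can still assign each visit a probability of belonging to each process via Bayes' rule. A minimal numerical sketch with hypothetical parameter values (normal components for simplicity, means loosely inspired by the reported averages of ~.44 and ~8.10 secs; these are not the fitted estimates):

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def p_comprehension(duration, theta, mu1, sd1, mu2, sd2):
    """Posterior probability that a visit of this duration comes from
    'text comprehension' (component 2) rather than 'scanning'
    (component 1), via Bayes' rule on the two-component mixture."""
    scan = theta * normal_pdf(duration, mu1, sd1)
    comp = (1 - theta) * normal_pdf(duration, mu2, sd2)
    return comp / (scan + comp)

# Hypothetical parameter values, for illustration only
p_short = p_comprehension(0.5, theta=0.7, mu1=0.44, sd1=0.3, mu2=8.1, sd2=4.0)
p_long = p_comprehension(9.0, theta=0.7, mu1=0.44, sd1=0.3, mu2=8.1, sd2=4.0)
print(round(p_short, 3), round(p_long, 3))
```

A short visit gets a near-zero comprehension probability and a long visit a probability near one, without any hard cut-off in between.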
---
class: inverse, center, bottom
background-image: url(marko-pekic-IpLa37Uj2Dw-unsplash.jpg)
background-size: contain

# Questions?

Do not hesitate to contact us!

sven.demaeyer@uantwerpen.be

The material is shared on OSF: <https://osf.io/evrsf/>