Mixed-effects modeling

Mixed-effects modeling Colin Wilson Phonetics Seminar, May 21, 2007 Mixed-effects modeling p.1/34

Main sources
Baayen (2004). Statistics in Psycholinguistics: A critique of some gold standards. Mental Lexicon Working Papers I.
Baayen (to appear). Analyzing Linguistic Data: A practical approach to statistics. Cambridge University Press.
Baayen, Davidson and Bates (2006). Mixed-effects modeling with crossed random effects for subjects and items. Submitted.

Problem: crossed random effects
In many types of linguistic experiments (artificial grammar learning, lexical decision, phonetic production and perception, ...) participants and materials are random and crossed effects. (Clark. 1973. The language-as-fixed-effect fallacy: A critique of language statistics in psychological research. JVLVB 12, 335-359)

Random effects
A random effect is a factor whose levels in the experiment are non-exhaustively sampled from a larger population of interest. The analysis must take into account the fact that we wish to generalize our results beyond the particular sample to the population.*
*This does not imply that the results necessarily or plausibly hold of every member of the population. To take a typical case, the mean µ of a normal distribution is a population parameter, but Pr(x = µ) = 0.

Random effects (Baayen et al. 2006)
"the interest of most studies is not about experimental effects present only in the individuals who participated in the experiment, but rather in effects present in speakers everywhere" (2)
"most materials in a single experiment do not exhaust all possible syllables, words, or sentences that could be found in a given language" (2)
"Any naturalistic stimulus which is a member of a population of stimuli which has not been exhaustively sampled should be considered a random variable for the purposes of an experiment" (31).

Random effects (Baayen et al. 2006)
"The current practice of psychophysiologists and neuroimaging researchers typically ignores the issue of whether linguistic materials should be modeled with fixed or random effect models" (30).
"Individual subjects and items may have intercepts and slopes that diverge considerably from the population means" (24).
"we know that no two brains are the same" (25)

Crossed effects
Two effects are crossed in an experiment if every level of one effect co-occurs with every level of the other effect.
Counterbalancing, while important, does not eliminate the crossing of participants and materials; typically, it requires crossing.
Each subject saw each item in exactly one condition. The number of items in each condition was the same for each subject, and each item occurred in each condition the same number of times across subjects.

Standard solution (Baayen et al. 2006)
"Clark's oft-cited paper presented a technical solution to this modeling problem, based on statistical theory and computational methods available at the time (e.g., Winer, 1971). This solution involved computing a quasi-F statistic which, in the simplest-to-use form, could be approximated by the use of a combined minimum-F statistic derived from separate participants (F1) and items (F2) analyses" (2).

Critique of the standard (Baayen et al. 2006)
Deficient statistical power (see §4 and Baayen 2004: §3.2.3 for simulation results)
Inability to handle missing data
Different methods for handling continuous and categorical data (see Baayen 2004: §4)
"unprincipled methods of modeling heteroskedasticity and non-spherical error variance" (3)

Critique of the standard (Baayen 2004)
Cost of dichotomization
"It is widely believed that [dichotomization] is the most powerful means of ascertaining the independent effect of variables such as frequency of occurrence that are correlated with many other potentially relevant predictors. Unfortunately, this belief is incorrect" (2). (references: Cohen 1983, Harrell 2001)
Also: Arbitrariness of cutoff points for "low" and "high" conditions (e.g., what counts as "hard" or "frequent"?).

Critique of the standard (Baayen 2004)
Cost of prior averaging
"The by-subject and by-item analyses that are currently the norm in psycholinguistic studies also bring along systematic data loss. It is widely believed that these averaging techniques are the best that current statistics has to offer" (8).
[This seems to ignore one benefit of averaging, namely reduction of the error of the estimates.]

Critique of the standard (Baayen 2004)
Summary
"Factorial designs are commonly used where regression is more appropriate. Dichotomization and factorization of numerical predictors, although widely practised, lead to a loss of power and should be avoided. Psycholinguists are generally very reluctant to include covariates in their analyses, even though including relevant covariates is part and parcel of statistical common sense" (37).

The future is now (Baayen 2004)
"as anyone following statistical developments outside the field of psycholinguistics (for instance, in Psychological Methods or in Behavioral Research Methods, Instruments and Computers, or in Venables & Ripley, 2003) will have realized, current statistics has a lot more to offer, both in power and in the insight provided into the quantitative structure of the data" (38).

The future is now (Baayen et al. 2006)
"In the 30+ years since [Clark 1973], statistical techniques have expanded the space of possible solutions to this problem, but these techniques have not yet been applied widely in the field of language and memory studies" (2)
"we introduce a very recent development in computational statistics, namely, the possibility to include subject and items as crossed random effects, as opposed to hierarchical or multilevel models in which random effects must be assumed to be nested" (2).
(see Bates & Pinheiro, 1998; Pinheiro & Bates 2000)

Proposal: Mixed-effect models
Modern mixed-effect modeling
  allows fixed and random effects to be combined (i.e., "mixed")
  allows random effects to be crossed (a result of the "recent developments")
  allows covariates to be included in the model (e.g., trial number)
  is a form of regression (and so does not require dichotomization or aggregation)
  generalizes to categorical responses

Hypothetical example (Baayen et al. 2006)
Three participants (s1, s2, s3) responded to three items (w1, w2, w3) in a primed lexical decision task under both short and long SOA [stimulus onset asynchrony].
Two random effects (crossed)
  participants
  items
One fixed effect (crossed with random effects)
  SOA (short vs. long)
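The design just described can be laid out in base R to check that subjects and items really are crossed. This is only a sketch: the factor labels come from the slide, while the object names (`design`, `cell_counts`, `is_crossed`) are illustrative and no response times are assumed.

```r
# Fully crossed design: every subject responds to every item under both SOAs.
design <- expand.grid(
  Subject = c("s1", "s2", "s3"),
  Item    = c("w1", "w2", "w3"),
  SOA     = c("long", "short")
)

# Subject x Item cell counts; no empty cell means the two factors are crossed.
cell_counts <- table(design$Subject, design$Item)
is_crossed  <- all(cell_counts > 0)

# 3 subjects * 3 items * 2 SOA levels
n_obs <- nrow(design)

print(cell_counts)
```

Each subject-item cell holds two observations (one per SOA), which is where the 18 observations in the fitted model come from.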

Hypothetical data
  Subject  Item  SOA    RT
  s1       w1    long   466
  s1       w2    long   520
  s1       w3    long   502
  s1       w1    short  475
  s1       w2    short  494
  s1       w3    short  490
  s2       w1    long   516
  ...

Mixed-effect formula
$$y = X\beta + Zb + \epsilon, \qquad \epsilon \sim N(0, \sigma^2 I), \qquad b \sim N(0, \sigma^2 \Sigma)$$
$\epsilon$ and $b$ are independent random variables, where
  $y$ is the vector of dependent values (RTs)
  $X$ is the fixed-effect design matrix
  $\beta$ is the vector of fixed-effect coefficients
  $Z$ is the random-effect design matrix
  $b$ is the vector of adjustments for subjects, items
  $\epsilon$ is the vector of residual errors (subjects × items)
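To make the roles of X, Z, β, b and ε concrete, the following base-R sketch builds both design matrices for the 3 × 3 × 2 design and draws one simulated response vector. Everything numeric here is an assumption for illustration: the fixed-effect values and the standard deviations are round numbers, the random effects are limited to independent subject and item intercepts (so σ²Σ is diagonal), and the seed is arbitrary.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

design <- expand.grid(
  Subject = c("s1", "s2", "s3"),
  Item    = c("w1", "w2", "w3"),
  SOA     = c("long", "short")
)

# Fixed-effect design matrix: intercept plus an indicator for short SOA.
X <- model.matrix(~ SOA, data = design)

# Random-effect design matrix: one indicator column per subject and per item.
Z <- cbind(model.matrix(~ 0 + Subject, data = design),
           model.matrix(~ 0 + Item, data = design))

beta <- c(522.2, -19.1)                        # assumed fixed effects
b    <- c(rnorm(3, sd = 20), rnorm(3, sd = 22)) # assumed subject and item adjustments
eps  <- rnorm(nrow(design), sd = 11)            # assumed residual noise

# y = X beta + Z b + eps
y <- drop(X %*% beta + Z %*% b + eps)
```

Every row of `Z` contains exactly two ones, one selecting the subject and one selecting the item, which is what crossing looks like at the level of the design matrix.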

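Mixed-effect formula (one subject-item pair)
$$y_{ij} = X_{ij}\beta + Z_{ij} b_{ij} + \epsilon_{ij}$$
where $i$ is a subject and $j$ is an item. Suppose that for all $i$ and $j$ in the population:
$$X_{ij} = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}, \qquad \beta = \begin{pmatrix} 522.2 \\ -19.10 \end{pmatrix} \begin{matrix} \text{intercept} \\ \text{SOAshort} \end{matrix}$$
and suppose that for subject 1 and item 1:
$$Z_{11} = \begin{pmatrix} 1 & 0 & 1 \\ 1 & 1 & 1 \end{pmatrix}, \qquad b_{11} = \begin{pmatrix} -26.2 \\ 11.0 \\ -28.3 \end{pmatrix} \begin{matrix} \text{s1 intercept} \\ \text{s1 \& SOAshort} \\ \text{w1 intercept} \end{matrix}$$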

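Mixed-effect formula (one subject-item pair)
$$\begin{aligned}
y_{11} &= X_{11}\beta + Z_{11} b_{11} + \epsilon_{11} \\
&= \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} 522.2 \\ -19.10 \end{pmatrix} + \begin{pmatrix} 1 & 0 & 1 \\ 1 & 1 & 1 \end{pmatrix} \begin{pmatrix} -26.2 \\ 11.0 \\ -28.3 \end{pmatrix} + \epsilon_{11} \\
&= \begin{pmatrix} 522.2 \\ 503.1 \end{pmatrix} + \begin{pmatrix} -54.5 \\ -43.5 \end{pmatrix} + \epsilon_{11} \\
&= \begin{pmatrix} 467.7 \\ 459.6 \end{pmatrix} + \epsilon_{11}
\end{aligned}$$
By comparison with the RT data, $\epsilon_{11} \approx \begin{pmatrix} -2 \\ 15 \end{pmatrix}$.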
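The same calculation can be checked in base R. The sketch assumes negative signs on the SOAshort coefficient and on the s1 and w1 intercept adjustments, which is what the intermediate sums 503.1, 467.7 and 459.6 on the slide imply; the observed RTs 466 and 475 are the s1/w1 rows of the hypothetical data table. Variable names are illustrative.

```r
# Fixed-effect part: rows are the long and short SOA conditions.
X    <- matrix(c(1, 0,
                 1, 1), nrow = 2, byrow = TRUE)
beta <- c(522.2, -19.1)          # intercept, SOAshort (sign assumed negative)

# Random-effect part for subject 1 and item 1.
Z <- matrix(c(1, 0, 1,
              1, 1, 1), nrow = 2, byrow = TRUE)
b <- c(-26.2, 11.0, -28.3)       # s1 intercept, s1 & SOAshort, w1 intercept

fixed_part  <- drop(X %*% beta)  # population-level prediction
random_part <- drop(Z %*% b)     # adjustment for s1 and w1
predicted   <- fixed_part + random_part

observed <- c(466, 475)          # s1, w1: long and short SOA
residual <- observed - predicted

print(rbind(fixed_part, random_part, predicted, residual))
```

The residuals come out near -1.7 and 15.4, matching the rounded values of about -2 and 15 quoted on the slide.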

Mixed-model analysis
The goal of statistical analysis is to provide estimates of the population parameters and measures of the reliability of the estimates.
The analysis allows us to test whether items contribute to responses independently (assumed by crossing) or only via interaction with subjects (the nested alternative).
Despite the inclusion of both random effects, the analysis does not estimate a separate free parameter for each subject and item (or combination) (see Baayen et al. 2006:7).

R (http://www.r-project.org/)
"To our knowledge, the only software currently available for fitting mixed-effects models with crossed random effects is the lme4 package (Bates, 2005; Bates & Sarkar, 2005) in R, an open-source language and environment for statistical computing (R Development Core Team, 2005). ... In statistical computing, R is the leading platform for research and development, which explains why mixed-effects models with crossed random effects are not (yet) available in commercial software packages" (8).

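Analysis in R with lme4
  library(lattice)  # plotting (provides xyplot)
  library(lme4)     # analysis
  # read the data; the directory is elided in the original slides
  d <- read.table(".../baayendavidsonbates.data", header = TRUE)
  # RT by SOA, one panel per Item-by-Subject combination
  xyplot(RT ~ SOA | Item + Subject, data = d)
  # crossed random intercepts for items and subjects, fit by maximum likelihood
  # (the lme4 release used for the slides wrote method = "ML"; current releases use REML = FALSE)
  fit1 <- lmer(RT ~ SOA + (1 | Item) + (1 | Subject), data = d, REML = FALSE)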

Graph of data

Result of fitting
  Linear mixed-effects model fit by maximum likelihood
  Formula: RT ~ SOA + (1 | Item) + (1 | Subject)
     Data: d
      AIC    BIC  loglik  MLdeviance  REMLdeviance
    162.3  165.9  -77.16       154.3         141.5
  Random effects:
   Groups    Name         Variance  Std.Dev.
   Item      (Intercept)  473.18    21.753
   Subject   (Intercept)  401.40    20.035
   Residual               127.01    11.270
  number of obs: 18, groups: Item, 3; Subject, 3

Result of fitting (cont.)
  Fixed effects:
               Estimate  Std. Error  t value
  (Intercept)   522.111      17.483   29.865
  SOAshort      -18.889       5.313   -3.555
  Correlation of Fixed Effects:
            (Intr)
  SOAshort  -0.152
[Note that the estimates of the fixed effects are quite close to the hypothetical parameter values a few slides back.]
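The t values in this table are each estimate divided by its standard error. A small base-R check using the printed (rounded) numbers shows this; the vector names are illustrative, and the last digit can differ slightly from the table because the inputs are already rounded.

```r
# Estimates and standard errors as printed in the fixed-effects table.
estimate  <- c(Intercept = 522.111, SOAshort = -18.889)
std_error <- c(Intercept = 17.483,  SOAshort = 5.313)

# t value = estimate / standard error
t_value <- estimate / std_error

print(round(t_value, 2))
```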

Regression plot ($r^2 = .90$)

In-class exercise
How would we obtain the predicted response for subject 1 on item 1 (both long and short SOA) given the fitted model above?
[This is analogous to the calculation we did with the hypothetical population estimates. It's just a matter of pulling the right numbers out of R....]

Assessing the model fit
Note that t-values are given without p-values, because determination of degrees of freedom is difficult (Bates, R-News).
If |t| > 2, the effect should be significant if the data set is large.
Alternatively, use MCMC sampling methods (Baayen et al. 2006:11ff., Baayen to appear).
The log likelihood (loglik) of the data gives a measure of how well the fitted model matches the data (MLdeviance = -2*logLik).
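The relation between log likelihood and deviance can be verified against the printed fit summary. In the base-R sketch below, the log likelihood and number of observations are taken from the output; the parameter count `k = 4` is an assumption, chosen because it is the value that reproduces the reported AIC and BIC under the usual definitions (deviance + 2k and deviance + k·log(n)).

```r
loglik <- -77.16   # reported log likelihood
n_obs  <- 18       # reported number of observations
k      <- 4        # assumed parameter count (inferred from the reported AIC)

ml_deviance <- -2 * loglik                # MLdeviance = -2 * logLik
aic <- ml_deviance + 2 * k                # Akaike information criterion
bic <- ml_deviance + k * log(n_obs)       # Bayesian information criterion

print(round(c(MLdeviance = ml_deviance, AIC = aic, BIC = bic), 1))
```

Rounded to one decimal, these agree with the 154.3, 162.3 and 165.9 shown in the summary.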

Assessing the model fit
ANOVA can be used to compare different models (Raudenbush & Bryk 2002: 60-61) based on log likelihood / deviance.
  # items nested within subjects, fit by maximum likelihood
  # (the lme4 release used for the slides wrote method = "ML")
  fit2 <- lmer(RT ~ SOA + (1 + Item | Subject), data = d, REML = FALSE)
  anova(fit1, fit2)
  # Chi-Sq(4) = 3.0176, p<.6
  # improvement in log likelihood
  # (-77.16 vs. -75.56) is not
  # sufficient to justify nesting
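The p-value behind this comparison can be reproduced in base R from the reported chi-square statistic alone. The degrees-of-freedom arithmetic in the sketch is one reading of the two models, not something stated on the slide: a 3 × 3 covariance matrix for `(1 + Item | Subject)` has 6 free parameters, against 2 variances in the crossed model.

```r
chisq_reported <- 3.0176          # statistic reported by anova(fit1, fit2)

# Assumed parameter counts for the random-effect structures.
n_par_nested  <- 3 * (3 + 1) / 2  # 3 x 3 covariance matrix: 6 parameters
n_par_crossed <- 2                # one variance each for Item and Subject
df_diff       <- n_par_nested - n_par_crossed

# Upper-tail probability of the chi-square distribution.
p_value <- pchisq(chisq_reported, df = df_diff, lower.tail = FALSE)

# Twice the difference of the rounded log likelihoods printed on the slide.
chisq_from_loglik <- 2 * (-75.56 - (-77.16))

print(round(c(df = df_diff, p = p_value, chisq_from_loglik = chisq_from_loglik), 3))
```

The p-value is about .55, consistent with the "p<.6" on the slide. The statistic recomputed from the two rounded log likelihoods is 3.2, close to but not identical with the reported 3.0176.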

Summary
The problem addressed was estimating population parameters in the case of crossed random factors, typical in psycholinguistics.
Modern mixed-effect modeling allows crossed random factors, is based on well-known statistical objectives (e.g., maximum likelihood), and provides alternatives to standard statistical tests.

What was the crucial advance?
Classic statistical methods allow key values to be calculated exactly, in closed form (e.g., partitioning the variance and computing F).
Modern optimization methods such as Expectation Maximization and sampling techniques allow quantities that cannot be computed in closed form to be obtained or approximated to within any desired precision.
These methods allow models with crossed random factors, for which no closed-form solution exists, to be fitted to the data.

Modeling timecourse (Baayen et al. 2006)
"An important new possibility offered by mixed-effect modeling is to bring effects that unfold during the course of an experiment into account. There are several kinds of longitudinal effects. First, there are effects of learning or fatigue. ... Second, in chronometric paradigms, the response to a target trial is heavily influenced by how the preceding trials were processed. In lexical decision, for instance, the reaction time to the preceding word in the experiment is one of the best predictors for the target latency. ... Third, qualitative properties of preceding trials should be brought under statistical control" (25-26).