Individual Differences & Item Effects: How to test them, & how to test them well

Individual Differences & Item Effects
Properties of subjects: cognitive abilities (WM task scores, inhibition), gender, age, L2 proficiency, task strategy.
Properties of items: lexical frequency, segmental properties, plausibility.

Two Challenges
1. Subject and item properties are not at the level of individual trials. How do you implement them in your model? What do they mean statistically?
2. Subject and item properties are often not experimentally manipulated. How can they best be investigated?

Example Study (Fraundorf et al., 2010)
Story: "Both the British and the French biologists had been searching Malaysia and Indonesia for the endangered monkeys. Finally, the British spotted one of the monkeys in MALAYSIA and planted a radio tag on it."
Memory test: Did the British or the French find it? In Malaysia or in Indonesia?

Manipulate presentational vs. contrastive accents (capitals mark the contrastive accent)
Finally, the British spotted one of the monkeys in MALAYSIA...
Finally, the BRITISH spotted one of the monkeys in Malaysia...
Finally, the BRITISH spotted one of the monkeys in MALAYSIA...
Finally, the British spotted one of the monkeys in Malaysia...

Original Results
A contrastive (L+H*) accent benefits memory for the accented item. There is no effect of the accent on the other item, so the effects seem localized.

Implementing in R
New experiment: do these effects vary with individual differences in working memory?
We need trial-level and subject-level variables in the same data frame.

Implementing in R
Then you can add the subject-level variable to the model just like any other factor:
glmer(correct ~ Accent * WM_Score + (1 | Subject) + (1 | StoryID), data=FullDataframe, family=binomial)
R automatically figures out that WM_Score is subject-level, because each subject always has the same score.
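
The claim that each subject always has the same score can be checked directly before fitting. The following is a minimal sketch with made-up toy data; the data frame name `trials` and its values are hypothetical, and only the column names `Subject`, `Accent`, and `WM_Score` follow the slide.

```r
# Toy trial-level data (hypothetical values): 2 subjects x 3 trials each
trials <- data.frame(
  Subject  = rep(c("S1", "S2"), each = 3),
  Accent   = rep(c("contrastive", "presentational", "contrastive"), times = 2),
  WM_Score = rep(c(42, 35), each = 3)
)

# Count the distinct WM_Score values within each subject
n_scores <- tapply(trials$WM_Score, trials$Subject,
                   function(x) length(unique(x)))

# TRUE when every subject has exactly one score, i.e. WM_Score is subject-level
is_subject_level <- all(n_scores == 1)
print(is_subject_level)
```

If this check fails for a variable you believe is subject-level, the likely cause is a data-entry or merging problem rather than anything in the model.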

Merging Dataframes
What if trials and subjects are in separate files (Data1: trial-level; Data2: subject-level)?
Load them both into R and use merge:
FullDataframe = merge(Data1, Data2, all.x=TRUE)
The two data frames need some column that has the same name in both. You can specify which columns to use with the by parameter; see ?merge for more details.
The default is to drop subjects that can't be matched across data frames. all.x=TRUE fills in NA values instead, so you can track these subjects.
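
As a rough illustration of the merge step, here is a small sketch with invented data. The contents of `Data1` and `Data2` are hypothetical; subject S3 is deliberately missing from the subject-level table to show what all.x=TRUE does.

```r
# Trial-level data (hypothetical): 3 subjects, 2 trials each
Data1 <- data.frame(
  Subject = c("S1", "S1", "S2", "S2", "S3", "S3"),
  Trial   = c(1, 2, 1, 2, 1, 2),
  Correct = c(1, 0, 1, 1, 0, 1)
)

# Subject-level data (hypothetical): S3 has no working memory score
Data2 <- data.frame(
  Subject  = c("S1", "S2"),
  WM_Score = c(42, 35)
)

# Keep every trial; unmatched subjects get NA for WM_Score
FullDataframe <- merge(Data1, Data2, by = "Subject", all.x = TRUE)

# Default behaviour for comparison: unmatched trials are dropped
DefaultMerge <- merge(Data1, Data2, by = "Subject")

# List the subjects that could not be matched
unmatched <- unique(FullDataframe$Subject[is.na(FullDataframe$WM_Score)])
print(unmatched)
```

Comparing nrow(FullDataframe) with nrow(DefaultMerge) is a quick way to see how many trials would otherwise have been lost.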

What's Going On Statistically?
Level 2: subjects and items (e.g. the Knight story, the Monkey story). Level 1: individual trials.
We have random effects of our subjects and items. This results in residuals, e.g. Eun-Kyung's accuracy: 80% (+4 vs. mean); Tuan's accuracy: 72% (-4 vs. mean).
Level 2 factors may help us explain this variation.

What's Going On Statistically?
Model without WM: unexplained variance between subjects.
Model with a main effect of WM: unexplained subject variance is reduced.
The fixed effects are unchanged because these were manipulated within subjects.
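
The idea that a subject-level predictor soaks up between-subject variance can be illustrated, very roughly, on per-subject mean accuracies. This sketch is a simplification: it uses ordinary regression on simulated subject means rather than a mixed model, and all numbers are invented.

```r
set.seed(1)
n_subj <- 30

# Simulated subject-level data (hypothetical): WM score and mean accuracy
WM       <- rnorm(n_subj)
Accuracy <- 0.75 + 0.04 * WM + rnorm(n_subj, sd = 0.03)

# Between-subject variance with no predictor
var_without_WM <- var(Accuracy)

# Between-subject variance left over once WM is in the model
fit         <- lm(Accuracy ~ WM)
var_with_WM <- var(residuals(fit))

print(c(without = var_without_WM, with = var_with_WM))
```

In an actual mixed model the analogous comparison would be between the estimated subject random-intercept variances of the two models.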

Random Slopes &
Adding main effects at Level 2 will not change the fixed effects at Level 1. But we can also add INTERACTIONS with trial-level factors. These help explain the random slopes.

Effect of Subject-Level Variables
Remember random slopes? They capture variance between subjects in a fixed effect.
[Figure: memory accuracy for two subjects (Alison, Zhenghan) when the other item has a presentational vs. a contrastive accent.]

Random Slopes &
These interactions may also be more interesting theoretically: people with low WM scores DO show a penalty to memory if something else in the story gets a contrastive accent.

Random Slopes &
It is illogical to have a random slope by subject for something at the subject level: there isn't a separate WM effect for each subject. lmer lets you fit this, but I'm not sure what it represents.

Individual Differences: How to Do Them Well
What Scott has learned from the individual differences literature.
Example study: pitch accenting as a cue to reference resolution (deaccented referents are usually given). Can we predict individual differences in the use of this cue?

Discriminant Validity
Many individual differences are correlated, e.g. some subjects may just try harder than others. Consequently, they would do better on both the WM task and the eye-tracking task. This is usually not theoretically interesting.
Principle #1: Include more than one construct so we know what really matters.

Discriminant Validity
How to deal with correlated predictors?
Simple solution: regress one on the other:
ModelWM <- lm(WMMean ~ PSpeed, data=Cyclops)
Then use the residuals as the new measure:
Cyclops$ResidWM <- residuals(ModelWM)
This is the part of WM we couldn't explain from perceptual speed.
Better solutions: path analysis and structural equation modeling.
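
To see what residualizing does, the following sketch applies the same two steps to simulated data. The data frame `Toy` and its values are hypothetical; the variable names simply mirror the slide.

```r
set.seed(2)
n <- 100

# Simulated measures (hypothetical): WM is partly predictable from speed
PSpeed <- rnorm(n, mean = 160, sd = 29)
WMMean <- 0.02 * PSpeed + rnorm(n)
Toy    <- data.frame(PSpeed, WMMean)

# Step 1: regress working memory on perceptual speed
ModelWM <- lm(WMMean ~ PSpeed, data = Toy)

# Step 2: keep the residuals as the new, speed-free WM measure
Toy$ResidWM <- residuals(ModelWM)

# The raw measure correlates with speed; the residualized one does not
r_raw   <- cor(Toy$WMMean,  Toy$PSpeed)
r_resid <- cor(Toy$ResidWM, Toy$PSpeed)
print(c(raw = r_raw, residualized = r_resid))
```

One practical caveat, assuming real data with missing values: residuals() returns one value per row actually used in the fit, so rows with missing data need to be handled before assigning the result back to the data frame.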

Discriminant Validity
[Figure: colored scatterplot matrix of the individual-difference measures.]
Some people asked about how to get these colored scatterplots. You need to download and load the package gclus. Then:
# Here you select which variables go in the scatterplot
Cyclops.short <- subset(Cyclops, select=c('PSpeed', 'GoodProsody', 'ResidWM'))
Cyclops.r <- abs(cor(Cyclops.short, use="pairwise.complete.obs"))
Cyclops.col <- dmat.color(Cyclops.r)
Cyclops.o <- order.single(Cyclops.r)
cpairs(Cyclops.short, Cyclops.o, panel.colors=Cyclops.col, gap=.5)

Reliability
Not all individual measures are good measures. Measures may be noisy, or they may not measure a stable or meaningful characteristic.
Suppose you found that vocab predicted the outcome but WM did not. Maybe you just had a bad WM measure.

Reliability
Good tests produce consistent scores: they measure something real about a person. You can test this yourself with more than one assessment or with split halves.
Calculate Pearson's r: cor.test(Cyclops$PSpeed1, Cyclops$PSpeed2)
Scatterplot: plot(Cyclops$PSpeed1, Cyclops$PSpeed2)
A typical standard is that r = .70 to .80 is needed for good reliability. In the examples shown, r = .77 is good and r = .16 is bad.
Principle #2: Check the reliability of your measures!
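
A split-half check can be sketched as follows, assuming item-level scores are available for each subject. The simulated `items` matrix is hypothetical, and the odd/even split is just one common way to form the halves.

```r
set.seed(3)
n_subj  <- 50
n_items <- 20

# Simulated test (hypothetical): each item score = subject ability + noise
ability <- rnorm(n_subj)
items   <- matrix(ability + rnorm(n_subj * n_items),
                  nrow = n_subj, ncol = n_items)

# Score the odd-numbered and even-numbered items separately
odd_half  <- rowMeans(items[, seq(1, n_items, by = 2)])
even_half <- rowMeans(items[, seq(2, n_items, by = 2)])

# Pearson's r between the two halves, plus its significance test
split_half_r <- cor(odd_half, even_half)
print(cor.test(odd_half, even_half))
```

With real data, plot(odd_half, even_half) gives the matching scatterplot.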

Latent Variables
Some things can be measured directly, e.g. the gender of a subject or the segmental properties of a word.
Many things in psychology are measured indirectly, e.g. the ability to do tasks in spite of interference, measured with the Alphabet Span Task (read words and recall them alphabetically).

Latent Variables
But few tasks are process-pure.
[Diagram: Alphabet Span reflects both alphabet knowledge and working memory; Reading Span reflects both working memory and reading ability.]

Latent Variables
Principle #3: Overcome task-specific factors with multiple measures of the same construct.
Simple analysis: use the sum or average as your predictor.
Advanced techniques: verify that the measures are related with factor analysis; examine only the common variance with latent variable analysis or structural equation modeling.
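
The simple analysis can be sketched as below. The scores are invented, and z-scoring each task before averaging is an assumption added here (so that tasks on different scales contribute equally); the slide itself only says to use the sum or average.

```r
# Two measures of the same construct (hypothetical scores, different scales)
Toy <- data.frame(
  AlphabetSpan = c(4, 6, 5, 7, 3),
  ReadingSpan  = c(2.5, 4.0, 3.0, 4.5, 2.0)
)

# Put both tasks on a common scale (mean 0, SD 1)
z <- scale(Toy[, c("AlphabetSpan", "ReadingSpan")])

# Average the z scores into one working memory composite per subject
Toy$WM_Composite <- rowMeans(z)
print(Toy)
```

The composite can then be entered in the model in place of either single task score.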

Continuous Predictors
Many individual differences are continuous. It is good to include the continuous variation if you have sampled the full range.
Splitting subjects into groups is needed for ANOVA, but it throws away information and is less powerful.
Histogram: hist(Cyclops$WM, breaks=20)
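
The cost of splitting can be illustrated with a small simulation. This is a sketch on invented data, and the median split is used here as one example of the kind of split the slide warns about.

```r
set.seed(4)
n <- 2000

# Simulated data (hypothetical): outcome depends linearly on WM
WM      <- rnorm(n)
Outcome <- 0.5 * WM + rnorm(n)

# Median split: low-WM group = 0, high-WM group = 1
WM_split <- as.numeric(WM > median(WM))

# The dichotomized predictor recovers less of the relationship
r_continuous <- cor(WM, Outcome)
r_split      <- cor(WM_split, Outcome)
print(c(continuous = r_continuous, split = r_split))
```

The weaker correlation for the split version reflects the information discarded about where each subject falls within their group.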

Continuous Predictors
Don't treat a predictor as continuous if the sampling was dichotomous. In this case, we didn't sample middle-aged people.
[Figures: the pattern across the unsampled range could take one shape, or a very different one.]
Here there be dragons: we have no information about what should be in the middle.

Comparing Predictors
How do we tell which has a stronger effect?
Perceptual Speed measure: number of same/different judgments in 2 min. (e.g. QVT vs. QVR). Beta = 6.03, i.e. 1 additional trial: prosody score + 6.
Vocab measure: number of multiple-choice questions correct out of 40 (e.g. TEMERITY: (A) rashness (B) timidity (C) desire (D) kindness). Beta = 14.69, i.e. 1 additional correct word: prosody score + 15.

Comparing Predictors
Issue: measures are often on different scales.
Measure            Beta    Range            Mean    Std. Dev.
Perceptual Speed   6.03    82 to 236        160     28.75
Vocab              14.69   12.00 to 32.00   20.80   5.30

Comparing Predictors
Issue: measures are often on different scales.
Solution: standardize the predictors so you are comparing z scores:
Cyclops$Vocab_z = scale(Cyclops$Vocab, center=TRUE, scale=TRUE)
Here center=TRUE centers the variable so its mean = 0, and scale=TRUE scales it so its SD = 1. This changes your parameter estimates but not your hypothesis tests.
Perceptual speed: standardized beta = .31. Vocab: standardized beta = .14. On this common scale, perceptual speed has the larger effect even though its raw beta was smaller.
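
The point that standardizing changes the estimates but not the hypothesis tests can be checked on simulated data. Everything in this sketch is hypothetical (the `Toy` data frame, the effect sizes, the `Prosody` outcome); it does not reproduce the study's numbers.

```r
set.seed(5)
n <- 100

# Simulated predictors on very different scales (hypothetical values)
Toy <- data.frame(
  PSpeed = rnorm(n, mean = 160, sd = 29),
  Vocab  = rnorm(n, mean = 21,  sd = 5)
)
Toy$Prosody <- 0.3 * Toy$PSpeed + 1.0 * Toy$Vocab + rnorm(n, sd = 20)

# scale() returns a one-column matrix, so convert back to a plain vector
Toy$PSpeed_z <- as.numeric(scale(Toy$PSpeed, center = TRUE, scale = TRUE))
Toy$Vocab_z  <- as.numeric(scale(Toy$Vocab,  center = TRUE, scale = TRUE))

raw_fit <- lm(Prosody ~ PSpeed + Vocab, data = Toy)
z_fit   <- lm(Prosody ~ PSpeed_z + Vocab_z, data = Toy)

# Slopes differ between the two models, but their t values are identical
raw_t <- coef(summary(raw_fit))[-1, "t value"]
z_t   <- coef(summary(z_fit))[-1, "t value"]
print(rbind(raw = coef(raw_fit)[-1], standardized = coef(z_fit)[-1]))
print(rbind(raw_t, z_t))
```

Only the slopes are compared here; the intercept and its test do change, because centering moves the point at which the intercept is evaluated.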

Comparing Predictors