A statistical model of grammatical choices in children's productions of dative sentences. Marie-Catherine de Marneffe Scott Grimm

A statistical model of grammatical choices in children's productions of dative sentences
Marie-Catherine de Marneffe, Scott Grimm, Uriel Cohen Priva, Sander Lestrade, Gorkem Ozbek, Tyler Schnoebelen, Susannah Kirby, Misha Becker, Vivienne Fong, Joan Bresnan

Do children follow the same production pattern as adults?
Children's production seems to differ from adult speech. It is an open question exactly how to characterize the differences.
Recent research has shown that syntactic alternation in adult speech is influenced by multiple cues. Do the same factors affect child production?

Case study: dative alternation
NP NP: I gonna show you something. (recipient: you; theme: something)
NP PP: Show it to her. (theme: it; recipient: her)
Our models measure the probability of selecting an NP PP construction.

Outline
1. Modeling adult production of the dative alternation
   - Motivations behind this approach
   - Logistic regression model
2. Building a model for child production
   - CHILDES database
   - Methodology and annotation
   - Resultant model and discussion
3. Model comparison between adult and child production

Modeling adult production of the dative alternation
Variation in the dative construction has proven puzzling. Various forces have been held responsible:
- lexical verb meaning [Gropen 89, Green 71]
- constructional differences [Goldberg 95]
- usage trends (e.g., phonological factors)
Detailed studies of actual usage show a more complicated picture.

Multiple factors affect dative construction choice
Statistical models allow one to investigate and predict factors influencing production. [Arnold 00, Szmrecsányi 05, Becker 06, Bresnan et al. 07]
E.g., the influence of animacy and definiteness can be compared. This was shown in the model of Bresnan et al. [Bresnan et al. 07]

Modeling adult production of the dative alternation
Adult data comes from Switchboard
2360 dative observations from the 3 million word Switchboard collection of recorded telephone conversations.
Annotated for: animacy, givenness, pronominality, length, person, number, verb and verb semantic class, persistence, ...
This data set is publicly available for download as part of the languageR package.

Modeling adult production of the dative alternation
Persistence
Persistence is a measure of production priming: speakers reuse what they have just heard or just used.
Szmrecsányi found persistence to play a highly significant role in linguistic choice for different English alternations. [Szmrecsányi 05]
Syntactic priming effects have also been reported in young children. [Savage et al. 03, Huttenlocher et al. 04, Conwell and Demuth 07]

Modeling adult production of the dative alternation
Logistic regression model
A logistic regression model controls simultaneously for multiple factors and gives a binary response.
$$P(\text{Response} = \text{NP PP} \mid X) = \frac{1}{1 + \exp(-(\alpha + \beta_1 x_1 + \beta_2 x_2 + \dots))}$$
where X is the model matrix of independent variables [x_1, x_2, ...] and the βs are their coefficients.
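As a minimal Python sketch of the logistic response, assuming the standard form in which the linear predictor enters the exponential with a negative sign: the function name `prob_np_pp` and the example values are hypothetical and are not taken from the slides.

```python
import math

def prob_np_pp(alpha, betas, xs):
    """Probability of the NP PP construction under a logistic model.

    alpha: intercept; betas: coefficients; xs: predictor values (same order).
    """
    linear = alpha + sum(b * x for b, x in zip(betas, xs))
    return 1.0 / (1.0 + math.exp(-linear))

# Hypothetical coefficients and predictor values, for illustration only.
p = prob_np_pp(0.0, [1.0, -2.0], [1, 0])
print(round(p, 3))
```

A linear predictor of 0 gives a probability of 0.5; positive values favor NP PP and negative values favor NP NP.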

Modeling adult production of the dative alternation
Adult model shows harmonic alignment
Harmonic alignment of prominence scales with syntactic position:
- shorter > longer
- discourse given > not given
- animate > inanimate
- definite > indefinite
- pronoun > non-pronoun
V NP NP: V recipient theme
V NP PP: V theme recipient

Previous studies of child acquisition of datives emphasized lexical verb meaning. [Pinker 89, Tomasello 01]
Given the adult model just shown, it is natural to question whether similar factors are in play for children.
We follow the approach of Bresnan et al. [Bresnan et al. 07] and build a logistic regression model.

Child data comes from CHILDES
We used a subset of the CHILDES database [MacWhinney 00]:
- 7 children, selected based on the amount of data available (both total utterances and utterances containing a dative construction)
- 538 utterances
Annotated for: animacy, givenness, pronominality, length, persistence, age, MLU

Annotation: animacy
It is not clear how children perceive animacy. We therefore used two different coding schemes for this factor:
- standardly assumed definition: humans and animals
- hypothetical over-generalization by children: the above plus toys
The results of the two coding schemes were not significantly different from each other.

Annotation: givenness
The theme/recipient is considered given if it has been mentioned in the previous 10 speaker turns. If so, we also coded the speaker of this previous mention (child vs. adult).
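The givenness rule can be sketched in Python as a look-back over the preceding turns. This is only an illustration under stated assumptions: the function `code_givenness`, the turn representation, and the choice to report the speaker of the most recent mention are hypothetical, and referents are assumed to have been identified by hand beforehand.

```python
def code_givenness(referent, previous_turns, window=10):
    """Return (given, speaker) for a theme or recipient referent.

    previous_turns: list of (speaker, set_of_mentioned_referents),
    ordered from oldest to most recent (hypothetical representation).
    """
    start = max(0, len(previous_turns) - window)
    # Scan the last `window` turns, most recent first.
    for speaker, mentioned in reversed(previous_turns[start:]):
        if referent in mentioned:
            return True, speaker
    return False, None

turns = [("adult", {"ball"}), ("child", {"dog"}), ("adult", set())]
print(code_givenness("ball", turns))
print(code_givenness("cat", turns))
```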

Annotation: pronominality
- definite pronoun: it
- demonstrative pronoun: that
- personal pronoun: me
- reflexive pronoun: myself
- personal pronoun followed by a lexical NP: she gave them all her children a spanking.

Annotation: length
Length is coded as the number of space-delimited words.

Annotation: persistence
We coded for α persistence (exact match), whereby we located the first previous dative construction within a range of 10 speaker turns:
- NP = previous NP NP in a dative construction
- PP = previous NP PP in a dative construction
- 0 = no previous dative construction
We also took into account the distance (in number of clauses), as well as the speaker uttering the previous construction (adult vs. child).
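The three-way persistence code can be sketched in Python as follows. The function `code_persistence` and the turn representation are hypothetical; "first previous" is read here as the nearest preceding dative, and the distance in clauses is left out of the sketch.

```python
def code_persistence(previous_turns, window=10):
    """Return (code, speaker), where code is "NP", "PP", or "0".

    previous_turns: list of (speaker, list_of_dative_constructions),
    ordered from oldest to most recent; each construction is "NP"
    (NP NP) or "PP" (NP PP). This representation is an assumption.
    """
    start = max(0, len(previous_turns) - window)
    # Scan the last `window` turns, most recent first.
    for speaker, constructions in reversed(previous_turns[start:]):
        if constructions:
            # The last dative in the turn is the nearest preceding one.
            return constructions[-1], speaker
    return "0", None

turns = [("adult", ["PP"]), ("child", []), ("adult", ["NP"]), ("child", [])]
print(code_persistence(turns))
```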

Annotation: MLU
Mean Length of Utterance, measured in morphemes, as computed by the CLAN program.

Logistic regression model for child production
Probability {Response = NP PP} given: animacy, givenness, pronominality, length, persistence, age, MLU
Following standard methods, we use backward elimination to extract the most significant factors, i.e., those which account for the greatest amount of the variation in the data without overfitting the model.

Logistic regression model
$$P(\text{Response} = \text{NP PP} \mid X) = \frac{1}{1 + \exp(-(\alpha + \beta_1 x_1 + \beta_2 x_2 + \dots))}$$
where α is 0.27 and the β_i x_i terms are:
+ 2.36 {theme type = pronoun}
- 1.59 {recipient type = pronoun}
- 0.72 {theme length}
- 1.45 {previous dative = NP}
+ 1.81 {previous dative = PP}

Significant factors for child production
The quality of the obtained model is high: C = 90.9, Nagelkerke R^2 = 56.9 (56.2 with bootstrap validation)
4 factors are independently significant (no collinearity, p < .05):
Factor                    Odds     P-value
theme type=pronoun        10.57    0.0000
recipient type=pronoun     0.20    0.0000
theme length               0.49    0.0061
previous dative=np         0.24    0.0002
previous dative=pp         6.10    0.0000
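The odds column can be read as the exponential of each coefficient of the logistic model. The Python sketch below illustrates that relationship using the coefficient magnitudes reported for the child model. The signs are an assumption inferred from the odds (a factor with odds below 1 is given a negative coefficient), and the variable names are hypothetical.

```python
import math

# Coefficient magnitudes as reported for the child model.
# Signs are inferred, not quoted: odds < 1 implies a negative coefficient.
coefficients = {
    "theme type=pronoun": 2.36,
    "recipient type=pronoun": -1.59,
    "theme length": -0.72,
    "previous dative=np": -1.45,
    "previous dative=pp": 1.81,
}

# Odds as reported in the table of significant factors.
reported_odds = {
    "theme type=pronoun": 10.57,
    "recipient type=pronoun": 0.20,
    "theme length": 0.49,
    "previous dative=np": 0.24,
    "previous dative=pp": 6.10,
}

# Odds ratio = exp(coefficient); small gaps come from rounding.
computed_odds = {name: math.exp(b) for name, b in coefficients.items()}
for name, value in computed_odds.items():
    print(f"{name}: computed {value:.2f}, reported {reported_odds[name]:.2f}")
```

Read this way, a pronominal theme multiplies the odds of NP PP roughly tenfold, while a pronominal recipient cuts them to about a fifth.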

Previous construction tends to persist
[Figure: log odds of NP PP by previous dative construction (prev_dative: 0, NP, PP)]

Decrease in theme length favors NP PP
[Figure: log odds of NP PP by theme length in words (theme.nwords)]

Pronominal theme favors NP PP
[Figure: log odds of NP PP by theme type (theme.pron: lexical vs. pronoun)]

Lexical recipient favors NP PP
[Figure: log odds of NP PP by recipient type (recip.pron: lexical vs. pronoun)]

Child data shows harmonic alignment
As in the adult data, the child data show a qualitative picture of a quantitative harmonic alignment:
- shorter > longer
- pronoun > non-pronoun
V NP NP: V recipient theme
V NP PP: V theme recipient

There is no speaker effect
Given that the children vary a lot in their individual developmental trajectories [Clark 03], we must control for whether the speaker is a significant factor, which data pooling has obscured.
Using child as a random effect in a mixed effect model did not lead to a significant result: surprisingly, the global trends hold locally.

There is no speaker effect
Coefficients of both models are very similar:
Factor                    Fixed effect model    Mixed effect model
                          coefficients          coefficients
theme type=pronoun        + 2.36                + 2.35
recipient type=pronoun    - 1.59                - 1.60
theme length              - 0.72                - 0.73
previous dative=np        - 1.45                - 1.46
previous dative=pp        + 1.81                + 1.80

Length of theme effect by child
[Figure: log odds of NP PP by theme length, one panel per child: abe, adam, naomi, nina, sarah, shem, trevor]

Theme type effect by child
[Figure: proportion of NP PPs by theme type (lexical vs. pronoun), one panel per child: abe, adam, naomi, nina, sarah, shem, trevor]

Recipient type effect by child
[Figure: proportion of NP PPs by recipient type (lexical vs. pronoun), one panel per child: abe, adam, naomi, nina, sarah, shem, trevor]

Persistence effect by child
[Figure: proportion of NP PPs by persistence level (0, NP, PP), one panel per child: abe, adam, naomi, nina, sarah, shem, trevor]

Multiple factors affect child production
The overall picture of Bresnan et al. [Bresnan et al. 07] is much the same in child production of dative sentences: construction choice is governed by multiple factors, which align harmonically.

Differences from the adult model
Number of factors: Overall there were fewer significant factors in the child model.
Animacy: Despite our expectations, animacy was not found to be a significant factor in the child model. The two models suggest that there might be a difference between children and adults in the relation of animacy to construction choice.

Differences from the adult model
The factors differ in magnitude:
Child model
Factor               AIC
theme length           5.53
recipient type        28.65
previous dative       57.49
theme type           114.75
Adult model
Factor               AIC
verb                  -1.95
previous dative        0.12
recipient animacy      0.45
theme length           4.30
recipient length       7.76
theme animacy         12.79
recipient type        26.77
theme type            46.57

Differences from the adult model
We cannot infer such differences directly from two independent models. To fully assess similarities and differences between children and adults, one must analyze these factors across the data in a conjoined model.

Model comparison between adults and children
We limited the adult model to the verbs give and show. This gives 611 data points, comparable to the 538 occurrences for the child data.
We refitted the adult model to this restricted data set, and found no differences in main effects, e.g., animacy remains significant.
We re-coded persistence in the adult data to approximate the 10 speaker turn range used in the child data.

Model comparison between adults and children
The conjoined model attains high quality: C = 95.7, Nagelkerke R^2 = 70.3 (69.2 with bootstrap validation)
The conjoined model demonstrates that the following factors remain significant across data sets:
Factor                                   Odds       P-value
Main effects
intercept                                0.284      0.0824
recipient type=pronoun                   0.021      0.0000
theme type=pronoun                       1536.0     0.0000
recipient length                         2.6        0.0021
theme length                             0.646      0.0008
previous dative=np                       0.240      0.0000
previous dative=pp                       5.5        0.0000
Interactions
group=child × recipient type=pronoun     11.0       0.0073
group=child × theme type=pronoun         0.008      0.0000
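One way to read the interaction rows is sketched below in Python, assuming that adults are the reference level of the group factor (as the term name group=child suggests). Under that assumption the main-effect odds apply to adults, and the odds for children are the main-effect odds multiplied by the interaction odds. The variable names are hypothetical.

```python
# Main-effect odds from the conjoined model (assumed to describe adults).
main_odds = {
    "recipient type=pronoun": 0.021,
    "theme type=pronoun": 1536.0,
}

# Interaction odds for group=child from the conjoined model.
child_interaction_odds = {
    "recipient type=pronoun": 11.0,
    "theme type=pronoun": 0.008,
}

# Odds for children = main-effect odds * interaction odds
# (equivalently, the log odds add).
child_odds = {
    name: main_odds[name] * child_interaction_odds[name]
    for name in main_odds
}

for name in main_odds:
    print(f"{name}: adult {main_odds[name]:g}, child {child_odds[name]:.3f}")
```

Under this reading the child odds come out near 12 for a pronominal theme and near 0.23 for a pronominal recipient, close to the 10.57 and 0.20 of the separately fitted child model, and the effects point in the same direction for both groups while being more extreme for adults.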

Model comparison between adults and children
Persistence plays a role
[Figure: log odds of NP PP by previous dative construction (prev_dative: 0, NP, PP)]

Model comparison between adults and children
Length of recipient and theme matters
Increase in recipient length favors NP PP. [Figure: log odds of NP PP by recipient length in words (recipient.nwords)]
Decrease in theme length favors NP PP. [Figure: log odds of NP PP by theme length in words (theme.nwords)]

Model comparison between adults and children
Type of recipient and theme
Lexical recipient favors NP PP. [Figure: log odds of NP PP by recipient type (recip.pron: lexical vs. pronoun)]
Pronominal theme favors NP PP. [Figure: log odds of NP PP by theme type (theme.pron: lexical vs. pronoun)]

Harmonic alignment is a significant main effect across both groups
The children's and the adults' construction choices show a consistent statistical pattern of harmonic alignment. All of the measured harmonic alignment effects (except the animacy effect) are significant across both groups.

Model comparison between adults and children
Interaction: recipient and theme types
For adults the type of NPs has greater influence on the production choice.
[Figure: two panels showing log odds of NP PP for adults and children, by recipient type (adjusted) and by theme type (adjusted), each lexical vs. pronoun]

Model comparison between adults and children
Interaction: variation by degree
The interaction effects show that the two groups differ in their sensitivity to the shared factors.
Child and adult productions demonstrate the same general behavior, which corresponds to a shared harmonic alignment pattern.
The differences in the interactions are a matter of degree, not direction.

Conclusion
We have demonstrated the feasibility of comparing child and adult speech, and shown that statistical modeling techniques can yield insight into the factors at play in children's speech production.
Given the size of the corpus, our results are promising rather than definitive.
Further research may shed light upon why the differences between these patterns of production were observed (input children receive, resource limitations).
The production choices made by children and adults are neither identical nor radically different: a core set of factors is shared.

There are no collinearities between co-variates
VIF measures (the closer to 1 the better):
Factor                      VIF
theme type = pronoun        1.30
recipient type = pronoun    1.02
previous dative = NP        1.06
previous dative = PP        1.08
theme length                1.27