The primacy of graded grammaticality


Markus Bader & Jana Häussler (Konstanz & Potsdam)
KogWis 2010, Potsdam, 05.10.2010

Introduction: Aims and Questions

General aim: Developing a model of linguistic judgments in order to
- establish a firm basis for linguistic theory
- contribute to the ongoing debate about grammar and language use

Specific questions:
- How do gradient ratings of grammaticality relate to binary grammaticality judgments?
- How does grammaticality relate to frequency of usage?

Experimental Material: Ditransitive verbs

Empirical domain of our investigations: ditransitive verbs in German

(1) ... dass er dem Mann ein Buch schickte.
    that he.nom the.dat man a.acc book sent
    '...that he sent a book to the man.'

Advantage of ditransitive verbs: argument alternations that are subject to verb-specific restrictions in a gradual way
- Optionality of the dative object
- Compatibility with the so-called bekommen passive

Experimental Material: Optionality of dative object

Dropping the dative object (the dropped constituent is shown in square brackets):

(2) ... dass er [dem Mann] ein Buch schickte.
    that he.nom [the.dat man] a.acc book sent
    '...that he sent a book.'

(3) ? ... dass er [dem Mann] ein Buch anvertraute.
    that he.nom [the.dat man] a.acc book entrusted
    '...that he entrusted a book.'

Experimental results and corpus counts (Bader & Häussler, submitted): the option of omitting the dative object is a gradient, verb-specific property.

Experimental Material: Bekommen passive

Bekommen passive: the dative object becomes the subject of bekommen ('to get')

(4) ... dass der Mann das Buch geschickt bekam.
    that the.nom man the.acc book sent got
    '...that the man was sent the book.'

(5) ? ... dass der Mann das Buch gestohlen bekam.
    that the.nom man the.acc book stolen got
    '...that the man was stolen the book.'

Linguistic literature: bekommen passive sentences with verbs like stehlen are often presented as fully grammatical (without a ? or a *).

Experimental results and corpus counts (Bader & Häussler, submitted): verbs like stehlen are not fully acceptable in the bekommen passive.

Experimental Material: Regular passive

Regular passive: unrestricted with regard to ditransitive verbs as considered here

(6) ... dass dem Mann das Buch geschickt wurde.
    that the.dat man the.nom book sent was
    '...that the book was sent to the man.'

(7) ... dass dem Mann das Buch gestohlen wurde.
    that the.dat man the.nom book stolen was
    '...that the book was stolen from the man.'

Experimental Material: Summary

120 verbs, each in two sentences, for a total of 240 sentences

3 × 2 design:
- Structure (active / regular passive / bekommen passive)
- Number of arguments (3 vs. 2)

(8) Active
    dass der Vermieter letztes Jahr (dem Sohn) das Haus vererbte.
    that the landlord last year the son the house left
    'that the landlord left the house to the son last year.'

(9) Regular passive
    dass dem Sohn letztes Jahr (von dem Vermieter) das Haus vererbt wurde.
    that the son last year by the landlord the house left was
    'that the house was left to the son last year (by the landlord).'

(10) Bekommen passive
    dass der Sohn letztes Jahr (von dem Vermieter) das Haus vererbt bekam.
    that the son last year by the landlord the house left got
    'the son was left the house last year (by the landlord).'

Experiment 1 and 2: Procedure

Experiment 1: Magnitude Estimation (ME)
- First, a reference item is presented to which the participant assigns an arbitrary numeric value (> 0).
- All further items are judged in proportion to the reference item on a continuous numerical scale.
- Each individual data point is divided by the reference value and the resulting ratio is log-transformed.

Experiment 2: Speeded Grammaticality Judgments (SGJ)
- Word-by-word presentation in the middle of the screen
- Presentation time for each word: ca. 300-400 ms
- End-of-sentence judgments with a deadline of 2000 ms
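The normalization of magnitude estimates described for Experiment 1 can be sketched as follows. This is a minimal illustration, not the authors' analysis script: the function name `normalize_me`, the example ratings, and the choice of base-10 logarithms are assumptions (the slides do not state the base).

```python
import math


def normalize_me(raw_scores, reference_value, base=10.0):
    """Divide each raw magnitude estimate by the participant's reference
    value and log-transform the resulting ratio.

    The logarithm base is an assumption; the slides only say "log-transformed".
    """
    if reference_value <= 0:
        raise ValueError("reference value must be > 0")
    normalized = []
    for score in raw_scores:
        if score <= 0:
            raise ValueError("magnitude estimates must be > 0")
        normalized.append(math.log(score / reference_value, base))
    return normalized


# Hypothetical ratings from one participant who gave the reference item 100.
scores = normalize_me([50.0, 100.0, 200.0], reference_value=100.0)
print(scores)
```

An item judged equal to the reference comes out at 0, items judged better are positive, and items judged worse are negative, which makes scores comparable across participants who chose different reference values.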

Experiment 1 and 2: Results

Table: Mean percentages of "grammatical" judgments (standard errors by subjects).

           Active       Regular passive   Bekommen passive
3 Args.    88 (2.3)     92 (1.5)          81 (2.9)
2 Args.    77 (2.8)     94 (1.1)          76 (3.2)

Table: Mean ME scores (standard errors by subjects).

           Active        Regular passive   Bekommen passive
3 Args.    .28 (.038)    .26 (.035)        .23 (.034)
2 Args.    .24 (.042)    .31 (.041)        .18 (.042)

Experiment 2: Verb-specific variability in grammaticality

Figure: Rank-ordered distribution of mean percentages of grammatical judgments for the 120 verbs used in Experiment 2. Six panels (Active, Regular Passive, Bekommen Passive, each with 3 and 2 arguments); x-axis: rank.

From gradient to binary judgments: Correlations

- All 720 data points (120 verbs in 6 conditions): Kendall's τ = 0.42
- 120 data points (verbs) per condition: Kendall's τ from 0.19 to 0.55

Figure: SGJ results (% correct) plotted against ME results (ME scores), once for all data points and once per condition (Active, Regular Passive, Recipient Passive, each with 2 and 3 arguments).
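For readers unfamiliar with the rank correlation used here, the following sketch computes Kendall's τ by counting concordant and discordant pairs. It implements the tie-corrected τ-b variant, which is an assumption (the slides do not say which variant was used), and the per-verb values are invented for illustration only.

```python
import math


def kendall_tau_b(x, y):
    """Kendall's tau-b: (concordant - discordant) pairs, corrected for ties."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = ties_x_only = ties_y_only = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied on both variables: excluded from all counts
            elif dx == 0:
                ties_x_only += 1
            elif dy == 0:
                ties_y_only += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    denominator = math.sqrt(
        (concordant + discordant + ties_x_only)
        * (concordant + discordant + ties_y_only)
    )
    if denominator == 0:
        raise ValueError("tau is undefined when one variable is constant")
    return (concordant - discordant) / denominator


# Hypothetical per-verb means: ME scores and percentages of "grammatical" answers.
me_scores = [-0.1, 0.0, 0.2, 0.3]
sgj_percent = [60, 55, 80, 95]
tau = kendall_tau_b(me_scores, sgj_percent)
print(round(tau, 3))
```

Because τ depends only on the ordering of the values, it is unaffected by the different scales of the two experiments (log ratios versus percentages).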

From gradient to binary judgments: Regression model

Do gradient grammaticality scores predict binary judgments?

Logistic regression with mixed-effects modeling:
- results of Experiment 2 (SGJ) as predicted variable
- results of Experiment 1 (ME) as predictor variable
- participants and items as random effects

Results of logistic regression:
- ME scores are a highly significant predictor of SGJ results
- Somers' C = 0.82 (n = 8640)
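The fit statistic reported above can be illustrated with a short sketch. It assumes that the reported C is the index of concordance that is usually given together with Somers' Dxy for logistic models, where Dxy = 2(C - 0.5); the slides do not spell this out. The function names and the toy predictions are hypothetical.

```python
def concordance_index(predicted, observed):
    """Share of (grammatical, ungrammatical) response pairs in which the
    response judged grammatical received the higher predicted probability.
    Ties in predicted probability count as half."""
    positives = [p for p, o in zip(predicted, observed) if o == 1]
    negatives = [p for p, o in zip(predicted, observed) if o == 0]
    if not positives or not negatives:
        raise ValueError("both response categories must be present")
    score = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                score += 1.0
            elif p == q:
                score += 0.5
    return score / (len(positives) * len(negatives))


def somers_dxy(c_index):
    """Rank correlation between predictions and outcomes derived from C."""
    return 2.0 * (c_index - 0.5)


# Hypothetical fitted probabilities and binary judgments (1 = "grammatical").
fitted = [0.9, 0.8, 0.4, 0.6, 0.3]
judged = [1, 1, 1, 0, 0]
c = concordance_index(fitted, judged)
print(round(c, 3), round(somers_dxy(c), 3))
```

A value of C = 0.5 corresponds to chance-level discrimination and C = 1 to perfect discrimination, so 0.82 indicates that the model separates the two response categories well.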

From gradient to binary judgments: Model fit

Figure: Observed and fitted SGJ results plotted against observed ME results (R² = 0.94). The numbers printed in the plot (18, 39, 80, 123, 444, 194, 256, 950, 2169, 4367) sum to 8640, the n of the regression, and appear to be the number of observations per ME-score bin.

Grammaticality and frequency: corpus details

Can the experimental results be reduced to corpus-derived frequency measures?

The deWaC corpus described in Baroni et al. (2009) was analyzed:
- The deWaC corpus is a huge corpus of German built by web crawling.
- It contains 1,278,177,539 tokens of text tagged for part of speech.

Various verb-specific frequency measures were derived from the deWaC corpus, including:
- p(dative object): the probability of a ditransitive verb to occur with an overt dative object
- bigram ratio: bigram frequency for a verb (participle + auxiliary) divided by the verb's lemma frequency
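The two frequency measures defined above reduce to simple ratios of corpus counts. The sketch below shows the arithmetic; the function names and all counts are hypothetical and are not taken from deWaC.

```python
def p_dative_object(n_with_dative, n_total):
    """Probability that a ditransitive verb occurs with an overt dative object."""
    if n_total <= 0:
        raise ValueError("the verb must occur at least once")
    return n_with_dative / n_total


def bigram_ratio(bigram_freq, lemma_freq):
    """Frequency of the bigram participle + auxiliary divided by the
    lemma frequency of the verb."""
    if lemma_freq <= 0:
        raise ValueError("lemma frequency must be > 0")
    return bigram_freq / lemma_freq


# Hypothetical counts for a single verb (invented numbers).
p_dat = p_dative_object(n_with_dative=300, n_total=1200)
ratio = bigram_ratio(bigram_freq=45, lemma_freq=9000)
print(p_dat, ratio)
```

Dividing the bigram frequency by the lemma frequency factors out how common the verb is overall, so the ratio reflects how readily the verb enters the construction rather than how often the verb is used.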

Grammaticality and frequency: correlations

Table: Rank correlations (Kendall's tau) between experimental grammaticality scores (SGJ) and different frequency measures.

                     Active              Regular passive     Bekommen passive
                     3 Args    2 Args    3 Args    2 Args    3 Args    2 Args
p(dative object)     .09       -.32**    .07       .03       -.03      .10
Bigram ratios        -.17**    -.08      -.02      .10       .23**     .36**

Summary: The grammaticality-frequency correlations are far from perfect:
- High grammaticality despite low frequency occurs often
- High frequency despite low grammaticality occurs rarely

Conclusion: Frequency cannot predict grammaticality

From grammaticality to language use

Hypothesis: Grammaticality determines language use, not the other way round.

The probability of a sentence $s_n$ can be modeled as follows:

(11) $p(s_n) = f(\text{grammaticality}[s_n],\ \text{real-world context}[s_n],\ \text{linguistic context}[s_n],\ \text{performance}[s_n])$

Here, we consider only two factors:
- grammaticality: estimated from our experiment
- real-world context: approximated by overall verb frequency

The remaining two factors are left out:
- performance: not relevant for our sentences
- linguistic context: relevant, but not yet coded

From grammaticality to language use

Figure: Bigram frequency plotted against verb (lemma) frequency (upper row) and against experimental grammaticality scores (lower row), with one panel each for Active, Regular passive, and Bekommen passive. Frequency axes are logarithmic.

From grammaticality to language use

Table: Results of Poisson regression with bigram frequency as predicted variable and either grammaticality alone, verb frequency alone, or grammaticality and verb frequency together.

                             Active              Regular passive     Bekommen passive
Null deviance                5701182             959280              19741

                             Reduction   R²      Reduction   R²      Reduction   R²
Grammaticality               2505        .00     8016        .00     5907        .19
Frequency                    5492666     .95     734056      .57     3567        .12
Grammaticality & Frequency   5493190     .95     734365      .56     10508       .47
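The deviance quantities in the table can be made concrete with a short sketch. It uses the standard Poisson deviance, takes the null deviance to be the deviance of an intercept-only model (whose fitted value is the sample mean), and takes "Reduction" to be the null deviance minus the deviance of the fitted model. These readings of the column labels are assumptions, and how the R² column was computed is not stated on the slide. Function names and counts are hypothetical.

```python
import math


def poisson_deviance(observed, fitted):
    """Poisson deviance: 2 * sum(y * log(y / mu) - (y - mu)),
    with the term y * log(y / mu) taken as 0 when y = 0."""
    total = 0.0
    for y, mu in zip(observed, fitted):
        if mu <= 0:
            raise ValueError("fitted means must be > 0")
        log_term = y * math.log(y / mu) if y > 0 else 0.0
        total += log_term - (y - mu)
    return 2.0 * total


def null_deviance(observed):
    """Deviance of the intercept-only model, whose fitted mean is the sample mean."""
    mean = sum(observed) / len(observed)
    return poisson_deviance(observed, [mean] * len(observed))


def deviance_reduction(observed, fitted):
    """How much a model's fitted means lower the deviance relative to the null model."""
    return null_deviance(observed) - poisson_deviance(observed, fitted)


# Hypothetical bigram counts for three verbs and hypothetical fitted means.
counts = [1, 2, 3]
fitted_means = [1.2, 1.9, 2.9]
print(round(null_deviance(counts), 4))
print(round(deviance_reduction(counts, fitted_means), 4))
```

A predictor that explains the counts well brings the model deviance close to zero, so its reduction approaches the null deviance, which is the pattern verb frequency shows for the active and regular passive in the table.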

Conclusion

- Binary grammaticality judgments can be derived directly from gradient judgments.
- Grammaticality is not determined by frequency but is rather among the factors determining frequency.

References

Bader, M. & Häussler, J. (submitted). Frequency and grammaticality: A case study on ditransitive verbs in German. Manuscript submitted for publication, University of Konstanz.

Baroni, M., Bernardini, S., Ferraresi, A. & Zanchetta, E. (2009). The WaCky Wide Web: A collection of very large linguistically processed web-crawled corpora. Language Resources and Evaluation 43(3), 209-226.