
Language Acquisition, Fall 2010/Winter 2011
Lexical Categories
Afra Alishahi, Heiner Drenhaus
Computational Linguistics and Phonetics, Saarland University

Children's Sensitivity to Lexical Categories

"Look, this is Zav! Point to Zav."

Gelman & Taylor '84: 2-year-olds treat names not followed by a determiner (e.g., "Zav") as proper names, and interpret them as individuals (e.g., the animal-like toy).

Children's Sensitivity to Lexical Categories

"Look, this is a zav! Point to the zav."

Gelman & Taylor '84: 2-year-olds treat names followed by a determiner (e.g., "the zav") as common nouns, and interpret them as category members (e.g., the block-like toy).

Challenges of Learning Lexical Categories

Children form lexical categories gradually, over time:
- Noun and verb categories are learned by age two, but adjectives are not learned until age six

Child language acquisition is bounded by memory and processing limitations:
- Child category learning is unsupervised and incremental
- Highly extensive processing of data is cognitively implausible

Natural language categories are not clear-cut:
- Many words are ambiguous and belong to more than one category
- Many words appear in the input only very rarely

Information Cues

Children might use different information cues for learning lexical categories:
- perceptual cues (phonological and morphological features)
- semantic properties of the words
- distributional properties of the local context each word appears in

Distributional context is a reliable cue:
- Analyses of child-directed speech show an abundance of consistent contextual patterns (Redington et al., 1998; Mintz, 2003)
- Several computational models have used distributional context to induce intuitive lexical categories (e.g., Schütze 1993, Clark 2000)

Computational Models of Lexical Category Induction

The majority of existing models categorize word types in an iterative, batch process (e.g., Brown '92, Schütze '93, Redington et al. '98)

Incremental clustering models:
- Cartwright & Brent '97: use word groups to extract templates from sentences, then use an MDL approach to merge word groups together; evaluated on artificially generated input
- Parisien et al. '08: a Bayesian clustering model with a bootstrapping module; categories are revised periodically; very sensitive to context features, and computationally expensive

Computational Models of Lexical Category Induction

Hierarchical clustering [e.g., Schütze '93, Redington et al. '98]:
- Start from one cluster per word
- Merge the two most similar clusters in each iteration

[Figure: dendrogram over the words sock, shoe, cat, dog, man, boy, girl]
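
A minimal sketch of this agglomerative procedure, assuming toy context-count vectors (the actual models derive much richer distributional statistics from corpora):

    import numpy as np

    words = ["sock", "shoe", "cat", "dog", "man", "boy", "girl"]
    # Hypothetical context-feature counts, one row per word.
    vectors = np.array([
        [4.0, 1.0, 0.0, 0.0],  # sock
        [5.0, 1.0, 0.0, 0.0],  # shoe
        [0.0, 3.0, 4.0, 0.0],  # cat
        [0.0, 3.0, 5.0, 1.0],  # dog
        [0.0, 0.0, 2.0, 5.0],  # man
        [0.0, 0.0, 1.0, 6.0],  # boy
        [0.0, 0.0, 1.0, 5.0],  # girl
    ])

    def cosine(u, v):
        return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

    # Start from one cluster per word; a cluster is (members, summed vector).
    clusters = [([w], vectors[i]) for i, w in enumerate(words)]

    while len(clusters) > 1:
        # Find and merge the two most similar clusters.
        pairs = [(i, j) for i in range(len(clusters))
                        for j in range(i + 1, len(clusters))]
        i, j = max(pairs, key=lambda p: cosine(clusters[p[0]][1], clusters[p[1]][1]))
        merged = (clusters[i][0] + clusters[j][0], clusters[i][1] + clusters[j][1])
        print("merge:", clusters[i][0], "+", clusters[j][0])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]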

Computational Models of Lexical Category Induction

Cluster optimization [e.g., Brown '92, Clark '00]:
- Partition the vocabulary into non-overlapping clusters
- Optimize the clusters according to an information-theoretic measure

[Figure: a partition of the words shoe, cat, girl, sock, man, dog, boy into clusters]
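
A sketch of the optimization view: Brown '92 maximizes class-bigram mutual information, but as a simpler stand-in the code below greedily reassigns words to maximize mean within-cluster cosine similarity; the vectors and cluster count are hypothetical:

    import numpy as np

    words = ["shoe", "cat", "girl", "sock", "man", "dog", "boy"]
    vectors = np.array([
        [5.0, 1.0, 0.0],  # shoe
        [0.0, 4.0, 1.0],  # cat
        [0.0, 1.0, 5.0],  # girl
        [4.0, 1.0, 0.0],  # sock
        [0.0, 1.0, 6.0],  # man
        [0.0, 5.0, 1.0],  # dog
        [0.0, 1.0, 5.0],  # boy
    ])
    K = 3
    assign = np.arange(len(words)) % K  # arbitrary non-overlapping partition

    def objective(assign):
        # Mean cosine similarity of each word to its cluster centroid.
        total = 0.0
        for k in range(K):
            members = vectors[assign == k]
            if len(members) == 0:
                continue
            centroid = members.mean(axis=0)
            for m in members:
                total += m @ centroid / (np.linalg.norm(m) * np.linalg.norm(centroid))
        return total / len(words)

    improved = True
    while improved:
        improved = False
        for i in range(len(words)):
            current = assign[i]
            scores = {}
            for k in range(K):          # try every cluster for word i
                assign[i] = k
                scores[k] = objective(assign)
            best = max(scores, key=scores.get)
            # Move only on strict improvement, so the loop terminates.
            assign[i] = best if scores[best] > scores[current] + 1e-9 else current
            improved = improved or assign[i] != current

    for k in range(K):
        print(k, [w for w, a in zip(words, assign) if a == k])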

Computational Models of Lexical Category Induction

Incremental clustering models (Cartwright & Brent '97, Parisien et al. '08, Chrupala & Alishahi '10):
- Each word usage is processed one at a time
- It is added to the most similar existing cluster, or a new cluster is created

Case Study: Parisien et al. (2008)

A Bayesian model of lexical category induction

Word usages are categorized based on the similarity of their content and context to the existing categories:

    position:  -2    -1    0    1     2
    word:      want  to    put  them  on

The best cluster is selected by maximizing the conditional probability of each cluster given the current usage F:

    BestCluster(F) = argmax_k P(k|F) = argmax_k P(k) P(F|k) / P(F) = argmax_k P(k) P(F|k)

(the last step holds because P(F) does not depend on k)
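
A minimal sketch of this selection rule, assuming add-alpha smoothed, conditionally independent features; the model's actual likelihood, and its provision for opening a new cluster, are more involved:

    from collections import Counter

    class Cluster:
        def __init__(self):
            self.n = 0               # number of usages in this cluster
            self.counts = Counter()  # counts of (position, word) features

        def add(self, features):
            self.n += 1
            self.counts.update(features)

        def likelihood(self, features, vocab_size, alpha=1.0):
            # P(F|k): per-feature likelihoods with add-alpha smoothing,
            # treating features as independent given the cluster.
            p = 1.0
            for f in features:
                p *= (self.counts[f] + alpha) / (self.n + alpha * vocab_size)
            return p

    def best_cluster(clusters, features, vocab_size):
        # argmax_k P(k) P(F|k); P(k) is the cluster's relative frequency.
        total = sum(c.n for c in clusters)
        return max(clusters,
                   key=lambda c: (c.n / total) * c.likelihood(features, vocab_size))

    # Features of "put" in "want to put them on":
    usage = [(-2, "want"), (-1, "to"), (0, "put"), (1, "them"), (2, "on")]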

Case Study: Parisien et al. (2008)

[Figure: matching score (0 to 0.25) vs. training set size (0 to 6 x 10^5 words), with separate curves for nouns, verbs, and adjectives]

The model replicates the order of acquisition of the different categories as observed in children

Case Study: Parisien et al. (2008)

[Figure: adjusted R (R_adj, 0 to 0.2) vs. training set size (0 to 6 x 10^5 words), comparing word-based, bootstrap, and combination models]

The model predicts that using previous category labels will improve the overall performance

Case Study: Alishahi & Chrupala (2009)

An incremental clustering algorithm (sketched in code below):
1. Each word usage is put into a new category
2. The most similar category to the new one is found
   I. If the similarity is above a certain threshold θw, the two categories are merged
   II. The most similar category to the newly merged one is found
      i. If the similarity is above a certain threshold θc, the two categories are merged
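
A minimal sketch of the algorithm, assuming vector-valued usages and hypothetical threshold values; similarity follows the dot-product-of-means definition on the next slide:

    import numpy as np

    THETA_W = 0.5  # usage-to-category merge threshold (hypothetical value)
    THETA_C = 0.8  # category-to-category merge threshold (hypothetical value)

    categories = []  # each category is a list of usage vectors

    def similarity(cat_a, cat_b):
        # Dot product of the categories' mean vectors.
        return np.mean(cat_a, axis=0) @ np.mean(cat_b, axis=0)

    def merge_if_close(cat, threshold):
        others = [c for c in categories if c is not cat]
        if not others:
            return None
        nearest = max(others, key=lambda c: similarity(cat, c))
        if similarity(cat, nearest) >= threshold:
            nearest.extend(cat)  # absorb cat into its nearest neighbour
            categories[:] = [c for c in categories if c is not cat]
            return nearest
        return None

    def process_usage(usage_vec):
        cat = [usage_vec]                      # 1. new category for the usage
        categories.append(cat)
        merged = merge_if_close(cat, THETA_W)  # 2./I. merge if above theta_w
        if merged is not None:
            merge_if_close(merged, THETA_C)    # II./i. one more merge chance

    for vec in np.eye(4):   # four toy one-hot usages, pairwise dissimilar
        process_usage(vec)
    print(len(categories), "categories")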

Representation of Word Categories

Word usage: a vector of content and context features. For "put" in "want to put them on" (context positions -2 to 2):

    feature:  -2=want  -1=to  0=put  1=them  2=on
    value:    1        1      1      1       1

A lexical category is a cluster of word usages

Category: the mean of the distribution vectors of its members, e.g.:

    feature:  -2=want  -2=have  -1=to  0=go  0=sit  0=show  0=send  1=it  ...
    value:    0.25     0.75     1     0.25  0.25   0.25    0.25    0.5   ...

The similarity between two categories: the dot product of their vectors
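
A minimal sketch of this representation, with sparse feature vectors stored as dictionaries (an implementation assumption):

    from collections import Counter

    def usage_features(words, position):
        # One binary feature per context slot (-2..2), keyed by (offset, word).
        return Counter({(offset, words[position + offset]): 1.0
                        for offset in range(-2, 3)
                        if 0 <= position + offset < len(words)})

    def category_vector(usages):
        # A category's vector is the mean of its members' feature vectors.
        total = Counter()
        for u in usages:
            total.update(u)
        return {f: v / len(usages) for f, v in total.items()}

    def similarity(cat_a, cat_b):
        # Dot product of two sparse category vectors.
        return sum(v * cat_b.get(f, 0.0) for f, v in cat_a.items())

    u1 = usage_features("want to put them on".split(), 2)  # usage of "put"
    u2 = usage_features("have to go now".split(), 2)       # usage of "go"
    print(similarity(category_vector([u1]), category_vector([u2])))  # shares "-1=to"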

Evaluation of the Acquired Categories

Most models treat POS tags as the gold standard:
- Learned categories are evaluated based on how well they match POS categories

Instead, Alishahi & Chrupala use the learned categories in a variety of tasks:
- Word prediction from context
- Inferring semantic properties of novel words based on the context they appear in

The performance on each task is compared against a POS-based implementation of the same task

Word Prediction

    She slowly --- the road
    I had --- for lunch

Task: predicting a missing (target) word based on its context

This task is non-deterministic (i.e., it can have many answers), but the context can significantly limit the choices

Human subjects have been shown to be remarkably accurate at using context to guess target words (Gleitman '90, Lesher '02)

Word Prediction Using Categories

Test item (the target word at position 0 is hidden):

    position:  -2    -1    0  1     2
    word:      want  to    ?  them  on

Categorize the test item by its context features; the chosen category Cw yields a ranked word list for the content feature:

    make, take, get, put, sit, eat, let, point, give, ...

Score: the reciprocal rank of the target word (here "put" is ranked 4th, so 1/4)
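
A minimal sketch of the scoring step, reusing the sparse (offset, word) category vectors from the representation sketch above:

    def reciprocal_rank(category_vec, target):
        # Rank candidate words by their weight on the content feature
        # (offset 0) of the chosen category; score = 1 / rank of the target.
        candidates = sorted(((word, v) for (offset, word), v in category_vec.items()
                             if offset == 0),
                            key=lambda wv: wv[1], reverse=True)
        for rank, (word, _) in enumerate(candidates, start=1):
            if word == target:
                return 1.0 / rank
        return 0.0

    cat = {(0, "make"): 0.4, (0, "put"): 0.3, (-1, "to"): 1.0}
    print(reciprocal_rank(cat, "put"))   # 0.5: "put" is ranked 2nd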

Word Prediction - POS Categories

[Figure: POS-labelled data ("baby 's Mummy" - n v n:prop; "put them on the table look" - v pro prep det n v; "have her hair brushed" - v pro n part; "there is a spider" - adv:loc v det n) is grouped into a noun category (baby, table, hair, spider, ...) with the same context feature representation]

Inferring Word Semantic Properties

    I had ZAV for lunch

Task: guessing the semantic properties of a novel word based on its local context

Children and adults can guess (some aspects of) the meaning of a novel word from context (Landau & Gleitman '85, Naigles & Hoff-Ginsberg '95)

Inferring Semantic Properties

Test item (the original target word at position 0 was "soup"):

    position:  -2  -1   0    1    2
    word:      I   ate  Zag  for  lunch

Categorize the test item; the chosen category Cw yields semantic features for the target word position (entity, object, substance, matter, food, edible, ...), which are compared to the semantic vector of the original word (substance, food, edible, liquid, meal, soup, ...) with a similarity measure
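
A minimal sketch of the inference step; the hypernym-style semantic vectors below are hypothetical stand-ins for whatever semantic lexicon the model draws on:

    from collections import Counter

    # Hypothetical semantic features for known words:
    SEMANTICS = {
        "soup":  {"entity", "substance", "food", "edible", "liquid", "meal"},
        "pasta": {"entity", "substance", "food", "edible", "meal"},
    }

    def predicted_semantics(category_usages):
        # Average the semantic features of known words seen at position 0
        # in the category's member usages.
        total, n = Counter(), 0
        for usage in category_usages:
            for (offset, word) in usage:
                if offset == 0 and word in SEMANTICS:
                    total.update(SEMANTICS[word])
                    n += 1
        return {feat: v / n for feat, v in total.items()} if n else {}

    def semantic_score(predicted, gold_word):
        # Similarity measure: predicted mass falling on the gold features.
        return sum(v for feat, v in predicted.items()
                   if feat in SEMANTICS[gold_word])

    usages = [{(-1, "had"): 1.0, (0, "soup"): 1.0},
              {(-1, "had"): 1.0, (0, "pasta"): 1.0}]
    print(semantic_score(predicted_semantics(usages), "soup"))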

Lexical Category Acquisition

- Finer-grained lexical categories seem more suitable for some tasks than traditional POS categories
- Standardized applications are needed to evaluate and compare lexical categories induced by different unsupervised methods
- When categorizing words, do children pay attention to semantic cues as well? A computational investigation: include the semantic features of words in a category learning model, and evaluate the performance
- What about other cues (e.g., phonological and morphological features)?

Rules that Govern Form

Moving from fixed forms (e.g., "apple") to derived forms:
- play → plays, played, playing
- I, you, admire → I admire you

Morphology and syntax: in all languages, the formation of words and sentences follows highly regular patterns

How are the regularities and exceptions represented?
- The study and analysis of language production in children reveals common and persistent patterns

U-shaped Learning Curves

U-shaped learning curves observed in children:
- Imitation: an early phase of conservative language use
- Generalization: general regularities are applied to new forms
- Overgeneralization: occasional misapplication of general patterns
- Recovery: over time, overgeneralization errors cease to happen

Lack of negative evidence: children do not receive reliable corrective feedback from their parents to help them overcome their mistakes (Marcus, 1993)

Case Study: Learning English Past Tense

The problem of English past tense formation:
- Regular formation: stem + -ed
- Irregulars do show some patterns:
  - No-change: hit → hit
  - Vowel-change: ring → rang, sing → sang

Over-regularizations are common: "goed"
- These errors often occur after the child has already produced the correct irregular form: went

What causes the U-shaped learning curve?

A Symbolic Account of English Past Tense

Dual-Route Account (Pinker, 1991): two qualitatively different mechanisms

[Diagram: the input stem is sent both to a list of exceptions (associative memory) and to a regular, rule-based route; a retrieved exception blocks the regular route before the past tense is output]

Predictions:
- Errors result from the transition from rote learning to rule-governed production
- Recovery occurs after sufficient exposure to irregulars
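
A minimal sketch of the dual-route mechanism; the retrieval-failure probability is a hypothetical stand-in for immature associative memory early in learning:

    import random

    EXCEPTIONS = {"go": "went", "hit": "hit", "ring": "rang", "sing": "sang"}

    def past_tense(stem, retrieval_failure=0.0):
        # Route 1: the list of exceptions (associative memory); a
        # successful retrieval blocks the regular route.
        if stem in EXCEPTIONS and random.random() > retrieval_failure:
            return EXCEPTIONS[stem]
        # Route 2: the regular, rule-based route (stem + -ed).
        return stem + "ed"

    print(past_tense("go"))                         # 'went'
    print(past_tense("go", retrieval_failure=0.9))  # usually 'goed' (over-regularized)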

A Connectionist Account of Learning English Past Tense

A connectionist model (Plunkett & Marchman, 1993)

[Diagram: a feed-forward network mapping input units (phonological features of the stem) through hidden units to output units (phonological features of the past tense)]

Properties:
- Early in training, the model shows a tendency to overgeneralize; by the end of training, it exhibits near-perfect performance
- U-shaped performance is achieved using a single learning mechanism, but depends on a sudden change in the training set size
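
A minimal sketch of the architecture only (not Plunkett & Marchman's phonological encoding or staged training regime): a one-hidden-layer network trained by backpropagation on toy binary vectors:

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy data: 8 stems x 6 "phonological" features -> 6 past-tense features.
    X = rng.integers(0, 2, size=(8, 6)).astype(float)
    Y = rng.integers(0, 2, size=(8, 6)).astype(float)

    W1 = rng.normal(scale=0.5, size=(6, 10))   # input -> hidden weights
    W2 = rng.normal(scale=0.5, size=(10, 6))   # hidden -> output weights
    lr = 0.5

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    for epoch in range(2000):
        # Forward pass.
        H = sigmoid(X @ W1)
        O = sigmoid(H @ W2)
        # Backward pass (squared-error loss, sigmoid derivatives).
        dO = (O - Y) * O * (1 - O)
        dH = (dO @ W2.T) * H * (1 - H)
        W2 -= lr * H.T @ dO
        W1 -= lr * X.T @ dH

    print("training error:", np.mean((O - Y) ** 2))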

A Hybrid, Analogy-based Account

A rational model of learning the past tense based on the ACT-R architecture (Taatgen & Anderson, 2002)

Declarative memory chunks represent past tenses, both as goals and as examples:

    PAST-TENSE-GOAL23          ; goal to determine the past tense of "walk"
      ISA PAST
      OF WALK
      STEM NIL
      SUFFIX NIL

    PAST-TENSE-GOAL23          ; accomplished goal, stored in memory
      ISA PAST
      OF WALK
      STEM WALK
      SUFFIX ED

A Hybrid, Analogy-based Account

The analogy strategy is implemented by two production rules, based on simple pattern matching:

    RULE ANALOGY-FILL-SLOT
      IF   the goal has an empty suffix slot
      AND  there is an example in which the suffix has a value
      THEN set the suffix of the goal to the suffix value of the example

    RULE ANALOGY-COPY-A-SLOT
      IF   the goal has an empty stem slot and the OF slot has a certain value
      AND  in the example the values of the OF and stem slots are equal
      THEN set the stem to the value of the OF slot
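
A minimal sketch of the two rules, with chunks as plain dictionaries; retrieving a suitable example is assumed to happen elsewhere (in ACT-R, via activation-based retrieval from declarative memory):

    def analogy(goal, example):
        # ANALOGY-FILL-SLOT: copy the example's suffix into the empty slot.
        if goal["suffix"] is None and example["suffix"] is not None:
            goal["suffix"] = example["suffix"]
        # ANALOGY-COPY-A-SLOT: if the example's stem equals its OF slot,
        # copy the goal's OF value into its own stem slot.
        if goal["stem"] is None and example["stem"] == example["of"]:
            goal["stem"] = goal["of"]
        return goal

    goal = {"of": "walk", "stem": None, "suffix": None}
    example = {"of": "talk", "stem": "talk", "suffix": "ed"}
    print(analogy(goal, example))  # {'of': 'walk', 'stem': 'walk', 'suffix': 'ed'}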

ACT-R Equations

Activation: A = B + context + noise
- The activation of a chunk has three parts: base-level activation, spreading activation from the current context, and noise. Since spreading activation is a constant factor in the models discussed, we treat activation as if it were just base-level activation.

Base-level activation: B(t) = log Σ_{j=1..n} (t - t_j)^(-d)
- n is the number of times a chunk has been retrieved from memory, and t_j represents the time at which each of these retrievals took place. So, the longer ago a retrieval was, the less it contributes to the activation. d is a fixed ACT-R parameter that represents the decay of base-level activation in declarative memory.

Retrieval time: Time = F e^(-fA)
- Activation determines the time required to retrieve a chunk. A is the activation of the chunk that has to be retrieved, and F and f are fixed ACT-R parameters. Retrieval will only succeed as long as the activation is larger than the retrieval threshold τ, which is also a fixed parameter.

Expected outcome: Expected outcome = P·G - C + noise
- The expected outcome is based on three quantities: the estimated probability of success of a production rule (P), the estimated cost of the production rule (C), and the value of the goal (G).
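
A minimal sketch of the base-level activation and retrieval-time equations; the parameter values are illustrative defaults, not Taatgen & Anderson's fitted ones:

    import math

    def base_level_activation(now, retrieval_times, d=0.5):
        # B(t) = log(sum_j (t - t_j)^(-d)); recent retrievals dominate.
        return math.log(sum((now - t_j) ** (-d) for t_j in retrieval_times))

    def retrieval_time(activation, F=1.0, f=1.0):
        # Time = F * exp(-f * A); higher activation -> faster retrieval.
        return F * math.exp(-f * activation)

    # A chunk retrieved at t = 1, 5, and 9 seconds, queried at t = 10:
    A = base_level_activation(10.0, [1.0, 5.0, 9.0])
    print(A, retrieval_time(A))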

A Hybrid, Analogy-based Account

ACT-R's production rule mechanism learns new rules by combining two rules that have fired consecutively into one:

    RULE LEARNED-REGULAR-RULE
      IF   the goal is to find the past tense of a word and the stem and suffix slots are empty
      THEN set the suffix slot to ED and set the stem slot to the word of which you want the past tense

A Hybrid, Analogy-based Account

[Figure]

Innateness of Language

Central claim: humans have innate knowledge of language
- Assumption: all languages have a common structural basis

Argument from the Poverty of the Stimulus (Chomsky 1965):
- The linguistic experience of children is not sufficiently rich for learning the grammar of their language, hence they must have some innate specification of grammar
- Assumption: knowing a language involves knowing a grammar

Universal Grammar (UG): a set of rules which organize language in the human brain