Lexical Acquisition in Statistical NLP


Lexical Acquisition in Statistical NLP
Adapted from: Manning and Schütze (1999), Chapter 8 (pp. 265-278; 308-312)
Anjana Vakil, University of Saarland

Outline
- What is lexical information?
- Why is it important for NLP?
- How can we evaluate the performance of NLP systems?
- Example: verb subcategorization

What is lexical information? What is the lexicon?
"That part of the grammar of a language which includes the lexical entries for all the words and/or morphemes in the language and which may also include various other information, depending on the particular theory of grammar." (Trask 1993:159)
Imagine a big, detailed (machine-readable) dictionary. What information does it hold, and how much? That varies by theory.

Why is it important for NLP?
Many NLP problems can be resolved by looking at lexical information, such as:
- Verb subcategorization
- Attachment ambiguity
- Selectional preferences
- Semantic similarity between words

Why is it important for NLP?
Couldn't we just write a lexicon with the relevant information? Building dictionaries by hand is expensive! Moreover:
- Quantitative information is missing
- Contextual information is missing
- Language is always changing: new ideas bring new words, and old words take on new meanings and usage patterns

How can we evaluate NLP systems?
Most important: do the desired task well! Break it down: evaluate (and adjust) system components, since better component performance should mean better overall performance on the task. We need a convention for evaluating certain components: precision vs. recall.

How can we evaluate NLP systems?
[Venn diagram: within the collection, a target set and a selected set partially overlap.]

How can we evaluate NLP systems?
- selected, target: tp = true positives
- selected, ~target: fp = false positives (Type I errors)
- ~selected, target: fn = false negatives (Type II errors)
- ~selected, ~target: tn = true negatives

How can we evaluate NLP systems?
One approach: just compare the number of things we got right, tp + tn (accuracy), to the number of things we got wrong, fp + fn (error). What's the problem?

Precision vs. Recall
Better questions to ask:
- How many of the things we found were correct?
  precision = tp / (tp + fp) = tp / |selected|
- How many of the things we were supposed to find did we actually find?
  recall = tp / (tp + fn) = tp / |target|
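To make these definitions concrete, here is a minimal Python sketch (the counts are hypothetical) that computes accuracy, precision, and recall from the four contingency-table cells:

def evaluate(tp, fp, fn, tn):
    accuracy = (tp + tn) / (tp + fp + fn + tn)   # fraction we got right
    precision = tp / (tp + fp)                   # tp / |selected|
    recall = tp / (tp + fn)                      # tp / |target|
    return accuracy, precision, recall

print(evaluate(tp=45, fp=5, fn=15, tn=35))       # (0.8, 0.9, 0.75)

Note how a system can score well on accuracy while recall is mediocre: the tn cell often dwarfs the others, which is exactly the problem with plain accuracy raised above.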

Precision vs. Recall
Q: What could we do to get 100% recall? A: Select everything!
Q: What would happen to precision in this case? A: It would approach zero.
Q: Which is more important, precision or recall? A: It depends!

The F measure
Combines precision & recall performance into one score:
F = 1 / (α · (1/P) + (1 − α) · (1/R))
α determines the weighting of precision vs. recall:
- α < 0.5: preference for recall
- α = 0.5: equal weighting
- α > 0.5: preference for precision
With equal weighting (α = 0.5), F = 2PR / (P + R).
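A minimal sketch of this weighted F measure (the precision and recall values are hypothetical), following the formula above:

def f_measure(p, r, alpha=0.5):
    # Weighted harmonic mean of precision and recall.
    return 1.0 / (alpha / p + (1 - alpha) / r)

print(f_measure(0.9, 0.75))              # alpha = 0.5: 2PR/(P+R), about 0.818
print(f_measure(0.9, 0.75, alpha=0.2))   # alpha < 0.5: recall weighs more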

Exercise: Rhymes for "go"
Collection: do, grew, know, glow, though, to, throw, cow, apple, lemon, no, show, flow, sew, tomato, banana, slow, how, so, few, enough, thorough, blow, two, now, goo, orange, through, follow, crow
- What is the target set?
- What feature(s) should we look for? Select: -o and -ow words
- Calculate: precision, recall, F (even P/R weights)

How can we evaluate NLP systems?
[Venn diagram of the exercise solution:
- selected only (fp): do, to, how, two, now, goo, cow
- selected and target (tp): know, glow, throw, no, show, flow, tomato, slow, so, blow, follow
- target only (fn): though, sew, thorough
- neither (tn): lemon, apple, banana, orange, grew, few, enough, through]
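One way to check the exercise: a short sketch that selects the -o and -ow words and scores them against the target set (crow, which the diagram above omits, is assumed to rhyme with "go" and so lands in the tp cell):

words = ["do", "grew", "know", "glow", "though", "to", "throw", "cow", "apple",
         "lemon", "no", "show", "flow", "sew", "tomato", "banana", "slow", "how",
         "so", "few", "enough", "thorough", "blow", "two", "now", "goo", "orange",
         "through", "follow", "crow"]
# Rhymes of "go" per the diagram, plus crow (assumed).
target = {"know", "glow", "throw", "no", "show", "flow", "tomato", "slow", "so",
          "blow", "follow", "crow", "though", "sew", "thorough"}
selected = {w for w in words if w.endswith("o") or w.endswith("ow")}

tp = len(selected & target)                        # 12
precision = tp / len(selected)                     # 12/19, about 0.63
recall = tp / len(target)                          # 12/15 = 0.80
f = 2 * precision * recall / (precision + recall)  # about 0.71
print(precision, recall, f)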

Verb Subcategorization
Verb categories: based on the semantic arguments taken.
- I gave [him]_RECIPIENT [a present]_THEME.
- I ate [a hamburger]_THEME.
- *I gave [him]_RECIPIENT.
- *I ate [him]_RECIPIENT [a hamburger]_THEME.

Verb Subcategorization
Categories can be divided into subcategories based on how arguments are represented syntactically:
- I gave [NP him] [NP a present].
- I gave [NP a present] [PP to him].
- *I gave [PP to him] [NP a present].
We call the structures a verb allows its subcategorization frames: give subcategorizes for NP NP and NP PP, but not PP NP. (NB: the subject NP is left out, since all English verbs require it.) A toy encoding of such a lexicon is sketched below.
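A hypothetical encoding of a subcategorization lexicon (the structure is an illustration, not a standard format): each verb maps to the set of frames it allows.

# Hypothetical subcategorization lexicon: verb -> allowed frames.
SUBCAT = {
    "give": {("NP", "NP"), ("NP", "PP")},  # but not ("PP", "NP")
    "eat":  {("NP",)},                     # simple transitive
}

def allows(verb, frame):
    # Does the lexicon license this frame for this verb?
    return frame in SUBCAT.get(verb, set())

print(allows("give", ("NP", "PP")))  # True
print(allows("give", ("PP", "NP")))  # False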

Verb Subcategorization
Why might subcategorization information be helpful? Parsing:
- I told her where the CoLi students eat.
- She found the table where the CoLi students eat.
Knowing that tell subcategorizes for a clause while find does not tells the parser to attach "where the CoLi students eat" to the verb in the first sentence, but to "the table" in the second. How could we acquire this information automatically?

Acquiring Verb Subcategorization Info
Brent, Michael R. 1993. From grammar to lexicon: Unsupervised learning of lexical syntax. Computational Linguistics 19:243-262.
The Lerner system:
1. Determine cues for certain subcat frames.
2. Find verbs in corpus sentences.
3. See if the word(s) following the verb fit the cue(s) for a certain frame.
4. Use this to decide how likely it is that the verb allows that frame.
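A toy sketch of the counting step (the corpus and the cue are made up, and Brent's actual cues are more restrictive): count how often a verb occurs (n) and how often it is immediately followed by a cue for the NP frame (m), here simply an object pronoun.

# Hypothetical cue: an object pronoun right after the verb suggests an NP frame.
OBJ_PRONOUNS = {"me", "him", "her", "us", "them"}

def count_verb_and_cue(sentences, verb):
    n = m = 0
    for sent in sentences:
        tokens = sent.lower().split()
        for i, tok in enumerate(tokens):
            if tok == verb:
                n += 1                      # verb occurrence
                if i + 1 < len(tokens) and tokens[i + 1] in OBJ_PRONOUNS:
                    m += 1                  # cue for the NP frame fired
    return n, m

sents = ["I told him a story", "She told me the truth", "They told lies"]
print(count_verb_and_cue(sents, "told"))    # (3, 2)

These counts n and m feed directly into the hypothesis test described below.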

Acquiring Verb Subcategorization Info
[Figure reproduced from Brent (1993)]

Acquiring Verb Subcategorization Info
[Figure reproduced from Brent (1993)]

Acquiring Verb Subcategorization Info
Analyze a corpus:
- v_i = the verb you're interested in
- f_j = the frame you're investigating
- c_j = the cue you've defined for that frame
- ε_j = the probability of error for c_j
- n = C(v_i) = occurrences of the verb in the corpus
- m = C(v_i, c_j) = co-occurrences of verb & cue

Acquiring Verb Subcategorization Info
Hypothesis testing:
- H_0 = the verb does not permit the frame
- H_1 = the verb does permit the frame
Assume H_0, and calculate the probability of obtaining your data if H_0 is true:
p_E = P(v_i(f_j) = 0 | C(v_i, c_j) ≥ m) = Σ_{r=m}^{n} C(n, r) · ε_j^r · (1 − ε_j)^(n−r)
If p_E is small enough (compared to the significance level α), we can reject H_0.
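A minimal sketch of this test (the counts and error rate are hypothetical): compute the binomial tail p_E from n, m, and ε_j, and compare it to a significance level.

from math import comb

def p_error(n, m, eps):
    # P(the cue fires m or more times in n verb occurrences purely
    # by error, i.e. if the verb does not permit the frame).
    return sum(comb(n, r) * eps**r * (1 - eps)**(n - r) for r in range(m, n + 1))

n, m, eps, alpha = 200, 10, 0.02, 0.02   # hypothetical counts and cue error rate
p_E = p_error(n, m, eps)
if p_E < alpha:
    print(f"p_E = {p_E:.4f} < {alpha}: reject H0, the verb likely permits the frame")
else:
    print(f"p_E = {p_E:.4f} >= {alpha}: cannot reject H0")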

Exercise: Manning's implementation
[Table reproduced from Manning and Schütze (1999), p. 274]
- Calculate precision
- Calculate recall
- What do these numbers imply about the system?
- How could we do better?