Lecture 8: Lexicalized and Probabilistic Parsing (CS 6320)


Outline
- PP Attachment Problem
- Probabilistic CFG
- Problems with PCFG
- Probabilistic Lexicalized CFG
- The Collins Parser
- Evaluating Parsers
- Example

PP-attachment Problem
I buy books for children.
Either I buy (books (for children)), with the PP attached to the noun, or I buy (books) (for children), with the PP attached to the verb.

Semantic selection
I ate a pizza with anchovies. I ate a pizza with friends.
I ate (a pizza with anchovies). I ate (a pizza) (with friends).

Parse tree for "pizza with anchovies":
(S (NP I) (VP (V ate) (NP (NP a pizza) (PP (IN with) (NP anchovies)))))

Parse tree for "pizza with friends":
(S (NP I) (VP (V ate) (NP a pizza) (PP (IN with) (NP friends))))

More than one PP
I saw the man in the house with the telescope.
- I saw (the man (in the house (with the telescope))).
- I saw (the man (in the house)) (with the telescope).
- I saw (the man) (in the house (with the telescope)).
- I saw (the man) (in the house) (with the telescope).

Accuracy of the most probable attachment for each preposition ("% V" is the share of cases attaching to the verb):

Prep.   % of total   % V
of         13.47%     6.17%
to         13.27%    80.14%
in         12.42%    73.64%
for         6.87%    82.44%
on          6.21%    75.51%
with        6.17%    86.30%
from        5.37%    75.90%
at          4.09%    76.63%
as          3.95%    86.51%
by          3.53%    88.02%
into        3.34%    89.52%
that        1.74%    65.97%
about       1.45%    70.85%
over        1.30%    86.83%

Accuracy on the Penn Treebank II: 81.73%

Semantic information
Identify the correct meaning of the words, and use this information to decide which parse is the most probable. Ex.: eat pizza with friends.

Pragmatic and discourse information
To achieve 100% accuracy, we need this kind of information. Examples:
- "Buy a car with a steering wheel": you need knowledge about how cars are made.
- "I saw that car in the picture": you also need the surrounding discourse.
[McLauchlan 2001, Maximum Entropy Models and Prepositional Phrase Ambiguity]

Probabilistic Context-Free Grammars
Each rule is annotated with a probability: $A \rightarrow \beta\;[p]$, where
$$p = P(A \rightarrow \beta) = P(A \rightarrow \beta \mid A) = P(\mathrm{RHS} \mid \mathrm{LHS})$$
and the probabilities of all rules with the same left-hand side sum to one: $\sum_{\beta} P(A \rightarrow \beta) = 1$.
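As a concrete illustration, here is a minimal sketch of such a grammar using NLTK's PCFG class; the rules and probabilities are invented for this example rather than estimated from a corpus.

```python
# A toy PCFG in NLTK. Rules and probabilities are invented for
# illustration; note the rules for each left-hand side sum to 1.
import nltk

toy_pcfg = nltk.PCFG.fromstring("""
    S   -> NP VP        [1.0]
    NP  -> 'I'          [0.3]
    NP  -> Det N        [0.3]
    NP  -> NP PP        [0.2]
    NP  -> N            [0.2]
    VP  -> V NP         [0.7]
    VP  -> V NP PP      [0.3]
    PP  -> P NP         [1.0]
    Det -> 'a'          [1.0]
    N   -> 'pizza'      [0.5]
    N   -> 'anchovies'  [0.3]
    N   -> 'friends'    [0.2]
    V   -> 'ate'        [1.0]
    P   -> 'with'       [1.0]
""")
print(toy_pcfg)
```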

Probabilistic Context-Free Grammar
A PCFG assigns a probability to each parse tree $T$ of a sentence $S$ as the product of the probabilities of the rules used to build it:
$$P(T,S) = \prod_{i=1}^{n} P(\mathrm{RHS}_i \mid \mathrm{LHS}_i)$$
Since a tree fully determines its yield, $P(T,S) = P(T)\,P(S \mid T) = P(T)$. Disambiguation then picks the most probable tree:
$$\hat{T}(S) = \operatorname*{argmax}_{T \,\text{s.t.}\, \mathrm{yield}(T)=S} P(T \mid S) = \operatorname*{argmax}_{T \,\text{s.t.}\, \mathrm{yield}(T)=S} \frac{P(T,S)}{P(S)} = \operatorname*{argmax}_{T \,\text{s.t.}\, \mathrm{yield}(T)=S} P(T,S) = \operatorname*{argmax}_{T \,\text{s.t.}\, \mathrm{yield}(T)=S} P(T)$$
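In code, $P(T)$ is just this product over rule probabilities, and the argmax can be found with a Viterbi-style parser. A minimal sketch, assuming the toy_pcfg defined in the previous example:

```python
# ViterbiParser returns the highest-probability tree; tree.prob()
# is P(T), the product of the probabilities of its productions.
# Assumes the toy_pcfg defined in the previous sketch.
parser = nltk.ViterbiParser(toy_pcfg)
for tree in parser.parse("I ate a pizza with anchovies".split()):
    tree.pretty_print()
    print("P(T) =", tree.prob())
```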

Probabilistic Context-Free Grammars
Figure 14.1 A PCFG that is a probabilistic augmentation of the L1 miniature English CFG grammar and lexicon of Fig. 13.1. These probabilities were made up for pedagogical purposes and are not based on a corpus (since any real corpus would have many more rules, the true probabilities of each rule would be much smaller).

Probabilistic Context-Free Grammars
Figure 14.2 Two parse trees for an ambiguous sentence. The transitive parse on the left corresponds to the sensible meaning "Book a flight that serves dinner", while the ditransitive parse on the right corresponds to the nonsensical meaning "Book a flight on behalf of the dinner".

Probabilistic Context-Free Grammars
Two parse trees for an ambiguous sentence. Parse (a) corresponds to the meaning "Can you book flights on behalf of TWA", parse (b) to "Can you book flights which are run by TWA".

Probabilistic Context-Free Grammars
[Worked example: $P(T_l)$ and $P(T_r)$ are each computed by multiplying the probabilities of all the rules used in the corresponding parse tree, and the higher-probability parse is preferred.]
Note: the probability of a sentence is the sum of the probabilities of all of its parse trees, $P(S) = \sum_{T \,\text{s.t.}\, \mathrm{yield}(T)=S} P(T,S)$. This makes PCFGs useful for language modeling.

Probabilistic CKY Parsing
Figure 14.3 The probabilistic CKY algorithm for finding the maximum probability parse of a string of num_words words, given a PCFG grammar with num_rules rules in Chomsky normal form. back is an array of backpointers used to recover the best parse. The build_tree function is left as an exercise to the reader.
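The figure's pseudocode can be fleshed out roughly as follows; this is a sketch with my own variable names and data layout (log probabilities, dict-based chart), not the textbook's exact algorithm, and it includes the build_tree helper rather than leaving it as an exercise.

```python
import math

def probabilistic_cky(words, lexical, binary, start="S"):
    """Best parse of `words` under a CNF PCFG.

    lexical: dict word -> list of (A, prob) for rules A -> word
    binary:  dict (B, C) -> list of (A, prob) for rules A -> B C
    Returns (log probability, tree) or None if there is no parse.
    """
    n = len(words)
    # table[i][j] maps a nonterminal A to (best log prob, backpointer)
    # for the span words[i:j].
    table = [[{} for _ in range(n + 1)] for _ in range(n + 1)]

    # Lexical rules fill the length-1 spans.
    for j, word in enumerate(words, start=1):
        for A, p in lexical.get(word, []):
            table[j - 1][j][A] = (math.log(p), word)

    # Longer spans combine two smaller spans with a binary rule.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for B, (lpB, _) in table[i][k].items():
                    for C, (lpC, _) in table[k][j].items():
                        for A, p in binary.get((B, C), []):
                            lp = math.log(p) + lpB + lpC
                            if A not in table[i][j] or lp > table[i][j][A][0]:
                                table[i][j][A] = (lp, (k, B, C))

    def build_tree(i, j, A):
        _, back = table[i][j][A]
        if isinstance(back, str):          # a word: lexical rule
            return (A, back)
        k, B, C = back                     # a binary split
        return (A, build_tree(i, k, B), build_tree(k, j, C))

    if start not in table[0][n]:
        return None
    return table[0][n][start][0], build_tree(0, n, start)
```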

Probabilistic CKY Parsing
Figure 14.4 The beginning of the probabilistic CKY matrix. Filling out the rest of the chart is left as Exercise 14.4 for the reader.

Learning PCFG Probabilities
A treebank contains parse trees for a large corpus, so rule probabilities can be estimated by relative frequency:
$$P(\alpha \rightarrow \beta \mid \alpha) = \frac{\mathrm{Count}(\alpha \rightarrow \beta)}{\sum_{\gamma} \mathrm{Count}(\alpha \rightarrow \gamma)} = \frac{\mathrm{Count}(\alpha \rightarrow \beta)}{\mathrm{Count}(\alpha)}$$
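A minimal sketch of this estimation with NLTK, whose induce_pcfg performs exactly this relative-frequency computation; it assumes the 10% Penn Treebank sample has been downloaded via nltk.download('treebank').

```python
# Estimate PCFG rule probabilities by counting productions in a
# treebank. Assumes nltk.download('treebank') has been run.
import nltk
from nltk.corpus import treebank

productions = []
for tree in treebank.parsed_sents():
    productions.extend(tree.productions())

# induce_pcfg computes P(A -> beta) = Count(A -> beta) / Count(A).
grammar = nltk.induce_pcfg(nltk.Nonterminal("S"), productions)
for prod in grammar.productions(lhs=nltk.Nonterminal("NP"))[:5]:
    print(prod)
```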

Problems with PCFG and Enhancement
1. The assumption that production probabilities are independent does not hold. Often the choice of how a node expands depends on the location of that node in the parse tree. Ex.: syntactic subjects are often realized as pronouns, whereas direct objects are more often non-pronominal noun phrases (NP → Pronoun vs. NP → Det Noun).

Problems with PCFG and Enhancement
Position-independent probabilities such as NP → DT NN [.28] and NP → PRP [.25] would therefore be erroneous: the true probabilities differ depending on whether the NP is a subject or an object.

Problems with PCFG and Enhancement
2.1 PCFGs are insensitive to the words they expand; in reality, lexical information plays an important role in selecting the correct parse tree.
(a) I ate pizza with anchovies.
(b) I ate pizza with friends.
In (a) the PP attaches to the noun, NP → NP PP (NP attachment); in (b) it attaches to the verb phrase, VP → V NP PP (VP attachment). PP attachment depends on the semantics of the PP head noun.

Problems with PCFG and Enhancement
2.2 Lexical preference of verbs (subcategorization).
Moscow sent more than 100,000 soldiers into Afghanistan.
The PP "into Afghanistan" attaches to sent, not to soldiers, because the verb send subcategorizes for a destination, expressed by the preposition into.

Problems with PCFG and Enhancement
2.3 Coordination ambiguities.
(a) ((dogs in houses) and cats)
(b) (dogs in (houses and cats))
(a) is preferred because dogs and cats are semantic siblings, i.e., animals.

Coordination Ambiguities
Figure 14.7 An instance of coordination ambiguity. Although the left structure is intuitively the correct one, a PCFG will assign them identical probabilities since both structures use exactly the same rules. After Collins (1999).

Probabilistic Lexicalized CFG
Figure 14.5 Two possible parse trees for a prepositional phrase attachment ambiguity. The left parse is the sensible one, in which "into a bin" describes the resulting location of the sacks. In the right, incorrect parse, the sacks to be dumped are the ones which are already in a bin, whatever that might mean.

Probabilistic Lexicalized CFG
Figure 14.6 Another view of the preposition attachment problem. Should the PP on the right attach to the VP or NP nodes of the partial parse tree on the left?

Probabilistic Lexicalized CFG
Figure 14.10 A lexicalized tree, including head tags, for a WSJ sentence, adapted from Collins (1999). Below we show the PCFG rules that would be needed for this parse tree: internal rules on the left, lexical rules on the right.

Probabilistic Lexicalized CFG
Lexical heads play an important role since the semantics of the head dominates the semantics of the phrase. Annotate each non-terminal phrasal node in a parse tree with its lexical head. Workers dumped sacks into a bin.

Probabilistic Lexicalized CFG
A lexicalized grammar shows lexical preferences between heads and their constituents. Probabilities are added to show the likelihood of each rule/head combination:

VP(dumped) → VBD(dumped) NP(sacks) PP(into)  [3 × 10⁻¹⁰]
VP(dumped) → VBD(dumped) NP(cats) PP(into)   [8 × 10⁻¹¹]
VP(dumped) → VBD(dumped) NP(hats) PP(into)   [4 × 10⁻¹⁰]
VP(dumped) → VBD(dumped) NP(sacks) PP(above) [1 × 10⁻¹²]

Since it is not possible to store all possibilities, one solution is to cluster some of the cases based on their semantic category, e.g., hats and sacks are inanimate objects. Note that dumped prefers the preposition into over above.

Probabilistic Lexicalized CFG
A lexicalized tree from Collins (1999); an incorrect parse of the same sentence from Collins (1999).

Probabilistic Lexicalized CFG
$$p(\mathrm{VP} \rightarrow \mathrm{VBD\ NP\ PP} \mid \mathrm{VP}, dumped) = \frac{C(\mathrm{VP}(dumped) \rightarrow \mathrm{VBD\ NP\ PP})}{\sum_{\beta} C(\mathrm{VP}(dumped) \rightarrow \beta)} = \frac{6}{9} = 0.67$$
$$p(\mathrm{VP} \rightarrow \mathrm{VBD\ NP} \mid \mathrm{VP}, dumped) = \frac{C(\mathrm{VP}(dumped) \rightarrow \mathrm{VBD\ NP})}{\sum_{\beta} C(\mathrm{VP}(dumped) \rightarrow \beta)} = \frac{0}{9} = 0$$

Probabilistic Lexicalized CFG
Head probabilities. The mother's head is dumped and the head of the PP is into:
$$p(into \mid \mathrm{PP}, dumped) = \frac{C(X(dumped) \rightarrow \ldots\ \mathrm{PP}(into)\ \ldots)}{\sum C(X(dumped) \rightarrow \ldots\ \mathrm{PP}\ \ldots)} = \frac{2}{9} = 0.22$$

Probabilistic Lexicalized CFG
The mother's head is sacks and the head of the PP is into:
$$p(into \mid \mathrm{PP}, sacks) = \frac{C(X(sacks) \rightarrow \ldots\ \mathrm{PP}(into)\ \ldots)}{\sum C(X(sacks) \rightarrow \ldots\ \mathrm{PP}\ \ldots)} = 0$$
Thus the head probabilities predict that dumped is more likely to be modified by into than is sacks.
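These estimates are easy to express in code. A sketch with the counts entered by hand from the example above; the count table is a hypothetical stand-in for what a parser would collect from a treebank.

```python
# p(prep | PP, mother_head) estimated from counts of PP daughters.
# Counts are hand-entered to reproduce the slide's 2/9 example; the
# "other" entry is a hypothetical placeholder for the remaining PPs.
from collections import Counter

pp_counts = Counter({
    ("dumped", "into"): 2,    # C(X(dumped) -> ... PP(into) ...)
    ("dumped", "other"): 7,   # hypothetical: all other PPs under dumped
})                             # no PP daughters observed under sacks

def p_pp_head(mother, prep):
    total = sum(c for (m, _), c in pp_counts.items() if m == mother)
    return pp_counts[(mother, prep)] / total if total else 0.0

print(p_pp_head("dumped", "into"))  # 2/9 = 0.22
print(p_pp_head("sacks", "into"))   # 0.0
```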

Probabilistic Lexicalized CFG
Modern parsers (Charniak, Collins, etc.) make simplifying assumptions about relating the heads of phrases to the heads of their constituents. In a plain PCFG, the probability of a node n being expanded by a rule r is conditioned only on the syntactic category of n. Idea: add one more conditioning factor, the headword of the node, h(n). Then p(r(n) | n, h(n)) is the conditional probability of expanding node n with rule r, given the syntactic category of n and the lexical information h(n). Example: p(r | VP, dumped), where r is VP → VBD NP PP.

Probabilistic Lexicalized CFG
How do we compute the probability of a head? Two factors are important: the syntactic category of the node and the neighboring heads. We model
$$p(h(n) = \mathit{word}_i \mid n, h(m(n)))$$
where $h(m(n))$ is the head of the node's mother. For example, $p(h(n) = sacks \mid n = \mathrm{NP}, h(m(n)) = dumped)$ is the probability that an NP whose mother's head is dumped itself has the head sacks. This probability captures the dependency between dumped and sacks.

Probabilistic Lexicalized CFG
Update the formula for computing the probability of a parse:
$$P(T,S) = \prod_{n \in T} p(r(n) \mid n, h(n)) \times p(h(n) \mid n, h(m(n)))$$
An example: consider an incorrect parse tree for "Workers dumped sacks into a bin" and compare it with the previous, correct one.

How to Calculate Probabilities
$$P(\mathrm{VP}(dumped,\mathrm{VBD}) \rightarrow \mathrm{VBD}(dumped,\mathrm{VBD})\ \mathrm{NP}(sacks,\mathrm{NNS})\ \mathrm{PP}(into,\mathrm{P}))$$
can be estimated as
$$\frac{\mathrm{Count}(\mathrm{VP}(dumped,\mathrm{VBD}) \rightarrow \mathrm{VBD}(dumped,\mathrm{VBD})\ \mathrm{NP}(sacks,\mathrm{NNS})\ \mathrm{PP}(into,\mathrm{P}))}{\mathrm{Count}(\mathrm{VP}(dumped,\mathrm{VBD}))}$$
However, such a specific rule applies too few times in any corpus for this estimate to be reliable. Instead, parsers make independence assumptions that break the rule probability into smaller, more frequently observed pieces. Note: modern statistical parsers differ in which independence assumptions they make.

The Collins Parser
Each rule is viewed as generating a head H and its left and right dependents, terminated on each side by a STOP symbol:
$$\mathrm{LHS} \rightarrow L_n\,L_{n-1} \ldots L_1\ H\ R_1 \ldots R_{n-1}\,R_n$$
$$P(\mathrm{VP}(dumped,\mathrm{VBD}) \rightarrow \mathrm{STOP}\ \mathrm{VBD}(dumped,\mathrm{VBD})\ \mathrm{NP}(sacks,\mathrm{NNS})\ \mathrm{PP}(into,\mathrm{P})\ \mathrm{STOP})$$

The Collins Parser
$$P(\mathrm{VP}(dumped,\mathrm{VBD}) \rightarrow \mathrm{VBD}(dumped,\mathrm{VBD})\ \mathrm{NP}(sacks,\mathrm{NNS})\ \mathrm{PP}(into,\mathrm{P})) =$$
$$P_H(\mathrm{VBD} \mid \mathrm{VP}, dumped) \times P_L(\mathrm{STOP} \mid \mathrm{VP}, \mathrm{VBD}, dumped) \times P_R(\mathrm{NP}(sacks,\mathrm{NNS}) \mid \mathrm{VP}, \mathrm{VBD}, dumped)$$
$$\times\ P_R(\mathrm{PP}(into,\mathrm{P}) \mid \mathrm{VP}, \mathrm{VBD}, dumped) \times P_R(\mathrm{STOP} \mid \mathrm{VP}, \mathrm{VBD}, dumped)$$
Each factor is estimated from counts, e.g.
$$P_R(\mathrm{NP}(sacks,\mathrm{NNS}) \mid \mathrm{VP}, \mathrm{VBD}, dumped) = \frac{\mathrm{Count}(\mathrm{VP}(dumped,\mathrm{VBD})\ \text{with}\ \mathrm{NNS}(sacks)\ \text{as a daughter somewhere on the right})}{\mathrm{Count}(\mathrm{VP}(dumped,\mathrm{VBD}))}$$
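The decomposition is easy to see as code. A sketch in which every probability table is hypothetical (hand-filled for this one rule); a real Collins parser estimates these from a treebank and adds distance features, smoothing, and back-off.

```python
# Hypothetical illustration of the Collins decomposition for one rule:
# P(rule) = P_H(head) * P_L(...STOP) * P_R(dep_1) ... P_R(...STOP).
# All probabilities below are invented for illustration only.
P_H = {("VBD", "VP", "dumped"): 0.80}
P_L = {("STOP", "VP", "VBD", "dumped"): 0.90}
P_R = {
    (("NP", "sacks", "NNS"), "VP", "VBD", "dumped"): 0.20,
    (("PP", "into", "P"), "VP", "VBD", "dumped"): 0.22,
    ("STOP", "VP", "VBD", "dumped"): 0.30,
}

rule_prob = (P_H[("VBD", "VP", "dumped")]
             * P_L[("STOP", "VP", "VBD", "dumped")]
             * P_R[(("NP", "sacks", "NNS"), "VP", "VBD", "dumped")]
             * P_R[(("PP", "into", "P"), "VP", "VBD", "dumped")]
             * P_R[("STOP", "VP", "VBD", "dumped")])
# P(VP(dumped,VBD) -> VBD NP(sacks,NNS) PP(into,P)) under these tables
print(rule_prob)
```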

Evaluating Parsers
$$\text{labeled recall}\ R = \frac{\#\ \text{of correct constituents in the hypothesis parse of}\ s}{\#\ \text{of constituents in the reference parse of}\ s}$$
$$\text{labeled precision}\ P = \frac{\#\ \text{of correct constituents in the hypothesis parse of}\ s}{\#\ \text{of total constituents in the hypothesis parse of}\ s}$$
$$F_\beta = \frac{(\beta^2 + 1)\,P R}{\beta^2 P + R}, \qquad F_1 = \frac{2 P R}{P + R}$$
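A minimal sketch of these metrics, representing each parse as a set of labeled spans; the function name and span format are mine, not a standard API.

```python
def parseval(hypothesis, reference):
    """Labeled precision, recall, and F1 between two parses,
    each given as a set of (label, start, end) constituents."""
    correct = len(hypothesis & reference)
    precision = correct / len(hypothesis) if hypothesis else 0.0
    recall = correct / len(reference) if reference else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Example with made-up constituents:
hyp = {("S", 0, 6), ("NP", 0, 1), ("VP", 1, 6), ("NP", 2, 6)}
ref = {("S", 0, 6), ("NP", 0, 1), ("VP", 1, 6), ("NP", 2, 4), ("PP", 4, 6)}
print(parseval(hyp, ref))  # (0.75, 0.6, 0.666...)
```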

References
Chart parsing:
- Caraballo, S. and Charniak, E. New figures of merit for best-first probabilistic chart parsing. Computational Linguistics 24 (1998), 275-298.
- Charniak, E., Goldwater, S. and Johnson, M. Edge-based best-first chart parsing. In Proceedings of the Sixth Workshop on Very Large Corpora (1998), 127-133.
- Charniak, E. A maximum-entropy-inspired parser. In Proceedings of NAACL 2000.
Maximum entropy:
- Berger, A. L., Della Pietra, S. A. and Della Pietra, V. J. A maximum entropy approach to natural language processing. Computational Linguistics 22(1) (1996), 39-71.
- Ratnaparkhi, A. Learning to parse natural language with maximum entropy models. Machine Learning 34(1-3) (1999), 151-176.
Charniak's parser on the web: ftp://ftp.cs.brown.edu/pub/nlparser/