Grammars & Parsing, Part 1: Rules, representations, and transformations, oh my! [Title slide shows an example parse tree for "The teacher gave the lecture".] 2015-02-12, CS 562/662: Natural Language Processing

Game plan for today:
- Review of constituents, and why we care
- Your friend, the context-free grammar
- Introduction to parsing
- Tree transformations for fun and profit

Constituents are sequences of words that behave as a unit.* [Slide shows a parse tree of "We helped her paint the house", with "the house" as a DT NN constituent.] We helped her paint the house. He helped her paint the house. They watched her paint the house while they drank lemonade. * This is a somewhat fuzzy definition.

The same constituent often can appear in different contexts: On September seventeenth, I'd like to fly from Atlanta to Denver. I'd like to fly on September seventeenth from Atlanta to Denver. I'd like to fly from Atlanta to Denver on September seventeenth.

Why do we care? Often, the important information in a sentence can only be understood in terms of constituents: On September seventeenth, I'd like to fly from Atlanta to Denver. When do they want to fly?

Why do we care? Often, the important information in a sentence can only be understood in terms of constituents: On September seventeenth, I'd like to fly from Atlanta to Denver. Where do they want to go?

Why do we care? Sometimes, template-filling and regular expressions do the trick... On September seventeenth, I'd like to fly from Atlanta to Denver. I'd like to fly on September seventeenth from Atlanta to Denver. I'd like to fly from Atlanta to Denver on September seventeenth. ...often, though, we need a more robust syntactic analysis.

Many NLP tasks make use of syntactic information:
- Grammar checking (e.g., in MS Word): if a sentence's syntax looks wrong, it might be ungrammatical.
- Information extraction & retrieval: Who/what is the article talking about? When do the events described take place? Where is the user trying to go?
- Machine translation: going from SVO to SOV is easier if you know which words/constituents are which!

Hwæt! Syntax is very useful... but it ain't everything. Colorless green ideas sleep furiously. Noam Chomsky, 1928-present. http://wmjasco.blogspot.com/2008/11/colorless-green-ideas-do-not-sleep.html http://itre.cis.upenn.edu/%7emyl/languagelog/archives/000025.html

The Chomsky Hierarchy describes several classes of formal grammars: each superclass can express more complex constructions than its children. https://en.wikipedia.org/wiki/File:Chomsky-hierarchy.svg

We've already talked about regular grammars: baa!, baaa!, baaaaaaaa! are all matched by /baa+!/. [Slide shows the corresponding FSA: states q0 through q4, with transitions b, a, a, an a self-loop, and a final ! transition.]

A bigger example: \(\d{3}\)[- ]\d{3}[- ]\d{4} [Slide shows the equivalent finite-state transducer, with identity transitions 0:0 through 9:9 for the digits and literal transitions for the parentheses, spaces, and hyphens.]

Regular languages can be very powerful... but have their limitations. For example: write a regular expression to tell if a string's nested parentheses match up. ( ( 2 + 3 ) * 4 ) Yes! ( ( 2 + 3 ) * 4 No!

Python obviously manages to do it, somehow...
% cat python_syntax_example.py
print ( ( 2 + 3 ) * 4 )
print ( ( 2 + 3 ) * 4
% python python_syntax_example.py
  File "python_syntax_example.py", line 3
    ^
SyntaxError: invalid syntax
But it can't do it using a regular grammar.

Another example: try to use a regular grammar to match the family of strings a^n b^n. E.g., match aaabbb, aaaabbbb, etc., but not aaabb, aabbb, etc. A useful way to think about it: can you make an FSA to do this?

Both cases are examples of languages that can be described using context-free grammars but not with regular grammars.

A context-free grammar (CFG) is a 4-tuple consisting of:
N: a set of non-terminal symbols
Σ: a set of terminal symbols
R: a set of rules of the form A → α, where α is a string of symbols from (Σ ∪ N)*
S: a designated start symbol
Any string from a context-free language can be produced by recursively applying the rewrite rules in its grammar... and any string that cannot be so produced is not part of that language!

A (very) simple example: basic arithmetic. Let's write a grammar that can tell us whether an arithmetic expression (e.g. 2 + (3 - 4)) is well-formed. The simplest expression is just a number:
Exp → number
Valid unary operators are + and - (e.g., -4), and their result is also an expression:
UnOp → + | -
Exp → UnOp Exp
Binary operators work similarly:
BinOp → + | - | * | /
Exp → Exp BinOp Exp

A (very) simple example: basic arithmetic. Finally, expressions can be wrapped in matched parentheses:
Exp → ( Exp )
The full grammar:
Root → Exp
number → 1 | 2 | 3 | ... | 0
UnOp → + | -
BinOp → + | - | * | /
Exp → number | UnOp Exp | Exp BinOp Exp | ( Exp )
[Slide shows a parse tree for 2 + 3 * 5, with terminals and non-terminals labeled.] Can you spot the problem?

Useful aside: as finite-state automata (FSA) are to regular grammars, so push-down automata (PDA) are to context-free grammars: all CFGs have an equivalent PDA. PDAs are very similar to FSAs, but with one major difference: they have memory in the form of a stack. Transition rules can specify stack actions and stack criteria as well as input symbols.

An example PDA for a^n b^n, n ≥ 0: [Slide shows a four-state PDA, q0 through q3, from http://www-cs.ccny.cuny.edu/~vmitsou/304spring10/.] A transition label like a, ε → a means: the next input symbol must be a, and a is pushed on the stack after the transition. A label like b, a → ε means: the next symbol must be b, the top of the stack must be a, and the top element is popped off the stack after the transition. The machine reads a's, pushing each on the stack; when the b's start, it reads each one and pops an a off the stack each time; it keeps reading until it runs out of b's or the stack is empty. If either one happens by itself, fail.

Back to CFGs... This is one way to represent them, and is what the book uses:
Root → Exp
number → 1 | 2 | 3 | ... | 0
BinOp → + | - | * | /
UnOp → + | -
Exp → number
Exp → UnOp Exp
Exp → Exp BinOp Exp
Exp → ( Exp )
Another way uses a standardized notation, Backus-Naur Form: <lhs> ::= <rhs>, with unbracketed symbols as terminals. For example:
<Root> ::= <Exp>
<number> ::= 1 | 2 | ... | 0
<Exp> ::= <UnOp> <Exp> | ...

Our arithmetic example is not very language-y... Let's try a more interesting example, a miniature grammar of English:
S → NP VP
NP → Pronoun | ProperNoun | Det Nominal
Nominal → Nominal Noun | Noun
VP → Verb | Verb NP | Verb NP PP | Verb PP
PP → Preposition NP
Noun → flight | breeze | morning | trip | ...
Verb → is | prefer | like | need | want | ...
Pronoun → me | I | you | it
ProperNoun → Baltimore | Los Angeles | Chicago | United | Alaska
Det → the | a | an | this | these | that
Preposition → from | to | on | near
[Slide shows the parse tree for "I prefer a morning flight."]

Producing a grammar from a tree is called induction... [Slide shows the tree for "I prefer a morning flight." next to the rules read off of it:]
S → NP VP
NP → Pro
NP → Det Nominal
Nominal → Nominal Noun
Nominal → Noun
VP → Verb NP
Pronoun → I
Verb → prefer
Det → a
Noun → morning
Noun → flight
If only we had some sort of data-bank of trees from which to induce grammars...

The Penn WSJ Treebank provides a standard set of nonterminals to use (this table only shows the major ones):
ADJP: Adjective Phrase
ADVP: Adverbial Phrase
CONJP: Conjunction Phrase
FRAG: Fragment
INTJ: Interjection
LST: List marker
NAC: Not a Constituent
NP: Noun Phrase
NX: Head of a complex NP
PP: Prepositional Phrase
PRN: Parenthetical
PRT: Particle
QP: Quantifier Phrase
RRC: Reduced Relative Clause
S: Simple Clause
SBAR: Subordinate Clause
SBARQ: Subordinate Question Clause
SINV: Inverted Clause
SQ: Inverted Question
UCP: Unlike Coordinated Phrase
VP: Verb Phrase
WHADJP: Wh-adjective Phrase
WHADVP: Wh-adverb Phrase
WHNP: Wh-noun Phrase
WHPP: Wh-prepositional Phrase
X: Unknown
Other function tags may additionally label constituents. This is in addition to the standard pre-terminal tags (PoS tags: NN, JJ, etc.). One common criticism of the PTB's tag set is that it is too flat, which makes it hard to encode certain things.

One important extension to CFGs is the addition of probability: how likely is a certain production? If we have a rule, e.g. S → NP VP, a PCFG would also tell us P(S → NP VP). P(S → NP VP) = P(rhs = (NP VP) | lhs = S) = P(NP VP | S). When inducing such a grammar, we keep track of how many times each LHS & RHS appear, and use these counts to compute probabilities.

Grammars can be equivalent in several different ways. Two CFGs G and G′ are strongly equivalent if they describe the same language and produce identical trees for strings (modulo some details about labels). Two CFGs G and G′ are weakly equivalent if they describe the same language. Sometimes, we want to convert G into a weakly equivalent G′ that might have useful properties.

One common transformation is into Chomsky Normal Form (CNF): a grammar G = (N, Σ, R, S) is in CNF if all productions in R are in one of two forms:
A → B C, s.t. A, B, and C ∈ N (all are non-terminals)
A → a, s.t. A ∈ N and a ∈ Σ (unary nonterminal-to-terminal production)
Another is Greibach Normal Form (GNF): a grammar G = (N, Σ, R, S) is in GNF if all productions in R are of the form:
A → a X, s.t. A ∈ N, a ∈ Σ, and X ∈ N*
No left recursion allowed!

CNF is named for Noam Chomsky... about whom we've heard a lot already... GNF is named for Sheila Greibach (1939-present), a noted pioneer in the field of automata theory and discoverer of Greibach's Theorem. All CFGs have weakly equivalent CNF and GNF forms.

Another family of transformations: factorization. When we factor a rule, we take a single rule and split it into multiple rules. There are two main ways of doing this: from the left, or from the right. For example, factoring NP → DT JJ NN NNS from the right peels children off the left edge, producing a right-branching chain whose new labels record the children consumed so far:
NP → DT NP-DT
NP-DT → JJ NP-DT,JJ
NP-DT,JJ → NN NNS

Factoring the same rule from the left instead produces a left-branching chain:
NP → DT-JJ-NN NNS
DT-JJ-NN → DT-JJ NN
DT-JJ → DT JJ

These are two different ways of binarizing a grammar: all productions now have a maximum of two children. [Slide shows NP → DT JJ NN NNS alongside its right-factored and left-factored binarizations.] Besides being computationally useful, depending on how you label your new nodes, it may help with rule sparsity!

Going from a tree to a grammar is induction... going the other way (from a string to a tree, using a grammar) is parsing. I prefer a morning flight. [Slide repeats the miniature English grammar from above, alongside the full parse tree for the sentence.]

There are two general approaches to parsing: top-down and bottom-up. Top-down parsing starts at the top of the tree and tries combinations of productions until it reaches the words. I prefer a morning flight. [Two slides step through a top-down derivation: expand S → NP VP, take NP → Pronoun → I, then try expansions of VP until Verb → prefer matches.]

Bottom-up parsing does the opposite: it starts with the words themselves and works upwards. I prefer a morning flight. [Two slides step through a bottom-up derivation: tag morning and flight as Nouns, build Nominal → Noun and Nominal → Nominal Noun over them, and so on upwards.]

Top-down parsing:
- Advantage: doesn't waste time on trees that won't root.
- Disadvantage: potential for lots of backtracking.
Bottom-up parsing:
- Advantage: simpler, less egregious backtracking.
- Disadvantage: many possible trees will have to be abandoned, because they won't root.

We will discuss specific parsing algorithms in detail next time...