Syntax: Context-free Grammars. Ling 571 Deep Processing Techniques for NLP January 6, 2016

Roadmap
- CFG adequacy?
- Motivation: applications
- Context-free grammars (CFGs): formalism; grammars for English; treebanks and CFGs
- Speech and text
- Parsing

Is Context-free Enough?
- Natural language is provably not finite-state.
- Do we need context-sensitivity? Many articles have attempted to demonstrate this, and many of them failed.
- Solid proofs exist for Swiss German (Shieber).
- Key issue: cross-serial dependencies, as in a^n b^m c^n d^m.
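The crossing pattern can be made concrete with a short Python sketch that tests membership in the formal language a^n b^m c^n d^m. The function name `is_cross_serial` is only illustrative. In this language the a's pair with the c's and the b's pair with the d's, so the two dependencies overlap instead of nesting. Ordinary code can check this easily with counters. The formal point is different: no context-free grammar generates exactly this language. The sketch models only the abstract pattern, not Swiss German itself.

```python
import re


def is_cross_serial(s: str) -> bool:
    """Return True iff s is in {a^n b^m c^n d^m : n, m >= 0}."""
    match = re.fullmatch(r"(a*)(b*)(c*)(d*)", s)
    if match is None:
        return False  # symbols out of order, e.g. a d before a c
    a, b, c, d = (len(group) for group in match.groups())
    # Crossing dependencies: the a's match the c's and the b's match the d's
    return a == c and b == d


print(is_cross_serial("aabccd"))  # True: 2 a's/2 c's, 1 b/1 d
print(is_cross_serial("aabcd"))   # False: the a's and c's do not match
```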

Examples
Verbs and their arguments can be ordered cross-serially: the arguments and the verbs must still match.

Applications
- Shallow techniques are useful, but limited.
- Deeper analysis supports: grammar checking and teaching; question answering; information extraction; dialogue understanding.

Grammar and NLP
- Grammar in NLP is NOT prescriptive high-school grammar, with its explicit rules about split infinitives, etc.
- Grammar in NLP tries to capture a native speaker's structural knowledge of the language. That knowledge is largely implicit and is learned early and naturally.

Representing Syntax
Context-free grammars (CFGs) are 4-tuples:
- A set of terminal symbols: Σ
- A set of non-terminal symbols: N
- A set of productions P of the form A → α, where A is a non-terminal and α ∈ (Σ ∪ N)*
- A designated start symbol S
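The 4-tuple maps directly onto a small data structure. The Python sketch below stores N, Σ, P and S and checks the definition: every LHS is a single non-terminal, every RHS symbol is in Σ ∪ N, and S ∈ N. The class name `CFG` and the toy grammar are hypothetical. The sketch also assumes the usual convention that Σ and N are disjoint.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CFG:
    """A context-free grammar as the 4-tuple (N, Sigma, P, S)."""

    nonterminals: frozenset  # N
    terminals: frozenset     # Sigma
    productions: tuple       # P: (lhs, rhs) pairs; rhs is a tuple of symbols
    start: str               # S

    def validate(self) -> None:
        """Raise ValueError if the tuple violates the CFG definition."""
        if self.nonterminals & self.terminals:
            raise ValueError("N and Sigma must be disjoint")
        if self.start not in self.nonterminals:
            raise ValueError("the start symbol must be a non-terminal")
        symbols = self.nonterminals | self.terminals
        for lhs, rhs in self.productions:
            # A -> alpha: exactly one non-terminal on the left...
            if lhs not in self.nonterminals:
                raise ValueError(f"LHS {lhs!r} is not a non-terminal")
            # ...and alpha in (Sigma U N)*, so every RHS symbol must be known
            for sym in rhs:
                if sym not in symbols:
                    raise ValueError(f"unknown symbol {sym!r} in a rule for {lhs}")


# Hypothetical toy grammar built from words on the slides
toy = CFG(
    nonterminals=frozenset({"S", "NP", "VP", "Det", "Noun", "Verb"}),
    terminals=frozenset({"the", "dog", "cat", "chase"}),
    productions=(
        ("S", ("NP", "VP")),
        ("NP", ("Det", "Noun")),
        ("VP", ("Verb", "NP")),
        ("Det", ("the",)),
        ("Noun", ("dog",)),
        ("Noun", ("cat",)),
        ("Verb", ("chase",)),
    ),
    start="S",
)
toy.validate()  # no exception: the tuple is a well-formed CFG
```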

CFG Components
- Terminals appear only as leaves of the parse tree and on the right-hand side (RHS) of productions (rules). They are the words of the language: cat, dog, is, the, bark, chase.
- Non-terminals do not appear as leaves of the parse tree and can appear on either side of productions. They are the constituents of the language: NP, VP, Sentence, etc.

CFG Components
Productions are rules with exactly one non-terminal on the LHS and any number of terminals and non-terminals on the RHS:
  S → NP VP
  VP → V NP PP | V NP
  Nominal → Noun Nominal | Noun
  Noun → dog | cat | rat
  Det → the
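Productions that share a LHS are conventionally abbreviated with `|`, as in VP → V NP PP | V NP. The Python sketch below expands such lines into one (LHS, RHS) pair per alternative. The function name `parse_rules` is hypothetical, and the sketch accepts either → or ASCII -> as the arrow.

```python
def parse_rules(text):
    """Expand lines like 'VP -> V NP PP | V NP' into one production per alternative."""
    productions = []
    for line in text.splitlines():
        line = line.replace("→", "->").strip()
        if not line:
            continue  # skip blank lines
        lhs, _, body = line.partition("->")
        for alternative in body.split("|"):
            # Each alternative becomes its own rule with the same LHS
            productions.append((lhs.strip(), tuple(alternative.split())))
    return productions


rules = parse_rules("""
S → NP VP
VP → V NP PP | V NP
Nominal → Noun Nominal | Noun
Noun → dog | cat | rat
Det → the
""")
print(len(rules))  # 9 productions from 5 lines
```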

L0 Grammar (figure from Jurafsky and Martin, Speech and Language Processing; 1/5/16)

Parse Tree

Some English Grammar
Sentences: a full sentence or clause; a complete thought.
- Declarative: S → NP VP
  I want a flight from Sea-Tac to Denver.
- Imperative: S → VP
  Show me the cheapest flight from New York to Los Angeles.
- Yes-no question: S → Aux NP VP
  Can you give me the non-stop flights to Boston?
- Wh-subject question: S → Wh-NP VP
  Which flights arrive in Pittsburgh before 10pm?
- Wh-non-subject question: S → Wh-NP Aux NP VP
  What flights do you have from Seattle to Orlando?

The Noun Phrase
- NP → Pronoun | Proper-Noun (NNP) | Det Nominal
- Head noun + pre-/post-modifiers
- Determiners:
  Det → DT: the, this, a, those
  Det → NP 's: United's flight, Chicago's airport

In and around the Noun
- Nominal → Noun (PTB POS tags NN, NNS, NNP, NNPS): flight, dinner, airport
- NP → (Det) (Card) (Ord) (Quant) (AP) Nominal: the least expensive fare, one flight, the first route
- Nominal → Nominal PP: the flight from Chicago
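The parentheses in the NP rule mark optional elements. This is a notational abbreviation, not part of the CFG formalism. Each optional symbol is either present or absent, so k optional symbols stand for 2^k plain productions. The Python sketch below makes that expansion explicit for the NP rule; the helper name `expand_optionals` is hypothetical.

```python
from itertools import product


def expand_optionals(lhs, rhs):
    """Turn a rule with (Optional) symbols into ordinary CFG productions."""
    choices = []
    for sym in rhs:
        if sym.startswith("(") and sym.endswith(")"):
            choices.append((sym[1:-1], None))  # optional: present or absent
        else:
            choices.append((sym,))             # obligatory: always present
    rules = []
    for combo in product(*choices):
        # Drop the absent (None) slots to get one plain production
        rules.append((lhs, tuple(s for s in combo if s is not None)))
    return rules


np_rules = expand_optionals("NP", "(Det) (Card) (Ord) (Quant) (AP) Nominal".split())
print(len(np_rules))  # 32 = 2**5
```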

Verb Phrase and Subcategorization
- A verb phrase includes the verb and other constituents.
- Subcategorization frame: which constituent arguments the verb requires.
  VP → Verb: disappear
  VP → Verb NP: book a flight
  VP → Verb PP PP: fly from Chicago to Seattle
  VP → Verb S: think I want that flight
  VP → Verb VP: want to arrange three flights

CFGs and Subcategorization
- Issues? *I prefer United has a flight. With a single Verb category, the rule VP → Verb S accepts this ungrammatical sentence.
- How can we solve this problem? Create explicit subclasses of verb: Verb-with-NP, Verb-with-S-complement, etc.
- Is this a good solution? No: it causes an explosive increase in the number of rules.
- Agreement raises a similar problem.
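A small Python sketch shows why the explicit-subclass approach grows quickly. It assumes a hypothetical frame lexicon that maps each verb to the complement sequences it accepts, loosely following the subcategorization examples. It then compiles that lexicon into the subclass grammar the slide describes. The result has one VP rule per distinct frame and one lexical rule per verb-frame pair, so every extra frame for a verb adds another rule. The frames assigned to each verb and the category names are illustrative assumptions.

```python
# Hypothetical frame lexicon: the complement sequences each verb accepts.
SUBCAT = {
    "disappear": {()},
    "book": {("NP",)},
    "fly": {("PP",), ("PP", "PP")},
    "think": {("S",)},
    "want": {("NP",), ("VP",)},
    "prefer": {("NP",), ("VP",)},
}


def licenses(verb, complements):
    """True iff the verb accepts exactly this sequence of complements."""
    return tuple(complements) in SUBCAT.get(verb, set())


def subclass_rules(lexicon):
    """Compile the lexicon into explicit verb-subclass productions."""
    vp_rules, lexical_rules = set(), set()
    for verb, frames in lexicon.items():
        for frame in frames:
            # e.g. Verb-with-NP, Verb-with-PP-PP, Verb-with-nothing
            cat = "Verb-with-" + ("-".join(frame) if frame else "nothing")
            vp_rules.add(("VP", (cat, *frame)))
            lexical_rules.add((cat, (verb,)))
    return vp_rules, lexical_rules


vp_rules, lexical_rules = subclass_rules(SUBCAT)
print(licenses("think", ["S"]))   # True
print(licenses("prefer", ["S"]))  # False: *I prefer United has a flight
print(len(vp_rules), len(lexical_rules))  # 6 9
```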

Treebanks
- Treebank: a large corpus of sentences, all of which are annotated syntactically with a parse.
- Built semi-automatically: automatic parse with manual correction.
- Examples: Penn Treebank (largest)
  English: Brown (balanced); Switchboard (conversational speech); ATIS (human-computer dialogue); Wall Street Journal
  Also Chinese, Arabic, Korean, Hindi, ...
- DeepBank, Prague Dependency Treebank, ...

Treebanks
- Include a wealth of language information: traces, grammatical function (subject, topic, etc.), semantic function (temporal, location).
- Implicitly constitute a grammar of the language. Rewrite rules can be read off the bracketing, and the bracketing records not only which rules occur but how often.
- This will be crucial in building statistical parsers.
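Reading off rules and their frequencies can be sketched in a few lines of Python. The helpers `parse_bracketed` and `read_off_rules` are hypothetical, and the example tree is made up, not taken from the WSJ. The sketch assumes every bracket starts with a label, so the Penn Treebank's unlabeled outermost bracket would need to be stripped first. Passing one shared Counter across many trees accumulates corpus-level counts. On real treebank files, NLTK's `Tree.fromstring` and `Tree.productions()` do the same job.

```python
from collections import Counter


def parse_bracketed(s):
    """Parse '(LABEL child ...)' into nested (label, children) tuples; words are str."""
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        pos += 1                      # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])  # a word (leaf)
                pos += 1
        pos += 1                      # skip ")"
        return (label, children)

    return node()


def read_off_rules(tree, counts=None):
    """Count one LHS -> RHS production per internal node of the tree."""
    counts = Counter() if counts is None else counts
    label, children = tree
    rhs = tuple(c[0] if isinstance(c, tuple) else c for c in children)
    counts[(label, rhs)] += 1
    for child in children:
        if isinstance(child, tuple):
            read_off_rules(child, counts)
    return counts


tree = parse_bracketed("(S (NP (DT the) (NN dog)) (VP (VBD saw) (NP (DT the) (NN cat))))")
rules = read_off_rules(tree)
print(rules[("NP", ("DT", "NN"))])  # 2: the rule is used twice in this tree
```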

Treebank WSJ Example

Treebanks & Corpora
Many corpora on patas:
  patas$ ls /corpora
  birkbeck  enron_email_dataset  grammars  LEAP  TREC  Coconut  europarl  ICAME  med-data  treebanks
  Conll  europarl-old  JRC-Acquis.3.0  nltk  DUC  framenet  LDC  proj-gutenberg
Also: corpus search function on the CLMS wiki; many large corpora from the LDC; many corpus samples in NLTK.

Treebank Issues
- Large and expensive to produce
- Complex: agreement among labelers can be an issue
- Labeling implicitly captures theoretical bias
- The Penn Treebank is bushy, with long productions
- Enormous numbers of rules: 4,500 rules in the PTB for VP (e.g., VP → V PP PP PP); 1M rule tokens; 17,500 distinct types, and counting!

Spoken & Written
Can we just use models for written language directly? No!
Challenges of spoken language:
- Disfluency: "Can I um uh can I g- get a flight to Boston on the 15th?" (37% of Switchboard utterances > 2 words)
- Short, fragmentary: "Uh one way"
- More pronouns, ellipsis: "That one"

Computational Parsing
- Given a grammar, how can we derive the analysis of an input sentence? Parsing as search; CKY parsing; Earley parsing.
- Given a body of (annotated) text, how can we derive the grammar rules of a language and employ them in automatic parsing? Treebanks and PCFGs.

Algorithmic Parsing Ling 571 Deep Processing Techniques for NLP January 6, 2016

Roadmap
- Motivation: recognition and analysis
- Parsing as search: search algorithms; top-down parsing; bottom-up parsing
- Issues: ambiguity, recursion, garden paths
- Dynamic programming
- Chomsky Normal Form

Parsing
CFG parsing is the task of assigning proper trees to input strings.
- For any input A and grammar G, assign (zero or more) parse trees T that represent its syntactic structure and that:
  cover all and only the elements of A;
  have, as root, the start symbol S of G.
- Parsing does not necessarily pick one (or the correct) analysis.
- Recognition is a subtask of parsing: given input A and grammar G, is A in the language defined by G or not?

Motivation
Parsing goals:
- Is this sentence in the language, i.e., is it grammatical? *I prefer United has the earliest flight.
  FSAs accept the regular languages defined by the automaton; parsers accept the language defined by the CFG.
- What is the syntactic structure of this sentence? What airline has the cheapest flight? What airport does Southwest fly from near Boston?
  A syntactic parse provides a framework for semantic analysis: what is the subject?

Parsing as Search
Syntactic parsing searches through possible parse trees to find one or more trees that derive the input.
Formally, a search problem is defined by:
- a start state S;
- a goal state G;
- a set of actions that transition from one state to another (the successor function);
- a path cost function.

Parsing as Search
The parsing search problem (one model):
- Start state: the start symbol S
- Goal test: does the parse tree cover all and only the input?
- Successor function: expand a non-terminal using a grammar production whose LHS is that non-terminal
- Path cost: we'll ignore it here

Parsing as Search
- Node: a partial solution to the search problem, i.e., a partial parse
- Search start node (initial state): the input string and the start symbol of the CFG
- Goal node: a full parse tree covering all and only the input, rooted at S

Search Algorithms
Many search algorithms exist:
- Depth-first: keep expanding non-terminals until reaching words; if no more expansions are possible, back up.
- Breadth-first: consider all parses with a single non-terminal expanded, then all with two expanded, and so on.
- Other alternatives apply when there are associated path costs.
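Under the parsing-as-search model, the two strategies differ only in agenda discipline. Depth-first search takes the newest state from a stack; breadth-first search takes the oldest state from a queue. The Python sketch below illustrates this with a hypothetical toy grammar and has some simplifications:
- Each state is only the frontier of the partial tree (its sequence of leaves), expanded leftmost-first, not the whole tree.
- The toy grammar has no empty rules, so a frontier longer than the input can never shrink. Such states are therefore not expanded.

```python
from collections import deque

# Hypothetical toy grammar (no empty rules, no recursion)
GRAMMAR = {
    "S": [("NP", "VP"), ("VP",)],
    "NP": [("Det", "Noun")],
    "VP": [("Verb",), ("Verb", "NP")],
    "Det": [("that",)],
    "Noun": [("flight",)],
    "Verb": [("book",)],
}


def successors(state):
    """Expand the leftmost non-terminal of the frontier with each of its rules."""
    for i, sym in enumerate(state):
        if sym in GRAMMAR:
            return [state[:i] + rhs + state[i + 1:] for rhs in GRAMMAR[sym]]
    return []  # all leaves are terminals: nothing left to expand


def search(words, strategy="dfs"):
    """Top-down search from S; returns (goal frontier or None, states expanded)."""
    goal = tuple(words)
    agenda = deque([("S",)])          # start state: the start symbol
    expanded = 0
    while agenda:
        # Stack (LIFO) for depth-first, queue (FIFO) for breadth-first
        state = agenda.pop() if strategy == "dfs" else agenda.popleft()
        expanded += 1
        if state == goal:             # goal test: all and only the input
            return state, expanded
        if len(state) <= len(goal):   # prune frontiers that are already too long
            nxt = successors(state)
            # Reverse for DFS so rules are tried in grammar order
            agenda.extend(reversed(nxt) if strategy == "dfs" else nxt)
    return None, expanded


print(search("book that flight".split(), "dfs"))
print(search("book that flight".split(), "bfs"))
```

Both calls find the same goal. Comparing their expansion counts shows how the agenda discipline alone changes the order of exploration.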

Parse Search Strategies
Two constraints on parsing:
- The parse must start with the start symbol.
- The parse must cover exactly the input string.
These correspond to the two main parsing search strategies:
- Top-down search (goal-directed search)
- Bottom-up search (data-driven search)
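The data-driven direction can be sketched in Python as a contrast to top-down expansion. The search starts from the words. Each successor replaces a span that matches some rule's RHS with that rule's LHS, and the search succeeds when only S remains. The toy rules are hypothetical, and the sketch is a recognizer: it answers yes or no and builds no trees. No rule has an empty RHS, so reductions never lengthen the sequence, and a visited set guarantees termination.

```python
# Hypothetical toy rules as (LHS, RHS) pairs; no rule has an empty RHS.
RULES = [
    ("S", ("NP", "VP")),
    ("S", ("VP",)),
    ("NP", ("Det", "Nominal")),
    ("Nominal", ("Noun",)),
    ("VP", ("Verb", "NP")),
    ("VP", ("Verb",)),
    ("Det", ("that",)),
    ("Noun", ("book",)),
    ("Noun", ("flight",)),
    ("Verb", ("book",)),
]


def reductions(state):
    """Bottom-up successors: rewrite any span matching a RHS as its LHS."""
    for lhs, rhs in RULES:
        k = len(rhs)
        for i in range(len(state) - k + 1):
            if state[i:i + k] == rhs:
                yield state[:i] + (lhs,) + state[i + k:]


def recognize_bottom_up(words):
    """Depth-first, data-driven search from the words toward the start symbol."""
    start = tuple(words)
    stack, seen = [start], {start}
    while stack:
        state = stack.pop()
        if state == ("S",):        # everything reduced to the start symbol
            return True
        for nxt in reductions(state):
            if nxt not in seen:    # many reduction orders reach the same sequence
                seen.add(nxt)
                stack.append(nxt)
    return False


print(recognize_bottom_up("book that flight".split()))  # True
print(recognize_bottom_up("that book".split()))         # False
```

On the second input the search visits sequences such as (Det, S), which can never reduce to S alone. Building structure that cannot become part of a full parse is the bottom-up weakness that top-down search avoids.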

A Grammar (figure); example input: Book that flight.

Top-down Search
- All valid parse trees must start with the start symbol, so the search begins with productions that have S on the LHS, e.g., S → NP VP.
- Successively expand non-terminals, e.g., NP → Det Nominal; VP → V NP.
- Terminate when all leaves are terminals.
Example input: Book that flight
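The top-down strategy can be sketched in Python as a backtracking recursive-descent parser. This is an illustration under stated assumptions, not the course's reference implementation. The grammar is a hypothetical fragment loosely modeled on the L0 grammar. It uses a right-recursive Nominal rule, as in the CFG Components slide, because a naive depth-first top-down parser cannot handle left recursion. The parser works as follows:
- Rules are expanded depth-first in grammar order.
- A pre-terminal succeeds only if it matches the next word.
- Only analyses that consume the whole input are kept.

```python
# Hypothetical grammar fragment; Nominal is right-recursive on purpose.
GRAMMAR = {
    "S": [("NP", "VP"), ("Aux", "NP", "VP"), ("VP",)],
    "NP": [("Pronoun",), ("Det", "Nominal")],
    "Nominal": [("Noun", "Nominal"), ("Noun",)],
    "VP": [("Verb",), ("Verb", "NP")],
}
LEXICON = {
    "Det": {"that", "this", "a"},
    "Noun": {"book", "flight", "meal"},
    "Verb": {"book", "include", "prefer"},
    "Aux": {"does"},
    "Pronoun": {"i", "she", "me"},
}


def expand(symbol, words, pos):
    """Yield (tree, next_pos) for each way `symbol` can derive words[pos:next_pos]."""
    if symbol in LEXICON:  # pre-terminal: must match the next input word
        if pos < len(words) and words[pos] in LEXICON[symbol]:
            yield (symbol, [words[pos]]), pos + 1
        return
    for rhs in GRAMMAR[symbol]:  # try each production, depth first
        for children, end in expand_seq(rhs, words, pos):
            yield (symbol, children), end


def expand_seq(symbols, words, pos):
    """Yield (subtrees, next_pos) for a sequence of symbols, left to right."""
    if not symbols:
        yield [], pos
        return
    for tree, mid in expand(symbols[0], words, pos):
        for rest, end in expand_seq(symbols[1:], words, mid):
            yield [tree] + rest, end


def parse(sentence):
    """Return every tree rooted at S that covers all and only the input."""
    words = sentence.lower().split()
    return [tree for tree, end in expand("S", words, 0) if end == len(words)]


trees = parse("Book that flight")
print(len(trees))  # 1: S -> VP -> Verb NP
```

The parser builds partial trees that fail to cover the input, such as S → VP → Verb over "book" alone. It discards them only at the end, which is one of the top-down costs on the next slide. The same function works as a recognizer: the input is in the language iff `parse` returns a non-empty list.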

Depth-first Search (figure from Jurafsky and Martin, Speech and Language Processing)

Breadth-first Search (figure from Jurafsky and Martin, Speech and Language Processing)

Pros and Cons of Top-down Parsing
Pros:
- Doesn't explore trees not rooted at S
- Doesn't explore subtrees that don't fit valid trees
Cons:
- Produces trees that may not match the input
- May not terminate in the presence of recursive rules, in particular left-recursive rules such as NP → NP PP
- May rederive subtrees as part of the search
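Non-termination arises because a rule such as NP → NP PP lets a depth-first top-down parser expand NP again without consuming any input. The Python sketch below flags such non-terminals, both direct (A → A ...) and indirect (A → B ..., B → A ...). It follows the "can begin with" relation between symbols. The function name `left_recursive` and the example grammar are hypothetical. The sketch assumes no empty (ε) rules, so only the first RHS symbol of each rule matters.

```python
def left_recursive(grammar):
    """Return the non-terminals A for which A =>+ A ... (direct or indirect).

    Follows the 'can begin with' relation: A -> B ... means A can begin with B.
    Assumes no empty (epsilon) rules, so only the first RHS symbol matters.
    """
    begins = {a: {rhs[0] for rhs in rules} for a, rules in grammar.items()}
    found = set()
    for a in grammar:
        stack, seen = list(begins[a]), set()
        while stack:
            b = stack.pop()
            if b == a:                # A can begin with itself: left recursion
                found.add(a)
                break
            if b in seen or b not in grammar:
                continue              # already visited, or a symbol with no rules here
            seen.add(b)
            stack.extend(begins[b])
    return found


grammar = {
    "S": [("NP", "VP")],
    "NP": [("NP", "PP"), ("Det", "Nominal")],     # NP -> NP PP is left-recursive
    "Nominal": [("Noun", "Nominal"), ("Noun",)],  # right recursion is fine
    "VP": [("Verb", "NP")],
    "PP": [("Prep", "NP")],
}
print(left_recursive(grammar))  # {'NP'}
```

Right recursion such as Nominal → Noun Nominal is not flagged. Each expansion there must match a Noun against the input first, so a parser that checks words as it goes cannot loop on it.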