What is NLP? CS 188: Artificial Intelligence, Spring 2006. Why is Language Hard? The Big Open Problems. Information Extraction. Machine Translation.


CS 188: Artificial Intelligence
Spring 2006

Lecture 27: NLP
4/27/2006
Dan Klein, UC Berkeley

What is NLP?

Fundamental goal: deep understanding of broad language. Not just string processing or keyword matching!

End systems that we want to build:
Ambitious: speech recognition, machine translation, information extraction, dialog interfaces, question answering
Modest: spelling correction, text categorization

Why is Language Hard?

Ambiguity:
EYE DROPS OFF SHELF
MINERS REFUSE TO WORK AFTER DEATH
KILLER SENTENCED TO DIE FOR SECOND TIME IN 10 YEARS
LACK OF BRAINS HINDERS RESEARCH

The Big Open Problems

Machine translation
Information extraction
Solid speech recognition
Deep content understanding

Machine Translation

Translation systems encode:
Something about fluent language
Something about how two languages correspond
SOTA: for easy language pairs, better than nothing, but more an understanding aid than a replacement for human translators

Information Extraction

Information extraction (IE): unstructured text to database entries.

"New York Times Co. named Russell T. Lewis, 45, president and general manager of its flagship New York Times newspaper, responsible for all business-side activities. He was executive vice president and deputy general manager. He succeeds Lance R. Primis, who in September was named president and chief operating officer of the parent."

Person            | Company                  | Post                          | State
Russell T. Lewis  | New York Times newspaper | president and general manager | start
Russell T. Lewis  | New York Times newspaper | executive vice president      | end
Lance R. Primis   | New York Times Co.       | president and CEO             | start

SOTA: perhaps 70% accuracy for multi-sentence templates, 90%+ for single easy fields
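The slot-filling step in the example above can be sketched with a single hand-written pattern. The regex and the field names below are purely illustrative assumptions for this one sentence, not the approach of any real IE system:

```python
import re

# Hypothetical toy pattern: "... named <Person>, <age>, <Post> of ..."
PATTERN = re.compile(r"named (?P<person>[^,]+), \d+, (?P<post>[^,]+?) of")

text = ("New York Times Co. named Russell T. Lewis, 45, president and "
        "general manager of its flagship New York Times newspaper.")

m = PATTERN.search(text)
if m:
    # One database entry ("start" marks the beginning of the post).
    entry = {"Person": m.group("person"),
             "Post": m.group("post"),
             "State": "start"}
    print(entry)
```

Real systems need many such patterns (or learned extractors) plus coreference to fill the Company column from "its flagship ...", which is why multi-sentence templates are so much harder than single easy fields.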

Question Answering

Question answering: more than search. Ask general comprehension questions of a document collection.
Can be really easy: What's the capital of Wyoming?
Can be harder: How many U.S. state capitals are also their largest cities?
Can be open-ended: What are the main issues in the global warming debate?
SOTA: can do factoids, even when the text isn't a perfect match.

Models of Language

Two main ways of modeling language:
Language modeling: putting a distribution P(s) over sentences s. Useful for modeling fluency in a noisy channel setting, like machine translation or ASR. Typically simple models, trained on lots of data.
Language analysis: determining the structure and/or meaning behind a sentence. Useful for deeper processing like information extraction or question answering. Starting to be used for MT.

The Speech Recognition Problem

We want to predict a sentence given an acoustic sequence:

    s* = arg max_s P(s | A)

The noisy channel approach: build a generative model of production (encoding):

    P(A, s) = P(s) P(A | s)

To decode, we use Bayes' rule to write:

    s* = arg max_s P(s | A)
       = arg max_s P(s) P(A | s) / P(A)
       = arg max_s P(s) P(A | s)

Now we have to find a sentence maximizing this product.

N-Gram Language Models

No loss of generality to break sentence probability down with the chain rule:

    P(w1 w2 ... wn) = prod_i P(wi | w1 w2 ... w(i-1))

Too many histories! The n-gram solution: assume each word depends only on a short linear history:

    P(w1 w2 ... wn) = prod_i P(wi | w(i-k) ... w(i-1))

Unigram Models

Simplest case: unigrams:

    P(w1 w2 ... wn) = prod_i P(wi)

Generative process: pick a word, pick another word, ... As a graphical model: w1, w2, ..., w(n-1), STOP. To make this a proper distribution over sentences, we have to generate a special STOP symbol last. (Why?)

Examples:
[fifth, an, of, futures, the, an, incorporated, a, a, the, inflation, most, dollars, quarter, in, is, mass.]
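A minimal sketch of the unigram model with its special STOP symbol, assuming a toy two-sentence training corpus and maximum-likelihood estimation:

```python
from collections import Counter

# Toy training corpus; each sentence implicitly ends with STOP.
corpus = [["the", "dog", "barks"], ["the", "cat", "sleeps"]]

counts = Counter()
for sent in corpus:
    counts.update(sent + ["STOP"])
total = sum(counts.values())

def p_unigram(w):
    # MLE estimate: P(w) = count(w) / total tokens (STOP included)
    return counts[w] / total

def p_sentence(sent):
    # P(w1..wn) = P(STOP) * prod_i P(wi); generating STOP last is what
    # makes this a proper distribution over sentences of all lengths.
    p = p_unigram("STOP")
    for w in sent:
        p *= p_unigram(w)
    return p

print(p_sentence(["the", "dog", "barks"]))
```

Without STOP, the model would assign probability mass to "sentences" of every length simultaneously and the probabilities over finite sentences would not sum to one.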
[thrift, did, eighty, said, hard, 'm, july, bullish]
[that, or, limited, the]
[]
[after, any, on, consistently, hospital, lake, of, of, other, and, factors, raised, analyst, too, allowed, mexico, never, consider, fall, bungled, davison, that, obtain, price, lines, the, to, sass, the, the, further, board, a, details, machinists, the, companies, which, rivals, an, because, longer, oakes, percent, a, they, three, edward, it, currier, an, within, in, three, wrote, is, you, s., longer, institute, dentistry, pay, however, said, possible, to, rooms, hiding, eggs, approximate, financial, canada, the, so, workers, advancers, half, between, nasdaq]

Bigram Models

Big problem with unigrams: P(the the the the) >> P(I like ice cream). Condition on the last word, with a special START symbol at the beginning (as a graphical model: START, w1, w2, ..., w(n-1), STOP):

    P(w1 w2 ... wn) = prod_i P(wi | w(i-1))

Any better?

[texaco, rose, one, in, this, issue, is, pursuing, growth, in, a, boiler, house, said, mr., gurria, mexico, 's, motion, control, proposal, without, permission, from, five, hundred, fifty, five, yen]
[outside, new, car, parking, lot, of, the, agreement, reached]
[although, common, shares, rose, forty, six, point, four, hundred, dollars, from, thirty, seconds, at, the, greatest, play, disingenuous, to, be, reset, annually, the, buy, out, of, american, brands, vying, for, mr., womack, currently, sharedata, incorporated, believe, chemical, prices, undoubtedly, will, be, as, much, is, scheduled, to, conscientious, teaching]
[this, would, be, a, record, november]
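The bigram version pads each sentence with START and STOP and conditions on the previous word. Again a toy-corpus sketch with MLE counts (corpus invented for illustration):

```python
from collections import Counter

corpus = [["the", "dog", "barks"],
          ["the", "dog", "sleeps"],
          ["a", "cat", "sleeps"]]

bigram_counts = Counter()
context_counts = Counter()
for sent in corpus:
    padded = ["START"] + sent + ["STOP"]
    for prev, cur in zip(padded, padded[1:]):
        bigram_counts[(prev, cur)] += 1
        context_counts[prev] += 1

def p_bigram(cur, prev):
    # MLE: P(cur | prev) = c(prev, cur) / c(prev)
    return bigram_counts[(prev, cur)] / context_counts[prev]

def p_sentence(sent):
    padded = ["START"] + sent + ["STOP"]
    p = 1.0
    for prev, cur in zip(padded, padded[1:]):
        p *= p_bigram(cur, prev)
    return p

# "the the the the" now gets zero probability, because the bigram
# (the, the) was never observed in training.
print(p_sentence(["the", "the", "the", "the"]))
print(p_sentence(["the", "dog", "sleeps"]))
```

The zero for unseen bigrams is exactly the sparsity problem that the next section's smoothing is meant to fix.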

Sparsity

Problems with n-gram models:
New words appear all the time: synaptitute, 132,701.03, fuzzificational
New bigrams: even more often
Trigrams or more: still worse!

[Figure: fraction of test n-grams seen in training, plotted against number of training words (0 to 1,000,000); unigrams approach 1.0 quickly, bigrams much more slowly, and rules slower still.]

Zipf's Law

Types (words) vs. tokens (word occurrences). Broadly: most word types are rare.
Specifically: rank word types by token frequency; frequency is inversely proportional to rank.
Not special to language: randomly generated character strings have this property.

Smoothing

We often want to make estimates from sparse statistics:

    P(w | denied the):  3 allegations, 2 reports, 1 claims, 1 request  (7 total)

Smoothing flattens spiky distributions so they generalize better:

    P(w | denied the):  2.5 allegations, 1.5 reports, 0.5 claims, 0.5 request, 2 other  (7 total)

Very important all over NLP, but easy to do badly!

Phrase Structure Parsing

Phrase structure parsing organizes syntax into constituents or brackets. In general, this involves nested trees. Linguists can, and do, argue about details. Lots of ambiguity. Not the only kind of syntax.

Example: new art critics write reviews with computers

PP Attachment

Attachment is a simplification:
I cleaned the dishes from dinner
I cleaned the dishes with detergent
I cleaned the dishes in the sink

Syntactic Ambiguities I

Prepositional phrases: They cooked the beans in the pot on the stove with handles.
Particle vs. preposition: A good pharmacist dispenses with accuracy. The puppy tore up the staircase.
Complement structures: The tourists objected to the guide that they couldn't hear. She knows you like the back of her hand.
Gerund vs. participial adjective: Visiting relatives can be boring. Changing schedules frequently confused passengers.
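The flattening in the "denied the" example can be reproduced with a simple absolute-discounting sketch. The 0.5 discount is an assumption chosen only to match the numbers above, not a tuned or recommended value:

```python
# Observed counts for P(w | denied the), as in the example above.
counts = {"allegations": 3, "reports": 2, "claims": 1, "request": 1}
DISCOUNT = 0.5  # assumed per-type discount (illustrative)

def smooth(counts, discount):
    # Subtract a fixed discount from every seen count and give the
    # collected mass to an "other" (unseen-word) bucket, so the total
    # mass is preserved.
    smoothed = {w: c - discount for w, c in counts.items()}
    smoothed["other"] = discount * len(counts)
    return smoothed

sm = smooth(counts, DISCOUNT)
print(sm)                # allegations 2.5, reports 1.5, claims 0.5, request 0.5, other 2.0
print(sum(sm.values()))  # mass preserved: 7.0
```

Dividing each smoothed count by the total (7) gives a distribution that no longer assigns zero probability to unseen continuations.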

Syntactic Ambiguities II

Modifier scope within NPs: impractical design requirements; plastic cup holder.
Multiple gap constructions: The chicken is ready to eat. The contractors are rich enough to sue.
Coordination scope: Small rats and mice can squeeze into holes or cracks in the wall.

Human Processing

Garden pathing
Ambiguity maintenance

Context-Free Grammars

A context-free grammar is a tuple <N, T, S, R>:
N: the set of non-terminals. Phrasal categories: S, NP, VP, ADJP, etc. Parts-of-speech (pre-terminals): NN, JJ, DT, VB.
T: the set of terminals (the words).
S: the start symbol. Often written as ROOT or TOP. Not usually the sentence non-terminal S.
R: the set of rules, of the form X -> Y1 Y2 ... Yk, with X, Yi in N. Examples: S -> NP VP, VP -> VP CC VP. Also called rewrites, productions, or local trees.

Example CFG

Can just write the grammar (rules with non-terminal LHS) and lexicon (rules with pre-terminal LHS):

Grammar:
S -> NP VP
VP -> VBP NP
VP -> VBP NP PP
PP -> IN NP
NP -> NN
NP -> NN NP
NP -> JJ NP
NP -> NP PP

Lexicon:
JJ -> new
NN -> art
NN -> critics
NN -> reviews
NN -> computers
VBP -> write
IN -> with

Top-Down Generation from CFGs

A CFG generates a language. Fix an order: apply rules to the leftmost non-terminal:

    S
    NP VP
    NN VP
    critics VP
    critics VBP NP
    critics write NP
    critics write NN
    critics write reviews

This gives a derivation of a tree using rules of the grammar:

    (S (NP (NN critics)) (VP (VBP write) (NP (NN reviews))))

Corpora

A corpus is a collection of text. Often annotated in some way; sometimes just lots of text. Balanced vs. uniform corpora.

Examples:
Newswire collections: 500M+ words
Brown corpus: 1M words of tagged balanced text
Penn Treebank: 1M words of parsed WSJ
Canadian Hansards: 10M+ words of aligned French / English sentences
The Web: billions of words of who knows what
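The leftmost derivation of "critics write reviews" can be replayed mechanically. This sketch applies a fixed, hand-chosen sequence of rules (including lexicon rules like NN -> critics) to the leftmost occurrence of each left-hand side:

```python
def apply_leftmost(sentential, lhs, rhs):
    """Replace the leftmost occurrence of non-terminal lhs with rhs."""
    i = sentential.index(lhs)
    return sentential[:i] + rhs + sentential[i + 1:]

# Rule sequence for deriving "critics write reviews".
derivation = [
    ("S",   ["NP", "VP"]),
    ("NP",  ["NN"]),
    ("NN",  ["critics"]),
    ("VP",  ["VBP", "NP"]),
    ("VBP", ["write"]),
    ("NP",  ["NN"]),
    ("NN",  ["reviews"]),
]

sentential = ["S"]
for lhs, rhs in derivation:
    sentential = apply_leftmost(sentential, lhs, rhs)
    print(" ".join(sentential))
```

Fixing the leftmost order makes the derivation unique for a given tree, which is why the derivation and the tree carry the same information.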

Treebank Sentences

[Example parse trees from the treebank.]

Corpus-Based Methods

A corpus like a treebank gives us three important tools. First, it gives us broad coverage.

Why is Language Hard? (Scale)

[A full treebank parse tree, with many more nodes and much deeper nesting than the toy examples above.]

Parsing as Search: Top-Down

Top-down parsing starts with the root and tries to generate the input.

INPUT: critics write reviews

Treebank Parsing in 20 sec

PCFGs and Independence

Need a PCFG for broad-coverage parsing. Can take a grammar right off the trees (doesn't work well), e.g.:

    ROOT -> S          1.0
    S -> NP VP .       1.0
    NP -> PRP          1.0
    VP -> VBD ADJP     1.0
    ...

Better results by enriching the grammar (e.g., lexicalization). Can also get reasonable parsers without lexicalization.

Symbols in a PCFG define independence assumptions: at any node (say, an NP over DT NN), the material inside that node is independent of the material outside that node, given the label of that node. Any information that statistically connects behavior inside and outside a node must flow through that node.
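Taking a grammar "right off the trees" is just relative-frequency estimation of rule probabilities. A sketch over two made-up mini-trees (nested tuples stand in for treebank brackets, and pre-terminals are treated as leaves for brevity):

```python
from collections import Counter

# Trees as nested tuples: (label, child, child, ...); a string child
# is treated as a leaf. Both trees are invented for illustration.
trees = [
    ("S", ("NP", "PRP"), ("VP", "VBD", ("NP", "DT", "NN"))),
    ("S", ("NP", "DT", "NN"), ("VP", "VBD")),
]

rule_counts = Counter()
lhs_counts = Counter()

def count_rules(tree):
    label, children = tree[0], tree[1:]
    rhs = tuple(c[0] if isinstance(c, tuple) else c for c in children)
    rule_counts[(label, rhs)] += 1
    lhs_counts[label] += 1
    for c in children:
        if isinstance(c, tuple):
            count_rules(c)

for t in trees:
    count_rules(t)

# Relative frequency: P(lhs -> rhs) = count(lhs -> rhs) / count(lhs)
pcfg = {rule: c / lhs_counts[rule[0]] for rule, c in rule_counts.items()}
for (lhs, rhs), p in sorted(pcfg.items()):
    print(f"{lhs} -> {' '.join(rhs)}  {p:.2f}")
```

By construction the probabilities for each left-hand side sum to one, so the grammar defines a proper distribution over derivations (modulo non-termination issues, which real treebank grammars also face).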

Corpus-Based Methods (continued)

Second, it gives us statistical information. For example, how NPs expand depends on where they sit in the tree:

                   All NPs    NPs under S    NPs under VP
    NP -> NP PP      11%          9%             23%
    NP -> DT NN       9%          9%              7%
    NP -> PRP         6%         21%              4%

This is a very different kind of subject/object asymmetry than what many linguists are interested in.

Third, it lets us check our answers!

Semantic Interpretation

Back to meaning! A very basic approach to computational semantics: a truth-theoretic notion of semantics (Tarskian). Assign a meaning to each word; word meanings combine according to the parse structure. People can and do spend entire courses on this topic; we'll spend about an hour!

What's NLP and what isn't? Designing meaning representations? Computing those representations? Reasoning with them? Supplemental reading will be on the web page.

Meaning

What is meaning?
The computer in the corner.
Bob likes Alice.
I think I am a gummi bear.

Knowing whether a statement is true? Knowing the conditions under which it's true? Being able to react appropriately to it?
Who does Bob like?
Close the door.

A distinction: linguistic (semantic) meaning vs. speaker (pragmatic) meaning ("The door is open.").

Today: assembling the semantic meaning of a sentence from its parts.

Entailment and Presupposition

Some notions worth knowing:

Entailment: A entails B if A being true necessarily implies B is true.
Twitchy is a big mouse -> Twitchy is a mouse?
Twitchy is a big mouse -> Twitchy is big?
Twitchy is a big mouse -> Twitchy is furry?

Presupposition: A presupposes B if A is only well-defined if B is true. "The computer in the corner is broken" presupposes that there is a (salient) computer in the corner.

Truth-Conditional Semantics

Linguistic expressions: Bob sings
Logical translations: sings(bob). (Could just as well be p_1218(e_397).)
Denotation: [[bob]] = some specific person (in some context); [[sings(bob)]] = ???
Types on translations: bob : e (for entity); sings(bob) : t (for truth-value)

Parse: S : sings(bob), with NP Bob : bob and VP sings : λy.sings(y)
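λ-abstractions like λy.sings(y) behave just like ordinary functions, so the NP/VP combination can be sketched with Python lambdas that build logical-form strings (the string encoding of formulas is an illustrative choice, not a standard):

```python
# A proper name translates to an entity constant (type e).
bob = "bob"

# An intransitive verb translates to a function from an entity to a
# formula (type e -> t), i.e. a lambda with its argument not yet filled.
sings = lambda y: f"sings({y})"

# S-rule: apply the VP meaning to the NP meaning (function application).
sentence = sings(bob)
print(sentence)  # sings(bob)
```

The point is that "sings" on its own is not a formula but a function awaiting an entity; applying it to bob yields the closed formula sings(bob).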

Truth-Conditional Semantics (continued)

Proper names refer directly to some entity in the world: Bob : bob; [[bob]]^W = ???
Sentences are either true or false (given how the world actually is): Bob sings : sings(bob).

So what about verbs (and verb phrases)? sings must combine with bob to produce sings(bob). The λ-calculus is a notation for functions whose arguments are not yet filled: sings : λx.sings(x). This predicate is a function which takes an entity (type e) and produces a truth value (type t); we can write its type as e -> t. Adjectives?

Compositional Semantics

So now we have meanings for the words. How do we know how to combine words? Associate a combination rule with each grammar rule:

    S : β(α)  ->  NP : α   VP : β                   (function application)
    VP : λx.α(x) ∧ β(x)  ->  VP : α  and  VP : β    (intersection)

Example: Bob sings and dances

    sings(bob) ∧ dances(bob)
    = [λx.sings(x) ∧ dances(x)](bob)

with Bob : bob, sings : λy.sings(y), dances : λz.dances(z).

Other Cases

Transitive verbs: likes : λx.λy.likes(y,x). Two-place predicates of type e -> (e -> t). likes Amy : λy.likes(y,amy) is just like a one-place predicate.

Quantifiers: what does "Everyone" mean in "Everyone likes Amy"? We want ∀x.likes(x,amy) = [λf.∀x.f(x)](λy.likes(y,amy)), so Everyone : λf.∀x.f(x). This mostly works, but there are problems: we have to change our S rule (the subject NP now applies to the VP meaning λy.likes(y,amy)), it won't work for "Amy likes everyone", and "Everyone likes someone" is scope-ambiguous. This gets tricky quickly!

Denotation

What do we do with logical translations? The translation language (logical form) has fewer ambiguities. We can check truth values against a database; the denotation ("evaluation") is calculated using the database. More usefully: assert truth and modify a database. Questions: check whether a statement in a corpus entails a (question, answer) pair, e.g., "Bob sings and dances" entails ("Who sings?", Bob). Chain together facts and use them for comprehension.

Grounding

So why does the translation likes : λx.λy.likes(y,x) have anything to do with actual liking?
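Evaluating these translations against a model makes the machinery concrete. A sketch with a tiny invented domain and fact set, covering function application, the intersection rule for "and", the curried transitive verb, and Everyone as λf.∀x.f(x):

```python
# A tiny model: a domain of entities and the facts true in the world.
domain = {"bob", "amy"}
facts = {("sings", "bob"), ("sings", "amy"), ("dances", "bob"),
         ("likes", "bob", "amy")}

# Intransitive verbs: type e -> t.
sings = lambda x: ("sings", x) in facts
dances = lambda x: ("dances", x) in facts

# Transitive verb: likes : λx.λy.likes(y, x), type e -> (e -> t).
likes = lambda x: lambda y: ("likes", y, x) in facts

# VP coordination (intersection rule): λx. α(x) ∧ β(x).
sings_and_dances = lambda x: sings(x) and dances(x)

# Everyone : λf. ∀x. f(x), evaluated over the domain.
everyone = lambda f: all(f(x) for x in domain)

print(sings_and_dances("bob"))  # Bob sings and dances
print(likes("amy")("bob"))      # Bob likes Amy
print(everyone(sings))          # Everyone sings
print(everyone(dances))         # Everyone dances
```

Note that everyone takes the predicate as its argument, which is exactly why the plain S-rule (VP applied to NP) has to change for quantified subjects.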
It doesn't (unless the denotation model says so). Sometimes that's enough: wire up "bought" to the appropriate entry in a database.

Meaning postulates: insist, e.g., that ∀x,y. likes(y,x) -> knows(y,x). This gets into lexical semantics issues. Statistical version?

Tense and Events

In general, you don't get far with verbs as predicates. Better to have event variables e:

    Alice danced : danced(alice)
    becomes: ∃e. dance(e) ∧ agent(e,alice) ∧ (time(e) < now)

Event variables let you talk about non-trivial tense / aspect structures:

    Alice had been dancing when Bob sneezed
    becomes: ∃e,e'. dance(e) ∧ agent(e,alice) ∧ sneeze(e') ∧ agent(e',bob)
             ∧ (start(e) < start(e') ∧ end(e) = end(e')) ∧ (time(e') < now)
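The meaning-postulate idea (∀x,y. likes(y,x) -> knows(y,x)) can be sketched as a closure rule over a fact database. The facts and the tuple encoding are illustrative assumptions:

```python
# Fact database: tuples of (predicate, arg1, arg2).
facts = {("likes", "bob", "amy"), ("likes", "amy", "carol")}

def apply_postulate(facts):
    # Meaning postulate: forall x, y. likes(y, x) -> knows(y, x).
    # Add the derived knows-facts to the database.
    derived = {("knows", y, x) for (pred, y, x) in facts if pred == "likes"}
    return facts | derived

closed = apply_postulate(facts)
print(("knows", "bob", "amy") in closed)  # derived from likes(bob, amy)
print(("knows", "amy", "bob") in closed)  # not derivable: liking isn't symmetric
```

This is the "wire it up to a database" view of grounding: the postulate gives likes some inferential content, even though nothing connects the symbol to actual liking.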

Propositional Attitudes

Bob thinks that I am a gummi bear:
    thinks(bob, gummi(me))?
    thinks(bob, "I am a gummi bear")?
    thinks(bob, ^gummi(me))?

The usual solution involves intensions (^X), which are, roughly, the set of possible worlds (or conditions) in which X is true. Hard to deal with computationally: modeling other agents' models, etc. Can come up in simple dialog scenarios, e.g., if you want to talk about what your bill claims you bought vs. what you actually bought.

Trickier Stuff

Non-intersective adjectives:
    green ball : λx.[green(x) ∧ ball(x)]
    fake diamond : λx.[fake(x) ∧ diamond(x)]?  or λx.[fake(diamond(x))]?

Generalized quantifiers:
    the : λf.[unique-member(f)]
    all : λf.λg.[∀x. f(x) -> g(x)]
    most?
    Could do "the" and "all" with more general second-order predicates, too: the(cat, meows), all(cat, meows). (Why worse?)

Generics: Cats like naps. The players scored a goal.
Pronouns (and bound anaphora): If you have a dime, put it in the meter.
...and the list goes on and on!

Multiple Quantifiers

Quantifier scope: Groucho Marx celebrates quantifier order ambiguity: "In this country a woman gives birth every 15 min. Our job is to find that woman and stop her."

Deciding between readings:
    Bob bought a pumpkin every Halloween
    Bob put a pumpkin in every window
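The two-place generalized quantifier all : λf.λg.[∀x. f(x) -> g(x)] can be evaluated over a model just like the one-place quantifier earlier. Domain and facts below are invented for illustration:

```python
domain = {"felix", "tom", "rex"}
facts = {("cat", "felix"), ("cat", "tom"),
         ("meows", "felix"), ("meows", "tom"), ("meows", "rex")}

cat = lambda x: ("cat", x) in facts
meows = lambda x: ("meows", x) in facts

# all : λf.λg. ∀x. f(x) -> g(x)  (a second-order, two-place predicate)
all_q = lambda f: lambda g: all((not f(x)) or g(x) for x in domain)

print(all_q(cat)(meows))  # all(cat, meows): every cat meows
print(all_q(meows)(cat))  # all(meows, cat): rex meows but is not a cat
```

Taking both the restrictor f and the scope g as arguments is what lets the same mechanism express "the" and "most", which a single ∀ over the whole domain cannot.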