CS 598 Natural Language Processing

Natural language is everywhere
NLP applications:
- Information extraction (news, scientific papers)
- Machine translation
- Dialog systems (phone, robots)

Different ways of studying language
- How does language work? (core linguistics)
- How do people learn and process language? (psycholinguistics)
- Where in the brain is language located? (neurolinguistics)
- How do languages change over time? (historical linguistics)
- How does language express identity/social status? (sociolinguistics)
- How can you teach foreign languages? (applied linguistics)

How does language work?
- What sounds are used in human speech? (phonetics)
- How do languages use and combine sounds? (phonology)
- How do languages form words? (morphology)
- How do languages form sentences? (syntax)
- How do languages convey meaning in sentences? (semantics)
- How do people use language to communicate? (pragmatics)

Computational Linguistics / Natural Language Processing
Can we build computational systems that process language?
- Process: translate, understand, summarize, generate, ...
- Text-based: requires (at least) morphology, syntax, semantics (pragmatics is hard)
- Speech-based: also phonetics/phonology

Why NLP needs grammars: Machine translation
The output of current systems is often ungrammatical:
Daniel Tse, a spokesman for the Executive Yuan said the referendum demonstrated for democracy and human rights, the President on behalf of the people of two. 3 million people for the national space right, it cannot say on the referendum, the legitimacy of Taiwan's position full.
(BBC Chinese news, translated by Google Chinese to English)
Correct translation requires grammatical knowledge:
the girl that Mary thinks Jane saw
- [das Mädchen], von dem Mary glaubte, dass Jane es gesehen hat.
- [la fille] dont Marie croit que Jane l'a vue.

Why NLP needs grammars: Question Answering
This requires grammatical knowledge...
John persuaded/promised Mary to leave.
- Who left?
... and inference:
John managed/failed to leave.
- Did John leave?
John and his parents visited Prague. They went to the castle.
- Was John in Prague?
- Has John been to the Czech Republic?
- Has John's dad ever seen a castle?

Research trends in NLP
- 1980s to mid-1990s: Focus on theory or large, rule-based ("symbolic") systems that are difficult to develop, maintain and extend.
- Mid-1990s to mid-2000s: We discovered machine learning and statistics! (and nearly forgot about linguistics... oops) NLP becomes very empirical and data-driven.
- Today: Maturation of machine learning techniques and experimental methodology. We're beginning to realize that we need (and are able) to use rich linguistic structures after all!

Parsing: a necessary first step
[Example text in a non-Latin script; not recoverable from the extraction]
- What are these symbols? (you need a lexicon)
- How do they fit together? (you need a grammar)

I eat sushi with tuna.
I eat sushi with chopsticks.
Language is ambiguous.
Statistical models: What is the most likely structure? We need a probability model.

What is the structure of a sentence?
Sentence structure is hierarchical: A sentence consists of words (I, eat, sushi, with, tuna), which form phrases: "sushi with tuna".
Sentence structure defines dependencies between words or phrases (example: "I eat sushi with tuna").

Two ways to represent structure
- Phrase structure trees
- Dependency trees
[Figure: phrase structure trees and dependency trees for "eat sushi with tuna" and "eat sushi with chopsticks", labeled as correct and incorrect analyses]

Structure (Syntax) corresponds to Meaning (Semantics)
[Figure: correct and incorrect analyses of "eat sushi with tuna" and "eat sushi with chopsticks". In the correct analyses, "with tuna" attaches to "sushi" and "with chopsticks" attaches to "eat"; the incorrect analyses swap the two attachments.]

The goal of formal syntax:
Can we define a program that generates all English sentences? We will call this program a grammar.
What is the right programming language for grammars?
[N.B.: linguists demand that the program fit into the mind of a child that learns the language]

Overgeneration and undergeneration
English:
- John saw Mary.
- I ate sushi with tuna.
- Did you go there?
- John made but Mary just bought some cake.
- I want you to go there.
- I ate the cake that John had made for me yesterday.
- ...
Not English:
- John Mary saw.
- with tuna sushi ate I.
- Did you went there?
- ...
Overgeneration: the grammar generates strings that are not English.
Undergeneration: the grammar fails to generate some English sentences.
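
The two failure modes can be stated as set differences. The sketch below is only an illustration: both sets are small hypothetical samples (the `generated` set stands in for the output of some imaginary candidate grammar), not data from the lecture.

```python
# Hypothetical toy data: a sample of English sentences, and the strings that
# some imaginary candidate grammar generates. Both sets are assumptions.
english = {
    "John saw Mary.",
    "I ate sushi with tuna.",
    "Did you go there?",
    "I want you to go there.",
}
generated = {
    "John saw Mary.",
    "I ate sushi with tuna.",
    "John Mary saw.",        # not English
    "Did you went there?",   # not English
}

overgenerated = generated - english    # generated, but not English
undergenerated = english - generated   # English, but not generated

print(sorted(overgenerated))   # ['Did you went there?', 'John Mary saw.']
print(sorted(undergenerated))  # ['Did you go there?', 'I want you to go there.']
```

A grammar that is exactly right for this sample would leave both differences empty.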

Basic word classes (parts of speech)
Content words (open-class):
- nouns: student, university, knowledge, ...
- verbs: write, learn, teach, ...
- adjectives: difficult, boring, hard, ...
- adverbs: easily, repeatedly, ...
Function words (closed-class):
- prepositions: in, with, under, ...
- conjunctions: and, or, ...
- determiners: a, the, every, ...

Basic sentence structure
I eat sushi.
- I: Noun (Subject)
- eat: Verb (Head)
- sushi: Noun (Object)

As a dependency tree
I eat sushi.
[Figure: dependency tree rooted in "eat", with an sbj arc to "I" and an obj arc to "sushi"]
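
One simple way to store such a tree is as a list of labeled arcs. This is a minimal sketch, not a representation prescribed by the lecture; the names `arcs`, `dependents_of` and `root` are hypothetical.

```python
# Dependency arcs for "I eat sushi": (dependent, relation, head).
# The root word "eat" is the head of both arcs and has no incoming arc.
arcs = [
    ("I", "sbj", "eat"),
    ("sushi", "obj", "eat"),
]

def dependents_of(head, arcs):
    """Return {relation: dependent} for all arcs whose head is `head`."""
    return {rel: dep for dep, rel, h in arcs if h == head}

def root(words, arcs):
    """Return the one word that is not a dependent of any other word."""
    dependents = {dep for dep, _, _ in arcs}
    roots = [w for w in words if w not in dependents]
    if len(roots) != 1:
        raise ValueError("a dependency tree has exactly one root")
    return roots[0]

print(dependents_of("eat", arcs))         # {'sbj': 'I', 'obj': 'sushi'}
print(root(["I", "eat", "sushi"], arcs))  # eat
```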

A finite-state automaton (FSA) (or Markov chain)
[Figure: states Noun (Subject) → Verb (Head) → Noun (Object)]

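A Hidden Markov Model (HMM)
[Figure: states Noun (Subject) → Verb (Head) → Noun (Object); each state emits words]
- Noun (Subject): I, you, ...
- Verb (Head): eat, drink, ...
- Noun (Object): sushi, ...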

Words take arguments
- I eat sushi.
- I eat sushi you. ???
- I sleep sushi ???
- I give sushi ???
- I drink sushi ?
Subcategorization:
- Intransitive verbs (sleep) take only a subject.
- Transitive verbs (eat) also take one (direct) object.
- Ditransitive verbs (give) also take an additional (indirect) object.
Selectional preferences: The object of eat should be edible.
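
Subcategorization can be sketched as a lexicon that records how many objects each verb takes. The table `OBJECTS_REQUIRED` and the function `satisfies_subcat` are hypothetical names chosen for this illustration.

```python
# Hypothetical lexicon: number of objects each verb takes
# (0 = intransitive, 1 = transitive, 2 = ditransitive).
OBJECTS_REQUIRED = {"sleep": 0, "eat": 1, "drink": 1, "give": 2}

def satisfies_subcat(verb, objects):
    """True if the verb occurs with exactly the number of objects it requires."""
    return OBJECTS_REQUIRED[verb] == len(objects)

print(satisfies_subcat("eat", ["sushi"]))         # True:  I eat sushi.
print(satisfies_subcat("eat", ["sushi", "you"]))  # False: I eat sushi you.
print(satisfies_subcat("sleep", ["sushi"]))       # False: I sleep sushi
print(satisfies_subcat("give", ["sushi"]))        # False: I give sushi
print(satisfies_subcat("drink", ["sushi"]))       # True:  I drink sushi
```

Note that "I drink sushi" passes this check: its problem is a selectional preference (sushi is not drinkable), which a count of arguments cannot express.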

A better FSA
[Figure: states Noun (Subject) → Transitive Verb (Head) → Noun (Object)]

Language is recursive
the ball
the big ball
the big, red ball
the big, red, heavy ball
...
Adjectives can modify nouns. The number of modifiers/adjuncts a word can have is (in theory) unlimited.

Can we define a program that generates all English sentences? The number of sentences is infinite. But we need our program to be finite.

Another FSA
[Figure: FSA over the word classes Determiner, Adjective and Noun]
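
A finite automaton can accept an unbounded number of adjectives by looping on one state. The sketch below assumes one plausible reading of the diagram (a determiner, then any number of adjectives, then a noun); the state names, `TRANSITIONS` and `LEXICON` are hypothetical.

```python
# Assumed reading of the diagram: Det, then zero or more Adj, then Noun.
TRANSITIONS = {
    ("start", "Det"): "after_det",
    ("after_det", "Adj"): "after_det",  # self-loop: any number of adjectives
    ("after_det", "Noun"): "done",
}
ACCEPTING = {"done"}

# Hypothetical lexicon mapping words to word classes.
LEXICON = {"the": "Det", "big": "Adj", "red": "Adj", "heavy": "Adj", "ball": "Noun"}

def accepts(words):
    """Run the automaton over the word classes of `words`."""
    state = "start"
    for word in words:
        tag = LEXICON.get(word)                 # None for unknown words
        state = TRANSITIONS.get((state, tag))   # None if no transition exists
        if state is None:
            return False
    return state in ACCEPTING

print(accepts("the ball".split()))                # True
print(accepts("the big red heavy ball".split()))  # True
print(accepts("the big".split()))                 # False
```

The transition table is finite, yet the loop lets it accept noun phrases of any length.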

Recursion can be more complex
the ball
the ball in the garden
the ball in the garden behind the house
the ball in the garden behind the house next to the school
...

Yet another FSA
[Figure: FSA over the word classes Det, Adj, Noun and Preposition]
So, what do we need grammar for?

What does this mean?
the ball in the garden behind the house

The FSA does not generate structure
[Figure: FSA over the word classes Det, Adj, Noun and Preposition]

Strong vs. weak generative capacity
Formal language theory:
- defines language as string sets
- is only concerned with generating these strings (weak generative capacity)
Formal/Theoretical syntax (in linguistics):
- defines language as sets of strings with (hidden) structure
- is also concerned with generating the right structures (strong generative capacity)

Context-free grammars (CFGs) capture recursion
Language has complex constituents ("the garden behind the house").
Syntactically, these constituents behave just like simple ones. ("behind the house" can always be omitted)
CFGs define nonterminal categories to capture equivalent constituents.

An example
N → {ball, garden, house, sushi}
P → {in, behind, with}
NP → N
NP → NP PP
PP → P NP
N: noun
P: preposition
NP: noun phrase
PP: prepositional phrase
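
To see the recursion at work, the rules NP → N, NP → NP PP and PP → P NP can be expanded up to a fixed nesting depth. This is a rough sketch under that reading of the example grammar; the function name `noun_phrases` and the depth bound are assumptions made to keep the output finite.

```python
N = ["ball", "garden", "house", "sushi"]
P = ["in", "behind", "with"]

def noun_phrases(depth):
    """All NP strings (as word tuples) using NP -> NP PP nested at most `depth` times."""
    if depth == 0:
        return {(n,) for n in N}              # NP -> N
    smaller = noun_phrases(depth - 1)
    result = set(smaller)
    for left in smaller:                      # NP -> NP PP
        for p in P:                           # PP -> P NP
            for right in smaller:
                result.add(left + (p,) + right)
    return result

print(len(noun_phrases(0)))  # 4
print(len(noun_phrases(1)))  # 52
print(("ball", "in", "garden", "behind", "house") in noun_phrases(2))  # True
```

Because strings are collected in a set, a string with several derivations is stored once: this enumerates the string language (weak generative capacity) and discards the structures.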

Context-free grammars
A CFG is a 4-tuple ⟨N, Σ, R, S⟩:
- A set of nonterminals N (e.g. N = {S, NP, VP, PP, Noun, Verb, ...})
- A set of terminals Σ (e.g. Σ = {I, you, he, eat, drink, sushi, ball, ...})
- A set of rules R ⊆ {A → β with left-hand side (LHS) A ∈ N and right-hand side (RHS) β ∈ (N ∪ Σ)*}
- A start symbol S ∈ N (sentence)
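
The 4-tuple maps directly onto a small data structure. The class below is a sketch, not a standard library API: the name `CFG`, its field names and the toy grammar are all hypothetical, and the disjointness check on N and Σ is the usual textbook assumption rather than something stated above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CFG:
    nonterminals: frozenset  # N
    terminals: frozenset     # Σ
    rules: frozenset         # R: pairs (lhs, rhs), rhs a tuple of symbols
    start: str               # S

    def is_well_formed(self):
        """Check S ∈ N, N ∩ Σ = ∅, and every rule A → β has A ∈ N, β ∈ (N ∪ Σ)*."""
        symbols = self.nonterminals | self.terminals
        return (
            self.start in self.nonterminals
            and self.nonterminals.isdisjoint(self.terminals)
            and all(
                lhs in self.nonterminals and all(s in symbols for s in rhs)
                for lhs, rhs in self.rules
            )
        )

# Hypothetical toy grammar for "I eat sushi".
toy = CFG(
    nonterminals=frozenset({"S", "NP", "VP", "V", "N"}),
    terminals=frozenset({"I", "eat", "sushi"}),
    rules=frozenset({
        ("S", ("NP", "VP")),
        ("VP", ("V", "NP")),
        ("NP", ("N",)),
        ("N", ("I",)),
        ("N", ("sushi",)),
        ("V", ("eat",)),
    }),
    start="S",
)
print(toy.is_well_formed())  # True
```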

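CFGs define parse trees
N → {sushi, tuna}
P → {with}
V → {eat}
NP → N
NP → NP PP
PP → P NP
VP → V NP
[Figure: parse tree for "eat sushi with tuna", labeled as the correct analysis: the PP "with tuna" attaches to the NP "sushi"]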

Structural ambiguity results in multiple parse trees
N → {sushi, tuna}
P → {with}
V → {eat}
NP → N
NP → NP PP
PP → P NP
VP → V NP
VP → VP PP
[Figure: four parse trees. Correct structures: "eat sushi with tuna" with the PP attached to the NP, and "eat sushi with chopsticks" with the PP attached to the VP. Incorrect structures: the same two phrases with the attachments swapped.]
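
The ambiguity can be checked mechanically by counting parse trees. The sketch below uses a memoized recursion over spans, the same idea as a chart parser, and assumes the rules NP → N, NP → NP PP, PP → P NP, VP → V NP and VP → VP PP. The function `count_parses` is a hypothetical name, and "chopsticks" is added to the nouns as an assumption so that both example phrases can be parsed.

```python
from functools import lru_cache

LEXICAL = {  # preterminal -> words ("chopsticks" added as an assumption)
    "N": {"sushi", "tuna", "chopsticks"},
    "P": {"with"},
    "V": {"eat"},
}
UNARY = {"NP": ["N"]}                          # NP -> N
BINARY = {                                     # A -> B C
    "NP": [("NP", "PP")],
    "PP": [("P", "NP")],
    "VP": [("V", "NP"), ("VP", "PP")],
}

def count_parses(words, symbol="VP"):
    """Number of distinct parse trees deriving `words` from `symbol`."""
    words = tuple(words)

    @lru_cache(maxsize=None)
    def count(sym, i, j):
        total = 0
        if j - i == 1 and words[i] in LEXICAL.get(sym, ()):
            total += 1                         # preterminal over one word
        for child in UNARY.get(sym, ()):
            total += count(child, i, j)        # unary rule over the same span
        for left, right in BINARY.get(sym, ()):
            for k in range(i + 1, j):          # every split point of the span
                total += count(left, i, k) * count(right, k, j)
        return total

    return count(symbol, 0, len(words))

print(count_parses("eat sushi".split()))                  # 1
print(count_parses("eat sushi with tuna".split()))        # 2
print(count_parses("eat sushi with chopsticks".split()))  # 2
```

The grammar assigns two trees to both phrases and cannot tell which one is right; choosing between them is the job of the probability model mentioned earlier.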