Levels of Language used by Natural Language Processing

Levels of Language Analysis
- Use the synchronic model to guide computational techniques to analyze text (as much as possible)
- Levels: Phonetic, Morphological, Lexical, Syntactic, Semantic, Discourse, Pragmatic

Synchronic Model of Language
The more exterior the level of language processing:
- The larger the unit of analysis: phoneme -> morpheme -> word -> sentence -> text -> world
- The less precise the language phenomena
- The more free choice and variability: less rule-oriented, more exceptions to regularities
- The more levels it presumes a knowledge of or reliance on
- The more the theories used to explain the data move into the areas of cognitive psychology and AI
Lower levels of the model have been more thoroughly investigated and incorporated into NLP systems.

Phonetic: Speech-Level Processing
- Interpretation of speech sounds within and across words
- Sound waves are analyzed and encoded into a digitized signal
Rules used in phonological analysis:
1. Phonetic rules: sounds within words
2. Phonemic rules: variations of pronunciation when words are spoken together
3. Prosodic rules: fluctuation in stress and intonation across a sentence

Morphological Analysis
- Deals with the componential nature of lexical entities, for example preregistration = pre (prefix) + registra (stem/root) + tion (suffix)
- What features do inflections reveal in English?
  - Verbs: tense and number
  - Nouns: singular/plural
  - Adjectives: comparison features
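The prefix / stem / suffix split above can be sketched as a small affix stripper. This is a minimal illustration only: the affix lists and the `segment` helper are hypothetical, and a real morphological analyzer would rely on a lexicon and spelling rules rather than plain string matching.

```python
# Toy affix lists (assumptions for illustration, not a real inventory).
PREFIXES = ("pre", "un", "re")
SUFFIXES = ("tion", "ing", "ed", "s")


def segment(word):
    """Split a word into (prefix, stem, suffix); missing parts are ''."""
    # Take the first listed prefix the word starts with, if any.
    prefix = next((p for p in PREFIXES if word.startswith(p)), "")
    rest = word[len(prefix):]
    # Take the first listed suffix that leaves a non-empty stem.
    suffix = next(
        (s for s in SUFFIXES if rest.endswith(s) and len(rest) > len(s)), ""
    )
    stem = rest[: len(rest) - len(suffix)] if suffix else rest
    return prefix, stem, suffix


print(segment("preregistration"))
print(segment("walked"))
```

Because it matches strings blindly, this sketch over-segments words such as "red" (it would strip "re" as a prefix), which is one reason practical systems check candidate stems against a lexicon.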

Lexical
1. Part-of-speech (POS) tagging: tags words with specific noun, verb, adjective and adverb types
   Example text, 03/14/1999 (AFP): the extremist Harkatul Jihad group, reportedly backed by Saudi dissident Osama bin Laden...
   Tagged: the/DT extremist/JJ Harkatul_Jihad/NP group/NN ,/, reportedly/RB backed/VBD by/IN Saudi/NP dissident/NN Osama_bin_Laden/NP
2. Productive rules, which explain how new words are formed: highchair, egghead
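As a rough sketch of what a tagger's output looks like, the example above can be reproduced with a plain lexicon lookup. The tag dictionary is copied from the slide's example; the `tag` function and its `default` fallback tag are assumptions made for illustration. Practical taggers use the surrounding context to choose among several possible tags for a word, which this sketch does not attempt.

```python
# Word-to-tag lexicon taken from the tagged example above.
LEXICON = {
    "the": "DT", "extremist": "JJ", "Harkatul_Jihad": "NP", "group": "NN",
    ",": ",", "reportedly": "RB", "backed": "VBD", "by": "IN",
    "Saudi": "NP", "dissident": "NN", "Osama_bin_Laden": "NP",
}


def tag(tokens, default="NN"):
    """Pair each token with its lexicon tag; unknown words get `default`."""
    return [(token, LEXICON.get(token, default)) for token in tokens]


tokens = ["the", "extremist", "Harkatul_Jihad", "group", ",", "reportedly",
          "backed", "by", "Saudi", "dissident", "Osama_bin_Laden"]
print(" ".join(f"{word}/{pos}" for word, pos in tag(tokens)))
```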

Lexical: Word-Level Meaning (Lexico-Semantics)
- Usually given by an online lexicon such as WordNet
- Words with senses. Example: launch
  - Definitions
    - Noun sense 1: a large, usually motor-driven boat used for carrying people on rivers, lakes, harbors, etc.
    - Verb sense 1: set up or found
  - Synonyms
    - Verb sense 1: establish, set up, found

Syntactic Analysis
- Produces a de-linearized representation of a sentence which reveals dependency relationships between words
- Analyzes the words in a sentence so as to uncover the grammatical structure of the sentence
- Requires both a grammar and a parser
[Figure: tree structure for "the glorious sun will shine in the winter", with phrase nodes S, NP, NP2, VP, VP2 and PP over the word categories Determiner, Adjective, Noun, Aux, Verb and Prep]

Parsing a sentence using simple phrase structure rules
[Figure: parse tree for "The cat ate the mouse". Sentence branches into a Noun Phrase (Determiner, Noun) and a Verb Phrase (Verb, Noun Phrase (Determiner, Noun))]
The phrase structure rules underlying this analysis are as follows:
Sentence -> Noun Phrase Verb Phrase
Noun Phrase -> Determiner Noun
Verb Phrase -> Verb Noun Phrase
Determiner = The
Noun = cat
Noun = mouse
Verb = ate
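The rules above are small enough to run directly. The following is a minimal recursive-descent sketch, not a general parser: the `RULES` and `LEXICON` tables transcribe the slide's grammar, while the function names and the tuple-based tree format are assumptions chosen for illustration. It tries each rule left to right and does not handle ambiguous or left-recursive grammars.

```python
# Phrase structure rules and lexicon transcribed from the slide.
RULES = {
    "Sentence": [["Noun Phrase", "Verb Phrase"]],
    "Noun Phrase": [["Determiner", "Noun"]],
    "Verb Phrase": [["Verb", "Noun Phrase"]],
}
LEXICON = {"the": "Determiner", "cat": "Noun", "mouse": "Noun", "ate": "Verb"}


def parse(symbol, tokens, pos=0):
    """Return (tree, next_pos) if `symbol` matches tokens at `pos`, else None."""
    if symbol not in RULES:  # a word category such as Noun or Verb
        if pos < len(tokens) and LEXICON.get(tokens[pos].lower()) == symbol:
            return (symbol, tokens[pos]), pos + 1
        return None
    for rhs in RULES[symbol]:
        children, cur = [], pos
        for child in rhs:
            result = parse(child, tokens, cur)
            if result is None:
                break  # this rule fails; try the next alternative
            subtree, cur = result
            children.append(subtree)
        else:
            return (symbol, *children), cur
    return None


def parse_sentence(text):
    """Parse a whole sentence; return the tree, or None if it does not parse."""
    tokens = text.split()
    result = parse("Sentence", tokens)
    if result is not None and result[1] == len(tokens):
        return result[0]
    return None


print(parse_sentence("The cat ate the mouse"))
```

A word order the grammar does not license, such as "cat the ate", returns `None`, which shows why the slide says parsing needs both a grammar and a parser.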

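Syntactic Ambiguity: We fed her dog bones
Reading 1 (her dog was fed bones): [S [NP We] [VP [V fed] [NP [Adj her] [noun dog]] [NP [noun bones]]]]
Reading 2 (she was fed dog bones): [S [NP We] [VP [V fed] [NP [noun her]] [NP [Adj dog] [noun bones]]]]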
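The ambiguity comes from the two ways of dividing "her dog bones" into two noun phrases. A minimal sketch that enumerates them is given below, assuming a tiny grammar (NP -> noun, NP -> Adj noun) and a lexicon in which "her" and "dog" may each act as Adj or noun, following the labels on the slide. The function names are hypothetical.

```python
# Each word maps to the set of categories it may take (labels as on the slide).
LEXICON = {"her": {"Adj", "noun"}, "dog": {"Adj", "noun"}, "bones": {"noun"}}


def np_parses(words):
    """Bracketings of `words` as one NP, using NP -> noun | Adj noun."""
    if len(words) == 1 and "noun" in LEXICON[words[0]]:
        return [f"[NP [noun {words[0]}]]"]
    if (len(words) == 2 and "Adj" in LEXICON[words[0]]
            and "noun" in LEXICON[words[1]]):
        return [f"[NP [Adj {words[0]}] [noun {words[1]}]]"]
    return []


def double_object_parses(objects):
    """All ways to split the words after the verb into two adjacent NPs."""
    parses = []
    for split in range(1, len(objects)):
        for first in np_parses(objects[:split]):
            for second in np_parses(objects[split:]):
                parses.append(f"{first} {second}")
    return parses


for analysis in double_object_parses(["her", "dog", "bones"]):
    print(analysis)
```

The sketch yields exactly two analyses, one per split point, matching the two trees on the slide.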

Semantics
- Determining possible meanings of a sentence
- Interactions among words affect lexico-semantic interpretation
- Capturing the meaning of a sentence in a knowledge representation formalism

Semantic Role Labeling (SRL) Problem
- In a sentence, a verb and its semantic roles form a proposition; the verb can be called the predicate and the roles are known as arguments.
- Given a target verb, the Semantic Role Labeling task is to identify and label each semantic role present in the sentence.
Example sentence: When Disney offered to pay Mr. Steinberg a premium for his shares, the New York investor didn't demand the company also pay a premium to other shareholders.
Example roles for the verb pay, using roles more specific than theta roles:
When [payer Disney] offered to [V pay] [recipient Mr. Steinberg] [money a premium] for [commodity his shares], the New York investor ...
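The bracketed annotation above is the output an SRL system is expected to produce. As a small sketch of how such output might be consumed, the snippet below reads the bracket notation back into (label, span) pairs. It only parses an existing annotation and does not perform role labeling itself; the `extract_roles` helper is an assumption for illustration.

```python
import re

# The labeled portion of the example, in the slide's bracket notation.
ANNOTATED = (
    "When [payer Disney] offered to [V pay] [recipient Mr. Steinberg] "
    "[money a premium] for [commodity his shares]"
)


def extract_roles(annotated):
    """Return (label, span) pairs for every '[label span text]' bracket."""
    # The first token inside a bracket is the role label; the rest is the span.
    return re.findall(r"\[\s*(\S+)\s+([^\]]+)\]", annotated)


for label, span in extract_roles(ANNOTATED):
    print(f"{label}: {span}")
```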

Semantic Relation Extraction
Example text: Coca-Cola Enterprises, Inc. said its Atlanta Coca-Cola Bottling Co. unit and its CEO, John Smith, is a target of an investigation into alleged antitrust violations in the soft-drink industry by a federal grand jury in Atlanta.
Extracted Relations:
- Owns (Coca-Cola Enterprises, Inc.; Coca-Cola Bottling Co.)
- Employs (Coca-Cola Enterprises, Inc.; John Smith)
- Location (Coca-Cola Bottling Co.; Atlanta)
- Location (federal grand jury; Atlanta)
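Extracted relations are commonly stored as (relation, argument 1, argument 2) triples so they can be queried later. The sketch below stores the four relations listed on the slide by hand and looks them up; it does not extract anything from text, and the `Relation` type and `find` helper are hypothetical names.

```python
from collections import namedtuple

Relation = namedtuple("Relation", ["name", "arg1", "arg2"])

# The four relations listed on the slide, entered by hand.
RELATIONS = [
    Relation("Owns", "Coca-Cola Enterprises, Inc.", "Coca-Cola Bottling Co."),
    Relation("Employs", "Coca-Cola Enterprises, Inc.", "John Smith"),
    Relation("Location", "Coca-Cola Bottling Co.", "Atlanta"),
    Relation("Location", "federal grand jury", "Atlanta"),
]


def find(name=None, arg1=None, arg2=None):
    """Return relations matching every field that is not None."""
    return [
        r for r in RELATIONS
        if (name is None or r.name == name)
        and (arg1 is None or r.arg1 == arg1)
        and (arg2 is None or r.arg2 == arg2)
    ]


# Which entities are located in Atlanta?
print([r.arg1 for r in find(name="Location", arg2="Atlanta")])
```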

Discourse
- Determining meaning in texts longer than a sentence
- Making connections between component sentences
- Multi-sentence texts are not just concatenated sentences to be interpreted singly
- Documents may have distinct patterns in different sections: introduction, conclusions, methodology, etc.
- Text in dialogs has distinct forms according to position in the dialog
- Interpretation of later-mentioned entities depends on interpretation of earlier-mentioned entities (anaphora)

Why Pragmatic Knowledge is Needed
Anaphora (coreference) resolution
Excerpt from story by Farhad Manjoo of Slate: Siri vs. Google
  Google Voice Search isn't close to realizing that vision, but it's not impossibly far off either. Huffman points out that Google's app can already hold very small conversations. It understands pronouns, so if you ask, "Who is Barack Obama?" and then ask, "Who is his wife?", it knows that "his" refers to Obama. And most important, it gives you the correct answer. I just tried the same set of queries with Siri. First, she correctly identified the president. But when I asked, "Who is his wife?" she shot back, "What is your wife's name?" That's not what I asked. Actually, it's really, really far off. And there aren't any signs that Apple's voice assistant is going to get much closer any time soon.

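Why Pragmatic Knowledge is Needed
Anaphora (coreference) resolution
- The city councilors refused the demonstrators a permit because they feared violence.
- The city councilors refused the demonstrators a permit because they advocated revolution.
The two sentences differ only in the final verb phrase, yet "they" most plausibly refers to the councilors in the first and to the demonstrators in the second. Only world knowledge separates the two readings.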

Pragmatics
- The purposeful use of language in situations
- A functional perspective
- Those aspects of language which require context for understanding
- Goal is to explain how extra meaning is read into texts without actually being encoded in them
- Requires much world knowledge
- Understanding of intentions / plans / goals

Pragmatics
Sketch of a commonsense task plan to take a trip
[Figure: plan diagram with the actions TAKE-TRIP; BUY-TICKET, GOTO-TRAIN, GETON-TRAIN; GOTO-TICKETBOOTH, GIVE MONEY, RECEIVE-TICKET]
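A task plan like this one can be represented as a mapping from each action to its sub-steps and expanded into a flat sequence of primitive actions. The decomposition used below (TAKE-TRIP into BUY-TICKET, GOTO-TRAIN, GETON-TRAIN, and BUY-TICKET into the three ticket-booth steps) is an assumption read off the order of the labels; the slide's diagram itself did not survive. The `expand` function is a hypothetical helper.

```python
# Assumed decomposition of the plan: action -> ordered sub-steps.
PLAN = {
    "TAKE-TRIP": ["BUY-TICKET", "GOTO-TRAIN", "GETON-TRAIN"],
    "BUY-TICKET": ["GOTO-TICKETBOOTH", "GIVE MONEY", "RECEIVE-TICKET"],
}


def expand(action):
    """Recursively expand an action into its primitive steps, in order."""
    steps = PLAN.get(action)
    if steps is None:  # no decomposition recorded, so the action is primitive
        return [action]
    expanded = []
    for step in steps:
        expanded.extend(expand(step))
    return expanded


print(expand("TAKE-TRIP"))
```

A pragmatic component could use such a plan to infer unstated steps: a text that mentions only receiving a ticket and boarding a train is still consistent with the full sequence.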

Techniques for NLP Analysis
- Corpus statistics
  - Frequencies of words
  - Frequencies of word pairs, using co-occurrence or semantic measures
- Classification or other machine learning
  - Use NLP to produce features, also known as attributes, of the text
  - Classify the text according to a set of labels
    - Classify customer reviews as positive or negative
    - Classify news articles according to topic
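Both techniques can be sketched in a few lines. The first two functions count words and adjacent word pairs, the simplest form of co-occurrence. The last one turns a review into word-count features and applies a hand-written cue-word rule, which stands in here for a trained classifier. The cue-word sets, the sample reviews and all function names are invented for illustration.

```python
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


def word_frequencies(texts):
    """Count how often each word occurs across all texts."""
    return Counter(word for text in texts for word in tokenize(text))


def pair_frequencies(texts):
    """Count adjacent word pairs within each text."""
    counts = Counter()
    for text in texts:
        words = tokenize(text)
        counts.update(zip(words, words[1:]))
    return counts


# Hypothetical cue words; a real system would learn weights from labeled data.
POSITIVE = {"great", "good", "love"}
NEGATIVE = {"bad", "poor", "broken"}


def classify_review(text):
    """Label a review from word-count features; ties count as positive."""
    features = Counter(tokenize(text))
    score = (sum(features[w] for w in POSITIVE)
             - sum(features[w] for w in NEGATIVE))
    return "positive" if score >= 0 else "negative"


reviews = ["great phone great battery", "bad screen and poor battery"]
print(word_frequencies(reviews).most_common(2))
print(pair_frequencies(reviews)[("great", "phone")])
print([classify_review(r) for r in reviews])
```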