Natural Language Processing

Natural Language Processing: Part-of-Speech Tagging

Joakim Nivre
Uppsala University, Department of Linguistics and Philology
joakim.nivre@lingfil.uu.se

Parts of Speech

- Basic grammatical categories used since antiquity:
  1. Noun
  2. Verb
  3. Adjective
  4. Adverb
  5. Preposition
  6. Pronoun
  7. Conjunction
  8. Interjection
- Lots of debate in linguistics about their nature and universality
- Nevertheless very robust and useful for NLP

Part-of-Speech Tagging

- Assign a part-of-speech tag to every word of a sentence:

  Word    Tag
  ------  -----
  Holmes  PROPN
  put     VERB
  the     DET
  keys    NOUN
  on      ADP
  the     DET
  table   NOUN
  .       PUNCT
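For a quick hands-on illustration (not part of the original slides), the same sentence can be run through an off-the-shelf tagger. A minimal sketch assuming NLTK is installed; note that NLTK's default model produces Penn Treebank tags, with an optional mapping to a coarser universal tag set (resource names may vary across NLTK versions):

```python
# Minimal sketch using NLTK's off-the-shelf tagger (assumes nltk is installed).
import nltk

# One-time model downloads (names as in classic NLTK releases).
nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)
nltk.download("universal_tagset", quiet=True)

tokens = nltk.word_tokenize("Holmes put the keys on the table.")

# Penn Treebank tags, e.g. [('Holmes', 'NNP'), ('put', 'VBD'), ...]
print(nltk.pos_tag(tokens))

# Coarser universal tags, closer to the tag set used in these slides.
print(nltk.pos_tag(tokens, tagset="universal"))
```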

Why is PoS tagging useful?

- First step in a vast number of practical tasks:
  1. Text-to-speech: how to pronounce "lead" or "insult"?
  2. Parsing: need to know if a word is a NOUN or a VERB
  3. Information extraction: finding names, relations, etc.
- Used as a backoff for word tokens (sparse data)

Why is PoS tagging hard?

- Lexical ambiguity:
  1. Prince is expected to race/VERB tomorrow
  2. People wonder about the race/NOUN for outer space
- Unknown words:
  1. The rural Babbitt who bloviates about progress and growth

How is it done?

- Lexical information (the word itself):
  - Known words can be looked up in a lexicon listing the possible tags for each word
  - Unknown words can be analyzed with respect to affixes, capitalization, special symbols, etc.
- Contextual information (surrounding words):
  - A language model can rank tags in context
- Many different models and techniques (more later)
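As a rough sketch of the lexical side, here is a toy candidate-tag function: lexicon lookup for known words, affix and shape heuristics for unknown ones. The lexicon entries and suffix rules are invented for illustration, not a real resource:

```python
# Toy sketch: candidate tags from a lexicon, with heuristics for unknown words.
# The lexicon and the suffix rules below are illustrative only.
LEXICON = {
    "the": ["DET"],
    "keys": ["NOUN"],
    "race": ["NOUN", "VERB"],  # ambiguous: context must decide
}

def possible_tags(word: str) -> list[str]:
    entry = LEXICON.get(word.lower())
    if entry is not None:          # known word: look it up
        return entry
    if word[:1].isupper():         # capitalization suggests a proper noun
        return ["PROPN"]
    if word.endswith("ing"):       # crude affix heuristics
        return ["VERB", "NOUN"]
    if word.endswith("ly"):
        return ["ADV"]
    return ["NOUN"]                # fallback for everything else

print(possible_tags("race"))      # ['NOUN', 'VERB']
print(possible_tags("Babbitt"))   # ['PROPN']
```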

Quiz 1

- Consider the following incomplete sentence:
  "She sent a letter to the ..."
- Which parts of speech are likely to occur next?
  1. ADJ
  2. NOUN
  3. VERB

Tag Sets

- There are many potential distinctions we can draw
- Tag sets range from coarse-grained to fine-grained:
  1. Universal Dependencies: 17 tags
  2. Penn Treebank (English): 45 tags
  3. SUC (Swedish): 25 tags, about 150 tags with features
- The choice of tag set may depend on the application

Universal Dependencies (UD)

  Open class words     Closed class words               Other
  ADJ    adjective     ADP    preposition/postposition  PUNCT  punctuation
  ADV    adverb        AUX    auxiliary verb            SYM    symbol
  INTJ   interjection  CONJ   coordinating conjunction  X      unspecified
  NOUN   noun          DET    determiner
  PROPN  proper noun   NUM    numeral
  VERB   verb          PART   particle
                       PRON   pronoun
                       SCONJ  subordinating conjunction

Penn Treebank

- (The original slide shows the full 45-tag Penn Treebank tag set as a table.)

How hard is PoS tagging?

- (The original slide presents this as a figure.)

Evaluation

- Evaluation against a manually annotated gold standard
- Evaluation metrics:
  - Accuracy = percentage of correctly tagged tokens
  - Separate results for ambiguous and/or unknown words
- State of the art:
  - 96-98% for English news text
  - What about Turkish? What about Twitter?
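Token-level accuracy is simple to compute once predicted and gold tags are aligned; a minimal sketch (the tag sequences below are invented, with one simulated error on the last noun):

```python
# Minimal sketch: token-level tagging accuracy against a gold standard.
def accuracy(predicted: list[str], gold: list[str]) -> float:
    """Fraction of tokens whose predicted tag matches the gold tag."""
    assert len(predicted) == len(gold), "sequences must be aligned"
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)

# "Holmes put the keys on the table ." with one error: table tagged VERB.
pred = ["PROPN", "VERB", "DET", "NOUN", "ADP", "DET", "VERB", "PUNCT"]
gold = ["PROPN", "VERB", "DET", "NOUN", "ADP", "DET", "NOUN", "PUNCT"]
print(accuracy(pred, gold))  # 0.875
```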

Quiz 2

- Consider the following tagging:
  She/PRON won/VERB the/DET race/VERB
- What accuracy score would you give it?
  1. 100%
  2. 75%
  3. 50%

Natural Language Processing: Tagging Methods

Joakim Nivre
Uppsala University, Department of Linguistics and Philology
joakim.nivre@lingfil.uu.se

Part-of-Speech Tagging

- Task:
  - Assign a part-of-speech tag to every word of a sentence
- Useful tools and techniques:
  - A lexicon mapping words to possible tags
  - Linguistic rules for disambiguation in context
  - Statistical models of tags and words in context
  - Heuristics for handling unknown words
- In this lecture:
  - Transformation-based tagging: a rule-based approach
  - HMM tagging: a statistical approach

Transformation-Based Tagging

- Assign each word its most frequent tag:

  Prince is expected to race/NOUN tomorrow
  People wonder about the race/NOUN for outer space
  Prince is expected to run/VERB tomorrow
  People wonder about the run/VERB for charity

- Use a sequence of rules to refine the tagging:
  1. NOUN → VERB if the preceding word is "to"
  2. VERB → NOUN if the preceding word is "the"

- After applying the rules:

  Prince is expected to race/VERB tomorrow
  People wonder about the race/NOUN for outer space
  Prince is expected to run/VERB tomorrow
  People wonder about the run/NOUN for charity

Transformation-Based Tagging

- Learning a sequence of rules from a tagged corpus:
  1. Define a set of rule templates
  2. Assign every word its most frequent tag
  3. Repeat until no further improvement:
     3.1 Apply every rule to the current tagged corpus by itself
     3.2 Add the best rule R to the sequence of rules
     3.3 Transform the current tagged corpus using R
- Using the rules to tag a new text (sketched in code below):
  1. Assign every word its most frequent tag
  2. For every rule R_1, ..., R_n in the learned sequence:
     2.1 Transform the current tagged text using R_i
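A minimal sketch of the tagging phase in the style of Brill's tagger. The rule representation, the most-frequent-tag dictionary, and the example rule are my own illustration, not code from the slides:

```python
# Minimal sketch of Brill-style rule application (tagging phase only).
from dataclasses import dataclass

@dataclass
class Rule:
    from_tag: str   # tag to rewrite
    to_tag: str     # replacement tag
    prev_word: str  # trigger: identity of the preceding word

def tag_with_rules(words, most_frequent_tag, rules):
    """Assign each word its most frequent tag, then apply the
    learned rules in sequence, each transforming the tagging."""
    tags = [most_frequent_tag.get(w, "NOUN") for w in words]
    for rule in rules:                        # rules apply in learned order
        for i in range(1, len(words)):
            if tags[i] == rule.from_tag and words[i - 1] == rule.prev_word:
                tags[i] = rule.to_tag
    return tags

# Toy most-frequent-tag dictionary (invented counts) and rule 1 from the slide.
most_frequent = {"Prince": "PROPN", "is": "AUX", "expected": "VERB",
                 "to": "PART", "race": "NOUN", "tomorrow": "ADV"}
rules = [Rule("NOUN", "VERB", "to")]

words = "Prince is expected to race tomorrow".split()
print(tag_with_rules(words, most_frequent, rules))
# ['PROPN', 'AUX', 'VERB', 'PART', 'VERB', 'ADV']
```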

Quiz 1

- Consider the following initial taggings of the word "light":
  1. ... light/VERB the/DET candle/NOUN ...
  2. ... see/VERB the/DET light/VERB ...
  3. ... carry/VERB the/DET light/VERB suitcase/NOUN ...
- Suppose we apply the following two rules in sequence:
  1. VERB → NOUN if the preceding word is DET
  2. NOUN → ADJ if the preceding word is DET and the following word is NOUN
- Which of the following statements are true?
  1. All three occurrences are correctly tagged in the end
  2. There is at least one error in the end tagging
  3. Removing the second rule gives one more error in the end
  4. Switching the order of the rules has no impact on the end result

Statistical Tagging

- Basic ideas:
  - Build a statistical model of words and their tags
  - Estimate model parameters from (tagged) corpus data
  - Use the model to assign the most probable tags to words
- Example:
  - Part-of-speech tagging using Hidden Markov Models (HMMs)

Hidden Markov Models

- Markov models are probabilistic sequence models used for problems such as:
  1. Speech recognition
  2. Spell checking
  3. Part-of-speech tagging
  4. Named entity recognition
- Given a word sequence w_1, ..., w_n, we want to find the most probable tag sequence t_1, ..., t_n:

  argmax_{t_1, ..., t_n} P(t_1, ..., t_n | w_1, ..., w_n)

Model Construction

- Bayesian inversion:

  P(t_1, ..., t_n | w_1, ..., w_n) = P(t_1, ..., t_n) · P(w_1, ..., w_n | t_1, ..., t_n) / P(w_1, ..., w_n)

- Submodels:
  1. Prior: P(t_1, ..., t_n)
  2. Likelihood: P(w_1, ..., w_n | t_1, ..., t_n)
  3. Marginal: P(w_1, ..., w_n), which can be ignored in the argmax search

Markov Assumptions

- Context model (prior):

  P(t_1, ..., t_n) = ∏_{i=1}^{n} P(t_i | t_{i-k}, ..., t_{i-1})

- Lexical model (likelihood):

  P(w_1, ..., w_n | t_1, ..., t_n) = ∏_{i=1}^{n} P(w_i | t_i)

Model Parameters

- Contextual probabilities: P(t_i | t_{i-k}, ..., t_{i-1})
- Lexical probabilities: P(w_i | t_i)
- We can estimate these probabilities from a tagged corpus (sketched in code below):

  P̂_MLE(w_i | t_i) = f(w_i, t_i) / f(t_i)

  P̂_MLE(t_i | t_{i-k}, ..., t_{i-1}) = f(t_{i-k}, ..., t_{i-1}, t_i) / f(t_{i-k}, ..., t_{i-1})
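A minimal sketch of these relative-frequency estimates for the bigram case (k = 1). The two-sentence tagged corpus is a toy example of my own:

```python
# Minimal sketch: MLE estimation of bigram HMM parameters (k = 1).
from collections import Counter

corpus = [  # toy tagged corpus: sentences of (word, tag) pairs
    [("she", "PRON"), ("can", "AUX"), ("run", "VERB")],
    [("she", "PRON"), ("can", "AUX"), ("swim", "VERB")],
]

word_tag = Counter()    # f(w_i, t_i)
tag = Counter()         # f(t_i)
tag_bigram = Counter()  # f(t_{i-1}, t_i)

for sentence in corpus:
    tag["START"] += 1   # START is conditioned on once per sentence
    prev = "START"
    for word, t in sentence:
        word_tag[(word, t)] += 1
        tag[t] += 1
        tag_bigram[(prev, t)] += 1
        prev = t

def p_lex(word, t):
    """P̂_MLE(w | t) = f(w, t) / f(t)"""
    return word_tag[(word, t)] / tag[t]

def p_ctx(prev, t):
    """P̂_MLE(t | prev) = f(prev, t) / f(prev)"""
    return tag_bigram[(prev, t)] / tag[prev]

print(p_lex("can", "AUX"))     # 1.0 (both AUX tokens are 'can')
print(p_ctx("START", "PRON"))  # 1.0 (every sentence starts with PRON)
print(p_lex("run", "VERB"))    # 0.5
```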

Computing Probabilities

- The probability of a tagging:

  P(t_1, ..., t_n, w_1, ..., w_n) = ∏_{i=1}^{n} P(t_i | t_{i-k}, ..., t_{i-1}) · P(w_i | t_i)

- Finding the most probable tagging:

  argmax_{t_1, ..., t_n} ∏_{i=1}^{n} P(t_i | t_{i-k}, ..., t_{i-1}) · P(w_i | t_i)

- This requires an efficient algorithm (more later)
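Scoring a single candidate tagging needs no clever search; a minimal sketch for the bigram case, reused in the example below (the probability tables are passed in as plain dictionaries):

```python
# Minimal sketch: joint probability of one candidate tagging under a
# bigram HMM, given lexical and contextual probability tables.
def score(words, tags, p_lex, p_ctx):
    """P(t_1..t_n, w_1..w_n) = prod_i P(t_i | t_{i-1}) * P(w_i | t_i),
    with t_0 = START. Unseen events get probability 0."""
    prob = 1.0
    prev = "START"
    for word, tag in zip(words, tags):
        prob *= p_ctx.get((prev, tag), 0.0) * p_lex.get((word, tag), 0.0)
        prev = tag
    return prob
```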

Example

  Lexical probabilities     Contextual probabilities
  P(she | PRON) = 0.1       P(PRON | START) = 0.5
  P(can | AUX)  = 0.2       P(AUX | PRON)   = 0.2
  P(can | NOUN) = 0.001     P(NOUN | PRON)  = 0.001
  P(run | VERB) = 0.01      P(VERB | AUX)   = 0.5
  P(run | NOUN) = 0.001     P(NOUN | AUX)   = 0.001
                            P(VERB | NOUN)  = 0.2
                            P(NOUN | NOUN)  = 0.1

  P(she/PRON can/AUX run/VERB)
    = 0.5 · 0.1 · 0.2 · 0.2 · 0.5 · 0.01 = 0.00001

  P(she/PRON can/NOUN run/NOUN)
    = 0.5 · 0.1 · 0.001 · 0.001 · 0.1 · 0.001 = 5 · 10^-12
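Plugging these tables into the score() sketch above reproduces both numbers (variable names are my own):

```python
# Reproducing the slide's example with the score() sketch above.
p_lex = {("she", "PRON"): 0.1, ("can", "AUX"): 0.2, ("can", "NOUN"): 0.001,
         ("run", "VERB"): 0.01, ("run", "NOUN"): 0.001}
p_ctx = {("START", "PRON"): 0.5, ("PRON", "AUX"): 0.2, ("PRON", "NOUN"): 0.001,
         ("AUX", "VERB"): 0.5, ("AUX", "NOUN"): 0.001,
         ("NOUN", "VERB"): 0.2, ("NOUN", "NOUN"): 0.1}

words = ["she", "can", "run"]
print(score(words, ["PRON", "AUX", "VERB"], p_lex, p_ctx))   # ≈ 1e-05
print(score(words, ["PRON", "NOUN", "NOUN"], p_lex, p_ctx))  # ≈ 5e-12
```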

Fundamental Problems

- Decoding:
  - How do we compute the best tag sequence, given the parameters?
- Learning:
  - How do we estimate the parameters?

Quiz 2

- Consider this simple HMM for tagging:

  P(she | PRON) = 0.1       P(PRON | START) = 0.5
  P(can | AUX)  = 0.2       P(AUX | PRON)   = 0.2
  P(can | NOUN) = 0.001     P(NOUN | PRON)  = 0.001
  P(run | VERB) = 0.01      P(VERB | AUX)   = 0.5
  P(run | NOUN) = 0.001     P(NOUN | AUX)   = 0.001
                            P(VERB | NOUN)  = 0.2
                            P(NOUN | NOUN)  = 0.1

- Which of the following statements are true?
  1. The probability that "can" is a NOUN is 0.001.
  2. The probability that the word after an AUX is not a VERB is 0.5.
  3. P(she/PRON can/AUX) > P(she/PRON can/NOUN)