Part-of-speech tagging. Yuguang Zhang CS 886: Topics in Natural Language Processing University of Waterloo Spring 2015



Parts of Speech
Perhaps starting with Aristotle in the West (384-322 BCE), there was the idea of having parts of speech, a.k.a. lexical categories, word classes, tags, POS.
From Dionysius Thrax of Alexandria (c. 100 BCE) comes the idea, still with us, that there are 8 parts of speech.
But his 8 aren't exactly the ones we are taught today:
  Thrax: noun, verb, article, adverb, preposition, conjunction, participle, pronoun
  School grammar: noun, verb, adjective, adverb, preposition, conjunction, pronoun, interjection

Open class (lexical) words
  Nouns
    Proper: IBM, Italy
    Common: cat / cats, snow
  Verbs
    Main: see, registered
  Adjectives: old, older, oldest
  Adverbs: slowly
  Numbers: 122,312, one
  ... more
Closed class (functional) words
  Modals: can, had
  Determiners: the, some
  Prepositions: to, with
  Conjunctions: and, or
  Particles: off, up
  Pronouns: he, its
  Interjections: Ow, Eh
  ... more

Open vs. Closed classes
Closed:
  determiners: a, an, the
  pronouns: she, he, I
  prepositions: on, under, over, near, by, ...
  Why closed?
Open:
  Nouns, Verbs, Adjectives, Adverbs.

POS Tagging
Words often have more than one POS, e.g. "back":
  The back door = JJ
  On my back = NN
  Win the voters back = RB
  Promised to back the bill = VB
The POS tagging problem is to determine the POS tag for a particular instance of a word.

POS Tagging
Input:     Plays well with others
Ambiguity: Plays: NNS/VBZ; well: UH/JJ/NN/RB; with: IN; others: NNS
Output:    Plays/VBZ well/RB with/IN others/NNS (Penn Treebank POS tags)
Uses:
  Text-to-speech (how do we pronounce "lead"?)
  Can write regexps like (Det) Adj* N+ over the output for phrases, etc.
  As input to, or to speed up, a full parser
  If you know the tag, you can back off to it in other tasks
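The regexp use case above can be sketched in a few lines. This is only an illustration, assuming the tagger output is a list of (word, tag) pairs; the one-letter encoding and the function name `noun_phrase_chunks` are hypothetical choices made here, not something from the slides.

```python
import re

def noun_phrase_chunks(tagged):
    """Find (Det) Adj* N+ spans in a list of (word, tag) pairs."""
    # Map each Penn Treebank tag to one letter so that a plain regexp
    # can be run over the tag sequence as a string.
    def code(tag):
        if tag == "DT":
            return "D"
        if tag.startswith("JJ"):
            return "A"
        if tag.startswith("NN"):
            return "N"
        return "x"  # any other tag

    codes = "".join(code(tag) for _, tag in tagged)
    # Each match covers positions that line up one-to-one with the tokens.
    return [
        [word for word, _ in tagged[m.start():m.end()]]
        for m in re.finditer(r"D?A*N+", codes)
    ]

sentence = [("The", "DT"), ("back", "JJ"), ("door", "NN"),
            ("opened", "VBD"), ("with", "IN"), ("others", "NNS")]
chunks = noun_phrase_chunks(sentence)
print(chunks)
```

Encoding tags as single characters keeps match offsets aligned with token positions, which is why the spans can be sliced straight out of the token list.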

POS tagging performance
How many tags are correct? (Tag accuracy)
  About 97% currently
  But the baseline is already 90%
Baseline is the performance of the stupidest possible method:
  Tag every word with its most frequent tag
  Tag unknown words as nouns
Partly easy because many words are unambiguous:
  You get points for them (the, a, etc.) and for punctuation marks!
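A minimal sketch of the baseline just described: tag each known word with its most frequent training tag and each unknown word as a noun. The toy training data and the function names are made up for illustration.

```python
from collections import Counter, defaultdict

def train_baseline(tagged_sentences):
    """Map each word to the tag it carries most often in the training data."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word][tag] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}

def tag_baseline(model, words, unknown_tag="NN"):
    """Tag known words with their most frequent tag, unknown words as nouns."""
    return [model.get(word, unknown_tag) for word in words]

def tag_accuracy(predicted, gold):
    """Fraction of tokens whose predicted tag equals the gold tag."""
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)

# Toy training data (illustrative only).
train = [
    [("the", "DT"), ("back", "NN")],
    [("the", "DT"), ("back", "NN"), ("door", "NN")],
    [("back", "VB"), ("the", "DT"), ("bill", "NN")],
]
model = train_baseline(train)
predicted = tag_baseline(model, ["back", "the", "voters"])
acc = tag_accuracy(predicted, ["VB", "DT", "NNS"])
print(predicted, acc)
```

On real corpora this baseline does well largely because frequent unambiguous words and punctuation dominate the token count; the tiny example here is far too small to show that.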

Deciding on the correct part of speech can be difficult even for people
  Mrs/NNP Shaefer/NNP never/RB got/VBD around/RP to/TO joining/VBG
  All/DT we/PRP gotta/VBN do/VB is/VBZ go/VB around/IN the/DT corner/NN
  Chateau/NNP Petrus/NNP costs/VBZ around/RB 250/CD

How difficult is POS tagging?
About 11% of the word types in the Brown corpus are ambiguous with regard to part of speech.
But they tend to be very common words, so 40% of the word tokens are ambiguous.
E.g., "that":
  I know that he is honest = IN (Preposition)
  Yes, that play was nice = DT (Determiner)
  You can't go that far = RB (Adverb)
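The gap between the type-level and token-level figures comes from how the two rates are counted. The sketch below computes both on a made-up handful of tagged tokens; the function name and the data are illustrative and have nothing to do with the Brown corpus numbers above.

```python
from collections import defaultdict

def ambiguity_rates(tagged_tokens):
    """Return (share of ambiguous word types, share of ambiguous word tokens)."""
    tags_seen = defaultdict(set)
    for word, tag in tagged_tokens:
        tags_seen[word].add(tag)
    # A type is ambiguous if it occurs with more than one tag.
    ambiguous = {word for word, tags in tags_seen.items() if len(tags) > 1}
    type_rate = len(ambiguous) / len(tags_seen)
    token_rate = sum(word in ambiguous for word, _ in tagged_tokens) / len(tagged_tokens)
    return type_rate, token_rate

tokens = [("I", "PRP"), ("know", "VBP"), ("that", "IN"), ("he", "PRP"),
          ("is", "VBZ"), ("honest", "JJ"), ("that", "DT"), ("play", "NN"),
          ("was", "VBD"), ("nice", "JJ")]
type_rate, token_rate = ambiguity_rates(tokens)
print(type_rate, token_rate)
```

Here one type out of nine is ambiguous, but it accounts for two of the ten tokens, so the token rate is higher than the type rate.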

A Maximum Entropy Model for POS Tagging Adwait Ratnaparkhi

Sources of information
Large annotated corpora for learning probability distributions
  "man" is rarely used as a verb.
Word context
  Bill   saw     that   man   yesterday
  NNP    NN      DT     NN    NN
  VB     VB(D)   IN     VB    NN

Probability model
$$p(h, t) = \pi \mu \prod_{j=1}^{k} \alpha_j^{f_j(h, t)}$$
  h: history
  t: tag
  f_j: features
  $\mu, \alpha_j$: model parameters
  $h_i = \{w_i, w_{i+1}, w_{i+2}, w_{i-1}, w_{i-2}, t_{i-1}, t_{i-2}\}$
$p(h, t)$ is determined by the $\alpha_j$ such that $f_j(h, t) = 1$.
$\{\mu, \alpha_1, \alpha_2, \ldots, \alpha_k\}$ are chosen to maximize the likelihood of the training data.
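As a rough illustration of the product form above, the sketch below multiplies the parameters of the active features and normalizes over the candidate tags to get a tag distribution for one history. The feature names and parameter values are invented for the example, and representing a feature as a (context predicate, tag) pair is an assumption made here.

```python
import math

def joint_score(active_features, alpha, mu=1.0):
    """mu times the product of alpha_j over the features with f_j(h, t) = 1.

    The constant pi is left out because it cancels in the normalization below.
    Features without a parameter contribute a factor of 1.
    """
    return mu * math.prod(alpha.get(f, 1.0) for f in active_features)

def tag_distribution(context, tags, alpha):
    """Normalize the joint scores over the candidate tags for one history."""
    scores = {t: joint_score([(c, t) for c in context], alpha) for t in tags}
    z = sum(scores.values())
    return {t: s / z for t, s in scores.items()}

# Hypothetical parameters: one alpha per (context predicate, tag) feature.
ALPHA = {
    ("w_i=back", "NN"): 2.0,
    ("t_i-1=DT", "NN"): 3.0,
    ("w_i=back", "VB"): 1.5,
    ("t_i-1=DT", "VB"): 0.5,
}
dist = tag_distribution(["w_i=back", "t_i-1=DT"], ["NN", "VB", "JJ"], ALPHA)
print(dist)
```

Only the parameters of active features enter the product, which is what the slide means by p(h, t) being determined by the parameters whose features fire.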

Other uses for the Maxent model
You can use a maxent classifier whenever you want to assign data points to one of a number of classes:
  Sentence boundary detection (Mikheev 2000)
    Is a period end of sentence or abbreviation?
  Sentiment analysis (Pang and Lee 2002)
    Word unigrams, bigrams, POS counts, ...
  Machine translation (Pang and Lee 2002)
  Prepositional phrase attachment (Ratnaparkhi 1998)
    Attach to verb or noun? Features of head noun, preposition, etc.
  Parsing decisions in general (Ratnaparkhi 1997; Johnson et al. 1999, etc.)

An Example
Word:      The   stories   about   well-heeled   communities   and   developers
Tag:       DT    NNS       IN      JJ            NNS           CC    NNS
Position:  1     2         3       4             5             6     7
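To make the history concrete, the sketch below builds h_i = {w_i, w_i+1, w_i+2, w_i-1, w_i-2, t_i-1, t_i-2} for one position of the example sentence. The dictionary keys and the "*" padding symbol for positions outside the sentence are assumptions for illustration.

```python
def history(words, tags, i, pad="*"):
    """Build the history h_i for 0-based position i.

    Only tags to the left of position i are read, matching a left-to-right tagger.
    Positions outside the sentence are filled with the (assumed) pad symbol.
    """
    def word_at(k):
        return words[k] if 0 <= k < len(words) else pad

    def tag_at(k):
        return tags[k] if 0 <= k < len(tags) else pad

    return {
        "w_i": word_at(i),
        "w_i+1": word_at(i + 1),
        "w_i+2": word_at(i + 2),
        "w_i-1": word_at(i - 1),
        "w_i-2": word_at(i - 2),
        "t_i-1": tag_at(i - 1),
        "t_i-2": tag_at(i - 2),
    }

words = ["The", "stories", "about", "well-heeled", "communities", "and", "developers"]
tags = ["DT", "NNS", "IN", "JJ", "NNS", "CC", "NNS"]
h3 = history(words, tags, 2)  # position 3 in the table (1-based), the word "about"
h1 = history(words, tags, 0)  # sentence start: left context is padding
print(h3)
```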

Example - Common Word

Example - Rare Word

Testing the model
Wall St. Journal data
  Training set to train the statistical model
  Development set to tune parameters and decide on the best model
  Test set, distinct from the development set, gives an estimate of the error rate on real data

DataSet      Sentences  Words    Unknown Words
Training     40000      962687   -
Development  8000       192826   6107
Test         5485       133805   3546

Procedure
Test corpus is tagged one sentence at a time
A modified beam search runs through the possible tag sequences for a sentence
The tag sequence with the highest probability is selected
O(NTAB) running time for the search, O(NTA) for parameter estimation
  B: beam size, set to 5
  N: training set size (for parameter estimation); sentence length (for the search)
  T: number of allowable tags
  A: average number of active features for an event (h, t)
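The search step can be sketched as a plain beam search. This is not the modified search from the paper, only the basic idea: keep the B best partial tag sequences after each word. The conditional model `toy_cond_prob` and its tables are invented stand-ins for a trained model.

```python
def beam_search(words, tags, cond_prob, beam_size=5):
    """Return the best (tag sequence, probability) found by beam search."""
    beam = [([], 1.0)]  # partial tag sequences with their probabilities
    for i in range(len(words)):
        candidates = []
        for seq, p in beam:
            for t in tags:
                candidates.append((seq + [t], p * cond_prob(words, seq, i, t)))
        # Keep only the beam_size most probable extensions.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beam = candidates[:beam_size]
    return beam[0]

# Toy conditional model (hypothetical numbers): lexical weight times a
# weight for the previous tag, normalized over the candidate tags.
TAGS = ["DT", "NN", "VB"]
LEX = {"the": {"DT": 1.0}, "back": {"NN": 0.5, "VB": 0.5}}
TRANS = {("DT", "NN"): 0.9, ("DT", "VB"): 0.1}

def toy_cond_prob(words, prev_tags, i, tag):
    prev = prev_tags[-1] if prev_tags else "<s>"
    weights = {t: LEX[words[i]].get(t, 0.0) * TRANS.get((prev, t), 1.0) for t in TAGS}
    return weights[tag] / sum(weights.values())

best_tags, best_p = beam_search(["the", "back"], TAGS, toy_cond_prob, beam_size=5)
print(best_tags, best_p)
```

Each word costs at most B times T model evaluations, which is where the B and T factors in the running time come from.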

Performance summary

Model                            Data set     Total Word Acc. (%)  Unknown Word Acc. (%)  Sentence Acc. (%)
Baseline with Tag Dictionary     Development  96.43                86.23                  47.55
Baseline without Tag Dictionary  Development  96.31                86.28                  47.38
Specialized Model                Test         96.63                85.56                  47.51

Specialized model for problematic words

Overview: POS Tagging Accuracies
Rough accuracies:
  Most freq tag: ~90%
  Trigram HMM: ~95%
  Maxent P(t|w): 96.6%
  TnT (HMM++): 96.2%
  MEMM tagger: 96.9%
  Bidirectional dependencies: 97.2%
  Upper bound: ~98% (human agreement)

Feature-rich part-of-speech tagging with a cyclic dependency network
Toutanova et al.

How to solve this?
Left-to-right factors do not always suffice
  Will/MD go/VB to/TO the/DT store/NN
  The TO tag is most often preceded by a noun, rarely by a modal verb
  Will/{MD, NN} to/TO fight/VB
  $P(t_0 \mid t_{-1})$ does not capture this, but $P(t_{-1} = \mathrm{NN} \mid t_0 = \mathrm{TO})$ does

Bayesian dependency networks
  a) $P(A)\,P(B \mid A)$
  b) $P(A \mid B)\,P(B)$
  c) bidirectional net with models of $P(A \mid B)$ and $P(B \mid A)$

Dependency networks
$p(t, w) = \prod_i P(\,?\,)$, with candidate local models:
  a) $P(t_i \mid t_{i-1}, w_i)$
  b) $P(t_{i-1} \mid t_i, w_i)$
  c) $P(t_i \mid t_{i-1}, t_{i+1}, w_i)$

Inference for linear dependency networks
Modified Viterbi algorithm to find the optimal sequence of tags
  Start from the last tag
  Multiply the best score for the previous tag and the probability of the current tag given the word and surrounding tags
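To see what the bidirectional local model buys, the sketch below scores a whole tag sequence as the product of P(t_i | t_i-1, t_i+1, w_i) over positions and picks the best sequence. Brute-force enumeration is used here in place of the modified Viterbi algorithm, which is only practical for tiny examples, and every number in the tables is invented.

```python
from itertools import product

TAGS = ["MD", "NN", "TO", "VB"]
# Hypothetical lexical and context weights for the "will to fight" example.
LEX = {"will": {"MD": 0.8, "NN": 0.2}, "to": {"TO": 1.0}, "fight": {"VB": 0.7, "NN": 0.3}}
LEFT = {("TO", "VB"): 1.0, ("TO", "NN"): 0.2}    # (previous tag, tag)
RIGHT = {("MD", "TO"): 0.05, ("NN", "TO"): 1.0}  # (tag, next tag)

def local_prob(tag, prev_tag, next_tag, word):
    """Toy P(t_i | t_i-1, t_i+1, w_i): weights normalized over all tags."""
    def weight(t):
        return (LEX[word].get(t, 0.0)
                * LEFT.get((prev_tag, t), 1.0)
                * RIGHT.get((t, next_tag), 1.0))
    return weight(tag) / sum(weight(t) for t in TAGS)

def sequence_score(words, tags):
    """Product of the local conditionals over all positions."""
    padded = ["<s>"] + list(tags) + ["</s>"]
    score = 1.0
    for i, word in enumerate(words):
        score *= local_prob(padded[i + 1], padded[i], padded[i + 2], word)
    return score

words = ["will", "to", "fight"]
best = max(product(TAGS, repeat=len(words)), key=lambda ts: sequence_score(words, ts))
print(best)
```

In this toy setup the lexical weights alone prefer MD for "will", and it is the right-hand TO context that tips the choice to NN.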

Directionality experiments
CMM performance with tags alone gives token accuracies of
  L: 95.79%
  R: 95.14%
  L+R: 96.57%
  LR: 96.55%
  L+LL+LR+RR+R: 96.92% (templates for TAGS in 3W+TAGS)

Lexicalization experiments
Baseline: t_0 with w_0
Three Words: t_0 with w_-1, w_0, w_1

Model     Features  Sentence Accuracy  Token Accuracy  Unknown Accuracy
BASELINE  6,501     1.63%              60.16%          82.98%
3W        239,767   48.27%             96.57%          86.78%
3W+TAGS   263,160   53.83%             97.02%          88.05%
BEST      460,552   55.31%             97.15%          88.61%

Unknown word features
Crude company name detector
  Capitalized words followed within 3 words by Co., Inc., etc.
Minor:
  allcaps
  conjunction of allcaps and digits, e.g. CFC-12
Prefixes and suffixes of length up to 10
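A rough sketch of how the features listed above might be extracted for one token. The feature names, the function name and the exact set of company markers are assumptions; the slide only names "Co., Inc., etc.".

```python
# Assumed marker list: the slide gives "Co., Inc., etc."; the rest is a guess.
COMPANY_MARKERS = {"Co.", "Inc.", "Corp.", "Ltd."}

def unknown_word_features(words, i, max_affix=10):
    """Return a set of feature strings for the word at position i."""
    word = words[i]
    feats = set()
    # Crude company name detector: capitalized word followed within
    # 3 words by a company marker.
    if word[:1].isupper() and any(w in COMPANY_MARKERS for w in words[i + 1:i + 4]):
        feats.add("company")
    # str.isupper() is True when every cased character is uppercase,
    # so digits and hyphens do not prevent a match.
    if word.isupper():
        feats.add("allcaps")
        if any(ch.isdigit() for ch in word):
            feats.add("allcaps+digit")
    # Prefixes and suffixes of length up to max_affix.
    for k in range(1, min(max_affix, len(word)) + 1):
        feats.add("prefix=" + word[:k])
        feats.add("suffix=" + word[-k:])
    return feats

words = ["Acme", "Widget", "Co.", "sold", "CFC-12"]
f_company = unknown_word_features(words, 0)
f_chemical = unknown_word_features(words, 4)
print(sorted(f_chemical))
```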