Part-of-Speech Tagging

CS 341: Natural Language Processing
Prof. Heather Pon-Barry
www.mtholyoke.edu/courses/ponbarry/cs341.html

Announcements
  Lit Review Part 2: written review of 2 articles, due April 1
  Final Project Proposal: due Monday, April 6

Today: POS Tagging
  The process of assigning a part-of-speech marker to each word in a collection

POS Tagging
  She/pronoun found/verb herself/pronoun falling/verb ...

Penn Treebank Tagset

POS Tagging
  Words often have more than one POS, e.g., "back":
    The back door = adjective (JJ)
    On my back = noun (NN)
    Win the voters back = adverb (RB)
    Promised to back the bill = verb (VB)
  The POS tagging problem is to determine the POS tag for a particular instance of a word.

Applications
  Speech synthesis: "I object" vs. "This object..."
  Parsing
  Machine translation
  Named entity recognition
  Word sense disambiguation

POS Tagging Performance
  How many tags are correct? (Tag accuracy)
    State of the art: about 97%
    But the baseline is already 90%
  The baseline performance comes from:
    Tagging every word with its most frequent tag
    Tagging unknown words as nouns
  Partly easy because
    Many words are unambiguous
    You get points for them (the, a, etc.) and for punctuation marks!
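The most-frequent-tag baseline and the tag accuracy measure described above can be sketched in a few lines. This is a minimal illustration, not the course's implementation: the function names and the toy training sentences are made up, and a real baseline would be trained on a tagged corpus such as the Penn Treebank.

```python
from collections import Counter, defaultdict

def train_baseline(tagged_sentences):
    """Map each word to the tag it carries most often in the training data."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}

def tag_baseline(words, most_frequent, unknown_tag="NN"):
    """Known words get their most frequent tag; unknown words are tagged as nouns."""
    return [most_frequent.get(w.lower(), unknown_tag) for w in words]

def tag_accuracy(predicted, gold):
    """Fraction of tokens whose predicted tag matches the gold tag."""
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)

# Toy training data (hypothetical, for illustration only).
train = [
    [("the", "DT"), ("back", "JJ"), ("door", "NN")],
    [("on", "IN"), ("my", "PRP$"), ("back", "NN")],
    [("hurt", "VBD"), ("my", "PRP$"), ("back", "NN")],
    [("back", "VB"), ("the", "DT"), ("bill", "NN")],
]
model = train_baseline(train)

words = ["back", "the", "bill", "now"]
gold = ["VB", "DT", "NN", "RB"]
predicted = tag_baseline(words, model)
print(predicted, tag_accuracy(predicted, gold))
```

On this toy input the baseline gets "the" and "bill" right, mis-tags the verb "back" as its more frequent noun reading, and mis-tags the unseen word "now" as a noun, which is the kind of error the later models aim to fix.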

How difficult is POS Tagging?
  In the Brown corpus:
    ~11% of the word types are ambiguous with regard to part of speech
    ~40% of the word tokens are ambiguous
  But they tend to be very common words, e.g., "that":
    I know that he is honest = preposition or subordinating conjunction (IN)
    Yes, that play was nice = determiner (DT)
    You can't go that far = adverb (RB)

Automatic POS Tagging
  Symbolic
    Rule-based
    Transformation-based
  Probabilistic
    Hidden Markov models
    Log-linear models
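The difference between type ambiguity and token ambiguity quoted for the Brown corpus can be made concrete with a small counting sketch. The function name and the ten-token sample are assumptions for illustration; the sample does not reproduce the Brown corpus figures, and lowercasing words before counting is a design choice, not something the slides specify.

```python
from collections import defaultdict

def ambiguity_stats(tagged_tokens):
    """Return (share of ambiguous word types, share of ambiguous word tokens)."""
    tags_seen = defaultdict(set)
    for word, tag in tagged_tokens:
        tags_seen[word.lower()].add(tag)
    # A type is ambiguous if it occurs with more than one tag.
    ambiguous = {w for w, tags in tags_seen.items() if len(tags) > 1}
    type_share = len(ambiguous) / len(tags_seen)
    token_share = sum(w.lower() in ambiguous for w, _ in tagged_tokens) / len(tagged_tokens)
    return type_share, token_share

# Hypothetical sample built from two of the "that" examples.
tokens = [
    ("I", "PRP"), ("know", "VBP"), ("that", "IN"), ("he", "PRP"),
    ("is", "VBZ"), ("honest", "JJ"),
    ("that", "DT"), ("play", "NN"), ("was", "VBD"), ("nice", "JJ"),
]
type_share, token_share = ambiguity_stats(tokens)
print(f"ambiguous types: {type_share:.0%}, ambiguous tokens: {token_share:.0%}")
```

Only one of the nine word types here is ambiguous, but it accounts for two of the ten tokens. The same effect, at corpus scale, is why a small share of ambiguous types covers a much larger share of tokens.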

Rule-based Tagging
  Start with a dictionary
  Assign all possible tags to words from the dictionary
  Write rules by hand to selectively remove tags
  Leaving the correct tag for each word

Rule-based Example
  Candidate tags from the dictionary:
    She       PRP
    promised  VBN, VBD
    to        TO
    back      VB, JJ, RB, NN
    the       DT
    bill      NN, VB
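The two steps of rule-based tagging (look up every candidate tag, then prune with hand-written rules) might look roughly like this. The dictionary, the function names, and the choice of a single rule are assumptions for illustration: the rule drops VBN when VBD is also possible and the word follows a sentence-initial pronoun.

```python
# Hypothetical mini-dictionary listing every tag each word can take.
DICTIONARY = {
    "she": ["PRP"],
    "promised": ["VBN", "VBD"],
    "to": ["TO"],
    "back": ["VB", "JJ", "RB", "NN"],
    "the": ["DT"],
    "bill": ["NN", "VB"],
}

def assign_all_tags(words):
    """Step 1: give each word every tag the dictionary allows."""
    return [list(DICTIONARY[w.lower()]) for w in words]

def drop_vbn_after_initial_pronoun(lattice):
    """Step 2 (one hand-written rule): if the sentence starts with an
    unambiguous PRP and the next word allows both VBN and VBD, remove VBN."""
    if len(lattice) >= 2 and lattice[0] == ["PRP"]:
        if "VBN" in lattice[1] and "VBD" in lattice[1]:
            lattice[1] = [t for t in lattice[1] if t != "VBN"]
    return lattice

words = "She promised to back the bill".split()
lattice = drop_vbn_after_initial_pronoun(assign_all_tags(words))
for word, tags in zip(words, lattice):
    print(f"{word:10s}{', '.join(tags)}")
```

After this one rule, "promised" is resolved to VBD, while "back" and "bill" are still ambiguous. A full rule-based tagger needs further rules to prune those.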

Rule-based Example
  Rule: eliminate VBN if VBD is an option when VBN|VBD follows "<start> PRP"
    She       PRP
    promised  VBD
    to        TO
    back      VB, JJ, RB, NN
    the       DT
    bill      NN, VB

Transformation-based
  Combines rule-based and probabilistic tagging
    Rule-based: rules are used to specify tags in a certain environment
    Probabilistic: we use a tagged corpus to find the best-performing rules (supervised learning)
  Input
    Tagged corpus
    Dictionary (with most frequent tags)
  Example: Brill tagger
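One way to picture how a transformation-based tagger finds its best-performing rule is the sketch below. It assumes a single rule template ("change tag A to tag B when the previous tag is Z") and a made-up initial tagging. The Brill tagger uses a richer set of templates, so treat this as an illustration of the scoring idea, not the actual algorithm.

```python
from itertools import product

def apply_rule(tags, rule):
    """Rewrite from_tag as to_tag wherever the preceding tag equals prev_tag."""
    from_tag, to_tag, prev_tag = rule
    out = list(tags)
    for i in range(1, len(tags)):
        # Conditions are checked against the tagging before this pass.
        if tags[i] == from_tag and tags[i - 1] == prev_tag:
            out[i] = to_tag
    return out

def count_errors(tags, gold):
    """Number of positions where the current tagging disagrees with the gold tags."""
    return sum(t != g for t, g in zip(tags, gold))

def best_rule(current, gold, tagset):
    """Pick the rule that removes the most errors on the tagged corpus."""
    best, best_gain = None, 0
    baseline_errors = count_errors(current, gold)
    for rule in product(tagset, repeat=3):
        if rule[0] == rule[1]:
            continue  # rewriting a tag as itself changes nothing
        gain = baseline_errors - count_errors(apply_rule(current, rule), gold)
        if gain > best_gain:
            best, best_gain = rule, gain
    return best, best_gain

# "She promised to back the bill": assumed initial tagging (most frequent tag
# per word) and gold tags.
current = ["PRP", "VBD", "TO", "NN", "DT", "NN"]
gold = ["PRP", "VBD", "TO", "VB", "DT", "NN"]
tagset = sorted(set(current) | set(gold))

rule, gain = best_rule(current, gold, tagset)
print(rule, gain)
```

Here the winning rule is "change NN to VB when the previous tag is TO", which fixes "back" without breaking "bill". In a full learner this step repeats: apply the best rule, rescore, and stop when no rule yields a positive gain.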

Automatic POS Tagging
  Symbolic
    Rule-based
    Transformation-based
  Probabilistic
    Hidden Markov models
    Log-linear models

HMM: Part-of-Speech Transition Probabilities

HMM: Observation Likelihoods P(word | tag)
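Both HMM ingredients, the tag transition probabilities P(tag | previous tag) and the observation likelihoods P(word | tag), can be estimated from a tagged corpus by relative frequency. The sketch below assumes a bigram HMM with a "<s>" start symbol and no smoothing; the function name and the two-sentence corpus are hypothetical.

```python
from collections import Counter

def estimate_hmm(tagged_sentences):
    """Relative-frequency estimates of P(tag | previous tag) and P(word | tag)."""
    transition_counts = Counter()  # (previous tag, tag) pairs
    emission_counts = Counter()    # (tag, word) pairs
    context_counts = Counter()     # how often each tag occurs as the previous tag
    tag_counts = Counter()         # how often each tag emits a word
    for sentence in tagged_sentences:
        prev = "<s>"
        for word, tag in sentence:
            transition_counts[(prev, tag)] += 1
            context_counts[prev] += 1
            emission_counts[(tag, word.lower())] += 1
            tag_counts[tag] += 1
            prev = tag
    transition = {(p, t): c / context_counts[p] for (p, t), c in transition_counts.items()}
    emission = {(t, w): c / tag_counts[t] for (t, w), c in emission_counts.items()}
    return transition, emission

# Toy corpus (hypothetical).
corpus = [
    [("Janet", "NNP"), ("will", "MD"), ("back", "VB"), ("the", "DT"), ("bill", "NN")],
    [("the", "DT"), ("back", "JJ"), ("door", "NN")],
]
transition, emission = estimate_hmm(corpus)
print("P(NN | DT)   =", transition[("DT", "NN")])
print("P(bill | NN) =", emission[("NN", "bill")])
```

An HMM tagger scores a candidate tag sequence by multiplying these two kinds of probabilities along the sentence and picks the highest-scoring sequence. With real data the estimates need smoothing so that unseen words and tag pairs do not get zero probability.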

Maxent P(tag | word)
  Can do surprisingly well just looking at a word by itself:
    Word            the: the -> DT
    Prefixes        unfathomable: un- -> JJ
    Suffixes        Importantly: -ly -> RB
    Capitalization  Meridian: CAP -> NNP
    Word shapes     35-year: d-x -> JJ
  Then build a classifier to predict the tag
    Maxent P(tag | word): 93.7% overall / 82.6% unknown

MEMMs
  Maximum Entropy Markov Model
    A sequence version of the maximum entropy classifier.
  [Figure: MEMM for "<s> Janet will back the bill", showing tags NNP MD VB with t_{i-2}, t_{i-1} and words w_{i-1}, w_i, w_{i+1}]
  Slide adapted from Dan Jurafsky
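The word-internal features listed for the maxent tagger (word identity, prefix, suffix, capitalization, word shape) can be extracted with a small helper like the one below. The function names, the fixed prefix and suffix length of two characters, and the exact shape encoding are assumptions for illustration. A real tagger would typically use several affix lengths and feed the features to a trained classifier.

```python
def word_shape(word):
    """Collapse runs of character classes: digits -> d, lowercase -> x, uppercase -> X."""
    classes = []
    for ch in word:
        if ch.isdigit():
            c = "d"
        elif ch.isupper():
            c = "X"
        elif ch.islower():
            c = "x"
        else:
            c = ch  # keep punctuation such as "-" as is
        if not classes or classes[-1] != c:
            classes.append(c)
    return "".join(classes)

def word_features(word, affix_len=2):
    """Features that depend only on the word itself, not on its context."""
    return {
        "word": word.lower(),
        "prefix": word[:affix_len].lower(),
        "suffix": word[-affix_len:].lower(),
        "capitalized": word[0].isupper(),
        "shape": word_shape(word),
    }

for w in ["the", "unfathomable", "Importantly", "Meridian", "35-year"]:
    print(w, word_features(w))
```

Because none of these features looks at neighboring words or tags, they also apply to unknown words, which is where affix, capitalization, and shape cues matter most.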

MEMMs: More Features
  [Figure: MEMM for "<s> Janet will back the bill", showing tags NNP MD VB with t_{i-2}, t_{i-1} and words w_{i-1}, w_i, w_{i+1}]
  Slide adapted from Dan Jurafsky

MEMM Decoding
  Simplest algorithm
    Greedy: at each step in the sequence, select the tag that maximizes P(tag | nearby words, nearby tags)
  In practice
    Viterbi algorithm
    Beam search

POS Tagging Accuracies
  Rough accuracies:
    Baseline (most frequent tag): ~90%
    Trigram HMM: ~95%
    Maxent P(t | w): 93.7%
    MEMM tagger: 96.9%
    Bidirectional MEMM: 97.2%
    Upper bound: ~98% (human agreement)
  Slide adapted from Dan Jurafsky
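The difference between greedy and Viterbi decoding for an MEMM can be seen on a two-word toy example. The sketch assumes a local model that conditions only on the previous tag, and its probabilities come from a hand-made table (TABLE and toy_log_prob are hypothetical stand-ins for a trained maxent classifier). Both decoders expect a non-empty sentence.

```python
import math

def greedy_decode(words, tagset, log_prob):
    """Pick the locally best tag at each position, left to right."""
    tags, prev = [], "<s>"
    for i in range(len(words)):
        best = max(tagset, key=lambda t: log_prob(t, prev, words, i))
        tags.append(best)
        prev = best
    return tags

def viterbi_decode(words, tagset, log_prob):
    """Find the tag sequence with the highest total log-probability."""
    # best[t] = (score of the best path ending in tag t, that path)
    best = {t: (log_prob(t, "<s>", words, 0), [t]) for t in tagset}
    for i in range(1, len(words)):
        new_best = {}
        for t in tagset:
            score, path = max(
                ((best[p][0] + log_prob(t, p, words, i), best[p][1]) for p in tagset),
                key=lambda pair: pair[0],
            )
            new_best[t] = (score, path + [t])
        best = new_best
    return max(best.values(), key=lambda pair: pair[0])[1]

# Hypothetical local probabilities P(tag | previous tag, position) for "will back".
TABLE = {
    (0, "<s>"): {"NN": 0.5, "MD": 0.4, "VB": 0.1},
    (1, "NN"): {"NN": 0.4, "VB": 0.3, "MD": 0.3},
    (1, "MD"): {"VB": 0.9, "NN": 0.05, "MD": 0.05},
    (1, "VB"): {"NN": 0.5, "VB": 0.25, "MD": 0.25},
}

def toy_log_prob(tag, prev_tag, words, i):
    return math.log(TABLE[(i, prev_tag)][tag])

words = ["will", "back"]
tagset = ["MD", "NN", "VB"]
print("greedy: ", greedy_decode(words, tagset, toy_log_prob))
print("viterbi:", viterbi_decode(words, tagset, toy_log_prob))
```

Greedy commits to NN for "will" because it is locally most probable, and ends with a sequence of probability 0.5 x 0.4 = 0.2. Viterbi keeps the MD alternative alive and finds MD VB with probability 0.4 x 0.9 = 0.36. Beam search sits between the two by keeping only the top few partial paths at each step.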

More Resources
  Stanford POS Tagger (cyclic dependency network, bidirectional version of MEMM)
    http://nlp.stanford.edu/software/tagger.shtml
  CMU Twitter POS tagger
    http://www.ark.cs.cmu.edu/tweetnlp/

References
  Log-linear models
    Ratnaparkhi, EMNLP 1996
    Toutanova et al., NAACL 2003
  Excellent recent survey
    "Part-of-speech tagging from 97% to 100%: is it time for some linguistics?" (Manning, 2011)

Summary
  Penn Treebank: standard tagset
  Approaches to POS tagging:
    Symbolic: rule-based, transformation-based
    Probabilistic: HMMs, MEMMs

Training a Tagger
  Input
    Tagged corpus
    Dictionary (with most frequent tags)
  These are available for English
  What about other languages?

Research in POS Tagging
  Low-resource languages
    "Learning a Part-of-Speech Tagger from Two Hours of Annotation" (Garrette and Baldridge, 2013) [video]