Part-of-Speech Tagging. Yan Shao, Department of Linguistics and Philology, Uppsala University. 19 April 2017



Last time: n-grams are used to build language models. The probabilities are estimated from corpora using maximum likelihood estimation (MLE). Data sparsity and smoothing. The Markov assumption.
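The MLE estimate from the recap can be sketched as counts over a corpus. The toy corpus below is illustrative, not from the lecture:

```python
from collections import Counter

# A toy corpus; real language models are estimated from much larger data.
corpus = "the cat sat on the mat the cat ran".split()

bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)

def bigram_prob(w1, w2):
    """MLE estimate P(w2 | w1) = count(w1, w2) / count(w1)."""
    return bigrams[(w1, w2)] / unigrams[w1]

print(bigram_prob("the", "cat"))  # 2/3: "the" occurs 3 times, "the cat" twice
```

Unseen bigrams get probability zero under raw MLE, which is exactly the data sparsity problem that smoothing addresses.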

Outline
1. Introduction to PoS tagging
2. An example of a tagged corpus: SUC
3. Evaluation
4. Types of tagging approaches: rule-based approaches and statistical approaches

Part-of-speech (PoS)
A part of speech is a category of words sharing similar grammatical properties.
Traditional parts of speech: noun, verb, adjective, adverb, preposition, article, interjection, pronoun, conjunction, ...
Also known as: lexical categories, word classes, morphological classes, lexical tags, ...
There is much debate within linguistics about the number, nature, and universality of these categories.

PoS Examples

Introduction to PoS Tagging. Definition: POS tagging is the process of assigning a part-of-speech tag to every word of a sentence or text.

Why PoS tagging? Speech synthesis: distinguishing heterophones ("I did not object to the object"; "To present the present"; "The bandage was wound around the wound"). Parsing: POS tagging is the step preceding parsing (syntactic analysis). Information extraction: finding names, relations, etc. Machine translation.

What is the challenge in PoS tagging? Ambiguous words: resolving lexical ambiguities, e.g. The/DT wind/NN was/VB too/ADV strong/ADJ to/PRP wind/VB the/DT sail/NN. Unknown words: The/DT rural/JJ Babbitt/??? who/WP bloviates/??? about/IN progress/NN and/CC growth/NN.

How is PoS tagging performed? Two sources of information:
- Lexical information (the word itself): known words can be looked up in a lexicon listing possible tags for each word; unknown words can be analyzed with respect to affixes, capitalization, special symbols, etc.
- Contextual information (surrounding words): the contextual words themselves and their POS tags.
Two main approaches: rule-based systems and statistical systems.
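The two sources of lexical information can be sketched as a lookup with word-shape fallbacks. The lexicon entries and suffix rules below are illustrative, not from the lecture; a real tagger would derive both from an annotated corpus:

```python
# Hypothetical toy lexicon; a real one lists possible tags per known word.
LEXICON = {
    "the": {"DT"},
    "wind": {"NN", "VB"},
    "was": {"VB"},
    "strong": {"ADJ"},
}

def possible_tags(word):
    """Combine lexical and word-shape information for one word."""
    if word.lower() in LEXICON:          # known word: lexicon lookup
        return LEXICON[word.lower()]
    if word[0].isupper():                # capitalization suggests a proper noun
        return {"NNP"}
    if word.endswith(("es", "s")):       # suffix heuristics for unknown words
        return {"NNS", "VBZ"}
    return {"NN"}                        # default guess

print(possible_tags("wind"))       # the ambiguous known word: NN and VB
print(possible_tags("bloviates"))  # unknown, but the suffix narrows it down
```

Contextual information (the surrounding tags) is then what disambiguates among the candidate tags, which is the job of the rule-based and statistical systems described next.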

Tagsets are not universal. We need a standard set of tags to do POS tagging, but different annotated corpora employ different tagging schemes. Very coarse tagsets: N, V, Adj, Adv, ... More commonly used sets are more fine-grained: for English, the Penn Treebank tagset (45 tags); for Swedish, the SUC tagset (25 base tags, around 150 tags with features). Even more fine-grained tagsets exist.

Tags fall into two types of classes: closed and open. Closed classes have a small, fixed membership: prepositions (of, in, by, ...), pronouns (I, you, she, mine, his, this, that, ...), determiners (the, a, this, that, ...). These are usually function words, often frequent and ambiguous. Open classes admit new members all the time; English has four: nouns, verbs, adjectives, and adverbs. These are usually content words, often rare and therefore sometimes unknown.

Penn Treebank POS Tagset

How Hard is POS Tagging? Measuring Ambiguity

The SUC POS Tagset

The SUC POS Tagset. Example sentence: Och han menade faktiskt allvar ("And he was actually serious").

The SUC POS Tagset. Tagged: Och/KN han/PN menade/VB faktiskt/AB allvar/NN

SUC includes morphosyntactic features, as we see in this sample:

List of the morphosyntactic features

Adding morphosyntactic features
Base tags: Och han menade faktiskt allvar / KN PN VB AB NN
With features: Och han menade faktiskt allvar / KN PN_UTR SIN DEF SUB VB_PRT AKT AB_POS NN_NEU SIN IND NOM


Evaluation. Evaluate the accuracy of the POS tagger: overall error rate with respect to a manually annotated gold-standard test set; error rates on known vs. unknown words; error rates on particular tags. Accuracy typically reaches 96-97% for English newswire text.
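The evaluation breakdown above can be sketched in a few lines. The gold/predicted tags and the "known words" set below are illustrative, not from the lecture:

```python
# A sketch of tagger evaluation: overall accuracy plus separate error
# counts for known and unknown words. All data here is made up.
gold = [("the", "DT"), ("wind", "NN"), ("bloviates", "VBZ"), ("about", "IN")]
pred = ["DT", "VB", "VBZ", "IN"]
known_words = {"the", "wind", "about"}  # words seen in the training lexicon

correct = sum(g == p for (_, g), p in zip(gold, pred))
accuracy = correct / len(gold)

known_errors = sum(g != p for (w, g), p in zip(gold, pred) if w in known_words)
unknown_errors = sum(g != p for (w, g), p in zip(gold, pred) if w not in known_words)

print(accuracy)        # 0.75
print(known_errors)    # 1 ("wind" tagged VB instead of NN)
print(unknown_errors)  # 0
```

Reporting known- and unknown-word error rates separately matters because the two failure modes call for different fixes: a bigger lexicon versus better word-shape features.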


Error Analysis. Generate a confusion matrix on development data, showing how often tag i was mistagged as tag j, to see which errors are causing problems: noun (NN) vs. proper noun (NNP) vs. adjective (JJ); preterite (VBD) vs. participle (VBN) vs. adjective (JJ).
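A confusion matrix of this kind can be built directly from paired gold and predicted tags. The tag sequences below are illustrative, not taken from the lecture:

```python
from collections import Counter

# Hypothetical gold vs. predicted tags from a development set.
gold = ["NN", "NNP", "JJ", "VBD", "VBN", "NN"]
pred = ["NN", "NN",  "JJ", "VBN", "VBN", "JJ"]

# confusion[(i, j)]: how often gold tag i was predicted as tag j.
confusion = Counter(zip(gold, pred))

# The off-diagonal cells are the errors; sort them to find the worst pairs.
errors = {pair: n for pair, n in confusion.items() if pair[0] != pair[1]}
for (g, p), n in sorted(errors.items(), key=lambda kv: -kv[1]):
    print(f"gold {g} tagged as {p}: {n}")
```

On real development data, the largest off-diagonal cells typically turn out to be exactly the pairs named on the slide, such as NN/JJ and VBD/VBN.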

Some vocabulary. Unknown word: a word that is not in the dictionary/lexicon of the tagger. Ambiguous word: a word that can have different tags depending on the context. Low-frequency word: a word that is very rare (sometimes appearing only once) in your corpus.

Two approaches to POS tagging. Rule-based systems: Constraint Grammar, Transformation-Based Learning. Statistical sequence models: Hidden Markov Models, Maximum Entropy Markov Models, Conditional Random Fields, neural networks.

Two approaches to PoS tagging. Rule-based systems: a) Constraint Grammar: assign all possible tags to each word, then apply hand-crafted rules that discard tags based on context. b) Transformation-Based Learning: assign the most frequent tag to each word, then apply rules that replace tags based on context; later rules may overwrite earlier rules; the rules are learned from a tagged corpus.


a) Constraint Grammar. For each ambiguous word, apply a rule; for example: "An ambiguous word is a noun rather than a verb if it follows a determiner." Advantages: can achieve very high recall with good lexical resources; rules can be interpreted by humans, which facilitates debugging. Drawbacks: it is not always possible to eliminate all ambiguity, and rule design is (very) expensive and time-consuming.
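The determiner rule quoted above can be sketched as a constraint that discards readings. The tiny lexicon is illustrative; a real Constraint Grammar has thousands of hand-written rules:

```python
# A minimal Constraint Grammar-style sketch: start from all possible tags
# per word, then apply rules that discard readings in context.
LEXICON = {"the": {"DT"}, "wind": {"NN", "VB"}}

def cg_tag(words):
    # Each word starts with its full set of possible readings.
    readings = [set(LEXICON.get(w, {"NN"})) for w in words]
    # Rule: a word after a determiner is a noun rather than a verb.
    for i in range(1, len(readings)):
        if readings[i - 1] == {"DT"} and "NN" in readings[i]:
            readings[i].discard("VB")
    return readings

print(cg_tag(["the", "wind"]))  # the VB reading of "wind" is discarded
```

Note that the output is still a set of readings per word: when no rule applies, residual ambiguity simply remains, which is the first drawback listed on the slide.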

b) Transformation-Based Learning (Brill tagging). The rules are NOT hand-written; the most probable tags are assigned initially, then learned transformation rules are applied. Advantages: rules can be interpreted by humans, which facilitates debugging, and they are learned automatically from data. Drawbacks: not quite as accurate as the best models, and slow to train on large data sets.
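The Brill-style tagging loop can be sketched as follows. The initial-tag table and the single rule are illustrative stand-ins for what the learner would induce from a tagged corpus:

```python
# Brill-style tagging sketch: assign each word its most frequent tag,
# then apply learned transformation rules in order.
MOST_FREQUENT_TAG = {"to": "TO", "race": "NN", "the": "DT"}

# Each rule: (from_tag, to_tag, required previous tag).
RULES = [("NN", "VB", "TO")]  # e.g. in "to race", "race" is a verb

def brill_tag(words):
    # Initial assignment: most frequent tag per word (NN as fallback).
    tags = [MOST_FREQUENT_TAG.get(w, "NN") for w in words]
    # Apply rules in learned order; later rules may overwrite earlier ones.
    for src, dst, prev in RULES:
        for i in range(1, len(tags)):
            if tags[i] == src and tags[i - 1] == prev:
                tags[i] = dst
    return tags

print(brill_tag(["to", "race"]))  # ['TO', 'VB']
```

During training, the learner repeatedly picks the transformation that most reduces error on the tagged corpus and appends it to the rule list, which is why the rules remain human-readable.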

Statistical models. The parameters of the tagger are learned statistically from an annotated corpus. The question to answer: what is the most probable tag sequence given a sequence of words? Equivalently, in a generative formulation: what is the most probable sequence of tags that generates this sentence?

Exercise: imagine a tagged corpus.
Words: A C B B A B A A
Tags: <S> Adj Verb Noun Adj Verb Adj Noun Adj </S>
We can for instance compute:
(1) The probability of a word being assigned a certain tag: c(Adj,B)=2, c(B)=3, so P(Adj|B)=2/3.
(2) The probability of a transition between tags: c(Verb,Noun)=1, c(Verb)=2, so P(Noun|Verb)=1/2.
(3) The probability of generating a word given a certain tag: c(Adj,B)=2, c(Adj)=4, so P(B|Adj)=2/4.
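The three quantities from the exercise can be computed mechanically from counts over the toy corpus:

```python
from collections import Counter

# The toy tagged corpus from the exercise.
words = ["A", "C", "B", "B", "A", "B", "A", "A"]
tags  = ["Adj", "Verb", "Noun", "Adj", "Verb", "Adj", "Noun", "Adj"]

tag_counts   = Counter(tags)
word_counts  = Counter(words)
pair_counts  = Counter(zip(tags, words))                  # c(tag, word)
trans_counts = Counter(zip(["<S>"] + tags, tags + ["</S>"]))  # c(tag_i-1, tag_i)

# (1) P(Adj | B) = c(Adj, B) / c(B)
p_adj_given_b = pair_counts[("Adj", "B")] / word_counts["B"]
# (2) P(Noun | Verb) = c(Verb, Noun) / c(Verb)
p_noun_given_verb = trans_counts[("Verb", "Noun")] / tag_counts["Verb"]
# (3) P(B | Adj) = c(Adj, B) / c(Adj)
p_b_given_adj = pair_counts[("Adj", "B")] / tag_counts["Adj"]

print(p_adj_given_b, p_noun_given_verb, p_b_given_adj)  # 2/3, 1/2, 1/2
```

Quantities (2) and (3) are exactly the transition and emission probabilities an HMM tagger estimates from a tagged corpus; quantity (1) conditions in the other direction and is not used by the generative model.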

HMM. Hidden Markov Model (HMM) for POS tagging.

Hidden Markov Model (HMM): formally. HMM tagging is based on two mathematical ingredients. First, Bayesian inference applied to tag sequence prediction: the best tag sequence is t_hat = argmax_t P(t|w) = argmax_t P(w|t) P(t) / P(w) = argmax_t P(w|t) P(t), since P(w) is constant for a given sentence. Second, the Markov assumptions: generation of each word w_i depends only on its tag t_i, not on previous words, so P(w|t) is approximated by the product over i of P(w_i|t_i); and generation of each tag t_i depends only on its immediate predecessor t_{i-1}, so P(t) is approximated by the product over i of P(t_i|t_{i-1}).

More formally:
Alphabet Σ = {s_1, s_2, ..., s_M}
Set of states Q = {q_1, q_2, ..., q_N}
Transition probabilities between any two states: a_ij = P(q_j | q_i), the transition probability from state i to state j
Start probabilities for any state: π_i = P(q_i), the start probability for state i
Emission probabilities for each symbol and state: b_ik = P(s_k | q_i)
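Given these quantities, the most probable tag sequence is found with the Viterbi algorithm. The sketch below uses tiny made-up transition and emission tables, not probabilities from any real corpus:

```python
# A minimal Viterbi decoder over the HMM quantities defined above.
states = ["DT", "NN", "VB"]
start = {"DT": 0.6, "NN": 0.3, "VB": 0.1}                 # pi_i
trans = {"DT": {"DT": 0.0, "NN": 0.9, "VB": 0.1},         # a_ij
         "NN": {"DT": 0.1, "NN": 0.3, "VB": 0.6},
         "VB": {"DT": 0.5, "NN": 0.4, "VB": 0.1}}
emit = {"DT": {"the": 0.9, "dog": 0.0, "barks": 0.0},     # b_ik
        "NN": {"the": 0.0, "dog": 0.8, "barks": 0.2},
        "VB": {"the": 0.0, "dog": 0.1, "barks": 0.9}}

def viterbi(words):
    # v[t][q]: probability of the best tag sequence ending in state q at position t.
    v = [{q: start[q] * emit[q][words[0]] for q in states}]
    back = []
    for w in words[1:]:
        scores, ptr = {}, {}
        for q in states:
            best_prev = max(states, key=lambda p: v[-1][p] * trans[p][q])
            ptr[q] = best_prev
            scores[q] = v[-1][best_prev] * trans[best_prev][q] * emit[q][w]
        v.append(scores)
        back.append(ptr)
    # Follow back-pointers from the best final state.
    path = [max(states, key=lambda q: v[-1][q])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))

print(viterbi(["the", "dog", "barks"]))  # ['DT', 'NN', 'VB']
```

Dynamic programming makes this O(n * N^2) for n words and N states, instead of enumerating all N^n tag sequences.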

Summary. Part-of-speech tagging is a prior step in many NLP applications. There are different tagsets and tagging schemes. Approaches: rule-based systems (Constraint Grammar, Transformation-Based Learning) and statistical sequence models (HMMs, ...). State of the art: 96-97% accuracy for English newswire text.

References. Daniel Jurafsky and James H. Martin. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Volume 163 of Prentice Hall Series in Artificial Intelligence. Prentice Hall, Pearson International Edition, 2009. See also: https://www.coursera.org/course/nlp