Chapter 8: Part-of-Speech Tagging (POS Tagging)
See Manning & Schütze, Chapter 10


Overview
- Task
- Brill tagger (rule-based)
- HMM tagger (statistical)

Goal of Part-of-Speech Tagging
Determine, in a simple way, the grammatical function of a word.

Goal of Part-of-Speech Tagging
Examples of tags:

Tag   Description                Example
CC    Coordinating conjunction   and, but, or
CD    Cardinal number            one, two, three
DT    Determiner                 a, the
JJ    Adjective                  yellow
NN    Noun, singular or mass     province
NNS   Noun, plural               houses, apples
IN    Preposition                in
VB    Verb, base form            eat
VBD   Verb, past tense           ate

Example sentence:
The  representative  put  chairs  on  the  table  .
DT   NN              VBD  NNS     IN  DT   NN     .

Goal of Part-of-Speech Tagging
One more example sentence:
Next  you  flour  the  pan  .
JJ    PRP  VB     DT   NN   .

More examples of ambiguous words:
- play: NN ("a new play"), VBP ("to play")
- bear: NN ("the bear"), VB ("to bear")

How difficult is the task?
- Roughly 10% of the tokens (running words) are ambiguous.
- There is also an out-of-vocabulary (OOV) problem!

Applications of Tagging
- Partial parsing: syntactic analysis.
- Information Extraction: tagging and partial parsing help identify useful terms and the relationships between them.
- Information Retrieval: noun phrase recognition and query-document matching based on meaningful units rather than individual terms.
- Question Answering: analyzing a query to understand what type of entity the user is looking for and how it is related to other noun phrases mentioned in the question.

Brill Tagger: Transformation-Based Learning (TBL)

Transformation-Based Tagging (Brill Tagging)
Idea:
- Assign each word its most likely tag
- Learn rules that correct the resulting errors
- Combination of rule-based and machine-learning approaches

Example: "The bear"
- Most likely tags: DT VB
- Transformation rule: VB → NN if the previous tag is DT
- Corrected tags: DT NN

Rule Learning
- Problem: transformations could be applied forever
- Constrain the set of transformations with "templates": replace tag X with tag Y, provided tag Z or word Z appears in some position
- Rules are learned in an ordered sequence
- Rules may interact
- Rules are compact and can be inspected by humans

Brill Tagger
Types of rules:
- Tag triggered
- Word triggered
- Morphology triggered (unknown words!)

Tagging algorithm:
  Assign default tag
  For each rule
    For all positions in text
      If rule is applicable: change tag accordingly
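The tagging algorithm above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Rule` type, the two-word lexicon, and the single tag-triggered rule ("previous tag is DT") are hypothetical stand-ins for the LEXICON and rule files of the real tagger, and changes take effect immediately while scanning left to right.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    """Tag-triggered rule: change from_tag to to_tag if the previous tag is prev_tag."""
    from_tag: str
    to_tag: str
    prev_tag: str


def initial_tags(words, lexicon, default="NN"):
    # Step 1: assign each word its most likely tag (default tag for unknown words).
    return [lexicon.get(w.lower(), default) for w in words]


def apply_rules(tags, rules):
    # Step 2: apply the rules in their learned order, scanning all positions.
    tags = list(tags)
    for rule in rules:
        for i in range(1, len(tags)):
            if tags[i] == rule.from_tag and tags[i - 1] == rule.prev_tag:
                tags[i] = rule.to_tag
    return tags


# Toy data (assumed for illustration only).
lexicon = {"the": "DT", "bear": "VB"}
rules = [Rule("VB", "NN", "DT")]

start = initial_tags(["The", "bear"], lexicon)
tagged = apply_rules(start, rules)
print(start, "->", tagged)
```

On the "The bear" example, the initial tags DT VB are corrected to DT NN by the single rule.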

Most likely tags

Word          Tags
thanksgiving  NN
Thanks        NNS UH
thanks        NNS VBZ VB UH
thank         VB VBP
the           DT VBD VBP NN DT IN JJ NN NNP PDT

See LEXICON

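TBL: Rule Learning
Two parts to a rule:
- Triggering environment
- Rewrite rule

The range of triggering environments of templates (from Manning & Schütze 1999:363):

Schema  t_{i-3}  t_{i-2}  t_{i-1}  t_i  t_{i+1}  t_{i+2}  t_{i+3}
1                                  *
2                                  *
3                                  *
4                                  *
5                                  *
6                                  *
7                                  *
8                                  *
9                                  *

(* marks the tag t_i to be rewritten; the highlighted trigger positions of the original table are not reproduced here.)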

Templates for TBL
See CONTEXTUALRULEFILE

Using Morphological Information

Training of the Brill Tagger
(E: number of tagging errors; ε: stopping threshold)

  C_0 := corpus tagged with most likely tag
  k := 0
  Do {
    v := the transformation u_i that minimizes E(u_i(C_k))
    If (E(C_k) - E(v(C_k))) < ε then break
    C_{k+1} := v(C_k)
    τ_{k+1} := v
    k++
  }
  print τ_1, τ_2, ..., τ_k
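The greedy training loop can be made concrete with a small Python sketch. It assumes a single template ("change X to Y if the previous tag is Z"), a five-token toy corpus, and an error function that simply counts mismatches against the gold tags; the names `apply_rule`, `errors`, and `train_tbl` are illustrative and do not come from Brill's implementation.

```python
from itertools import product


def apply_rule(rule, tags):
    # rule = (X, Y, Z): change X to Y if the previous tag is Z.
    x, y, z = rule
    out = list(tags)
    for i in range(1, len(out)):
        if out[i] == x and out[i - 1] == z:
            out[i] = y
    return out


def errors(tags, gold):
    # E(C): number of positions where the current tag differs from the gold tag.
    return sum(t != g for t, g in zip(tags, gold))


def train_tbl(initial, gold, tagset, epsilon=1):
    current = list(initial)
    learned = []
    # All instantiations of the single template.
    candidates = [r for r in product(tagset, repeat=3) if r[0] != r[1]]
    while True:
        # Pick the transformation that minimizes the error on the current corpus.
        best = min(candidates, key=lambda r: errors(apply_rule(r, current), gold))
        gain = errors(current, gold) - errors(apply_rule(best, current), gold)
        if gain < epsilon:
            break
        current = apply_rule(best, current)
        learned.append(best)
    return learned, current


# Toy corpus (assumed): "bear" is initially mis-tagged as VB after a determiner.
gold = ["DT", "NN", "VBD", "DT", "NN"]
initial = ["DT", "VB", "VBD", "DT", "VB"]
learned, final = train_tbl(initial, gold, ["DT", "NN", "VB", "VBD"])
print(learned, final)
```

On this toy corpus a single rule removes both errors, after which no candidate yields any further gain and the loop stops.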

Accuracy vs. Transformation Number
- The first few transformation rules make the major contributions
- Overall, a small number of transformation rules suffice

TBL: Problems
- First 100 rules achieve 96.8% accuracy
- First 200 rules achieve 97.0% accuracy
- Execution speed: the TBL tagger is slower than the HMM approach
- Learning speed: Brill's implementation takes over a day (600k tokens)

BUT
(1) Learns a small number of simple, non-stochastic rules
(2) Can be made to work faster with FSTs
(3) Best-performing algorithm on unknown words

Download the Tagger
http://www.cs.jhu.edu/~brill

Hidden Markov Model (HMM) Based Taggers

Part-of-Speech Tagging
Sentence:  Next  you  flour  the  pan  .
Tags:      JJ    PRP  VBP    DT   NN   .
Intuitive idea of the HMM: use statistics of co-occurrences

Frequency of Determiners (Penn Treebank)

Determiner  Count
THE         51302
A           21328
AN           3529
THIS         2433
SOME         1673
THAT         1475
ALL           977
ANY           831
NO            749
THESE         612
THOSE         604
ANOTHER       467
BOTH          462
EACH          441
EVERY         202
EITHER         51
NEITHER        40

Most frequent continuations of DT
DT: overall 87232 occurrences

Sequence   Count
DT NN      39302
DT JJ      19085
DT NNP      9529
DT NNS      6269
DT CD       2787
DT NN-POS   2174
DT RB        878
DT IN        843
DT JJS       800

Most frequent continuations of DT NN

Sequence    Count
DT NN IN    11926
DT NN NN     4799
DT NN ,      4067
DT NN .      3384
DT NN VBD    2559
DT NN VBZ    2462
DT NN TO     1612
DT NN RB     1138
DT NN CC     1095

Most frequent continuations of DT NN IN

Sequence      Count
DT NN IN DT    3698
DT NN IN NNP   1601
DT NN IN NN    1557
DT NN IN JJ    1306
DT NN IN NNS    933
DT NN IN CD     714
DT NN IN PRP    480

Reliable statistics are available. How can they be used?

Remember the Bayes classifier from WSD
Can the POS-tagging problem be cast as a Bayes classifier?

HMMs as a Bayes Classifier
Consider the complete sequence of tags as the class to be assigned.

HMMs as a Bayes Classifier (rewritten)
How many classes are there?
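The formulas on these slides did not survive transcription. A standard way to write the Bayes-classifier view, given here as a reconstruction rather than the slide's exact notation, treats the tag sequence t_{1:N} as the class and the word sequence w_{1:N} as the observation:

```latex
\hat{t}_{1:N}
  = \arg\max_{t_{1:N}} P(t_{1:N} \mid w_{1:N})
  = \arg\max_{t_{1:N}} \frac{P(w_{1:N} \mid t_{1:N})\, P(t_{1:N})}{P(w_{1:N})}
  = \arg\max_{t_{1:N}} P(w_{1:N} \mid t_{1:N})\, P(t_{1:N})
```

The denominator does not depend on the tags and can be dropped. With T tags and N words there are T^N distinct tag sequences, so the classifier has T^N classes.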

Estimation of Emission Probabilities
Assume:
- Each word depends only on its tag
- Words are statistically independent

Estimate Transition Probabilities
- Use the definition of conditional probabilities to rewrite
- Too many parameters to be estimated!

Simplifying Assumptions
Markov property: only the immediate predecessors matter
- Bigram: each tag depends only on the previous tag
- Trigram: each tag depends only on the two previous tags
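Written out, the assumptions from the last three slides take the following form. The notation is a reconstruction (the original formulas are missing from the transcript), with w_i the i-th word and t_i its tag:

```latex
% Emission: each word depends only on its own tag
P(w_{1:N} \mid t_{1:N}) \approx \prod_{i=1}^{N} P(w_i \mid t_i)

% Transition: chain rule (exact, but with too many parameters)
P(t_{1:N}) = \prod_{i=1}^{N} P(t_i \mid t_1, \dots, t_{i-1})

% Markov approximations
\text{Bigram:}\quad  P(t_i \mid t_1, \dots, t_{i-1}) \approx P(t_i \mid t_{i-1})
\text{Trigram:}\quad P(t_i \mid t_1, \dots, t_{i-1}) \approx P(t_i \mid t_{i-2}, t_{i-1})
```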

Bigram Tagger

Trigram Tagger
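Combining the Bayes decision rule with the emission and Markov assumptions gives the usual objectives for the two taggers. This is a reconstruction in standard notation, not the slide's own formula; the sentence-initial tags t_0 and t_{-1} are assumed to be special start symbols:

```latex
% Bigram tagger
\hat{t}_{1:N} = \arg\max_{t_{1:N}} \prod_{i=1}^{N} P(w_i \mid t_i)\, P(t_i \mid t_{i-1})

% Trigram tagger
\hat{t}_{1:N} = \arg\max_{t_{1:N}} \prod_{i=1}^{N} P(w_i \mid t_i)\, P(t_i \mid t_{i-2}, t_{i-1})
```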

Estimate Probabilities
- The maximum likelihood estimate gives relative frequencies, where C() is the count on the training corpus
- In case of unseen events: use your favorite smoothing technique (see Chapter 4)
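As a worked example, the maximum likelihood estimates can be computed directly from the Penn Treebank counts quoted earlier (87232 occurrences of DT, 39302 of DT NN, 11926 of DT NN IN). The add-one smoothing function is only one possible choice of "favorite smoothing technique", and the tag-set size of 50 is an assumption borrowed from the T=50 example later in the chapter.

```python
def mle(count_ngram, count_history):
    # P(t_i | history) = C(history, t_i) / C(history)
    return count_ngram / count_history


def add_one(count_ngram, count_history, n_tags):
    # Add-one (Laplace) smoothing: one illustrative option among many.
    return (count_ngram + 1) / (count_history + n_tags)


C_DT = 87232        # C(DT)
C_DT_NN = 39302     # C(DT NN)
C_DT_NN_IN = 11926  # C(DT NN IN)

p_nn_given_dt = mle(C_DT_NN, C_DT)            # bigram transition, about 0.45
p_in_given_dt_nn = mle(C_DT_NN_IN, C_DT_NN)   # trigram transition, about 0.30

# An unseen continuation gets probability 0 under MLE, but not after smoothing.
p_unseen_mle = mle(0, C_DT)
p_unseen_smoothed = add_one(0, C_DT, n_tags=50)

print(round(p_nn_given_dt, 4), round(p_in_given_dt_nn, 4))
print(p_unseen_mle, p_unseen_smoothed)
```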

Handling of Unknown Words
Guess the POS:
- plunking: verb (VBG) ("to plunk"?)
- resuciation: noun (NN)

Statistical properties of unknown words

Feature       Value   NNP    NN     NNS    VBG    VBZ
Unknown word  yes     0.05   0.02   0.02   0.005  0.005
              no      0.95   0.98   0.98   0.995  0.995
Capitalized   yes     0.95   0.10   0.10   0.005  0.005
              no      0.05   0.90   0.90   0.995  0.995
Ending        -s      0.05   0.01   0.98   0.00   0.99
              -ing    0.01   0.01   0.00   1.00   0.00
              -tion   0.05   0.10   0.00   0.00   0.00
              other   0.89   0.88   0.02   0.00   0.01

Estimate Emission Probabilities
- Use a decomposition into word features
- About 80% of the unknown words can be tagged correctly using that model
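The decomposition formula itself is missing from the transcript. The sketch below assumes the common form in which the emission probability of an unknown word is proportional to the product P(unknown | tag) · P(capitalized | tag) · P(ending | tag), using the values from the table of statistical properties above. The normalization constant is omitted, and the helper names (`ending_of`, `emission_scores`, `best_tag`) are illustrative.

```python
TAGS = ["NNP", "NN", "NNS", "VBG", "VBZ"]

# P(unknown word = yes | tag), from the table of statistical properties.
P_UNKNOWN = dict(zip(TAGS, [0.05, 0.02, 0.02, 0.005, 0.005]))

# P(capitalized | tag) for capitalized = True / False.
P_CAP = {
    True: dict(zip(TAGS, [0.95, 0.10, 0.10, 0.005, 0.005])),
    False: dict(zip(TAGS, [0.05, 0.90, 0.90, 0.995, 0.995])),
}

# P(ending | tag).
P_ENDING = {
    "-s": dict(zip(TAGS, [0.05, 0.01, 0.98, 0.00, 0.99])),
    "-ing": dict(zip(TAGS, [0.01, 0.01, 0.00, 1.00, 0.00])),
    "-tion": dict(zip(TAGS, [0.05, 0.10, 0.00, 0.00, 0.00])),
    "other": dict(zip(TAGS, [0.89, 0.88, 0.02, 0.00, 0.01])),
}


def ending_of(word):
    w = word.lower()
    for suffix in ("tion", "ing", "s"):
        if w.endswith(suffix):
            return "-" + suffix
    return "other"


def emission_scores(word):
    # Unnormalized emission score of an unknown word for each tag.
    cap = word[0].isupper()
    end = ending_of(word)
    return {t: P_UNKNOWN[t] * P_CAP[cap][t] * P_ENDING[end][t] for t in TAGS}


def best_tag(word):
    scores = emission_scores(word)
    return max(scores, key=scores.get)


for w in ("plunking", "resuciation"):
    print(w, best_tag(w))
```

For the two example words from the previous slide, the highest-scoring tags match the intuitive guesses: VBG for "plunking" and NN for "resuciation". In a full tagger these scores would be combined with the transition probabilities rather than used on their own.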

Finding the Best Tag Sequence
- Suppose the sentence has N words
- The tag set has T tags
- There are T^N possible tag sequences
- E.g. N = 14, T = 50: about 10^23 hypotheses to check (at 10^6 hypotheses per second: 3 171 000 000 CPU years, about the age of the Earth)
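The arithmetic behind these numbers can be checked in a few lines. The slide rounds T^N to 10^23; the exact value of 50^14 is about 6.1 × 10^23. For comparison, the sketch also prints N · T², the number of transition evaluations a bigram Viterbi search needs. The variable names are illustrative.

```python
T, N = 50, 14

sequences = T ** N                      # exact number of tag sequences
seconds_per_year = 365 * 24 * 3600

# CPU years at 10^6 hypotheses per second, using the slide's rounded 10^23.
years_rounded = 10**23 / 10**6 / seconds_per_year

# Transition evaluations for bigram Viterbi: N positions, T x T tag pairs each.
viterbi_steps = N * T * T

print(f"{sequences:.2e} sequences")
print(f"{years_rounded:,.0f} CPU years")
print(f"{viterbi_steps} Viterbi steps")
```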

Finding the Best Path: Viterbi Algorithm (Bigram)
See Wikipedia or other lectures.
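Since the slide only points elsewhere, here is a minimal sketch of bigram Viterbi decoding in log space. All probabilities in the toy model are made up for illustration, and the dictionary-based model layout (`START_P`, `TRANS_P`, `EMIT_P`) is an assumption, not a format used by any particular tagger.

```python
import math


def log(p):
    return math.log(p) if p > 0 else float("-inf")


def viterbi(words, tags, start_p, trans_p, emit_p):
    # delta[t]: best log-probability of any tag sequence ending in tag t.
    delta = {t: log(start_p.get(t, 0)) + log(emit_p[t].get(words[0], 0)) for t in tags}
    backpointers = []
    for w in words[1:]:
        new_delta, bp = {}, {}
        for t in tags:
            # Best predecessor tag for t at this position.
            prev = max(tags, key=lambda s: delta[s] + log(trans_p[s].get(t, 0)))
            new_delta[t] = (delta[prev] + log(trans_p[prev].get(t, 0))
                            + log(emit_p[t].get(w, 0)))
            bp[t] = prev
        backpointers.append(bp)
        delta = new_delta
    # Follow the backpointers from the best final tag.
    path = [max(tags, key=lambda t: delta[t])]
    for bp in reversed(backpointers):
        path.append(bp[path[-1]])
    return path[::-1]


# Toy model (all numbers invented for illustration).
TAGS = ["DT", "NN", "VB"]
START_P = {"DT": 0.8, "NN": 0.1, "VB": 0.1}
TRANS_P = {
    "DT": {"DT": 0.05, "NN": 0.90, "VB": 0.05},
    "NN": {"DT": 0.10, "NN": 0.30, "VB": 0.60},
    "VB": {"DT": 0.60, "NN": 0.30, "VB": 0.10},
}
EMIT_P = {"DT": {"the": 1.0}, "NN": {"bear": 0.4}, "VB": {"bear": 0.6}}

best = viterbi(["the", "bear"], TAGS, START_P, TRANS_P, EMIT_P)
print(best)
```

In this toy model "bear" is more likely as a verb in isolation, but the strong DT → NN transition makes DT NN the best path, mirroring the "The bear" example from the Brill tagger section.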

Summary
- Task: assign grammatical labels to words
- Two well-established approaches:
  - Brill tagger
  - Hidden Markov model