CS630 Representing and Accessing Digital Information


Part-of-Speech Tagging
Thorsten Joachims, Cornell University
Based on slides from Prof. Claire Cardie

Why is POS Tagging Hard?
Ambiguity:
- He will race/VB the car.
- When will the race/NOUN end?
- The boat floated/VBD down the river.
- The boat floated/VBN down the river sank.
Average of ~2 parts of speech for each word.
The number of tags used by different systems varies a lot: some systems use < 20 tags, while others use > 400.

Part-of-Speech Tagging
- Task definition
- Task specification
- Why is POS tagging difficult
- Transformation-based learning approach [Brill 93]
- Hidden Markov Models

Among Easiest of NLP Problems
State-of-the-art methods achieve ~97% accuracy.
Simple heuristics can go a long way: ~90% accuracy just by choosing the most frequent tag for a word (see the sketch below).
But defining the rules for special cases can be time-consuming, difficult, and prone to errors and omissions.

Part-of-Speech Tagging Task
Assign the correct part of speech (word class) to each word in a document:
The/DT planet/NN Jupiter/NNP and/CC its/PRP moons/NNS are/VBP in/IN effect/NN a/DT mini-solar/JJ system/NN ,/, and/CC Jupiter/NNP itself/PRP is/VBZ often/RB called/VBN a/DT star/NN that/IN never/RB caught/VBN fire/NN ./.
Needed as an initial processing step for a number of language technology applications:
- Information extraction
- Answer extraction in QA
- Base step in identifying syntactic phrases for IR systems
- Critical for word-sense disambiguation (WordNet apps)
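The ~90% most-frequent-tag baseline mentioned above is easy to make concrete. Below is a minimal Python sketch, assuming the training corpus is available as (word, tag) pairs; the function and variable names are illustrative, not from the lecture.

    from collections import Counter, defaultdict

    def train_baseline(tagged_corpus):
        """Collect per-word tag counts from (word, tag) pairs."""
        per_word = defaultdict(Counter)
        overall = Counter()
        for word, tag in tagged_corpus:
            per_word[word][tag] += 1
            overall[tag] += 1
        # Most frequent tag for each known word; corpus-wide most frequent
        # tag as the fallback for unknown words.
        lexicon = {w: c.most_common(1)[0][0] for w, c in per_word.items()}
        default = overall.most_common(1)[0][0]
        return lexicon, default

    def tag_baseline(words, lexicon, default):
        """Tag each word with its most frequent training tag."""
        return [lexicon.get(w, default) for w in words]

A lookup tagger of this kind is what the ~90% figure refers to: by construction, all of its errors come from ambiguous words used in a less frequent role and from unknown words.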

Transformation-Based Learning
A machine learning technique for acquiring simple default heuristics and rules for special cases.
- Rules are learned by iteratively collecting errors and generating rules to correct them.
- Requires a large (training) corpus of manually tagged text and an initial state tagger.

Transformation-Based Learning [Brill 1993]
- Allowable transformations: based on words and tags in a window surrounding the target word.
- Objective function: # correct - # incorrect.

TBL: Top-Level Algorithm
Learns an ordered list of transformations (i.e. rewrite rules) by greedy search. Specify:
- an initial state annotator,
- the space of allowable transformations,
- an objective function for comparing the corpus to the truth.
Algorithm (sketched in code below): iterate
- try each possible transformation,
- choose the one with the best score,
- add it to the list of transformations,
- update the training corpus,
until no transformation improves performance.

Rewrite Rules
Rule: change modal to noun if the preceding word is a determiner.
- Determiners: the, a, an, this, that
- Modals: can, will, would, may, might (followed by the main verb)
Example: The/det can/modal rusted/verb ./.  ->  The/det can/noun rusted/verb ./.

Transformation Templates
Change tag A to B when:
- the preceding/following word is tagged Z
- the word two before/after is tagged Z
- one of the two preceding/following words is tagged Z
- one of the three preceding/following words is tagged Z
- the preceding word is tagged Z and the following word is tagged W
- the preceding/following word is tagged Z and the word two before/after is tagged W
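As a rough sketch of the top-level greedy loop, the code below instantiates only one template family from the slide ("change A to B if the preceding word is tagged Z"); a full learner would enumerate every template. Representing the corpus as one flat tag list next to its gold counterpart is an assumption made for brevity.

    def apply_rule(tags, rule):
        """Apply one transformation: rule = (from_tag, to_tag, prev_tag)."""
        a, b, z = rule
        return [b if t == a and i > 0 and tags[i - 1] == z else t
                for i, t in enumerate(tags)]

    def score(tags, gold):
        """Objective function from the slide: # correct - # incorrect."""
        correct = sum(t == g for t, g in zip(tags, gold))
        return correct - (len(gold) - correct)

    def learn_tbl(tags, gold, tagset, max_rules=500):
        """Greedy TBL loop; tags = output of the initial-state tagger."""
        rules = []
        while len(rules) < max_rules:
            candidates = [(a, b, z) for a in tagset for b in tagset if a != b
                          for z in tagset]
            best = max(candidates,
                       key=lambda r: score(apply_rule(tags, r), gold))
            if score(apply_rule(tags, best), gold) <= score(tags, gold):
                break                      # no transformation improves performance
            rules.append(best)             # add to the ordered rule list
            tags = apply_rule(tags, best)  # update the training corpus
        return rules

Brill's actual learner indexes errors by <incorrect tag, desired tag> triples (next slide) rather than re-scoring every candidate against the whole corpus; the brute-force scoring here is only meant to show the control flow.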

Generating Transformations
1. Apply the initial tagger and compile the types of tagging errors. Each error type has the form <incorrect tag, desired tag, # of occurrences>.
2. For each error type, instantiate all templates to generate candidate transformations.
3. Apply each candidate transformation to the corpus and count the number of corrections and errors that it produces.
4. Save the transformation that yields the greatest improvement.
5. Stop when no transformation can reduce the error rate by a predetermined threshold.

Example
Suppose the initial tagger mistags 159 words as verbs when they should have been nouns, producing the error triple <verb, noun, 159>. Suppose template #3 is instantiated as the rule: change the tag from verb to noun if one of the two preceding words is tagged as a determiner. When this rule is applied to the corpus, it corrects 98 of the 159 errors, but it also creates 18 new errors, so the error reduction is 98 - 18 = 80.

Tagging New Text
The resulting tagger consists of two phases (see the sketch below):
1. Use the initial tagger to tag all the text.
2. Apply each transformation, in order, to the corpus to correct some of the errors.
The order of the transformations is very important! It is possible for a word's tag to change several times as different transformations are applied; in fact, a word's tag could thrash back and forth between the same two tags.

Evaluation
- Training: 600,000 words from the Penn Treebank WSJ corpus
- Testing: a separate 150,000 words from the PTB
- Assumes all possible tags for all test-set words are known
- 97.0% accuracy; the tagger learned 378 rules

Learned Rules
1. NN -> VB if the previous tag is TO: I wanted to/TO win/NN->VB a Subaru WRX
2. VBP -> VB if one of the prev-3 tags is MD: The food might/MD vanish/VBP->VB from sight
3. NN -> VB if one of the prev-2 tags is MD: I might/MD not reply/NN->VB
4. VB -> NN if one of the prev-2 tags is DT
5. VBD -> VBN if one of the prev-3 tags is VBZ
6. VBN -> VBD if the previous tag is PRP

Problems?
Not lexicalized:
- Transformations are entirely tag-based; no specific words were used in the rules. But certain phrases and lexicalized expressions can yield idiosyncratic tag sequences, so allowing the rules to look for specific words should help.
- Adding additional templates (e.g. "the preceding/following word is w"), the tagger achieves 97.2% accuracy and learns 447 rules; the first 200 rules achieved 97.0%, the first 100 rules 96.8%.
Unknown words
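The two-phase tagging procedure itself is only a few lines, given the pieces already sketched. The snippet below assumes tag_baseline and apply_rule from the earlier sketches are in scope, with tag_baseline standing in for whatever initial-state tagger was used.

    def tbl_tag(words, lexicon, default, rules):
        """Two-phase TBL tagging: initial tagger, then ordered rewrite rules."""
        tags = tag_baseline(words, lexicon, default)  # phase 1: initial tagger
        for rule in rules:                 # phase 2: apply rules in learned
            tags = apply_rule(tags, rule)  # order; reordering changes output
        return list(zip(words, tags))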

Transformation-Based Learning: Applications
- Part-of-speech tagging [Brill 1995; Ramshaw & Marcus 1994]
- Prepositional phrase attachment [Brill & Resnik 1994]
- Syntactic parsing [Brill 1994]
- Noun phrase chunking [Ramshaw & Marcus 1995, 1999]
- Context-sensitive spelling correction [Mangu & Brill 1997]
- Dialogue act tagging [Samuel et al. 1998]

Part-of-Speech Tagging
- Task specification
- Why is POS tagging difficult
- Transformation-based learning approach [Brill 93]
- Hidden Markov Models
- Named Entity Recognition

Hidden Markov Models
Application to POS tagging: view POS tagging as a sequence of word classification tasks.
Goal: train an HMM to label every word with one of the POS tags.
What is an HMM? A Hidden Markov Model represents a process that generates the word and tag sequence. It is a probabilistic model:
- it assigns a probability to each word and tag sequence,
- and we predict the most likely tag sequence for a given word sequence.

States and Transitions
States:
- think of them as the nodes of a graph,
- one for each POS tag,
- plus a special start state (and maybe an end state).
Transitions:
- think of them as directed edges in the graph,
- edges carry transition probabilities.
Output:
- each state also produces a word of the sequence,
- so a sentence is generated by a walk through the graph.

Probabilistic Model
- Starting state s_0: specifies where the sequence starts.
- Transition probability P(S_t | S_t-1): the probability that one state succeeds another; a matrix of size #states x #states.
- Emission probability P(W_t | S_t): the probability that a word is generated in this state; a matrix of size #states x #words.
=> Every word + state sequence has a probability (computed in the sketch below):
P(W, S) = P(w_1, ..., w_n, s_1, ..., s_n) = prod_{i=1..n} P(w_i | s_i) P(s_i | s_{i-1}), with s_0 = s_start

HMM Inference Type I: Evaluation
Question: what is the probability of an output sequence given an HMM?
Given a fully specified HMM (s_0, P(W_t | S_t), P(S_t | S_t-1)), find, for a given w_1, ..., w_n:
P(w_1, ..., w_n) = sum over (s_1, ..., s_n) of prod_{i=1..n} P(w_i | s_i) P(s_i | s_{i-1})
The naive algorithm has exponential runtime; the forward algorithm is linear in the length of the sequence.
This defines a language model. Example: classify sequences as question vs. answer sentences.
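The joint probability formula translates directly into code. Here is a minimal sketch, assuming transition and emission tables stored as nested dicts of probabilities (an illustrative representation, not something the slides specify); log-space avoids underflow on long sentences.

    import math

    def log_joint(words, states, trans, emit, start="<s>"):
        """log P(W,S) = sum_i [ log P(w_i|s_i) + log P(s_i|s_{i-1}) ].
        trans[prev][cur] and emit[state][word] hold probabilities."""
        logp, prev = 0.0, start
        for w, s in zip(words, states):
            logp += math.log(trans[prev][s]) + math.log(emit[s][w])
            prev = s
        return logp

    # Toy usage with made-up numbers:
    trans = {"<s>": {"DT": 0.8, "NN": 0.2},
             "DT": {"DT": 0.1, "NN": 0.9},
             "NN": {"DT": 0.3, "NN": 0.7}}
    emit = {"DT": {"the": 0.9, "can": 0.1},
            "NN": {"the": 0.05, "can": 0.95}}
    print(log_joint(["the", "can"], ["DT", "NN"], trans, emit))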

HMM Inference Type II: Decoding
Question: what is the most likely state sequence given an output sequence?
Given a fully specified HMM (s_0, P(W_t | S_t), P(S_t | S_t-1)), find:
max over (s_1, ..., s_n) of P(s_1, ..., s_n | s_0, w_1, ..., w_n) = max over (s_1, ..., s_n) of prod_{i=1..n} P(w_i | s_i) P(s_i | s_{i-1})
The Viterbi algorithm has runtime linear in the length of the sequence.
Example: find the most likely tag sequence for a given sequence of words.

Experimental Results
Experiment setup: WSJ corpus; trigram HMM model, lexicalized; from [Pla and Molina, 2001].

  Tagger | Accuracy | Training time | Prediction time
  HMM    | 96.80%   | 20 sec        | 8,000 words/s
  TBL    | 96.47%   | 9 days        | 750 words/s

Estimating the Probabilities
Given: fully observed data, i.e. pairs of word sequences with their state sequences.
Estimating the transition probabilities P(S_t | S_t-1):
  P(s_a | s_b) = (# of times state a follows state b) / (# of times state b occurs)
Estimating the emission probabilities P(W_t | S_t):
  P(w_a | s_b) = (# of times word a is observed in state b) / (# of times state b occurs)
Smoothing the estimates: Laplace smoothing -> uniform prior (see naive Bayes for text classification).
Partially observed data: Expectation Maximization (EM).

HMMs for POS Tagging
Design the HMM structure (vanilla):
- States: one state per POS tag
- Transitions: fully connected
- Emissions: all words observed in the training corpus
Estimate the probabilities:
- use a corpus, e.g. the Treebank
- smoothing: unseen words?
Tagging new sentences: use Viterbi to find the most likely tag sequence (both steps are sketched below).
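Putting the last two slides together, here is a compact sketch of the vanilla HMM tagger: count-based estimates with Laplace smoothing (which also covers unseen words), then Viterbi decoding in log-space. The sentence representation (lists of (word, tag) pairs) and the pseudo start tag "<s>" are assumptions for illustration.

    import math
    from collections import Counter, defaultdict

    def train_hmm(tagged_sentences, alpha=1.0):
        """Count-based estimates with Laplace smoothing, as on the slides."""
        trans, emit, vocab = defaultdict(Counter), defaultdict(Counter), set()
        for sent in tagged_sentences:
            prev = "<s>"
            for word, tag in sent:
                trans[prev][tag] += 1   # times tag follows prev
                emit[tag][word] += 1    # times word observed in tag's state
                vocab.add(word)
                prev = tag
        tags = list(emit)

        def p_trans(prev, tag):
            c = trans[prev]
            return (c[tag] + alpha) / (sum(c.values()) + alpha * len(tags))

        def p_emit(tag, word):          # +1 vocabulary slot for unseen words
            c = emit[tag]
            return (c[word] + alpha) / (sum(c.values()) + alpha * (len(vocab) + 1))

        return tags, p_trans, p_emit

    def viterbi(words, tags, p_trans, p_emit):
        """Most likely tag sequence under max prod P(w_i|s_i) P(s_i|s_{i-1})."""
        delta = {t: math.log(p_trans("<s>", t)) + math.log(p_emit(t, words[0]))
                 for t in tags}         # best log-prob of a path ending in t
        backptr = []
        for w in words[1:]:
            prev_delta, step = delta, {}
            delta = {}
            for t in tags:
                p, lp = max(((p, prev_delta[p] + math.log(p_trans(p, t)))
                             for p in prev_delta), key=lambda x: x[1])
                delta[t] = lp + math.log(p_emit(t, w))
                step[t] = p
            backptr.append(step)
        best = max(delta, key=delta.get)   # trace back from the best final tag
        seq = [best]
        for step in reversed(backptr):
            best = step[best]
            seq.append(best)
        return list(reversed(seq))

    # Toy usage (hypothetical two-sentence corpus):
    corpus = [[("the", "DT"), ("can", "NN"), ("rusted", "VBD")],
              [("he", "PRP"), ("can", "MD"), ("race", "VB")]]
    tags, p_trans, p_emit = train_hmm(corpus)
    print(viterbi(["the", "can"], tags, p_trans, p_emit))

Note this is a bigram (vanilla) model for clarity; the lexicalized trigram model in the results table conditions on the two previous tags and on the words themselves, but the estimation and decoding steps have the same shape.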