Computational Linguistics


Computational Linguistics Part-of-Speech Tagging Suhaila Saee & Bali Ranaivo-Malançon Faculty of Computer Science and Information Technology Universiti Malaysia Sarawak August 2014

Part of Speech (POS), aka word class, grammatical category, or lexical category: a category of words based on their grammatical function

POS Tagging The automatic process of choosing the best POS tag for each word in a text, and guessing the tags of unknown words. The tool that does this is called a POS tagger. Example of a tagged sentence: Must/md-hl Berlin/np-hl remain/vb-hl divided/vbn-hl ?/.-hl

Tagset A list of word classes, different from the traditional grammar word classes (only 8 classes). Size: a large tagset gives fine distinctions but lower accuracy; a small tagset gives higher accuracy; a very small tagset carries less information. Consistency: words with the same meaning and function should be tagged with the same tag.

Example of Penn Treebank Tagset (SOURCE: MARCUS ET AL., 1993)

One Word, Multiple Tags Example: The word back can be: Adjective as in "The back door" Common noun as in "On my back" Adverb as in "Win the voters back" Verb as in "Promised to back the bill"

Multiple Words One lexical word can be made up of multiple tokens. Some tagsets allow multiple tokens to be treated as a single word by adding numbers to each tag. Example: In the C7 tagset: in/ii31 spite/ii32 of/ii33

Multipart Words A word can be composed of multiple units. Some tagged corpora split certain words. Example: In the Penn Treebank: would/md n't/rb, children/nns 's/pos

POS Tagging Methods

TAGGIT (Greene and Rubin, 1971) Used to tag the Brown corpus. Two steps: 1) Initial tag selection: identify all possible tags for a token. 2) Tag disambiguation: choose the most appropriate tag. Uses 86 basic tags. Accuracy: 77%.
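The initial tag selection step is essentially a lexicon lookup. A minimal sketch in Python; the mini-lexicon, its tags, and the unknown-word fallback are hypothetical, for illustration only:

```python
# Hypothetical mini-lexicon: token -> list of possible tags (step 1 only)
lexicon = {"the": ["AT"], "can": ["MD", "NN", "VB"], "rusts": ["VBZ", "NNS"]}

def initial_tags(tokens):
    """Step 1: list every possible tag per token; unknown words get a noun guess."""
    return [lexicon.get(tok.lower(), ["NN"]) for tok in tokens]

print(initial_tags(["The", "can", "rusts"]))
# [['AT'], ['MD', 'NN', 'VB'], ['VBZ', 'NNS']]
```

Step 2 (disambiguation) would then pick one tag per token from these candidate lists.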

Markov Models An alternative to laborious and time-consuming manual tagging: extract linguistic knowledge automatically from large corpora. A Markov model is a finite state automaton (a collection of states connected by transitions). Each state has two probability distributions: 1) the probability of emitting a symbol, and 2) the probability of moving to a particular state (the probabilistic transition function). From one state, the Markov model emits a symbol and then moves to another state. The simplest Markov model is the Markov chain.
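A Markov chain over POS tags can be sketched as follows; the states and transition probabilities below are invented for illustration, not estimated from any corpus:

```python
# Toy transition probabilities between POS states (illustrative numbers only)
transitions = {
    "<s>": {"DT": 0.6, "NN": 0.3, "VB": 0.1},
    "DT":  {"NN": 0.9, "JJ": 0.1},
    "JJ":  {"NN": 1.0},
    "NN":  {"VB": 0.5, "NN": 0.2, "</s>": 0.3},
    "VB":  {"DT": 0.4, "NN": 0.3, "</s>": 0.3},
}

def sequence_probability(tags):
    """Probability of a tag sequence under the chain (with start/end states)."""
    p = 1.0
    prev = "<s>"
    for tag in tags + ["</s>"]:
        p *= transitions.get(prev, {}).get(tag, 0.0)
        prev = tag
    return p

print(sequence_probability(["DT", "NN", "VB"]))  # 0.6 * 0.9 * 0.5 * 0.3
```

A sequence that never follows the chain's transitions (e.g. DT after VB followed by an end state) gets probability zero.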

Markov Models (cont'd): Markov chain diagram (SOURCE: JIA LI, 2006)

Hidden Markov Model (HMM) Tagging An HMM is a mathematical model of a stochastic process (hence "stochastic tagging"). An HMM tagger can be trained from an untagged corpus, but a lexicon describing the possible tags for tokens is required. OBSERVED: the tokens in the sentence. HIDDEN: the POS tags. DECODING: uncovering the most likely state sequence (sequence of POS tags) that could have generated the observation sequence (sequence of tokens).
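Decoding is standardly done with the Viterbi algorithm. A minimal sketch, assuming a toy two-tag model whose probabilities are invented for illustration:

```python
def viterbi(tokens, tagset, trans, emit, start):
    """Most likely hidden tag sequence for an observed token sequence."""
    # best[t] = (probability, path) of the best path ending in tag t
    best = {t: (start.get(t, 0.0) * emit[t].get(tokens[0], 0.0), [t]) for t in tagset}
    for tok in tokens[1:]:
        new = {}
        for t in tagset:
            # Best predecessor for tag t, then multiply in the emission probability
            p, path = max(
                ((best[prev][0] * trans[prev].get(t, 0.0), best[prev][1]) for prev in tagset),
                key=lambda x: x[0],
            )
            new[t] = (p * emit[t].get(tok, 0.0), path + [t])
        best = new
    return max(best.values(), key=lambda x: x[0])

# Toy model: two hidden tags, illustrative probabilities only
tags = ["NN", "VB"]
start = {"NN": 0.7, "VB": 0.3}
trans = {"NN": {"NN": 0.3, "VB": 0.7}, "VB": {"NN": 0.8, "VB": 0.2}}
emit = {"NN": {"fish": 0.6, "sleep": 0.4}, "VB": {"fish": 0.3, "sleep": 0.7}}

prob, path = viterbi(["fish", "sleep"], tags, trans, emit, start)
print(path)  # ['NN', 'VB']
```

Real taggers work in log space to avoid underflow on long sentences; the structure of the algorithm is the same.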

Transformation-based Learning (TBL) Transforms one state to another using transformation rules. GOAL: to find the suitable tag for each token. INPUT: a corpus. OUTPUT: an ordered sequence of transformations. Components: 1) an initial annotator, 2) transformations, 3) an objective function.

Brill Tagger Brill's tagging is an instance of TBL. Like rule-based taggers, it is based on rules that determine when an ambiguous word should receive a given tag. Like stochastic taggers, it has a machine-learning component: the rules are automatically induced from a previously tagged training corpus.

Brill Tagger Algorithm 1) Initialisation: Known words are assigned the most frequent tag associated with a form of the word. Unknown words are tagged as proper noun if capitalised, otherwise as common noun; lexical rules for guessing the tags of unknown words are learned on the same basis as the contextual rules. 2) Learning phase: Iteratively compute the error score of each candidate rule (the difference between the number of errors before and after applying the rule); select the best (highest-scoring) rule; add it to the rule set and apply it to the text. Repeat until no rule has a score above a given threshold, or until applying new rules leaves the text unchanged. (Modified from Wikipedia: "Brill tagger")
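The learning phase can be sketched as a greedy loop over candidate rules. This sketch uses a single hypothetical rule template, "change tag A to B when the previous tag is C", and invented toy data; Brill's actual tagger uses a richer set of templates:

```python
def apply_rule(tags, rule):
    """Rule = (from_tag, to_tag, prev_tag): retag positions where the context matches."""
    frm, to, prev = rule
    out = list(tags)
    for i in range(1, len(tags)):
        if tags[i] == frm and tags[i - 1] == prev:
            out[i] = to
    return out

def learn_rules(current, gold, tagset, threshold=1):
    """Greedy TBL loop: repeatedly keep the rule with the best error reduction."""
    rules = []
    while True:
        candidates = [(f, t, p) for f in tagset for t in tagset for p in tagset if f != t]
        def score(rule):
            fixed = apply_rule(current, rule)
            before = sum(c != g for c, g in zip(current, gold))
            after = sum(c != g for c, g in zip(fixed, gold))
            return before - after  # errors removed minus errors introduced
        best = max(candidates, key=score)
        if score(best) < threshold:
            return rules
        rules.append(best)
        current = apply_rule(current, best)

# Toy example: the initial tagger mistags a verb after "TO" as a noun
gold    = ["TO", "VB", "DT", "NN"]
initial = ["TO", "NN", "DT", "NN"]
print(learn_rules(initial, gold, {"TO", "VB", "DT", "NN"}))
```

On this toy input the loop learns the single rule ("NN", "VB", "TO") and then stops, since no remaining rule reaches the threshold.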

Other Tagging Methods Neural network tagging (Schmid, 1994) Memory-based nearest-neighbour tagging (Daelemans et al., 1996) Decision tree tagging (Schmid, 1997) Maximum entropy tagging (Ratnaparkhi, 1996)

Accuracy The most widely used performance measure for POS taggers: the degree of correctness. Accuracy = correctly tagged tokens / total tokens. The tagger's result is compared to an annotated reference corpus (the Gold Standard). A fair comparison requires that the tagger and the reference corpus use the same word segmentation convention and the same tagset. Most current tagging algorithms have an accuracy of around 96-97% for simple tagsets like the Penn Treebank tagset. These accuracies are for words and punctuation; accuracy for words only would be lower.
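The accuracy formula above is straightforward to compute; a minimal sketch with invented tag sequences:

```python
def accuracy(predicted, gold):
    """Fraction of tokens whose predicted tag matches the gold standard."""
    assert len(predicted) == len(gold), "sequences must use the same segmentation"
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)

predicted = ["DT", "NN", "VB", "NN", "."]
gold      = ["DT", "NN", "VB", "JJ", "."]
print(accuracy(predicted, gold))  # 4 correct out of 5 -> 0.8
```

The length assertion reflects the requirement that tagger and reference corpus share the same word segmentation convention.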

What May Influence the Accuracy Measure? Amount of training data: more is better. Size and quality of the tagset: a large tagset means more potential ambiguity, so tagging becomes harder. Difference between the training corpus and dictionary on the one hand and the corpus of application on the other: "If training and application text are drawn from the same source (e.g., the same time period of a particular newspaper), then accuracy will be high." Occurrence of unknown words in the test data: many unknown words lead to poor performance.

Other Metrics Let t_i be the set of tags assigned to the i-th word w_i by a tagger, and r_i the set of tags assigned to the same word in the reference corpus. Then:
Recall(w_i) = |t_i ∩ r_i| / |r_i|
Precision(w_i) = |t_i ∩ r_i| / |t_i|
F(w_i) = 1 / (α / Precision + (1 − α) / Recall)
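These set-based metrics can be computed directly; the tag sets in the example are invented for illustration:

```python
def prf(t_i, r_i, alpha=0.5):
    """Precision, recall and F-measure for tagger tag set t_i vs reference tag set r_i."""
    overlap = len(t_i & r_i)          # |t_i ∩ r_i|
    precision = overlap / len(t_i)
    recall = overlap / len(r_i)
    f = 1.0 / (alpha / precision + (1 - alpha) / recall)
    return precision, recall, f

# A tagger that assigns {NN, JJ} where the reference has only {NN}
p, r, f = prf({"NN", "JJ"}, {"NN"})
print(p, r, f)  # 0.5, 1.0, and F = 1 / (0.5/0.5 + 0.5/1.0) = 2/3
```

With α = 0.5 this F-measure is the usual harmonic mean of precision and recall.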

Performance Analysis We should interpret an accuracy score relative to a lower-bound baseline and an upper-bound ceiling. The choice of baseline is somewhat arbitrary, but it usually corresponds to minimal knowledge about the domain; for POS tagging, the baseline can be a unigram tagger. The human ceiling: how often do human annotators agree on the same tag?
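A unigram baseline simply assigns each word its most frequent tag in the training data. A minimal sketch on invented data, with a hypothetical NN fallback for unseen words:

```python
from collections import Counter, defaultdict

def train_unigram(tagged_corpus):
    """Baseline: map each word to its most frequent training tag."""
    counts = defaultdict(Counter)
    for word, tag in tagged_corpus:
        counts[word][tag] += 1
    return {w: c.most_common(1)[0][0] for w, c in counts.items()}

# Toy training data: "back" is NN twice and RB once, so NN wins
train = [("the", "DT"), ("back", "NN"), ("back", "RB"), ("back", "NN"), ("door", "NN")]
model = train_unigram(train)
print([model.get(w, "NN") for w in ["the", "back", "door"]])  # ['DT', 'NN', 'NN']
```

Any tagger worth using should beat this baseline by a clear margin.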

Error Analysis Generate a confusion matrix (for development data) showing how often tag i was mistagged as tag j. The row labels indicate the correct tags; the column labels indicate the tags hypothesized by the tagger. Each cell indicates a percentage of the overall tagging error; e.g., 8.7% of the total errors were caused by mistagging JJ as NN. Errors that commonly cause problems:
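An error-only confusion matrix can be sketched as follows; the tag sequences below are invented for illustration:

```python
from collections import Counter

def confusion(gold, predicted):
    """Percentage of the overall tagging error per (gold_tag, predicted_tag) cell."""
    errors = Counter((g, p) for g, p in zip(gold, predicted) if g != p)
    total = sum(errors.values())
    return {cell: 100.0 * n / total for cell, n in errors.items()}

# Toy data: JJ mistagged as NN twice, VBD mistagged as VBN once
gold      = ["JJ", "NN", "VBD", "JJ", "NN"]
predicted = ["NN", "NN", "VBN", "NN", "NN"]
print(confusion(gold, predicted))
```

Here the (JJ, NN) cell accounts for two thirds of all errors, pointing at the adjective/noun confusion discussed on the next slide.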

Error Analysis (cont'd) Noun (NN) vs proper noun (NNP) vs adjective (JJ) Preterite (VBD) vs participle (VBN) vs adjective (JJ)

Tool: Brill Tagger As mentioned in the previous slides, the Brill tagger is an open-source program. An online demo of the Brill tagger is also available.

Tool: Stanford POS Tagger The tagger is implemented in Java and licensed under the GNU General Public License. For further information, see the Stanford tagger website.

References
Marcus, M. P., Santorini, B., & Marcinkiewicz, M. A. (1993). Building a Large Annotated Corpus of English: The Penn Treebank. Computational Linguistics, 19(2), 313-330.
Li, J. (2006). Hidden Markov Model. Retrieved online.
Schmid, H. (1994). Part-of-Speech Tagging with Neural Networks. In Proceedings of the 15th International Conference on Computational Linguistics (COLING '94).
Schmid, H. (1997). Probabilistic Part-of-Speech Tagging Using Decision Trees. In New Methods in Language Processing (Studies in Computational Linguistics). London: UCL Press.
Daelemans, W., Zavrel, J., Berck, P., & Gillis, S. (1996). MBT: A Memory-Based Part of Speech Tagger-Generator. In Proceedings of the 4th Workshop on Very Large Corpora, Copenhagen.
Ratnaparkhi, A. (1996). A Maximum Entropy Model for Part-of-Speech Tagging. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP).