Word-Classes and Part-of-Speech Tagging

Christopher Brewster
University of Sheffield, Computer Science Department
Natural Language Processing Group
C.Brewster@dcs.shef.ac.uk

Lecture Outline
- Definition and Example
- Motivation
- Word-classes
- A Basic Tagging System
- Transformation-Based Tagging
- Tagging Unknown Words

Definition
Tagging is "the process of assigning a part-of-speech or other lexical class marker to each word in a corpus" (D. Jurafsky and J.H. Martin, 2000, Speech and Language Processing).

  WORDS: the girl kissed the boy on the cheek
  TAGS:  N, V, P, ART

An Example

  word     lemma   tag
  The      the     +DET
  girl     girl    +NOUN
  kissed   kiss    +VPAST
  the      the     +DET
  boy      boy     +NOUN
  on       on      +PREP
  the      the     +DET
  cheek    cheek   +NOUN

(from http://www.xrce.xerox.com/research/mltt/toolhome.html)
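The example above can be sketched as a simple lexicon lookup. This is only an illustration of what a tagger's output looks like; the tiny hand-written lexicon below is invented for the example sentence and sidesteps the ambiguity problem discussed later.

```python
# Minimal illustration of part-of-speech tagging output: each token is
# paired with a (hand-written, hypothetical) lexicon entry giving its
# lemma and tag, mirroring the Xerox example above.
LEXICON = {
    "the":    ("the",   "+DET"),
    "girl":   ("girl",  "+NOUN"),
    "kissed": ("kiss",  "+VPAST"),
    "boy":    ("boy",   "+NOUN"),
    "on":     ("on",    "+PREP"),
    "cheek":  ("cheek", "+NOUN"),
}

def tag(sentence):
    """Return (word, lemma, tag) triples for each whitespace token."""
    return [(w,) + LEXICON[w.lower()] for w in sentence.split()]

for word, lemma, t in tag("The girl kissed the boy on the cheek"):
    print(word, lemma, t)
```

A real tagger must of course decide between competing tags in context rather than reading a single answer off a list.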
Motivation: the uses of Tagging
- Speech synthesis: pronunciation
- Speech recognition: class-based N-grams
- Information retrieval: stemming
- Word-sense disambiguation
- Corpus analysis of language & lexicography

Word Classes
- Basic word classes: Noun, Verb, Adjective, Adverb, Preposition, ...
- Open vs. closed classes. Closed classes, e.g.:
  - determiners: a, an, the
  - pronouns: she, he, I, ...
  - prepositions: on, under, over, near, by, at, from, to, with

Word Classes: Tag sets
- Tag sets vary in size: from a dozen to over 200 tags
- The size of a tag set depends on the language, objectives and purpose
- Simple morphology = more ambiguity = fewer tags
- Some tagging approaches (e.g. constraint-grammar based) make fewer distinctions, e.g. conflating adverbs, particles and interjections

Word Classes: Tag set example

  CC   coordinating conjunction   and, but, or
  CD   cardinal number            one, two, three
  DT   determiner                 a, the
  EX   existential "there"        there
  FW   foreign word               mea culpa
  IN   preposition                of, in, by
  JJ   adjective                  yellow
  JJR  adjective, comparative     bigger
  NN   noun, singular or mass     llama
  NNS  noun, plural               llamas

(from the Penn Treebank part-of-speech tag set)
The Problem
Words often have more than one word class, e.g. "this":
- This is a nice day.    = PR
- This day is nice.      = ADJ
- You can go this far.   = ADV

Word Class Ambiguity (in the Brown Corpus)

  Unambiguous (1 tag)    35,340
  Ambiguous (2-7 tags)    4,100
    2 tags                3,760
    3 tags                  264
    4 tags                   61
    5 tags                   12
    6 tags                    2
    7 tags                    1

(from DeRose, 1988)

A Basic System: the PARTS program
PARTS: A System for Assigning Word Classes to English Texts, L. L. Cherry
- Uses a list of function words, and lists of suffixes and auxiliaries, as its key sources of information
- Many combination classes, e.g. noun_adj
- Words that are members of more than two classes are initially assigned unk

Input
List of function words and irregular verbs with tags:

  able, adj        will, aux       or, conj
  outside, prep    every, adj      do, auxv
  but, conj        up, prep        own, adj
  be, be           begun, ed       over, prep
  ago, adj_adv     and, conj       bitten, ed
  until, prep_adv

List of suffixes with the most probable tag for words with that suffix:

  ic, adj       ship, noun      age, noun     ment, noun
  ance, noun    ant, noun_adj   ize, verb     ary, adj

- Suffixes were chosen by hand: if most words with a suffix have only one or two tags, this single or combined class is assigned; exceptions are added to an exception list
- The exception list contains many obscure words
- The remaining input is the text to be tagged
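The lookup strategy just described can be sketched as follows. The function-word and suffix lists here are only the fragments quoted on the slide, not Cherry's full tables, and the `initial_tag` function is an assumed simplification of the real PARTS lookup.

```python
# Sketch of PARTS-style initial tag assignment: closed-list lookup first,
# then suffix lookup, otherwise "unk". Lists are the slide's fragments.
FUNCTION_WORDS = {
    "able": "adj", "will": "aux", "or": "conj", "outside": "prep",
    "every": "adj", "do": "auxv", "but": "conj", "up": "prep",
    "own": "adj", "be": "be", "begun": "ed", "over": "prep",
    "ago": "adj_adv", "and": "conj", "bitten": "ed", "until": "prep_adv",
}
SUFFIXES = {
    "ic": "adj", "ship": "noun", "age": "noun", "ment": "noun",
    "ance": "noun", "ant": "noun_adj", "ize": "verb", "ary": "adj",
}

def initial_tag(word):
    """Assign a first-pass class: function-word list, then suffix list."""
    w = word.lower()
    if w in FUNCTION_WORDS:
        return FUNCTION_WORDS[w]
    for suffix, tag in SUFFIXES.items():
        if w.endswith(suffix):
            return tag
    return "unk"

print(initial_tag("government"))  # "ment" suffix -> noun
print(initial_tag("And"))        # function word -> conj
print(initial_tag("cat"))        # no match -> unk
```

Words left as `unk` here are exactly those handed on to the later contextual steps of the program.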
Step 1: pre-processing
1. Tokenises words and sentences:
   - word = string of characters separated by blanks or punctuation
   - sentence = string of words ending in . ? ! (other punctuation is treated as a comma)
2. Marks capitalised words not starting a sentence as noun_adj
3. Marks hyphenated words as noun_adj
4. Looks up function words & irregular verbs in the list

Step 2: suffix analysis
1. Applies to words NOT assigned tags in step 1
2. Looks up the suffix list
3. Unassigned words go on to step 3

Step 3: word class assignment
1. Finds the verb in the sentence (using auxiliaries)
2. Finds nouns
3. Applies a set of rules of the form: verb_adj & ~a => verb
   i.e. if the word has been assigned the class verb_adj and the verb has not yet been recognised in the sentence, assign verb to it

Results and example
- 95% correct assignment
- 41.5% of errors arise from noun-adjective confusion

Example: They act as messengers for the legislators.

  initial: pronp  unk   prep_adv  nv_pl  prep_adv  art  nv_pl
  final:   pron   verb  prep      noun   prep      art  noun
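A step-3 rule of the kind shown above can be sketched like this. The rule semantics are my reading of the slide's gloss ("if the word is verb_adj and no verb has yet been recognised in the sentence, call it a verb"); the function name and list representation are invented for illustration.

```python
# Sketch of one PARTS step-3 disambiguation rule (assumed semantics):
# resolve a verb_adj word to verb when the sentence has no verb yet.
def apply_verb_rule(tags):
    """tags: one class label per word in a sentence; returns a new list."""
    verb_seen = any(t == "verb" for t in tags)
    out = []
    for t in tags:
        if t == "verb_adj" and not verb_seen:
            out.append("verb")   # resolve the combined class to verb
            verb_seen = True     # the sentence now has its verb
        else:
            out.append(t)
    return out

print(apply_verb_rule(["pronp", "verb_adj", "noun"]))
print(apply_verb_rule(["verb", "verb_adj"]))  # verb already found: no change
```

The full system runs many such rules, each narrowing a combination class using what has already been established about the sentence.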
Other methods: Stochastic Tagging
- Not based on rules, but on the probability of a certain tag occurring, given various possibilities
- Requires a TRAINING CORPUS, i.e. a hand-tagged text, from which to derive probabilities
- Problem: no probabilities for words not in the corpus
- Problem: bad results if the training corpus is very different from the test corpus

Stochastic tagging: a baseline
- Method: choose the most frequent tag in the training text for each word
- Result: 90% accuracy
- Reason: cf. the figures on word class ambiguity, where roughly 90% of word types have only one tag
- Therefore this is a baseline, and any other method must do significantly better (cf. HMM tagging, lecture of Nick Webb)

Transformation-Based Learning Tagging (Brill Tagging)
- A combination of rule-based AND stochastic tagging methodologies
- Like rule-based tagging, because rules are used to specify tags in a certain environment
- Like the stochastic approach, because machine learning is used, with a tagged corpus as input
- Input: a tagged corpus and a dictionary (with the most frequent tags)

TBL: Rule Application
Example rule: Change NN to VB when the previous tag is TO.
For example, "race" has the following probabilities in the Brown corpus:

  P(NN | race) = .98
  P(VB | race) = .02

so

  is/VBZ expected/VBN to/TO race/NN tomorrow/NN

becomes

  is/VBZ expected/VBN to/TO race/VB tomorrow/NN
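The two ideas above fit together in a few lines: tag every word with its most frequent training-corpus tag, then apply the Brill rule to repair "race" after "to". The tiny training list is invented to reproduce the slide's example; it is not Brown corpus data.

```python
from collections import Counter, defaultdict

# Unigram "most frequent tag" baseline plus one Brill rule
# ("change NN to VB when the previous tag is TO"). Toy training data.
def train_baseline(tagged_pairs):
    counts = defaultdict(Counter)
    for word, tag in tagged_pairs:
        counts[word][tag] += 1
    return {w: c.most_common(1)[0][0] for w, c in counts.items()}

train = [("is", "VBZ"), ("expected", "VBN"), ("to", "TO"),
         ("race", "NN"), ("race", "NN"), ("race", "VB"), ("tomorrow", "NN")]
model = train_baseline(train)

sentence = "is expected to race tomorrow".split()
tags = [model[w] for w in sentence]   # "race" -> NN, its commonest tag

# Brill rule: NN becomes VB immediately after TO.
tags = [("VB" if t == "NN" and i > 0 and tags[i - 1] == "TO" else t)
        for i, t in enumerate(tags)]
print(list(zip(sentence, tags)))
```

Note that the rule consults the *baseline* tags for context, which is why "tomorrow" (after the NN of "race") is left alone.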
TBL: Rule Learning
Two parts to a rule:
- a triggering environment
- a rewrite rule
The range of triggering environments, or templates, covers the window of tags t(i-3) t(i-2) t(i-1) t(i) t(i+1) t(i+2) t(i+3) around the current position: each of schemas 1-9 conditions on the tag at a different position (or span of positions) in this window (see the table in Manning & Schutze 1999:363).
Templates are like underspecified rules: replace tag X with tag Y, provided tag Z or word Z appears in some position.

TBL: Rule Learning (2)
- Rules are learned in an ordered sequence: at each iteration of the learning algorithm, whichever rule gives the best net improvement is chosen
- Rules may interact, i.e. Rule 1 may make a change which provides the context for Rule 2 to fire
- The rule set is compact (a few hundred rules) and can be inspected by humans (vs. the impossibility of inspecting HMM transition probabilities)

TBL: the Algorithm
Step 1: Label every word with its most likely tag (from the dictionary)
Step 2: Check every possible transformation & select the one which most improves the tagging (with respect to the hand-tagged corpus)
Step 3: Re-tag the corpus, applying the rules
Repeat steps 2-3 until some stopping criterion is reached, e.g. x% correct with respect to the training corpus
RESULT: a sequence of transformation rules

TBL: Problems
- Execution speed: a TBL tagger is slow compared to the HMM approach
  Solution: compile the rules into a Finite State Transducer (FST)
- Learning speed: Brill's implementation took over a day (600k tokens)
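The algorithm above can be sketched on a toy scale. This greedy loop uses only one template ("change X to Y when the previous tag is Z") rather than Brill's full schema set, and the five-word "corpus" is invented; it is a sketch of the learning idea, not Brill's implementation.

```python
# Toy transformation-based learning: start from most-likely tags, then
# greedily learn (from_tag, to_tag, prev_tag) rules that best reduce
# errors against the gold standard. Single-template sketch only.
def apply_rule(tags, rule):
    frm, to, prev = rule
    return [to if i > 0 and tags[i] == frm and tags[i - 1] == prev else tags[i]
            for i in range(len(tags))]

def tbl_learn(words, gold, most_likely):
    tags = [most_likely[w] for w in words]      # Step 1: initial labelling
    rules = []
    while True:
        # Step 2: candidate rules proposed at current error positions
        candidates = {(tags[i], gold[i], tags[i - 1])
                      for i in range(1, len(tags)) if tags[i] != gold[i]}
        def gain(rule):                         # net improvement of a rule
            new = apply_rule(tags, rule)
            return (sum(n == g for n, g in zip(new, gold))
                    - sum(t == g for t, g in zip(tags, gold)))
        best = max(candidates, key=gain, default=None)
        if best is None or gain(best) <= 0:     # stopping criterion
            break
        rules.append(best)
        tags = apply_rule(tags, best)           # Step 3: re-tag
    return rules, tags

words = "is expected to race tomorrow".split()
gold = ["VBZ", "VBN", "TO", "VB", "NN"]
likely = {"is": "VBZ", "expected": "VBN", "to": "TO",
          "race": "NN", "tomorrow": "NN"}
rules, tags = tbl_learn(words, gold, likely)
print(rules)   # learns the familiar NN -> VB after TO rule
```

On this corpus a single learned rule suffices; on real data the loop runs for a few hundred iterations, which is exactly the learning-speed problem noted above.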
Tagging Unknown Words
- New words are added to (newspaper) language at 20+ per month, plus many proper names; unknown words increase error rates by 1-2%
- Method 1: assume they are nouns
- Method 2: assume the unknown words have a probability distribution similar to hapax legomena
- Method 3: use capitalisation, suffixes, etc. This works very well for morphologically complex languages

Further Reading
Introductory:
- Jurafsky, Daniel & James H. Martin. Speech and Language Processing. Prentice Hall, 2000. Chapter 8, pp. 285-322
- Manning, Christopher & Hinrich Schütze. Foundations of Statistical Natural Language Processing. Chapter 10, pp. 341-380
Texts:
- Brill, Eric. Transformation-based error-driven learning and natural language processing: a case study in part-of-speech tagging. Computational Linguistics 21:543-565
- Cherry, L. PARTS: a system for assigning word classes to English text. AT&T memorandum, 1978
- Church, K. A stochastic parts program and noun phrase parser for unrestricted text. Second Conference on Applied NLP, Austin, 1988
- Garside, Roger, Geoffrey Sampson and Geoffrey Leech (eds). The Computational Analysis of English: a corpus-based approach. London, 1987
Also check the papers referred to in the Introductory references.