Chapter 8: Part-of-Speech Tagging (POS Tagging) See Manning & Schütze Chapter 10


Marianna Manning
5 years ago

2 Overview
- Task
- Brill tagger (rule-based)
- HMM tagger (statistical)

3 Goal of Part-of-Speech Tagging
Determine, in a simple way, the grammatical function of each word.

4 Goal of Part-of-Speech Tagging
Examples of tags:

Tag   Description               Example
CC    Coordinating conjunction  and, but, or
CD    Cardinal number           one, two, three
DT    Determiner                a, the
JJ    Adjective                 yellow
NN    Noun, singular or mass    province
NNS   Noun, plural              houses, apples
IN    Preposition               in
VB    Verb, base form           eat
VBD   Verb, past tense          ate

Example sentence:
The  representative  put  chairs  on  the  table  .
DT   NN              VBD  NNS     IN  DT   NN     .

5 Goal of Part-of-Speech Tagging
One more example sentence:
Next  you  flour  the  pan  .
JJ    PRP  VBP    DT   NN   .

More examples of ambiguous words:
- play: NN (a new play), VBP (to play)
- bear: NN (the bear), VB (to bear)

6 How difficult is the task?
Roughly 10% of the tokens (running words) are ambiguous.
There is also an OOV (out-of-vocabulary) problem!

7 Applications of Tagging
- Partial parsing: syntactic analysis.
- Information extraction: tagging and partial parsing help identify useful terms and the relationships between them.
- Information retrieval: noun phrase recognition and query-document matching based on meaningful units rather than individual terms.
- Question answering: analyzing a query to understand what type of entity the user is looking for and how it is related to other noun phrases mentioned in the question.

8 Brill Tagger: Transformation-Based Learning (TBL)

9 Transformation-Based Tagging (Brill Tagging)
Idea:
- Assign each word its most likely tag
- Learn rules that correct the resulting errors
- Combination of rule-based and machine-learning approaches
Example: "The bear"
Most likely tags: DT VB
Transformation rule: VB → NN if the previous tag is DT
Corrected tags: DT NN

10 Rule Learning
- Problem: one could apply transformations forever
- Constrain the set of transformations with templates: "Replace tag X with tag Y, provided tag Z or word Z appears in some position"
- Rules are learned in an ordered sequence
- Rules may interact
- Rules are compact and can be inspected by humans

11 Brill Tagger
Types of rules:
- Tag-triggered
- Word-triggered
- Morphology-triggered (unknown words!)
Tagging algorithm:
  Assign each word its default tag
  For each rule
    For all positions in the text
      If the rule is applicable: change the tag accordingly
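The tagging algorithm above can be sketched in a few lines of Python. This is a minimal illustration, not Brill's implementation: the `Rule` type, the `apply_rules` function, and the toy lexicon are hypothetical names introduced here, only the "previous tag" template is covered, and changes are assumed to take effect immediately while scanning left to right.

```python
from typing import Dict, List, NamedTuple


class Rule(NamedTuple):
    """Hypothetical tag-triggered rule: from_tag -> to_tag if the previous tag is prev_tag."""
    from_tag: str
    to_tag: str
    prev_tag: str


def apply_rules(words: List[str], lexicon: Dict[str, str], rules: List[Rule],
                default_tag: str = "NN") -> List[str]:
    # Step 1: assign each word its most likely tag (default_tag for unknown words).
    tags = [lexicon.get(w.lower(), default_tag) for w in words]
    # Step 2: apply every rule, in its learned order, at every position.
    for rule in rules:
        for i in range(1, len(tags)):
            if tags[i] == rule.from_tag and tags[i - 1] == rule.prev_tag:
                tags[i] = rule.to_tag
    return tags


# Toy data taken from the "The bear" example.
lexicon = {"the": "DT", "bear": "VB"}
rules = [Rule("VB", "NN", "DT")]
print(apply_rules(["The", "bear"], lexicon, rules))  # ['DT', 'NN']
```

Because rules run in sequence over the whole text, a later rule sees the output of every earlier rule, which is why the learned order matters.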

12 Most likely tags
Each entry lists a word followed by its possible tags:

thanksgiving  NN
Thanks        NNS UH
thanks        NNS VBZ VB UH
thank         VB VBP
the           DT VBD VBP NN DT IN JJ NN NNP PDT

See LEXICON

13 TBL: Rule Learning
A rule has 2 parts:
- Triggering environment
- Rewrite rule
The range of triggering environments of templates (from Manning & Schütze 1999: 363):
[Table: schemas 1 to 9 over the tag positions t_{i-3}, t_{i-2}, t_{i-1}, t_i, t_{i+1}, t_{i+2}, t_{i+3}; a * marks the positions each schema inspects]

14 Templates for TBL
See CONTEXTUALRULEFILE

15 Using Morphological Information

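16 Training of the Brill Tagger
  C_0 := corpus tagged with the most likely tag of each word
  k := 0
  Do {
    v := the transformation u_i that minimizes E(u_i(C_k))
    If ( E(C_k) - E(v(C_k)) ) < ε then break
    C_{k+1} := v(C_k)
    τ_{k+1} := v
    k++
  }
  print τ_1, τ_2, ..., τ_k
Here E(C) is the number of tagging errors in corpus C measured against the reference tags, ε is the minimum improvement required to continue, and τ_1, ..., τ_k is the learned, ordered rule sequence.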
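The training loop can be made concrete with a small Python sketch. It is an illustration under simplifying assumptions, not Brill's code: the corpus is one flat tag sequence, the only template is "change X to Y if the previous tag is Z", and the names `apply_rule`, `errors`, and `train_tbl` are hypothetical.

```python
from typing import List, Tuple

Rule = Tuple[str, str, str]  # (from_tag, to_tag, previous_tag): assumed rule template


def apply_rule(rule: Rule, tags: List[str]) -> List[str]:
    """Return a copy of tags with the rule applied left to right."""
    from_tag, to_tag, prev_tag = rule
    out = list(tags)
    for i in range(1, len(out)):
        if out[i] == from_tag and out[i - 1] == prev_tag:
            out[i] = to_tag
    return out


def errors(tags: List[str], gold: List[str]) -> int:
    """E(C): number of positions where the current tag differs from the gold tag."""
    return sum(t != g for t, g in zip(tags, gold))


def train_tbl(initial: List[str], gold: List[str], tagset: List[str],
              epsilon: int = 1) -> List[Rule]:
    current = list(initial)                      # C_0
    learned: List[Rule] = []                     # tau_1, ..., tau_k
    candidates = [(x, y, z) for x in tagset for y in tagset
                  for z in tagset if x != y]
    while True:
        # v := the transformation that minimizes E(u_i(C_k))
        best = min(candidates, key=lambda r: errors(apply_rule(r, current), gold))
        best_tags = apply_rule(best, current)
        if errors(current, gold) - errors(best_tags, gold) < epsilon:
            break
        current = best_tags                      # C_{k+1} := v(C_k)
        learned.append(best)                     # tau_{k+1} := v
    return learned


initial = ["DT", "VB", "VBD", "DT", "VB"]        # most-likely-tag baseline (toy data)
gold = ["DT", "NN", "VBD", "DT", "NN"]           # reference tags (toy data)
learned = train_tbl(initial, gold, ["DT", "NN", "VB", "VBD"])
print(learned)  # [('VB', 'NN', 'DT')]
```

Each pass scores every candidate rule against the whole corpus, which is one reason training is slow on realistic data.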

17 Accuracy vs. Transformation Number
- A few transformation rules make the major contributions
- Overall, a small number of transformation rules suffice

18 TBL: Problems
- The first 100 rules achieve 96.8% accuracy; the first 200 rules achieve 97.0% accuracy
- Execution speed: the TBL tagger is slower than the HMM approach
- Learning speed: Brill's implementation takes over a day (600k tokens)
BUT
(1) It learns a small number of simple, non-stochastic rules
(2) It can be made to work faster with FSTs (finite-state transducers)
(3) It is the best-performing algorithm on unknown words

19 Download the Tagger
u/~brill

20 Hidden Markov Model (HMM) Based Taggers

21 Part-of-Speech Tagging
Sentence: Next you flour the pan .
Tags: JJ PRP VBP DT NN .
Intuitive idea of HMM: use statistics of co-occurrences

22 Frequency of Determiners (Penn Treebank)

Determiner  Count
THE
A
AN          3529
THIS        2433
SOME        1673
THAT        1475
ALL         977
ANY         831
NO          749
THESE       612
THOSE       604
ANOTHER     467
BOTH        462
EACH        441
EVERY       202
EITHER      51
NEITHER     40

23 Most frequent continuations of DT
DT: overall occurrences

Sequence   Count
DT NN
DT JJ
DT NNP     9529
DT NNS     6269
DT CD      2787
DT NN-POS  2174
DT RB      878
DT IN      843
DT JJS

24 Most frequent continuations of DT NN

Sequence   Count
DT NN IN
DT NN NN   4799
DT NN ,    4067
DT NN
DT NN VBD  2559
DT NN VBZ  2462
DT NN TO   1612
DT NN RB   1138
DT NN CC

25 Most frequent continuations of DT NN IN

Sequence      Count
DT NN IN DT   3698
DT NN IN NNP  1601
DT NN IN NN   1557
DT NN IN JJ   1306
DT NN IN NNS  933
DT NN IN CD   714
DT NN IN PRP  480

Reliable statistics are available. How can we use them?
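Continuation tables like the ones above can be produced by counting tag n-grams in a tagged corpus. The sketch below assumes the corpus is already available as lists of tags; the function names `tag_ngram_counts` and `continuations` and the two-sentence toy corpus are illustrative only, so the counts do not correspond to the Penn Treebank figures.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Tuple


def tag_ngram_counts(tag_sequences: Iterable[Sequence[str]], n: int) -> Counter:
    """Count every tag n-gram that occurs within a sentence."""
    counts: Counter = Counter()
    for tags in tag_sequences:
        for i in range(len(tags) - n + 1):
            counts[tuple(tags[i:i + n])] += 1
    return counts


def continuations(counts: Counter, prefix: Sequence[str]) -> List[Tuple[tuple, int]]:
    """List the n-grams that extend prefix by one tag, most frequent first."""
    prefix = tuple(prefix)
    hits = [(gram, c) for gram, c in counts.items() if gram[:-1] == prefix]
    return sorted(hits, key=lambda item: -item[1])


# Toy corpus: the tag sequences of the two example sentences from this chapter.
corpus = [
    ["DT", "NN", "VBD", "NNS", "IN", "DT", "NN", "."],
    ["JJ", "PRP", "VBP", "DT", "NN", "."],
]
bigrams = tag_ngram_counts(corpus, 2)
print(continuations(bigrams, ["DT"]))  # [(('DT', 'NN'), 3)]
```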

26 Remember the Bayes classifier from WSD (word sense disambiguation)
Can the POS tagging problem be cast as a Bayes classifier?

27 HMMs as a Bayes Classifier
Consider the complete sequence of tags as the class to be assigned.

28 Rewrite: HMMs as a Bayes Classifier
How many classes are there?
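A sketch of the decision rule in the standard Bayes-classifier form. The notation is an assumption made here, not taken from the slides: w_{1:N} is the word sequence, t_{1:N} a candidate tag sequence, N the sentence length, and T the tagset size.

```latex
\begin{align*}
\hat{t}_{1:N}
  &= \arg\max_{t_{1:N}} P(t_{1:N} \mid w_{1:N}) \\
  &= \arg\max_{t_{1:N}} \frac{P(w_{1:N} \mid t_{1:N})\, P(t_{1:N})}{P(w_{1:N})} \\
  &= \arg\max_{t_{1:N}} P(w_{1:N} \mid t_{1:N})\, P(t_{1:N})
\end{align*}
```

The denominator does not depend on the tag sequence and can be dropped. Since every complete tag sequence is its own class, a sentence of N words has T^N classes.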

29 Estimation of Emission Probabilities
Assume:
- Each word depends only on its own tag
- Words are statistically independent of each other, given their tags
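Under these two assumptions the probability of the words given the tags factorizes into per-word emission probabilities. The notation (w_i for the i-th word, t_i for its tag, N for the sentence length) is assumed here for illustration.

```latex
P(w_{1:N} \mid t_{1:N}) \approx \prod_{i=1}^{N} P(w_i \mid t_i)
```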

30 Estimate Transition Probabilities
Use the definition of conditional probability to rewrite the probability of the tag sequence.
Too many parameters to be estimated!
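The rewrite referred to here is presumably the chain rule, shown below in assumed notation (t_i is the i-th tag, N the sentence length).

```latex
P(t_{1:N}) = \prod_{i=1}^{N} P(t_i \mid t_1, \ldots, t_{i-1})
```

The factor for position i conditions on the entire tag history, so the number of distinct conditional distributions grows exponentially with the sentence length. This is why the full model has too many parameters to estimate from a corpus.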

31 Simplifying Assumptions
Markov property: only the immediate predecessors matter
Bigram: each tag depends only on the previous tag
Trigram: each tag depends only on the previous two tags
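Written as formulas, in assumed notation where t_i is the tag at position i, the two Markov approximations are:

```latex
\begin{align*}
\text{Bigram:}  \quad & P(t_i \mid t_1, \ldots, t_{i-1}) \approx P(t_i \mid t_{i-1}) \\
\text{Trigram:} \quad & P(t_i \mid t_1, \ldots, t_{i-1}) \approx P(t_i \mid t_{i-2}, t_{i-1})
\end{align*}
```

With a tagset of size T, the bigram model needs on the order of T^2 transition parameters and the trigram model on the order of T^3.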

32 Bigram Tagger
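Combining the per-word emission probabilities with the bigram Markov assumption gives the usual bigram HMM tagger. The notation is assumed here: w_i is the i-th word, t_i its tag, and t_0 a sentence-start symbol.

```latex
\hat{t}_{1:N} = \arg\max_{t_{1:N}} \prod_{i=1}^{N} P(w_i \mid t_i)\, P(t_i \mid t_{i-1})
```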

33 Trigram Tagger
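The trigram tagger conditions each tag on its two predecessors instead of one. In assumed notation, with w_i the i-th word, t_i its tag, and t_{-1} and t_0 sentence-start symbols:

```latex
\hat{t}_{1:N} = \arg\max_{t_{1:N}} \prod_{i=1}^{N} P(w_i \mid t_i)\, P(t_i \mid t_{i-2}, t_{i-1})
```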

34 Estimate Probabilities
Maximum likelihood estimation gives relative frequencies, where C(·) denotes a count on the training corpus.
For unseen events: use your favorite smoothing technique (see Chapter 4).
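For a bigram tagger the relative-frequency estimates are P(t_i | t_{i-1}) = C(t_{i-1}, t_i) / C(t_{i-1}) and P(w | t) = C(w, t) / C(t). The Python sketch below computes them from a tagged corpus; the function name `estimate`, the `<s>` start symbol, and the lower-casing of words are assumptions made for illustration, and no smoothing is applied, so unseen events get probability 0.

```python
from collections import Counter
from typing import Callable, List, Tuple

TaggedSentence = List[Tuple[str, str]]  # list of (word, tag) pairs


def estimate(tagged_sentences: List[TaggedSentence]
             ) -> Tuple[Callable[[str, str], float], Callable[[str, str], float]]:
    trans: Counter = Counter()      # C(t_{i-1}, t_i)
    context: Counter = Counter()    # C(t_{i-1}), counted as a left context
    emit: Counter = Counter()       # C(w, t)
    tag_count: Counter = Counter()  # C(t)
    for sentence in tagged_sentences:
        prev = "<s>"  # hypothetical sentence-start symbol
        for word, tag in sentence:
            trans[(prev, tag)] += 1
            context[prev] += 1
            emit[(tag, word.lower())] += 1
            tag_count[tag] += 1
            prev = tag

    def p_trans(tag: str, prev: str) -> float:
        """Unsmoothed P(tag | prev)."""
        return trans[(prev, tag)] / context[prev] if context[prev] else 0.0

    def p_emit(word: str, tag: str) -> float:
        """Unsmoothed P(word | tag)."""
        return emit[(tag, word.lower())] / tag_count[tag] if tag_count[tag] else 0.0

    return p_trans, p_emit


corpus = [[("The", "DT"), ("representative", "NN"), ("put", "VBD"), ("chairs", "NNS"),
           ("on", "IN"), ("the", "DT"), ("table", "NN"), (".", ".")]]
p_trans, p_emit = estimate(corpus)
print(p_trans("NN", "DT"))    # 1.0
print(p_emit("table", "NN"))  # 0.5
print(p_emit("bear", "NN"))   # 0.0
```

The last line shows why smoothing is needed: a word never seen with a tag receives probability zero, which rules out every tag sequence containing that pair.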

35 Handling of Unknown Words
Guess the POS:
- plunking: verb (VBG) (from "to plunk"?)
- resuscitation: noun (NN)

36 Statistical Properties of Unknown Words
[Table: for each of the tags NNP, NN, NNS, VBG, VBZ, the statistics of three features: unknown word (yes/no), capitalized (yes/no), and ending (-s, -ing, -tion, other)]

37 Estimate Emission Probabilities
Use a decomposition over the features of the unknown word.
About 80% of the unknown words can be tagged correctly using that model.
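One common form of such a decomposition multiplies, for each tag, the probability that the word is unknown, the probability of its capitalization, and the probability of its ending. The Python sketch below assumes that form. All probability values in it are made up for illustration and are not the statistics from the lecture, and the names `ending_class` and `unknown_word_scores` are hypothetical.

```python
from typing import Dict


def ending_class(word: str) -> str:
    """Map a word to one of the ending classes: -tion, -ing, -s, or other."""
    lower = word.lower()
    for suffix in ("tion", "ing", "s"):
        if lower.endswith(suffix):
            return suffix
    return "other"


def unknown_word_scores(word: str,
                        p_unknown: Dict[str, float],
                        p_cap: Dict[str, Dict[bool, float]],
                        p_ending: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Unnormalized emission score per tag:
    P(unknown | tag) * P(capitalization | tag) * P(ending | tag)."""
    capitalized = word[:1].isupper()
    ending = ending_class(word)
    return {tag: p_unknown[tag] * p_cap[tag][capitalized] * p_ending[tag][ending]
            for tag in p_unknown}


# Made-up parameters for two tags, for illustration only.
p_unknown = {"NN": 0.2, "VBG": 0.1}
p_cap = {"NN": {True: 0.1, False: 0.9}, "VBG": {True: 0.05, False: 0.95}}
p_ending = {
    "NN": {"s": 0.05, "ing": 0.05, "tion": 0.1, "other": 0.8},
    "VBG": {"s": 0.0, "ing": 0.99, "tion": 0.0, "other": 0.01},
}

scores = unknown_word_scores("plunking", p_unknown, p_cap, p_ending)
print(max(scores, key=scores.get))  # VBG
```

In a full tagger these scores would replace the emission probability of an unknown word and still be combined with the transition probabilities.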

38 Finding the Best Tag Sequence
Suppose the sentence has N words and the tag set has T tags.
Then there are T^N possible tag sequences.
e.g. N=14, T= hypotheses to check (at 10^6 hypotheses per second: CPU years; about the age of the Earth)
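The size of this search space is easy to check numerically. The value of T is missing from the example above, so the sketch below assumes T = 45, the size of the Penn Treebank tagset; this is an assumption, not a figure from the slide.

```python
N = 14             # sentence length from the example
T = 45             # ASSUMPTION: tagset size (Penn Treebank); the slide's value is missing
RATE = 10 ** 6     # hypotheses checked per second

hypotheses = T ** N                     # number of possible tag sequences
seconds = hypotheses / RATE
years = seconds / (365.25 * 24 * 3600)  # seconds in a Julian year

print(f"{hypotheses:.2e} tag sequences, roughly {years:.1e} CPU years")
```

Under this assumption the exhaustive search takes several billion CPU years, which matches the comparison with the age of the Earth.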

39 Finding the Best Path: Viterbi Algorithm (Bigram)
See Wikipedia or other lectures.
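A minimal sketch of Viterbi decoding for a bigram HMM tagger, working in log space. The function signature and the dictionary-based probability tables are assumptions made for illustration, the probabilities in the toy example are invented, and the sentence is assumed to be non-empty.

```python
import math
from typing import Dict, List, Tuple


def viterbi(words: List[str], tags: List[str],
            p_init: Dict[str, float],
            p_trans: Dict[Tuple[str, str], float],
            p_emit: Dict[Tuple[str, str], float]) -> List[str]:
    """Return the most probable tag sequence for a non-empty sentence."""
    def log(p: float) -> float:
        return math.log(p) if p > 0 else float("-inf")

    # delta[i][t]: best log-probability of any tag sequence ending in tag t at position i
    delta = [{t: log(p_init.get(t, 0.0)) + log(p_emit.get((t, words[0]), 0.0))
              for t in tags}]
    back: List[Dict[str, str]] = [{}]  # back[i][t]: best predecessor of t at position i

    for i in range(1, len(words)):
        delta.append({})
        back.append({})
        for t in tags:
            best_prev = max(tags, key=lambda s: delta[i - 1][s]
                            + log(p_trans.get((s, t), 0.0)))
            delta[i][t] = (delta[i - 1][best_prev]
                           + log(p_trans.get((best_prev, t), 0.0))
                           + log(p_emit.get((t, words[i]), 0.0)))
            back[i][t] = best_prev

    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda t: delta[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


# Toy model with invented probabilities: "bear" alone prefers VB, but DT -> NN wins.
tags = ["DT", "NN", "VB"]
p_init = {"DT": 0.8, "NN": 0.1, "VB": 0.1}
p_trans = {("DT", "NN"): 0.9, ("DT", "VB"): 0.1}
p_emit = {("DT", "the"): 1.0, ("NN", "bear"): 0.4, ("VB", "bear"): 0.6}
print(viterbi(["the", "bear"], tags, p_init, p_trans, p_emit))  # ['DT', 'NN']
```

The two nested loops over tags at each position give a running time proportional to N · T^2, compared with T^N for exhaustive search.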

40 Summary
Assign grammatical labels to words.
Two well-established approaches:
- Brill tagger
- Hidden Markov model
