Error-driven HMM-based Chunk Tagger with Context-dependent Lexicon

GuoDong ZHOU
Kent Ridge Digital Labs
21 Heng Mui Keng Terrace
Singapore 119613
zhougd@krdl.org.sg

Jian SU
Kent Ridge Digital Labs
21 Heng Mui Keng Terrace
Singapore 119613
sujian@krdl.org.sg

Abstract

This paper proposes a new error-driven HMM-based text chunk tagger with a context-dependent lexicon. Compared with a standard HMM-based tagger, this tagger uses a new Hidden Markov Modelling approach which incorporates more contextual information into a lexical entry. Moreover, an error-driven learning approach is adopted to decrease the memory requirement by keeping only positive lexical entries, which makes it possible to incorporate further context-dependent lexical entries. Experiments show that this technique achieves overall precision and recall rates of 93.40% and 93.95% for all chunk types, 93.60% and 94.64% for noun phrases, and 94.64% and 94.75% for verb phrases when trained on PENN WSJ TreeBank sections 00-19 and tested on sections 20-24, while 25-fold validation experiments on the PENN WSJ TreeBank show overall precision and recall rates of 96.40% and 96.47% for all chunk types, 96.49% and 96.99% for noun phrases, and 97.13% and 97.36% for verb phrases.

Introduction

Text chunking divides sentences into non-overlapping segments on the basis of fairly superficial analysis. Abney (1991) proposed this as a useful and relatively tractable precursor to full parsing, since it provides a foundation for further levels of analysis, while still allowing more complex attachment decisions to be postponed to a later phase. Text chunking typically relies on fairly simple and efficient processing algorithms. Recently, many researchers have looked at text chunking in two different ways: some have applied rule-based methods, combining lexical data with finite state or other rule constraints, while others have worked on inducing statistical models either directly from the words and/or from automatically assigned part-of-speech classes.
On the statistics-based side, Skut and Brants (1998) proposed an HMM-based approach to recognise syntactic structures of limited length. Buchholz, Veenstra and Daelemans (1999) and Veenstra (1999) explored memory-based learning methods to find labelled chunks. Ratnaparkhi (1998) used maximum entropy to recognise arbitrary chunks as part of a tagging task. On the rule-based side, Bourigault (1992) used heuristics and a grammar to extract "terminology noun phrases" from French text. Voutilainen (1993) used a similar method to detect English noun phrases. Kupiec (1993) applied a finite state transducer in his noun phrase recogniser for both English and French. Ramshaw and Marcus (1995) used transformation-based learning, an error-driven learning technique introduced by Eric Brill (1993), to locate chunks in a tagged corpus. Grefenstette (1996) applied finite state transducers to find noun phrases and verb phrases.

In this paper, we focus on statistics-based methods. The structure of the paper is as follows. Section 1 briefly describes the principle of the new error-driven HMM-based chunk tagger with context-dependent lexicon. Section 2 gives a baseline system which only includes the current part-of-speech in the lexicon. Section 3 describes several extended systems with different context-dependent lexicons. In section 4, an error-driven learning method is used to decrease the memory requirement of the lexicon by keeping only positive lexical entries, which makes it possible to further improve the accuracy by merging different context-dependent lexicons into one after automatic analysis of the chunking errors. Finally, the conclusion is given.

The data used for all our experiments is extracted from the PENN WSJ Treebank (Marcus et al. 1993) by the program provided by Sabine Buchholz from Tilburg University. We use sections 00-19 as the training data and sections 20-24 as the test data. The performance is therefore measured on a large-scale task, rather than on the small-scale CoNLL-2000 task, with the same evaluation program.

For evaluation of our results, we use the precision and recall measures. Precision is the percentage of predicted chunks that are actually correct, while recall is the percentage of correct chunks that are actually found. For convenient comparison with a single value, we also list the $F_{\beta=1}$ value (Rijsbergen 1979):

\[ F_\beta = \frac{(\beta^2 + 1) \cdot \mathrm{precision} \cdot \mathrm{recall}}{\beta^2 \cdot \mathrm{precision} + \mathrm{recall}}, \quad \text{with } \beta = 1. \]

1 HMM-based Chunk Tagger

The idea of using statistics for chunking goes back to Church (1988), who used corpus frequencies to determine the boundaries of simple non-recursive noun phrases. Skut and Brants (1998) modified Church's approach in a way permitting efficient and reliable recognition of structures of limited depth, and encoded the structure in such a way that it can be recognised by a Viterbi tagger. This makes the process run in time linear in the length of the input string.

Our approach follows Skut and Brants by employing an HMM-based tagging method to model the chunking process. Given a token sequence $G_1^n = g_1 g_2 \cdots g_n$, the goal is to find a stochastically optimal tag sequence $T_1^n = t_1 t_2 \cdots t_n$ which maximizes $\log P(T_1^n \mid G_1^n)$:

\[ \log P(T_1^n \mid G_1^n) = \log P(T_1^n) + \log \frac{P(T_1^n, G_1^n)}{P(T_1^n) \cdot P(G_1^n)} \]

The second item in the above equation is the mutual information between the tag sequence $T_1^n$ and the given token sequence $G_1^n$. We assume that the mutual information between $G_1^n$ and $T_1^n$ is equal to the summation of the mutual information between $G_1^n$ and each individual tag $t_i$ ($1 \le i \le n$):

\[ \log \frac{P(T_1^n, G_1^n)}{P(T_1^n) \cdot P(G_1^n)} = \sum_{i=1}^{n} \log \frac{P(t_i, G_1^n)}{P(t_i) \cdot P(G_1^n)} \]

or $MI(T_1^n, G_1^n) = \sum_{i=1}^{n} MI(t_i, G_1^n)$. We then have:

\[ \begin{aligned} \log P(T_1^n \mid G_1^n) &= \log P(T_1^n) + \sum_{i=1}^{n} \log \frac{P(t_i, G_1^n)}{P(t_i) \cdot P(G_1^n)} \\ &= \log P(T_1^n) - \sum_{i=1}^{n} \log P(t_i) + \sum_{i=1}^{n} \log P(t_i \mid G_1^n) \end{aligned} \]

The first item of the above equation can be solved by using the chain rule. Normally, each tag is assumed to be probabilistically dependent on the N-1 previous tags. Here, a back-off bigram (N=2) model is used. The second item is the summation of the log probabilities of all the tags. Both the first and the second item correspond to the language model component of the tagger, while the third item corresponds to the lexicon component of the tagger. Ideally the third item can be estimated by using the forward-backward algorithm (Rabiner 1989) recursively for first-order (Rabiner 1989) or second-order HMMs (Watson and Chung Tsoi 1992). However, several approximations of it will be attempted later in this paper instead. The stochastically optimal tag sequence can be found by maximizing the above equation over all the possible tag sequences. This is implemented by the Viterbi algorithm.

The main difference between our tagger and other standard taggers is that our tagger has a context-dependent lexicon while the others use a context-independent lexicon.

For the chunk tagger, we have $g_i = p_i w_i$, where $W_1^n = w_1 w_2 \cdots w_n$ is the word sequence and $P_1^n = p_1 p_2 \cdots p_n$ is the part-of-speech sequence. Here, we use structural tags to represent the chunking (bracketing and labelling) structure. The basic idea of the structural tags is similar to Skut and Brants (1998), and a structural tag consists of three parts:

1) Structural relation. The basic idea is simple: structures of limited depth are encoded using a finite number of flags. Given a sequence of input tokens (here, the word and part-of-speech pairs), we consider the structural relation between the previous input token and the current one. For the recognition of chunks, it is sufficient to distinguish the following four structural relations, which uniquely identify the sub-structures of depth 1 (Skut and Brants used seven different structural relations to identify the sub-structures of depth 2).

00  the current input token and the previous one have the same parent
90  one ancestor of the current input token and the previous input token have the same parent
09  the current input token and one ancestor of the previous input token have the same parent
99  one ancestor of the current input token and one ancestor of the previous input token have the same parent

For example, in the following chunk tagged sentence (NULL represents the beginning and end of the sentence):

NULL [NP He/PRP] [VP reckons/VBZ] [NP the/DT current/JJ account/NN deficit/NN] [VP will/MD narrow/VB] [PP to/TO] [NP only/RB #/# 1.8/CD billion/CD] [PP in/IN] [NP September/NNP] [O ./.] NULL

the corresponding structural relations between two adjacent input tokens are:

90(NULL He/PRP)
99(He/PRP reckons/VBZ)
99(reckons/VBZ the/DT)
00(the/DT current/JJ)
00(current/JJ account/NN)
00(account/NN deficit/NN)
99(deficit/NN will/MD)
00(will/MD narrow/VB)
99(narrow/VB to/TO)
99(to/TO only/RB)
00(only/RB #/#)
00(#/# 1.8/CD)
00(1.8/CD billion/CD)
99(billion/CD in/IN)
99(in/IN September/NNP)
99(September/NNP ./.)
09(./. NULL)

Compared with the B-Chunk and I-Chunk tags used in Ramshaw and Marcus (1995), structural relations 99 and 90 correspond to B-Chunk, which represents the first word of a chunk, and structural relations 00 and 09 correspond to I-Chunk, which represents every other word in the chunk, while 90 also marks the beginning of the sentence and 09 the end of the sentence.

2) Phrase category. This is used to identify the phrase categories of input tokens.

3) Part-of-speech. Because of the limited number of structural relations and phrase categories, the part-of-speech is added into the structural tag to represent more accurate models.

For the above chunk tagged sentence, the structural tags for all the corresponding input tokens are:

90_PRP_NP(He/PRP)
99_VBZ_VP(reckons/VBZ)
99_DT_NP(the/DT)
00_JJ_NP(current/JJ)
00_NN_NP(account/NN)
00_NN_NP(deficit/NN)
99_MD_VP(will/MD)
00_VB_VP(narrow/VB)
99_TO_PP(to/TO)
99_RB_NP(only/RB)
00_#_NP(#/#)
00_CD_NP(1.8/CD)
00_CD_NP(billion/CD)
99_IN_PP(in/IN)
99_NNP_NP(September/NNP)
99_._O(./.)

2 The Baseline System

As the baseline system, we assume $P(t_i \mid G_1^n) = P(t_i \mid p_i)$. That is to say, only the current part-of-speech is used as a lexical entry to determine the current structural chunk tag. Here, we define:

$\Phi$ is the list of lexical entries in the chunking lexicon,
$|\Phi|$ is the number of lexical entries (the size of the chunking lexicon),
$C$ is the training data.

For the baseline system, we have $\Phi = \{p_i,\ p_i \exists C\}$, where $p_i$ is a part-of-speech existing in the training data $C$, and $|\Phi| = 48$ (the number of part-of-speech tags in the training data).

Table 1 gives an overview of the results of the chunking experiments. For convenience, precision, recall and $F_{\beta=1}$ values are given separately for the chunk types NP, VP, ADJP, ADVP and PP.

Type      Precision   Recall   F_{β=1}
Overall   87.01       89.68    88.32
NP        90.02       90.50    90.26
VP        89.86       93.14    91.47
ADJP      70.94       63.84    67.20
ADVP      57.98       80.33    67.35
PP        85.95       96.62    90.97

Table 1: Results of chunking experiments with the lexical entry list $\Phi = \{p_i,\ p_i \exists C\}$

3 Context-dependent Lexicons

In the last section, we only used the current part-of-speech as a lexical entry. In this section, we attempt to add more contextual information to approximate $P(t_i \mid G_1^n)$. This can be done by adding lexical entries with more contextual information into the lexicon $\Phi$. In the following, we discuss five context-dependent lexicons which consider different contextual information.

3.1 Context of current part-of-speech and current word

Here, we assume:

\[ P(t_i \mid G_1^n) = \begin{cases} P(t_i \mid p_i w_i) & p_i w_i \in \Phi \\ P(t_i \mid p_i) & p_i w_i \notin \Phi \end{cases} \]

where $\Phi = \{p_i w_i,\ p_i w_i \exists C\} + \{p_i,\ p_i \exists C\}$ and $p_i w_i$ is a part-of-speech and word pair existing in the training data $C$.

In this case, the current part-of-speech and word pair is also used as a lexical entry to determine the current structural chunk tag, and we have a total of about 49563 lexical entries ($|\Phi| = 49563$). Strictly speaking, the lexicon used here can be regarded as context-independent. We discuss it in this section to distinguish it from the context-independent lexicon used in the baseline system. Table 2 gives an overview of the results of the chunking experiments on the test data.
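The two-level lookup assumed in Section 3.1 can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `build_lexicon` and `lookup`, the relative-frequency estimate of the probabilities and the toy training triples are assumptions made here and are not details given in the paper.

```python
from collections import Counter, defaultdict

def build_lexicon(training):
    """Build a lexicon from (pos, word, tag) triples.

    Each lexical entry is a tuple, either (pos,) or (pos, word), mapped to a
    relative-frequency estimate of P(tag | entry). The estimator is an
    assumption of this sketch, not the paper's stated method.
    """
    counts = defaultdict(Counter)
    for pos, word, tag in training:
        counts[(pos,)][tag] += 1        # baseline entry p_i
        counts[(pos, word)][tag] += 1   # Section 3.1 entry p_i w_i
    lexicon = {}
    for entry, tag_counts in counts.items():
        total = sum(tag_counts.values())
        lexicon[entry] = {t: c / total for t, c in tag_counts.items()}
    return lexicon

def lookup(lexicon, pos, word):
    """Return P(t_i | p_i w_i) if p_i w_i is an entry, else P(t_i | p_i)."""
    return lexicon.get((pos, word)) or lexicon.get((pos,), {})

# Hypothetical toy data: (part-of-speech, word, structural tag).
toy = [
    ("DT", "the", "99_DT_NP"),
    ("DT", "the", "99_DT_NP"),
    ("DT", "the", "00_DT_NP"),
    ("DT", "a", "99_DT_NP"),
]
lex = build_lexicon(toy)
seen = lookup(lex, "DT", "the")     # uses the p_i w_i entry
unseen = lookup(lex, "DT", "this")  # backs off to the p_i entry
```

The same pattern extends to the other lexicons of this section by changing which context tuple is used as the key.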
Type      Precision   Recall   F_{β=1}
Overall   90.32       92.18    91.24
NP        90.75       92.14    91.44
VP        90.88       92.78    91.82
ADJP      76.01       70.00    72.88
ADVP      72.67       88.33    79.74
PP        94.96       96.48    95.71

Table 2: Results of chunking experiments with the lexical entry list $\Phi = \{p_i w_i,\ p_i w_i \exists C\} + \{p_i,\ p_i \exists C\}$

Table 2 shows that incorporating current word information improves the overall $F_{\beta=1}$ value by 2.9% (especially for the ADJP, ADVP and PP chunks), compared with Table 1 of the baseline system, which only uses current part-of-speech information. This result suggests that current word information plays a very important role in determining the current chunk tag.

3.2 Context of previous part-of-speech and current part-of-speech

Here, we assume:

\[ P(t_i \mid G_1^n) = \begin{cases} P(t_i \mid p_{i-1} p_i) & p_{i-1} p_i \in \Phi \\ P(t_i \mid p_i) & p_{i-1} p_i \notin \Phi \end{cases} \]

where $\Phi = \{p_{i-1} p_i,\ p_{i-1} p_i \exists C\} + \{p_i,\ p_i \exists C\}$ and $p_{i-1} p_i$ is a pair of previous part-of-speech and current part-of-speech existing in the training data $C$.

In this case, the previous part-of-speech and current part-of-speech pair is also used as a lexical entry to determine the current structural chunk tag, and we have a total of about 1411 lexical entries ($|\Phi| = 1411$). Table 3 gives an overview of the results of the chunking experiments.

Type      Precision   Recall   F_{β=1}
Overall   88.63       89.00    88.82
NP        90.77       91.18    90.97
VP        92.46       92.98    92.72
ADJP      74.93       60.13    66.72
ADVP      71.65       73.21    72.42
PP        87.28       91.80    89.49

Table 3: Results of chunking experiments with the lexical entry list $\Phi = \{p_{i-1} p_i,\ p_{i-1} p_i \exists C\} + \{p_i,\ p_i \exists C\}$

Compared with Table 1 of the baseline system, Table 3 shows that the additional contextual information of the previous part-of-speech improves the overall $F_{\beta=1}$ value by 0.5%. In particular, the $F_{\beta=1}$ value for VP improves by 1.25%, which indicates that previous part-of-speech information has an important role in determining the chunk type VP. Table 3 also shows that the recall rate for chunk type ADJP decreases by 3.7%. This indicates that additional previous part-of-speech information makes ADJP chunks easier to merge with neighbouring chunks.

3.3 Context of previous part-of-speech, previous word and current part-of-speech

Here, we assume:

\[ P(t_i \mid G_1^n) = \begin{cases} P(t_i \mid p_{i-1} w_{i-1} p_i) & p_{i-1} w_{i-1} p_i \in \Phi \\ P(t_i \mid p_i) & p_{i-1} w_{i-1} p_i \notin \Phi \end{cases} \]

where $\Phi = \{p_{i-1} w_{i-1} p_i,\ p_{i-1} w_{i-1} p_i \exists C\} + \{p_i,\ p_i \exists C\}$ and $p_{i-1} w_{i-1} p_i$ is a triple pattern existing in the training corpus.

In this case, the previous part-of-speech, previous word and current part-of-speech triple is also used as a lexical entry to determine the current structural chunk tag, and $|\Phi| = 136164$. Table 4 gives the results of the chunking experiments.

Compared with Table 1 of the baseline system, Table 4 shows that the additional 136116 new lexical entries of the format $p_{i-1} w_{i-1} p_i$ improve the overall $F_{\beta=1}$ value by 3.3%. Compared with Table 3 of the extended system of Section 3.2, which uses the previous part-of-speech and current part-of-speech as a lexical entry, Table 4 shows that the additional contextual information of the previous word improves the overall $F_{\beta=1}$ value by 2.8%.
Type      Precision   Recall   F_{β=1}
Overall   91.23       92.03    91.63
NP        92.89       93.85    93.37
VP        94.10       94.23    94.16
ADJP      79.83       69.01    74.03
ADVP      76.91       80.53    78.68
PP        90.41       94.77    92.53

Table 4: Results of chunking experiments with the lexical entry list $\Phi = \{p_{i-1} w_{i-1} p_i,\ p_{i-1} w_{i-1} p_i \exists C\} + \{p_i,\ p_i \exists C\}$

3.4 Context of previous part-of-speech, current part-of-speech and current word

Here, we assume:

\[ P(t_i \mid G_1^n) = \begin{cases} P(t_i \mid p_{i-1} p_i w_i) & p_{i-1} p_i w_i \in \Phi \\ P(t_i \mid p_i) & p_{i-1} p_i w_i \notin \Phi \end{cases} \]

where $\Phi = \{p_{i-1} p_i w_i,\ p_{i-1} p_i w_i \exists C\} + \{p_i,\ p_i \exists C\}$, $p_{i-1} p_i w_i$ is a triple pattern existing in the training corpus, and $|\Phi| = 131416$. Table 5 gives the results of the chunking experiments.

Type      Precision   Recall   F_{β=1}
Overall   92.67       93.43    93.05
NP        93.35       94.10    93.73
VP        93.05       94.30    93.67
ADJP      80.65       72.27    76.23
ADVP      78.92       84.48    81.60
PP        95.30       96.67    95.98

Table 5: Results of chunking experiments with the lexical entry list $\Phi = \{p_{i-1} p_i w_i,\ p_{i-1} p_i w_i \exists C\} + \{p_i,\ p_i \exists C\}$

Compared with Table 2 of the extended system which uses the current part-of-speech and current word as a lexical entry, Table 5 shows that the additional contextual information of the previous part-of-speech improves the overall $F_{\beta=1}$ value by 1.8%.

3.5 Context of previous part-of-speech, previous word, current part-of-speech and current word

Here, the context of previous part-of-speech, previous word, current part-of-speech and current word is used as a lexical entry to determine the current structural chunk tag, and $\Phi = \{p_{i-1} w_{i-1} p_i w_i,\ p_{i-1} w_{i-1} p_i w_i \exists C\} + \{p_i,\ p_i \exists C\}$, where $p_{i-1} w_{i-1} p_i w_i$ is a pattern existing in the training corpus. Due to memory limitations, only lexical entries which occur more than once are kept. Out of 364365 possible lexical entries existing in the training data, 98489 are kept ($|\Phi| = 98489$). We assume:

\[ P(t_i \mid G_1^n) = \begin{cases} P(t_i \mid p_{i-1} w_{i-1} p_i w_i) & p_{i-1} w_{i-1} p_i w_i \in \Phi \\ P(t_i \mid p_i) & p_{i-1} w_{i-1} p_i w_i \notin \Phi \end{cases} \]

Table 6 gives the results of the chunking experiments.

Type      Precision   Recall   F_{β=1}
Overall   92.28       93.04    92.66
NP        93.50       93.53    93.52
VP        92.62       94.07    93.35
ADJP      81.39       72.17    76.50
ADVP      75.09       86.23    80.27
PP        94.12       97.12    95.59

Table 6: Results of chunking experiments with the lexical entry list $\Phi = \{p_{i-1} w_{i-1} p_i w_i,\ p_{i-1} w_{i-1} p_i w_i \exists C\} + \{p_i,\ p_i \exists C\}$

Compared with Table 2 of the extended system which uses the current part-of-speech and current word as a lexical entry, Table 6 shows that the additional contextual information of the previous part-of-speech and previous word improves the overall $F_{\beta=1}$ value by 1.4% (92.66 against 91.24).

3.6 Conclusion

The above experiments show that adding more contextual information into the lexicon significantly improves the chunking accuracy. However, this improvement is gained at the expense of a very large lexicon, and we find it difficult to merge all the above context-dependent lexicons into a single lexicon to further improve the chunking accuracy because of memory limitations. In order to reduce the size of the lexicon effectively, an error-driven learning approach is adopted to examine the effectiveness of lexical entries, which makes it possible to further improve the chunking accuracy by merging all the above context-dependent lexicons into a single lexicon. This is discussed in the next section.

4 Error-driven Learning

In section 2, we implemented a baseline system which only considers the current part-of-speech as a lexical entry to determine the current chunk tag, while in section 3 we implemented several extended systems which take more contextual information into consideration.
Here, we examine the effectiveness of lexical entries in order to reduce the size of the lexicon and make it possible to further improve the chunking accuracy by merging several context-dependent lexicons into a single lexicon.

For a new lexical entry $e_i$, the effectiveness $F_\Phi(e_i)$ is measured by the reduction in error which results from adding the lexical entry to the lexicon:

\[ F_\Phi(e_i) = F_\Phi^{Error}(e_i) - F_{\Phi + \Delta\Phi}^{Error}(e_i) \]

Here, $F_\Phi^{Error}(e_i)$ is the chunking error number of the lexical entry $e_i$ for the old lexicon $\Phi$, and $F_{\Phi + \Delta\Phi}^{Error}(e_i)$ is the chunking error number of the lexical entry $e_i$ for the new lexicon $\Phi + \Delta\Phi$, where $e_i \in \Delta\Phi$ ($\Delta\Phi$ is the list of new lexical entries added to the old lexicon $\Phi$). If $F_\Phi(e_i) > 0$, we define the lexical entry $e_i$ as positive for lexicon $\Phi$. Otherwise, the lexical entry $e_i$ is negative for lexicon $\Phi$.

Tables 7 and 8 give an overview of the effectiveness distributions for the different lexicons applied in the extended systems, compared with the lexicon applied in the baseline system, on the test data and the training data, respectively. Tables 7 and 8 show that only a minority of lexical entries are positive. This indicates that discarding non-positive lexical entries will greatly decrease the lexicon memory requirement while keeping the chunking accuracy.

Context                            Positive   Negative   Total
POS_i w_i                          1800       314        49515
POS_{i-1} POS_i                    209        136        1363
POS_{i-1} w_{i-1} POS_i            2876       229        136116
POS_{i-1} POS_i w_i                2895       193        131368
POS_{i-1} w_{i-1} POS_i w_i        4083       155        98441

Table 7: The effectiveness of lexical entries on the test data
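The effectiveness measure $F_\Phi(e_i)$ defined in this section can be sketched as a simple error count per lexical entry. This is a minimal sketch under stated assumptions: the record format (one triple per token whose tag was decided by a new entry, with flags saying whether the token was mis-chunked under the old and the enlarged lexicon), the function names and the toy records are all hypothetical and are not taken from the paper.

```python
from collections import Counter

def effectiveness(records):
    """Compute F(e) = errors with the old lexicon - errors with the new one.

    records: iterable of (entry, old_wrong, new_wrong) triples, one per token
    whose chunk tag was decided by the new lexical entry `entry`.
    """
    score = Counter()
    for entry, old_wrong, new_wrong in records:
        # In-place addition keeps zero and negative totals in the Counter.
        score[entry] += int(old_wrong) - int(new_wrong)
    return score

def positive_entries(records):
    """Keep only the entries whose addition reduces the number of errors."""
    return {entry for entry, f in effectiveness(records).items() if f > 0}

# Hypothetical toy records: (entry, wrong with old lexicon, wrong with new lexicon).
records = [
    (("DT", "the"), True, False),      # error fixed by the new entry
    (("DT", "the"), False, False),     # correct both times
    (("IN", "that"), False, True),     # error introduced by the new entry
    (("NN", "deficit"), True, True),   # still wrong
]
scores = effectiveness(records)
kept = positive_entries(records)
```

In this toy run only `("DT", "the")` has a positive score, so it is the only entry that an error-driven lexicon would retain.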

Context                            Positive   Negative   Total
POS_i w_i                          6724       719        49515
POS_{i-1} POS_i                    357        196        1363
POS_{i-1} w_{i-1} POS_i            13205      582        136116
POS_{i-1} POS_i w_i                14186      325        131368
POS_{i-1} w_{i-1} POS_i w_i        15516      144        98441

Table 8: The effectiveness of lexical entries on the training data

Tables 9-13 give the performance of the five error-driven systems, which discard all the lexical entries that are non-positive on the training data. Here, $\Phi'$ is the lexicon used in the baseline system, $\Phi' = \{p_i,\ p_i \exists C\}$, and $\Delta\Phi = \Phi - \Phi'$. It is found that the $F_{\beta=1}$ values of the error-driven systems for the context of current part-of-speech and word pair and for the context of previous part-of-speech and current part-of-speech increase by 1.2% and 0.6%. Although the $F_{\beta=1}$ values for the other three cases decrease slightly, by 0.02%, 0.02% and 0.19%, the sizes of the lexicons have been greatly reduced, by 85% to 97%.

Type      Precision   Recall   F_{β=1}
Overall   91.69       93.28    92.48
NP        92.64       93.48    93.06
VP        92.16       93.66    92.90
ADJP      78.39       71.69    74.89
ADVP      73.66       87.80    80.11
PP        95.18       97.38    96.27

Table 9: Results of chunking experiments with the error-driven lexicon $\Phi = \{p_i w_i,\ p_i w_i \exists C \ \&\ F_{\Phi'}(p_i w_i) > 0\} + \{p_i,\ p_i \exists C\}$

Type      Precision   Recall   F_{β=1}
Overall   88.68       90.28    89.47
NP        90.61       91.57    91.08
VP        91.80       94.08    92.90
ADJP      72.20       62.72    67.13
ADVP      70.53       78.90    74.48
PP        86.55       96.34    91.19

Table 10: Results of chunking experiments with the error-driven lexicon $\Phi = \{p_{i-1} p_i,\ p_{i-1} p_i \exists C \ \&\ F_{\Phi'}(p_{i-1} p_i) > 0\} + \{p_i,\ p_i \exists C\}$

Type      Precision   Recall   F_{β=1}
Overall   91.02       92.21    91.61
NP        92.36       93.69    93.02
VP        93.68       94.94    94.30
ADJP      78.28       71.46    74.71
ADVP      76.77       81.79    79.20
PP        90.67       95.37    92.96

Table 11: Results of chunking experiments with the error-driven lexicon $\Phi = \{p_{i-1} w_{i-1} p_i,\ p_{i-1} w_{i-1} p_i \exists C \ \&\ F_{\Phi'}(p_{i-1} w_{i-1} p_i) > 0\} + \{p_i,\ p_i \exists C\}$

Type      Precision   Recall   F_{β=1}
Overall   92.84       93.21    93.03
NP        93.35       93.65    93.50
VP        93.97       94.67    94.32
ADJP      79.49       72.94    76.07
ADVP      79.47       85.91    82.57
PP        95.19       96.29    95.74

Table 12: Results of chunking experiments with the error-driven lexicon $\Phi = \{p_{i-1} p_i w_i,\ p_{i-1} p_i w_i \exists C \ \&\ F_{\Phi'}(p_{i-1} p_i w_i) > 0\} + \{p_i,\ p_i \exists C\}$

Type      Precision   Recall   F_{β=1}
Overall   91.99       92.95    92.47
NP        93.35       93.39    93.37
VP        92.89       94.36    93.62
ADJP      80.01       71.70    75.63
ADVP      73.40       87.32    79.76
PP        93.42       97.33    95.33

Table 13: Results of chunking experiments with the error-driven lexicon $\Phi = \{p_{i-1} w_{i-1} p_i w_i,\ p_{i-1} w_{i-1} p_i w_i \exists C \ \&\ F_{\Phi'}(p_{i-1} w_{i-1} p_i w_i) > 0\} + \{p_i,\ p_i \exists C\}$

After discussing the five context-dependent lexicons separately, we now explore the merging of the context-dependent lexicons by assuming:

\[ \begin{aligned} \Phi ={} & \{p_{i-1} w_{i-1} p_i w_i,\ p_{i-1} w_{i-1} p_i w_i \exists C \ \&\ F_{\Phi'}(p_{i-1} w_{i-1} p_i w_i) > 0\} \\ &+ \{p_{i-1} p_i w_i,\ p_{i-1} p_i w_i \exists C \ \&\ F_{\Phi'}(p_{i-1} p_i w_i) > 0\} \\ &+ \{p_{i-1} w_{i-1} p_i,\ p_{i-1} w_{i-1} p_i \exists C \ \&\ F_{\Phi'}(p_{i-1} w_{i-1} p_i) > 0\} \\ &+ \{p_{i-1} p_i,\ p_{i-1} p_i \exists C \ \&\ F_{\Phi'}(p_{i-1} p_i) > 0\} \\ &+ \{p_i w_i,\ p_i w_i \exists C \ \&\ F_{\Phi'}(p_i w_i) > 0\} \\ &+ \{p_i,\ p_i \exists C\} \end{aligned} \]

and $P(t_i \mid G_1^n)$ is approximated in the following order:

1. if $p_{i-1} w_{i-1} p_i w_i \in \Phi$, $P(t_i \mid G_1^n) = P(t_i \mid p_{i-1} w_{i-1} p_i w_i)$
2. if $p_{i-1} p_i w_i \in \Phi$, $P(t_i \mid G_1^n) = P(t_i \mid p_{i-1} p_i w_i)$
3. if $p_{i-1} w_{i-1} p_i \in \Phi$, $P(t_i \mid G_1^n) = P(t_i \mid p_{i-1} w_{i-1} p_i)$
4. if $p_i w_i \in \Phi$, $P(t_i \mid G_1^n) = P(t_i \mid p_i w_i)$
5. if $p_{i-1} p_i \in \Phi$, $P(t_i \mid G_1^n) = P(t_i \mid p_{i-1} p_i)$
6. otherwise, $P(t_i \mid G_1^n) = P(t_i \mid p_i)$

Table 14 gives an overview of the chunking experiments using the above assumption. It shows that the $F_{\beta=1}$ value for the merged context-dependent lexicon increases to 93.68%. For comparison, the $F_{\beta=1}$ value is 93.30% when all the possible lexical entries are included in $\Phi$ (due to memory limitations, only the 150000 most frequent lexical entries are included).

Type      Precision   Recall   F_{β=1}
Overall   93.40       93.95    93.68
NP        93.60       94.64    94.12
VP        94.64       94.75    94.70
ADJP      77.12       74.55    75.81
ADVP      82.39       83.80    83.09
PP        96.61       96.63    96.62

Table 14: Results of chunking experiments with the merged context-dependent lexicon

As for the relationship between the training corpus size and the performance of error-driven learning, Table 15 shows that the performance of error-driven learning improves steadily as the training corpus size increases.

Training Sections   |Φ|     Accuracy   F_{β=1}
0-1                 14384   94.78%     91.95
0-3                 24507   95.19%     92.51
0-5                 32316   95.28%     92.77
0-7                 38286   95.41%     93.00
0-9                 39876   95.53%     93.12
0-11                43372   95.65%     93.31
0-13                46029   95.62%     93.29
0-15                47901   95.66%     93.34
0-17                48813   95.74%     93.41
0-19                49988   95.92%     93.68

Table 15: The performance of error-driven learning with different training corpus sizes

For comparison with other chunk taggers, we also evaluate our chunk tagger with the merged context-dependent lexicon by cross-validation on all 25 partitions of the PENN WSJ TreeBank. Table 16 gives an overview of these chunking experiments.
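The ordered back-off over the merged lexicon, described earlier in this section, can be sketched as a lookup that tries the context patterns from most to least specific. This is an illustrative sketch only: the key encoding (a pattern name followed by the context items), the function name `backoff_lookup`, and the toy lexicon with its probabilities are assumptions made here, and the final fallback is taken to be the baseline entry $p_i$.

```python
# Context patterns in back-off order (most specific first). Each function
# builds the lexicon key for one pattern from the previous POS (pp),
# previous word (pw), current POS (p) and current word (w).
PATTERNS = [
    lambda pp, pw, p, w: ("p-1 w-1 p w", pp, pw, p, w),
    lambda pp, pw, p, w: ("p-1 p w", pp, p, w),
    lambda pp, pw, p, w: ("p-1 w-1 p", pp, pw, p),
    lambda pp, pw, p, w: ("p w", p, w),
    lambda pp, pw, p, w: ("p-1 p", pp, p),
]

def backoff_lookup(lexicon, prev_pos, prev_word, pos, word):
    """Return (key, distribution) for the first pattern found in the lexicon,
    falling back to the baseline entry for the current part-of-speech."""
    for make_key in PATTERNS:
        key = make_key(prev_pos, prev_word, pos, word)
        if key in lexicon:
            return key, lexicon[key]
    key = ("p", pos)
    return key, lexicon.get(key, {})

# Hypothetical merged lexicon; the probabilities are made up for illustration.
lexicon = {
    ("p", "NN"): {"00_NN_NP": 0.7, "99_NN_NP": 0.3},
    ("p-1 p", "JJ", "NN"): {"00_NN_NP": 0.95, "99_NN_NP": 0.05},
    ("p w", "NN", "deficit"): {"00_NN_NP": 0.9, "99_NN_NP": 0.1},
}
k1, _ = backoff_lookup(lexicon, "JJ", "current", "NN", "deficit")
k2, _ = backoff_lookup(lexicon, "JJ", "current", "NN", "account")
k3, _ = backoff_lookup(lexicon, "DT", "the", "NN", "account")
```

Because only positive entries are stored, most lookups fall through several patterns before one matches, which is what keeps the merged lexicon small.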
Type      Precision   Recall   F_{β=1}
Overall   96.40       96.47    96.44
NP        96.49       96.99    96.74
VP        97.13       97.36    97.25
ADJP      89.92       88.15    89.03
ADVP      91.52       87.57    89.50
PP        97.13       97.36    97.25

Table 16: Results of 25-fold cross-validation chunking experiments with the merged context-dependent lexicon

Tables 14 and 16 show that our new chunk tagger greatly outperforms other reported chunk taggers on the same training data and test data, by 2%-3% (Buchholz S., Veenstra J. and Daelemans W. (1999), Ramshaw L.A. and Marcus M.P. (1995), Daelemans W., Buchholz S. and Veenstra J. (1999), and Veenstra J. (1999)).

Conclusion

This paper proposes a new error-driven HMM-based chunk tagger with a context-dependent lexicon. Compared with a standard HMM-based tagger, this new tagger uses a new Hidden Markov Modelling approach which incorporates more contextual information into a lexical entry by assuming $MI(T_1^n, G_1^n) = \sum_{i=1}^{n} MI(t_i, G_1^n)$. Moreover, an error-driven learning approach is adopted to decrease the memory requirement and further improve the accuracy by including more context-dependent information in the lexicon. It is found that our new chunk tagger significantly outperforms other reported chunk taggers on the same training data and test data.

For future work, we will explore the effectiveness of considering even more contextual information in the approximation of $P(T_1^n \mid G_1^n)$ by using the forward-backward algorithm (Rabiner 1989), while currently we only consider the contextual information of the current location and the previous location.

Acknowledgement

We wish to thank Sabine Buchholz from Tilburg University for kindly providing us with her program, which is also used to extract data for the CoNLL-2000 shared task.

References

Abney S. "Parsing by chunks". Principle-Based Parsing, edited by Berwick, Abney and Tenny. Kluwer Academic Publishers.

Argamon S., Dagan I. and Krymolowski Y. "A memory-based approach to learning shallow natural language patterns." COLING/ACL-1998. Pp.67-73. Montreal, Canada. 1998.

Bod R. "A computational model of language performance: Data-oriented parsing." COLING-1992. Pp.855-859. Nantes, France. 1992.

Bourigault D. "Surface grammatical analysis for the extraction of terminological noun phrases". COLING-92. Pp.977-981. 1992.

Brill Eric. "A corpus-based approach to language learning". Ph.D thesis. Univ. of Penn. 1993.

Buchholz S., Veenstra J. and Daelemans W. "Cascaded grammatical relation assignment." Proceeding of EMNLP/VLC-99, at ACL'99. 1999.

Cardie C. "A case-based approach to knowledge acquisition for domain-specific sentence analysis." Proceeding of the 11th National Conference on Artificial Intelligence. Pp.798-803. Menlo Park, CA, USA. AAAI Press. 1993.

Church K.W. "A stochastic parts program and noun phrase parser for unrestricted text." Proceeding of Second Conference on Applied Natural Language Processing. Pp.136-143. Austin, Texas, USA. 1988.

Daelemans W., Buchholz S. and Veenstra J. "Memory-based shallow parsing." CoNLL-1999. Pp.53-60. Bergen, Norway. 1999.

Daelemans W., Zavrel J., Berck P. and Gillis S. "MBT: A memory-based part-of-speech tagger generator." Proceeding of the Fourth Workshop on Large Scale Corpora. Pp.14-27. ACL SIGDAT. 1996.

Grefenstette G. "Light parsing as finite-state filtering". Workshop on Extended Finite State Models of Language at ECAI'96. Budapest, Hungary. 1996.

Kupiec J. "An algorithm for finding noun phrase correspondences in bilingual corpora". ACL'93. Pp.17-22. 1993.

Marcus M., Santorini B. and Marcinkiewicz M.A. "Building a large annotated corpus of English: The Penn Treebank". Computational Linguistics. 19(2):313-330. 1993.

Rabiner L. "A tutorial on Hidden Markov Models and selected applications in speech recognition". IEEE 77(2), pp.257-285. 1989.

Ramshaw L.A. and Marcus M.P. "Transformation-based Learning". Proceeding of 3rd ACL Workshop on Very Large Corpora at ACL'95. 1995.

Rijsbergen C.J. van. Information Retrieval. Butterworths, London. 1979.

Skut W. and Brants T. "Chunk tagger: statistical recognition of noun phrases." ESSLLI-1998 Workshop on Automated Acquisition of Syntax and Parsing. Saarbrücken, Germany. 1998.

Veenstra J. "Memory-based text chunking". Workshop on machine learning in human language technology at ACAI'99. 1999.

Voutilainen A. "NPtool: a detector of English noun phrases". Proceeding of the Workshop on Very Large Corpora. Pp.48-57. ACL'93. 1993.

Watson B. and Chung Tsoi A. "Second order Hidden Markov Models for speech recognition". Proceeding of 4th Australian International Conference on Speech Science and Technology. Pp.146-151. 1992.