Research Group for Quantitative Linguistics
KVAL PM 339
June 19, 1967
Fack, Stockholm 40, SWEDEN

The Entropy of Recursive Markov Processes

By BENNY BRODDA

The work reported in this paper has been sponsored by Humanistiska forskningsrådet, Tekniska forskningsrådet and Riksbankens Jubileumsfond, Stockholm, Sweden.
THE ENTROPY OF RECURSIVE MARKOV PROCESSES

By BENNY BRODDA
KVAL, Fack, Stockholm 40, Sweden

Summary

The aim of this communication is to obtain an explicit formula for calculating the entropy of a source which behaves in accordance with the rules of an arbitrary Phrase Structure Grammar, in which relative probabilities are attached to the rules of the grammar. With this aim in mind we introduce an alternative definition of the concept of a PSG as a set of self-embedded (recursive) Finite State Grammars; when the probabilities are taken into account in such a grammar we call it a Recursive Markov Process.

1. In the first section we give a more detailed definition of what kind of Markov Processes we are going to generalize later on (in sec. 3), and we also outline the concept of entropy in an ordinary Markov source. More details on information theory may be found, e.g., in Khinchin's "Mathematical Foundations of Information Theory", N.Y., 1957, or "Information Theory" by R. Ash, N.Y., 1965.

A Markov Grammar is defined as a Markov Source with the following properties:

Assume that there are n + 1 states, say S_0, S_1, ..., S_n, in the source. S_0 is defined as the initial state, S_n is defined as the final state, and the other states are called intermediate states. We shall, of course, also have a transition matrix, M = (p_ij), containing the transition probabilities of the source.

a) A transition from state S_i to state S_k is always accompanied by the production of a (non-zero) letter a_ik from a given finite alphabet. Transitions to different states from one given state always produce different letters.

b) From the initial state, S_0, direct or indirect transitions should be possible to any other state in the source. From no state is a transition to S_0 allowed.

c) From any state, direct or indirect transitions to the final state S_n should be possible. From S_n no transition is allowed to any other state (S_n is an "absorbing state").
A (grammatical) sentence is now defined as the (left-to-right) concatenation of the letters produced by the source when passing from the initial state to the final state. The length of a sentence is defined as the number of letters in the sentence. To simplify matters without dropping much generality we also require that

d) The greatest common divisor of all the possible sentence lengths is = 1 (i.e., the source becomes an aperiodic source if it is short-circuited by identifying the final and initial states).

With the properties a)-d) above, the source obtained by identifying the final and initial states is an indecomposable, ergodic Markov process (cf. Feller, "Probability Theory and Its Applications", ch. 15, N.Y., 1950). In the transition matrix M for a Markov grammar of our type all elements in the first column are zero, and in the last row all elements are zero except the last one, which is = 1.

For a given Markov grammar we define the uncertainty or entropy, H_i, for each state S_i, i = 0, 1, ..., n, as

    H_i = - SUM_{j=0}^{n} p_ij log p_ij,    i = 0, 1, ..., n.

We also define the entropy, H or H(M), for the grammar as

(1)    H = SUM_{i=0}^{n-1} x_i H_i

where x = (x_0, x_1, ..., x_{n-1}) is defined as the stationary distribution of the source obtained when S_0 and S_n are identified; thus x is defined as the (unique) solution of the set of simultaneous equations

(2)    x M_1 = x,    x_0 + x_1 + ... + x_{n-1} = 1

where M_1 is formed by shifting the last and first columns and then omitting the last row and column. The mean sentence length, n̄, of the set of grammatical sentences can now be easily calculated as
(3)    n̄ = 1/x_0

2. Embedded Grammars (cf. Feller, op. cit.)

We now assume that we have two Markov grammars, M and M_1, with states S_0, S_1, ..., S_n, and T_0, T_1, ..., T_m, respectively, where S_0 and S_n, T_0 and T_m are the corresponding initial and final states. Now consider two states S_i and S_k in the grammar M, and assume that the corresponding transition probability is = p_ik. We now transform the grammar M into a new one, M', by embedding the grammar M_1 in M between the states S_i and S_k, an operation which is performed by identifying the states T_0 and T_m with the states S_i and S_k respectively. Or, to be more precise, assume that in the grammar M_1 the transitions from T_0 to the states T_j, j >= 1, have the probabilities q_0j. Then, in the grammar M', transitions to a state T_j from the state S_i will take place with the probability = p_ik q_0j. A return to the state S_k in the "main" grammar from an intermediate state T_j in M_1 takes place with the probability q_jm.

With the conditions above fulfilled, we propose that the entropy for the composed grammar be calculated according to the formula

(4)    H(M') = (H(M) + x_i p_ik n̄_1 H(M_1)) / (1 + x_i p_ik (n̄_1 - 1))

where H(M) is the entropy of the grammar M when there is an ordinary connection (with probability p_ik) between the states S_i and S_k, and where x_i is the inherent probability of being in the state S_i under the same conditions. n̄_1 is the mean sentence length of the sentences produced by the grammar M_1 alone. (It is quite natural that this number appears as a weight in the formula, since if one is producing a sentence according to the grammar M and arrives at the state S_i and from there "dives" into the grammar M_1, then n̄_1 is the expected waiting time for emerging again into the main grammar M.) The factor x_i p_ik may be interpreted as the combined probability of ever arriving at S_i and there choosing the path over to M_1 (one may, of course, choose quite another path from S_i).
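Formulas (1)-(3) are directly computable. The following sketch (the three-state example grammar is our own illustrative assumption, not taken from the text; numpy is used for the linear algebra) short-circuits the source, solves (2) for the stationary distribution, and evaluates the entropy (1) and the mean sentence length (3):

```python
import numpy as np

# Hypothetical three-state Markov grammar: S0 initial, S2 final;
# every transition emits one letter.
P = np.array([
    [0.0, 1.0, 0.0],   # S0 -> S1 with probability 1
    [0.0, 0.5, 0.5],   # S1 -> S1 or S1 -> S2, probability 1/2 each
    [0.0, 0.0, 1.0],   # S2 is absorbing
])
n = P.shape[0] - 1     # n = 2

# Per-state entropies H_i = - sum_j p_ij log2 p_ij  (0 log 0 read as 0).
Hi = np.array([-sum(q * np.log2(q) for q in row if q > 0) for row in P])

# Short-circuit the source: transitions into S_n are redirected into S_0,
# then the last row and column are dropped (the matrix M_1 of formula (2)).
M1 = P[:n, :n].copy()
M1[:, 0] = P[:n, n]

# Stationary distribution x: x M1 = x together with sum(x) = 1.
A = np.vstack([M1.T - np.eye(n), np.ones(n)])
b = np.append(np.zeros(n), 1.0)
x = np.linalg.lstsq(A, b, rcond=None)[0]

H = x @ Hi[:n]          # entropy of the grammar, formula (1)
mean_len = 1.0 / x[0]   # mean sentence length, formula (3)
print(x, H, mean_len)   # x = (1/3, 2/3), H = 2/3 bit, mean length 3
```

For this grammar the result can be checked by hand: the only choice is made in S_1 (one bit per visit), S_1 is occupied two thirds of the time, and a sentence consists of one step into S_1 followed by a geometric number of further steps with mean 2.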
also say that A_ik stands as an abbreviation for an arbitrary sentence of that grammar.)

We associate each grammar M'_j with the grammar M_j, j = 0, 1, ..., N, by just considering it as a non-recursive one, that is, we consider all the symbols A_ik as terminal symbols (even if they are not). The grammars thus obtained are ordinary Markov grammars according to our definition, and the entropies H_j = H(M_j) are easily computed according to formula (1), as are the stationary distributions (formula (2)). The following theorem shows how the entropies H'_j for the fully recursive grammars M'_j are connected with the numbers H_j.

Theorem

The entropies H'_j for a set of recursive Markov grammars M_j, j = 0, 1, ..., N, can be calculated according to the formula

(6)    H'_j = (H_j + SUM_k y_jk n̄_k H'_k) / (1 + SUM_k y_jk (n̄_k - 1)),    j = 0, 1, ..., N.

Here the factors y_jk depend only on the probability matrix of the grammar, and the numbers n̄_k, defined as the mean sentence lengths of the sentences of the grammar M'_k, k = 0, 1, ..., N, are computable according to the lemma below. H'_j is the entropy for the grammar M'_j.

The theorem above is a direct application of formula (4), sec. 2, to the grammar. The coefficients y_jk in formula (6) can, more precisely, be calculated as a sum of terms of the type x_i p_im, where the indices (i, m) are those where the grammar M'_k appears in the grammar M'_j; x_i and p_im are the components of the stationary distribution and the probability matrix for the grammar M_j.
Assume now that we have a Markov grammar of our type, but for which each transition takes a certain amount of time. A very natural question is then: "What is the expected time to produce a sentence in that language?" The answer is given in the following lemma.

Lemma

Let M be a Markov grammar with states S_i, i = 0, 1, ..., n, where S_0 and S_n are the initial and final states respectively. Assume that each transition S_i -> S_k takes t_ik time units. Denote the expected time of arrival at S_n, given that the grammar is in state S_i, by t_i, i = 0, 1, ..., n (thus t_0 is the expected time for producing a sentence). The times t_i will then fulfill the following set of simultaneous linear equations:

(7)    t_i = SUM_k p_ik (t_ik + t_k)

Formula (7) is itself a proof of the lemma. With more convenient notations we can write (7) as

    (E - P) t = P_t

where E is the unit matrix, P is the probability matrix (with p_nn = 0), and P_t is the vector with components

    P_i(t) = SUM_m p_im t_im,    i = 0, 1, ..., n.

The application of the lemma for computing the numbers n̄_k in formula (6) is now the following. The transition times of the lemma are, of course, the expected times (or "lengths", as we have called them earlier) for passing via a sub-grammar of the grammar under consideration. Thus the number t_ik is itself the unknown entity n̄_k.
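The system (E - P) t = P_t is an ordinary linear solve. A minimal sketch (the three-state grammar is a hypothetical example of our own; all transition times are set to t_ik = 1, so the expected production time t_0 must coincide with the mean sentence length 1/x_0 of formula (3)):

```python
import numpy as np

# Hypothetical grammar: S0 initial, S2 final, unit transition times.
P = np.array([
    [0.0, 1.0, 0.0],
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],
])
T = np.ones_like(P)        # t_ik = 1 time unit for every transition
n = P.shape[0] - 1

# Formula (7): t_i = sum_k p_ik (t_ik + t_k), i.e. (E - P) t = P_t,
# with the row of the absorbing state zeroed (p_nn = 0).
Pz = P.copy()
Pz[n, :] = 0.0
Pt = (Pz * T).sum(axis=1)  # P_i(t) = sum_m p_im t_im
t = np.linalg.solve(np.eye(n + 1) - Pz, Pt)
print(t)                   # t = (3, 2, 0): t_0 = 3 = mean sentence length
```

Zeroing the absorbing row makes (E - P) invertible and forces t_n = 0, as the lemma requires.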
For each of the sub-grammars M'_j, j = 0, 1, ..., N, we get a set of linear equations of type (7) for determining the vectors t of the lemma. The first component of this vector, i.e., the number t_0, is then equal to the expected length, n̄, of the sentences of that grammar. (Unfortunately, we also have to compute, as extras, the expected times for going from every state of the sub-grammars to the corresponding final state.) The total number of unknowns involved when computing the entropy of our grammar (i.e., the entropy H'_0) is equal to (the total number of states in all our sub-grammars) plus (the number of sub-grammars). This is also the number of equations, for we have N + 1 equations from formula (6) and the N + 1 sets of equations of type (7). We assert that all these simultaneous equations are solvable if the grammar fulfills the conditions stated earlier, i.e., that from each state in any sub-grammar there exists at least one path to the final state of that grammar.
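The smallest recursive case makes the procedure concrete. The sketch below (an illustrative example of our own) takes a single grammar M_0 whose sentences are "a A b" with probability p and "a b" with probability 1 - p, where A abbreviates a whole sentence of M_0 itself; formula (7) then contains the unknown mean length on both sides, and the one-grammar analogue of formula (6), H' = (H_0 + y n̄ H') / (1 + y (n̄ - 1)), contains the unknown entropy on both sides. Both are solved here by fixed-point iteration:

```python
import numpy as np

p = 0.5   # probability of the recursive alternative (assumed value)

# Non-recursive version M_0 (A read as a terminal): S0 -a-> S1, then
# S1 -A-> S2 (prob. p) or S1 -b-> S3 (prob. 1-p), and S2 -b-> S3 (final).
# Hand-derived from formulas (1)-(2) for this particular grammar:
x1 = 1.0 / (2.0 + p)                                  # stationary prob. of S1
H0 = x1 * (-p * np.log2(p) - (1 - p) * np.log2(1 - p))  # entropy of M_0
y = x1 * p            # the coefficient y_00 of formula (6): x_i p_ik at the
                      # one place where M_0 embeds itself

# Step 1: mean recursive length via formula (7); the A-transition's duration
# is the unknown mean length itself, giving n̄ = 2 + p n̄.
nu = 1.0
for _ in range(100):
    nu = 2.0 + p * nu          # converges to 2 / (1 - p) = 4 for p = 1/2

# Step 2: entropy via formula (6) for a single grammar, again by iteration.
H_rec = H0
for _ in range(100):
    H_rec = (H0 + y * nu * H_rec) / (1.0 + y * (nu - 1.0))

print(nu, H_rec)               # nu = 4.0, H_rec = 0.5 bit per letter
```

The result can be verified independently: each generation visits the choice state 1/(1 - p) = 2 times on average, contributing one bit per visit, so a random sentence carries 2 bits spread over an expected 4 letters, i.e. 0.5 bit per letter.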