Guidelines for tagging of Sanskrit Compounds


K V Ramkrishnamacharyulu, Amba Kulkarni, Tirumala Kulkarni and Anil Kumar
Final draft for circulation among the SHMT consortium members, dated 12/03/

Footnote: The original draft has been modified after a series of workshops on samāsa tagging for Sanskrit. The details of the workshops are available on the SHMT portal.

1 Background

With the advent of computers and the advances in the field of NLP, annotated corpora are gaining importance. Annotated corpora not only serve as an important resource for building statistical tools for automatic annotation, but also provide useful insights for language teachers, language learners, and researchers working on various aspects of language.

Compound formation is very productive in Sanskrit. On average, every fourth word in a Sanskrit text is a compound. It is not practical to store all the compounds together with their analysis and meaning. So for Sanskrit, one needs a very good compound identifier and a program to generate the paraphrase of a compound.

Pāṇini has provided rules for compound formation, and also semantic restrictions for many of the compound formations. But to implement these rules on a machine, one requires a knowledge base. A good collection of tagged compounds will be useful in deciding the parameters for the development of such a knowledge base. Similarly, a tagged corpus will be useful for obtaining a frequency distribution of the various compound types. Such a corpus will also be useful for developing automatic compound identifiers using suitable machine learning algorithms.

2 Tag set for compounds

The Indian grammatical tradition has a vast literature on samāsa. Following this literature, Sanskrit compounds are classified into 5 major and 55 minor categories. The major categories are अव्ययीभाव, तत्पुरुष, कर्मधारय, बहुव्रीहि and द्वन्द्व.

Note that there is a small deviation here from the standard literature: कर्मधारय has been added as a major type of compound, instead of as a sub-class of तत्पुरुष. The reason is purely the convenience of the tag names. We give below the sub-classification along with the associated tags.

compound type: अव्ययीभाव
  A1    अव्यय-पूर्वपद
  A2    अव्यय-उत्तरपद
  A3    तिष्ठद्गु-प्रभृति
  A4    संख्यापूर्वपद-नद्युत्तरपद
  A5    नद्युत्तरपद-अन्यपदार्थे संज्ञायाम्
  A6    संख्यापूर्वपद-वंश्योत्तरपद
  A7    पारे-मध्ये-पूर्वपद षष्ठ्युत्तरपद

compound type: कर्मधारय
  K1    विशेषण-पूर्वपद-कर्मधारय
  K2    विशेषण-उत्तरपद-कर्मधारय
  K3    विशेषण-उभयपद-कर्मधारय
  K4    उपमान-पूर्वपद-कर्मधारय
  K5    उपमान-उत्तरपद-कर्मधारय
  K6    अवधारणापूर्वपद-कर्मधारय
  K7    संभावनापूर्वपद-कर्मधारय
  Km    मध्यमपदलोपी-कर्मधारय

compound type: तत्पुरुष
  T1    प्रथमातत्पुरुष
  T2    द्वितीयातत्पुरुष
  T3    तृतीयातत्पुरुष
  T4    चतुर्थीतत्पुरुष
  T5    पञ्चमीतत्पुरुष
  T6    षष्ठीतत्पुरुष
  T7    सप्तमीतत्पुरुष
  Tn    नञ्तत्पुरुष
  Tds   समाहार-द्विगु
  Tdt   तद्धितार्थद्विगु
  Tdu   उत्तरपदद्विगु
  Tg    गतिसमास
  Tk    कुसमास
  Tp    प्रादिसमास
  Tm    मयूरव्यंसकादि
  Tb    तत्पुरुष बहुपद
  U     तत्पुरुष उपपद

compound type: बहुव्रीहि
  Bs2   द्वितीयार्थ-बहुव्रीहि (समानाधिकरण)
  Bs3   तृतीयार्थ-बहुव्रीहि (समानाधिकरण)
  Bs4   चतुर्थ्यर्थ-बहुव्रीहि (समानाधिकरण)
  Bs5   पञ्चम्यर्थ-बहुव्रीहि (समानाधिकरण)
  Bs6   षष्ठ्यर्थ-बहुव्रीहि (समानाधिकरण)
  Bs7   सप्तम्यर्थ-बहुव्रीहि (समानाधिकरण)
  Bsd   दिग्वाचक-बहुव्रीहि (समानाधिकरण)
  Bsp   प्रहरणविषयक-बहुव्रीहि (समानाधिकरण)
  Bsg   ग्रहणविषयक-बहुव्रीहि (समानाधिकरण)
  Bsmn  अस्त्यर्थ-मध्यमपदलोपी (नञ्)-बहुव्रीहि
  Bvp   प्रादि-बहुव्रीहि
  Bss   संख्योभयपद-बहुव्रीहि (समानाधिकरण)
  Bsu   उपमानपूर्वपद-बहुव्रीहि (समानाधिकरण)
  Bv    व्यधिकरण-बहुव्रीहि
  Bvs   संख्योत्तरपद-व्यधिकरण-बहुव्रीहि
  BvS   सहपूर्वपद-व्यधिकरण-बहुव्रीहि
  BvU   उपमानपूर्वपद-व्यधिकरण-बहुव्रीहि
  Bb    बहुपद-बहुव्रीहि

compound type: द्वन्द्व
  Di    इतरेतरयोग-द्वन्द्व
  Ds    समाहार-द्वन्द्व
  E     एकशेष

compound type: केवलसमास
  S     केवल

compound type: द्विरुक्ति
  d     द्विरुक्ति

3 General Guidelines

The tagging of a compound involves the following steps:
a) separating the constituent padas by '-',
b) undoing the sandhi,
c) changing the samāsa pūrvapadas to their prātipadikas,
d) assigning a tag.

Below we give some examples of samasta padas and their tagging.
राजपुरुषः = <राजन्-पुरुषः>T6
लक्ष्मीच्छाया = <लक्ष्मी-छाया>T6
धनुर्विद्या = <धनुष्-विद्या>T6
षण्मासान् = <षट्-मासान्>Tds

Do not split the padas in the following cases:
a) तिङन्त or कृदन्त with upa-padas, as in आग, ह र ण etc.
b) words that have a रूढ-अर्थ, as in मण्डप, तर श, उप ह र etc.

If either the पूर्वपद or the उत्तरपद, or both, are in turn समस्तपदs, then they are also to be tagged. E.g. तपःस्वाध्यायनिरतम् should be tagged as <<तपस्-स्वाध्याय>Di-निरतम्>T7. Note the use of < and > to delimit the words constituting a compound.

If a pada is a क्रिया-विशेषण in a given context, then it should be marked as Bvs only.

If more than one tag is possible, give all the alternative taggings. E.g. in वृकः इव अविरतचित्तः पशून्, the word अविरत is ambiguous between Tn and T7. Hence the compound should be tagged as:
<<अ-विरत>Tn-चित्तः>Bs6
<<अवि-रत>T7-चित्तः>Bs6

Handling taddhita constituents: if a constituent of a compound is a taddhita formed from a compound prātipadika, then the taddhita suffix is to be added after the samāsa tag. E.g.
a) ह मम लक त is to be tagged as <<<ह म-म ल >T6 ^ इ>-क त >T6. Note the use of the i suffix, which indicates a taddhita pratyaya.
b) एकार्णवीकृत = <एक-अर्णव>K1 ^ ई>-कृत
c) रामकृष्णवत् = <राम-कृष्ण>Di ^ वत्
d) मोहननामक = <मोहन-नाम ^ क>Bs6

If the vigraha vākya must specify the number, then we suggest specifying the number information while tagging the compound, as in
नृपतिः = <नृ{3}-पतिः>T6
धर्माभिरामप्रियः = <<धर्म-अभिराम>Bs6{3}-प्रियः>T6

Similarly, if the लिङ्ग information is also required, it may be specified as सक पम = <स-क पम {P}>Bs, if क पम refers to क प च य, or as <स-क पम {S}>Bs, if the prātipadika of क पम is क प.

4 Rules for generating vigraha vākya

Though a substantial amount of literature on Sanskrit compounds is available, for the benefit of annotators we give below the vigraha vākya for each type of compound, with an example. Further, for the benefit of programmers, we also give rules to generate a vigraha vākya from a properly tagged compound. Thus each of the entries below consists of the major class of a compound, its sub-type, its tag, an example, a paraphrase describing the meaning of the example compound (विग्रहवाक्य), and, wherever possible, the rule to derive the paraphrase mechanically from its components.

4.1 Major class: अव्ययीभाव

1) compound type: अव्ययीभाव
   compound sub-type: अव्यय-पूर्वपद (A1)
   example: उपकृष्णम्
   tagged as: <उप-कृष्णम्>A1
   paraphrase: कृष्णस्य समीपम्
   paraphrase rule: <x-y>A1 => y{6} f(x), where f maps x to a noun with the same semantic content. The function f needs to be defined.

2) compound type: अव्ययीभाव
   compound sub-type: अव्यय-उत्तरपद (A2)
   example: अक्षपरि
   tagged as: <अक्ष-परि>A2
   paraphrase: अक्षेण विपरीतम् वृत्तम्
   paraphrase rule: <x-y>A2 => x{3} विपरीतम् वृत्तम्

3) compound type: अव्ययीभाव
   compound sub-type: तिष्ठद्गु-प्रभृति (A3)
   example: तिष्ठद्गु
   tagged as: <तिष्ठद्-गु>A3
   paraphrase: तिष्ठन्ति गावः यस्मिन् देशे
   paraphrase rule: List to be given; collect from पाणिनि's अष्टाध्यायी.

4) compound type: अव्ययीभाव
   compound sub-type: संख्यापूर्वपद-नद्युत्तरपद (A4)
   example: सप्तगङ्गम्
   tagged as: <सप्तन्-गङ्गम्>A4
   paraphrase: सप्तानाम् गङ्गानाम् समाहारः
   paraphrase rule: <x-y>A4 => x{6} y'{6} समाहारः, where y' is the प्रातिपदिक of y.

5) compound type: अव्ययीभाव
   compound sub-type: नद्युत्तरपद-अन्यपदार्थे संज्ञायाम् (A5)
   example: उन्मत्तगङ्गम्
   tagged as: <उन्मत्त-गङ्गम्>A5
   paraphrase: उन्मत्ता गङ्गा यस्मिन् देशे
   paraphrase rule: <x-y>A5 => x'{1} y'{1} यस्मिन् देशे, where y' is the प्रातिपदिक of y and x' is derived from x by changing its gender to that of y'. The number of x and y is plural, except when x = द्वि, in which case they are in द्विवचन.

6) compound type: अव्ययीभाव
   compound sub-type: संख्यापूर्वपद-वंश्योत्तरपद (A6)
   example: त्रिमुनि
   tagged as: <त्रि-मुनि>A6
   paraphrase: त्रयाणाम् मुनीनाम् समाहारः
   paraphrase rule: <x-y>A6 => x{6} y{6} समाहारः

7) compound type: अव्ययीभाव
   compound sub-type: पारे-मध्ये-पूर्वपद षष्ठ्युत्तरपद (A7)
   example: पारेगङ्गम्
   tagged as: <पारे-गङ्गम्>A7
   paraphrase: गङ्गायाः पारे
   paraphrase rule: <x-y>A7 => y{6} x

4.2 Major class: तत्पुरुष

8) compound type: तत्पुरुष
   compound sub-type: प्रथमातत्पुरुष (T1)
   example: अर्धपिप्पली
   tagged as: <अर्ध-पिप्पली>T1
   paraphrase: अर्धम् पिप्पल्याः
   paraphrase rule: <x-y>T1 => x{1} y{6}

9) compound type: तत्पुरुष
   compound sub-type: द्वितीयातत्पुरुष (T2)
   example: कृष्णश्रितः
   tagged as: <कृष्ण-श्रितः>T2
   paraphrase: कृष्णम् श्रितः
   paraphrase rule: <x-y>T2 => x{2} y

10) compound type: तत्पुरुष
    compound sub-type: तृतीयातत्पुरुष (T3)
    example: शङ्कुलाखण्डः
    tagged as: <शङ्कुला-खण्डः>T3
    paraphrase: शङ्कुलया खण्डः
    paraphrase rule: <x-y>T3 => x{3} y

11) compound type: तत्पुरुष
    compound sub-type: चतुर्थीतत्पुरुष (T4)
    example: यूपदारु
    tagged as: <यूप-दारु>T4
    paraphrase: यूपाय दारु
    paraphrase rule: <x-y>T4 => x{4} y

12) compound type: तत्पुरुष
    compound sub-type: पञ्चमीतत्पुरुष (T5)
    example: चोरभयम्
    tagged as: <चोर-भयम्>T5
    paraphrase: चोरात् भयम्
    paraphrase rule: <x-y>T5 => x{5} y

13) compound type: तत्पुरुष
    compound sub-type: षष्ठीतत्पुरुष (T6)
    example: दशरथपुत्रः
    tagged as: <दशरथ-पुत्रः>T6
    paraphrase: दशरथस्य पुत्रः
    paraphrase rule: <x-y>T6 => x{6} y

14) compound type: तत्पुरुष
    compound sub-type: सप्तमीतत्पुरुष (T7)
    example: अक्षशौण्डः
    tagged as: <अक्ष-शौण्डः>T7
    paraphrase: अक्षेषु शौण्डः
    paraphrase rule: <x-y>T7 => x{7} y

15) compound type: तत्पुरुष - नञ्
    compound sub-type: नञ्तत्पुरुष (Tn)
    example: अब्राह्मणः / अनश्वः
    tagged as: <न-ब्राह्मणः>Tn / <न-अश्वः>Tn
    paraphrase: न ब्राह्मणः / न अश्वः
    paraphrase rule: <x-y>Tn => न y

16) compound type: तत्पुरुष - द्विगु
    compound sub-type: समाहार-द्विगु (Tds)
    example: पञ्चगवम्
    tagged as: <पञ्चन्-गवम्>Tds
    paraphrase: पञ्चानाम् गवाम् समाहारः
    paraphrase rule: <x-y>Tds => x{6;ba} y{6;ba} समाहारः
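The case-marking rules for T2 to T7 all have the shape `<x-y>Tn => x{n} y`, so they lend themselves to a small lookup table. The following is a minimal sketch of that idea, not the authors' implementation: the names `CASE_OF_TAG` and `tatpurusha_skeleton` are hypothetical, and the function only produces the symbolic rule output (for example `x{6} y`), on the assumption that a separate morphological generator would later inflect the first member.

```python
# Case number attached to the first member for each vibhakti-tatpurusha tag,
# following the rules <x-y>T2 => x{2} y ... <x-y>T7 => x{7} y.
CASE_OF_TAG = {"T2": 2, "T3": 3, "T4": 4, "T5": 5, "T6": 6, "T7": 7}


def tatpurusha_skeleton(tag, x, y):
    """Return the symbolic paraphrase 'x{case} y' for the tags T2..T7.

    x is the first member as it appears in the tagged compound and y is the
    second member. No inflection is attempted here (assumption: a separate
    generator resolves the {case} marker).
    """
    try:
        case = CASE_OF_TAG[tag]
    except KeyError:
        raise ValueError(f"no case rule sketched for tag {tag!r}") from None
    return f"{x}{{{case}}} {y}"


if __name__ == "__main__":
    print(tatpurusha_skeleton("T6", "दशरथ", "पुत्रः"))
```

Keeping the rule output symbolic separates the choice of rule, which depends only on the tag, from word generation, which needs a full noun-inflection component.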

17) compound type: तत्पुरुष
    compound sub-type: तद्धितार्थद्विगु (Tdt)
    example: अष्टाकपालः
    tagged as: <अष्टन्-कपालः>Tdt
    paraphrase: अष्टसु कपालेषु संस्कृतः
    paraphrase rule: No specific rule.

18) compound type: तत्पुरुष
    compound sub-type: उत्तरपदद्विगु (Tdu)
    example: पञ्चगवधनः
    tagged as: <<पञ्चन्-गव>Tdu-धनः>Bs
    paraphrase:
    paraphrase rule:

19) compound type: तत्पुरुष
    compound sub-type: गतिसमास (Tg)
    example: संगम्य
    tagged as: <सम्-गम्य>Tg
    paraphrase: No paraphrase, since it is a नित्य compound.
    paraphrase rule:

20) compound type: तत्पुरुष
    compound sub-type: कुसमास (Tk)
    example: कुपुरुषः / कापुरुषः
    tagged as: <कु-पुरुषः>Tk / <कु-पुरुषः>Tk
    paraphrase: No paraphrase, since it is a नित्य compound.
    paraphrase rule:

21) compound type: तत्पुरुष
    compound sub-type: प्रादिसमास (Tp)
    example: प्राचार्यः
    tagged as: <प्र-आचार्यः>Tp
    paraphrase: प्रकृष्टः आचार्यः
    paraphrase rule: <x-y>Tp => f(x) y. The meanings of the upasargas, f(x), need to be listed.

22) compound type: तत्पुरुष
    compound sub-type: मयूरव्यंसकादि (Tm)
    example: राजान्तरम्
    tagged as: <राजन्-अन्तरम्>Tm
    paraphrase:
    paraphrase rule: <x-y>Tm => ?? A गणपाठ is available, so no rule for generating the विग्रहवाक्य is required. The list should be given.

23) compound type: तत्पुरुष
    compound sub-type: तत्पुरुष बहुपद (Tb)
    example: स नम
    tagged as: < -अ -स नम >Tb
    paraphrase:
    paraphrase rule: <x-y-z>Tb => x{1} y'{1} z{1}. Here y' is the prathama puruṣa ekavacana rūpa of the verb whose kṛdanta

24) compound type: तत्पुरुष
    compound sub-type: तत्पुरुष उपपद (U)
    example: कुम्भकारः
    tagged as: <कुम्भ-कारः>U
    paraphrase: कुम्भम् करोति
    paraphrase rule: <x-y>U => x{2} y

4.3 Major class: कर्मधारय

25) compound type: कर्मधारय
    compound sub-type: विशेषण-पूर्वपद-कर्मधारय (K1)
    example: नीलोत्पलम्
    tagged as: <नील-उत्पलम्>K1
    paraphrase: नीलम् तत् उत्पलम् च
    paraphrase rule: <x-y>K1 => x{1} तत् y{1} च

26) compound type: कर्मधारय
    compound sub-type: विशेषण-उत्तरपद-कर्मधारय (K2)
    example: णब ल
    tagged as: < ण-ब ल >K2
    paraphrase: ण च ब ल च
    paraphrase rule: <x-y>K2 => x{1} च y{1} च

27) compound type: कर्मधारय
    compound sub-type: विशेषण-उभयपद-कर्मधारय (K3)
    example: म श तल
    tagged as: <म -श तल >K3
    paraphrase: म च अस श तल च
    paraphrase rule: <x-y>K3 => x{1} च असौ y{1} च

28) compound type: कर्मधारय
    compound sub-type: उपमान-पूर्वपद-कर्मधारय (K4)
    example: मेघश्यामः
    tagged as: <मेघ-श्यामः>K4
    paraphrase: मेघः इव श्यामः
    paraphrase rule: <x-y>K4 => x{1} इव y{1}

29) compound type: कर्मधारय
    compound sub-type: उपमान-उत्तरपद-कर्मधारय (K5)
    example: पुरुषव्याघ्रः
    tagged as: <पुरुष-व्याघ्रः>K5
    paraphrase: पुरुषः व्याघ्रः इव
    paraphrase rule: <x-y>K5 => x{1} y{1} इव

30) compound type: कर्मधारय
    compound sub-type: अवधारणा-पूर्वपद (K6)
    example: गुरुदेवः
    tagged as: <गुरु-देवः>K6
    paraphrase: गुरुः एव देवः
    paraphrase rule: <x-y>K6 => x{1} एव y{1}

31) compound type: कर्मधारय
    compound sub-type: संभावना-पूर्वपद (K7)
    example: अयोध्यानगरी
    tagged as: <अयोध्या-नगरी>K7
    paraphrase: अयोध्या इति नगरी
    paraphrase rule: <x-y>K7 => x{1} इति y{1}

32) compound type: कर्मधारय
    compound sub-type: मध्यमपदलोपी (Km)
    example: शाकपार्थिवः
    tagged as: <शाक-पार्थिवः>Km
    paraphrase: शाकप्रियः पार्थिवः
    paraphrase rule: <x-y>Km => x{1} z* y{1}, where z* is the missing madhyama pada.

4.4 Major class: बहुव्रीहि

33) compound type: बहुव्रीहि
    compound sub-type: द्वितीयार्थ-बहुव्रीहि (समानाधिकरण) (Bs2)
    example: प्राप्तोदकः
    tagged as: <प्राप्त-उदकः>Bs2
    paraphrase: प्राप्तम् उदकम् यम्
    paraphrase rule: <x-y>Bs2 => x{1} y{1} यत्{g}{2}, where g is the gender of the given compound.

34) compound type: बहुव्रीहि
    compound sub-type: तृतीयार्थ-बहुव्रीहि (समानाधिकरण) (Bs3)
    example: ऊढरथः
    tagged as: <ऊढ-रथः>Bs3
    paraphrase: ऊढः रथः येन
    paraphrase rule: <x-y>Bs3 => x{1} y{1} येन/यया/येन

35) compound type: बहुव्रीहि
    compound sub-type: चतुर्थ्यर्थ-बहुव्रीहि (समानाधिकरण) (Bs4)
    example: द व
    tagged as: <द -व >Bs4
    paraphrase: द म व म य
    paraphrase rule: <x-y>Bs4 => x{1} y{1} यस्मै/यस्यै/यस्मै

36) compound type: बहुव्रीहि
    compound sub-type: पञ्चम्यर्थ-बहुव्रीहि (समानाधिकरण) (Bs5)
    example: अपगतजीवः
    tagged as: <अपगत-जीवः>Bs5
    paraphrase: अपगतः जीवः यस्मात्
    paraphrase rule: <x-y>Bs5 => x{1} y{1} यस्मात्/यस्याः/यस्मात्

37) compound type: बहुव्रीहि
    compound sub-type: षष्ठ्यर्थ-बहुव्रीहि (समानाधिकरण) (Bs6)
    example: पीताम्बरः
    tagged as: <पीत-अम्बरः>Bs6
    paraphrase: पीतम् अम्बरम् यस्य
    paraphrase rule: <x-y>Bs6 => x{1} y{1} यस्य/यस्याः/यस्य

38) compound type: बहुव्रीहि
    compound sub-type: सप्तम्यर्थ-बहुव्रीहि (समानाधिकरण) (Bs7)
    example: न व स
    tagged as: < न - व स >Bs7
    paraphrase: न व स य न
    paraphrase rule: <x-y>Bs7 => x{1} y{1} यस्मिन्/यस्याम्/यस्मिन्

39) compound type: बहुव्रीहि
    compound sub-type: दिग्वाचक-बहुव्रीहि (समानाधिकरण) (Bsd)
    example: पूर्वोत्तरा
    tagged as: <पूर्वा-उत्तरा>Bsd
    paraphrase: पूर्वस्याः च उत्तरस्याः च यदन्तरालम्
    paraphrase rule: <x-y>Bsd => x{6} च y{6} च यदन्तरालम्

40) compound type: बहुव्रीहि
    compound sub-type: प्रहरणविषयक-बहुव्रीहि (समानाधिकरण) (Bsp)
    example: दण्डादण्डि
    tagged as: <दण्ड-दण्ड>Bsp
    paraphrase: दण्डैः च दण्डैः च प्रहृत्य इदम् युद्धम् प्रवृत्तम्
    paraphrase rule: <x-y>Bsp => x{3} च y{3} च प्रहृत्य इदम् युद्धम् प्रवृत्तम्
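The rules for Bs2 to Bs7 differ only in the case and gender of the relative pronoun that closes the paraphrase. A minimal sketch of that selection follows, assuming singular forms only; the names `RELATIVE_SG` and `bahuvrihi_skeleton` are hypothetical, the gender codes "m", "f" and "n" are an illustrative convention rather than part of the guidelines, and the members are left with the symbolic `{1}` marker instead of being inflected.

```python
import re

# Singular forms of the relative pronoun यद्, indexed by case number and gender.
RELATIVE_SG = {
    2: {"m": "यम्", "f": "याम्", "n": "यत्"},
    3: {"m": "येन", "f": "यया", "n": "येन"},
    4: {"m": "यस्मै", "f": "यस्यै", "n": "यस्मै"},
    5: {"m": "यस्मात्", "f": "यस्याः", "n": "यस्मात्"},
    6: {"m": "यस्य", "f": "यस्याः", "n": "यस्य"},
    7: {"m": "यस्मिन्", "f": "यस्याम्", "n": "यस्मिन्"},
}

_BS_TAG = re.compile(r"Bs([2-7])")


def bahuvrihi_skeleton(tag, x, y, gender):
    """Return 'x{1} y{1} <relative pronoun>' for the tags Bs2..Bs7.

    gender is the gender of the whole compound: "m", "f" or "n".
    """
    match = _BS_TAG.fullmatch(tag)
    if match is None:
        raise ValueError(f"not a Bs2..Bs7 tag: {tag!r}")
    case = int(match.group(1))
    try:
        pronoun = RELATIVE_SG[case][gender]
    except KeyError:
        raise ValueError(f"unknown gender code: {gender!r}") from None
    return f"{x}{{1}} {y}{{1}} {pronoun}"


if __name__ == "__main__":
    print(bahuvrihi_skeleton("Bs6", "पीत", "अम्बर", "m"))
```

The digit in the tag doubles as the case number of the pronoun, which is why a single table covers all six rules.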

41) compound type: बहुव्रीहि
    compound sub-type: ग्रहणविषयक-बहुव्रीहि (समानाधिकरण) (Bsg)
    example: केशाकेशि
    tagged as: <केश-केश>Bsg
    paraphrase: केशेषु केशेषु गृहीत्वा इदम् युद्धम् प्रवृत्तम्
    paraphrase rule: <x-y>Bsg => x{7} y{7} गृहीत्वा इदम् युद्धम् प्रवृत्तम्

42) compound type: बहुव्रीहि
    compound sub-type: अस्त्यर्थ-मध्यमपदलोपी-(नञ्)बहुव्रीहि (Bsmn)
    example: अपुत्रः
    tagged as: <अ-पुत्रः>Bsmn
    paraphrase: न विद्यते पुत्रः यस्य
    paraphrase rule: <x-y>Bsmn => न विद्यते y{1} यस्य/यस्याः/यस्य

43) compound type: बहुव्रीहि
    compound sub-type: प्रादि-बहुव्रीहि (Bvp)
    example: निर्दयः
    tagged as: <निर्-दयः>Bvp
    paraphrase: निर्गता दया यस्मात्
    paraphrase rule:

44) compound type: बहुव्रीहि
    compound sub-type: संख्योभयपद-बहुव्रीहि (समानाधिकरण) (Bss)
    example: चत र
    tagged as: < -चत र >Bss
    paraphrase: य व चत र व य
    paraphrase rule: <x-y>Bss => x{1} वा y{1} वा यस्य/यस्याः/यस्य

45) compound type: बहुव्रीहि
    compound sub-type: उपमान-पूर्वपद-बहुव्रीहि (समानाधिकरण) (Bsu)
    example: चन्द्रमुखः
    tagged as: <चन्द्र-मुखः>Bsu
    paraphrase: चन्द्रः इव मुखम् यस्य
    paraphrase rule: <x-y>Bsu => x{1} इव y{1} यस्य/यस्याः/यस्य

46) compound type: बहुव्रीहि
    compound sub-type: व्यधिकरण-बहुव्रीहि (Bv)
    example: कण्ठेकालः / चन्द्रशेखरः
    tagged as: <कण्ठे-कालः>Bv / <चन्द्र-शेखरः>Bv
    paraphrase: कण्ठे कालः यस्य / चन्द्रः शेखरे यस्य
    paraphrase rule: <x-y>Bv => x y{1} यस्य/यस्याः/यस्य

47) compound type: बहुव्रीहि
    compound sub-type: संख्योत्तरपद-व्यधिकरण-बहुव्रीहि (Bvs)
    example: उपदशाः
    tagged as: <उप-दशाः>Bvs
    paraphrase: दशानाम् समीपे ये सन्ति
    paraphrase rule: <x-y>Bvs => y{6} x ये सन्ति

48) compound type: बहुव्रीहि
    compound sub-type: सहपूर्वपद-व्यधिकरण-बहुव्रीहि (BvS)
    example: सपुत्रः
    tagged as: <स-पुत्रः>BvS
    paraphrase: पुत्रेण सह
    paraphrase rule: <x-y>BvS => y{3} सह

49) compound type: बहुव्रीहि
    compound sub-type: उपमानपूर्वपद-व्यधिकरण-बहुव्रीहि (BvU)
    example: उष्ट्रमुखः
    tagged as: <उष्ट्र-मुखः>BvU
    paraphrase: उष्ट्रस्य इव मुखम् यस्य
    paraphrase rule: <x-y>BvU => x{6} इव y यस्य/यस्याः/यस्य

50) compound type: बहुव्रीहि
    compound sub-type: बहुपद-बहुव्रीहि (Bb)
    example:
    tagged as:
    paraphrase:
    paraphrase rule:

4.5 Major class: द्वन्द्व

51) compound type: द्वन्द्व
    compound sub-type: इतरेतरयोग-द्वन्द्व (Di)
    example: रामकृष्णौ
    tagged as: <राम-कृष्णौ>Di
    paraphrase: रामः च कृष्णः च
    paraphrase rule: <x-y+>Di => x{1} च (y{1} च)+. Here + indicates one or more occurrences.

52) compound type: द्वन्द्व
    compound sub-type: समाहार-द्वन्द्व (Ds)
    example: संज्ञापरिभाषम्
    tagged as: <संज्ञा-परिभाषम्>Ds
    paraphrase: संज्ञा च परिभाषा च एतयोः समाहारः
    paraphrase rule: <x-y+>Ds => x{1} च (y{1} च)+ एतत्{n} समाहारः. Here + indicates one or more occurrences; n = 2 if there are only two components, and n = 3 otherwise.
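The two द्वन्द्व rules are the only ones above that take a variable number of members, so they are naturally written over a list. The sketch below illustrates them under the same assumption as the rules themselves, namely that the output stays symbolic; the function names are hypothetical, and `एतत्{n}` is carried over from the rule as written without resolving it to an inflected form.

```python
def itaretara_skeleton(members):
    """<x-y+>Di => x{1} च (y{1} च)+ for two or more members."""
    if len(members) < 2:
        raise ValueError("a dvandva needs at least two members")
    return " ".join(f"{member}{{1}} च" for member in members)


def samahara_skeleton(members):
    """<x-y+>Ds => x{1} च (y{1} च)+ एतत्{n} समाहारः.

    n is 2 when there are exactly two members and 3 otherwise, as stated
    in the rule.
    """
    n = 2 if len(members) == 2 else 3
    return f"{itaretara_skeleton(members)} एतत्{{{n}}} समाहारः"


if __name__ == "__main__":
    print(itaretara_skeleton(["राम", "कृष्ण"]))
    print(samahara_skeleton(["संज्ञा", "परिभाषा"]))
```

Building the Ds paraphrase on top of the Di one mirrors the rules, which share the `x{1} च (y{1} च)+` prefix.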

53) compound type: एकशेष
    compound sub-type: एकशेष-द्वन्द्व (E)
    example: पितरौ
    tagged as: <पितरौ>E
    paraphrase: माता च पिता च
    paraphrase rule: No common rule. Give a list of exceptions with their विग्रहवाक्यम्.

54) compound type: केवल
    compound sub-type: केवल (S)
    example: भूतपूर्वः
    tagged as: <भूत-पूर्वः>S
    paraphrase: पूर्वम् भूतः
    paraphrase rule: <x-y>S => y{1} x{1}

55) compound type: द्विरुक्ति
    compound sub-type: द्विरुक्ति (d)
    example: उपर्युपरि
    tagged as: <उपरि-उपरि>d
    paraphrase: उपरि उपरि
    paraphrase rule: <x-y>d => x y

5 Examples of compound tagging from बालकाण्ड of वाल्मीकिरामायणम्

Sloka 1.1.1:
<<तपस्-स्वाध्याय>Di-निरतम्>T7 तपस्वी <वाग्-विदाम्>U वरम्
नारदम् परिपप्रच्छ वाल्मीकिः <मुनि-पुङ्गवम्>T

Sloka 1.1.8:
<<इक्ष्वाकु-वंश>T6-प्रभवः>Bs6 रामः नाम जनैः श्रुतः
<नियत-आत्मा>Bs6 <महा(महत्)-वीर्यः>Bs6 द्युतिमान् धृतिमान् वशी

Sloka :
<<<वेद-<वेद-अङ्ग>T6>Di-तत्त्व>T6-ज्ञः>U <धनुर्-वेदे>T6 च निष्ठितः
<<<<सर्व-शास्त्र>K1-अर्थ>T6-तत्त्व>T6-ज्ञः>U स्मृतिमान् प्रतिभानवान्

6 Structure of Sanskrit Compounds

Sanskrit compounds are binary in nature (with the exception of द्वन्द्व and बहुपद-बहुव्रीहि). Hence they can be faithfully represented as binary trees, as in Figure 1. The analysis shown in this figure may be represented in a linear notation as <A-<B-C>>. We add a tag to each compound, labelling its type. Thus the compound ABC, after proper labelling, will be <A-<B-C>tag1>tag2, where tag1 is the type of the compound formed by the words B and C, and tag2 is the type of the compound formed by A and BC. The grammar for validation of tagged compounds is given below.

7 Grammar of tagged compounds

compound:
      '<' component '-' component '>' tag
    | '<' component '-' component '>' tag taddhita
    | '<' component '-' component '>' tag number
    | '<' component '-' component '>' tag gender
    | '<' component '-' dvandvacomponents '>' dvandvatag
    | '<' component '>' Etag
    ;

dvandvacomponents:
      dvandvacomponents '-' component
    | component
    ;

component:
      pada
    | compound
    ;

tag:
      A[1-7] | Bs[2-7] | Bs[dpgsu] | Bsmn | Bv[sSU] | B[bv]
    | K[1-7] | Km | T[1-7] | T[bgkmnp] | Td[stu] | [ESUd]
    ;

dvandvatag: D[is] ;

Etag: E ;

pada: [a-zA-Z]+ ;
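To make the bracket notation and the grammar concrete, here is a minimal recursive-descent sketch that turns a tagged string such as `<A-<B-C>T6>T7` into a nested tuple `(tag, members)`. It is an illustration under stated assumptions rather than the validator the grammar was written for: the names `parse`, `TAG` and `MULTI_MEMBER_TAGS` are hypothetical, the taddhita, number and gender suffixes are not handled, a pada is taken to be any run of characters other than `<`, `>` and `-` (so Devanagari input is accepted as well as the `[a-zA-Z]+` of the grammar), and Tb and Bb are allowed more than two members because the text describes them as multi-member compounds.

```python
import re

# Tag alternatives copied from the grammar, plus the dvandva tags D[is].
TAG = re.compile(
    r"A[1-7]|Bs[2-7]|Bs[dpgsu]|Bsmn|Bv[sSU]|B[bv]|K[1-7]|Km|"
    r"T[1-7]|T[bgkmnp]|Td[stu]|D[is]|[ESUd]"
)

# Assumption: tags that may take more than two members.
MULTI_MEMBER_TAGS = {"Di", "Ds", "Tb", "Bb"}


def parse(text):
    """Parse a tagged compound into nested (tag, (member, ...)) tuples."""
    node, pos = _component(text, 0)
    if pos != len(text):
        raise ValueError(f"unexpected text at position {pos}")
    return node


def _component(s, i):
    """component: pada | compound"""
    if i < len(s) and s[i] == "<":
        return _compound(s, i)
    j = i
    while j < len(s) and s[j] not in "<>-":
        j += 1
    if j == i:
        raise ValueError(f"expected a pada at position {i}")
    return s[i:j], j


def _compound(s, i):
    """compound: '<' component ('-' component)* '>' tag"""
    members = []
    i += 1  # skip '<'
    while True:
        node, i = _component(s, i)
        members.append(node)
        if i < len(s) and s[i] == "-":
            i += 1
            continue
        break
    if i >= len(s) or s[i] != ">":
        raise ValueError(f"expected '>' at position {i}")
    match = TAG.match(s, i + 1)
    if match is None:
        raise ValueError(f"expected a tag at position {i + 1}")
    tag = match.group()
    # Arity check: E takes one member, multi-member tags two or more,
    # and every other tag exactly two.
    if tag == "E":
        ok = len(members) == 1
    elif tag in MULTI_MEMBER_TAGS:
        ok = len(members) >= 2
    else:
        ok = len(members) == 2
    if not ok:
        raise ValueError(f"wrong number of members for tag {tag}")
    return (tag, tuple(members)), match.end()


if __name__ == "__main__":
    print(parse("<A-<B-C>T6>T7"))
```

The nested tuples correspond directly to the binary trees of the preceding section: the outermost tuple is the root, and each inner tuple is a sub-compound with its own tag.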


More information

CS 598 Natural Language Processing

CS 598 Natural Language Processing CS 598 Natural Language Processing Natural language is everywhere Natural language is everywhere Natural language is everywhere Natural language is everywhere!"#$%&'&()*+,-./012 34*5665756638/9:;< =>?@ABCDEFGHIJ5KL@

More information

Detecting English-French Cognates Using Orthographic Edit Distance

Detecting English-French Cognates Using Orthographic Edit Distance Detecting English-French Cognates Using Orthographic Edit Distance Qiongkai Xu 1,2, Albert Chen 1, Chang i 1 1 The Australian National University, College of Engineering and Computer Science 2 National

More information

Linguistic Variation across Sports Category of Press Reportage from British Newspapers: a Diachronic Multidimensional Analysis

Linguistic Variation across Sports Category of Press Reportage from British Newspapers: a Diachronic Multidimensional Analysis International Journal of Arts Humanities and Social Sciences (IJAHSS) Volume 1 Issue 1 ǁ August 216. www.ijahss.com Linguistic Variation across Sports Category of Press Reportage from British Newspapers:

More information

Grammar Extraction from Treebanks for Hindi and Telugu

Grammar Extraction from Treebanks for Hindi and Telugu Grammar Extraction from Treebanks for Hindi and Telugu Prasanth Kolachina, Sudheer Kolachina, Anil Kumar Singh, Samar Husain, Viswanatha Naidu,Rajeev Sangal and Akshar Bharati Language Technologies Research

More information

Senior Stenographer / Senior Typist Series (including equivalent Secretary titles)

Senior Stenographer / Senior Typist Series (including equivalent Secretary titles) New York State Department of Civil Service Committed to Innovation, Quality, and Excellence A Guide to the Written Test for the Senior Stenographer / Senior Typist Series (including equivalent Secretary

More information

Module 12. Machine Learning. Version 2 CSE IIT, Kharagpur

Module 12. Machine Learning. Version 2 CSE IIT, Kharagpur Module 12 Machine Learning 12.1 Instructional Objective The students should understand the concept of learning systems Students should learn about different aspects of a learning system Students should

More information

Enhancing Unlexicalized Parsing Performance using a Wide Coverage Lexicon, Fuzzy Tag-set Mapping, and EM-HMM-based Lexical Probabilities

Enhancing Unlexicalized Parsing Performance using a Wide Coverage Lexicon, Fuzzy Tag-set Mapping, and EM-HMM-based Lexical Probabilities Enhancing Unlexicalized Parsing Performance using a Wide Coverage Lexicon, Fuzzy Tag-set Mapping, and EM-HMM-based Lexical Probabilities Yoav Goldberg Reut Tsarfaty Meni Adler Michael Elhadad Ben Gurion

More information

2/15/13. POS Tagging Problem. Part-of-Speech Tagging. Example English Part-of-Speech Tagsets. More Details of the Problem. Typical Problem Cases

2/15/13. POS Tagging Problem. Part-of-Speech Tagging. Example English Part-of-Speech Tagsets. More Details of the Problem. Typical Problem Cases POS Tagging Problem Part-of-Speech Tagging L545 Spring 203 Given a sentence W Wn and a tagset of lexical categories, find the most likely tag T..Tn for each word in the sentence Example Secretariat/P is/vbz

More information

A Bayesian Learning Approach to Concept-Based Document Classification

A Bayesian Learning Approach to Concept-Based Document Classification Databases and Information Systems Group (AG5) Max-Planck-Institute for Computer Science Saarbrücken, Germany A Bayesian Learning Approach to Concept-Based Document Classification by Georgiana Ifrim Supervisors

More information

Probabilistic Latent Semantic Analysis

Probabilistic Latent Semantic Analysis Probabilistic Latent Semantic Analysis Thomas Hofmann Presentation by Ioannis Pavlopoulos & Andreas Damianou for the course of Data Mining & Exploration 1 Outline Latent Semantic Analysis o Need o Overview

More information

Spoken Language Parsing Using Phrase-Level Grammars and Trainable Classifiers

Spoken Language Parsing Using Phrase-Level Grammars and Trainable Classifiers Spoken Language Parsing Using Phrase-Level Grammars and Trainable Classifiers Chad Langley, Alon Lavie, Lori Levin, Dorcas Wallace, Donna Gates, and Kay Peterson Language Technologies Institute Carnegie

More information

Short Text Understanding Through Lexical-Semantic Analysis

Short Text Understanding Through Lexical-Semantic Analysis Short Text Understanding Through Lexical-Semantic Analysis Wen Hua #1, Zhongyuan Wang 2, Haixun Wang 3, Kai Zheng #4, Xiaofang Zhou #5 School of Information, Renmin University of China, Beijing, China

More information

Dialog Act Classification Using N-Gram Algorithms

Dialog Act Classification Using N-Gram Algorithms Dialog Act Classification Using N-Gram Algorithms Max Louwerse and Scott Crossley Institute for Intelligent Systems University of Memphis {max, scrossley } @ mail.psyc.memphis.edu Abstract Speech act classification

More information

Development of the First LRs for Macedonian: Current Projects

Development of the First LRs for Macedonian: Current Projects Development of the First LRs for Macedonian: Current Projects Ruska Ivanovska-Naskova Faculty of Philology- University St. Cyril and Methodius Bul. Krste Petkov Misirkov bb, 1000 Skopje, Macedonia rivanovska@flf.ukim.edu.mk

More information

Data Integration through Clustering and Finding Statistical Relations - Validation of Approach

Data Integration through Clustering and Finding Statistical Relations - Validation of Approach Data Integration through Clustering and Finding Statistical Relations - Validation of Approach Marek Jaszuk, Teresa Mroczek, and Barbara Fryc University of Information Technology and Management, ul. Sucharskiego

More information

Problems of the Arabic OCR: New Attitudes

Problems of the Arabic OCR: New Attitudes Problems of the Arabic OCR: New Attitudes Prof. O.Redkin, Dr. O.Bernikova Department of Asian and African Studies, St. Petersburg State University, St Petersburg, Russia Abstract - This paper reviews existing

More information

Books Effective Literacy Y5-8 Learning Through Talk Y4-8 Switch onto Spelling Spelling Under Scrutiny

Books Effective Literacy Y5-8 Learning Through Talk Y4-8 Switch onto Spelling Spelling Under Scrutiny By the End of Year 8 All Essential words lists 1-7 290 words Commonly Misspelt Words-55 working out more complex, irregular, and/or ambiguous words by using strategies such as inferring the unknown from

More information

Constructing Parallel Corpus from Movie Subtitles

Constructing Parallel Corpus from Movie Subtitles Constructing Parallel Corpus from Movie Subtitles Han Xiao 1 and Xiaojie Wang 2 1 School of Information Engineering, Beijing University of Post and Telecommunications artex.xh@gmail.com 2 CISTR, Beijing

More information

LANGUAGE IN INDIA Strength for Today and Bright Hope for Tomorrow Volume 11 : 12 December 2011 ISSN

LANGUAGE IN INDIA Strength for Today and Bright Hope for Tomorrow Volume 11 : 12 December 2011 ISSN LANGUAGE IN INDIA Strength for Today and Bright Hope for Tomorrow Volume ISSN 1930-2940 Managing Editor: M. S. Thirumalai, Ph.D. Editors: B. Mallikarjun, Ph.D. Sam Mohanlal, Ph.D. B. A. Sharada, Ph.D.

More information

Leveraging Sentiment to Compute Word Similarity

Leveraging Sentiment to Compute Word Similarity Leveraging Sentiment to Compute Word Similarity Balamurali A.R., Subhabrata Mukherjee, Akshat Malu and Pushpak Bhattacharyya Dept. of Computer Science and Engineering, IIT Bombay 6th International Global

More information

Twitter Sentiment Classification on Sanders Data using Hybrid Approach

Twitter Sentiment Classification on Sanders Data using Hybrid Approach IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727, Volume 17, Issue 4, Ver. I (July Aug. 2015), PP 118-123 www.iosrjournals.org Twitter Sentiment Classification on Sanders

More information

Specification and Evaluation of Machine Translation Toy Systems - Criteria for laboratory assignments

Specification and Evaluation of Machine Translation Toy Systems - Criteria for laboratory assignments Specification and Evaluation of Machine Translation Toy Systems - Criteria for laboratory assignments Cristina Vertan, Walther v. Hahn University of Hamburg, Natural Language Systems Division Hamburg,

More information

CS Machine Learning

CS Machine Learning CS 478 - Machine Learning Projects Data Representation Basic testing and evaluation schemes CS 478 Data and Testing 1 Programming Issues l Program in any platform you want l Realize that you will be doing

More information

Trend Survey on Japanese Natural Language Processing Studies over the Last Decade

Trend Survey on Japanese Natural Language Processing Studies over the Last Decade Trend Survey on Japanese Natural Language Processing Studies over the Last Decade Masaki Murata, Koji Ichii, Qing Ma,, Tamotsu Shirado, Toshiyuki Kanamaru,, and Hitoshi Isahara National Institute of Information

More information

Building a Semantic Role Labelling System for Vietnamese

Building a Semantic Role Labelling System for Vietnamese Building a emantic Role Labelling ystem for Vietnamese Thai-Hoang Pham FPT University hoangpt@fpt.edu.vn Xuan-Khoai Pham FPT University khoaipxse02933@fpt.edu.vn Phuong Le-Hong Hanoi University of cience

More information

EdIt: A Broad-Coverage Grammar Checker Using Pattern Grammar

EdIt: A Broad-Coverage Grammar Checker Using Pattern Grammar EdIt: A Broad-Coverage Grammar Checker Using Pattern Grammar Chung-Chi Huang Mei-Hua Chen Shih-Ting Huang Jason S. Chang Institute of Information Systems and Applications, National Tsing Hua University,

More information

Rule Learning With Negation: Issues Regarding Effectiveness

Rule Learning With Negation: Issues Regarding Effectiveness Rule Learning With Negation: Issues Regarding Effectiveness S. Chua, F. Coenen, G. Malcolm University of Liverpool Department of Computer Science, Ashton Building, Ashton Street, L69 3BX Liverpool, United

More information

Lecture 1: Machine Learning Basics

Lecture 1: Machine Learning Basics 1/69 Lecture 1: Machine Learning Basics Ali Harakeh University of Waterloo WAVE Lab ali.harakeh@uwaterloo.ca May 1, 2017 2/69 Overview 1 Learning Algorithms 2 Capacity, Overfitting, and Underfitting 3

More information

Using Web Searches on Important Words to Create Background Sets for LSI Classification

Using Web Searches on Important Words to Create Background Sets for LSI Classification Using Web Searches on Important Words to Create Background Sets for LSI Classification Sarah Zelikovitz and Marina Kogan College of Staten Island of CUNY 2800 Victory Blvd Staten Island, NY 11314 Abstract

More information

Learning Computational Grammars

Learning Computational Grammars Learning Computational Grammars John Nerbonne, Anja Belz, Nicola Cancedda, Hervé Déjean, James Hammerton, Rob Koeling, Stasinos Konstantopoulos, Miles Osborne, Franck Thollard and Erik Tjong Kim Sang Abstract

More information

Syntax Parsing 1. Grammars and parsing 2. Top-down and bottom-up parsing 3. Chart parsers 4. Bottom-up chart parsing 5. The Earley Algorithm

Syntax Parsing 1. Grammars and parsing 2. Top-down and bottom-up parsing 3. Chart parsers 4. Bottom-up chart parsing 5. The Earley Algorithm Syntax Parsing 1. Grammars and parsing 2. Top-down and bottom-up parsing 3. Chart parsers 4. Bottom-up chart parsing 5. The Earley Algorithm syntax: from the Greek syntaxis, meaning setting out together

More information

Indian Institute of Technology, Kanpur

Indian Institute of Technology, Kanpur Indian Institute of Technology, Kanpur Course Project - CS671A POS Tagging of Code Mixed Text Ayushman Sisodiya (12188) {ayushmn@iitk.ac.in} Donthu Vamsi Krishna (15111016) {vamsi@iitk.ac.in} Sandeep Kumar

More information

ENGBG1 ENGBL1 Campus Linguistics. Meeting 2. Chapter 7 (Morphology) and chapter 9 (Syntax) Pia Sundqvist

ENGBG1 ENGBL1 Campus Linguistics. Meeting 2. Chapter 7 (Morphology) and chapter 9 (Syntax) Pia Sundqvist Meeting 2 Chapter 7 (Morphology) and chapter 9 (Syntax) Today s agenda Repetition of meeting 1 Mini-lecture on morphology Seminar on chapter 7, worksheet Mini-lecture on syntax Seminar on chapter 9, worksheet

More information

Coast Academies Writing Framework Step 4. 1 of 7

Coast Academies Writing Framework Step 4. 1 of 7 1 KPI Spell further homophones. 2 3 Objective Spell words that are often misspelt (English Appendix 1) KPI Place the possessive apostrophe accurately in words with regular plurals: e.g. girls, boys and

More information

Disambiguation of Thai Personal Name from Online News Articles

Disambiguation of Thai Personal Name from Online News Articles Disambiguation of Thai Personal Name from Online News Articles Phaisarn Sutheebanjard Graduate School of Information Technology Siam University Bangkok, Thailand mr.phaisarn@gmail.com Abstract Since online

More information

A Minimalist Approach to Code-Switching. In the field of linguistics, the topic of bilingualism is a broad one. There are many

A Minimalist Approach to Code-Switching. In the field of linguistics, the topic of bilingualism is a broad one. There are many Schmidt 1 Eric Schmidt Prof. Suzanne Flynn Linguistic Study of Bilingualism December 13, 2013 A Minimalist Approach to Code-Switching In the field of linguistics, the topic of bilingualism is a broad one.

More information

Defragmenting Textual Data by Leveraging the Syntactic Structure of the English Language

Defragmenting Textual Data by Leveraging the Syntactic Structure of the English Language Defragmenting Textual Data by Leveraging the Syntactic Structure of the English Language Nathaniel Hayes Department of Computer Science Simpson College 701 N. C. St. Indianola, IA, 50125 nate.hayes@my.simpson.edu

More information

INDIAN INSTITUTE OF SCIENCE EDUCATION AND RESEARCH KOLKATA Mohanpur Ref.No.: IISER-K/Rectt.NT-01/2016/Admn Date:

INDIAN INSTITUTE OF SCIENCE EDUCATION AND RESEARCH KOLKATA Mohanpur Ref.No.: IISER-K/Rectt.NT-01/2016/Admn Date: -741 246 INDIAN INSTITUTE OF SCIENCE EDUCATION AND RESEARCH KOLKATA Mohanpur 741 246 Ref.No.: IISER-K/Rectt.NT-01/2016/Admn Date: 13.09.2016 (Apply online on or before 30.09.2016) INDIAN INSTITUTE OF SCIENCE

More information

Rule Learning with Negation: Issues Regarding Effectiveness

Rule Learning with Negation: Issues Regarding Effectiveness Rule Learning with Negation: Issues Regarding Effectiveness Stephanie Chua, Frans Coenen, and Grant Malcolm University of Liverpool Department of Computer Science, Ashton Building, Ashton Street, L69 3BX

More information

Measuring the relative compositionality of verb-noun (V-N) collocations by integrating features

Measuring the relative compositionality of verb-noun (V-N) collocations by integrating features Measuring the relative compositionality of verb-noun (V-N) collocations by integrating features Sriram Venkatapathy Language Technologies Research Centre, International Institute of Information Technology

More information

Writing a composition

Writing a composition A good composition has three elements: Writing a composition an introduction: A topic sentence which contains the main idea of the paragraph. a body : Supporting sentences that develop the main idea. a

More information

Evolutive Neural Net Fuzzy Filtering: Basic Description

Evolutive Neural Net Fuzzy Filtering: Basic Description Journal of Intelligent Learning Systems and Applications, 2010, 2: 12-18 doi:10.4236/jilsa.2010.21002 Published Online February 2010 (http://www.scirp.org/journal/jilsa) Evolutive Neural Net Fuzzy Filtering:

More information

Grammars & Parsing, Part 1:

Grammars & Parsing, Part 1: Grammars & Parsing, Part 1: Rules, representations, and transformations- oh my! Sentence VP The teacher Verb gave the lecture 2015-02-12 CS 562/662: Natural Language Processing Game plan for today: Review

More information

Speech Recognition at ICSI: Broadcast News and beyond

Speech Recognition at ICSI: Broadcast News and beyond Speech Recognition at ICSI: Broadcast News and beyond Dan Ellis International Computer Science Institute, Berkeley CA Outline 1 2 3 The DARPA Broadcast News task Aspects of ICSI

More information

BULATS A2 WORDLIST 2

BULATS A2 WORDLIST 2 BULATS A2 WORDLIST 2 INTRODUCTION TO THE BULATS A2 WORDLIST 2 The BULATS A2 WORDLIST 21 is a list of approximately 750 words to help candidates aiming at an A2 pass in the Cambridge BULATS exam. It is

More information

Analysis of Probabilistic Parsing in NLP

Analysis of Probabilistic Parsing in NLP Analysis of Probabilistic Parsing in NLP Krishna Karoo, Dr.Girish Katkar Research Scholar, Department of Electronics & Computer Science, R.T.M. Nagpur University, Nagpur, India Head of Department, Department

More information

Version Space. Term 2012/2013 LSI - FIB. Javier Béjar cbea (LSI - FIB) Version Space Term 2012/ / 18

Version Space. Term 2012/2013 LSI - FIB. Javier Béjar cbea (LSI - FIB) Version Space Term 2012/ / 18 Version Space Javier Béjar cbea LSI - FIB Term 2012/2013 Javier Béjar cbea (LSI - FIB) Version Space Term 2012/2013 1 / 18 Outline 1 Learning logical formulas 2 Version space Introduction Search strategy

More information

Adjectives tell you more about a noun (for example: the red dress ).

Adjectives tell you more about a noun (for example: the red dress ). Curriculum Jargon busters Grammar glossary Key: Words in bold are examples. Words underlined are terms you can look up in this glossary. Words in italics are important to the definition. Term Adjective

More information

LING 329 : MORPHOLOGY

LING 329 : MORPHOLOGY LING 329 : MORPHOLOGY TTh 10:30 11:50 AM, Physics 121 Course Syllabus Spring 2013 Matt Pearson Office: Vollum 313 Email: pearsonm@reed.edu Phone: 7618 (off campus: 503-517-7618) Office hrs: Mon 1:30 2:30,

More information

A Corpus-based Evaluation of a Domain-specific Text to Knowledge Mapping Prototype

A Corpus-based Evaluation of a Domain-specific Text to Knowledge Mapping Prototype A Corpus-based Evaluation of a Domain-specific Text to Knowledge Mapping Prototype Rushdi Shams Department of Computer Science and Engineering, Khulna University of Engineering & Technology (KUET), Bangladesh

More information

Towards a Machine-Learning Architecture for Lexical Functional Grammar Parsing. Grzegorz Chrupa la

Towards a Machine-Learning Architecture for Lexical Functional Grammar Parsing. Grzegorz Chrupa la Towards a Machine-Learning Architecture for Lexical Functional Grammar Parsing Grzegorz Chrupa la A dissertation submitted in fulfilment of the requirements for the award of Doctor of Philosophy (Ph.D.)

More information