Guidelines for tagging of Sanskrit Compounds


K V Ramkrishnamacharyulu, Amba Kulkarni, Tirumala Kulkarni and Anil Kumar

Final draft for circulation among the SHMT consortium members, dated 12/03/2012

1 Background

With the advent of computers and the advances in the field of NLP, annotated corpora are gaining importance. Annotated corpora not only serve as an important resource for building statistical tools for automatic annotation, but also provide useful insights for language teachers, language learners, and researchers working on various aspects of language.

Compound formation is very productive in Sanskrit. On average, every fourth word in a Sanskrit text is a compound. It is not practical to store all the compounds and their analyses or meanings. So for Sanskrit, one needs a very good compound identifier and a program to generate the paraphrase of a compound. Pāṇini has provided rules for compound formation and also semantic restrictions for many of the compound formations. But to implement these rules on a machine, one requires a knowledge base. A good collection of tagged compounds will be useful in deciding the parameters for the development of such a knowledge base. Similarly, a tagged corpus will be useful for obtaining the frequency distribution of the various compounds. Such a tagged corpus will also be useful for developing automatic compound identifiers using suitable machine learning algorithms.

2 Tag set for compounds

The Indian grammatical tradition has a vast literature on samāsa. Following this literature, Sanskrit compounds are classified into 5 major and 55 minor categories. The major categories are अव्ययीभाव, तत्पुरुष, कर्मधारय, बहुव्रीहि and द्वन्द्व.

Note that there is a small deviation here from the standard literature: कर्मधारय has been added as a major type of compound, instead of as a sub-class of तत्पुरुष. The reason is purely the convenience of the tag names. We give below the sub-classification along with the associated tags.

Footnote: The original draft has been modified after a series of workshops on samāsa tagging for Sanskrit. The details of the workshops are available on the SHMT portal, viz. http://sanskrit.uohyd.ernet.in/shmt/login.php

compound type: अव्ययीभाव
   A1    अव्यय-पूर्वपद
   A2    अव्यय-उत्तरपद
   A3    तिष्ठद्गु-प्रभृति
   A4    सङ्ख्यापूर्वपद-नद्युत्तरपद
   A5    नद्युत्तरपद-अन्यपदार्थे संज्ञायाम्
   A6    सङ्ख्यापूर्वपद-वंश्योत्तरपद
   A7    पारे-मध्ये-पूर्वपद षष्ठ्युत्तरपद

compound type: कर्मधारय
   K1    विशेषण-पूर्वपद-कर्मधारय
   K2    विशेषण-उत्तरपद-कर्मधारय
   K3    विशेषण-उभयपद-कर्मधारय
   K4    उपमान-पूर्वपद-कर्मधारय
   K5    उपमान-उत्तरपद-कर्मधारय
   K6    अवधारणा-पूर्वपद-कर्मधारय
   K7    सम्भावना-पूर्वपद-कर्मधारय
   Km    मध्यमपदलोपी-कर्मधारय

compound type: तत्पुरुष
   T1    प्रथमा-तत्पुरुष
   T2    द्वितीया-तत्पुरुष
   T3    तृतीया-तत्पुरुष
   T4    चतुर्थी-तत्पुरुष
   T5    पञ्चमी-तत्पुरुष
   T6    षष्ठी-तत्पुरुष
   T7    सप्तमी-तत्पुरुष
   Tn    नञ्-तत्पुरुष
   Tds   समाहार-द्विगु
   Tdt   तद्धितार्थ-द्विगु
   Tdu   उत्तरपद-द्विगु
   Tg    गतिसमास
   Tk    कुसमास
   Tp    प्रादिसमास
   Tm    मयूरव्यंसकादि
   Tb    तत्पुरुष बहुपद
   U     तत्पुरुष उपपद

compound type: बहुव्रीहि
   Bs2   द्वितीयार्थ-बहुव्रीहि (समानाधिकरण)
   Bs3   तृतीयार्थ-बहुव्रीहि (समानाधिकरण)
   Bs4   चतुर्थ्यर्थ-बहुव्रीहि (समानाधिकरण)
   Bs5   पञ्चम्यर्थ-बहुव्रीहि (समानाधिकरण)
   Bs6   षष्ठ्यर्थ-बहुव्रीहि (समानाधिकरण)
   Bs7   सप्तम्यर्थ-बहुव्रीहि (समानाधिकरण)
   Bsd   दिग्वाचक-बहुव्रीहि (समानाधिकरण)
   Bsp   प्रहरणविषयक-बहुव्रीहि (समानाधिकरण)
   Bsg   ग्रहणविषयक-बहुव्रीहि (समानाधिकरण)
   Bsmn  अस्त्यर्थ-मध्यमपदलोपी (नञ्)-बहुव्रीहि
   Bvp   प्रादि-बहुव्रीहि
   Bss   सङ्ख्योभयपद-बहुव्रीहि (समानाधिकरण)
   Bsu   उपमानपूर्वपद-बहुव्रीहि (समानाधिकरण)
   Bv    व्यधिकरण-बहुव्रीहि
   Bvs   सङ्ख्योत्तरपद-व्यधिकरण-बहुव्रीहि
   BvS   सहपूर्वपद-व्यधिकरण-बहुव्रीहि
   BvU   उपमानपूर्वपद-व्यधिकरण-बहुव्रीहि
   Bb    बहुपद-बहुव्रीहि

compound type: द्वन्द्व
   Di    इतरेतरयोग-द्वन्द्व
   Ds    समाहार-द्वन्द्व

compound type: एकशेष
   E     एकशेष

compound type: केवलसमास
   S     केवल

compound type: द्विरुक्ति
   d     द्विरुक्ति

3 General Guidelines

The tagging of a compound involves the following steps:
   (a) separating the constituent padas by '-',
   (b) undoing the sandhi,
   (c) changing the samāsa pūrvapadas to their prātipadikas,
   (d) assigning a tag.

Below we give some examples of samasta padas and their tagging.

   राजपुरुषः      <राजन्-पुरुषः>T6
   ल य          <ल -छ य >T6
   धनुर्विद्या     <धनुष्-विद्या>T6
   षण्मासान्      <षट्-मासान्>Tds

Do not split the padas in the following cases:
   (a) तिङन्त or कृदन्त with upa-padas, as in आग, ह र ण etc.
   (b) If the words have रूढ-अर्थ, as in मण्डप, तर श, उप ह र etc.

In case either the पूर्वपद or the उत्तरपद, or both, are in turn समस्तपदs, then they are also to be tagged. E.g. तपःस्वाध्यायनिरतम् should be tagged as <<तपस्-स्वाध्याय>Di-निरतम्>T7. Note the use of < and > to indicate the words constituting the compound.

If a pada is a क्रिया-विशेषण in a given context, then it should be marked as Bvs only.

In case more than one tag is possible, show both tagged analyses. E.g., in व क इव अ वरत च पश न, the word अविरत is ambiguous between Tn and T7. Hence it should be tagged as:

   <<अ-विरत>Tn-चित्तः>Bs6
   <<अवि-रत>T7-चित्तः>Bs6

Handling taddhita constituents: If a constituent of a compound is a taddhita formed from a compound prātipadika, then the taddhita suffix is to be added after indicating the samāsa tag. E.g.
   a) ह मम लक त is to be tagged as <<<ह म-म ल >T6 ˆ इ>-क त >T6. Note the use of i suffix, which indicates a taddhita pratyaya.
   b) एकार्णवीकृत = <एक-अर्णव>K1 ˆ ई>-कृत
   c) रामकृष्णवत् = <राम-कृष्ण>Di ˆ वत्
   d) मोहननामक = <मोहन-नाम ˆ क>Bs6

In case the vigraha vākya must specify the number, we suggest that you specify the number information while tagging the compound, as in
   नृपतिः = <नृ{3}-पतिः>T6
   धर्माभिरामप्रियः = <<धर्म-अभिराम>Bs6{3}-प्रियः>T6

Similarly, if the लिङ्ग information is also required, it may be specified as सक पम = <स-क पम {P}>Bs, if क पम refers to क प च य, or = <स-क पम {S}>Bs, if the prātipadika of क पम is क प.

4 Rules for generating vigraha vākya

Though a substantial amount of literature on Sanskrit compounds is available, for the benefit of annotators we give below the vigraha vākya for each type of compound, with an example. Further, for the benefit of programmers, we also give rules to generate a vigraha vākya from a properly tagged compound. Thus each of the examples below consists of the name of a compound, its major class, its tag, an example, a paraphrase describing the meaning of the example compound (विग्रहवाक्य), and the rule to get the paraphrase mechanically (wherever possible) from its components.
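Since every entry below is identified by its tag and its major class, the correspondence between the two can be sketched as a small lookup. This is only an illustration: the function name `major_class` and the transliterated class labels are assumptions of this sketch, and the patterns are read off the tag table of section 2.

```python
import re

# Tag patterns read off the tag table in section 2. The class labels are
# plain transliterations chosen for this sketch, not part of the guidelines.
TAG_CLASSES = [
    (re.compile(r"A[1-7]"), "avyayibhava"),
    (re.compile(r"T[1-7]|T[bgkmnp]|Td[stu]|U"), "tatpurusha"),
    (re.compile(r"K[1-7]|Km"), "karmadharaya"),
    (re.compile(r"Bs[2-7]|Bs[dpgsu]|Bsmn|Bv[psSU]?|Bb"), "bahuvrihi"),
    (re.compile(r"D[is]"), "dvandva"),
    (re.compile(r"E"), "ekashesha"),
    (re.compile(r"S"), "kevala"),
    (re.compile(r"d"), "dvirukti"),
]


def major_class(tag):
    """Return the major class label for a tag, or None if the tag is unknown."""
    for pattern, name in TAG_CLASSES:
        # fullmatch ensures that e.g. "T67" is not accepted as "T6".
        if pattern.fullmatch(tag):
            return name
    return None
```

Tags are case-sensitive here, so `Bvs` and `BvS` are distinct tags, although both fall under the same major class.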

4.1 Major class: अव्ययीभाव

1) compound type: अव्ययीभाव
   compound sub-type: अव्यय-पूर्वपद A1
   Example: उपकृष्णम्, with <उप-कृष्णम्>A1
   paraphrase: कृष्णस्य समीपम्
   paraphrase rule: <x-y>A1 => y{6} f{x}, where f maps x to a noun with the same semantic content. The function f needs to be defined.

2) compound type: अव्ययीभाव
   compound sub-type: अव्यय-उत्तरपद A2
   Example: अक्षपरि, with <अक्ष-परि>A2
   paraphrase: अक्षेण विपरीतम् वृत्तम्
   paraphrase rule: <x-y>A2 => x{3} विपरीतम् वृत्तम्

3) compound type: अव्ययीभाव
   compound sub-type: तिष्ठद्गु-प्रभृति A3
   Example: तिष्ठद्गु, with <तिष्ठद्-गु>A3
   paraphrase: तिष्ठन्ति गावः यस्मिन् देशे
   paraphrase rule: List to be given; collect from पाणिनि's अष्टाध्यायी.

4) compound type: अव्ययीभाव
   compound sub-type: सङ्ख्यापूर्वपद-नद्युत्तरपद A4
   Example: सप्तगङ्गम्, with <सप्तन्-गङ्गम्>A4
   paraphrase: सप्तानाम् गङ्गानाम् समाहारः
   paraphrase rule: <x-y>A4 => x{6} y'{6} समाहारः, where y' is the प्रातिपदिक of y.

5) compound type: अव्ययीभाव
   compound sub-type: नद्युत्तरपद-अन्यपदार्थे संज्ञायाम् A5
   Example: उन्मत्तगङ्गम्, with <उन्मत्त-गङ्गम्>A5
   paraphrase: उन्मत्ता गङ्गा यस्मिन् देशे
   paraphrase rule: <x-y>A5 => x'{1} y'{1} यस्मिन् देशे, where y' is the प्रातिपदिक of y, and x' is derived from x by changing the gender to that of y'. The number of x' and y' will be plural, except when x = द्वि, in which case they are in द्विवचन.

6) compound type: अव्ययीभाव
   compound sub-type: सङ्ख्यापूर्वपद-वंश्योत्तरपद A6
   Example: त्रिमुनि, with <त्रि-मुनि>A6
   paraphrase: त्रयाणाम् मुनीनाम् समाहारः
   paraphrase rule: <x-y>A6 => x{6} y{6} समाहारः

7) compound type: अव्ययीभाव
   compound sub-type: पारे-मध्ये-पूर्वपद षष्ठ्युत्तरपद A7
   Example: पारेगङ्गम्, with <पारे-गङ्गम्>A7
   paraphrase: गङ्गायाः पारे
   paraphrase rule: <x-y>A7 => y{6} x

4.2 Major class: तत्पुरुष

8) compound type: तत्पुरुष
   compound sub-type: प्रथमा-तत्पुरुष T1
   Example: अर्धपिप्पली, with <अर्ध-पिप्पली>T1
   paraphrase: अर्धम् पिप्पल्याः
   paraphrase rule: <x-y>T1 => x{1} y{6}

9) compound type: तत्पुरुष
   compound sub-type: द्वितीया-तत्पुरुष T2
   Example: कृष्णश्रितः, with <कृष्ण-श्रितः>T2
   paraphrase: कृष्णम् श्रितः
   paraphrase rule: <x-y>T2 => x{2} y

10) compound type: तत्पुरुष
   compound sub-type: तृतीया-तत्पुरुष T3
   Example: शङ्कुलाखण्डः, with <शङ्कुला-खण्डः>T3
   paraphrase: शङ्कुलया खण्डः
   paraphrase rule: <x-y>T3 => x{3} y

11) compound type: तत्पुरुष
   compound sub-type: चतुर्थी-तत्पुरुष T4
   Example: यूपदारु, with <यूप-दारु>T4
   paraphrase: यूपाय दारु
   paraphrase rule: <x-y>T4 => x{4} y

12) compound type: तत्पुरुष
   compound sub-type: पञ्चमी-तत्पुरुष T5
   Example: चोरभयम्, with <चोर-भयम्>T5
   paraphrase: चोरात् भयम्
   paraphrase rule: <x-y>T5 => x{5} y

13) compound type: तत्पुरुष
   compound sub-type: षष्ठी-तत्पुरुष T6
   Example: दशरथपुत्रः, with <दशरथ-पुत्रः>T6
   paraphrase: दशरथस्य पुत्रः
   paraphrase rule: <x-y>T6 => x{6} y

14) compound type: तत्पुरुष
   compound sub-type: सप्तमी-तत्पुरुष T7
   Example: अक्षशौण्डः, with <अक्ष-शौण्डः>T7
   paraphrase: अक्षेषु शौण्डः
   paraphrase rule: <x-y>T7 => x{7} y

15) compound type: तत्पुरुष - नञ्
   compound sub-type: नञ्-तत्पुरुष Tn
   Example: अब्राह्मणः / अनश्वः, with <न-ब्राह्मणः>Tn / <न-अश्वः>Tn
   paraphrase: न ब्राह्मणः / न अश्वः
   paraphrase rule: <x-y>Tn => न y

16) compound type: तत्पुरुष - द्विगु
   compound sub-type: समाहार-द्विगु Tds
   Example: पञ्चगवम्, with <पञ्चन्-गवम्>Tds
   paraphrase: पञ्चानाम् गवाम् समाहारः
   paraphrase rule: <x-y>Tds => x{6;ba} y{6;ba} समाहारः

17) compound type: तत्पुरुष
   compound sub-type: तद्धितार्थ-द्विगु Tdt
   Example: अष्टाकपालः, with <अष्टन्-कपालः>Tdt
   paraphrase: अष्टसु कपालेषु संस्कृतः
   paraphrase rule: No specific rule.

18) compound type: तत्पुरुष
   compound sub-type: उत्तरपद-द्विगु Tdu
   Example: पञ्चगवधनः, with <<पञ्चन्-गव>Tdu-धनः>Bs
   paraphrase:
   paraphrase rule:

19) compound type: तत्पुरुष
   compound sub-type: गतिसमास Tg
   Example: स ग, with <सम-ग >Tg
   paraphrase: No paraphrase, since it is a नित्य compound.
   paraphrase rule:

20) compound type: तत्पुरुष
   compound sub-type: कुसमास Tk
   Example: कुपुरुषः / कापुरुषः, with <कु-पुरुषः>Tk / <का-पुरुषः>Tk
   paraphrase: No paraphrase, since it is a नित्य compound.
   paraphrase rule:

21) compound type: तत्पुरुष
   compound sub-type: प्रादिसमास Tp
   Example: प्राचार्यः, with <प्र-आचार्यः>Tp
   paraphrase: प्रकृष्टः आचार्यः
   paraphrase rule: <x-y>Tp => f(x) y. The meanings of the upasargas, f(x), need to be listed.

22) compound type: तत्पुरुष
   compound sub-type: मयूरव्यंसकादि Tm
   Example: राजान्तरम्, with <राजन्-अन्तरम्>Tm
   paraphrase:
   paraphrase rule: <x-y>Tm => ?? A गणपाठ is available, so no rule for making the विग्रहवाक्य is required. The list should be given.

23) compound type: तत्पुरुष
   compound sub-type: तत्पुरुष बहुपद Tb
   Example: स नम, with < -अ -स नम >Tb
   paraphrase:
   paraphrase rule: <x-y-z>Tb => x{1} y{1} z{1}. Here y is the prathama puruṣa ekavacana rūpa of the verb whose kṛdanta

24) compound type: तत्पुरुष
   compound sub-type: तत्पुरुष उपपद U
   Example: कुम्भकारः, with <कुम्भ-कारः>U
   paraphrase: कुम्भम् करोति
   paraphrase rule: <x-y>U => x{2} y
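Several of the rules in this section have the same shape: the first member is put in a given vibhakti and the second member is left unchanged, e.g. <x-y>T6 => x{6} y. A minimal sketch of applying such rules symbolically is given below. The names `VIBHAKTI_OF_X` and `paraphrase_template` are hypothetical, and the sketch returns only the template string; producing the actual inflected form of x would need a morphological generator, which is not shown.

```python
# Vibhakti number attached to the first member x, for those tags whose rule
# in this section is of the form "<x-y>TAG => x{n} y".
VIBHAKTI_OF_X = {"T2": 2, "T3": 3, "T4": 4, "T5": 5, "T6": 6, "T7": 7, "U": 2}


def paraphrase_template(x, y, tag):
    """Return the symbolic paraphrase 'x{n} y' for a two-member compound."""
    if tag not in VIBHAKTI_OF_X:
        raise KeyError(f"no simple vibhakti rule for tag {tag!r}")
    # Doubled braces produce literal '{' and '}' around the vibhakti number.
    return f"{x}{{{VIBHAKTI_OF_X[tag]}}} {y}"
```

For instance, `paraphrase_template("x", "y", "T6")` gives the string `x{6} y`, which mirrors the right-hand side of the T6 rule. Tags such as Tn, Tds or Tp need their own handling because their rules add fixed words or a lookup.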

4.3 Major class: कर्मधारय

25) compound type: कर्मधारय
   compound sub-type: विशेषण-पूर्वपद-कर्मधारय K1
   Example: नीलोत्पलम्, with <नील-उत्पलम्>K1
   paraphrase: नीलम् तत् उत्पलम् च
   paraphrase rule: <x-y>K1 => x{1} तत् y{1} च

26) compound type: कर्मधारय
   compound sub-type: विशेषण-उत्तरपद-कर्मधारय K2
   Example: णब ल, with < ण-ब ल >K2
   paraphrase: ण च ब ल च
   paraphrase rule: <x-y>K2 => x{1} च y{1} च

27) compound type: कर्मधारय
   compound sub-type: विशेषण-उभयपद-कर्मधारय K3
   Example: म श तल, with <म -श तल >K3
   paraphrase: म च अस श तल च
   paraphrase rule: <x-y>K3 => x{1} च असौ y{1} च

28) compound type: कर्मधारय
   compound sub-type: उपमान-पूर्वपद-कर्मधारय K4
   Example: मेघश्यामः, with <मेघ-श्यामः>K4
   paraphrase: मेघः इव श्यामः
   paraphrase rule: <x-y>K4 => x{1} इव y{1}

29) compound type: कर्मधारय
   compound sub-type: उपमान-उत्तरपद-कर्मधारय K5
   Example: पुरुषव्याघ्रः, with <पुरुष-व्याघ्रः>K5
   paraphrase: पुरुषः व्याघ्रः इव
   paraphrase rule: <x-y>K5 => x{1} y{1} इव

30) compound type: कर्मधारय
   compound sub-type: अवधारणा-पूर्वपद K6
   Example: गुरुदेवः, with <गुरु-देवः>K6
   paraphrase: गुरुः एव देवः
   paraphrase rule: <x-y>K6 => x{1} एव y{1}

31) compound type: कर्मधारय
   compound sub-type: सम्भावना-पूर्वपद K7
   Example: अयोध्यानगरी, with <अयोध्या-नगरी>K7
   paraphrase: अयोध्या इति नगरी
   paraphrase rule: <x-y>K7 => x{1} इति y{1}

32) compound type: कर्मधारय
   compound sub-type: मध्यमपदलोपी Km
   Example: शाकपार्थिवः, with <शाक-पार्थिवः>Km
   paraphrase: शाकप्रियः पार्थिवः
   paraphrase rule: <x-y>Km => x{1} z* y{1}, where z* is a missing madhyama pada.

4.4 Major class: बहुव्रीहि

33) compound type: बहुव्रीहि
   compound sub-type: द्वितीयार्थ-बहुव्रीहि (समानाधिकरण) Bs2
   Example: प्राप्तोदकः, with <प्राप्त-उदकः>Bs2
   paraphrase: प्राप्तम् उदकम् यम्
   paraphrase rule: <x-y>Bs2 => x{1} y{1} यत्{g}{2}, where g is the gender of the given compound.

34) compound type: बहुव्रीहि
   compound sub-type: तृतीयार्थ-बहुव्रीहि (समानाधिकरण) Bs3
   Example: ऊढरथः, with <ऊढ-रथः>Bs3
   paraphrase: ऊढः रथः येन
   paraphrase rule: <x-y>Bs3 => x{1} y{1} येन/यया/येन

35) compound type: बहुव्रीहि
   compound sub-type: चतुर्थ्यर्थ-बहुव्रीहि (समानाधिकरण) Bs4
   Example: द व, with <द -व >Bs4
   paraphrase: द म व म य
   paraphrase rule: <x-y>Bs4 => x{1} y{1} यस्मै/यस्यै/यस्मै

36) compound type: बहुव्रीहि
   compound sub-type: पञ्चम्यर्थ-बहुव्रीहि (समानाधिकरण) Bs5
   Example: अपगतजीवः, with <अपगत-जीवः>Bs5
   paraphrase: अपगतः जीवः यस्मात्
   paraphrase rule: <x-y>Bs5 => x{1} y{1} यस्मात्/यस्याः/यस्मात्

37) compound type: बहुव्रीहि
   compound sub-type: षष्ठ्यर्थ-बहुव्रीहि (समानाधिकरण) Bs6
   Example: पीताम्बरः, with <पीत-अम्बरः>Bs6
   paraphrase: पीतम् अम्बरम् यस्य
   paraphrase rule: <x-y>Bs6 => x{1} y{1} यस्य/यस्याः/यस्य

38) compound type: बहुव्रीहि
   compound sub-type: सप्तम्यर्थ-बहुव्रीहि (समानाधिकरण) Bs7
   Example: न व स, with < न - व स >Bs7
   paraphrase: न व स य न
   paraphrase rule: <x-y>Bs7 => x{1} y{1} यस्मिन्/यस्याम्/यस्मिन्

39) compound type: बहुव्रीहि
   compound sub-type: दिग्वाचक-बहुव्रीहि (समानाधिकरण) Bsd
   Example: पूर्वोत्तरा, with <पूर्वा-उत्तरा>Bsd
   paraphrase: पूर्वस्याः च उत्तरस्याः च यदन्तरालम्
   paraphrase rule: <x-y>Bsd => x{6} च y{6} च यदन्तरालम्

40) compound type: बहुव्रीहि
   compound sub-type: प्रहरणविषयक-बहुव्रीहि (समानाधिकरण) Bsp
   Example: दण्डादण्डि, with <दण्ड-दण्ड>Bsp
   paraphrase: दण्डैः च दण्डैः च प्रहृत्य इदम् युद्धम् प्रवृत्तम्
   paraphrase rule: <x-y>Bsp => x{3} च y{3} च प्रहृत्य इदम् युद्धम् प्रवृत्तम्

41) compound type: बहुव्रीहि
   compound sub-type: ग्रहणविषयक-बहुव्रीहि (समानाधिकरण) Bsg
   Example: केशाकेशि, with <केश-केश>Bsg
   paraphrase: केशेषु केशेषु गृहीत्वा इदम् युद्धम् प्रवृत्तम्
   paraphrase rule: <x-y>Bsg => x{7} y{7} गृहीत्वा इदम् युद्धम् प्रवृत्तम्

42) compound type: बहुव्रीहि
   compound sub-type: अस्त्यर्थ-मध्यमपदलोपी-(नञ्)-बहुव्रीहि Bsmn
   Example: अपुत्रः, with <अ-पुत्रः>Bsmn
   paraphrase: न विद्यते पुत्रः यस्य
   paraphrase rule: <x-y>Bsmn => न विद्यते y{1} यस्य/यस्याः/यस्य

43) compound type: बहुव्रीहि
   compound sub-type: प्रादि-बहुव्रीहि Bvp
   Example: निर्दयः, with <निर्-दयः>Bvp
   paraphrase: निर्गता दया यस्मात्
   paraphrase rule:

44) compound type: बहुव्रीहि
   compound sub-type: सङ्ख्योभयपद-बहुव्रीहि (समानाधिकरण) Bss
   Example: त्रिचतुराः, with <त्रि-चतुराः>Bss
   paraphrase: त्रयः वा चत्वारः वा ये
   paraphrase rule: <x-y>Bss => x{1} वा y{1} वा य /य /य

45) compound type: बहुव्रीहि
   compound sub-type: उपमान-पूर्वपद-बहुव्रीहि (समानाधिकरण) Bsu
   Example: च म ख, with <च -म ख >Bsu
   paraphrase: च इव म खम य
   paraphrase rule: <x-y>Bsu => x{1} इव y{1} यस्य/यस्याः/यस्य

46) compound type: बहुव्रीहि
   compound sub-type: व्यधिकरण-बहुव्रीहि Bv
   Example: कण्ठेकालः / चन्द्रशेखरः, with <कण्ठे-कालः>Bv / <चन्द्र-शेखरः>Bv
   paraphrase: कण्ठे कालः यस्य / चन्द्रः शेखरे यस्य
   paraphrase rule: <x-y>Bv => x y{1} यस्य/यस्याः/यस्य

47) compound type: बहुव्रीहि
   compound sub-type: सङ्ख्योत्तरपद-व्यधिकरण-बहुव्रीहि Bvs
   Example: उपदशाः, with <उप-दशाः>Bvs
   paraphrase: दशानाम् समीपे ये सन्ति
   paraphrase rule: <x-y>Bvs => y{6} x ये सन्ति

48) compound type: बहुव्रीहि
   compound sub-type: सहपूर्वपद-व्यधिकरण-बहुव्रीहि BvS
   Example: सपुत्रः, with <स-पुत्रः>BvS
   paraphrase: पुत्रेण सह
   paraphrase rule: <x-y>BvS => y{3} सह

49) compound type: बहुव्रीहि
   compound sub-type: उपमानपूर्वपद-व्यधिकरण-बहुव्रीहि BvU
   Example: उष्ट्रमुखः, with <उष्ट्र-मुखः>BvU
   paraphrase: उष्ट्रस्य इव मुखम् यस्य
   paraphrase rule: <x-y>BvU => x{6} इव y यस्य/यस्याः/यस्य

50) compound type: बहुव्रीहि
   compound sub-type: बहुपद-बहुव्रीहि Bb
   Example:
   paraphrase:
   paraphrase rule:

4.5 Major class: द्वन्द्व

51) compound type: द्वन्द्व
   compound sub-type: इतरेतरयोग-द्वन्द्व Di
   Example: रामकृष्णौ, with <राम-कृष्णौ>Di
   paraphrase: रामः च कृष्णः च
   paraphrase rule: <x-y+>Di => x{1} च (y{1} च)+
   Here + indicates one or more occurrences.

52) compound type: द्वन्द्व
   compound sub-type: समाहार-द्वन्द्व Ds
   Example: संज्ञापरिभाषम्, with <संज्ञा-परिभाषम्>Ds
   paraphrase: संज्ञा च परिभाषा च एतयोः समाहारः
   paraphrase rule: <x-y+>Ds => x{1} च (y{1} च)+ एतत् n समाहारः
   Here + indicates one or more occurrences; n=2 if there are only two components, and n=3 otherwise.
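The two dvandva rules differ from the binary rules in that the second slot may repeat. A minimal sketch of how the Di and Ds templates could be expanded for any number of members follows. The function name `dvandva_template` is hypothetical, the output is the symbolic template rather than inflected Sanskrit, and writing the number slot as `एतत्{n}` is a notational assumption of this sketch.

```python
def dvandva_template(members, tag):
    """Expand 'x{1} च (y{1} च)+' for a list of two or more members."""
    if tag not in ("Di", "Ds"):
        raise ValueError(f"not a dvandva tag: {tag!r}")
    if len(members) < 2:
        raise ValueError("a dvandva needs at least two members")
    # Every member is put in the first vibhakti and followed by च.
    body = " ".join(f"{m}{{1}} च" for m in members)
    if tag == "Di":
        return body
    # Ds: n=2 (dual) for exactly two members, n=3 (plural) otherwise.
    n = 2 if len(members) == 2 else 3
    return f"{body} एतत्{{{n}}} समाहारः"
```

With two members, `dvandva_template(["x", "y"], "Di")` yields `x{1} च y{1} च`; for Ds the same body is followed by the समाहारः phrase with n chosen as described in the rule.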

53) compound type: एकशेष
   compound sub-type: एकशेष E
   Example: पितरौ, with <पितरौ>E
   paraphrase: माता च पिता च
   paraphrase rule: Give a list of exceptions with विग्रहवाक्यम्. No common rule.

54) compound type: केवल
   compound sub-type: केवल S
   Example: भूतपूर्वः, with <भूत-पूर्वः>S
   paraphrase: पूर्वम् भूतः
   paraphrase rule: <x-y>S => y{1} x{1}

55) compound type: द्विरुक्ति
   compound sub-type: द्विरुक्ति d
   Example: उपर्युपरि, with <उपरि-उपरि>d
   paraphrase: उपरि उपरि
   paraphrase rule: <x-y>d => x y

5 Examples of compound tagging from बालकाण्ड of वाल्मीकिरामायणम्

Sloka 1.1.1:
   <<तपस्-स्वाध्याय>Di-निरतम्>T7 तपस्वी <वाग्-विदाम्>U वरम् ।
   नारदम् परिपप्रच्छ वाल्मीकिः <मुनि-पुङ्गवम्>T7 ॥ 1.1.1

Sloka 1.1.8:
   <<इक्ष्वाकु-वंश>T6-प्रभवः>Bs6 रामः नाम जनैः श्रुतः ।
   <नियत-आत्मा>Bs6 <महा(महत्)-वीर्यः>Bs6 द्युतिमान् धृतिमान् वशी ॥ 1.1.8

Sloka 1.1.14:
   <<<वेद-<वेद-अङ्ग>T6>Di-तत्त्व>T6-ज्ञः>U <धनुर्-वेदे>T6 च निष्ठितः ।
   <<<<सर्व-शास्त्र>K1-अर्थ>T6-तत्त्व>T6-ज्ञः>U स्मृतिमान् प्रतिभानवान् ॥ 1.1.14

6 Structure of Sanskrit Compounds

Sanskrit compounds are binary in nature (with the exception of द्वन्द्व and बहुपद-बहुव्रीहि). Hence they can be faithfully represented as binary trees, as in Figure 1. The analysis shown in this figure may be represented in a linear notation as <A-<B-C>>. We add a tag to each of the compounds, labelling its name. Thus the compound ABC after proper labelling will be <A-<B-C>tag1>tag2, where tag1 is the name of the compound formed by the words B and C, and tag2 is the name of the compound formed by A and BC. The grammar for validation of tagged compounds is given below.

7 Grammar of tagged compounds

   compound: < component - component > tag
           | < component - component > tag taddhita
           | < component - component > tag number
           | < component - component > tag gender
           | < component - dvandvacomponents > dvandvatag
           | < component > Etag
           ;

   dvandvacomponents: dvandvacomponents - component
           | component
           ;

   component: pada
           | compound
           ;

   tag: A[1-7]
           | Bs[2-7]
           | Bs[dpgsu]
           | Bsmn
           | Bv[sSU]
           | B[bv]
           | K[1-7]
           | Km
           | T[1-7]
           | T[bgkmnp]
           | Td[stu]
           | [ESUd]
           ;

   dvandvatag: D[is] ;

   Etag: E ;

   pada: [a-zA-Z]+ ;
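To illustrate how the grammar above can be used for validation, here is a minimal recursive-descent sketch that reads a tagged compound in the linear notation and returns its tree as nested tuples. It rests on several assumptions: padas are written in a Roman transliteration matching `[A-Za-z]+`, the input contains no spaces, the taddhita, number and gender annotations are left out, and `Bvp` is accepted because it appears in the tag table even though the grammar does not list it. The names `parse_compound` and `ParseError` are hypothetical.

```python
import re

# Tag and pada patterns, following the grammar of section 7 (plus Bvp).
TAG = re.compile(r"A[1-7]|Bs[2-7]|Bs[dpgsu]|Bsmn|Bv[psSU]|B[bv]|K[1-7]|Km"
                 r"|T[1-7]|T[bgkmnp]|Td[stu]|D[is]|[ESUd]")
PADA = re.compile(r"[A-Za-z]+")
DVANDVA_TAGS = {"Di", "Ds"}


class ParseError(ValueError):
    """Raised when the input is not a well-formed tagged compound."""


def _component(s, i):
    # component: pada | compound
    if i < len(s) and s[i] == "<":
        return _compound(s, i)
    m = PADA.match(s, i)
    if not m:
        raise ParseError(f"expected a pada at position {i}")
    return m.group(), m.end()


def _compound(s, i):
    if i >= len(s) or s[i] != "<":
        raise ParseError(f"expected '<' at position {i}")
    part, i = _component(s, i + 1)
    parts = [part]
    while i < len(s) and s[i] == "-":
        part, i = _component(s, i + 1)
        parts.append(part)
    if i >= len(s) or s[i] != ">":
        raise ParseError(f"expected '>' at position {i}")
    m = TAG.match(s, i + 1)
    if not m:
        raise ParseError(f"expected a tag at position {i + 1}")
    tag = m.group()
    # Arity check: E takes one member, dvandva two or more, others exactly two.
    if tag == "E":
        ok = len(parts) == 1
    elif tag in DVANDVA_TAGS:
        ok = len(parts) >= 2
    else:
        ok = len(parts) == 2
    if not ok:
        raise ParseError(f"tag {tag} does not allow {len(parts)} member(s)")
    return (tag, parts), m.end()


def parse_compound(text):
    """Parse a tagged compound and return (tag, [members]) recursively."""
    node, end = _compound(text, 0)
    if end != len(text):
        raise ParseError(f"unexpected input at position {end}")
    return node
```

For example, `parse_compound("<A-<B-C>T6>K1")` returns `("K1", ["A", ("T6", ["B", "C"])])`, which corresponds to the binary tree described in section 6. Following the grammar strictly, only Di and Ds may have more than two members; Tb and Bb could be added to `DVANDVA_TAGS`-style handling if multi-member compounds of those types are to be accepted.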