
Available online at www.sciencedirect.com

ScienceDirect

Procedia Computer Science 58 (2015) 363-370
doi: 10.1016/j.procs.2015.08.033

Second International Symposium on Computer Vision and the Internet (VisionNet'15)

Adapting Stanford Parser's Dependencies to Paninian Grammar's Karaka Relations Using VerbNet

Manish Kumar (a), Mohit Dua (b)

(a) M.Tech Scholar, NIT Kurukshetra, India
(b) Assistant Professor, NIT Kurukshetra, India

Abstract

The Paninian Grammar framework is well suited to parsing free-word-order languages, while the Stanford Parser produces dependencies for English, a fixed-word-order language. In this paper we map Stanford parser dependencies to karaka relations, using VerbNet to capture the syntax and semantics of verbs. We present the issues encountered during this adaptation and propose solutions to overcome them. We use the Hindi dependency parser to verify the results. With this adaptation of the Stanford Parser, an English-Hindi parallel treebank can be created.

© 2015 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). Peer-review under responsibility of the organizing committee of the Second International Symposium on Computer Vision and the Internet (VisionNet'15).

Keywords: Stanford Parser; VerbNet; Hindi Dependency Parser; Paninian Grammar

1. Introduction

Paninian theory was formulated by Panini for the Sanskrit language. In the Paninian grammar framework, a sentence is treated as a set of modifier-modified relations. Karaka is the name given to the relation holding between a verb and a noun in a sentence [1]. There are six basic karaka relations:

k1: Karta, carries out the action.
k2: Karma, represents the object/patient of the verb.
k3: Karana, represents the instrument of the action.
k4: Sampradana, is the beneficiary of the action.

k5: Apadaan, represents the source of the activity.
k7: Adhikarana, is the locus of the karma.

Some other relations express dependency relations indirectly; for example, k1s (karta samanadhikarana) closely resembles karta. It is well known that dependency grammar is well suited to free-word-order languages [1, 2, 3, 4]. Paninian Grammar (PG) has been successfully applied to Indian languages [5], and it has been argued that PG suits languages with free word order. In 1997, Bharati et al. showed that PG can also be applied to English [6]. Begum et al. (2008) gave a dependency annotation scheme for Indian languages in the PG framework [7], built on a mapping between post-positions and karakas. Later, Vaidya et al. presented a karaka-based annotation scheme for English [8]. Chaudhry et al. discussed the issues in building an English dependency treebank [9] and the divergences between English and Hindi parallel dependency treebanks under PG [10].

In our proposed solution we use the Stanford parser [13], the Hindi dependency parser with the AnnCorra guidelines [12], and VerbNet [11]. The Stanford parser takes English sentences as input and produces typed dependencies between the words of each sentence; it also outputs the POS tagging, the parse tree, and collapsed dependencies. The Hindi Full Parser analyses a sentence in terms of syntactic dependency relations, taking the output of a shallow parser as input. Consider an example:

1. किताब मेज पर रखी है
   kiwAba meja para raKI hE
   ("The book is kept on the table.")

Fig. 1 shows the dependency tree produced by the Hindi Full Parser for this sentence:

   raKI hE
   ├─ k1 → kiwAba
   └─ k7 → meja para

Fig. 1. Dependency tree.

VerbNet is a lexicon of approximately 5800 English verbs. It groups verbs according to shared syntactic behavior, thereby revealing generalizations of verb behavior.
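As an aside (not part of the original system), the karaka inventory and the tree of fig. 1 can be written down as plain Python data; the triple encoding and the helper below are hypothetical illustrations, not the parser's own format:

```python
# Karaka relations of the Paninian framework (Section 1),
# as a label -> description table.
KARAKAS = {
    "k1": "Karta: carries out the action",
    "k2": "Karma: object/patient of the verb",
    "k3": "Karana: instrument of the action",
    "k4": "Sampradana: beneficiary of the action",
    "k5": "Apadaan: source of the activity",
    "k7": "Adhikarana: locus of the karma",
}

# The dependency tree of fig. 1 encoded as (head, relation, dependent)
# triples -- a hypothetical encoding chosen for this sketch.
fig1_tree = [
    ("raKI hE", "k1", "kiwAba"),
    ("raKI hE", "k7", "meja para"),
]

def dependents(tree, head):
    """Return {relation: dependent} for the given head word."""
    return {rel: dep for h, rel, dep in tree if h == head}

print(dependents(fig1_tree, "raKI hE"))
# -> {'k1': 'kiwAba', 'k7': 'meja para'}
```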
The verb plays a central role in sentence construction, so we use the syntax and semantics of verbs to find the karaka relations.

2. Problem description

Devising an annotation scheme for English using PG is a challenging task: identifying karaka relations in English is difficult because of its word order. We use the Stanford parser to parse English sentences and follow the AnnCorra guidelines [12] to find the karaka relations. Below are the issues encountered while mapping the output of the Stanford parser to karaka relations.

2.1. No direct mapping

We cannot directly map subject-object-verb dependencies to karaka relations: direct mapping works for some sentences but not for all. To show this, we compare the dependency a word receives from the parser with the corresponding karaka relation given by the Paninian framework. For each karaka relation below, an example sentence is followed by the typed dependency produced by the Stanford parser.

k1: Karta denotes the agent who performs the action of the verb.

2. Ram killed Rawan in Lanka.
   nsubj(killed-2, Ram-1)

This nsubj dependency shows that Ram is the subject, and we map the subject to k1, which is correct. Now consider another sentence:

3. Rice-pudding was eaten by Ram.
   dobj(by-4, Ram-5)

Here dobj marks Ram as an object, which is not true. Hence no direct mapping exists.

k2: Several dependencies can be mapped to k2, such as nsubjpass, dobj, iobj and prep_for.

4. Dole was defeated by Clinton.
   nsubjpass(defeated-3, Dole-1)

5. What does S.O.S. stand for?
   prep_for(stand-4, What-1)

From this we conclude that there is no single, unique dependency that can be mapped directly to k2.

k4: k4 is the beneficiary of the action, that is, the one for whom the action is carried out.

6. What famous model was married to Billy Joel?
   prep_to(married-5, Joel-8)

Joel is the beneficiary in sentence (6), so prep_to maps to k4; but in sentence (7), country is a place, not a beneficiary, so it cannot be k4, which contradicts the mapping.

7. What country do the Galapagos Islands belong to?
   det(country-2, What-1)
   prep_to(belong-7, country-2)

k7 (Adhikarana): k7 marks the location of the karta or the karma, and can be drawn from prep_on dependencies.

8. Books are on the table.
   prep_on(are-2, table-5)

In sentence (8), the table is the location of the books, so prep_on can be mapped to k7.

9. On average, how many miles are there to the moon?
   prep_on(are-7, average-2)

Here average does not correspond to any location, so prep_on dependencies cannot always be mapped to k7.

2.2. Copula verbs

English has copula verbs: is, am, are, was, were, and so on. They link the subject to a predicate (such as a subject complement).

10. What are some interesting facts and information about dogsledding?
    root(ROOT-0, What-1)
    cop(What-1, are-2)

These dependencies show that the root is What, because of the copula verb are. But in the PG framework the root is always a verb, so copula verbs require special handling when using PG.

3. Proposed solution

The examples above show that one karaka relation can map to many Stanford typed dependencies. On this basis we have prepared a karaka mapping table, shown for some karaka relations in Table 1. For each karaka relation we must select one dependency for a particular sentence. We divide all sentences into two types: verb dependent and verb independent.
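The one-to-many mapping just described can be sketched as a small candidate table in Python. The inventory below covers only the dependency labels that appear in the examples of Section 2 (an illustrative subset, not the paper's full table):

```python
# Candidate Stanford typed dependencies per karaka relation,
# restricted to the labels discussed in Section 2.
KARAKA_CANDIDATES = {
    "k1": ["nsubj", "agent"],
    "k2": ["dobj", "nsubjpass", "iobj", "prep_for"],
    "k4": ["prep_to", "iobj"],
    "k5": ["prep_from"],
    "k7": ["prep_on", "tmod"],
}

def possible_karakas(dep_label):
    """All karakas that a dependency label could realize. A result
    with more than one entry is exactly the ambiguity that makes
    verb-level (VerbNet) information necessary."""
    return sorted(k for k, deps in KARAKA_CANDIDATES.items()
                  if dep_label in deps)

print(possible_karakas("iobj"))       # -> ['k2', 'k4']  (ambiguous)
print(possible_karakas("prep_from"))  # -> ['k5']        (unambiguous)
```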

Table 1. Karaka mapping table.

k1: nsubj, xsubj, agent, prep_by
k2: dobj, nsubjpass, iobj, prep_for, attr, xcomp, ccomp
k3: prep_with
k4: prep_to, prep_for, iobj
k5: prep_from, prep_of
k7: prep_on, tmod

Below are the steps of our proposed solution.

3.1. Check whether karaka relations are verb independent or verb dependent

In many cases, sentences containing copula verbs are verb independent. We handle copula verbs by exchanging them with the root; in the sentence below, the verb becomes is instead of What.

11. What is Fedora?
    root(ROOT-0, What-1)
    cop(What-1, is-2)
    nsubj(What-1, Fedora-3)

In this sentence the word What is not a dependent of the verb is; it corresponds to k1, and from it we determine the k1s karaka relation.

3.2. Verb dependent cases

For verb dependent sentences we use VerbNet, in the following steps:

1) Find the verb of the sentence from the Stanford parser output. If the verb carries a suffix such as -ing, use a morphological analyzer or the Hindi shallow parser to find its root form.
2) Find the corresponding verb class of that verb in VerbNet.
3) Find the verb frame (description number) by matching the syntax of the sentence, as given by the Stanford parser, against the syntax listed in the Description tag of that verb class in VerbNet.
4) After finding the matching syntax, read its thematic-role values in the corresponding verb class.
5) Compare these values with the karaka mapping table.

Each VerbNet class has an ID and members that behave like the base class; these members can in turn have subclasses. In the second step we find the base verb class in which the particular verb appears as a member or as a subclass.

VerbNet defines several frame types (basic transitive, resultative, etc.) for each verb, and each frame type has its own syntax; this syntax also differs from verb to verb. For example, the NP V NP PP.instrument and NP V NP PP.resultative frame types can be distinguished by their preposition: with signals an instrument, whereas to signals a result. Let us take an example to walk through the steps above.

12. The student needs a book from the library.

The parse tree and typed dependencies of this sentence are the following.

Parse tree:
(ROOT
  (S
    (NP (DT The) (NN student))
    (VP (VBZ needs)
      (NP (DT a) (NN book))
      (PP (IN from)
        (NP (DT the) (NN library))))))
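The five steps can be wired together as a toy Python sketch. Everything here is a hard-coded stand-in: the typed dependencies mimic Stanford parser output, the class ID require-103 and the role labels follow this section's walkthrough, and no real parser or VerbNet files are queried.

```python
# Step 1: root verb from (stubbed) Stanford typed dependencies.
typed_deps = {
    "root": ("ROOT", "needs"),
    "nsubj": ("needs", "student"),
    "dobj": ("needs", "book"),
    "prep_from": ("needs", "library"),
}
verb = typed_deps["root"][1]
lemma = {"needs": "need"}[verb]   # stand-in for the morph analyzer

# Step 2: VerbNet base class of the lemma (stubbed lookup).
VERB_CLASS = {"need": "require-103"}
base_class = VERB_CLASS[lemma]

# Steps 3-4: match the sentence syntax against the class's frames and
# read off the thematic roles of the matching frame.
FRAMES = {
    "require-103": {
        "NP V NP PP.source": ["Pivot", "VERB", "Theme", "from", "Source"],
    }
}
roles = FRAMES[base_class]["NP V NP PP.source"]

# Step 5: thematic role -> karaka, via the mapping table.
ROLE_TO_KARAKA = {"Pivot": "k1", "Theme": "k2", "Source": "k5"}
karakas = {ROLE_TO_KARAKA[r]: r for r in roles if r in ROLE_TO_KARAKA}
print(karakas)
# -> {'k1': 'Pivot', 'k2': 'Theme', 'k5': 'Source'}
```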

Typed dependencies:
det(student-2, The-1)
nsubj(needs-3, student-2)
root(ROOT-0, needs-3)
det(book-5, a-4)
dobj(needs-3, book-5)
det(library-8, the-7)
prep_from(needs-3, library-8)

According to our first step, the main verb of sentence (12) is need, shown by the root dependency. Next we find the base class of the verb need in VerbNet; its base class is require. The syntax of our example, easily read off the parse tree above, is:

NP VP NP PP NP

We now match this structure against the syntax structures of the require verb class. The part of the require class that matches is shown below.

<DESCRIPTION descriptionNumber="8.1" primary="NP V NP PP.source" secondary="NP-PP; from-PP" xtag="0.2"/>
<SYNTAX>
  <NP value="Pivot"/>
  <VERB/>
  <NP value="Theme"/>
  <PREP value="from">
    <SELRESTRS/>
  </PREP>
  <NP value="Source"/>
</SYNTAX>

In the Description tag, primary="NP V NP PP.source"; PP.source indicates that the word following the preposition is the source of the activity. We have stored words such as Pivot together with their karaka relations in a table, and we match the two syntax structures against that table. This is illustrated in fig. 2.

VerbNet frame:  NP (Pivot)   V (verb)   NP (Theme)   PP: from + NP (Source)
Karaka:         k1           root       k2           k5
Parse tree:     NP           VP         NP           PP NP

Fig. 2. Matching of syntax.

3.3. Handling of control verbs

Control verbs can be handled in the same way. Promise and persuade are two control verbs. Let us take an example sentence with the verb promise, which is a subject-control verb.

13. Ram promised Mohan the house.
    nsubj(promised-2, Ram-1)
    root(ROOT-0, promised-2)
    iobj(promised-2, Mohan-3)
    det(house-5, the-4)
    dobj(promised-2, house-5)

Parse tree:
(ROOT
  (S
    (NP (NNP Ram))
    (VP (VBD promised)
      (NP (NNP Mohan))
      (NP (DT the) (NN house)))))

   promised
   ├─ k1 → Ram
   ├─ k4 → Mohan
   └─ k2 → the house

Fig. 3. Dependency tree of sentence (13).

Fig. 3 shows the karaka relations of this sentence. There is a clear contradiction in the dependencies for the word Mohan: the Stanford parser marks it as an indirect object (which would map to k2), whereas fig. 3 gives k4. We resolve this using VerbNet. Promise is the main verb, and its structure is shown below.

<DESCRIPTION descriptionNumber="0.2" primary="NP V NP NP" secondary="NP-NP" xtag="0.2"/>
<EXAMPLES>
  <EXAMPLE>Ram promised Mohan the house</EXAMPLE>
</EXAMPLES>
<SYNTAX>
  <NP value="Agent"/>
  <VERB/>
  <NP value="Recipient"/>
  <NP value="Topic"/>
</SYNTAX>

We map these NP values to karaka relations as shown in fig. 4. We map Mohan to k4 because k4 is always the recipient of the action done by the verb. The mappings are:

NP - k1 - Agent
V  - verb
NP - k4 - Recipient
NP - k2 - Topic

4. Results

To validate our system's output, we use the Hindi Full Parser, which generates the karaka information for a Hindi sentence. First, for an English sentence, we map its Stanford dependencies to karaka relations using our approach.

VerbNet frame:  NP (Agent)   V (verb)   NP (Recipient)   NP (Topic)
Karaka:         k1           root       k4               k2
Parse tree:     NP           VP         NP               NP

Fig. 4. Matching of syntax.

Then we run the Hindi Full Parser on the corresponding Hindi sentence and match the two outputs. If they match, our mapping has been done correctly. For example, consider sentence (12). Using our approach we conclude:

Karta (k1): student; Karma (k2): book; Apadaan (k5): library

The corresponding Hindi sentence is:

विद्यार्थी को पुस्तकालय से एक किताब चाहिए
vixarwi ko puswakalaya se eka kiwAba cahie

For this sentence, the output of the Hindi Full Parser is:

   cahie
   ├─ k1 → vixarwi ko
   ├─ k2 → eka kiwAba
   └─ k5 → puswakalaya se

Fig. 5. Dependency tree.

As fig. 5 shows, the output of our approach matches the Hindi parser's output. For evaluation we took 1000 English sentences, applied our procedure to map each sentence to karaka relations, and compared the output with the corresponding Hindi parser output. The percentage is calculated separately for each karaka relation: we count the sentences in which that karaka occurs, and the number of those sentences that our approach mapped correctly. The results are shown in Table 2.

Table 2. Percentage of karaka relations mapped correctly.

k1: 69.7%   k2: 57.7%   k3: 72.2%   k4: 44.7%   k5: 51.6%   k7: 74.1%

References

1. Bharati A, Chaitanya V, Sangal R. Natural Language Processing: A Paninian Perspective. Prentice-Hall, New Delhi; 1995.
2. Shieber SM. Evidence against the context-freeness of natural language. Linguistics and Philosophy; 1985. p. 334-343.
3. Hudson R. Word Grammar. Basil Blackwell, Oxford; 1984.
4. Mel'cuk I. Dependency Syntax: Theory and Practice. State University of New York Press; 1988.
5. Bharati A, Sangal R. Parsing free word order languages in Paninian framework. Proceedings of the Annual Meeting of the Association for Computational Linguistics; 1993.
6. Bharati A, Bhatia M, Chaitanya V, Sangal R. Paninian grammar framework applied to English. South Asian Language Review; 1997.

7. Begum R, Husain S, Dhwaj A, Sharma DM, Bai L, Sangal R. Dependency annotation scheme for Indian languages. Proceedings of IJCNLP-2008.
8. Vaidya A, Husain S, Mannem P, Sharma DM. A karaka based annotation scheme for English. Computational Linguistics and Intelligent Text Processing, Springer; 2009. p. 41-52.
9. Chaudhry H, Sharma DM. Annotation and issues in building an English dependency treebank. Proceedings of ICON-2011: 9th International Conference on Natural Language Processing, Chennai; 2011.
10. Chaudhry H, Sharma H, Sharma DM. Divergences in English-Hindi parallel dependency treebanks. Proceedings of the Second International Conference on Dependency Linguistics, Prague; 2013. p. 33-40.
11. Kipper K, Dang HT, Palmer M. Class-based construction of a verb lexicon. American Association for Artificial Intelligence; 2000.
12. Bharati A, Sharma DM, Husain S, Bai L, Begum R, Sangal R. AnnCorra: Treebanks for Indian languages, guidelines for annotating Hindi treebank (version 2). LTRC, IIIT Hyderabad, India; 2009.
13. De Marneffe MC, Manning CD. Stanford typed dependencies manual; September 2008.