Morphological Generator for Tamil. Menaka S, Vijay Sundar Ram and Sobha Lalitha Devi, AU-KBC Research Centre

Morphological Generator for Tamil
Menaka S, Vijay Sundar Ram and Sobha Lalitha Devi, AU-KBC Research Centre

Overview
  Tamil Morphology: key ideas
  Morphosyntax and Morphophonemics
  Finite State Automata
  Morphological generator
  Evaluation

Morphological Generator
  What is it? A tool used in NLP.
  What does it do? Root word -> Inflected form
  Who needs it? Inflecting languages
  Where is it used? Machine translation (MT), information retrieval (IR)

Methods used
  Rule-based method (Ganapathiraju and Levin 2006)
  Corpus-based method (Lantin et al; Dasgupta and Ng 2007)
  Finite-state method (Beesley and Karttunen 2003)

Tamil Morphology: key ideas
Agglutinative: suffixes attach in series to the root.
  arapi + katal + in + araci => arapikkatalinaraci
  'Arabian' + 'sea' + GEN + 'queen' => 'Queen of the Arabian Sea'
Morphosyntax: the order in which suffixes attach to the root.
Morphophonemics: the changes that take place during suffixation.

MorphoSyntax of Lexical Categories: Nouns
Nouns (including pronouns) take inflectional and derivational suffixes.
Root + {number} + {case} + {DISJ/COOR/EMPH} + {PSP} + {EMP} + {INT/SUPP}
  paiyan-kal-ai-a: => paiyankalaiya:
  'boy'-PL-ACC-INT => 'the boys (obj)?'
Derivation of verbs, adjectives and adverbs from nouns is possible.
  azaku + a:na => azaka:na
  'beauty' + ADJ => 'beautiful'
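The noun template above fixes the order in which suffix slots may be filled. A minimal sketch of checking that order follows. The names `NOUN_SLOTS`, `slot_of` and `is_valid_noun_sequence` are hypothetical, and the tags listed per slot are only an illustrative subset taken from the slides, not the generator's actual inventory.

```python
# Slot order for Tamil noun suffixes, following the template
# Root + {number} + {case} + {DISJ/COOR/EMPH} + {PSP} + {EMP} + {INT/SUPP}.
# Assumption: the tag sets are an illustrative subset, not a full inventory.
NOUN_SLOTS = [
    {"PL"},                   # number
    {"ACC", "GEN", "INS"},    # case (only tags that appear in the slides)
    {"DISJ", "COOR", "EMPH"},
    {"PSP"},
    {"EMP"},
    {"INT", "SUPP"},
]


def slot_of(tag):
    """Return the index of the slot that a tag belongs to."""
    for index, tags in enumerate(NOUN_SLOTS):
        if tag in tags:
            return index
    raise ValueError(f"unknown tag: {tag}")


def is_valid_noun_sequence(tags):
    """True if every tag fills a strictly later slot than the tag before it."""
    slots = [slot_of(tag) for tag in tags]
    return all(a < b for a, b in zip(slots, slots[1:]))
```

Under these assumptions, `is_valid_noun_sequence(["PL", "ACC", "INT"])` accepts the order used in paiyan-kal-ai-a:, while `["ACC", "PL"]` is rejected because case would precede number.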

MorphoSyntax of Lexical Categories: Verbs (1)
Finite Verbs
Root + Tense + PNG + {DISJ/EMPH/EMP/INT/SUPP}
  pa:r-tt-a:n-a:m => pa:rtta:na:m
  'see'-PST-3SM-SUPP => 'it seems (he) saw'
Root + INF + NEGVERB + {DISJ/COOR/EMPH/EMP/INT/SUPP}
  pa:r-a-illai-a:m => pa:rkkavillaiya:m
  'see'-INF-NEGVERB-SUPP => 'it seems (x) did not see'

MorphoSyntax of Lexical Categories: Verbs (2)
Relative participle: Root + Tense/NEG + RP
  pa:r-tt-a => pa:rtta
  'see'-PST-RP => 'who saw'
Pronominalisation
  pa:r-tt-a-avan => pa:rttavan
  'see'-PST-RP-he => 'he who saw'

MorphoSyntax of Lexical Categories: Verbs (3)
Non-Finite Verbs
Root + {NEG} + INF/VBP/COND/CONC/HORT/OPT + {DISJ/COOR/EMPH} + {EMP} + {INT/SUPP}
  pa:tu-a => pa:ta
  'sing'-INF => 'to sing'
Derivation of nouns, adjectives and adverbs from verbs is possible.

Morphophonemics
Changes that occur when a suffix attaches to a root word. The change depends on:
  the nature of the final letter of the root word
  the nature of the initial letter of the suffix
  ma:la:-a:l => ma:la:va:l       pal-a:l => palla:l
  'Mala'-INS => 'by Mala'        'tooth'-INS => 'using tooth'
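The dependence on the last letter of the root and the first letter of the suffix can be sketched as a small joining function. This is an illustration only: the function `join` and its four rules are assumptions inferred from the examples on the slides (glide insertion, final short-u deletion, consonant doubling). They are not the generator's actual rule file and are far from a complete account of Tamil sandhi.

```python
import re

VOWELS = "aeiou"
# A single short syllable such as "pal": optional consonants, one short
# vowel, one final consonant. Long vowels are written with ":" and excluded.
SHORT_MONOSYLLABLE = re.compile(r"^[^aeiou:]*[aeiou][^aeiou:]$")


def join(stem, suffix):
    """Attach a suffix to a stem, applying a few illustrative sandhi rules."""
    if not suffix or suffix[0] not in VOWELS:
        return stem + suffix                   # consonant-initial suffix: no change
    if stem.endswith("u"):
        return stem[:-1] + suffix              # final short u drops: pa:tu + a
    if stem.endswith(("i", "i:", "e", "e:")):
        return stem + "y" + suffix             # glide y after front vowels (incl. ai)
    if stem.rstrip(":")[-1] in VOWELS:
        return stem + "v" + suffix             # glide v after other vowels
    if SHORT_MONOSYLLABLE.match(stem):
        return stem + stem[-1] + suffix        # final consonant doubles: pal + a:l
    return stem + suffix
```

With these rules, `join("ma:la:", "a:l")` gives ma:la:va:l and `join("pal", "a:l")` gives palla:l, as in the examples above. Paradigm-specific changes such as pa:r-a => pa:rkka are deliberately not covered.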

Paradigm-based approach
Follows from the morphophonemic changes: root words that behave similarly are grouped into one paradigm.
Paradigmatic classification for Tamil: 36 noun paradigms and 34 verb paradigms.
ya:ci 'beg' takes tt/kkir/pp as the three tense markers.
viya 'wonder' takes ńt/kkir/pp as the three tense markers.
  ya:ci-tt-a:n    'beg'-PST-3SM      viya-ńt-a:n     'wonder'-PST-3SM
  ya:ci-kkir-a:n  'beg'-PRE-3SM      viya-kkir-a:n   'wonder'-PRE-3SM
  ya:ci-pp-a:n    'beg'-FUT-3SM      viya-pp-a:n     'wonder'-FUT-3SM
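A paradigm can be thought of as a lookup from a class of roots to its tense markers. The sketch below shows the idea for the two verbs above. The paradigm names (`ya:ci-class`, `viya-class`), the table layout and the function `inflect` are hypothetical, and the suffixes are joined by plain concatenation, which happens to be enough for these six forms.

```python
# Assumption: paradigm names and table layout are illustrative only.
PARADIGMS = {
    "ya:ci-class": {"PST": "tt", "PRE": "kkir", "FUT": "pp"},
    "viya-class": {"PST": "ńt", "PRE": "kkir", "FUT": "pp"},
}

# Lexicon: each root is listed with the paradigm it follows.
LEXICON = {"ya:ci": "ya:ci-class", "viya": "viya-class"}

# Person-number-gender suffixes; only 3SM appears in the slides.
PNG_SUFFIXES = {"3SM": "a:n"}


def inflect(root, tense, png):
    """Return (segmented form, concatenated form) for a finite verb."""
    tense_marker = PARADIGMS[LEXICON[root]][tense]
    parts = [root, tense_marker, PNG_SUFFIXES[png]]
    return "-".join(parts), "".join(parts)


for root in ("ya:ci", "viya"):
    for tense in ("PST", "PRE", "FUT"):
        print(inflect(root, tense, "3SM"))
```

Adding a new verb then only requires a lexicon entry naming its paradigm, which is the practical benefit of grouping roots that behave alike.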

Finite State Automata (1)
A finite-state automaton is a model of behavior consisting of a finite number of states, transitions from each state to another state, and actions at each transition.
The morphological generator moves from one state to another as each attribute is applied to the stem and the suffix is generated.
  paiyan-kal-ai-a: => paiyankalaiya:
  'boy'-PL-ACC-INT => 'the boys (obj)?'

Finite State Automata (2)
Input: paiyan, Plural, Accusative, Interrogative.

From State   To State   Attribute   Form Generated   Final Form
0            1          PL          paiyankal        paiyankal
1            2          ACC         paiyankalai      paiyankalai
2            3          INT         paiyankalaia:    paiyankalaiya:
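The trace above can be reproduced with a small table-driven automaton. This is a minimal sketch, not the authors' implementation: `STATE_TABLE`, `SUFFIX_TABLE`, `apply_sandhi` and `generate` are hypothetical names, the state table contains only the three transitions shown in the trace, and the single sandhi rule (glide y after a stem ending in i) is just enough for this one example.

```python
# Assumption: only the transitions visible in the slide's trace are modelled.
STATE_TABLE = {(0, "PL"): 1, (1, "ACC"): 2, (2, "INT"): 3}
SUFFIX_TABLE = {"PL": "kal", "ACC": "ai", "INT": "a:"}


def apply_sandhi(stem, suffix):
    """Insert the glide y between a stem ending in i and a vowel-initial suffix."""
    if stem.endswith("i") and suffix[0] in "aeiou":
        return stem + "y" + suffix
    return stem + suffix


def generate(root, attributes):
    """Walk the automaton; return the final form and one trace row per step."""
    state, form, trace = 0, root, []
    for attribute in attributes:
        if (state, attribute) not in STATE_TABLE:
            raise ValueError(f"no transition from state {state} on {attribute}")
        next_state = STATE_TABLE[(state, attribute)]
        suffix = SUFFIX_TABLE[attribute]
        generated = form + suffix            # "Form Generated": raw concatenation
        final = apply_sandhi(form, suffix)   # "Final Form": after morphophonemics
        trace.append((state, next_state, attribute, generated, final))
        state, form = next_state, final
    return form, trace


form, trace = generate("paiyan", ["PL", "ACC", "INT"])
for row in trace:
    print(*row)
```

Each printed row corresponds to one row of the trace table, with the morphophonemic change visible only in the last step, where paiyankalaia: becomes paiyankalaiya:.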

Design of MorphGenerator for Tamil
A finite state automaton:
  moves from one state to another while attaching suffixes
  the end state produces the desired output
Resource files:
  Lexicon
  Suffix table
  State table
  Morphophonemic rules

Evaluation: Experiment 1
2556 input words with noun roots spanning different paradigms and different attributes were tested.

Measure                          Value
No. of True Positives (TP)       2413
No. of True Negatives (TN)       115
No. of False Positives (FP)      5
No. of False Negatives (FN)      23
Precision, TP/(TP + FP)          0.997
Recall, TP/(TP + FN)             0.99
F-measure                        0.99

Evaluation: Experiment 2
19152 input words with verb roots spanning all the paradigms and various attributes were tested.

Measure                          Value
No. of True Positives (TP)       17361
No. of True Negatives (TN)       1451
No. of False Positives (FP)      38
No. of False Negatives (FN)      302
Precision, TP/(TP + FP)          0.997
Recall, TP/(TP + FN)             0.98
F-measure                        0.99
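The reported scores can be recomputed from the counts in both experiments. The sketch below uses the precision and recall formulas given on the slides. The slides do not state the F-measure formula, so the balanced F1 (harmonic mean of precision and recall) is assumed here, and the function name `precision_recall_f` is hypothetical.

```python
def precision_recall_f(tp, fp, fn):
    """Precision and recall as defined on the slides, plus balanced F1.

    Assumption: the F-measure is taken to be F1; the slides do not say.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


experiments = {
    "Experiment 1 (nouns)": dict(tp=2413, fp=5, fn=23),
    "Experiment 2 (verbs)": dict(tp=17361, fp=38, fn=302),
}

for name, counts in experiments.items():
    p, r, f = precision_recall_f(**counts)
    print(f"{name}: precision={p:.4f} recall={r:.4f} F={f:.4f}")
```

Under this assumption the recomputed recall and F-measure agree with the reported values to two decimal places. The recomputed precision is about 0.998 in both experiments, so the reported 0.997 appears to be truncated rather than rounded.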

Thank You!