Supporting Tulu language written in the Kannada script


Shriramana Sharma, jamadagni-at-gmail-dot-com, India
2012-Jun-07

1. Introduction

In addition to the regular vowel repertoire of other South Indian languages (a, ā, i, ī, u, ū, e, ē, ai, o, ō, au, and the Sanskrit-based vocalic ṛ etc.), the Tulu language has four additional vowels. In the Tulu Lexicon, these are represented in romanization as ụ, ụụ, ɛ, ɛɛ. In this document, however, we shall designate the long vowels as ụ̄ and ɛ̄ after the ISO pattern.

In the Kannada orthography for Tulu adopted by the Tulu Lexicon, these additional vowels are written as follows:

                      ụ    ụ̄    ɛ    ɛ̄
Independent vowels:   [glyph images not reproduced]
Dependent vowels:     [glyph images not reproduced]

Scans from the Tulu Lexicon (courtesy Vaishnavi Murthy) follow as attestation.

[Scan: List of Tulu Independent Vowels]
[Scan: Special vowels in Tulu]

Given this existing attested practice, this document will discuss how it may be implemented in Unicode and what problems are associated therewith.

2. Independent Vowels

As far as the independent vowels are concerned, there is not much of a problem. The existing independent vowels 0C85 KANNADA LETTER A ಅ and 0C8E KANNADA LETTER E ಎ may be combined with 0CCD KANNADA SIGN VIRAMA and 0CD5 KANNADA LENGTH MARK appropriately to achieve the desired effect as follows:

0C85 ಅ + 0CCD = [glyph for independent ụ]
0C85 ಅ + 0CCD + 0CD5 = [glyph for independent ụ̄]
0C8E ಎ + 0CCD = [glyph for independent ɛ]
0C8E ಎ + 0CCD + 0CD5 = [glyph for independent ɛ̄]

The only issues involved are that rendering engines (especially OpenType-based ones) should recognize these sequences as valid and not spew out dotted circles, and that fonts (i.e. Kannada fonts desiring to cater to Tulu language writing) should provide the appropriate glyphs and glyph substitutions.

3. Dependent Vowels

As for the dependent vowels, supporting ɛ and ɛ̄ is also not a serious issue. The same characters 0CCD KANNADA SIGN VIRAMA and 0CD5 KANNADA LENGTH MARK can be appropriately combined with 0CC6 KANNADA VOWEL SIGN E as follows:

0C95 ಕ + 0CC6 + 0CCD = [glyph for kɛ]
0C95 ಕ + 0CC6 + 0CCD + 0CD5 = [glyph for kɛ̄]

Again, it is merely a matter of the rendering engine accepting the sequences and the font providing appropriate glyph substitutions.

However, the matter of the vowels ụ and ụ̄ is quite different. What we see in the attestation is that the Lexicon has re-used the Kannada script virama sign as a vowel marker to denote this sound. While this requires almost no additional glyph/engine design, and the appropriate sequences might seem obvious:

0C95 ಕ + 0CCD = ಕ್
0C95 ಕ + 0CCD + 0CD5 = ಕ್ೕ

the implementation is not straightforward in the case of the short vowel. While the long vowel can be represented with no issues, representing the short vowel by 0CCD KANNADA SIGN VIRAMA alone will cause problems that become clear when one examines entire words (rather than isolated sequences). Observe some examples from the Lexicon:

[Scan of Lexicon entries, with examples circled in red and blue, not reproduced]

The examples circled in red have the special vowel ụ in word-medial position: akụḍụ and ajakụlụ. How are these Tulu words written in the Kannada script to be encoded? It is not possible to simply use just 0CCD KANNADA SIGN VIRAMA as follows:

0C85 ಅ + 0C95 ಕ + 0CCD + 0CA1 ಡ + 0CCD ?= [intended appearance: ಅ, then ಕ್ and ಡ್ side by side, each with a visible virama and no conjunct]

because, as is standard with Indic scripts, the sequence 0C95 0CCD 0CA1, being of the pattern CONSONANT + VIRAMA + CONSONANT, will produce combining behaviour:

0C85 ಅ + 0C95 ಕ + 0CCD + 0CA1 ಡ + 0CCD = ಅಕ್ಡ್ [ಕ and ಡ rendered as a conjunct]

which is obviously not what is desired.

It is also not possible to suggest that a simplified Kannada font without the required combining forms should do for Tulu, because language content should be represented by encoded text and not by font changes, and because Tulu itself uses consonant clusters, which are represented by the regular stacking behaviour of the Kannada script, as seen in the examples above circled in blue. Therefore one is faced with a dilemma as to what to do.
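The collision described in this section can be made concrete with a short sketch. It is only an illustration under stated assumptions: the helper `encode_virama_only`, the romanized unit lists and the tiny letter inventory are hypothetical and are not part of the Tulu Lexicon or of any library. The code points are the ones cited in the text. The sketch encodes akụḍụ and the hypothetical cluster word akḍụ with the virama standing for ụ, and checks whether the two code point sequences differ.

```python
import unicodedata

# Code points taken from the text above.
LETTER_A = "\u0C85"   # KANNADA LETTER A
VIRAMA = "\u0CCD"     # KANNADA SIGN VIRAMA
CONSONANTS = {"k": "\u0C95", "\u1E0D": "\u0CA1"}  # k -> KA, ḍ -> DDA
SPECIAL_U = "\u1EE5"  # romanized ụ


def encode_virama_only(units):
    """Encode romanized units, writing ụ with the bare virama (hypothetical helper).

    Toy coverage only: word-initial 'a', the consonants k and ḍ, and ụ.
    """
    units = [unicodedata.normalize("NFC", u) for u in units]
    out = []
    for i, unit in enumerate(units):
        nxt = units[i + 1] if i + 1 < len(units) else None
        if unit == "a" and i == 0:
            out.append(LETTER_A)
        elif unit in CONSONANTS:
            out.append(CONSONANTS[unit])
            if nxt is None or nxt in CONSONANTS:
                out.append(VIRAMA)  # vowelless consonant: virama as conjunct former
        elif unit == SPECIAL_U and i > 0 and units[i - 1] in CONSONANTS:
            out.append(VIRAMA)      # the same virama doing duty as vowel sign ụ
        else:
            raise ValueError(f"unit not covered by this sketch: {unit!r}")
    return "".join(out)


def code_points(text):
    """Return the code points of text as space-separated hex values."""
    return " ".join(f"{ord(ch):04X}" for ch in text)


akudu = encode_virama_only(["a", "k", "\u1EE5", "\u1E0D", "\u1EE5"])  # akụḍụ
akdu = encode_virama_only(["a", "k", "\u1E0D", "\u1EE5"])             # hypothetical akḍụ

print(code_points(akudu))
print(code_points(akdu))
print("identical:", akudu == akdu)
```

Both words come out as the same sequence 0C85 0C95 0CCD 0CA1 0CCD, so a rendering engine receiving it has no way of knowing whether the first virama is meant as a vowel sign or as a conjunct former.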

4. Possible solutions to the problem with dependent vowel ụ

4.1. Invisible spacing characters

While inserting 200C ZERO WIDTH NON-JOINER to prevent the combining behaviour would achieve the desired appearance:

0C85 ಅ + 0C95 ಕ + 0CCD + 200C + 0CA1 ಡ + 0CCD = ಅಕ್‌ಡ್

this is an insufficient solution because this character is default-ignorable in text processes such as searching and collation, resulting in minimal pairs (such as if there existed a word akḍụ* which differs from akụḍụ only by a medial ụ) being mutually indistinguishable. This would go against established collation orders for Tulu orthography in Kannada script and is hence unacceptable.

2060 WORD JOINER (which was a candidate because, unlike the other invisible space characters, it does not create a line-break opportunity) was considered in lieu of 200C ZWNJ but is inappropriate. As per TUS 6.1 p 546 (p 576 of PDF):

"inserting a word joiner between two characters has no effect on their ligating and cursive joining behavior. The word joiner should be ignored in contexts other than word or line breaking."

Hence it seems it cannot prevent the combining behaviour of the CONSONANT + VIRAMA + CONSONANT combination (although it is unclear whether the above statement of TUS applies to Indic contexts too). Furthermore, it also appears to be default-ignorable (as it has GC=Cf; vide http://www.unicode.org/reports/tr44/#default_ignorable_code_point), which again makes it no different from ZWNJ.

4.2. Alternative written representation for ụ based on u

It is clear* that the ụ sound of Tulu is morphologically related to the regular u and is in fact cognate to the kuṟṟiyalukaram of Tamil or the saṃvṛtōkāram of Malayalam (except for the fact that in Tulu it is phonemic rather than just an allophone of /u/).

* "Old Kannaḍa has the inherited ten vowels /i e a o u ī ē ā ō ū/.... Tuḷu has the same core system but it has added ɛ (front low unrounded vowel, historically from -ay word-finally) and ï (high central unrounded) (Bhat 1998), which mainly occur finally. ï and u result from a split of older /u/ and ï corresponds to the enunciative vowel of the other Southern languages. The long counterparts of ï and ɛ are extremely restricted." The Dravidian Languages, Bhadriraju Krishnamurti, Cambridge University Press, Cambridge, 1st South Asian Edition, 2003, ISBN: 978-0-521-77111-5, p 52. On the enunciative vowel, ibid p 49: "A short /u/ following a word-final stop (in Modern Tamil any consonant) is phonetically a back unrounded vowel [ï] which was called the enunciative vowel."

Given this, in retrospect it might have been better if the Tulu Lexicon editors (who developed the Kannada script orthography for Tulu) had paralleled the writing of the same sound in old Malayalam (and in pedagogical old Tamil) and defined ụ to be written in the Kannada script as VOWEL SIGN U + VIRAMA, just as in Malayalam the saṃvṛtōkāram is written as VOWEL SIGN U + VIRAMA. [See TUS 6.1 p 319.] This would mirror the other special vowel ɛ, which is written as VOWEL SIGN E + VIRAMA. Further, the long vowel ụ̄ could also be written simply by adding the length mark to follow the existing pattern. Observe:

0C95 ಕ + 0CC1 + 0CCD = [glyph for kụ]
0C95 ಕ + 0CC1 + 0CCD + 0CD5 = [glyph for kụ̄]

(On a typographic note, it seems that here the vowel sign U should probably take its allograph [which it takes with PA and VA] to avoid overlapping with the virama sign.)

It is obvious that if the dependent ụ were thus to be written using VOWEL SIGN U + VIRAMA, the problem caused by using the virama alone for ụ would be avoided, as the sequence CONSONANT + VIRAMA + CONSONANT would then not be used except when there is actually a consonant cluster in the underlying language content. Of course, in this case, the independent vowel would also be changed accordingly:

0C89 ಉ + 0CCD = [glyph for independent ụ]
0C89 ಉ + 0CCD + 0CD5 = [glyph for independent ụ̄]

Now while this is an entirely viable option technically and linguistically, it requires buy-in from the Tulu scholars and user community, which would of course depend on how strongly they need to represent their language in Kannada Unicode.
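The difference between the approaches of sections 4.1 and 4.2 can be sketched in a few lines. This is an illustration under assumptions, not a description of any real search or collation engine. The function `search_key` is a hypothetical stand-in that simply drops characters of general category Cf. Real implementations consult the Default_Ignorable_Code_Point property, which Python's standard library does not expose, but Cf covers both 200C ZWNJ and 2060 WORD JOINER. The word akḍụ is the hypothetical minimal pair mentioned in section 4.1.

```python
import unicodedata

A, KA, DDA = "\u0C85", "\u0C95", "\u0CA1"
VIRAMA, SIGN_U, ZWNJ = "\u0CCD", "\u0CC1", "\u200C"


def search_key(text):
    """Crude stand-in for a search/collation pre-pass: drop format (Cf) characters.

    Assumption: approximates ignoring default-ignorable code points.
    """
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")


# Section 4.1: virama alone for ụ, with ZWNJ blocking the conjunct.
zwnj_akudu = A + KA + VIRAMA + ZWNJ + DDA + VIRAMA   # akụḍụ
zwnj_akdu = A + KA + VIRAMA + DDA + VIRAMA           # hypothetical akḍụ

# Section 4.2: ụ written as VOWEL SIGN U + VIRAMA.
usign_akudu = A + KA + SIGN_U + VIRAMA + DDA + SIGN_U + VIRAMA  # akụḍụ
usign_akdu = A + KA + VIRAMA + DDA + SIGN_U + VIRAMA            # hypothetical akḍụ

print("4.1 distinct as raw text:  ", zwnj_akudu != zwnj_akdu)
print("4.1 distinct as search key:", search_key(zwnj_akudu) != search_key(zwnj_akdu))
print("4.2 distinct as search key:", search_key(usign_akudu) != search_key(usign_akdu))
```

Under these assumptions the ZWNJ-based pair is distinct only as raw text and collapses once format characters are ignored, whereas the VOWEL SIGN U + VIRAMA pair stays distinct because the difference is carried by ordinary Kannada characters.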

4.3. Alternative written representation for all special vowels

All the above methods involve devising new Kannada vowel signs for Tulu. This imposes much work on designers of fonts and rendering software, since the requisite glyphs need to be constructed and the above recommended sequences need to be programmed into the script grammar. It is unlikely that this would happen soon on a large scale, and it would prove a serious stumbling block for those desiring to write Tulu immediately in Kannada Unicode.

A simple solution would be to use an existing simple character as a vowel modifier. I considered 02BC MODIFIER LETTER APOSTROPHE as it is already used in Devanagari for Bodo etc., but the Tulu native users objected to this as it would be confusable with quotation marks. A viable alternative is using 02C2 MODIFIER LETTER LEFT ARROWHEAD and doubling it for length:

Independent Vowels:
ụ    0C85 ಅ + 02C2 = ಅ˂
ụ̄    0C85 ಅ + 02C2 02C2 = ಅ˂˂
ɛ    0C8E ಎ + 02C2 = ಎ˂
ɛ̄    0C8E ಎ + 02C2 02C2 = ಎ˂˂

Dependent Vowels:
kụ   0C95 ಕ + 02C2 = ಕ˂
kụ̄   0C95 ಕ + 02C2 02C2 = ಕ˂˂
kɛ   0C95 ಕ + 0CC6 + 02C2 = ಕೆ˂
kɛ̄   0C95 ಕ + 0CC6 + 02C2 02C2 = ಕೆ˂˂

Again, despite the simplicity of this model (all that is required would be a font on one's system providing this character and a simple method to input it), it is unknown to what extent the Tulu scholars and user community will be ready to adopt this new orthography.

5. Conclusion

This document has discussed how Tulu written in the Kannada script may be implemented in Unicode. Initiative from the native scholars will be required for further action.

-o-o-o-