Unicode Public Review Issue #66: Encoding of Chillu Forms in Malayalam

Author: Cibu C Johny
Email: cibu (at) yahoo.com
Date: May 2, 2005

Abstract

This document proposes two solutions to Public Review Issue #66 and recommends introducing code points for Chillu letters as the preferred solution. It also describes various issues with the current representation of Chillu letters.

Conventions used in this document

Pronunciations are transliterated to Latin and indicated by single quotes (' ').

Unicode code points for characters used in this document:

A - U+0D05
U - U+0D09
TA - U+0D24
NA - U+0D28
MA - U+0D2E
RA - U+0D30
LA - U+0D32
VA - U+0D35
Virama - U+0D4D

Proposed solution to the Issue #66

Since Chillu-NA and NA + visible Virama can give different meanings to a word, we cannot let the rendering system choose the output of NA + Virama. Here are my preferences, in decreasing order:

1) Explicitly encode Chillu characters. The various issues are discussed in detail below.

2) <NA, Virama> (without any joiner) should be mapped to NA with visible Virama, since this enforces uniformity. That is, Consonant + Virama will always produce the visible Virama symbol, irrespective of whether the consonant is capable of forming a Chillu or not. If we follow this, both of the following sample combinations, written without any joiner, will have the visible Virama symbol.

VA + Virama = VA with visible Virama
NA + Virama = NA with visible Virama

Issues in current representation of Chillu letter as Consonant + Virama + ZWJ

1) ZWJ and ZWNJ are supposed to be font directives, directing a font to select from two or more semantically identical renderings. In the case of Malayalam, this is no longer true. ZWJ becomes an alien language construct introduced to Malayalam by Unicode to produce Chillu letters. Thus, it is possible to produce two semantically different words that differ only by ZWJ in their Unicode representation. In the following examples, the words differ only by ZWJ.

Example 1.1: The first word is written with a visible Virama after NA and is pronounced 'avanu'. It means 'for him'. The second word is written with Chillu NA and is pronounced 'avan'. It means 'he'.

Example 1.2: The first word is written with Chillu RA. It is a valid word in Malayalam. The second word is written with RA in full form and VA in C2-conjoining form. It is NOT a valid word in Malayalam.

2) When a word is searched for in Unicode text, the search algorithm should ignore ZWJ and ZWNJ, because it should not care about the rendering of the word. From argument 1, Malayalam can have words that differ by a joiner alone. So a search for one word of such a pair will return the other as well. That is plain wrong.

As a workaround, the search algorithm could match joiners in the case of Malayalam only. Then the algorithm will not match words that are semantically the same but rendered differently by using or omitting a joiner (ZWJ or ZWNJ). For example, a search for a word written without a joiner will not match the same word if the latter is written using ZWNJ.

This issue has repercussions beyond the search algorithm. Future development of language tools for Malayalam (for example, a grammar checker) will be impeded by this inconsistency.

3) Confusion over whether the Chillu LA/TA belongs to LA or TA. In Sanskrit words used in Malayalam, TA is pronounced as TA only when a vowel or semi-vowel comes after it. On all other occasions, it is pronounced as LA. An example is a word of Sanskrit origin that is pronounced in Malayalam as 'ulsavam': even though the Sanskrit form is 'uthsavam', in Malayalam it is 'ulsavam'. This means the Chillu form of TA should be pronounced as if it were the Chillu form of LA. Thus, the Chillu LA/TA is in a very curious situation:

Grapheme level: Graphically it is the Chillu of TA.
Character level: It can represent either of the characters TA or LA.
Phoneme level: Its pronunciation is the Chillu of LA.

Since Unicode is standardizing characters, this Chillu has to be considered the Chillu of both LA and TA. However, this will lead to two representations of a word with the same rendering.
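The search dilemma described in argument 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a proposed algorithm: the helper names `strip_joiners` and `find_words` are hypothetical, 'avan' is assumed to be stored as <A, VA, NA, Virama, ZWJ> as in the representation under discussion, and the spelling of 'Sabdam' as <SHA, BA, Virama, DA, Anusvara> (U+0D36, U+0D2C, U+0D4D, U+0D26, U+0D02) is an assumption, since those code points are not listed in this document.

```python
ZWJ = "\u200D"   # ZERO WIDTH JOINER
ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER

# Code points listed in this document.
A, VA, NA, VIRAMA = "\u0D05", "\u0D35", "\u0D28", "\u0D4D"
# Assumed code points for the 'Sabdam' illustration (not listed in this document).
SHA, BA, DA, ANUSVARA = "\u0D36", "\u0D2C", "\u0D26", "\u0D02"

AVANU = A + VA + NA + VIRAMA        # 'avanu' (for him): NA + visible Virama
AVAN = A + VA + NA + VIRAMA + ZWJ   # 'avan' (he): Chillu NA as NA + Virama + ZWJ

SABDAM = SHA + BA + VIRAMA + DA + ANUSVARA              # no joiner
SABDAM_ZWNJ = SHA + BA + VIRAMA + ZWNJ + DA + ANUSVARA  # same word, joiner added


def strip_joiners(text: str) -> str:
    """Remove ZWJ and ZWNJ, as a rendering-agnostic search would."""
    return text.replace(ZWJ, "").replace(ZWNJ, "")


def find_words(needle: str, text: str, ignore_joiners: bool) -> list[str]:
    """Return the whitespace-separated words of `text` that match `needle`."""
    normalize = strip_joiners if ignore_joiners else (lambda s: s)
    target = normalize(needle)
    return [word for word in text.split() if normalize(word) == target]


text = AVAN + " " + AVANU

# Ignoring joiners: a search for 'avanu' also returns 'avan' (a false match).
false_matches = find_words(AVANU, text, ignore_joiners=True)

# Matching joiners: 'avanu' is found correctly ...
exact_matches = find_words(AVANU, text, ignore_joiners=False)

# ... but a word that differs only in rendering is now missed.
missed = find_words(SABDAM, SABDAM_ZWNJ, ignore_joiners=False)
```

Neither setting of `ignore_joiners` gives correct results for both cases, which is the inconsistency argument 2 describes.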

4) The Chillu of a consonant is phonetically different from its C1-conjoining form without inherent (A). This is in direct contrast with the Unicode assumption, and this inconsistency produces the issues described in arguments 1 and 2.

Consider the combination: Vow + CC + Con

Vow - a vowel
CC - a consonant capable of forming Chillu
Con - a consonant

When CC takes its Chillu form, it joins more closely with Vow. This effect produces a noticeable small stop between CC and Con. When CC without inherent (A) forms a conjunct ligature with Con, it is pronounced together with Con without any pronunciation stop in between.

Two sample letter combinations to show the pronunciation difference:

- RA in Chillu form
- Full form of RA with C2-conjoining form of VA

5) The Chillu of a consonant can be treated like Anusvara. A. R. Raja Raja Varma states in his Keralapanineeyam (which is the foremost grammar book of Malayalam): "Anusvara is the Chillu form of MA". This is essentially the same as saying that Malayalam Anusvara and the other Chillu characters share the same properties. As a demonstration of that fact, we can see that the half-stop phonetic property described in argument 4 is the same for Anusvara and the other Chillu characters. The following two sample letter combinations show the pronunciation similarity with the example in argument 4:

Background

A) Overloading of visible Virama in Malayalam

The following are the functions of visible Virama:

A.1) At the end of a word, it acts as a quarter vowel (U). Example: 'avanu'.

A.2) In the middle of a word, it means that the consonant before it forms a conjunct with the consonant after it. For example, consider 'Sabdam'. In this context, it does not produce any sound whatsoever.

Functionality (A.2) was overloaded onto this grapheme when the typesetting-friendly new orthography was introduced. Unicode recognizes functionality (A.2) alone for the visible Virama of Malayalam. This contributes to the problem that the Unicode representations of the words 'avan' and 'avanu' differ only by a joiner (ZWJ or ZWNJ). However, they have two different meanings.

Reference:

Keralapaanineeyam, peethika - A. R. Raja Raja Varma