Title: On the unsuitability of the COENG encoding model for Khmer Source: Date:


Date: 2002-05-03

We welcome Mr. Michael Everson's recent submission (ISO/IEC JTC1/SC2/WG2 N2412) on the suitability of the COENG encoding model for Khmer, though we cannot agree with him on the main points. We would also appreciate it if he could offer counterarguments, if any, to the remaining points we raised in our earlier documents (ISO/IEC JTC1/SC2/WG2 N2380R and N2406), which so far remain unanswered.

First of all, we must reconfirm a basic point. The model he calls the COENG encoding model was called the virama model until recently. The critical decision to adopt the existing model in 1998 was made principally on the reasoning that "(T)he main benefit of the virama model was ease of implementation as it is a well-known model" (ISO/IEC JTC1/SC2/WG2 N1729). We have previously shown that Khmer script, unlike Devanagari for example, has no virama sign acting as a general killer. The proponents of the current model therefore had to invent a fictional character serving only as a control code, which led to a model different from the virama model. The fact that they had to rename the model when applying it to Khmer supports our position that it does not correspond to the Khmer reality. Moreover, the claimed ease of implementation of the existing model is contradicted by implementers themselves, nullifying the reasoning of N1729. For both rendering and sorting, the explicitly encoded subscript model is better than the existing model. In sum, the existing model was adopted on the basis of critical misunderstandings.

We now turn to refuting the new points raised in N2412.

On the existing model's similarities to Brahmi script

Mr. Everson quoted a figure from Daniels & Bright 1990 to show that Khmer script derives from the Indian Pallava prototype, a descendant of Brahmi. (We found figure 55 instead on p. 448 of Peter T. Daniels and William Bright, eds., The World's Writing Systems, Oxford University Press, 1996.) We have never disputed that Brahmi script is an ancestor of Khmer script. We can, however, point to significant differences for each of the five points of similarity advanced to justify using the same model.

1. While Khmer does indeed have independent vowel characters, their use is very limited. The corresponding sounds are usually written with the consonant character QA, representing a glottal stop, followed by a dependent vowel sign. The existence of the consonant character QA is itself evidence of the unique development of Khmer script.

2. While in Khmer each consonant does have an inherent vowel, the Khmer system introduces a new feature: the consonant characters are divided into two series, and the inherent vowel of a consonant character varies according to the series it belongs to. There are many pairs of characters whose consonant sounds are the same but whose inherent vowel sounds differ.

3. While vowel signs are added to change the inherent vowel sound, because of the series system described above, the sound of the same vowel sign changes according to the series of the consonant character it is attached to.

4 & 5. These are important points. Another figure, on p. 380 of the 1996 Daniels & Bright book cited by Mr. Everson, shows that Brahmi script diverged into northern and southern scripts before the third century. Pallava is among the southern scripts, while Devanagari belongs to the northern group. The northern scripts generally form a conjunct consonant character to represent a consonant cluster, in which the original components cannot be seen separately, and a single conjunct may be written in more than one form. These scripts use a killer sign (virama) to suppress the preceding inherent vowel. Historically its use was limited to marking the absence of the inherent vowel on the final consonant of a syllable, but in the modern age it is also used to suppress the inherent vowel of the first consonant(s) of a cluster in order to simplify complex conjuncts. This is not always the case with the southern scripts, for which complex conjunct consonant characters are rather exceptional. Tamil script has a true general killer sign (pulli), which makes most conjunct consonant characters unnecessary. Telugu took another path: it developed consonant signs independent of consonant characters and attached them to the first consonant character to denote consonant clusters. Such differences between northern and southern scripts are easily seen in the examples of kta that Mr. Everson showed on p. 1 of N2412.
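Points 2 and 3 can be made concrete with a small lookup table. The sketch below is a hypothetical illustration, not a transliteration scheme: it covers only KA (U+1780, first series) and KO (U+1782, second series), which share the onset /k/, and the vowel sign AA (U+17B6). The transcriptions are approximate IPA values commonly given for standard Khmer, and the table names and the reading() function are invented for this example.

```python
# Hypothetical, deliberately tiny model of the Khmer consonant series (points 2 and 3).
# Transcriptions are approximate IPA; only two consonants and one vowel sign are covered.
KA = "\u1780"  # KHMER LETTER KA (first series)
KO = "\u1782"  # KHMER LETTER KO (second series)
AA = "\u17B6"  # KHMER VOWEL SIGN AA

SERIES = {KA: 1, KO: 2}                 # series membership of each consonant
ONSET = {KA: "k", KO: "k"}              # both consonants have the same onset
INHERENT = {1: "ɑː", 2: "ɔː"}           # inherent vowel depends on the series (point 2)
VOWEL_SIGN = {AA: {1: "aː", 2: "iə"}}   # one vowel sign, two readings (point 3)


def reading(syllable: str) -> str:
    """Return an approximate reading of a consonant, optionally followed by AA."""
    base, *signs = syllable
    series = SERIES[base]
    if not signs:
        # No vowel sign: the consonant carries its series-dependent inherent vowel.
        return ONSET[base] + INHERENT[series]
    # With a vowel sign: the sign's value is selected by the base consonant's series.
    return ONSET[base] + VOWEL_SIGN[signs[0]][series]
```

For example, reading(KA + AA) gives "kaː" while reading(KO + AA) gives "kiə": the same vowel sign is read differently because the two consonants belong to different series, even though their onsets are identical.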

Khmer script came from the southern line, but has had its own history of development for more than 1400 years. It developed a complete system of consonant signs positioned below a consonant character. Because of this vertical positioning, a consonant sign is called COENG. A consonant character and a consonant sign are completely independent entities: in most cases they can be combined freely without changing their shapes, so complex conjunct consonant characters are not necessary at all. This system also widened the use of the consonant signs. Sometimes they denote the final consonant sound of a syllable, as in the examples glossed (name) and (both) [Khmer examples not recoverable]. Please note that the consonant sign DOES NOT KILL any preceding inherent vowel in these cases. Moreover, not only a consonant character but also an independent vowel character can carry a consonant sign below it, as in the example glossed (give) [Khmer example not recoverable]. These features show the uniqueness of Khmer compared with Indic scripts, especially Devanagari.

The logic of the virama model is artificial. As Mr. Everson himself admits, there is no virama in Brahmi script itself, which means it is not a common or natural feature of the scripts derived from Brahmi. It is just one possible way to handle complex conjunct consonant characters efficiently, through a system of ligature control based on the phonetic function of the virama, which kills the preceding inherent vowel. Thus Mr. Everson's assertion that all scripts rooted in Brahmi should use the existing model is groundless, and such logic is clearly inadequate for Khmer. As shown above, Khmer script follows a different script model, whose core is the existence of consonant signs independent of consonant characters. Consequently, the explicitly encoded subscript model is far better than the existing model, not only for storing data but also for sorting, searching and rendering, precisely because it fits the model of the script itself.

On the process

As for the lack of the due process that is necessary in making international standards, we set out the basic facts in ISO/IEC JTC1/SC2/WG2 N2406, so we will not repeat them here. We limit ourselves to saying that we stand by our position that an irregular and unacceptable process was followed, without proper consultation with the designated national body.

The tentative results of the five meetings Mr. Everson mentioned were summarized in a private report of the National Higher Education Task Force dated August 14, 1996, addressed to Mr. Maurice Bauhahn. Although it is true that eminent linguists gathered, they did not decide on any official or final stance for Cambodia; the report itself says that it is not sufficient. This task force was not given a mandate to make an official decision on this issue, and it had nothing to do with the national standards body of Cambodia, which had already been registered with ISO in 1995. Nevertheless, it is useful to confirm here that the report clearly listed subscript consonants, independently of consonant characters, among the characters that should be encoded. While non-Cambodians may have suggested that they accept the virama model, they evidently refused to do so. Mr. Everson's assertion that they were not explicitly against the virama model is not supported by the facts shown in the report, as indeed admitted by Mr. Everson, and as testified by several participants in the meeting. We would like to add that some of the scholars mentioned by Mr. Everson clearly support the current Cambodian stance.

On ROBAT

In modern Khmer script, ROBAT has lost its original meaning as part of a ligature for a consonant cluster including RO. In some old loan words from Sanskrit/Pali, it is pronounced according to its original rule, i.e. just before the base character above which it is attached.
In other old loan words, however, it is not pronounced at all and is kept only to indicate the original spelling. ROBAT is not used in any other words. Spelling a consonant cluster that begins with RO by means of ROBAT is not a rule of Khmer script itself.

The rule is to write the consonant character RO with the consonant sign of the following consonant (that is, a subscript consonant) below it. Many examples can be found, such as the word glossed (civilized) [Khmer example not recoverable]. Thus it is proper to treat ROBAT simply as a diacritical mark, as the existing model does.

On other points

Mr. Everson is trying to play down some of the strong points of the explicitly encoded subscript model, but he cannot deny them. That is enough for us. The ultimate reasons for not adopting our model seem to be procedural ones. We also have much to say about procedures, as we wrote in N2406. Mr. Everson asserts that UCS, as a universal encoding standard and interchange platform, would be compromised if our requests were accepted. We do not think so. Universal does not mean that everything is the same; it should mean that everyone can benefit from it. For that purpose, the credibility of Unicode for everyone is important. Please note that we are making our proposal to make UCS/Unicode better, not to undermine it.
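The contrast between the existing (COENG) model and the explicitly encoded subscript model can be sketched in Python. In the existing model the cluster kta is stored as KA, U+17D2 KHMER SIGN COENG, TA, so any process that treats a subscript as one unit (a sort key or a search routine, for instance) must first pair each COENG with the consonant after it. In the explicitly encoded subscript model each subscript would be a single character. No such characters exist in UCS, so the sketch stands them in with hypothetical Private Use Area code points; the names units_existing_model, to_explicit_model and SUBSCRIPT are likewise invented for illustration. This is a simplified sketch of the argument, not how any real rendering engine or collation implementation works.

```python
import unicodedata

COENG = "\u17D2"            # KHMER SIGN COENG (existing model)
KA, TA = "\u1780", "\u178F"  # KHMER LETTER KA, KHMER LETTER TA


def is_consonant(ch: str) -> bool:
    """Khmer consonant letters occupy U+1780 (KA) through U+17A2 (QA)."""
    return "\u1780" <= ch <= "\u17A2"


def units_existing_model(text: str) -> list[str]:
    """Split text into units, pairing each COENG with the consonant after it
    so that a subscript consonant counts as one unit."""
    units, i = [], 0
    while i < len(text):
        if text[i] == COENG and i + 1 < len(text) and is_consonant(text[i + 1]):
            units.append(text[i : i + 2])  # COENG + consonant = one subscript
            i += 2
        else:
            units.append(text[i])          # any other character stands alone
            i += 1
    return units


# Hypothetical explicit-subscript characters: one PUA code point per consonant.
# These are placeholders only; nothing like them is encoded in UCS.
SUBSCRIPT = {chr(cp): chr(0xE000 + cp - 0x1780) for cp in range(0x1780, 0x17A3)}


def to_explicit_model(text: str) -> str:
    """Re-encode each COENG + consonant pair as one hypothetical subscript character."""
    return "".join(
        SUBSCRIPT[unit[1]] if len(unit) == 2 else unit
        for unit in units_existing_model(text)
    )


kta = KA + COENG + TA
print([unicodedata.name(ch) for ch in kta])
# ['KHMER LETTER KA', 'KHMER SIGN COENG', 'KHMER LETTER TA']
print(len(kta), len(to_explicit_model(kta)))  # 3 2
```

The same pairing step applies to the regular spelling of a RO-initial cluster (RO, COENG, consonant), which the paper distinguishes from ROBAT (U+17CC), a single combining mark in the existing model.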