User Feedback on Draft Devanagari Script Behaviour for Hindi Version 1.4.9

1. Version 1.4.9
Feedback/Remark: I would strongly suggest the use of a fixed 4-byte, completely Indic script to include ALL Indian languages, using bits 22 and 23, etc., in an LE32 system. The system would accept conjoint consonants, etc. (what the Latin group calls syllables) as distinct characters. This would help enormously in collation, as huge data (composed in Indic languages) is stored in India and abroad.
From Dr. Navinchandra Mehta.
Pertinence Comments: Point well taken; however, not pertinent, since the document deals with script behaviour and not with storage issues.

2. Version 1.4.9
Feedback/Remark: The modern generation is not identifying each syllable as a character (अक्षर). From ancient times until as recently as the mid-20th century, we accepted each syllable as an अक्षर. I think it is essential that we as Indians reserve some 128K or 256K bytes available in the unused 32-bit range before others claim it. Modern computers are fast enough to handle 32-bit characters comfortably.
From Dr. Navinchandra Mehta.
Pertinence Comments: Point well taken; however, not pertinent, since the document deals with script behaviour and not with storage issues.

3. Version 1.4.9
Feedback/Remark: We could get कृ, the short vowel sound, in Hindi. But we want कॄ, the long vowel sound. How can we get it? Try to arrange in fonts the original vowels, ऋ the short vowel and ॠ the long vowel, and their sounds also.
From Chandra Sekhar j k
Pertinence Comments: Long ॠ is not used in Hindi and hence has not been incorporated.

4. Version 1.4.9
Dr Navin Mehta has documented a page-wise analysis of script behaviour. His document is attached for reference.

Feedback/Remark: I have already commented that we have failed in not reserving more space for the Indic scripts, which I believe should have been much more syllable-oriented and should occupy 200,000 to 400,000 characters. I would strongly urge reserving space at bits 22 and 23. The following is my detailed analysis and suggestions about your document, Devanagari Script Behaviour for Hindi Ver 1 4 9.pdf. In this analysis, I have tried to use your version in black, and my comments and suggestions in red, using brown for Hindi characters. I may have missed some areas, but I think it is generally readable. Here go the comments:

Each of these is explained below:
a. Choice of Character: Languages differ in the choice of the characters from the Devanagari code-page. Thus Marathi and Konkani use ळ and ऱ (for generating the eyelash ra). These are not present in Hindi or Dogri. The Hindi ऍ (U+090D) is represented in Marathi and Konkani as ॲ (U+0972). Nukta is used in Hindi and Dogri but not in Marathi or Konkani.
b. The shape of the given character. Although Marathi and Hindi share the same script, Devanāgarī, not only do they not share the same character inventory but in addition the representation of certain characters is different. Thus the Hindi /la/ is different from the Marathi /la/ insofar as the placement of the stem is concerned (Hindi /ल/, Marathi / /).

c. Choice of Character: In general, I believe that each regional language should be allowed to use its own character forms, e.g. ळ and ऱ. My argument is based on the fact that in Gujarati, for example, one uses લ for the Hindi-Devnagari character ल. I do agree that the absence of the vertical stem in Marathi-Devnagari makes it somewhat difficult to split it, but one has to learn to do it the way they have always done it. The same argument applies to the difference in shapes of the numerals. In short, although Marathi uses an almost Devnagari script, the script should be treated as a separate language script within the Sanskrit group.
Pertinence Comments: The document deals with Hindi and not Marathi.

On page 4, I would accept the different way in which Marathi-Devnagari treats the word.
Pertinence Comments: Not pertinent. Please see remark above.

Still on page 4:
d. The collation order within the language. The collation order varies from language to language although they all share the same script. In the case of Hindi, क्ष, त्र and ज्ञ are sorted along with the first consonant of each ligature. Thus क्ष is sorted along with क, ज्ञ with ज and त्र with त. In Marathi a letter. In Nepali occur at the end of the lexical sort, giving the two conjuncts a specific value of are sorted at the end.

Contrary to my acceptance of difference in the shape of characters in Marathi-Devnagari, I think it is inadvisable to have different collation for the different languages. In all, i.e. be it Hindi, Gujarati or Marathi, the collation should follow the Hindi order of collation, i.e. क्ष, त्र and ज्ञ are sorted along with the first consonant of each ligature. Thus क्ष is sorted along with क, ज्ञ with ज and त्र with त.
Pertinence Comments: The document deals only with Hindi. The collation order is the one provided in the latest CLDR on the Unicode site and represents the sort order of Hindi accurately.
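The statement that Hindi conjuncts sort with their first consonant can be illustrated informally. The following is a minimal sketch, assuming Python 3 and plain code point ordering rather than the CLDR collation the document relies on: because क्ष, ज्ञ and त्र are stored as consonant + virama + consonant sequences, even a naive sort places each conjunct directly after its first consonant. The item list and the helper name `first_consonant` are hypothetical and serve only this illustration.

```python
# Illustrative only: plain code point ordering, NOT CLDR Hindi collation.
VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA (halant)

KSSA = "\u0915" + VIRAMA + "\u0937"  # क्ष = क + virama + ष
JNYA = "\u091C" + VIRAMA + "\u091E"  # ज्ञ = ज + virama + ञ
TRA = "\u0924" + VIRAMA + "\u0930"   # त्र = त + virama + र


def first_consonant(conjunct: str) -> str:
    """Return the leading consonant of a conjunct stored as a sequence."""
    return conjunct[0]


# ख, क्ष, क, त्र, ज्ञ, ज, त in deliberately shuffled order
items = ["\u0916", KSSA, "\u0915", TRA, JNYA, "\u091C", "\u0924"]
ordered = sorted(items)  # Python compares strings code point by code point

for item in ordered:
    print(item, [f"U+{ord(ch):04X}" for ch in item])
```

A real Hindi sort would use a collation library driven by CLDR tailoring; the sketch only shows why grouping a conjunct with its first consonant falls out naturally from the encoding.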

Page 8: The other target group is the OS and application developer. Once the possible ligatures and consonant-Mātrā combinations have been identified, there is a need to provide a list of maximum combinations within the language. Devanagari Script behaviour for Hindi is equally important for keyboard design, especially when supplemented by frequency data from a corpus.

This is of great importance to me as a developer. I am glad you are addressing the issue.
Pertinence Comments: No comment, since it approves of the remark.

Page 10 (and pages 22 and 23): Example ख in Hindi. What about the deeply gruttal ख of Arabic? Have you provided, or will you be providing, for it? What about the v in "have", which is different from the sound in "wind"? The word "have" is one which can identify an Indian as distinct from an English person. Considering the overwhelming usage of English words both in Devnagari and in Gujarati media, it is necessary to have a separate character for v which is not the same as व or व. See your comment 21 on page 18. I think a Nukta on व is required to bring out the proper sound of v in the words "have", "revolve", "vine" and many other English words. In particular, I want to point out that the English pronounce "vine" and "wine" differently. Therefore, the character व is the sound in "wine" but NOT in "vine", which should be represented with a nukta added. A LOT OF Indians do not understand this difference. My suggestion is to add this in the table of consonants on page 22, similar to the addition of फ़, an Urdu import.
Pertinence Comments: The deeply gruttal (sic. guttural) ख of Arabic is represented by ख़. Since Unicode does not distinguish between /w/ and /v/ of English insofar as Devanagari is concerned, the matter be addressed to Unicode.
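To make the encoding side of the nukta discussion concrete, here is a minimal sketch using Python's standard `unicodedata` module. It shows that ख़ (U+0959) is canonically equivalent to ख followed by the nukta sign, and that व followed by a nukta remains a two-code-point sequence because no precomposed code point exists for it. Whether a given font renders that sequence well, and whether it should stand for English /v/, are separate questions that this sketch does not address; the variable names are illustrative.

```python
import unicodedata

NUKTA = "\u093C"  # DEVANAGARI SIGN NUKTA
KHA = "\u0916"    # ख
VA = "\u0935"     # व
KHHA = "\u0959"   # ख़ encoded as a single precomposed code point

# The precomposed letter decomposes canonically into ख + nukta.
khha_nfd = unicodedata.normalize("NFD", KHHA)

# व + nukta has no precomposed form, so NFC leaves it as two code points.
va_nukta = unicodedata.normalize("NFC", VA + NUKTA)

print([unicodedata.name(ch) for ch in khha_nfd])
print([f"U+{ord(ch):04X}" for ch in va_nukta])
```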

Page 14: The Devanagari Script behaviour for Hindi is limited to its synchronic use, i.e. the manner in which a given language as of today admits a character set within the script used to write it. It is not diachronic or historical in nature and does not study the evolution of the given script across centuries.

I do not fully agree with this approach or interpretation or narrowness.
Pertinence Comments: The document is pertinent to actual use of Hindi today and has no diachronic pertinence.

Page 16: 6.1.3. Amendments needed in Unicode for Hindi language: None has been proposed by the experts who have mandated the document.

I am surprised and appalled at the lack of interest. The Indic portion of Unicode is so deficient as to need at least another 256K characters, which can only be had by use of a 32-bit character using the available bits 22, 23, etc.
Pertinence Comments: The matter be referred to Unicode. This document is restricted to the shapes.

Page 18: ऽ - Avagraha. For extra length with long vowels as seen in the Sanskrit text /उपद श ऽजन न सक/.

I do not see any need for this item. It is but rarely a part of Hindi or any other current language of India. It occurs a lot in Sanskrit and Prakrit, but then there are so many more occurring in Sanskrit that it becomes a separate subject altogether. See http://www.sanskritweb.net/ and http://www.sanskritweb.net/itrans/addendum.pdf
Pertinence Comments: Used in poetry and to quote Sanskrit words, and hence the experts agreed to maintain the same.

Pages 22 and 23: My strong request, as covered above in the Page 10 comments.
Pertinence Comments: Please refer to remarks above.

Page 25: We accept क + ऋ = कृ. We also write

I have never understood why we do not have a place for the half र as a vowel sound, as above, but I am too small a person to even attempt to change things in that area. FOR MY PERSONAL USE ONLY, in the E000 area I use it as a vowel for the convenience of composition and the fact that I have three vacancies in a line of 16 decimals.
Pertinence Comments: The rafar is not a vowel sound.

Page 26: I also use bottom rakar in my list of vowel attachments (see Page 25 comments above). I will re-emphasize that I do it for MY convenience, and certainly do not wish to be drawn into any linguistic discussion about my interpretation. The displaced Catenator as in is not a problem to me.
Pertinence Comments: The remarks are for personal use.

Page 29: The and are God-sent to me, as I use them as end of sentence and end of paragraph respectively, and I am very comfortable with it.
Pertinence Comments: The remarks are for personal use.

Page 30: I have already brought your attention to the request for addition of व with a nukta in my comments about Page 10 and Pages 22 and 23.
Pertinence Comments: The matter be addressed to Unicode. Please see detailed comment re. the same above.

Pages 31 onwards: These would be affected if you accept the insertion of व with a nukta.
Pertinence Comments: Please see remark above.

Page 55: I am uncomfortable with and other three-consonant conjoints. They exist prolifically in Sanskrit but rarely in Hindi. They are available through Google transliteration software, so I suppose they have to exist. For three-consonant conjoints like have usually found and used myself more conveniently and comfortably. However, enough said about three-consonant conjoints.
Pertinence Comments: The conjunct characters have been provided by Hindi experts who have validated the document.

Page 61: If the collation order is as you have shown there. I have been taught and usually used the anuswar, the chandrabindu and क: AFTER क. But I will not quibble. Universal acceptability for collation is more important.
Pertinence Comments: The collation order is based on the CLDR as provided by Unicode.

5. Version 1.4.9
Feedback/Remark: 1. This document does not talk about keyboard layout (Inscript keyboard layout). It would be a useful addition to include this topic in this document. It would make it more complete. In relation to this topic, there is certain scope for improvement in the current layout of the Hindi Inscript keyboard. Currently, in the Inscript keyboard the purna viram chihn (।) occurs in the shift level of the keyboard, which greatly inconveniences Hindi writers. Since the Hindi full stop occurs at the end of each sentence, it is a frequently used character. It is suggested that in the Hindi keyboard the position of the English full stop (.), which currently occurs in the normal level of the keyboard, be switched with the position of the Hindi purna viram; that is, the Hindi purna viram be brought into the normal level of the keyboard, so that it can be typed by pressing just one key. Currently it requires the pressing of two keys (shift and full stop). This will greatly facilitate Hindi typing. Suitable changes in the Unicode numbers of . and । should be made to accommodate this change.

Another drawback of the Hindi Inscript keyboard is that it misses frequently used symbols like ? + % @ ' " ; : etc. For all these, Hindi writers have to switch to the English keyboard, type these symbols and then switch back to the Hindi keyboard to continue typing the Hindi text. This greatly slows Hindi typing. Ways should be found to accommodate all these symbols so that Hindi typing speed and work efficiency can be increased.
From L. Balasubramaniam
Pertinence Comments: The Enhanced Inscript Keyboard document is a separate proposal pending approval before the Bureau of Indian Standards. The remark may be addressed to the team working on the said document.

6. Version 1.4.9
Feedback/Remark: Although this topic is not strictly related to script, it would be a useful topic to have in a comprehensive document of this nature. Currently it is very difficult to make out the gender of loan words from English, Urdu and other languages. This has led to great confusion in Hindi regarding the gender of words. Words like ट क, प स ल, ट र न (to cite a few examples from English) are found to be used in both genders in Hindi. Even dictionaries give different genders for different words, e.g. फ ख त, which according to some dictionaries is masculine (because of the masculine-indicating आ ending of Hindi) and according to some is feminine (because of the feminine-indicating आ ending of Sanskrit loan words like लत, पत, त, etc.). Clear guidelines on determining the gender of such words will greatly help to standardize the Hindi language. I am sure the Central Hindi Directorate and other Hindi institutions have deliberated on the gender issue of loan words, and it should be relatively easy to summarize their recommendations and include them in this document. I hope you will be able to add these two topics to this document and make it more comprehensive and useful.
From L. Balasubramaniam
Pertinence Comments: This document is concerned with shapes and issues afferent to the same. Grammar and morphology are not within the purview of the document. It is requested that CHD be contacted to prepare a document on the issue.

7. Version 1.4.9
Feedback/Remark: Two short vowels, ऎ and ऒ, should be included.
By Shree Devi Kumar

"Short E/O are not really for Dravidian transliteration only, but were originally introduced by Hoernle for the Bihari languages Bhojpuri, Magadhi and Maithili." Please see: http://www.unicode.org/l2/l2010/10471-dev-short-vowels.pdf

As per the LSI by Grierson, Bihari and Awadhi: "As in Bihari, there is a short e as well as a long one, and a short o as well as o. Also a short ai and a short au." https://archive.org/search.php?query=rosettaproject%20awadhi%20and%20subject%3a%22Awadhi%20Detailed%20Description%22

"As in other Bihari dialects, the vowels e and o, and the diphthongs ai and au have each two sounds, a short and a long one. Accurate writers distinguish these when writing in the Devanagari character," http://www.joao-roiz.jp/lsi/pdf/vol=5-2ff=36ft=36tid=fcbbfe64c14e8d3678c67e55b6b0e2ce4c083ec2
Pertinence Comments: The document deals with Hindi alone and not with Bhojpuri, Maithili or Magadhi. Information provided is gratefully acknowledged.
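For readers who want to identify the two vowels under discussion at the code point level, a small sketch with Python's standard `unicodedata` module lists ऎ and ऒ beside the ordinary ए and ओ. It only reports what the Unicode character database records and takes no position on whether the letters belong in a Hindi inventory; the dictionary labels are illustrative.

```python
import unicodedata

# Independent vowel letters from the feedback, next to their ordinary counterparts.
vowels = {
    "short e": "\u090E",  # ऎ
    "e": "\u090F",        # ए
    "short o": "\u0912",  # ऒ
    "o": "\u0913",        # ओ
}

# Look up the official Unicode character name for each letter.
names = {label: unicodedata.name(ch) for label, ch in vowels.items()}

for label, ch in vowels.items():
    print(f"U+{ord(ch):04X} {ch} {names[label]}")
```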

Comment 1: Nukta is not used in Nepali either, because Nepali does not have uvular sounds or the other sounds where Hindi uses Nukta.
Comment 2: In Nepali, the collation order of क्ष, ज्ञ and त्र is with क, ज and त respectively, not at the end. However, at the primary school level, they are taught to be read and written as if they were at the end.
Comment 3: In Nepali, ँ is used for nasalization uniformly. ं is used in Sanskrit tatsam words, and they follow the Sanskrit Pancham varna rule, i.e. their pronunciation depends upon the following character.
Comment 4: Nepali has only six vowels, अ, आ, इ, उ, ए and ओ, and two diphthongs, ऐ and औ. However, ई, ऊ and ऋ are also used in writing.
Comment 5: Since the encoding system (Unicode) has been well established, the project can be extended to all the Indic languages where the Devanagari script is used. Why only for 'Hindi'?
Pertinence Comments: The document deals with Hindi alone and not with Nepali or any other language using the Devanagari script. However, the information provided is appreciated.

8. Version 1.4.9
From K P Tiwari, Assistant General Manager, Reserve Bank Of India (kptiwari@rbi.org.in)
Pertinence Comments: The points raised by Shri Tiwari are interesting.

Feedback/Remark (translated from Hindi; several Devanagari glyphs were lost in the source): Page 44 of your document shows a form of श, whereas that sign would not attach to the half-letter श; it would be either श + र or a combination of श with the mātrā of ऋ. Kindly give this some attention.
Pertinence Comments: The first point regarding श is adequately covered in the footnote on the page mentioned.

Feedback/Remark (translated from Hindi): The second point is that, ever since computers arrived, owing to certain early difficulties and a lack of resources, the decimal point too came to be written with the full stop, whereas mathematically the decimal point used to sit somewhat above the baseline. This is clear in old mathematics books. If the position of the decimal point were corrected in Devanagari, it would be a highly commendable piece of work. It is well known that the earlier typewriters had their own limitations, so makeshift ways of forming letters were adopted for Devanagari writing within those limitations. But today those limitations no longer hold. Look at English itself: when French or German words such as tête-à-tête were typed in Roman, the marks above the letters were not applied, but in word processing today they are applied automatically. A similar arrangement can be made in Hindi too. Words whose incorrect forms are in circulation, or whose conjunct letters could not be formed earlier, can easily be formed today. What need is there to write द्व in its long, split form when the conjunct form of द्व can easily be produced? The split form is easy neither to read nor to look at; indeed, I have heard learners of Hindi mispronounce it, whereas the conjunct letter does not have this defect. I hope you will consider these suggestions and oblige me.
Pertinence Comments: Insofar as the use of the full stop as a decimal point or a temperature mark is concerned, the issue, although interesting, is beyond the purview of this document. CHD be requested to consider the same.