Language Specific Peculiarities Document for Assamese as Spoken in Assam

1. Special handling of dialects

Dialect            Districts in Assam
Eastern Assamese   Sibsagar, Lakhimpur, Jorhat, Golaghat, Sonitpur, Karbi Anglong, Nagaon, Dibrugarh, Dhemaji, Tinsukia
Central            Marigaon, (Eastern) Kamrup
Western            Goalpara, Dhubri, Bongaigaon, Kokrajhar, Barpeta, Nalbari, Darrang, (Western) Kamrup

Although the variety spoken in Sibsagar is considered the standard, and Eastern Assamese has been dominant in literature since the 13th century, the modern standard dialect has started assimilating to the dialect spoken in Guwahati (Eastern Kamrup in the table above), mirroring a recent shift of the cultural center from Sibsagar to Guwahati. Guwahati did not historically have a dialect of its own, as it was largely a place of military fortification.

2. Deviation from native-speaker principle

No special deviation: only native speakers of Assamese born in India will be collected in this project.

3. Special handling of spelling

There will be no special handling of spelling in this collection. English loan words will be spelled in the Assamese script rather than the Latin alphabet. The Hemkosh (also known as the Hema Kosha) will be used as a reference for spelling.

4. Description of character set used for orthographic transcription

The Assamese script will be used for the orthographic transcription of Assamese. The Unicode range for both Assamese and Bengali is U+0980-U+09FF. The presentation forms of these glyphs depend on the display font used; however, this does not affect the underlying Unicode. The Lohit Assamese font will be used. This Unicode-based font correctly renders all Assamese characters and can be downloaded from the following website: https://fedorahosted.org/lohit/.
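
As a rough illustration of the character set described in this section, the following Python sketch flags characters in a transcription line that fall outside the Bengali/Assamese block U+0980-U+09FF. The function name and the set of extra permitted characters (white space, the apostrophe U+0027, the zero-width characters U+200C and U+200D, and the hyphen) are assumptions made for this example, drawn from conventions mentioned elsewhere in this document. A real project check would probably need to allow punctuation as well.

```python
# Sketch only: names and the extra-allowed set are assumptions, not project code.
ASSAMESE_BLOCK = range(0x0980, 0x0A00)  # U+0980..U+09FF inclusive

# Characters outside the block that this document mentions as permitted:
# apostrophe, zero-width non-joiner, zero-width joiner, hyphen.
EXTRA_ALLOWED = {"\u0027", "\u200c", "\u200d", "-"}


def find_unexpected_chars(text):
    """Return (index, character) pairs for characters outside the expected set."""
    unexpected = []
    for index, ch in enumerate(text):
        if ch.isspace() or ch in EXTRA_ALLOWED:
            continue
        if ord(ch) in ASSAMESE_BLOCK:
            continue
        unexpected.append((index, ch))
    return unexpected


if __name__ == "__main__":
    # A Latin letter in an otherwise Assamese line is reported with its position.
    print(find_unexpected_chars("এক x"))
```

Returning positions rather than a plain yes/no makes it easier for a transcriber to locate the offending character in a long line.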

Combined presentation glyphs are represented through the use of the virama (hashanta in Assamese) character (U+09CD), which suppresses the inherent vowel and indicates that the characters should render together. In some cases, the characters should render separately, with the virama (hashanta) symbol shown, but this is generally a function of the font used to view the text. Where separate rendering must be forced, such as at morphological boundaries or in loan words, zero-width characters (U+200C and U+200D) may be used.

Certain words that would otherwise be homographs are distinguished by the use of the apostrophe (urdhokôma in Assamese; U+0027), as in ল'ৰ "boy" and লৰ "move". This apostrophe is added to the inherent vowel and its initial variant অ to specify which of its pronunciations is being used.

5. Description of Romanization scheme

The following is Appen Butler Hill's Romanization scheme, which is fully reversible. Appen Butler Hill's Romanization schemes are being used for this project. These schemes are designed to be as similar in form as possible across languages, but cannot be identical due to the different writing systems and spelling conventions in each language.

5.1. Assamese Romanization Scheme

Bengali script is used for Assamese.

UNICODE  ASSAMESE  ROMAN  DESCRIPTION
0x981    ◌ঁ         M      BENGALI SIGN CANDRABINDU
0x982    ◌ং         W      BENGALI SIGN ANUSVARA
0x983    ◌ঃ         9      BENGALI SIGN VISARGA
0x985    অ         a      BENGALI LETTER A
0x986    আ         A      BENGALI LETTER AA
0x987    ই         I      BENGALI LETTER I
0x988    ঈ         i      BENGALI LETTER II
0x989    উ         U      BENGALI LETTER U
0x98a    ঊ         u      BENGALI LETTER UU
0x98b    ঋ         r[     BENGALI LETTER VOCALIC R
0x98f    এ         e      BENGALI LETTER E
0x990    ঐ         e3     BENGALI LETTER AI
0x993    ও         o      BENGALI LETTER O
0x994    ঔ         o3     BENGALI LETTER AU
0x995    ক         k      BENGALI LETTER KA
0x996    খ         K      BENGALI LETTER KHA
0x997    গ         g      BENGALI LETTER GA
0x998    ঘ         G      BENGALI LETTER GHA
0x999    ঙ         N      BENGALI LETTER NGA
0x99a    চ         c      BENGALI LETTER CA
0x99b    ছ         C      BENGALI LETTER CHA
0x99c    জ         j      BENGALI LETTER JA
0x99d    ঝ         Z      BENGALI LETTER JHA
0x99e    ঞ         J      BENGALI LETTER NYA
0x99f    ট         t`     BENGALI LETTER TTA
0x9a0    ঠ         T`     BENGALI LETTER TTHA
0x9a1    ড         d`     BENGALI LETTER DDA
0x9a2    ঢ         D`     BENGALI LETTER DDHA
0x9a3    ণ         n`     BENGALI LETTER NNA
0x9a4    ত         t      BENGALI LETTER TA
0x9a5    থ         T      BENGALI LETTER THA
0x9a6    দ         d      BENGALI LETTER DA
0x9a7    ধ         D      BENGALI LETTER DHA
0x9a8    ন         n      BENGALI LETTER NA
0x9aa    প         p      BENGALI LETTER PA
0x9ab    ফ         P      BENGALI LETTER PHA
0x9ac    ব         b      BENGALI LETTER BA
0x9ad    ভ         B      BENGALI LETTER BHA
0x9ae    ম         m      BENGALI LETTER MA
0x9af    য         Y      BENGALI LETTER YA
0x9b2    ল         l      BENGALI LETTER LA
0x9b6    শ         S      BENGALI LETTER SHA
0x9b7    ষ         s`     BENGALI LETTER SSA
0x9b8    স         s      BENGALI LETTER SA
0x9b9    হ         h      BENGALI LETTER HA
0x9be    ◌া         A2     BENGALI VOWEL SIGN AA
0x9bf    ◌ি         I2     BENGALI VOWEL SIGN I
0x9c0    ◌ী         i2     BENGALI VOWEL SIGN II
0x9c1    ◌ু         U2     BENGALI VOWEL SIGN U
0x9c2    ◌ূ         u2     BENGALI VOWEL SIGN UU
0x9c3    ◌ৃ         r2     BENGALI VOWEL SIGN VOCALIC R
0x9c7    ◌ে         e2     BENGALI VOWEL SIGN E
0x9c8    ◌ৈ         e4     BENGALI VOWEL SIGN AI
0x9cb    ◌ো         o2     BENGALI VOWEL SIGN O
0x9cc    ◌ৌ         o4     BENGALI VOWEL SIGN AU
0x9cd    ◌্         +      BENGALI SIGN VIRAMA
0x9ce    ৎ         t2     BENGALI LETTER KHANDA TA
0x9dc    ড়         R      BENGALI LETTER RRA
0x9dd    ঢ়         r`     BENGALI LETTER RHA
0x9df    য়         y      BENGALI LETTER YYA
0x9f0    ৰ         r      BENGALI LETTER RA WITH MIDDLE DIAGONAL
0x9f1    ৱ         w      BENGALI LETTER RA WITH LOWER DIAGONAL
0x027    '                APOSTROPHE

6. Description of method for word boundary detection

Word boundaries in the orthography are determined by locating white space (blank, tab, etc.). Words in Assamese (such as compound words and stems with grammatical endings) often combine to form one word. In such cases, the words are spelled without white space and use the traditional spelling alterations associated with this phenomenon. Note, however, that in some cases the same words appear next to each other separated by white space; this typically carries a different meaning. The spelling of words as either a compound (without white space) or as separate words is checked and standardized throughout the transcription project by identifying and reviewing words that have been spelled both together and apart. Hyphens are used in compounds to join components when any of the components does not carry meaning on its own.

7. All phonemes in the stipulated notation

The phonemic transcription of the words in this database uses X-SAMPA symbols, which can be found at http://www.phon.ucl.ac.uk/home/sampa/x-sampa.htm. The total number of phones is 48: 30 consonants, 9 oral vowels (7 monophthongs and 2 diphthongs) and 9 nasal vowels (7 monophthongs and 2 diphthongs). Seven of these are foreign phones (/f, v, ts, dz, Z, S, j/), which are not part of the native Assamese sound system but are commonly heard in English words.
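
The inventory figures above lend themselves to a quick consistency check. The Python sketch below records the stated counts and the seven foreign phones, and lists any foreign phones found in a pronunciation. The function name and the assumption that a pronunciation is written as space-separated X-SAMPA symbols are illustrative only; the document does not specify a lexicon format.

```python
# Sketch only: the space-separated pronunciation format is an assumption.
PHONE_COUNTS = {"consonants": 30, "oral_vowels": 9, "nasal_vowels": 9}

# The seven foreign phones listed in this section (X-SAMPA).
FOREIGN_PHONES = {"f", "v", "ts", "dz", "Z", "S", "j"}


def foreign_phones_in(pronunciation):
    """Return the foreign phones in a space-separated pronunciation, in order."""
    return [phone for phone in pronunciation.split() if phone in FOREIGN_PHONES]


if __name__ == "__main__":
    # The three category counts should add up to the stated total of 48.
    print(sum(PHONE_COUNTS.values()))
    # Illustrative pronunciation string containing one foreign phone.
    print(foreign_phones_in("f O n"))
```

A check like this could be used to flag lexicon entries that contain loan-word pronunciations for review.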
These phones are treated as allophones of the native phonemes that a native speaker would otherwise use in the loan words. The orthography for these foreign phones is the same as that given for the native equivalent phonemes.

Assamese Phone Chart (typical Assamese correspondence)

A row with no letter gives a further pronunciation of the letter or letters immediately above it.

CONSONANTS

ASSAMESE  UNICODE      ROMAN  IPA    SAMPA   COMMENTS
প         0x9aa        p      p      p
ফ         0x9ab        P      pʰ     p_h
                              f      f       allophone of /p_h/, mainly occurring in English words, e.g. "phone", "final"
ব         0x9ac        b      b      b
ভ         0x9ad        B      bʰ     b_h
                              v      v       allophone of /b_h/, mainly occurring in English words, e.g. "video", "very"
ত         0x9a4        t      t      t
ট         0x99f        t`     t      t
ৎ         0x9ce        t2     t      t
থ         0x9a5        T      tʰ     t_h
ঠ         0x9a0        T`     tʰ     t_h
দ         0x9a6        d      d      d
ড         0x9a1        d`     d      d
ধ         0x9a7        D      dʰ     d_h
ঢ         0x9a2        D`     dʰ     d_h
ক         0x995        k      k      k
খ         0x996        K      kʰ     k_h
গ         0x997        g      g      g
ঘ         0x998        G      gʰ     g_h
চ         0x99a        c      s      s
ছ         0x99b        C      s      s
                              tʃ     ts      allophone of /s/ in some English words, e.g. "speech"
জ         0x99c        j      z      z
                              ʒ      Z       allophone of /z/ in some English words, e.g. "measure", "vision"
                              dʒ     dz      allophone of /z/ in some other English words, e.g. "judge"
য         0x9af        Y      ʝ      j\
ঝ         0x99d        Z      ʝʰ     j\_h
ম         0x9ae        m      m      m
ন         0x9a8        n      n      n
ণ         0x9a3        n`     n      n
ঞ         0x99e        J      n      n
ঙ         0x999        N      ŋ      N
◌ং         0x982        W      ŋ      N
শ         0x9b6        S      x      x
ষ         0x9b7        s`     x      x
স         0x9b8        s      x      x
                              ʃ      S       allophone of /x/ in some English words, such as "sheet"
হ         0x9b9        h      h      h
ৰ         0x9f0        r      ɾ, r   r
ড়         0x9dc        R      ɽ      r`
ঢ়         0x9dd        r`     ɽ      r`
ল         0x9b2        l      l      l
ৱ         0x9f1        w      w      w
য়         0x9df        y      j      j       occurs in English words, such as "unite", "yes"; may be substituted with the native vowel /i/ or /e/

ORAL VOWELS

ASSAMESE  UNICODE      ROMAN  IPA    SAMPA   COMMENTS
অ         0x985        a      ɔ      O       may be pronounced /o/ in some environments; see notes below
আ         0x986        A      a      a
◌া         0x9be        A2     a      a
ই         0x987        I      i      i
◌ি         0x9bf        I2     i      i
ঈ         0x988        i      i      i
◌ী         0x9c0        i2     i      i
উ         0x989        U      u      u
◌ু         0x9c1        U2     u      u
ঊ         0x98a        u      u      u
◌ূ         0x9c2        u2     u      u
এ         0x98f        e      e      e
◌ে         0x9c7        e2     e      e
এ         0x98f        e      ɛ      E       may be pronounced /e/ in some environments; see notes below
◌ে         0x9c7        e2     ɛ      E
য়া        0x9df 0x9be  yA2    ɛ      E
ও         0x993        o      ʊ      U
◌ো         0x9cb        o2     ʊ      U
ঐ         0x990        e3     oi     oi
◌ৈ         0x9c8        e4     oi     oi
ঔ         0x994        o3     ou     ou
◌ৌ         0x9cc        o4     ou     ou

NASAL VOWELS

ASSAMESE  UNICODE      ROMAN  IPA    SAMPA   COMMENTS
অঁ        0x985 0x981  aM     ɔ̃      O~
আঁ        0x986 0x981  AM     ã      a~
ইঁ        0x987 0x981  IM     ĩ      i~
ঈঁ        0x988 0x981  iM     ĩ      i~
উঁ        0x989 0x981  UM     ũ      u~
ঊঁ        0x98a 0x981  uM     ũ      u~
এঁ        0x98f 0x981  eM     ẽ      e~
এঁ        0x98f 0x981  eM     ɛ̃      E~
ওঁ        0x993 0x981  oM     ʊ̃      U~
ঐঁ        0x990 0x981  e3M    oĩ     oi~     may be rarer than other nasal vowels
ঔঁ        0x994 0x981  o3M    oũ     ou~     may be rarer than other nasal vowels

Notes

Unlike in closely related Bengali, the Assamese consonants /t, d, t_h, d_h, n, l/ are alveolar, as indicated by the IPA symbols used above. Assamese has lost the distinction between dental and retroflex consonants; both have merged to an alveolar place of articulation.

The nasalised vowels are quite rare, but they do occur in certain words and have phonemic status in Assamese. This phone set allows for a nasalised counterpart for all of the native vowels.

The inherent vowel may be pronounced as /O/ or /o/. The two have a near-systematic pattern of distribution in Assamese (there are some exceptions): /O/ is pronounced as /o/ if there is a following /i/ or /u/. The same pattern of distribution applies to /E/ and /e/.

/p/ and /b/ may be pronounced as their allophones (IPA) [ɸ] and [β] between vowels and in word-final position. Depending on the variety spoken, however, this may not be a systematic occurrence.

8. Complete list of all rare phonemes

8.1. List of rare phonemes

We expect the following phonemes to be rare in the database:

ou  O~  a~  i~  u~  e~  E~  U~  oi~  ou~

8.2. List of foreign phonemes

The following phonemes are foreign (English):

j  v  f  ts  dz  Z  S

9. Other language specific items

9.1. Table of digits

Digit  Assamese digit  Assamese word  Romanization
0      ০               শূন্য           Su2n+Y
1      ১               এক             ek
2      ২               দুই             dU2I
3      ৩               তিনি            tI2nI2
4      ৪               চাৰি            cA2rI2
5      ৫               পাঁচ            pA2Mc
6      ৬               ছয়             Cy
7      ৭               সাত            sA2t
8      ৮               আঠ             AT`
9      ৯               ন              n

9.2. Other Numbers

Number      Assamese number  Assamese words       Romanization
100         ১০০              এশ                   eS
10,000      ১০,০০০           দহ হাজাৰ             dh hA2jA2r
100,000     ১,০০,০০০         এক লাখ              ek lA2K
10 million  ১,০০,০০,০০০      এক কৌটি, এক কোটি    ek ko4t`I2, ek ko2t`I2