Indian Language Speech sound Label set (ILSL12) Version 2.1.6

Indian Language Speech sound Label set (ILSL12), 2012, developed by Indian Language TTS Consortium & ASR Consortium.
Copyright (c) 2012 Indian Language TTS Consortium & ASR Consortium.
Dr Samudravijaya, Tata Institute of Fundamental Research, Mumbai (chief@tifr.res.in, samudravijaya@gmail.com)
Dr Hema A Murthy, Indian Institute of Technology, Madras (hema@cse.iitm.ac.in, hema.arunachalam@gmail.com)

This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/4.0/.

This document specifies a standard set of labels (in Roman script) for speech sounds commonly used in Indian languages. It lists the label set for 13 languages (currently being processed by the ASR/TTS consortia of TDIL, DIT, GoI). These labels are to be used for computer processing of spoken Indian languages.
(1) Similar sounds in different languages are given a single label.
(2) The IPA symbol refers to an exemplar (Hindi/Tamil/other) language.
(3) This is not an IPA chart of sounds of Indian languages.
(4) The label set is designed such that the native script is largely recoverable from the transliteration.

A label consists of a sequence of alphanumeric characters of the Roman alphabet; labels do not contain special characters such as quotes, hyphens, etc. All labels are written in lower case, although they are case-insensitive. Since the number of speech sounds is larger than the Roman alphabet, a system of suffixes as well as letter combinations is used for labels.

(A) Notes on suffixes:
1. Aspiration: use suffix h to denote aspiration: k (क) versus kh (ख).
2. Retroflex consonants: use suffix x to denote retroflex place of articulation: t (त) versus tx (ट).
3. Nukta / bindu: use suffix q to denote a nukta/bindu: dx (ड) versus dxq (ड़).
Nukta (a dot below the glyph) may denote a flap/tap or a fricative variant of the consonant. Bindu (a dot above a vowel glyph) denotes a nasal after the vowel; the place of articulation of the nasal is the same as that of the following consonant. If there is no consonant after the bindu, the vowel is nasalized.
4. Nasalized vowel: use suffix n to denote nasalization of a vowel: k a h aa (कहा) versus k a h aan (कहाँ).
5. Geminated sounds: the label for a geminated consonant is the label of the corresponding single consonant with the first letter of the label repeated. Examples: p a k aa (पका) versus p a kk aa (पक्का) in Hindi; a ddx aa (अड्डा) in Hindi; a ll a m (ginger in Tamil).
6. Other special cases: use suffix x to denote certain special cases: the reduced vowel (axx) in various languages; the a of Bangla; the apical affricates of Marathi; the special r of Dravidian languages, etc.
7. Priority of suffixes: some labels carry more than one suffix. In such cases the suffixes appear in the following order of priority (decreasing): x, h, q, n. For example, dxhq is d + x + h + q.

(B) Notes on matras, diphthongs and halant:
1. The label for a vowel matra is the same as that of the vowel.
2. The label of a diphthong is the concatenation of the labels of the corresponding vowels. The exceptions to this rule are ae, ea and eu; these are monophthongs.
3. The halant in Indian scripts denotes the absence of the implicit a in Indian consonant characters. It is not a sound, and hence there is no label for the halant. The morphological analyser of the language shall delete the implicit a when a halant is present in the script.
4. Punctuation marks: the transliteration module will retain the punctuation marks (exception: ' ' and will be replaced by a full stop); these are useful for prosody generation. The morphological analyser will remove the punctuation marks while generating the word/phone-level transcription.

(C) Language-specific notes*: this section has notes on sounds (and labels) specific to a subset of the languages. Whenever possible, minimal pairs for language-specific phonemes are provided.
In other cases, exemplar words containing common phones in that language are given.
1. Hindi
(i) The compound glyph ज्ञ (= ज + ् + ञ) is pronounced as a sequence of two phones: "g y". The morphological analyser of the language will do this translation from glyph to the phone sequence.
2. Marathi
Label IPA Glyph Minimal pair / Exemplar words / Comment
nxh /ɳʰ/ ण्ह
nh /nʰ/ न्ह न ण (coin), नह ण (take bath)
mh /mʰ/ म्ह महण (say), मण ()
rh /rʰ/ ऱ्ह र स (heap), ऱह स (decay)
lh /lʰ/ ल्ह उलह स, क लह य
wh /wʰ/ व्ह व ळ (a kind of cereal), वह ळ (will become)
* content not complete
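Notes (A)5 and (A)7 are mechanical enough to express in code. The sketch below is a minimal illustration in Python; the function names `compose_label`, `geminate` and `is_valid_label` are hypothetical and not part of ILSL12. It appends suffixes in the documented x, h, q, n order, builds a geminate label by repeating the first letter, and checks the lower-case alphanumeric form. It does not check that the resulting label actually appears in the tables, and it cannot simply be run in reverse: labels such as sh and zh end in h without marking aspiration, so stripping suffixes is not a reliable way to decompose a label.

```python
import re

# Suffix order from note (A)7, in decreasing priority: x, h, q, n.
SUFFIX_ORDER = ("x", "h", "q", "n")

# Labels are lower-case Roman letters/digits only (no quotes, hyphens, etc.).
LABEL_RE = re.compile(r"[a-z0-9]+")


def compose_label(base: str, *, retroflex: bool = False, aspirated: bool = False,
                  nukta: bool = False, nasalized: bool = False) -> str:
    """Hypothetical helper: append the requested suffixes to a base label
    in the x, h, q, n order of note (A)7."""
    flags = (retroflex, aspirated, nukta, nasalized)
    return base + "".join(s for s, on in zip(SUFFIX_ORDER, flags) if on)


def geminate(label: str) -> str:
    """Hypothetical helper for note (A)5: repeat the first letter of the
    single-consonant label (k -> kk, dx -> ddx)."""
    return label[0] + label


def is_valid_label(label: str) -> bool:
    """True if the label uses only lower-case letters and digits."""
    return LABEL_RE.fullmatch(label) is not None
```

For example, `compose_label("d", retroflex=True, aspirated=True, nukta=True)` gives dxhq (row 65 of the tables), `compose_label("aa", nasalized=True)` gives the aan of k a h aan, and `geminate("dx")` gives the ddx of a ddx aa.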

(1) This is a common Label Set (in Roman script) for the purpose of computer processing of spoken Indian languages.
(2) Similar sounds in different languages are given a single label.
(3) The IPA label refers to an exemplar (Hindi/Tamil/other) language.
(4) This is NOT an IPA chart of sounds of Indian languages.

Sl.No. Label IPA Hindi Marathi Rajasthani Gujarati Odia
1 a a अ अ अ અ
2 ax ɔ ऑ ઑ ଅ
3 aa aː आ आ आ આ ଆ
4 axx ə
5 i ɪ, i इ इ इ ઇ
6 ii iː ई ई ई ઈ ଇ, ଈ
7 u u, ʊ उ उ उ ઉ ଉ, ଊ
8 eu ɯ
9 uu uː ऊ ऊ ऊ ઊ
10 rq ऋ,ॠ ऋ,ॠ ऋ ଋ
11 e e ଏ
12 ee eː ए ए,ऎ ए એ
13 ea ɛ
14 ei ɛː ऐ ऐ ઐ
15 ai ai ऐ ଐ
16 oi oj
17 o o ओ ओ,ऒ ओ ઓ ଓ
18 oo oː
19 ae ae ऍ ઍ

20 au aʊ औ ଔ
21 ou oʊ औ औ ઔ
22 k k क क क ક କ
23 kh kʰ ख ख ख ખ ଖ
24 g g ग ग ग ગ ଗ
25 gh ɡʰ घ घ घ ઘ ଘ
26 ng ŋ ङ ङ ङ ઙ ଙ
27 c tʃ च च च ચ ଚ
28 ch tʃʰ छ छ छ છ ଛ
29 cx t ʃ च
30 j dʒ ज ज ज જ ଜ,ଯ
31 jh dʒʰ झ झ झ ઝ ଝ
32 jx d ʃ ज
33 nj ɲ ञ ञ ઞ ଞ
34 tx ʈ ट ट ट ટ ଟ
35 txh ʈʰ ठ ठ ठ ઠ ଠ
36 dx ɖ ड ड ड ડ ଡ
37 dxh ɖʰ ढ ढ ढ ઢ ଢ
38 nx ɳ ण ण ण ણ ଣ
39 t t त त त ત ତ
40 th t ʰ थ थ थ થ ଥ
41 d d द द द દ ଦ
42 dh d ʰ ध ध ध ધ ଧ
43 n n न,ऩ न,ऩ न ન ନ

44 nd
45 p p प प प પ ପ
46 ph pʰ फ फ फ ફ ଫ
47 b b ब ब ब બ ବ
48 bh bʰ भ भ भ ભ ଭ
49 m m म म म મ ମ
50 y j य,य़ य,य़ य ય ୟ
51 r r र,ऱ र,ऱ र ર ର
52 l l ल ल ल લ ଲ
53 lx ɭ ळ,ऴ ळ ળ ଳ
54 w ʋ व व व વ ୱ,ଵ
55 sh ʃ श श श શ
56 sx ʂ ष ष ष ષ
57 s s स स स સ ସ, ଶ, ଷ
58 h ɦ ह ह ह હ ହ
59 kq q क़ क़
60 khq x ख़ ख़
61 gq ɣ ग़ ग़
62 z z ज़ ज़
63 jhq ʒ झ़ झ़
64 dxq ɽ ड़ ड़ ड़ ଡ଼
65 dxhq ɽʰ ढ़ ढ़ ढ़ ଢ଼
66 dhq ध़
67 f f फ़ फ़

68 bq ब़
69 yq
70 nq
71 rx ɾ
72 sq स़
73 zh ɻ
74 nxh ɳʰ ण्ह
75 nh nʰ न्ह
76 mh mʰ म्ह म्ह
77 rh rʰ ऱ्ह
78 lh lʰ ल्ह
79 wh wʰ व्ह व्ह
80 q
81 hq
82 mq
83 x x
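Notes (A)3 to (A)5 and (B)1 to (B)3, together with the Hindi column above, describe how a Devanagari word maps onto a label sequence. The Python sketch below illustrates that mapping for a hypothetical subset of the Hindi column (a few independent vowels and matras, the plain consonants, and the nukta rows 59 to 65 and 67); the names `to_labels`, `CONSONANTS`, `NUKTA_CONSONANTS`, `VOWELS` and `MATRAS` are assumptions for illustration. It emits the implicit a after each consonant unless a matra or halant suppresses it, turns consonant + halant + the same consonant into a geminate label, and appends n for the candrabindu. It does not handle the anusvara, compound glyphs such as ज्ञ (which the Hindi note assigns to the morphological analyser), or word-final schwa deletion.

```python
import unicodedata

NUKTA, HALANT, CANDRABINDU = "\u093c", "\u094d", "\u0901"

# Hypothetical subset of the Hindi column of the table.
CONSONANTS = {
    "क": "k", "ख": "kh", "ग": "g", "घ": "gh", "च": "c", "छ": "ch",
    "ज": "j", "झ": "jh", "ट": "tx", "ठ": "txh", "ड": "dx", "ढ": "dxh",
    "ण": "nx", "त": "t", "थ": "th", "द": "d", "ध": "dh", "न": "n",
    "प": "p", "फ": "ph", "ब": "b", "भ": "bh", "म": "m", "य": "y",
    "र": "r", "ल": "l", "व": "w", "श": "sh", "ष": "sx", "स": "s", "ह": "h",
}
# Base consonant -> label when followed by a nukta (NFD splits forms like U+095C).
NUKTA_CONSONANTS = {"क": "kq", "ख": "khq", "ग": "gq", "ज": "z",
                    "ड": "dxq", "ढ": "dxhq", "फ": "f"}
VOWELS = {"अ": "a", "आ": "aa", "इ": "i", "ई": "ii", "उ": "u", "ऊ": "uu",
          "ए": "ee", "ओ": "o"}
MATRAS = {"\u093e": "aa", "\u093f": "i", "\u0940": "ii", "\u0941": "u",
          "\u0942": "uu", "\u0947": "ee", "\u094b": "o"}


def to_labels(word: str) -> list[str]:
    chars = unicodedata.normalize("NFD", word)
    labels: list[str] = []
    pending_a = False     # previous consonant still carries its implicit 'a'
    after_halant = False  # previous character was a halant
    i = 0
    while i < len(chars):
        ch = chars[i]
        if ch in CONSONANTS:
            if i + 1 < len(chars) and chars[i + 1] == NUKTA:
                if ch not in NUKTA_CONSONANTS:
                    raise ValueError(f"no nukta label for {ch!r} in this sketch")
                label = NUKTA_CONSONANTS[ch]
                i += 1  # consume the nukta
            else:
                label = CONSONANTS[ch]
            if pending_a:
                labels.append("a")
            if after_halant and labels and labels[-1] == label:
                labels[-1] = label[0] + label  # C + halant + C -> geminate label
            else:
                labels.append(label)
            pending_a, after_halant = True, False
        elif ch == HALANT:
            pending_a, after_halant = False, True  # halant deletes the implicit 'a'
        elif ch in MATRAS:
            labels.append(MATRAS[ch])  # matra replaces the implicit 'a'
            pending_a = after_halant = False
        elif ch in VOWELS:
            if pending_a:
                labels.append("a")
            labels.append(VOWELS[ch])
            pending_a = after_halant = False
        elif ch == CANDRABINDU:
            if pending_a:
                labels.append("a")
            labels[-1] += "n"  # nasalize the preceding vowel
            pending_a = after_halant = False
        else:
            raise ValueError(f"character {ch!r} is not covered by this sketch")
        i += 1
    if pending_a:
        labels.append("a")  # no schwa deletion is attempted
    return labels
```

With this sketch, `to_labels("पक्का")` returns `['p', 'a', 'kk', 'aa']` and `to_labels("कहाँ")` returns `['k', 'a', 'h', 'aan']`, matching the examples in notes (A)4 and (A)5; a word ending in a bare consonant keeps its final a.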

Sl.No. Label Bengali (P, G) Assamese (P, G) Manipuri (P, G) Bodo
1 a অ অ अ
2 ax অ অ অ অ
3 aa আ আ আ আ আ আ आ
4 axx অ অ
5 i ই,ঈ ই ই,ঈ ই ই ই इ
6 ii ঈ ঈ ঈ ঈ ई
7 u উ,ঊ উ উ,ঊ উ উ উ उ
8 eu
9 uu ঊ ঊ ঊ ঊ ऊ
10 rq ঋ,ৠ ঋ,ৠ ঋ ঋ ऋ
11 e এ এ এ এ এ এ ए
12 ee
13 ea
14 ei ऐ
15 ai ঐ ঐ
16 oi ঐ ঐ ঐ ঐ ওই ওই
17 o ও ও ও ও ও ও ओ
18 oo
19 ae অয

20 au ঔ ঔ
21 ou ঔ ঔ ঔ ঔ औ
22 k ক ক ক ক ক ক क
23 kh খ খ খ খ খ খ ख
24 g গ গ গ গ গ গ ग
25 gh ঘ ঘ ঘ ঘ ঘ ঘ घ
26 ng ঙ, ঙ ঙ, ঙ ঙ ঙ ङ
27 c চ চ চ চ च
28 ch ছ ছ ছ ছ छ
29 cx
30 j জ,য জ জ,য জ,য জ জ ज
31 jh ঝ ঝ ঝ ঝ ঝ ঝ झ
32 jx
33 nj ঞ ঞ ঞ ঞ ञ
34 tx ট ট ট ট ट
35 txh ঠ ঠ ঠ ঠ ठ
36 dx ড ড ড ড ड
37 dxh ঢ ঢ ঢ ঢ ढ
38 nx ণ ণ ণ ণ ण
39 t ত,ৎ ত,ৎ ট,ত,ৎ ত,ৎ ট,ত,ৎ ত,ৎ त
40 th থ থ ঠ,থ থ ঠ,থ থ थ
41 d দ দ ড,দ দ ড,দ দ द
42 dh ধ ধ ঢ,ধ ধ ঢ,ধ ধ ध
43 n ন ন ণ,ন ন ণ,ন ন न

44 nd
45 p প প প প প প प
46 ph ফ ফ ফ ফ ফ ফ फ
47 b ব ব ব ব ব ব ब
48 bh ভ ভ ভ ভ ভ ভ भ
49 m ম ম ম ম ম ম म
50 y য় য য় য় য়,য য়,য य
51 r র র ৰ,ড় ৰ,ড় র র र
52 l ল ল ল ল ল ল ल
53 lx
54 w ওয় ৱ ৱ ৱ ৱ व
55 sh শ,ষ শ শ শ श
56 sx ষ ষ ষ ष
57 s স স চ, ছ চ, ছ স স स
58 h হ হ হ হ হ হ ह
59 kq
60 khq
61 gq
62 z জ
63 jhq झ़
64 dxq ড় ড় ड़
65 dxhq ঢ় ঢ় ঢ় ঢ় ढ़
66 dhq
67 f ফ

68 bq
69 yq য়
70 nq
71 rx
72 sq
73 zh
74 nxh
75 nh
76 mh
77 rh
78 lh
79 wh
80 q
81 hq
82 mq
83 x শ,ষ,স শ,ষ,স

Sl.No. Label Tamil Malayalam Telugu Kannada
1 a அ അ అ ಅ
2 ax
3 aa ஆ ആ ఆ ಆ
4 axx
5 i இ ഇ ఇ ಇ
6 ii ஈ ഈ ఈ ಈ
7 u உ ഉ ఉ ಉ
8 eu உ
9 uu ஊ ഊ ఊ ಊ
10 rq ഋ ఋ,ౠ ಋ,ೠ
11 e எ എ ఎ ಎ
12 ee ஏ ഏ ఏ ಏ
13 ea ಎ
14 ei
15 ai ஐ ഐ ఐ ಐ
16 oi
17 o ஒ ഒ ఒ ಒ
18 oo ஓ ഓ ఓ ಓ
19 ae

20 au ஔ ഔ ఔ ಔ
21 ou
22 k க ക క ಕ
23 kh ഖ ఖ ಖ
24 g கv ഗ గ ಗ
25 gh ഘ ఘ ಘ
26 ng ங ങ ఙ ಙ
27 c ச ച చ ಚ
28 ch ഛ ఛ ಛ
29 cx
30 j ஜ ജ జ ಜ
31 jh ഝ ఝ ಝ
32 jx
33 nj ஞ ഞ ఞ ಞ
34 tx ட ട ట ಟ
35 txh ഠ ఠ ಠ
36 dx டv ഡ డ ಡ
37 dxh ഢ ఢ ಢ
38 nx ண ണ ణ ಣ
39 t த ത త ತ
40 th ഥ థ ಥ
41 d தv ദ ద ದ
42 dh ധ ధ ಧ
43 n ந,ன ന న ನ

44 nd ந
45 p ப പ ప ಪ
46 ph ഫ ఫ ಫ
47 b பv ബ బ ಬ
48 bh ഭ భ ಭ
49 m ம മ మ ಮ
50 y ய യ య ಯ
51 r ர ര ర ರ
52 l ல ല ల ಲ
53 lx ள ള ళ ಳ
54 w வ വ వ ವ
55 sh ശ శ ಶ
56 sx ஷ ഷ ష ಷ
57 s ஸ സ స ಸ
58 h ஹ ഹ హ ಹ
59 kq
60 khq
61 gq
62 z
63 jhq
64 dxq ട
65 dxhq
66 dhq
67 f ஃப

68 bq
69 yq
70 nq ന
71 rx ற റ ఱ ಱ
72 sq
73 zh ழ ഴ
74 nxh
75 nh
76 mh
77 rh
78 lh
79 wh
80 q
81 hq
82 mq
83 x
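Because one label names the same sound across scripts, each table can be read as a mapping from a label to a per-language glyph, with gaps where a language has no entry. A minimal Python sketch of that structure follows, using a hypothetical hand-copied excerpt of rows 22, 23, 26, 27, 38, 39 and 73 of the Tamil/Malayalam/Telugu/Kannada table; `TABLE`, `glyph` and `languages_with` are illustrative names, not part of ILSL12.

```python
from typing import Optional

LANGS = ("Tamil", "Malayalam", "Telugu", "Kannada")

# Excerpt of a few rows of the table above; None marks an empty cell.
TABLE: dict[str, tuple[Optional[str], ...]] = {
    "k": ("க", "ക", "క", "ಕ"),
    "kh": (None, "ഖ", "ఖ", "ಖ"),
    "ng": ("ங", "ങ", "ఙ", "ಙ"),
    "c": ("ச", "ച", "చ", "ಚ"),
    "nx": ("ண", "ണ", "ణ", "ಣ"),
    "t": ("த", "ത", "త", "ತ"),
    "zh": ("ழ", "ഴ", None, None),
}


def glyph(label: str, language: str) -> Optional[str]:
    """Native glyph for a label in one language, or None if the cell is empty."""
    return TABLE[label][LANGS.index(language)]


def languages_with(label: str) -> list[str]:
    """Languages whose column has an entry for this label."""
    return [lang for lang, g in zip(LANGS, TABLE[label]) if g is not None]
```

Here `glyph("kh", "Tamil")` returns None, reflecting the empty Tamil cell in row 23, and `languages_with("zh")` returns `['Tamil', 'Malayalam']`, as in row 73.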