Transliteration System for English to Sinhala Machine Translation

Budditha Hettige, Department of Statistics and Computer Science, Faculty of Applied Sciences, University of Sri Jayewardenepura
Asoka S. Karunananda, Faculty of Information Technology, University of Moratuwa, Sri Lanka

Overview
- What is machine translation?
- Problems in machine translation
- Machine transliteration
- Sinhala and English languages
- Existing approaches and methods
- Proposed approach: design and modules
- Conclusion and further work
- Demonstration

What is Machine Translation?
Machine translation (MT) is the process of automatically translating text from one natural language into another.

Machine Translation Process
1. Source language analysis: analyzes the source language sentence, using the source language dictionary.
2. Translation: uses the bilingual dictionary.
3. Target language generation: produces the target language sentence, using the target language dictionary.

Source language analysis
- Morphological analysis: the source language morphological analyzer analyzes the given sentence word by word and returns morphological information for each word.
- Syntax analysis: the source language parser identifies the syntactic structure of the given source language sentence.

Translation
- The translator translates source language words into the target language.

Target language generation
- Morphological generation: the target language morphological analyzer/generator generates the appropriate target language words, using the grammatical information.
- Syntax generation: the target language parser generates the sentence in the target language.

Problems in Machine Translation
- Out-of-vocabulary words: words that are not in the dictionary
- Proper noun translation, e.g. Mahinda Rajapaksha
- Handling technical terms, e.g. Pentium IV Processor
- Multiword expressions, e.g. oil cake (කැවුම්)
- Semantic and pragmatic issues

What is Machine Transliteration?
Machine transliteration is a method for automatically converting words in one language into phonetically equivalent words in another language. For example, the English word machine is transliterated into Sinhala as මැසින්.

Why Machine Transliteration?
Machine transliteration can be used to:
- Solve the out-of-vocabulary problem
- Translate proper nouns
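The fallback idea can be sketched as follows. This is a minimal illustration only: the function name `translate_word`, the toy dictionary and the stub transliterator are hypothetical and are not taken from the system described in these slides.

```python
def translate_word(word, bilingual_dict, transliterate):
    """Return the dictionary translation if one exists, else transliterate."""
    entry = bilingual_dict.get(word.lower())
    if entry is not None:
        return entry
    # Out-of-vocabulary word (e.g. a proper noun): fall back to transliteration.
    return transliterate(word)


# Toy bilingual dictionary holding the multiword example from the slides.
toy_dict = {"oil cake": "කැවුම්"}


def stub_transliterate(word):
    """Placeholder standing in for a real transliteration module."""
    return "<translit:" + word + ">"


in_vocabulary = translate_word("Oil cake", toy_dict, stub_transliterate)
out_of_vocabulary = translate_word("Pentium", toy_dict, stub_transliterate)
```

Here `in_vocabulary` comes from the dictionary, while `out_of_vocabulary` is handed to the transliterator because the dictionary has no entry for it.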

Design: English to Sinhala Machine Translation System
- Input: English sentence
- Modules: English morphological analyzer, English parser, transliteration, translator, intermediate editor, Sinhala morphological analyzer, Sinhala parser
- Resources: English dictionary, bilingual dictionary, Sinhala dictionary
- Output: Sinhala sentence

Transliteration Approaches
- Grapheme-based transliteration: direct orthographical mapping from source graphemes to target graphemes
- Phoneme-based transliteration: based on the pronunciation (source phonemes) rather than the spelling (source graphemes)
- Hybrid and correspondence-based transliteration: combine the two approaches above

Types of Transliteration
- Forward transliteration: transliteration of a name from its native script to a foreign one
- Backward transliteration: restoration of a previously transliterated name to its native script

English and Sinhala Languages
- English: the alphabet contains 26 letters, including 5 vowels.
- Sinhala: the alphabet consists of 61 letters, comprising 18 vowels, 41 consonants and 2 semiconsonants. These represent 40 sounds: 14 vowel sounds and 26 consonant sounds.

Phonetic Relation between English and Sinhala
- The two languages are fundamentally different from each other.
- There are no strokes in the English language.
- Spoken and written English are equivalent, but written and spoken Sinhala differ.
- Diphthongs are not used in written Sinhala.

Disambiguation
- The two English sounds ʌ and ə are represented by one Sinhala letter, a (අ).
- English has two sounds, ɪ and i in the International Phonetic Alphabet (IPA), but Sinhala uses one letter, e (ඉ), for both.
- No diphthongs are used in Sinhala, so these sounds are difficult to represent.
- The two sounds v and w are represented by one Sinhala letter, w (ව).
- The English letters q, x and z have no direct sound in Sinhala.
- The large number of irregular word pronunciations is also difficult to handle.

Available Approaches
Dictionary writers have used a number of methods for English to Sinhala transliteration:
- Phonetic-based transliteration, based on International Phonetic Alphabet (IPA) sounds
- Non-phonetic-based transliteration, based on letters

Transliteration Approaches
English      Malalasekara     Rathna              Godage
Aback        D nela           w[d]nela            tnela
Binocular    nb fkdlahq,ad    Ìfk[d]lHq,[¾]       nhsfkdlahq,¾
Quota        laõdwüd          lafjdag             lafjdagd
Volcano      fjd,a flbkadw    fj[d],,aflafkda     fjd,aflafkda
Xenophobia   fizkad*adwì      fi[z]k[d]f*daìh     fifkdaf*daìwd
Zero         iazbd¾dw         [z]isfrda           isfrda

Proposed Approach to English to Sinhala Transliteration
- Letter-based transliteration approach
- Uses a finite state automaton (FSA)
- Two types of transliteration models are developed:
  - Type 1: original English text, e.g. Computer
  - Type 2: Sinhala words written using English letters, e.g. Ambepussa

English to Sinhala Transliteration for Original English Text (Type 1)
- Letter-based transliteration approach
- Uses a finite state automaton (FSA)

IPA Chart for English Vowels
IPA   English   Sinhala              Examples
ɑː    a         ආ                    father
ɪ     i         ඉ                    sit
ɪ     y         ඉ                    city
iː    ee        ඊ                    see
ɛ     e         එ                    bed
ɜː    ir        ඒ                    bird
æ     a         ඇ                    lad, cat, ran
ʌ     u, ou     අ (විවෘත, open)      run, enough
ɒ     o, a      ඔ                    not, wasp
ɔː    aw, au    ඕ                    law, caught
ʊ     u, oo     උ                    put, wood
uː    oo, ou    ඌ                    soon, through
ə     a         අ (සංවෘත, closed)    about
ə     er        අ (සංවෘත, closed)    winner

IPA Chart for English Consonants
IPA   English        Sinhala   Examples
p     p              ප්        pen, spin, tip
b     b              බ්        but, web
t     t              ට්        two, sting, bet
d     d              ඩ්        do, odd
tʃ    ch, t          ච්        chair, nature, teach
dʒ    g, j, dge      ජ්        gin, joy, edge
k     c, k, q, ck    ක්        cat, kill, skin, queen, thick
ɡ     g              ග්        go, get, beg
f     f, gh          ෆ්        fool, enough, leaf
v     v, ve          ව්        voice, have
θ     th             ත්        thing, teeth
ð     th, the        ද්        this, breathe, father
s     s, c, ss       ස         see, city, pass
z     z, se          ස         zoo, rose
ʒ     s, ge          ස         pleasure, beige
h     h              හ්        ham
m     m              ම්        man, ham
n     n              න්        no, tin
ŋ     ng             ං         singer, ring
l     l, ll          ල         left, bell
ɹ     r              ර         run, very
w     w              ව         we
j     y              ය         yes
ʍ     wh             ව         what

FST for Type 1 Transliteration
[Figure: state diagram of the finite-state transducer for Type 1 transliteration. The states include vowel states V1 to V4 and consonant states C1 to C8, D1 and D2. In the diagram, l0 = {b, c, d, f, g, h, j, k, l, m, n, p, q, r, s, t, v, w, x, y, z}.]
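The sketch below illustrates how a finite-state transducer can resolve ambiguous English letters by looking at the next letter. It is not the transducer from the slides: the three states (`START`, `C`, `T`), the rules for c and t, and the function names are assumptions chosen for illustration, based only on the spelling column of the consonant chart (ch for tʃ, th for θ, c or ck for k, c for s as in city). The Sinhala forms in `SINHALA` are likewise assumed readings of that chart. All other letters are passed through unchanged.

```python
# Assumed Sinhala forms for the phonemes this sketch can emit.
SINHALA = {"k": "ක්", "s": "ස", "tʃ": "ච්", "t": "ට්", "θ": "ත්"}


def transduce(word):
    """Map English letters to phoneme tokens using a three-state machine."""
    out = []
    state = "START"
    for ch in word.lower() + "#":  # '#' is an end marker that flushes pending states
        if state == "C":  # a 'c' has been read and is waiting for context
            if ch == "h":
                out.append("tʃ")
                state = "START"
                continue
            if ch == "k":  # 'ck' is a single k sound
                out.append("k")
                state = "START"
                continue
            out.append("s" if ch in "eiy" else "k")
            state = "START"  # fall through and process ch normally
        elif state == "T":  # a 't' has been read and is waiting for context
            if ch == "h":
                out.append("θ")
                state = "START"
                continue
            out.append("t")
            state = "START"
        if ch == "c":
            state = "C"
        elif ch == "t":
            state = "T"
        elif ch != "#":
            out.append(ch)  # letters outside this sketch pass through
    return out


def to_sinhala(tokens):
    """Replace known phoneme tokens with their assumed Sinhala forms."""
    return [SINHALA.get(tok, tok) for tok in tokens]
```

With these rules, `transduce("chat")` yields the tokens tʃ, a, t, while `transduce("city")` yields s, i, t, y, so the same letter c produces different outputs depending on the letter that follows it.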

English to Sinhala Transliteration for Sinhala Words Written Using English Letters (Type 2)
- Letter-based transliteration approach
- Uses a finite state automaton (FSA)

Sinhala Transliteration Alphabet for Type 2
Each entry gives the Sinhala letter followed by its English spelling.
Vowels:
අ a, ආ aa, ඇ ae, ඈ aee, ඉ i, ඊ ii, උ u, ඌ uu, ඍ Ị, ඎ Ị, ඏ ŗ, ඐ ŗ, එ e, ඒ ee, ඓ ai, ඔ o, ඕ oo, ඖ au
Consonants:
ක ka, ඛ kha, ග ga, ඝ gha, ඞ nga, ඟ nnga
ච ca, ඡ cha, ජ ja, ඣ jha, ඤ nya, ඥ jnya, ඦ ndja
ට tta, ඨ ttha, ඩ daa, ඪ daha, ණ nna, ඬ nnda
ත ta, ථ tha, ද da, ධ dha, න na, ඳ nda
ප pa, ඵ pha, බ ba, භ bha, ම ma, ඹ mba
ය ya, ර ra, ල la, ව va
ශ sha, ෂ ssa, ස sa, හ ha, ළ lla, ෆ fa

FST for Type 2 Transliteration
[Figure: state diagram of the finite-state transducer for Type 2 transliteration. Vowel label sets: L1 = {a, e, i, o, u, Ǐ, ŕ}, L2 = {a, e, i}. Consonant label sets: L1 = {k, g, c, j, t, d, b, m, y, r, f, v, s, h, l, n, p}, L2 = {k, g, c, j, t, d, b, s, p}.]
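A Type 2 conversion can be approximated with a greedy longest-match scan over the romanized word. The sketch below is an assumption-laden illustration and not the authors' transducer. It covers only a subset of the Type 2 alphabet and derives consonant keys by dropping the final a of the spellings (ka becomes k). The dependent vowel signs and the hal mark (U+0DCA) come from the Unicode Sinhala block, not from the slides. All names (`VOWELS`, `CONSONANTS`, `transliterate`) are hypothetical.

```python
# Romanized vowel -> (independent letter, dependent vowel sign).
VOWELS = {
    "aa": ("ආ", "\u0dcf"), "ae": ("ඇ", "\u0dd0"), "ii": ("ඊ", "\u0dd3"),
    "uu": ("ඌ", "\u0dd6"), "ee": ("ඒ", "\u0dda"), "oo": ("ඕ", "\u0ddd"),
    "a": ("අ", ""),  # inherent vowel: a consonant followed by 'a' takes no sign
    "i": ("ඉ", "\u0dd2"), "u": ("උ", "\u0dd4"),
    "e": ("එ", "\u0dd9"), "o": ("ඔ", "\u0ddc"),
}

# Subset of consonants; keys are the spellings without the final 'a'.
CONSONANTS = {
    "k": "ක", "g": "ග", "t": "ත", "d": "ද", "n": "න", "p": "ප", "b": "බ",
    "m": "ම", "mb": "ඹ", "y": "ය", "r": "ර", "l": "ල", "v": "ව",
    "s": "ස", "h": "හ",
}

HAL = "\u0dca"  # marks a consonant that has no following vowel


def longest_match(text, pos, table):
    """Return the longest key of `table` starting at `pos`, or None."""
    for length in (2, 1):
        key = text[pos:pos + length]
        if key in table:
            return key
    return None


def transliterate(word):
    """Convert a romanized Sinhala word to Sinhala script (sketch only)."""
    text = word.lower()
    out = []
    pos = 0
    while pos < len(text):
        cons = longest_match(text, pos, CONSONANTS)
        if cons:
            pos += len(cons)
            vowel = longest_match(text, pos, VOWELS)
            if vowel:
                out.append(CONSONANTS[cons] + VOWELS[vowel][1])
                pos += len(vowel)
            else:
                out.append(CONSONANTS[cons] + HAL)
            continue
        vowel = longest_match(text, pos, VOWELS)
        if vowel:
            out.append(VOWELS[vowel][0])  # vowel not attached to a consonant
            pos += len(vowel)
        else:
            out.append(text[pos])  # unknown character: copy unchanged
            pos += 1
    return "".join(out)
```

For the example word Ambepussa, the scan reads a, mb + e, p + u, s (no vowel, so hal is added) and s + a. A full system would need the remaining letters and rules for ambiguous spellings such as ss, which the Type 2 alphabet also uses for ෂ.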

Approach in Practice

Demonstration

Conclusion
- Handling the pronunciation of English words is a critical problem in English to Sinhala transliteration.
- The English letter a represents different sounds in Sinhala, such as අ and ඇ, as in ago (අගෝ), America (ඇමෙරිකා) and ant (ඇන්ට්).
- The same letters can be pronounced differently in different English words: father and fathom have different pronunciations for fath.

Further Work
- Incorporate English IPA into the system
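To illustrate why IPA could help, the sketch below keys the vowel choice on pronunciation instead of spelling. It is only an illustration: the tiny pronunciation lexicon `FIRST_VOWEL_IPA` and the function `first_vowel_sinhala` are hypothetical, and the IPA to Sinhala pairs are assumed readings of three rows of the vowel chart.

```python
# Assumed readings of three rows of the vowel chart.
IPA_VOWEL_TO_SINHALA = {"ɑː": "ආ", "æ": "ඇ", "ə": "අ"}

# Hypothetical mini lexicon: IPA of the first vowel of each word.
FIRST_VOWEL_IPA = {"father": "ɑː", "fathom": "æ", "ago": "ə"}


def first_vowel_sinhala(word):
    """Return the Sinhala vowel for the word's first vowel sound, if known."""
    ipa = FIRST_VOWEL_IPA.get(word.lower())
    if ipa is None:
        return None  # word is not in the pronunciation lexicon
    return IPA_VOWEL_TO_SINHALA.get(ipa)
```

Although father and fathom share the spelling fath, the lookup returns different Sinhala vowels for them, which a purely letter-based rule cannot do without extra context.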

Thank you!