GETTING STARTED WITH DIRECT MT. Milind Ganjoo


Outline
- Direct MT approach
- Adding transfer rules
- Analyzing word alignments
- Examples and inferences

Step 1: Dictionary translation
- One foreign word → one (possibly more) English dictionary word(s), e.g. rouge → red, aux → to the
- Good starting point: http://www.wordreference.com/
- Problems with this approach?
  - Context: e.g. "été" means "summer" when used by itself, but forms the past tense of "to be" when used as "a été". Ignore context for this exercise; pick the most common independent meaning.
  - Tokenization: e.g. "Qu'est-ce que tu fais?" (French). It's OK to normalize morphemes: Que, est, ce, que, tu, fais
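A minimal Python sketch of this step, assuming a tiny hand-built dictionary. The names `DICTIONARY`, `ELISIONS`, `tokenize` and `translate_words` are hypothetical and not part of the original slides. It splits a sentence into morphemes, restores elided forms such as "qu'" to "que", and looks up each token on its own, ignoring context.

```python
import re

# Toy dictionary (hypothetical entries for illustration; a real one would be
# compiled by hand from a source such as WordReference).
DICTIONARY = {
    "elle": "she", "aime": "loves", "la": "the", "robe": "dress",
    "rouge": "red", "aux": "to the", "été": "summer",
}

# Elided forms mapped back to their full morphemes (assumed, incomplete list).
ELISIONS = {"qu": "que"}


def tokenize(sentence):
    """Lowercase, split on anything that is not a letter, and undo elisions."""
    tokens = re.findall(r"[^\W\d_]+", sentence.lower())
    return [ELISIONS.get(t, t) for t in tokens]


def translate_words(tokens):
    """Look each token up independently; unknown words pass through unchanged."""
    return [DICTIONARY.get(t, t) for t in tokens]


print(tokenize("Qu'est-ce que tu fais?"))
# ['que', 'est', 'ce', 'que', 'tu', 'fais']
print(translate_words(tokenize("Elle aime la robe rouge.")))
# ['she', 'loves', 'the', 'dress', 'red']
```

Because "été" has a single dictionary entry here, it always comes out as "summer", which is exactly the context problem noted above.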


Step 2: Reordering
- Use simple POS-based rules
- Rules should be general enough, e.g. noun adjective → adjective noun
- POS tags must be on individual words, bereft of context
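One way such a rule could be applied, sketched in Python. The `POS` table, the Penn-style tag names and the `reorder` function are assumptions made for illustration. Tags are attached to individual dictionary words, with no context, and a single left-to-right pass swaps any adjacent pair that matches a rule.

```python
# Context-free POS tags for English dictionary words (hypothetical toy table).
POS = {"she": "PRP", "loves": "VB", "the": "DT", "dress": "NN", "red": "JJ"}

# Adjacent tag pairs that should be swapped.
SWAP_RULES = {("NN", "JJ")}  # noun adjective -> adjective noun


def reorder(words):
    """Apply the swap rules in one left-to-right pass over the word list."""
    out = list(words)
    i = 0
    while i < len(out) - 1:
        pair = (POS.get(out[i]), POS.get(out[i + 1]))
        if pair in SWAP_RULES:
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2  # skip past the pair just swapped
        else:
            i += 1
    return out


print(reorder(["she", "loves", "the", "dress", "red"]))
# ['she', 'loves', 'the', 'red', 'dress']
```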

Let's work through an example!
- French: Elle aime la robe rouge.
- Find the direct translation
- English: She loves the red dress.
- Google Translate: She loves red dress.

Example 2
- French: Qu'est-ce que vous faites?
- English: What do you do? (crude translation)
- Find the direct translation
- It's OK to normalize morphemes: Que, est, ce, que, vous, faites

Example 3
- French: La lettre a été envoyée le Mardi
- Direct translation: The letter went was summer sent the Tuesday
- Actual: The letter was sent on Tuesday
- What does Google Translate say?
- Context is important: e.g. "été" means "summer" when used by itself, but forms the past tense of "to be" when used as "a été". Ignore context for this exercise; pick the most common independent meaning.

Useful analysis tool: word alignment

  i      0    1     2    3        4
  f[i]   il   aime  la   musique  classique
  a[i]   0    1     -1   3        2

  e:     he (0)  likes (1)  classical (2)  music (3)

Picaro: alignment visualization tool
- http://www.isi.edu/~riesa/software/picaro/
- Try the demo: Web version
- Use English as the e language, and your chosen language as f
- Alignment input format: f-e

Picaro: alignment visualization tool
Example
- French: Il aime la musique classique
- English: He loves classical music
- What would the alignment format be?
- Il → he, aime → loves, la → (), musique → music, classique → classical
- Alignment: 0-0 1-1 3-3 4-2
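The f-e string above can be generated from an alignment array. This is a small Python sketch; `alignment_to_pairs` is a hypothetical helper, and the array is assumed to be indexed by foreign word position, with -1 marking a word that has no English counterpart.

```python
def alignment_to_pairs(a):
    """Turn a[i] (English index for foreign word i, -1 if unaligned)
    into a space-separated string of f-e index pairs."""
    return " ".join(f"{f}-{e}" for f, e in enumerate(a) if e >= 0)


# il aime la musique classique -> he loves classical music
print(alignment_to_pairs([0, 1, -1, 3, 2]))
# 0-0 1-1 3-3 4-2
```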

Outputting alignment from your code
- Elle aime la robe rouge
- She loves the dress red
- NP ADJP → ADJP NP
- Swap alignments
- She loves the red dress

Outputting alignment from your code

    // translate
    for each i:
        english_trans[i] = lookup_dictionary(french[i])
        alignment[i] = i

    // reordering
    // NN JJ -> JJ NN
    for each i (except the last):
        if english_trans[i] is noun and english_trans[i+1] is adjective:
            swap(english_trans[i], english_trans[i+1])
            swap(alignment[i], alignment[i+1])
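The pseudocode could be made runnable roughly as follows. This is a Python sketch under stated assumptions: the dictionary and POS table are toy data, and `translate_with_alignment` and `format_alignment` are hypothetical names. Here `alignment[j]` holds the French index of the word now at English position `j`, so each f-e pair is written as `alignment[j]-j`.

```python
# Toy data for the example sentence (hypothetical; not a real lexicon).
DICTIONARY = {"elle": "she", "aime": "loves", "la": "the",
              "robe": "dress", "rouge": "red"}
POS = {"she": "PRP", "loves": "VB", "the": "DT", "dress": "NN", "red": "JJ"}


def translate_with_alignment(french):
    # translate: word-for-word lookup, identity alignment to start with
    english = [DICTIONARY[w] for w in french]
    alignment = list(range(len(french)))

    # reordering: NN JJ -> JJ NN, swapping words and alignments together
    i = 0
    while i < len(english) - 1:
        if POS[english[i]] == "NN" and POS[english[i + 1]] == "JJ":
            english[i], english[i + 1] = english[i + 1], english[i]
            alignment[i], alignment[i + 1] = alignment[i + 1], alignment[i]
            i += 2
        else:
            i += 1
    return english, alignment


def format_alignment(alignment):
    """Emit f-e pairs: French index first, English position second."""
    return " ".join(f"{f}-{e}" for e, f in enumerate(alignment))


words, align = translate_with_alignment(["elle", "aime", "la", "robe", "rouge"])
print(" ".join(words))          # she loves the red dress
print(format_alignment(align))  # 0-0 1-1 2-2 4-3 3-4
```

Swapping the alignment entries in the same step as the words keeps the two lists in sync, so no separate bookkeeping is needed after reordering.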

Let's try another language: Hindi!
- One of the 22 official languages of India
- Regional dialect/vernacular developed in the first millennium
- Absorbed a large number of Persian, Arabic and Turkish words (and English words later)
- Standardized by the Government of India in 1958

A simple sentence!
मैं दुकान से दूध खरीदने जा रहा हूँ
- Direct translation: I store from milk buy go is am
- Actual translation: I am going to buy milk from the store
- Google Translate: I'm going to buy milk from the store
Language divergences:
- SOV word order
- Postpositions: "from" comes after the noun "store"
- Tenses for verbs require multiple auxiliary verbs

Other simple examples
दुनिया में दो प्रकार के लोग होते हैं
- Direct translation: World in two type of people are are
- Actual translation: There are two types of people in the world
- Google Translate: There are two types of people in the world
- Information about singular/plural nouns is not directly presented; it is determined from verb agreement

Other simple examples
वित्त मंत्री ने आज नए बजट का ऐलान किया है
- Direct translation: Finance minister has today new budget of announcement made is
- Actual translation: The Finance minister announced the new budget today
- Google Translate: Finance Minister has announced today a new budget
- Hindi does not contain articles (a/an/the); specificity is determined from context

Things don't always work well
उसने अपने बच्चों को अपना घर और अपनी सारी दौलत दे दी
- Direct translation: He his children to his house and his all wealth give gave
- Actual translation: He gave his children his house and all his wealth (could be she/her as well)
- Google Translate: He gave all your money to your children and your home.
- Can't determine gender since we aren't using

- Try to obtain as much grammatical diversity as possible
- See slides on Language Divergences for