From Machine Translation to Computer Assisted Translation using Finite-State Models

Jorge Civera, Elsa Cubel, Antonio L. Lagarda, David Picó, Jorge González, Enrique Vidal, Francisco Casacuberta
Instituto Tecnológico de Informática, Dpto. de Sistemas Informáticos y Computación, Universidad Politécnica de Valencia, E-46071 Valencia, Spain
jorcisai@iti.upv.es

Juan M. Vilar, Sergio Barrachina
Dpto. de Lenguajes y Sistemas Informáticos, Universidad Jaime I, E-12071 Castellón de la Plana, Spain
jvilar@lsi.uji.es

Abstract

State-of-the-art machine translation techniques are still far from producing high quality translations. This drawback leads us to introduce an alternative approach to the translation problem that brings human expertise into the machine translation scenario. In this framework, namely Computer Assisted Translation (CAT), human translators interact with a translation system, as an assistance tool, that dynamically offers a list of translations that best complete the part of the sentence already translated. In this paper, finite-state transducers are presented as a candidate technology in the CAT paradigm. The appropriateness of this technique is evaluated on a printer manual corpus, and results from preliminary experiments confirm that human translators would reduce to less than 25% the amount of work to be done for the same task.

1 Introduction

State-of-the-art machine translation techniques are still far from producing high quality translations. This drawback leads us to introduce an alternative approach to the translation problem that brings human expertise into the machine translation scenario. Langlais et al. (2000) proposed this idea, which can be illustrated as follows. Initially, the human translator is provided with a possible translation for the sentence to be translated. Unfortunately, in most cases this translation is not perfect, so the translator amends it and asks for a translation of the part of the sentence still to be translated (completion). This latter interaction is repeated as many times as needed until the final translation is achieved.
The scenario described in the previous paragraph can be seen as an iterative refinement of the translations offered by the translation system, which, without possessing the desired quality, help the translator to increase his/her productivity. Nowadays, this lack of translation excellence is a common characteristic of all machine translation systems. Therefore, the human-machine synergy represented by the CAT paradigm seems to be more promising than fully-automatic translation in the near future.

The CAT paradigm has two important aspects: the models need to provide adequate completions, and they have to do so efficiently to perform under usability constraints. To fulfill these two requirements, Stochastic Finite-State Transducers (SFST) have been selected, since they have proved in the past to be able to provide adequate translations (Vidal, 1997; Knight and Al-Onaizan, 1998; Amengual et al., 2000; Casacuberta et al., 2001; Bangalore and Riccardi, 2001). In addition, efficient parsing algorithms can be easily adapted in order to provide completions.

The rest of the paper is structured as follows. The following section introduces the general setting for machine translation and finite-state models. In section 3, the search procedure for an interactive translation is presented. Experimental results are presented in section 4. Finally, some conclusions and future work are explained in section 5.

2 Machine translation with finite-state transducers

Given a source sentence $s$, the goal of MT is to find a target sentence $\hat{t}$ that maximizes:

$$\hat{t} = \arg\max_{t} \Pr(t \mid s) = \arg\max_{t} \Pr(t, s) \qquad (1)$$

The joint distribution $\Pr(t, s)$ can be modeled by a Stochastic Finite-State Transducer $T$ (Picó and Casacuberta, 2001):

$$\hat{t} = \arg\max_{t} \Pr(t, s) \approx \arg\max_{t} \Pr_T(t, s) \qquad (2)$$

A Stochastic Finite-State Transducer (SFST) is a finite-state network whose transitions are labeled by three items:

1. a source symbol (a word from the source language vocabulary);
2. a target string (a sequence of words from the target language vocabulary); and
3. a transition probability.

SFSTs have been successfully applied to many translation tasks (Vidal, 1997; Amengual et al., 2000; Casacuberta et al., 2001). Furthermore, there exist efficient search algorithms such as Viterbi (Viterbi, 1967) for the best path and the Recursive Enumeration Algorithm (REA) (Jiménez and Marzal, 1999) for the n-best paths.

One possible way of inferring SFSTs is the Grammatical Inference and Alignments for Transducer Inference (GIATI) technique (the previous name of this technique was MGTI, Morphic-Generator Transducer Inference) (Casacuberta et al., 2004). Given a finite sample of string pairs, it works in three steps:

1. Building training strings. Each training pair is transformed into a single string from an extended alphabet to obtain a new sample of strings. The extended alphabet contains words or substrings from source and target sentences coming from training pairs.
2. Inferring a (stochastic) regular grammar. Typically, a smoothed n-gram is inferred from the sample of strings obtained in the previous step.
3. Transforming the inferred regular grammar into a transducer. The symbols associated with the grammar rules are transformed into source/target symbols by applying an adequate transformation, thereby transforming the grammar inferred in the previous step into a transducer.

The transformation of a parallel corpus into a corpus of single sentences is performed with the help of statistical alignments: each word is joined with its translation in the output sentence, creating an extended word. This joining is done taking care not to invert the order of the output words. The third step is trivial with this arrangement.
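The construction of extended words in the first step can be illustrated with a small sketch. It is a simplification and not the GIATI implementation: the function name `extended_string`, the alignment format (one source index per target word) and the rule of attaching a target word to the latest source position already used are assumptions made here so that the target order is never inverted.

```python
def extended_string(source, target, alignment):
    """Join each target word to a source word without reordering the target.

    alignment[j] is the index of the source word aligned to target word j.
    A target word aligned to an earlier source position than its predecessor
    is attached to the predecessor's position instead (assumed rule).
    """
    attached = [[] for _ in source]
    last = 0
    for j, word in enumerate(target):
        last = max(last, alignment[j])
        attached[last].append(word)
    return [(s, tuple(words)) for s, words in zip(source, attached)]


# Monotone alignment: one extended word per source word.
print(extended_string(["cargue", "el", "papel"],
                      ["load", "the", "paper"], [0, 1, 2]))
# Crossing alignment (hypothetical pair): "new" and "printer" are both
# attached to "nueva" so that the target order is preserved.
print(extended_string(["una", "impresora", "nueva"],
                      ["a", "new", "printer"], [0, 2, 1]))
```

Reading the target parts of the extended words from left to right always gives back the original target sentence, which is what makes the conversion from grammar to transducer straightforward.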
In our experiments, the alignments are obtained using the GIZA software (Och and Ney, 2000; Al-Onaizan et al., 1999), which implements IBM statistical models (Brown et al., 1993).

3 Interactive search

The concept of interactive search is closely related to the CAT paradigm. This paradigm introduces the new factor $t_p$ into the general machine translation equation (Equation 1). $t_p$ represents a prefix in the target language obtained as a result of the interaction between the human translator and the machine translation system. As a side effect of this reformulation, the optimization defined in Equation 3 is performed over the set of target suffixes rather than the set of complete target sentences. Hence, the goal of CAT in the finite-state transducer framework is to find a prediction of the best suffix $\hat{t}_s$, given a source sentence $s$, a prefix of the target sentence $t_p$ and an SFST $T$:

$$\hat{t}_s = \arg\max_{t_s} \Pr(t_s \mid s, t_p) = \arg\max_{t_s} \Pr(t_p t_s \mid s) \approx \arg\max_{t_s} \Pr_T(t_p t_s, s) \qquad (3)$$

A transducer can be understood as a weighted graph in which every path is a possible source-target sentence pair represented in a compact manner. Given a source sentence $s$ to be translated, this sentence is initially employed to define a set of paths in the transducer whose sequence of source symbols is compatible with the source sentence. Equation 3 just defines the most probable path (target suffix $\hat{t}_s$) among those that are compatible, having $t_p$ as a target prefix.
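The optimization in Equation 3 can be sketched on a toy graph of compatible paths. The sketch below assumes that the prefix is present in the graph exactly, that states are integers with every edge going from a lower to a higher state id, and that each edge carries a list of target words and a probability. The function name `best_completion` and all edge probabilities are hypothetical.

```python
import math


def best_completion(edges, start, final, prefix):
    """Most probable path whose target words begin with `prefix`.

    edges: list of (src, dst, words, prob). Returns (suffix_words, path_prob),
    or None when no path is compatible with the prefix.
    """
    prefix = list(prefix)
    # best[(state, k)] = (log-prob, words so far); k = prefix words matched
    best = {(start, 0): (0.0, [])}
    for src, dst, words, prob in sorted(edges, key=lambda e: e[0]):
        for k in range(len(prefix) + 1):
            if (src, k) not in best:
                continue
            score, emitted = best[(src, k)]
            new_words = emitted + list(words)
            new_k = min(len(new_words), len(prefix))
            if new_words[:new_k] != prefix[:new_k]:
                continue  # this edge contradicts the accepted prefix
            cand = (score + math.log(prob), new_words)
            key = (dst, new_k)
            if key not in best or cand[0] > best[key][0]:
                best[key] = cand
    hit = best.get((final, len(prefix)))
    if hit is None:
        return None
    return hit[1][len(prefix):], math.exp(hit[0])


# Toy graph for the source sentence "cargue el papel" (made-up probabilities).
edges = [
    (0, 1, ["load"], 0.28),
    (1, 2, ["the"], 0.25),
    (1, 2, [], 0.06),          # edge with an empty target string
    (2, 3, ["paper"], 0.4),
    (2, 3, ["stock"], 0.02),
    (3, 4, ["."], 1.0),
]
print(best_completion(edges, 0, 4, []))                 # plain best path
print(best_completion(edges, 0, 4, ["load", "paper"]))  # prefix-constrained
print(best_completion(edges, 0, 4, ["print"]))          # no compatible path
```

With an empty prefix the function reduces to an ordinary best-path search. The last call returns `None`, which is the situation section 3.2 addresses by allowing edit operations on the prefix.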
Figure 1: Resultant word graph given the source sentence "cargue el papel".

The search for this path (the one for which the product of the probabilities associated with its edges is maximum) is performed by Viterbi decoding over the set of paths that are compatible with the source sentence. The concatenation of the target symbols of this best path yields the target sentence (translation).

The solution to the search problem has been devised in two phases. The first one copes with the extraction of a word graph from an SFST $T$ given a source sentence $s$. A word graph represents the set of paths whose sequence of source symbols is compatible with the source sentence $s$. The second phase involves the search for the best translation over the word graph. To be more precise, in the present work the concept of best translation has been extended to a set of best translations (n-best translations). This search can be carried out efficiently taking into account not only the a posteriori probability of a given translation, but also the minimum edit cost with respect to the target prefix. The way in which this latter criterion is integrated in the search process is explained in section 3.2.

3.1 Word-graph derivation

A word graph represents the set of all possible translations for a given source sentence $s$ that are embedded in the SFST $T$. The derivation of the word graph is performed by intersecting the SFST $T$ with the source sentence $s$, defining a subgraph in $T$ whose paths are compatible with the source sentence. Interactive search can be simplified significantly by using this representation of the set of target sentences, since the inclusion of edit cost operations along with the search procedure introduces some peculiarities that can be solved efficiently in the word graph. An example of a word graph is shown in Figure 1.
3.2 Search for n-best translations given a prefix of the target sentence

The application of this type of search is aimed at the core of CAT. In this paradigm, given a source sentence $s$, the human translator is provided with a list of n translations, also called n-best translations. Then, the human translator will proceed to accept a prefix of one of these n-best translations as correct, appending some rectifications to the selected prefix. This new prefix of the target sentence $t_p$, together with the source sentence $s$, will generate a new set of best translations that will again be modified by the human translator. This process is repeated as many times as necessary to achieve the desired final translation.

Ideally, the task would be to find the target suffix $t_s$ that maximizes the a posteriori probability given a prefix $t_p$ of the target sentence and the input sentence. In practice, however, it may happen that $t_p$ is not present in the word graph. The solution is to use not $t_p$ but a prefix $t'_p$ that minimizes the edition distance with $t_p$ and is compatible with the word graph. Therefore, the score of a target translation is characterized by two functions: the edit cost between the target prefix $t_p$ and the optimal prefix $t'_p$ found in the word graph, and the a posteriori probability of $t_s$, $\Pr(t_s \mid s, t'_p)$. In order to give more weight to those translations that are closer to the user preferences, the list of n-best translations has been prioritized using two criteria: first, the minimum edit cost, and then the a posteriori probability.

The algorithm proposed to solve this search problem is an adapted version of the Recursive Enumeration Algorithm (REA) described in (Jiménez and Marzal, 1999) that integrates the minimum edit cost algorithm in the search procedure to deal with words, introduced by the user, that are not present in the word graph. This algorithm consists of two parts:

- Forward search, which calculates the 1-best path from the initial state to every state in the word graph. Paths in the word graph are weighted based not only on their a posteriori probability, but also on their edit cost with respect to the target sentence prefix. To this purpose, fictitious edges have been inserted into the word graph to represent edition operations like insertion, substitution and deletion. These edition operations have been included in the word graph in the following way:
  - Insertion: An insertion edge is inserted as a loop at each state in the word graph, with unitary cost.
  - Deletion: A deletion edge is added for each arc in the word graph, having the same source and target state as its sibling arc, with unitary cost.
  - Substitution: Each arc in the word graph is treated as a substitution edge whose edit cost is proportional to the Levenshtein distance between the symbol associated with this arc and the word prefix employed to traverse this arc during the search. This substitution cost is zero when the word prefix matches the symbol in the word graph arc.
- Backward search, which enumerates candidates for the k-best path along the (k-1)-best path. This recursive algorithm defines the next best path that arrives at a given state $q$ as the next best path that reaches $q'$ plus the arc leaving from $q'$ to $q$. If this next best path arriving at state $q'$ has not been calculated yet, then the next best path procedure is called recursively until a 1-best path is found or no best paths are found.

To reduce the computational cost of the search, the beam-search technique (Ney et al., 1992) has been implemented. During the word graph construction, two beam coefficients were employed to penalize those edges leading to backoff states over those arriving at normal states.
Finally, a third beam coefficient controls how far, in terms of number of edition operations, a hypothesis may deviate from the target prefix.

4 Experimental results

4.1 Corpus features

The corpus employed to perform experiments was the Xerox corpus (SchlumbergerSema S.A. et al., 2001). It involves the translation of technical Xerox manuals from English to Spanish, French and German and vice-versa. Some statistics about the data used for training and test purposes are shown in Table 1.

Table 1: Features of the Xerox corpus: training, vocabulary and test sizes measured in thousands of words. SIM: currently used reversible preprocessing. RAW: original corpus without preprocessing. PERPLEXITY: measure of how well a language model describes the test set.

                      EN / ES              EN / DE              EN / FR
                      RAW      SIM         RAW      SIM         RAW      SIM
TRAINING              600/700  600/700     600/500  500/600     600/700  500/400
VOCABULARY            26 / 30  8 / 11      25 / 27  8 / 10      25 / 37  8 / 19
TEST                  8 / 9    8 / 10      9 / 10   11 / 12     11 / 10  12 / 12
PERPLEXITY (3-gram)   107/60   48/33       93/169   51/87       193/135  73/52

4.2 Sample session

A TT2 interactive prototype, which uses the searching techniques presented in the previous sections, has been implemented. The user is able to customize this prototype in different ways: number of suggested translations, length in number of words of these suggestions, etc. In the example below, the number of suggestions is five and the length of these suggestions has not been bounded.

Example 1. This example shows the functionality and the interaction between the TT2 prototype and a translator through a translation instance from English to Spanish for a given sentence drawn from the Xerox corpus. For better understanding of this example, the reference target sentence is given below:

Reference target sentence: Instalación de controladores de impresora y archivos PPD.
Source sentence: Installing the Printer Drivers and PPDs.

Hypothesis 0.0: Instalación del los controladores de impresión y archivos PPD adaptados.
Hypothesis 0.1: Instalación del los controladores
Hypothesis 0.2: Instalación de la los controladores de impresión y archivos PPD adaptados.
Hypothesis 0.3: Instalación de la los controladores
Hypothesis 0.4: Instalación de la esa los controladores
User interaction 0: Hypothesis 0.2 is selected and the cursor is positioned at the beginning of the word "los". Then, the translator would type the character "c", that is, the next character in the reference target sentence.
Prefix 0: Instalación de c

Hypothesis 1.0: Instalación de c los controladores de impresión y archivos PPD adaptaados.
Hypothesis 1.1: Instalación de c los controladores
Hypothesis 1.2: Instalación de c esa los controladores de impresión y archivos PPD adaptaados.
Hypothesis 1.3: Instalación de c esa los controladores
Hypothesis 1.4: Instalación de controladores de impresión y fax y en archivos PPD adaptaados.
User interaction 1: Hypothesis 1.4 is selected and the cursor is positioned between the characters "s" and "i" of the word "impresión". Then, the translator would type the next character in the reference target sentence: "o".
Prefix 1: Instalación de controladores de impreso

Hypothesis 2.0: Instalación de controladores de impresora y archivos PPD adaptados.
Hypothesis 2.1: Instalación de controladores de impresora y ver los archivos PPD.
Hypothesis 2.2: Instalación de controladores de impresora/fax y ver los archivos PPD.
Hypothesis 2.3: Instalación de controladores de impresora/fax y archivos PPD adaptados.
Hypothesis 2.4: Instalación de controladores de impresora y fax de CentreWare y ver los archivos PPD.
User interaction 2: Hypothesis 2.0 is selected and the cursor is positioned at the end of the word "PPD". The translator would just need to add the character ".".
Prefix 2: Instalación de controladores de impresora y archivos PPD.

Hypothesis 3.0: Instalación de controladores de impresora y archivos PPD.
Hypothesis 3.1: Instalación de controladores de impresora y archivos PPD.:
Hypothesis 3.2: Instalación de controladores de impresora y archivos PPD..
Hypothesis 3.3: Instalación de controladores de impresora y archivos PPD...
Hypothesis 3.4: Instalación de controladores de impresora y archivos PPD.:.
User interaction 3: Hypothesis 3.0 is selected and the user accepts the target sentence.

Final hypothesis: Instalación de controladores de impresora y archivos PPD.

4.3 Translation quality evaluation

The assessment of the techniques presented in section 3 has been carried out using four measures:

1. Translation Word Error Rate (TWER): It is defined as the minimum number of word substitution, deletion and insertion operations needed to convert the target sentence provided by the transducer into the reference translation. Also known as edit distance.
2. Character Error Rate (CER): Edit distance in terms of characters between the target sentence provided by the transducer and the reference translation.
3. Key-Stroke Ratio (KSR): Number of keystrokes that are necessary to achieve the reference translation plus the acceptance keystroke, divided by the number of running characters.
4. BiLingual Evaluation Understudy (BLEU) (Papineni et al., 2002): Basically, a function of the k-substrings that appear in the hypothesized target sentence and in the reference target sentence.

These experiments were performed with 3-gram transducers based on the GIATI technique. The leftmost column shows the language pair employed for each experiment: English (En), Spanish (Es), French (Fr) and German (De). The two main central columns compare the results obtained with 1-best translation to 5-best translations. When using 5-best translations, the target sentence out of these five that minimizes the corresponding error measure is selected. The results are shown in Table 2.

The best results were obtained between English and Spanish language pairs, in which the human translator would need to type less than 25% of the total reference sentences. In other words, this would result in a theoretical factor-of-4 increase in the productivity of human translators. In fact, preliminary subjective evaluations have received positive feedback from professional translators when testing the prototype.
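The edit-distance measures defined in this section can be sketched as follows. The helper names `edit_distance` and `error_rate` are hypothetical, and expressing the result as a percentage of the reference length is an assumption of this sketch (the definitions above only give the operation count). The example sentences are Hypothesis 0.0 and the reference from the sample session, with the final period split off as a separate token.

```python
def edit_distance(hyp, ref):
    """Minimum number of substitutions, deletions and insertions
    turning the sequence `hyp` into the sequence `ref`."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, 1):
        cur = [i]
        for j, r in enumerate(ref, 1):
            cur.append(min(prev[j] + 1,              # delete h
                           cur[j - 1] + 1,           # insert r
                           prev[j - 1] + (h != r)))  # substitute or match
        prev = cur
    return prev[-1]


def error_rate(hyp, ref):
    """Edit distance as a percentage of the reference length (assumed)."""
    return 100.0 * edit_distance(hyp, ref) / len(ref)


hyp = "Instalación del los controladores de impresión y archivos PPD adaptados ."
ref = "Instalación de controladores de impresora y archivos PPD ."

# Word level (TWER-style): sequences of tokens.
print(edit_distance(hyp.split(), ref.split()), error_rate(hyp.split(), ref.split()))
# Character level (CER-style): the same function applied to raw strings.
print(edit_distance(hyp, ref), error_rate(hyp, ref))
```

The same dynamic program serves both measures because it only compares sequence elements for equality, whether they are words or characters.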
Table 2: Results for the Xerox corpus comparing 1-best to 5-best translations.

         3-gram (1-best)        3-gram (5-best)
RAW      KSR    CER    TWER     KSR    CER    TWER
En-Es    26.0   29.1   42.3     23.4   24.4   37.2
Es-En    27.4   33.1   50.1     24.1   24.9   42.7
En-Fr    53.7   55.4   77.5     49.3   48.7   70.5
Fr-En    54.0   55.6   74.2     49.9   49.4   68.8
En-De    59.4   61.2   82.4     54.0   54.7   76.6
De-En    52.6   60.3   77.9     48.0   53.4   72.7

Furthermore, in all cases there is a clear and significant improvement in error measures when moving from 1-best to 5-best translations. This gain in translation quality diminishes in a log-wise fashion as the number of best translations increases. However, the number of hypotheses should be limited to the user's capability to skim through the candidate translations and decide which one to select. Table 3 presents the results obtained on a simplified version of the corpus. This simplification consists of tokenization, case normalization and the substitution of numbers, printer codes, etc. by their corresponding category labels.

Table 3: Results for the Xerox corpus comparing 1-best to 5-best translations.

         3-gram (1-best)        3-gram (5-best)
SIM      WER    CER    BLEU     WER    CER    BLEU
En-Es    31.8   24.7   0.67     26.8   20.3   0.71
Es-En    34.3   27.8   0.62     27.0   20.4   0.69
En-Fr    64.2   48.8   0.43     57.2   42.8   0.45
Fr-En    59.2   48.5   0.42     53.6   42.5   0.45
En-De    72.1   55.3   0.32     65.8   49.1   0.35
De-En    64.7   53.9   0.36     58.4   47.7   0.39

Language pairs such as English and French present somewhat higher error rates, as is also the case between English and German, reflecting the complexity of the task faced in these experiments.

5 Conclusions and future work

Finite-state transducers have been successfully applied to CAT. These models can be learnt from parallel corpora. The concept of interactive search has been introduced in this paper along with some efficient techniques (word graph derivation and n-best) that solve the parsing problem given a prefix of the target sentence under real-time constraints. The results show that the 5-best approach clearly improves the quality of the translations with respect to the 1-best approximation.

The promising results achieved in the first experiments open a new field in machine translation still to be explored, in which human expertise is combined with machine translation techniques to increase productivity without sacrificing high-quality translation. Finally, the introduction of morpho-syntactic information or bilingual categories in finite-state transducers are topics that leave an open door to future research, as do some improvements in the search algorithms to reduce the computational cost of finding a path in the word graph with the minimum edit cost.

Acknowledgments

The authors would like to thank all the researchers involved in the TT2 project who have contributed to the development of the methodologies presented in this paper. This work has been supported by the European Union under the IST Programme (IST-2001-32091).

References

Yaser Al-Onaizan, Jan Curin, Michael Jahr, Kevin Knight, John Lafferty, Dan Melamed, Franz J. Och, David Purdy, Noah Smith, and David Yarowsky. 1999. Statistical machine translation: Final report. Workshop on language engineering, Johns Hopkins University, Center for Language and Speech Processing, Baltimore, MD, USA.

Juan C. Amengual, José M. Benedí, Asunción Castaño, Antonio Castellanos, Víctor M. Jiménez, David Llorens, Andrés Marzal, Moisés Pastor, Federico Prat, Enrique Vidal, and Juan M. Vilar. 2000. The EuTrans-I speech translation system. Machine Translation, 15:75-103.

S. Bangalore and G. Riccardi. 2001. A finite-state approach to machine translation. In Second Meeting of the North American Chapter of the Association for Computational Linguistics.

Peter F. Brown, Stephen Della Pietra, Vincent J. Della Pietra, and Robert L. Mercer. 1993. The mathematics of statistical machine translation: Parameter estimation. Computational Linguistics, 19(2):263-312.
Francisco Casacuberta, David Llorens, Carlos Martínez, Sirko Molau, Francisco Nevado, Hermann Ney, Moisés Pastor, David Picó, Alberto Sanchis, Enrique Vidal, and Juan M. Vilar. 2001. Speech-to-speech translation based on finite-state transducers. In International Conference on Acoustic, Speech and Signal Processing, volume 1. IEEE Press, April.

Francisco Casacuberta, Hermann Ney, Franz J. Och, Enrique Vidal, Juan M. Vilar, Sergio Barrachina, Ismael García-Varea, David Llorens, Carlos Martínez, Sirko Molau, Francisco Nevado, Moisés Pastor, David Picó, and Alberto Sanchís. 2004. Some approaches to statistical and finite-state speech-to-speech translation. Computer Speech and Language, 18:25-47.

Víctor M. Jiménez and Andrés Marzal. 1999. Computing the k shortest paths: a new algorithm and an experimental comparison. In J. S. Vitter and C. D. Zaroliagis, editors, Algorithm Engineering, volume 1668 of Lecture Notes in Computer Science, pages 15-29, London, July. Springer-Verlag.

Kevin Knight and Yaser Al-Onaizan. 1998. Translation with finite-state devices. In D. Farwell, L. Gerber, and E. Hovy, editors, Machine Translation and the Information Soup: Third Conference of the Association for Machine Translation in the Americas, volume 1529, pages 421-437, Langhorne, PA, USA, October. AMTA 98.

Philippe Langlais, George Foster, and Guy Lapalme. 2000. Unit completion for a computer-aided translation typing system. Machine Translation, 15(4):267-294.

Hermann Ney, Dieter Mergel, Andreas Noll, and Annedore Paeseler. 1992. Data driven organization for continuous speech recognition. In IEEE Transactions on Signal Processing, volume 40, pages 272-281.

Franz J. Och and Hermann Ney. 2000. Improved statistical alignment models. In ACL00, pages 440-447, Hong Kong, China, October.

Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. Bleu: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pages 311-318, Philadelphia.

David Picó and Francisco Casacuberta. 2001. Some statistical-estimation methods for stochastic finite-state transducers. Machine Learning, 44:121-142, July-August.

SchlumbergerSema S.A., Instituto Tecnológico de Informática, Rheinisch Westfälische Technische Hochschule Aachen Lehrstuhl für Informatik VI, Recherche Appliquée en Linguistique Informatique Laboratory University of Montreal, Celer Soluciones, Société Gamma, and Xerox Research Centre Europe. 2001. TT2. TransType2 - computer assisted translation. Project technical annex.

Enrique Vidal. 1997. Finite-state speech-to-speech translation. In Int. Conf. on Acoustics Speech and Signal Processing (ICASSP-97), proc., Vol.1, pages 111-114, Munich.

Andrew Viterbi. 1967. Error bounds for convolutional codes and an asymptotically optimal decoding algorithm. IEEE Transactions on Information Theory, 13:260-269.