MT Summit IX, New Orleans, Sep. 23-27, 2003. Panel Discussion: HAVE WE FOUND THE HOLY GRAIL? Hermann Ney


MT Summit IX, New Orleans, Sep. 23-27, 2003
Panel Discussion

HAVE WE FOUND THE HOLY GRAIL?

Hermann Ney
Human Language Technology and Pattern Recognition
Lehrstuhl für Informatik VI, Computer Science Department
RWTH Aachen University of Technology
D-52056 Aachen, Germany

Ney: Statistical Speech Translation, © RWTH Aachen, 26-Sep-03

Contents

1 Specific Questions
2 Recent Projects: Speech and Language Translation
3 The Statistical Approach to NLP and MT
4 State of the Art in SMT
5 Answers

1 Specific Questions

1. Have we found the holy grail?
2. Will progress in data-driven MT continue unabated?
3. Has the data-driven paradigm been able to model information that was not present in rule-based systems?
4. Was the metric used to rank participating systems in the NIST competition fair?
5. Is it correct that SMT has indeed surpassed traditional rule-based systems?
6. Are there niche applications for which SMT is well suited?
7. Is there a danger that SMT's recent success gives the impression that MT is a solved problem?
8. Would the NIST evaluation have been different for the language pair English-French?
9. What about rule-based components in today's and future data-driven systems?

2 Recent Projects: Speech and Language Translation

- spoken language translation: joint projects (national, European, international: ATR, C-Star, Verbmobil, Eutrans, Nespole!, Fame, LC-Star, PF-Star, ...):
  - restricted domains: appointment scheduling, conference registration, travelling, tourism information, ...
  - vocabulary size: 3 000 to 10 000 words
  - best performing systems and approaches: data-driven
    - example-based methods
    - finite-state transducers
    - statistical approaches
  - e.g.: Verbmobil evaluation [June 2000]: better by a factor of 2
- written language translation: US Tides project 2001-2004
  - unrestricted domain: press news, vocabulary size = 50 000 words
  - language pairs: Chinese-English, Arabic-English
  - performance [July 2003]: best statistical systems are better than conventional/commercial systems

3 The Statistical Approach to NLP and MT

principles:
- MT and other NLP tasks are complex tasks, for which perfect solutions are difficult (compare: all models in physics are approximations!)
- consequence: use imperfect and vague knowledge and try to minimize the number of decision errors
- statistical decision theory and Bayes decision rule, using probabilistic dependencies between input $x$ and decision $c$:

$$x \rightarrow \hat{c} = \arg\max_{c} \{ pr(c \mid x) \} = \arg\max_{c} \{ pr(c) \cdot pr(x \mid c) \}$$

- resulting concept: NLP = Statistics + (Linguistic?) Modelling
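As an illustration of the Bayes decision rule above, the following is a minimal sketch, not taken from the slides. The classes, the context words and all probability values are made-up toy numbers, and the function name `bayes_decide` is hypothetical. It simply picks the class c that maximizes pr(c) · pr(x|c), working in log space.

```python
import math

def bayes_decide(x, priors, likelihoods, floor=1e-12):
    """Return argmax_c pr(c) * pr(x|c), computed as a sum of logs.

    `floor` is an assumed tiny probability for unseen (x, c) pairs so that
    log() stays defined; it is an illustration detail, not a smoothing method.
    """
    def score(c):
        return math.log(priors[c]) + math.log(likelihoods[c].get(x, floor))
    return max(priors, key=score)

# Hypothetical toy distributions: the decision c is a word sense,
# the observation x is a single context word.
priors = {"bank/finance": 0.7, "bank/river": 0.3}
likelihoods = {
    "bank/finance": {"money": 0.50, "loan": 0.45, "water": 0.05},
    "bank/river": {"money": 0.05, "loan": 0.05, "water": 0.90},
}

decision_money = bayes_decide("money", priors, likelihoods)
decision_water = bayes_decide("water", priors, likelihoods)
print(decision_money, decision_water)
```

Note that the prior alone favours one class; the observation can still overturn it, which is the point of combining both factors in one decision.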

The Statistical Approach: Key Components

- decision rule: requires maximization (sometimes hard!) and the probability distribution $pr(c \mid x)$, which is unknown
- probability model: $p_\theta(c \mid x)$ or $p_\theta(c) \cdot p_\theta(x \mid c)$ is used to replace $pr(c \mid x)$ or $pr(c) \cdot pr(x \mid c)$
- training criterion: to learn the unknown parameters $\theta$ from training data
- ideal goal: optimum performance
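The three components listed above can be sketched on a toy problem. This is only an illustration under stated assumptions: the probability model is a pair of discrete tables, the training criterion is maximum likelihood (plain relative frequencies, no smoothing), and the data and the names `train` and `decide` are hypothetical.

```python
from collections import Counter, defaultdict

def train(pairs):
    """Training criterion (assumed here: maximum likelihood).

    Estimates the parameters theta of p(c) and p(x|c) as relative
    frequencies from labelled (x, c) pairs.
    """
    class_counts = Counter(c for _, c in pairs)
    joint_counts = defaultdict(Counter)
    for x, c in pairs:
        joint_counts[c][x] += 1
    total = sum(class_counts.values())
    prior = {c: n / total for c, n in class_counts.items()}
    likelihood = {
        c: {x: n / class_counts[c] for x, n in xs.items()}
        for c, xs in joint_counts.items()
    }
    return prior, likelihood

def decide(x, prior, likelihood):
    """Decision rule: argmax_c p(c) * p(x|c) with the estimated model."""
    return max(prior, key=lambda c: prior[c] * likelihood[c].get(x, 0.0))

# Hypothetical training data: observation x, correct decision c.
training_data = [("a", "A"), ("a", "A"), ("b", "A"), ("b", "B"), ("b", "B")]
prior, likelihood = train(training_data)

print(decide("a", prior, likelihood), decide("b", prior, likelihood))
```

With real data the unsmoothed estimates would assign zero probability to unseen events, which is one reason the slide speaks of estimation problems being counteracted by more training data.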

[Figure: flow diagram. Training phase and analysis of results: training data, training criterion, parameter estimates, refinements, probability model. Testing phase and operational phase: test data, decision rule, decision result.]

Advantages of Statistical Approach

- holistic decision criterion:
  - exploits ALL (available) knowledge sources
  - is able to combine thousands of weak dependencies
  - handles interdependencies, ambiguities and conflicts
- powerful training methods:
  - training criterion is linked to performance
  - fully automatic procedures (no human involved)
  - HUGE amounts of data can be exploited
- note: virtually none of these statements applies to rule-based systems!

Machine Translation: Bayes Decision Rule

[Figure: system architecture. Source language text, transformation, $f_1^J$; global search: maximize $Pr(e_1^I) \cdot Pr(f_1^J \mid e_1^I)$ over $e_1^I$; transformation, target language text. Knowledge sources: $Pr(f_1^J \mid e_1^I)$ from the lexicon model and the alignment model, $Pr(e_1^I)$ from the language model.]

4 State of the Art in SMT

- lot of progress in SMT: best statistical systems are competitive with conventional, hand-tailored systems
- system components:
  - alignment and lexicon model:
    - training: IBM-1 to -5 and/or HMM: based on single words
    - symmetrization of roles of source and target languages
    - extraction of phrases (alignment templates): try to memorize all source/target phrases
  - language model: word tri- and higher n-grams
  - generation (search): beam search, with limited degree of non-monotonicity
- performance:
  - use of phrases: lion's share of the improvement
  - unclear: performance on unseen test data
  - lack of syntactic structure
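The single-word lexicon training mentioned above (IBM-1) can be sketched as a few lines of EM. This is a simplified illustration, not the system described in the talk: the empty (NULL) target word is omitted, the three-sentence corpus is a made-up toy example, and the name `train_ibm1` is hypothetical. Following the notation of the slides, f denotes source words and e target words, and the table holds t(f|e).

```python
from collections import defaultdict

def train_ibm1(corpus, iterations=10):
    """EM training of IBM Model 1 lexicon probabilities t(f|e).

    corpus: list of (source_words, target_words) sentence pairs.
    Simplification (assumption): no NULL word on the target side.
    Returns a dict keyed by (f, e) for co-occurring word pairs only.
    """
    src_vocab = {f for fs, _ in corpus for f in fs}
    t = defaultdict(lambda: 1.0 / len(src_vocab))  # uniform initialization
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(f, e)
        total = defaultdict(float)  # expected counts c(e)
        # E-step: distribute each source word over the target words
        # of its sentence, in proportion to the current t(f|e).
        for fs, es in corpus:
            for f in fs:
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    frac = t[(f, e)] / norm
                    count[(f, e)] += frac
                    total[e] += frac
        # M-step: renormalize the expected counts.
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]
    return dict(t)

# Hypothetical toy parallel corpus (German source, English target).
corpus = [
    ("das haus".split(), "the house".split()),
    ("das buch".split(), "the book".split()),
    ("ein buch".split(), "a book".split()),
]
t = train_ibm1(corpus)
for pair in [("das", "the"), ("haus", "house"), ("buch", "book"), ("ein", "a")]:
    print(pair, round(t[pair], 3))
```

Even on three sentence pairs, the co-occurrence pattern is enough for EM to prefer the intuitive word pairs, although nothing in the model knows which words correspond.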

Room for Improvements and Challenges

Bayes decision rule for translating a source sentence $f_1^J$ into a target sentence $e_1^I$:

$$\arg\max_{e_1^I} Pr(e_1^I \mid f_1^J) = \arg\max_{e_1^I} \{ Pr(e_1^I) \cdot Pr(f_1^J \mid e_1^I) \}$$

- optimizes sentence errors, not word errors or BLEU/NIST score
- challenge: decision rule closer to word errors or BLEU/NIST score? training criterion?
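The difference between a sentence-level and a word-level decision can be made concrete on a small n-best list. The sketch below is an illustration only: it uses minimum Bayes risk decoding with word-level edit distance as the loss, a technique that the slide does not name, and the three hypotheses and their posterior probabilities are made-up values.

```python
def edit_distance(a, b):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, 1):
        cur = [i]
        for j, wb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (wa != wb)))   # match or substitution
        prev = cur
    return prev[-1]

def map_decision(posterior):
    """Sentence-level rule: the single most probable hypothesis."""
    return max(posterior, key=posterior.get)

def mbr_decision(posterior):
    """Word-level rule: the hypothesis with minimum expected word errors."""
    def risk(e):
        return sum(p * edit_distance(e.split(), other.split())
                   for other, p in posterior.items())
    return min(posterior, key=risk)

# Hypothetical posterior Pr(e|f) over a 3-best list.
posterior = {
    "we meet on monday": 0.40,
    "let us meet tuesday morning": 0.35,
    "let us meet tuesday afternoon": 0.25,
}

print(map_decision(posterior))  # most probable whole sentence
print(mbr_decision(posterior))  # lowest expected word-level loss
```

The two rules disagree here because the second and third hypotheses share most of their words: together they carry more probability mass than the first one, which a sentence-level criterion cannot take into account.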

- alignment and lexicon models (in training): challenges:
  - introduction of context dependency: intra- and inter-sentence level
  - integration of morphology and syntax
  - reordering based on syntactic structure
- phrases (alignment templates):
  - good for seen test data
  - memory-based translation
  - challenge: design models with good generalization capabilities, i.e. which work well on UNSEEN test data
  - challenge: consistent framework for implicit segmentation, words-phrases balance, ...
- language model:
  - monolingual grammar to improve the syntactic structure
  - explicit link with word alignment and reordering
  - bilingual grammar
- generation (or search): not a problem for present models, but what about more complex models in the future?

comparison with speech recognition (1973-2003):
- most of the progress: by pure statistical modelling
- some progress: by weak acoustic-phonetic knowledge
- no progress: by classical rule-based and AI methods

prediction (?) for machine translation:
- improvements by progress in pure statistical modelling:
  - more training data (counteracts estimation problems)
  - improved training criteria and training algorithms
- by better modelling the data-inherent dependencies (more structured models) (program for 20-200 years?)

5 Answers

- SMT is the right direction, there is no inherent ceiling, but it is still a long way to go (20-200 years?)
- advantages of statistical MT: better decisions, processing lots of data, performance feedback
- If done correctly, SMT must result in the best performance due to the coupling of training and performance criterion
- fair comparison: many aspects: time, effort, ...
- evaluation metric: not perfect, but of secondary importance
- specific applications for SMT: rapid system development (if parallel corpus exists)
- hybrid systems: in theory yes, in practice??? (see speech recognition)
- funding: Being too successful is not good for funding.

THE END