Contrastive Evaluation of Larger-context Neural Machine Translation


Institute of Computational Linguistics
Contrastive Evaluation of Larger-context Neural Machine Translation
Kolloquium Talk, 4/10/18
Mathias Müller

Larger-context neural machine translation

Why larger context?

Source: However, the European Central Bank (ECB) took an interest in it in a report on virtual currencies published in October. It describes bitcoin as "the most successful virtual currency, [...]".
Target: Dennoch hat die Europäische Zentralbank (EZB) in einem im Oktober veröffentlichten Bericht über virtuelle Währungen Interesse hierfür gezeigt. Sie beschreibt Bitcoin als "die virtuelle Währung mit dem größten Erfolg [...]".

(example taken from newstest2013.{de,en})


Why larger context?

Source: It describes bitcoin as "the most successful virtual currency".
Target: Es beschreibt den Bitcoin als "die erfolgreichste virtuelle Währung".

How to incorporate larger context? An open question; preliminary works:
- gated auxiliary context, or warm-start decoder initialization with a document summary (Wang et al., 2017)
- an additional encoder and attention network for the previous source sentence (Jean et al., 2017)
- concatenating the previous source sentence, marked with a prefix (Tiedemann and Scherrer, 2017)
- both source and target context (Miculicich Werlen et al., submitted)
- hierarchical attention, among other solutions (Bawden et al., submitted)

Additional encoder and attention network, on top of Nematus (Sennrich et al., 2017), which follows standard practice: an encoder-decoder framework with attention (Bahdanau et al., 2014).
- Encoder and decoder are gated recurrent units (GRUs), a variant of RNNs.
- The decoder is a GRU conditioned on the source sentence; the source sentence context in turn is generated by the encoder and modulated by attention.
- We also condition on preceding sentences, with additional encoders and separate attention networks.


Recurrent neural networks refresher
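
(This slide shows only a figure. As a minimal refresher, not taken from the slide itself: the vanilla recurrent update for input x_t, hidden state h_t, and learned parameters W, U, b is

  h_t = \tanh(W x_t + U h_{t-1} + b)

i.e. the same transition function is applied at every time step, carrying the state forward.)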

RNN variant: gated recurrent unit (GRU). Figure taken from Chung et al. (2014).
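
(The figure is not reproduced in this transcription; for reference, the GRU update as defined by Chung et al. (2014) is

  z_t = \sigma(W_z x_t + U_z h_{t-1})                      (update gate)
  r_t = \sigma(W_r x_t + U_r h_{t-1})                      (reset gate)
  \tilde{h}_t = \tanh(W x_t + U (r_t \odot h_{t-1}))       (candidate state)
  h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t

where \odot is element-wise multiplication.)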

Conditional gated recurrent unit (cGRU). Detailed formulas: https://github.com/nyu-dl/dl4mt-tutorial/blob/master/docs/cgru.pdf
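
(In outline, following the linked notes (see them for the exact parametrization): the cGRU interleaves attention between two GRU transitions per decoder step j,

  s'_j = \mathrm{GRU}_1(y_{j-1}, s_{j-1})    (first transition, conditioned on the previous target word)
  c_j  = \mathrm{ATT}(C, s'_j)               (attention over the source annotations C)
  s_j  = \mathrm{GRU}_2(c_j, s'_j)           (second transition, conditioned on the context vector)

so the attention step sits inside a single recurrent transition.)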

Extension of cGRU for n contexts. Detailed formulas: https://github.com/bricksdont/ncgru/blob/master/ct.pdf

How to incorporate larger context? Additional encoder and attention networks for previous context (Jean et al., 2017) in Nematus.
- Technically: an extension of deep transition (Pascanu et al., 2013) with additional GRU steps that attend to contexts other than the current source sentence.
- Intuitively: while generating the next word, the decoder has access to the previous source or target sentence.
- The multiple encoders share most of their parameters because embedding matrices are tied (Press and Wolf, 2016).
A sketch of the extended recurrence follows.
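
(A sketch only, one plausible reading of "additional GRU steps that attend to other contexts"; the exact formulation is in the ct.pdf linked above. Each context C^{(k)}, k = 1, ..., n, gets its own attention network and transition step within a single decoder step j:

  s_j^{(0)} = \mathrm{GRU}_1(y_{j-1}, s_{j-1})
  c_j^{(k)} = \mathrm{ATT}_k(C^{(k)}, s_j^{(k-1)})         for k = 1, ..., n
  s_j^{(k)} = \mathrm{GRU}_{k+1}(c_j^{(k)}, s_j^{(k-1)})
  s_j = s_j^{(n)}

The current source sentence is one of these contexts; the others are encodings of preceding sentences.)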

Actual systems we have trained: Nematus systems with standard parameters, similar to Edinburgh's WMT 17 submissions. English to German (why?), with training data from WMT 17.
1) baseline system without additional context
2) + source context: 1 previous source sentence, if any
3) + target context: 1 previous target sentence, if any

How to evaluate larger-context systems? Need: an evaluation that focuses on specific linguistic phenomena. A challenge set for contrastive evaluation:

Source: Despite the fact that it is a part of China, Hong Kong determines its currency policy separately.
Target: Hongkong bestimmt, obwohl es zu China gehört, seine Währungspolitik selbst.
Contrastive: Hongkong bestimmt, obwohl er zu China gehört, seine Währungspolitik selbst.

(example taken from newstest2009)
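
In contrastive evaluation (Sennrich, 2017), the model is used as a scorer rather than a translator: a system passes an example if it assigns higher probability to the correct target than to the minimally different contrastive variant. A minimal sketch, where `score` is a hypothetical stand-in for the model's sentence-level scoring function (e.g. the log-probability an NMT system such as Nematus can assign to a given source/target pair), not code from this talk:

from typing import Callable, List, Tuple

def contrastive_accuracy(
    examples: List[Tuple[str, str, str]],  # (source, correct target, contrastive target)
    score: Callable[[str, str], float],    # hypothetical: log P(target | source) under the model
) -> float:
    """Fraction of examples on which the model prefers the correct translation."""
    passed = 0
    for src, ref, contrastive in examples:
        # A system "passes" an example if the correct target is more probable.
        if score(src, ref) > score(src, contrastive):
            passed += 1
    return passed / len(examples)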

How to evaluate larger-context systems? Previous work with manually constructed sets: Guillou and Hardmeier (2016); Isabelle et al. (2017); Bawden et al. (submitted). Larger-scale automatic sets: Sennrich (2017); Rios et al. (2017); Burlot and Yvon (2017); ours.

Our test set of contrastive examples:
- sources: WMT, CS Corpus, OpenSubtitles
- good candidates extracted automatically after linguistic processing (parsing, coreference resolution)
- focused on personal pronouns
- roughly 600k examples
An illustrative sketch of generating contrastive variants follows.
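
The talk does not show the extraction pipeline itself; purely as an illustration of the idea, a contrastive variant for a German pronoun can be produced by swapping it for its wrong-gender counterparts. Function and variable names here are hypothetical, not the project's actual code:

# Illustrative sketch: given a tokenized German target sentence and the
# index of the pronoun that translates English "it", emit contrastive
# variants with the wrong grammatical gender.
GERMAN_NOM_PRONOUNS = {"er", "sie", "es"}  # masculine, feminine, neuter

def contrastive_variants(target_tokens, pronoun_index):
    """Yield copies of the sentence with the pronoun swapped for each
    wrong-gender alternative, preserving sentence-initial capitalization."""
    original = target_tokens[pronoun_index]
    for candidate in sorted(GERMAN_NOM_PRONOUNS - {original.lower()}):
        if original[0].isupper():
            candidate = candidate.capitalize()
        variant = list(target_tokens)
        variant[pronoun_index] = candidate
        yield variant

# Example:
# list(contrastive_variants("Es beschreibt den Bitcoin".split(), 0))
# -> [["Er", "beschreibt", ...], ["Sie", "beschreibt", ...]]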


Results: BLEU

System     newstest2015 (dev)    newstest2017 (test)
Baseline   24.80                 23.02
C10        22.68                 21.47
C11        24.48                 22.38

Contrastive scores where the EN pronoun is "it"

                      Baseline    C10     C11
Overall performance   0.44        0.47    0.64

By German pronoun:

           Baseline    C10     C11
it : er    0.18        0.27    0.50
it : es    0.84        0.76    0.83
it : sie   0.30        0.39    0.62

Contrastive scores where the EN pronoun is "it"

                  Baseline    C10     C11
intrasegmental    0.61        0.60    0.67
extrasegmental    0.41        0.45    0.64

By distance:

distance    Baseline    C10     C11
0           0.61        0.60    0.67
1           0.36        0.43    0.64
2           0.46        0.43    0.58
3           0.53        0.53    0.66
3+          0.67        0.56    0.76

Current activities

Last steps for the contrastive evaluation experiments: publish our resource and work at WMT 18.

Ongoing work:
- inductive biases of fully convolutional (Gehring et al., 2017) or self-attention ("transformer") models (Vaswani et al., 2017); a collaboration with Edinburgh
- low-resource experiments with Romansh: pretraining transformer models with self-attentional language models (an adaptation of Ramachandran et al., 2017)

Thanks! Code currently here: https://gitlab.cl.uzh.ch/mt/nematus-context2

Bibliography

Bahdanau, Dzmitry, Kyunghyun Cho, and Yoshua Bengio. "Neural Machine Translation by Jointly Learning to Align and Translate." arXiv preprint arXiv:1409.0473 (2014).
Bawden, Rachel, et al. "Evaluating Discourse Phenomena in Neural Machine Translation." (Submitted to NAACL 2018.)
Burlot, Franck, and François Yvon. "Evaluating the Morphological Competence of Machine Translation Systems." Proceedings of the Second Conference on Machine Translation. 2017.
Chung, Junyoung, et al. "Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling." arXiv preprint arXiv:1412.3555 (2014).
Gehring, Jonas, et al. "Convolutional Sequence to Sequence Learning." arXiv preprint arXiv:1705.03122 (2017).
Guillou, Liane, and Christian Hardmeier. "PROTEST: A Test Suite for Evaluating Pronouns in Machine Translation." LREC. 2016.
Isabelle, Pierre, Colin Cherry, and George Foster. "A Challenge Set Approach to Evaluating Machine Translation." Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. 2017.
Jean, Sébastien, et al. "Does Neural Machine Translation Benefit from Larger Context?" arXiv preprint arXiv:1704.05135 (2017).
Miculicich Werlen, Lesly, et al. "Self-Attentive Residual Decoder for Neural Machine Translation." (Submitted to NAACL 2018.)
Pascanu, Razvan, et al. "How to Construct Deep Recurrent Neural Networks." Proceedings of the Second International Conference on Learning Representations (ICLR 2014).
Press, Ofir, and Lior Wolf. "Using the Output Embedding to Improve Language Models." Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers. 2017.
Ramachandran, Prajit, Peter Liu, and Quoc Le. "Unsupervised Pretraining for Sequence to Sequence Learning." Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. 2017.
Rikters, Matīss, Mark Fishel, and Ondřej Bojar. "Visualizing Neural Machine Translation Attention and Confidence." The Prague Bulletin of Mathematical Linguistics 109.1 (2017): 39-50.
Rios Gonzales, Annette, Laura Mascarell, and Rico Sennrich. "Improving Word Sense Disambiguation in Neural Machine Translation with Sense Embeddings." Proceedings of the Second Conference on Machine Translation. 2017.
Sennrich, Rico, et al. "Nematus: a Toolkit for Neural Machine Translation." Proceedings of the Software Demonstrations of the 15th Conference of the European Chapter of the Association for Computational Linguistics. 2017.
Sennrich, Rico. "How Grammatical is Character-level Neural Machine Translation? Assessing MT Quality with Contrastive Translation Pairs." Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers. 2017.
Tiedemann, Jörg, and Yves Scherrer. "Neural Machine Translation with Extended Context." Proceedings of the Third Workshop on Discourse in Machine Translation. 2017.
Vaswani, Ashish, et al. "Attention Is All You Need." Advances in Neural Information Processing Systems. 2017.
Wang, Longyue, et al. "Exploiting Cross-Sentence Context for Neural Machine Translation." Proceedings of EMNLP. 2017.

Appendix: Notions of depth in RNN networks

Generally, there are three types of depth (Pascanu et al., 2013):
- stacked layers (each layer individually recurrent)
- deep transition (units not individually recurrent)
- deep output (units not individually recurrent)

In Nematus, the decoder is implemented as a cGRU with deep transition and deep output. Crucially, attention over the source sentence vectors C is a deep transition step. A code sketch contrasting the first two notions follows.
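
A minimal sketch (not Nematus code) contrasting stacked layers with deep transition; `cells` stands for any list of recurrent step functions, e.g. GRU steps of the form new_state = cell(x, state):

def stacked_rnn(cells, inputs, init_states):
    """Stacked layers: every layer is individually recurrent,
    i.e. each keeps its own state across time steps."""
    states = list(init_states)             # one state per layer
    for x in inputs:
        for i, cell in enumerate(cells):
            states[i] = cell(x, states[i])
            x = states[i]                  # output of layer i feeds layer i + 1
    return states

def deep_transition_rnn(cells, inputs, init_state):
    """Deep transition: several cells are applied within a single time step;
    the intermediate units are not individually recurrent."""
    state = init_state
    for x in inputs:
        state = cells[0](x, state)
        for cell in cells[1:]:             # extra transition depth per step,
            state = cell(None, state)      # taking no new external input
        # only the final state of the transition chain is carried to t + 1
    return state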