Unsupervised NMT with Weight Sharing. Zhen Yang, Wei Chen, Feng Wang and Bo Xu, Institute of Automation, Chinese Academy of Sciences. 2018/07/16


Contents:
1. Background
2. The proposed model
3. Experiments and results
4. Related and future work

Background. Assumption: different languages can be mapped into one shared latent space.

Techniques that existing unsupervised NMT builds on:
- Initialize the model with an inferred bilingual dictionary: unsupervised word-embedding mapping.
- Learn a strong language model: de-noising auto-encoding (see the sketch below).
- Convert the unsupervised setting into a supervised one: back-translation.
- Constrain the latent representations produced by the encoders to a shared space: a fully-shared encoder, fixed mapped embeddings, GAN.
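To make the de-noising objective concrete, here is a minimal sketch of the sentence-corruption step commonly used for it (word dropout plus a bounded local shuffle); the function name and hyperparameter values are illustrative, not taken from the paper:

import random

def add_noise(tokens, p_drop=0.1, k=3):
    # Illustrative sketch: corrupt a sentence for de-noising auto-encoding.
    # Drop each word with probability p_drop, then shuffle words so that
    # no word moves more than about k positions from where it started.
    kept = [t for t in tokens if random.random() > p_drop]
    keys = [i + random.uniform(0, k) for i in range(len(kept))]
    return [t for _, t in sorted(zip(keys, kept))]

The auto-encoder is then trained to reconstruct the original sentence from add_noise(sentence), which forces the encoder to learn representations that are robust to missing words and local word-order changes.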

We find: The shared encoder is a bottleneck for unsupervised NMT. It is weak at preserving the unique, internal characteristics of each language, such as style, terminology, and sentence structure. Since each language has its own characteristics, the source and target languages should be encoded and learned independently. Fixed word embeddings also weaken performance (not included in the paper). If you are interested in this part, you can find some discussion in our GitHub code: https://github.com/zhenyangiacas/unsupervised-nmt

The proposed model: The local GAN is used to constrain the source and target latent representations to follow the same distribution (the embedding-reinforced encoder is also designed for this purpose; see our paper for details). The global GAN is used to fine-tune the whole model. A sketch of the weight-sharing encoder follows below.
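To make the weight-sharing idea concrete, here is a minimal PyTorch sketch, assuming standard Transformer encoder layers as stand-ins for the paper's self-attention layers; the class and argument names are illustrative, not the authors' released code:

import torch.nn as nn

class WeightSharedEncoder(nn.Module):
    # Illustrative sketch: language-specific lower layers plus top layer(s)
    # shared by both languages, so source and target sentences are encoded
    # independently while their latent spaces are still tied together.
    def __init__(self, d_model=512, n_private=3, n_shared=1):
        super().__init__()
        def layer():
            return nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.private = nn.ModuleDict({
            "src": nn.ModuleList([layer() for _ in range(n_private)]),
            "tgt": nn.ModuleList([layer() for _ in range(n_private)]),
        })
        self.shared = nn.ModuleList([layer() for _ in range(n_shared)])

    def forward(self, x, lang):
        # x: (batch, seq, d_model); lang: "src" or "tgt"
        for l in self.private[lang]:
            x = l(x)
        for l in self.shared:  # same parameters for both languages
            x = l(x)
        return x

With 4 encoder layers in total, sharing only the top one (n_private=3, n_shared=1) corresponds to the best-performing configuration reported in the results below.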

Experiment setup:
- Training sets: WMT16 En-De, WMT14 En-Fr, LDC Zh-En. Note: the monolingual data is built by selecting the front half of the source-language corpus and the back half of the target-language corpus (sketched below).
- Test sets: newstest2016 En-De, newstest2014 En-Fr, NIST02 En-Zh.
- Model architecture: 4 self-attention layers for both encoder and decoder.
- Word embedding: word2vec is applied to pre-train the word embeddings, and vecmap is used to map these embeddings into a shared latent space.
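A minimal sketch of that monolingual-data construction, assuming the parallel corpus is given as two aligned lists of sentences (the function name is illustrative):

def build_monolingual(src_lines, tgt_lines):
    # Take the front half of the source side and the back half of the
    # target side, so the two monolingual corpora share no sentence pair
    # and the training setting stays strictly non-parallel.
    assert len(src_lines) == len(tgt_lines)
    half = len(src_lines) // 2
    return src_lines[:half], tgt_lines[half:]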

Experimental results: the effect of the number of weight-sharing layers.

Layers shared | En-De | En-Fr | Zh-En
0 | 10.23 | 16.02 | 13.75
1 | 10.86 | 16.97 | 14.52
2 | 10.56 | 16.73 | 14.07
3 | 10.63 | 16.50 | 13.92
4 | 10.01 | 16.44 | 12.86

Sharing one layer achieves the best translation performance.

Experimental results: the BLEU scores of the proposed model. Baseline 1: word-by-word translation according to the similarity of the word embeddings. Baseline 2: the unsupervised NMT system trained with monolingual corpora only, proposed by Facebook. Upper bound: supervised translation with the same model architecture.

Experimental results: ablation study. We perform an ablation study by training multiple versions of our model, each with one component removed: the local GAN, the global GAN, the directional self-attention, the weight sharing, or the embedding-reinforced encoder. We do not test the importance of auto-encoding, back-translation, or the pre-trained embeddings, since these have been widely validated in previous work.

Semi-supervised NMT (with 0.2M parallel sentence pairs). Two training schemes were compared: (1) continue training the model on the parallel data after unsupervised training; (2) from scratch, train the model on monolingual data for one epoch, then on parallel data for one epoch, then another epoch on monolingual data, and so on (outlined below).

Models | BLEU
Only with parallel data | 11.59
Fully unsupervised training | 10.48
Continued training on parallel data | 14.51
Joint training on monolingual and parallel data | 15.79
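A hypothetical outline of the alternating schedule (all names here are placeholders, not the authors' code):

def train_semi_supervised(model, mono_data, parallel_data, n_rounds,
                          run_unsupervised_epoch, run_supervised_epoch):
    # Alternate one unsupervised epoch (de-noising + back-translation on
    # monolingual data) with one supervised epoch (cross-entropy on the
    # small parallel set), round after round.
    for _ in range(n_rounds):
        run_unsupervised_epoch(model, mono_data)
        run_supervised_epoch(model, parallel_data)
    return model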

Related works:
- G. Lample, A. Conneau, L. Denoyer, and M. Ranzato. 2018. Unsupervised Machine Translation Using Monolingual Corpora Only. In International Conference on Learning Representations (ICLR).
- Mikel Artetxe, Gorka Labaka, Eneko Agirre, and Kyunghyun Cho. 2018. Unsupervised Neural Machine Translation. In International Conference on Learning Representations (ICLR).
- G. Lample, A. Conneau, L. Denoyer, and M. Ranzato. 2018. Phrase-Based & Neural Unsupervised Machine Translation. arXiv preprint.

* The newest paper (the third one) proposes a shared-BPE method for unsupervised NMT; its effectiveness remains to be verified (an improvement of around +10 BLEU points is reported).

Future work: Continue testing unsupervised NMT and seek its optimal configuration. Test the performance of semi-supervised NMT with a small amount of bilingual data. Investigate more effective approaches for utilizing monolingual data within the unsupervised NMT framework.

Code and new results can be found at: https://github.com/zhenyangiacas/unsupervised-nmt