Multilingual Code-switching Identification via LSTM Recurrent Neural Networks


Multilingual Code-switching Identification via LSTM Recurrent Neural Networks
Younes Samih, Suraj Maharjan, Mohammed Attia, Laura Kallmeyer, Thamar Solorio
University of Düsseldorf, University of Houston, Google Inc.
EMNLP 2016, Second Workshop on Computational Approaches to Code Switching
Austin, Texas, USA, November 1, 2016

Road Map
Content:
- Code-switching: Linguistic Background
- Dataset
- Neural Network
- Approach
- Summary

Code-switching: Linguistic Background
- Speakers switch from one language or dialect to another within the same context [Bullock and Toribio, 2009].
- Three types of code-switching: inter-sentential, intra-sentential, and intra-word.
- Constraints on code-switching:
  - The equivalence constraint [Poplack, 1980]
  - The Matrix Language-Frame (MLF) model [Myers-Scotton, 1993]: a matrix language (ML) and an embedded language (EL)

Shared Task Dataset

MSA-Egyptian data:
          all      training  dev     test
  tweets  11,241   8,862     1,117   1,262
  tokens  227,329  185,928   20,688  20,713
Table: MSA-Egyptian data statistics

Spanish-English data:
          all      training  dev     test
  tweets  21,036   8,733     1,587   10,716
  tokens  294,261  139,539   33,276  121,446
Table: Spanish-English data statistics

Corpora

Arabic corpus:
  genre            tokens
  Facebook posts   8,241,244
  Tweets           2,813,016
  News comments    95,241,480
  MSA news texts   276,965,735
  total            383,261,475
Table: Arabic corpus statistics

Spanish-English corpus:
- English Gigaword corpus (Graff et al., 2003)
- Spanish Gigaword corpus (Graff, 2006)

Data Preprocessing
- Mapping Arabic script to SafeBuckwalter
- Conversion of all Persian numbers to Arabic numbers
- Conversion of Arabic punctuation to Latin punctuation
- Removal of the kashida (elongation character) and vowel marks
- Separation of punctuation marks from words
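A minimal Python sketch of these normalization steps, under stated assumptions: the SafeBuckwalter table shown is deliberately abbreviated, and whether "Arabic numbers" means Arabic-Indic or ASCII digits is an assumption (ASCII is used here).

```python
import re

# Hypothetical, abbreviated mapping for illustration only; the full
# SafeBuckwalter table covers the entire Arabic alphabet.
SAFE_BUCKWALTER = {'\u0627': 'A', '\u0628': 'b', '\u062A': 't'}  # ا ب ت ...
PERSIAN_TO_ASCII_DIGITS = str.maketrans('۰۱۲۳۴۵۶۷۸۹', '0123456789')
ARABIC_TO_LATIN_PUNCT = str.maketrans('،؛؟', ',;?')

def preprocess(text: str) -> str:
    text = text.translate(PERSIAN_TO_ASCII_DIGITS)   # normalize Persian digits
    text = text.translate(ARABIC_TO_LATIN_PUNCT)     # Arabic -> Latin punctuation
    text = text.replace('\u0640', '')                # drop kashida (tatweel)
    text = re.sub(r'[\u064B-\u0652]', '', text)      # drop short-vowel marks
    text = re.sub(r'([,;?!.])', r' \1 ', text)       # split punctuation from words
    text = ''.join(SAFE_BUCKWALTER.get(c, c) for c in text)
    return ' '.join(text.split())
```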

Neural Network
- Recurrent Neural Network (RNN)
- Long Short-Term Memory (LSTM) network
- Word Embeddings

Recurrent Neural Network
(Figure by Christopher Olah)
Given an input sequence x_1, x_2, ..., x_n, a standard RNN computes the hidden state h_t and the output vector y_t for each word x_t:
  h_t = H(W_{xh} x_t + W_{hh} h_{t-1} + b_h)
  y_t = W_{hy} h_t + b_y
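For concreteness, a minimal NumPy sketch of this forward pass; taking H = tanh and the weight shapes below are assumptions, since the slide leaves them unspecified:

```python
import numpy as np

def rnn_forward(xs, W_xh, W_hh, W_hy, b_h, b_y):
    """Run a vanilla RNN over a sequence of input vectors xs."""
    h = np.zeros(W_hh.shape[0])              # initial hidden state h_0 = 0
    ys = []
    for x in xs:
        # h_t = H(W_xh x_t + W_hh h_{t-1} + b_h), with H = tanh here
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)
        ys.append(W_hy @ h + b_y)            # y_t = W_hy h_t + b_y
    return ys

# Tiny usage example with random weights (input dim 4, hidden 8, output 3)
rng = np.random.default_rng(0)
xs = [rng.normal(size=4) for _ in range(5)]
ys = rnn_forward(xs, rng.normal(size=(8, 4)), rng.normal(size=(8, 8)),
                 rng.normal(size=(3, 8)), np.zeros(8), np.zeros(3))
```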

Long-term Dependencies
(Figure by Christopher Olah)
Basic problem: learning long-term dependencies in the data
- Vanishing gradients
- Exploding gradients

Long Short-Term Memory Network
(Figure by Christopher Olah)
LSTM basics:
  f_t = σ(W_f · [h_{t-1}, x_t] + b_f)
  i_t = σ(W_i · [h_{t-1}, x_t] + b_i)
  C̃_t = tanh(W_C · [h_{t-1}, x_t] + b_C)
  C_t = f_t * C_{t-1} + i_t * C̃_t
  o_t = σ(W_o · [h_{t-1}, x_t] + b_o)
  h_t = o_t * tanh(C_t)
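A minimal NumPy sketch of one LSTM step following these equations exactly (weight shapes are assumptions; in practice a framework implementation would be used):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W_f, W_i, W_C, W_o, b_f, b_i, b_C, b_o):
    """One LSTM cell update, mirroring the slide's equations."""
    z = np.concatenate([h_prev, x_t])     # [h_{t-1}, x_t]
    f_t = sigmoid(W_f @ z + b_f)          # forget gate
    i_t = sigmoid(W_i @ z + b_i)          # input gate
    C_tilde = np.tanh(W_C @ z + b_C)      # candidate cell state
    C_t = f_t * C_prev + i_t * C_tilde    # new cell state
    o_t = sigmoid(W_o @ z + b_o)          # output gate
    h_t = o_t * np.tanh(C_t)              # new hidden state
    return h_t, C_t
```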

Vector Space Models
- Distributional hypothesis: words in the same contexts share the same meaning
- Count-based methods (Latent Semantic Analysis, ...)
- Neural probabilistic language models (word embeddings)

Word2vec
- The main component of the neural-network approach
- Represents each feature as a vector in a low-dimensional space
- Continuous Bag-of-Words (CBOW) model vs. Skip-gram model
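The slides do not name a toolkit; as one illustration, training CBOW vs. skip-gram embeddings with gensim (the corpus file name, vector size, and window are placeholders, not the paper's settings):

```python
from gensim.models import Word2Vec

# sentences: any iterable of token lists; here a local text file stands in
# for the tweet/Gigaword data described earlier (file name is hypothetical).
sentences = [line.split() for line in open('corpus.txt', encoding='utf-8')]

# sg=0 selects CBOW, sg=1 selects skip-gram
cbow = Word2Vec(sentences, vector_size=100, window=5, min_count=2, sg=0)
skipgram = Word2Vec(sentences, vector_size=100, window=5, min_count=2, sg=1)

# Nearest neighbors of a query word (assuming it occurs in the corpus)
print(skipgram.wv.most_similar('language', topn=5))
```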

Word Embeddings
(Figure by Yoav Goldberg)

Approach: Code-switching Detection
- System Architecture
- Implementation Details
- Results
- Summary

LSTM-CRF for Code-switching Detection
Our neural network architecture consists of the following three layers (sketched in code below):
- Input layer: comprises both character and word embeddings
- Hidden layer: two LSTMs map the word and character representations to hidden sequences
- Output layer: a softmax or a CRF computes the probability distribution over all labels
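The slides don't name a framework; below is a minimal Keras sketch of the softmax variant under stated assumptions: all vocabulary sizes, dimensions, and sequence lengths are placeholders, and repeating one character-level summary vector across tokens is a simplification of how the character and word representations are actually combined.

```python
from tensorflow import keras
from tensorflow.keras import layers

MAX_WORDS, MAX_CHARS = 40, 200                 # tokens / characters per tweet
N_WORDS, N_CHARS, N_LABELS = 20000, 100, 8     # vocab sizes, label count

# Input layer: word ids and character ids
word_in = keras.Input(shape=(MAX_WORDS,), name='words')
char_in = keras.Input(shape=(MAX_CHARS,), name='chars')
word_emb = layers.Embedding(N_WORDS, 100)(word_in)   # pre-trained in the paper
char_emb = layers.Embedding(N_CHARS, 25)(char_in)

# Hidden layer: two LSTMs, one over words and one over characters
word_h = layers.LSTM(100, return_sequences=True)(word_emb)
char_h = layers.LSTM(100)(char_emb)                  # one summary vector
char_h = layers.RepeatVector(MAX_WORDS)(char_h)      # broadcast to each token
hidden = layers.Dropout(0.5)(layers.Concatenate()([word_h, char_h]))

# Output layer: softmax over labels (the CRF variant is not shown here)
out = layers.TimeDistributed(layers.Dense(N_LABELS, activation='softmax'))(hidden)

model = keras.Model([word_in, char_in], out)
model.compile(optimizer='sgd', loss='sparse_categorical_crossentropy')
```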

System Architecture

Implementation Details
- Pre-trained word embeddings
- Character embeddings
- Optimization: dropout
- Output layer: softmax or CRF
- Training: stochastic gradient descent optimizing a cross-entropy objective function
- Hyper-parameter tuning on the dev set
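To make the training recipe concrete, a self-contained toy sketch of SGD on a cross-entropy objective with inverted dropout, for a softmax output layer over random stand-in features (everything here is illustrative, not the paper's setup):

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Toy data: 200 hidden vectors of dim 16 standing in for LSTM outputs, 8 labels
H, y = rng.normal(size=(200, 16)), rng.integers(0, 8, size=200)
W, b = rng.normal(scale=0.1, size=(16, 8)), np.zeros(8)
lr, p_drop = 0.1, 0.5

for epoch in range(20):
    mask = (rng.random(H.shape) > p_drop) / (1 - p_drop)   # inverted dropout
    Hd = H * mask
    probs = softmax(Hd @ W + b)
    loss = -np.log(probs[np.arange(len(y)), y]).mean()     # cross-entropy
    grad = probs.copy()
    grad[np.arange(len(y)), y] -= 1                        # d loss / d logits
    grad /= len(y)
    W -= lr * (Hd.T @ grad)                                # SGD updates
    b -= lr * grad.sum(axis=0)
```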

Results on the Spanish-English Dev Set

Labels      CRF (feats)  CRF (emb)  CRF (feats+emb)  word LSTM  char LSTM  char-word LSTM
ambiguous   0.00         0.02       0.00             0.00       0.00       0.00
fw          0.00         0.00       0.00             0.00       0.00       0.00
lang1       0.97         0.97       0.97             0.93       0.94       0.96
lang2       0.96         0.95       0.96             0.91       0.89       0.93
mixed       0.00         0.00       0.00             0.00       0.00       0.00
ne          0.52         0.51       0.57             0.34       0.13       0.32
other       1.00         1.00       1.00             0.85       1.00       1.00
unk         0.04         0.08       0.10             0.00       0.00       0.04
Accuracy    0.961        0.960      0.963            0.896      0.923      0.954
Table: F1 score results on the Spanish-English development dataset

Results on the MSA-Egyptian Dev Set

Labels      CRF (feats)  CRF (emb)  CRF (feats+emb)  word LSTM  char LSTM  char-word LSTM
ambiguous   0.00         0.00       0.00             0.00       0.00       0.00
lang1       0.80         0.88       0.88             0.86       0.57       0.88
lang2       0.83         0.91       0.91             0.92       0.23       0.92
mixed       0.00         0.00       0.00             0.00       0.00       0.00
ne          0.83         0.84       0.86             0.84       0.66       0.84
other       0.97         0.97       0.97             0.92       0.97       0.97
Accuracy    0.829        0.894      0.896            0.896      0.530      0.900
Table: F1 score results on the MSA-Egyptian development dataset

Tweet-level Results

Scores            Es-En  MSA
Monolingual F1    0.92   0.890
Code-switched F1  0.88   0.500
Weighted F1       0.90   0.830
Table: Tweet-level results on the test dataset

Token-level Results (Spanish-English)

Label      Recall  Precision  F-score
ambiguous  0.000   0.000      0.000
fw         0.000   0.000      0.000
lang1      0.922   0.939      0.930
lang2      0.978   0.982      0.980
mixed      0.000   0.000      0.000
ne         0.639   0.484      0.551
other      0.992   0.998      0.995
unk        0.120   0.019      0.034
Accuracy                      0.967
Table: Token-level results on the Spanish-English test dataset

Token-level Results (MSA-DA)

Label      Recall  Precision  F-score
ambiguous  0.000   0.000      0.000
fw         0.000   0.000      0.000
lang1      0.877   0.832      0.854
lang2      0.913   0.896      0.904
mixed      0.000   0.000      0.000
ne         0.729   0.829      0.777
other      0.938   0.975      0.957
unk        0.000   0.000      0.000
Accuracy                      0.879
Table: Token-level results on the MSA-DA test dataset

Char-word Representation: Spanish-English CRF Model

Char-word Representation: MSA-Egyptian CRF Model

CRF Model

Most likely       Score    Most unlikely     Score
unk -> unk        1.789    lang1 -> mixed    -0.172
ne -> ne          1.224    mixed -> lang1    -0.196
fw -> fw          1.180    amb -> other      -0.244
lang1 -> lang1    1.153    ne -> mixed       -0.246
lang2 -> lang2    1.099    mixed -> other    -0.254
other -> other    0.827    fw -> lang1       -0.282
lang1 -> ne       0.316    ne -> lang2       -0.334
other -> lang1    0.222    unk -> ne         -0.383
lang2 -> mixed    0.216    lang2 -> lang1    -0.980
lang1 -> other    0.191    lang1 -> lang2    -0.993
Table: Most likely and most unlikely transitions learned by the CRF model for the Spanish-English dataset.
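To illustrate how such transition scores are used at decoding time, a minimal Viterbi sketch over emission and transition score matrices (the emission scores and the 2-label setup are toy values; only the four transition scores are taken from the table above):

```python
import numpy as np

def viterbi(emissions, transitions):
    """Best label sequence under summed emission + transition scores.

    emissions:   (n_tokens, n_labels) per-token label scores
    transitions: (n_labels, n_labels) score of moving from label i to label j
    """
    n, k = emissions.shape
    score = emissions[0].copy()
    back = np.zeros((n, k), dtype=int)
    for t in range(1, n):
        cand = score[:, None] + transitions + emissions[t]   # (k, k) scores
        back[t] = cand.argmax(axis=0)                        # best predecessor
        score = cand.max(axis=0)
    path = [int(score.argmax())]
    for t in range(n - 1, 0, -1):                            # backtrack
        path.append(int(back[t][path[-1]]))
    return path[::-1]

# Toy example: 3 tokens, labels {0: lang1, 1: lang2}
emissions = np.array([[2.0, 0.1], [0.2, 0.3], [0.1, 2.0]])
transitions = np.array([[1.153, -0.993],    # lang1->lang1, lang1->lang2
                        [-0.980, 1.099]])   # lang2->lang1, lang2->lang2
print(viterbi(emissions, transitions))      # strong transitions resist switching
```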

Summary
- Automatic identification of code-switching in tweets
- A unified neural network for language identification rivals state-of-the-art methods that rely on language-specific tools

What next?
- Implement a character-aware bidirectional LSTM to capture word morphology
- Employ a more sophisticated CNN-bidirectional-LSTM

Thank you for your attention! Questions?