Multilingual Code-switching Identification via LSTM Recurrent Neural Networks

Younes Samih, Suraj Maharjan, Mohammed Attia, Laura Kallmeyer, Thamar Solorio
University of Düsseldorf, University of Houston, Google Inc.
EMNLP 2016 Second Workshop on Computational Approaches to Code Switching
Austin, Texas, USA, November 1, 2016

Road Map
Content:
- Linguistic Background
- Dataset
- Neural Network
- Approach
- Summary

Code-switching (Linguistic Background)
- Code-switching: speakers switch from one language or dialect to another within the same context [Bullock and Toribio, 2009].
- Three types of code-switching: inter-sentential, intra-sentential, and intra-word.
- Constraints on code-switching:
  - the equivalence constraint [Poplack, 1980]
  - the Matrix Language-Frame (MLF) model [Myers-Scotton, 1993], which distinguishes the matrix language (ML) from the embedded language (EL)

Shared Task Dataset

MSA-Egyptian data
          all      training  dev     test
tweets    11,241   8,862     1,117   1,262
tokens    227,…    …,928     20,688  20,713
Table: MSA-Egyptian data statistics

Spanish-English data
          all      training  dev     test
tweets    21,036   8,733     1,587   10,716
tokens    294,…    …,539     33,…    …,446
Table: Spanish-English data statistics

Corpora

Arabic corpus
genre                tokens
Facebook posts    8,241,244
Tweets            2,813,016
News comments    95,241,480
MSA news texts  276,965,735
total           383,261,475
Table: Arabic corpus statistics

Spanish-English corpus
- English Gigaword corpus (Graff et al., 2003)
- Spanish Gigaword corpus (Graff, 2006)
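As a quick arithmetic check on the Arabic corpus table, the Python sketch below confirms that the four genre counts add up to the reported total and computes each genre's share. It only restates numbers from the table; the dictionary and function names are illustrative, not from the original work.

```python
# Token counts per genre, copied from the Arabic corpus statistics table.
ARABIC_CORPUS = {
    "Facebook posts": 8_241_244,
    "Tweets": 2_813_016,
    "News comments": 95_241_480,
    "MSA news texts": 276_965_735,
}
REPORTED_TOTAL = 383_261_475


def genre_shares(counts):
    """Return each genre's fraction of the total token count."""
    total = sum(counts.values())
    return {genre: n / total for genre, n in counts.items()}


if __name__ == "__main__":
    # The table's total is consistent with its rows.
    assert sum(ARABIC_CORPUS.values()) == REPORTED_TOTAL
    for genre, share in genre_shares(ARABIC_CORPUS).items():
        print(f"{genre:15s} {share:6.2%}")
```

Run as a script, it shows that MSA news texts make up roughly 72% of the tokens and news comments about a quarter, while Facebook posts and tweets together account for under 3%.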

Data Preprocessing
- mapping Arabic script to SafeBuckwalter
- conversion of all Persian numbers to Arabic numbers
- conversion of Arabic punctuation to Latin punctuation
- removal of kashida (the elongation character) and vowel marks
- separation of punctuation marks from words
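The normalization steps listed above can be sketched in Python roughly as follows. This is a minimal sketch, not the authors' code, and it rests on several assumptions: "Persian numbers" are taken to be the Extended Arabic-Indic digits U+06F0 to U+06F9 and "Arabic numbers" the Arabic-Indic digits U+0660 to U+0669; the vowel marks stripped are U+064B to U+0652; the set of punctuation marks split off is illustrative. The SafeBuckwalter transliteration step is left out, since it needs a full character table.

```python
import re

# Assumption: Persian digits U+06F0..U+06F9 map to Arabic-Indic digits U+0660..U+0669.
_TABLE = {0x06F0 + d: 0x0660 + d for d in range(10)}
# Arabic comma, semicolon and question mark -> Latin equivalents.
_TABLE.update({0x060C: ",", 0x061B: ";", 0x061F: "?"})
# Kashida / tatweel (U+0640) is deleted.
_TABLE[0x0640] = None
# Vowel marks from fathatan to sukun (U+064B..U+0652) are deleted.
_TABLE.update({cp: None for cp in range(0x064B, 0x0653)})

# Punctuation marks to separate from adjacent word characters (assumed set).
_PUNCT_RE = re.compile(r'([!?.,;:()"])')


def preprocess(text: str) -> str:
    """Normalize one Arabic tweet (SafeBuckwalter mapping is NOT done here)."""
    text = text.translate(_TABLE)          # digits, punctuation, kashida, vowels
    text = _PUNCT_RE.sub(r" \1 ", text)    # pad punctuation with spaces
    return " ".join(text.split())          # collapse repeated whitespace
```

For example, an elongated word followed by an Arabic comma comes out with the kashidas removed and a free-standing Latin ",". Note that splitting on "." also breaks decimal numbers apart, which a real pipeline might want to avoid.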

Neural Network
- Recurrent Neural Network
- Long short-term memory network
- Word Embeddings

Recurrent Neural Network
[Figure by Christopher Olah]
Given an input sequence $x_1, x_2, \ldots, x_n$, a standard RNN computes a hidden state $h_t$ and an output vector $y_t$ for each word $x_t$:
$h_t = \mathcal{H}(W_{xh} x_t + W_{hh} h_{t-1} + b_h)$
$y_t = W_{hy} h_t + b_y$
where $\mathcal{H}$ is the hidden-layer activation function.
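The recurrence above can be made concrete with a small pure-Python forward pass. This is a sketch under two assumptions not stated on the slide: $\mathcal{H}$ is tanh and $h_0 = 0$. Weights are plain nested lists so the example has no dependencies; the toy values in the usage line are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(W: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rnn_forward(xs: List[Vector], W_xh: Matrix, W_hh: Matrix, b_h: Vector,
                W_hy: Matrix, b_y: Vector) -> List[Vector]:
    """Return y_1..y_n for inputs x_1..x_n, assuming H = tanh and h_0 = 0."""
    h = [0.0] * len(b_h)  # h_0
    ys = []
    for x in xs:
        pre = [a + b + c for a, b, c in zip(matvec(W_xh, x), matvec(W_hh, h), b_h)]
        h = [math.tanh(a) for a in pre]  # h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h)
        ys.append([a + b for a, b in zip(matvec(W_hy, h), b_y)])  # y_t = W_hy h_t + b_y
    return ys


# Usage with 1-dimensional toy weights (hypothetical values):
ys = rnn_forward([[1.0], [0.0]], W_xh=[[1.0]], W_hh=[[1.0]], b_h=[0.0],
                 W_hy=[[1.0]], b_y=[0.0])
```

Here $y_2 = \tanh(\tanh(1))$ is non-zero even though $x_2 = 0$: the recurrent weight $W_{hh}$ carries information about $x_1$ forward through $h_1$.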

Long-term Dependencies
[Figure by Christopher Olah]
Basics:
- Problem: a plain RNN struggles to learn long-term dependencies in the data, because gradients propagated back through many time steps tend to
  - vanish (vanishing gradients), or
  - blow up (exploding gradients).
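Why gradients vanish or explode can be seen in a scalar RNN $h_t = \tanh(w\,h_{t-1} + x_t)$: by the chain rule, $\partial h_T / \partial h_0$ is a product of $T$ factors $w \cdot \tanh'(\cdot)$. The sketch below (a toy illustration, not from the original slides) keeps the inputs at zero so the state stays at $0$ and $\tanh' = 1$, making the gradient exactly $w^T$. Since $\tanh' \le 1$ everywhere, $|w| < 1$ always drives the product toward zero, while explosion requires $|w| > 1$ with units that are not saturated.

```python
import math


def grad_through_time(w: float, xs, h0: float = 0.0) -> float:
    """Return dh_T/dh_0 for the scalar RNN h_t = tanh(w * h_{t-1} + x_t)."""
    h, grad = h0, 1.0
    for x in xs:
        h = math.tanh(w * h + x)
        # Chain rule: dh_t/dh_{t-1} = w * tanh'(a) = w * (1 - h_t**2).
        grad *= w * (1.0 - h * h)
    return grad


if __name__ == "__main__":
    steps = [0.0] * 50  # zero inputs keep h_t = 0, so tanh' = 1 at every step
    print(grad_through_time(0.5, steps))  # equals 0.5**50: vanishing
    print(grad_through_time(1.5, steps))  # equals 1.5**50: exploding
```

After 50 steps the gradient is below $10^{-15}$ for $w = 0.5$ and above $10^{8}$ for $w = 1.5$, which is the gap the LSTM's gated cell state is designed to bridge.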

Long Short-Term Memory Network
[Figure by Christopher Olah]
LSTM basics (forget gate $f_t$, input gate $i_t$, candidate state $\tilde{C}_t$, cell state $C_t$, output gate $o_t$; $\odot$ denotes element-wise multiplication):
$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$
$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$
$\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)$
$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$
$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$
$h_t = o_t \odot \tanh(C_t)$
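The LSTM equations can be traced in a single time step written in pure Python. This is a minimal sketch: the parameter dictionary keys (`W_f`, `b_f`, and so on) are hypothetical names chosen to mirror the symbols above, weight matrices are lists of rows, and $[h_{t-1}, x_t]$ is implemented as list concatenation.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def sigmoid(a: float) -> float:
    """Numerically stable logistic function."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)


def affine(W: List[Vector], b: Vector, v: Vector) -> Vector:
    """Compute W . v + b for a weight matrix stored as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


def lstm_step(x_t: Vector, h_prev: Vector, c_prev: Vector,
              p: Dict[str, list]) -> Tuple[Vector, Vector]:
    """One LSTM time step; returns (h_t, C_t)."""
    z = h_prev + x_t                                               # [h_{t-1}, x_t]
    f = [sigmoid(a) for a in affine(p["W_f"], p["b_f"], z)]        # forget gate
    i = [sigmoid(a) for a in affine(p["W_i"], p["b_i"], z)]        # input gate
    c_hat = [math.tanh(a) for a in affine(p["W_C"], p["b_C"], z)]  # candidate state
    o = [sigmoid(a) for a in affine(p["W_o"], p["b_o"], z)]        # output gate
    c = [ft * cp + it * ch for ft, cp, it, ch in zip(f, c_prev, i, c_hat)]
    h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]
    return h, c
```

With all weights and biases at zero every gate outputs 0.5 and the candidate is 0, so the cell keeps exactly half of $C_{t-1}$: the forget gate alone decides how much of the old state survives, which is what lets information persist across long spans.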

Vector Space Models
- Distributional hypothesis: words that occur in the same contexts tend to have similar meanings.
- Count-based methods (Latent Semantic Analysis, ...)
- Neural probabilistic language models (word embeddings)
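The count-based side of the distributional hypothesis can be illustrated with raw co-occurrence vectors compared by cosine similarity. This sketch and its three-sentence toy corpus are hypothetical; a method such as LSA would additionally reduce the count matrix with a truncated SVD, which is omitted here.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List


def cooccurrence_vectors(sentences: List[List[str]], window: int = 1) -> Dict[str, Counter]:
    """For each word, count the words within +-window positions of it."""
    vectors: Dict[str, Counter] = defaultdict(Counter)
    for sent in sentences:
        for i, word in enumerate(sent):
            for j in range(max(0, i - window), min(len(sent), i + window + 1)):
                if j != i:
                    vectors[word][sent[j]] += 1
    return vectors


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * v[ctx] for ctx, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


corpus = [["I", "drink", "coffee"], ["I", "drink", "café"], ["I", "read", "books"]]
vecs = cooccurrence_vectors(corpus)
```

Because "coffee" and "café" both appear only next to "drink", their vectors are identical (cosine 1.0), while "books" shares no context with them (cosine 0.0), even though the model never sees any translation information.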

Word2vec
- The main component of the neural-network approach
- Represents each feature as a vector in a low-dimensional space
- Continuous Bag-of-Words (CBOW) model vs. Skip-Gram model
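The difference between the two word2vec models lies in what is predicted from what: CBOW predicts the center word from its surrounding context, while Skip-Gram predicts each context word from the center word. The sketch below only generates the training examples each model would see (the function names are illustrative); the shallow network that word2vec then trains on them is not shown.

```python
from typing import List, Tuple


def _context(tokens: List[str], i: int, window: int) -> List[str]:
    """Tokens within +-window positions of position i, excluding i itself."""
    lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
    return [tokens[j] for j in range(lo, hi) if j != i]


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Skip-Gram: each (center, context word) pair is one prediction target."""
    return [(center, ctx) for i, center in enumerate(tokens)
            for ctx in _context(tokens, i, window)]


def cbow_examples(tokens: List[str], window: int = 2) -> List[Tuple[List[str], str]]:
    """CBOW: the whole context window is used to predict the center word."""
    return [(_context(tokens, i, window), center) for i, center in enumerate(tokens)]
```

For the tokens `["a", "b", "c"]` with `window=1`, Skip-Gram yields four (center, context) pairs, whereas CBOW yields one example per position, such as `(["a", "c"], "b")`.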

Word Embeddings
[Figure by Yoav Goldberg]

Code-switching Detection
- System Architecture
- Implementation Details
- Results
- Summary

System Architecture: LSTM-CRF for Code-switching Detection
Our neural network architecture consists of three layers:
- Input layer: both character and word embeddings
- Hidden layer: two LSTMs map the word and character representations to hidden sequences
- Output layer: a softmax or a CRF computes the probability distribution over all labels
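The softmax variant of the output layer can be sketched as follows. This is a hypothetical illustration, not the authors' implementation: it assumes the word-level and character-level hidden vectors for a token are concatenated before a single affine layer, and the vectors and weights passed in are plain lists with made-up values. The label set is the one used in the Spanish-English result tables.

```python
import math
from typing import Dict, List

# Label set as it appears in the Spanish-English result tables.
LABELS = ["ambiguous", "fw", "lang1", "lang2", "mixed", "ne", "other", "unk"]


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def label_distribution(word_vec: List[float], char_vec: List[float],
                       W: List[List[float]], b: List[float]) -> Dict[str, float]:
    """Softmax output layer over the concatenated word- and character-level vectors.

    Assumption: in the full system word_vec and char_vec would be the hidden
    states of the word LSTM and the character LSTM for one token.
    """
    features = word_vec + char_vec  # list concatenation
    scores = [sum(w * x for w, x in zip(row, features)) + bias
              for row, bias in zip(W, b)]
    return dict(zip(LABELS, softmax(scores)))
```

A softmax layer like this labels each token independently; the CRF alternative instead scores whole label sequences, so it can take label-to-label transitions into account.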

System Architecture
[Figure]

Implementation Details
- Pre-trained word embeddings
- Character embeddings
- Regularization: dropout
- Output layer: softmax or CRF
- Training: stochastic gradient descent optimizing a cross-entropy objective function
- Hyper-parameters tuned on the dev set
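The training ingredients listed above (dropout, a cross-entropy objective, SGD updates) can be illustrated on a single linear softmax layer. This is a deliberately reduced sketch, not the full LSTM-CRF training loop: the learning rate and dropout rate are hypothetical, and with a CRF output layer the objective would instead be the negative log-likelihood of the whole label sequence.

```python
import math
import random
from typing import List


def dropout(xs: List[float], rate: float, rng: random.Random) -> List[float]:
    """Inverted dropout (training time): zero each unit with probability `rate`
    and rescale survivors by 1/(1-rate) so expected activations are unchanged."""
    keep = 1.0 - rate
    return [x / keep if rng.random() < keep else 0.0 for x in xs]


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def sgd_step(W: List[List[float]], b: List[float], x: List[float],
             gold: int, lr: float = 0.1) -> float:
    """One in-place SGD update of a linear softmax layer on a single example.

    Returns the cross-entropy loss before the update. For softmax plus
    cross-entropy, dLoss/dscore_k = p_k - [k == gold].
    """
    scores = [sum(w * xi for w, xi in zip(row, x)) + bk for row, bk in zip(W, b)]
    probs = softmax(scores)
    loss = -math.log(probs[gold])
    for k, (row, p) in enumerate(zip(W, probs)):
        g = p - (1.0 if k == gold else 0.0)
        for j, xj in enumerate(x):
            row[j] -= lr * g * xj
        b[k] -= lr * g
    return loss
```

Repeating `sgd_step` on the same example lowers its loss, since each update raises the gold label's score and lowers the others in proportion to their predicted probabilities.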

Results on Spanish-English Dev Set
Columns: CRF (feats), CRF (emb), CRF (feats+emb), word LSTM, char LSTM, char-word LSTM
Rows (labels): ambiguous, fw, lang1, lang2, mixed, ne, other, unk, Accuracy
Table: F1 score results on the Spanish-English development dataset

Results on MSA-Egyptian Dev Set
Columns: CRF (feats), CRF (emb), CRF (feats+emb), word LSTM, char LSTM, char-word LSTM
Rows (labels): ambiguous, lang1, lang2, mixed, ne, other, Accuracy
Table: F1 score results on the MSA-Egyptian development dataset

Tweet-Level Results
Columns: Es-En, MSA
Rows: Monolingual F, Code-switched F, Weighted F
Table: Tweet-level results on the test dataset
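Tweet-level scores require two steps: deriving a tweet class from its token labels, then scoring predicted against gold tweet classes. The sketch below makes two assumptions that are not stated on the slide: a tweet counts as code-switched when it contains both `lang1` and `lang2` tokens, and "Weighted F" is the per-class F1 averaged with weights proportional to the number of gold tweets in each class.

```python
from collections import Counter
from typing import List, Sequence


def tweet_class(token_labels: Sequence[str]) -> str:
    """Assumed rule: code-switched iff the tweet has both lang1 and lang2 tokens."""
    return "code-switched" if {"lang1", "lang2"} <= set(token_labels) else "monolingual"


def f1(gold: List[str], pred: List[str], cls: str) -> float:
    """F1 for one class, using 2TP / (2TP + FP + FN)."""
    tp = sum(g == p == cls for g, p in zip(gold, pred))
    fp = sum(p == cls and g != cls for g, p in zip(gold, pred))
    fn = sum(g == cls and p != cls for g, p in zip(gold, pred))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def weighted_f1(gold: List[str], pred: List[str]) -> float:
    """Per-class F1 averaged with weights equal to each class's gold support."""
    support = Counter(gold)
    return sum(n * f1(gold, pred, cls) for cls, n in support.items()) / len(gold)
```

For instance, with two monolingual and two code-switched gold tweets where one monolingual tweet is misclassified, the per-class F1 scores are 2/3 and 0.8, and the weighted F is 11/15.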

Token-Level Results (Spanish-English)
Columns: Recall, Precision, F-score
Rows (labels): ambiguous, fw, lang1, lang2, mixed, ne, other, unk, Accuracy
Table: Token-level results on the Spanish-English test dataset

Token-Level Results (MSA-DA)
Columns: Recall, Precision, F-score
Rows (labels): ambiguous, fw, lang1, lang2, mixed, ne, other, unk, Accuracy
Table: Token-level results on the MSA-DA test dataset
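The token-level tables report recall, precision and F-score per label plus overall accuracy. The sketch below shows the standard way to compute these from aligned gold and predicted label sequences; it is not the shared task's official scorer, and the toy sequences used to exercise it are hypothetical.

```python
from typing import Dict, List, Tuple


def per_label_scores(gold: List[str], pred: List[str]) -> Dict[str, Tuple[float, float, float]]:
    """Token-level (recall, precision, F-score) for every label seen in gold or pred."""
    scores = {}
    for label in sorted(set(gold) | set(pred)):
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        n_pred = sum(p == label for p in pred)
        n_gold = sum(g == label for g in gold)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_gold if n_gold else 0.0
        f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[label] = (recall, precision, f)
    return scores


def accuracy(gold: List[str], pred: List[str]) -> float:
    """Fraction of tokens whose predicted label matches the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)
```

Recall is computed relative to the gold count of a label and precision relative to the predicted count, so a system that over-predicts `lang2` will show high `lang2` recall but lower precision.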

Char-word Representation: Spanish-English (CRF Model)
[Figure]

Char-word Representation: MSA-Egyptian (CRF Model)
[Figure]

CRF Model
Columns: Most likely, Score, Most unlikely, Score
unk unk lang 1 mixed ne ne mixed lang fw fw amb other lang1 lang ne mixed lang 2 lang mixed other other other fw lang lang1 ne ne lang other lang unk ne lang2 mixed lang2 lang lang1 other lang1 lang
Table: Most likely and unlikely transitions learned by the CRF model for the Spanish-English dataset.
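Transition scores like those in the table enter prediction through Viterbi decoding, the standard way a linear-chain CRF finds its highest-scoring label sequence. This is a minimal sketch: the emission and transition values used to exercise it are made up and are not the learned scores, and transitions missing from the dictionary default to 0.0.

```python
from typing import Dict, List, Tuple


def viterbi(emissions: List[Dict[str, float]],
            transitions: Dict[Tuple[str, str], float]) -> List[str]:
    """Highest-scoring label sequence for a linear-chain CRF.

    emissions[t][y]     : score of label y at token t (e.g. from the LSTM layer)
    transitions[(a, b)] : score of moving from label a to label b (0.0 if absent)
    """
    labels = list(emissions[0])
    best = dict(emissions[0])  # best score of any path ending in each label
    backptrs = []
    for em in emissions[1:]:
        new_best, ptr = {}, {}
        for y in labels:
            prev = max(labels, key=lambda a: best[a] + transitions.get((a, y), 0.0))
            new_best[y] = best[prev] + transitions.get((prev, y), 0.0) + em[y]
            ptr[y] = prev
        best = new_best
        backptrs.append(ptr)
    # Follow back-pointers from the best final label.
    path = [max(labels, key=best.get)]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    return path[::-1]
```

With emissions that slightly favor `lang2` on the second token, the decoder outputs `["lang1", "lang2"]`; adding a penalty of -1.0 on the `lang1` to `lang2` transition flips the decision to `["lang1", "lang1"]`. This illustrates how an unlikely transition learned by the CRF can override a weak per-token preference.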

Summary
- Automatic identification of code-switching in tweets
- A unified neural network for language identification rivals state-of-the-art methods that rely on language-specific tools
What next?
- Implement a character-aware bidirectional LSTM to capture word morphology
- Employ a more sophisticated CNN-bidirectional LSTM

Thank you for your attention! Questions?
