Incorporating Latent Meanings of Morphological Compositions to Enhance Word Embeddings
Yang Xu, Jiawei Liu, Wei Yang, and Liusheng Huang
School of Computer Science and Technology, University of Science and Technology of China, Hefei, 230027, China
July 17th, 2018

OUTLINE
01 Introduction
02 Latent Meaning Models
03 Experimental Setup
04 Experimental Results
05 Conclusion

01 Introduction

Word-level Word Embedding
01 Neural Network-Based: e.g., CBOW and Skip-gram (Mikolov et al.)
02 Matrix Factorization-Based ("spectral" methods): built on a word-word co-occurrence matrix, e.g., GloVe (Pennington et al.)
[Figure: CBOW and Skip-gram architectures (INPUT, PROJECTION, OUTPUT layers); CBOW sums the context words w(t-2), w(t-1), w(t+1), w(t+2) to predict w(t), while Skip-gram predicts the context from w(t)]

Morphology-based Word Embedding
01 Training models that jointly learn word embeddings and morpheme embeddings (prefix / root / suffix), e.g., incredible = in + cred + ible
02 Generative models that generate word vectors from morpheme embeddings
[Figure: two paradigms linking morpheme embeddings (prefix, root, suffix) and word embeddings]

Our Original Intention
Word-level models: Input: Words → Output: Word Embeddings
Morphology-based models: Input: Words + Morphemes → Output: Word Embeddings + Morpheme Embeddings
Our Latent Meaning Models: Input: Words + Latent Meanings of Morphemes → Output: Word Embeddings (no by-product such as morpheme embeddings)
PURPOSE: not only to encode morphological properties into words, but also to enhance the semantic similarities among word embeddings

Explicit Models & Our Models
Explicit models directly use the morphemes; our models employ the latent meanings of the morphemes.
Corpus:
  sentence i: "it is an incredible thing"      → in | cred | ible   → latent meanings: in, not | believe | able, capable
  sentence j: "it is unbelievable that ..."    → un | believ | able → latent meanings: not | believe | able, capable
Lookup table:
  Prefix: in → in, not; un → not
  Root:   cred → believe; believ → believe
  Suffix: ible → able, capable; able → able, capable
*Note: The lookup table can be derived from morphological lexicons.

02 Latent Meaning Models

CBOW with Negative Sampling
Input: a sequence of tokens; the context words $t_{i-2}, t_{i-1}, t_{i+1}, t_{i+2}$ are summed in the projection layer to predict the target word $t_i$.
Objective Function: $L = \frac{1}{n} \sum_{i=1}^{n} \log p(t_i \mid \mathrm{Context}(t_i))$
Negative Sampling: the full softmax is approximated by scoring the target against a few sampled negative words.
[Figure: CBOW architecture (INPUT, PROJECTION, OUTPUT)]
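As a minimal illustration (not the authors' code), the sketch below computes the CBOW negative-sampling loss for one training position: the context vectors are summed and scored against the true target and a few randomly sampled negatives. Vocabulary, dimensions, and vectors are toy assumptions.

import numpy as np

rng = np.random.default_rng(0)
vocab = ["it", "is", "an", "incredible", "thing"]
idx = {w: i for i, w in enumerate(vocab)}
dim = 8
V_in = rng.normal(scale=0.1, size=(len(vocab), dim))   # input (context) embeddings
V_out = rng.normal(scale=0.1, size=(len(vocab), dim))  # output (target) embeddings

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cbow_ns_loss(context, target, k=2):
    """Negative-sampling loss for one (context, target) pair."""
    h = V_in[[idx[w] for w in context]].sum(axis=0)     # summed context vector (projection layer)
    pos = np.log(sigmoid(V_out[idx[target]] @ h))       # pull the true target towards the context
    negs = rng.choice(len(vocab), size=k)               # k random negative samples
    neg = np.log(sigmoid(-V_out[negs] @ h)).sum()       # push negatives away from the context
    return -(pos + neg)

print(cbow_ns_loss(["it", "is", "an", "thing"], "incredible"))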

Three Specific Models
01 LMM-A (Latent Meaning Model-Average)
02 LMM-S (Latent Meaning Model-Similarity)
03 LMM-M (Latent Meaning Model-Max)

Word Map
An item of the Word Map maps a word to the latent meanings of its prefix, root, and suffix; #rows = vocabulary size.
Morpheme segmentation:  incredible → in | cred | ible;   unbelievable → un | believ | able
Lookup table:  Prefix: in → in, not; un → not  |  Root: cred → believe; believ → believe  |  Suffix: ible → able, capable; able → able, capable
Resulting Word Map entries:  incredible → in, not | believe | able, capable;   unbelievable → not | believe | able, capable
*Note: We are mainly concerned with derivational morphemes, not inflectional morphemes.
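The toy sketch below (assumed data, not the released lookup table) shows how a Word Map row can be built: segment a word into prefix/root/suffix and replace each morpheme with its latent meanings from the lookup table.

prefix_lm = {"in": ["in", "not"], "un": ["not"]}
root_lm   = {"cred": ["believe"], "believ": ["believe"]}
suffix_lm = {"ible": ["able", "capable"], "able": ["able", "capable"]}

def word_map_entry(prefix, root, suffix):
    """Collect the latent meanings of a word's morphemes (order preserved, no duplicates)."""
    meanings = []
    for table, morpheme in ((prefix_lm, prefix), (root_lm, root), (suffix_lm, suffix)):
        for m in table.get(morpheme, []):
            if m not in meanings:
                meanings.append(m)
    return meanings

word_map = {
    "incredible":   word_map_entry("in", "cred", "ible"),    # ['in', 'not', 'believe', 'able', 'capable']
    "unbelievable": word_map_entry("un", "believ", "able"),  # ['not', 'believe', 'able', 'capable']
}
print(word_map)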

Latent Meaning Model-Average (LMM-A)
The latent meanings of a context word's morphemes contribute equally to its modified embedding.
The modified embedding of context word $t_j$: $\hat{v}_{t_j} = \frac{1}{|M_j|} \sum_{w \in M_j} v_w$, where $M_j$ is the set of latent meanings of $t_j$'s morphemes and $|M_j|$ is its size.
A paradigm of LMM-A: for the training sentence "it is an incredible thing", the context word "incredible" (Word Map entry: in, not | believe | able, capable) is replaced in the input layer by the sum of its five latent-meaning vectors, each weighted 1/5, before the context is summed to predict the target $t_i$.
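A minimal sketch of the LMM-A idea under the notation above: the modified input embedding of a context word is the unweighted average of its latent-meaning vectors. The embedding lookup and dimensions are toy assumptions.

import numpy as np

rng = np.random.default_rng(0)
latent = ["in", "not", "believe", "able", "capable"]
embedding = {w: rng.normal(size=4) for w in latent}  # toy 4-d latent-meaning vectors

def lmm_a(latent_meanings, embedding):
    """Equal weights 1/|M_j| for every latent meaning of the context word."""
    vecs = np.stack([embedding[w] for w in latent_meanings])
    return vecs.mean(axis=0)

v_incredible_hat = lmm_a(latent, embedding)  # replaces v_incredible in the input layer
print(v_incredible_hat)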

Latent Meaning Model-Similarity (LMM-S)
The latent meanings of a context word's morphemes are assigned different weights according to their similarity to the word:
$\omega_{\langle t_j, w \rangle} = \dfrac{\cos(v_{t_j}, v_w)}{\sum_{x \in M_j} \cos(v_{t_j}, v_x)}$, where $w \in M_j$ and $M_j$ is the set of latent meanings of $t_j$'s morphemes.
The modified embedding of context word $t_j$: $\hat{v}_{t_j} = \sum_{w \in M_j} \omega_{\langle t_j, w \rangle} \, v_w$.
A paradigm of LMM-S: for "it is an incredible thing", the latent meanings in, not, believe, able, capable each receive their own weight $\omega$ before being summed into the input layer.
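A sketch of LMM-S: each latent meaning w of context word t_j is weighted by cos(v_tj, v_w) normalized over all latent meanings in M_j, and the weighted sum replaces v_tj in the input layer. Vectors are toy data, not the authors' code.

import numpy as np

rng = np.random.default_rng(1)
dim = 4
v_tj = rng.normal(size=dim)                                   # current vector of "incredible"
M_j = {w: rng.normal(size=dim) for w in ["in", "not", "believe", "able", "capable"]}

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def lmm_s(v_tj, M_j):
    sims = {w: cos(v_tj, v_w) for w, v_w in M_j.items()}
    total = sum(sims.values())
    weights = {w: s / total for w, s in sims.items()}         # omega_<tj, w>
    v_hat = sum(weights[w] * M_j[w] for w in M_j)             # weighted sum of latent meanings
    return v_hat, weights

v_hat, weights = lmm_s(v_tj, M_j)
print(weights)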

Latent Meaning Model-Max (LMM-M)
Keep only the latent meanings that have maximum similarity to $t_j$ within its prefix, root, and suffix sets:
$P_j^{max} = \arg\max_{w} \cos(v_{t_j}, v_w),\; w \in P_j$
$R_j^{max} = \arg\max_{w} \cos(v_{t_j}, v_w),\; w \in R_j$
$S_j^{max} = \arg\max_{w} \cos(v_{t_j}, v_w),\; w \in S_j$
$M_j^{max} = \{P_j^{max}, R_j^{max}, S_j^{max}\}$
The modified embedding of context word $t_j$ is then built from $M_j^{max}$.
A paradigm of LMM-M: for "it is an incredible thing", only not, believe, and able (the most similar latent meanings of the prefix, root, and suffix) are summed into the input layer.
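A sketch of LMM-M: from the prefix, root, and suffix latent-meaning sets, keep only the meaning most similar to the current vector of t_j, then combine the selected vectors (averaged here, which is an assumption) as the modified input embedding. Toy data throughout.

import numpy as np

rng = np.random.default_rng(2)
dim = 4
v_tj = rng.normal(size=dim)
P_j = {"in": rng.normal(size=dim), "not": rng.normal(size=dim)}        # prefix latent meanings
R_j = {"believe": rng.normal(size=dim)}                                # root latent meanings
S_j = {"able": rng.normal(size=dim), "capable": rng.normal(size=dim)}  # suffix latent meanings

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def argmax_meaning(v_tj, meanings):
    return max(meanings, key=lambda w: cos(v_tj, meanings[w]))

M_j_max = [argmax_meaning(v_tj, s) for s in (P_j, R_j, S_j) if s]      # e.g. ['not', 'believe', 'able']
all_vecs = dict(**P_j, **R_j, **S_j)
v_hat = np.mean([all_vecs[w] for w in M_j_max], axis=0)                # combine the selected meanings
print(M_j_max)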

Update Rules for LMMs
New objective function (after modifying the input layer of CBOW):
$\hat{L} = \frac{1}{n} \sum_{i=1}^{n} \log p\big(v_{t_i} \mid \hat{v}_{t_j},\; t_j \in \mathrm{Context}(t_i)\big)$
All parameters introduced by our models can be derived directly from the word map and the word embeddings.
During back-propagation we update not only the word embeddings but also the embeddings of the latent meanings, with the same weights they were assigned in the forward-propagation period.
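A sketch of this update idea: when a context vector is a weighted sum of latent-meaning vectors, the gradient CBOW would apply to that context vector is redistributed to each latent-meaning embedding with its forward-pass weight (chain rule). Gradient values, learning rate, and names are assumptions.

import numpy as np

lr = 0.025
grad_v_hat = np.ones(4) * 0.1                     # gradient w.r.t. the composed context vector
weights = {"in": 0.2, "not": 0.2, "believe": 0.2, "able": 0.2, "capable": 0.2}
embedding = {w: np.zeros(4) for w in weights}     # latent-meaning embeddings

for w, omega in weights.items():
    # d(v_hat)/d(v_w) = omega, so each latent meaning receives omega * grad_v_hat
    embedding[w] -= lr * omega * grad_v_hat

print(embedding["not"])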

03 Experimental Setup

Corpus & Word Map
Corpus:
- News corpus of 2009 (2013 ACL Eighth Workshop)
- Size: 1.7 GB, ~500 million tokens, ~600,000 words
- Digits & punctuation marks are filtered
Word Map:
- Morpheme segmentation using Morfessor (Creutz & Lagus, 2007)
- Latent meanings assigned via a lookup table derived from the resources provided by Michigan State University*
- 90 prefixes, 382 roots, 67 suffixes
*Resources web link: https://msu.edu/~defores1/gre/roots/gre_rts_afx1.htm
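A small, assumed preprocessing sketch in the spirit of the setup above: lower-case the corpus and drop digits and punctuation before training (the morpheme segmentation itself is done with Morfessor and is not shown here).

import re

def preprocess(line):
    line = line.lower()
    line = re.sub(r"[0-9]+", " ", line)      # filter digits
    line = re.sub(r"[^\w\s]|_", " ", line)   # filter punctuation marks
    return line.split()

print(preprocess("It is an incredible thing, isn't it? (2009)"))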

Baselines & Parameter Settings
Baselines:
- Word-level models: CBOW, Skip-gram, GloVe
- Explicitly Morpheme-related Model (EMM)
A paradigm of EMM: for "it is an incredible thing", the morphemes themselves (prefix in, root cred, suffix ible) are summed into the input layer.
Hyper-parameter settings (equal for all models):
- Vector dimension: 200
- Context window size: 5
- #Negative samples: 20
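As a point of reference (not the authors' toolchain), the CBOW and Skip-gram baselines can be trained with gensim 4.x using the hyper-parameters listed above; the corpus path and min_count are assumptions, not values from the talk.

from gensim.models import Word2Vec
from gensim.models.word2vec import LineSentence

sentences = LineSentence("corpus.txt")   # one preprocessed sentence per line (placeholder path)
cbow = Word2Vec(sentences, vector_size=200, window=5, negative=20, sg=0, workers=4)
skipgram = Word2Vec(sentences, vector_size=200, window=5, negative=20, sg=1, workers=4)
cbow.wv.save_word2vec_format("cbow.vec")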

Evaluation Benchmarks (1/2)
Word Similarity (gold-standard, widely-used datasets):
  Dataset          #Pairs
  RG-65            65
  Wordsim-353      353
  Rare-Word        2034
  SCWS             2003
  Men-3k           3000
  WS-353-Related   252
Syntactic Analogy:
- a : b as c : ? (d), e.g., Queen : King as Woman : ? (Man)
- Microsoft Research Syntactic Analogies dataset (8000 items)
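A sketch of the word-similarity protocol: cosine similarity between the two word vectors of each pair, then Spearman's rank correlation against the human scores. The embeddings and the three pairs below are made-up placeholders, not dataset entries.

import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(3)
emb = {w: rng.normal(size=200) for w in ["king", "queen", "car", "cat", "dog", "cup"]}

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

pairs = [("king", "queen", 8.5), ("cat", "dog", 7.0), ("car", "cup", 1.5)]  # (w1, w2, human score)
model_scores = [cos(emb[w1], emb[w2]) for w1, w2, _ in pairs]
human_scores = [s for _, _, s in pairs]
rho, _ = spearmanr(model_scores, human_scores)
print(f"Spearman correlation: {100 * rho:.2f}%")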

Evaluation Benchmarks (2/2)
Text Classification:
- 20 Newsgroups dataset (19,000 documents of 20 different topics)
- 4 text classification tasks, each involving 10 topics
- Training/Validation/Test subsets (6:2:2)
- Feature vector: average word embedding of the words in each document
- L2-regularized logistic regression classifier
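A sketch of this classification setup: each document is represented by the average embedding of its words and fed to an L2-regularized logistic regression. Documents, labels, and embeddings are toy assumptions.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)
emb = {w: rng.normal(size=200) for w in ["space", "orbit", "rocket", "game", "goal", "team"]}

def doc_vector(tokens, emb):
    vecs = [emb[t] for t in tokens if t in emb]
    return np.mean(vecs, axis=0) if vecs else np.zeros(200)

docs = [["space", "rocket", "orbit"], ["team", "goal", "game"], ["orbit", "space"], ["game", "team"]]
labels = [0, 1, 0, 1]
X = np.stack([doc_vector(d, emb) for d in docs])
clf = LogisticRegression(penalty="l2", C=1.0, max_iter=1000).fit(X, labels)
print(clf.predict(X))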

04 Experimental Results

The Results on Word Similarity
Spearman's rank correlation (%) on different datasets, given different models:

  Dataset          CBOW   Skip-gram  GloVe  EMM    LMM-A  LMM-S  LMM-M
  Wordsim-353      58.77  61.94      49.40  60.01  62.05  63.13  61.54
  Rare-Word        40.58  36.42      33.40  40.83  43.12  42.14  40.51
  RG-65            56.50  62.81      59.92  60.85  62.51  62.49  63.07
  SCWS             63.13  60.20      47.98  60.28  61.86  61.71  63.02
  Men-3k           68.07  66.30      60.56  66.76  66.26  68.36  64.65
  WS-353-Related   49.72  57.05      47.46  54.48  56.14  58.47  55.19

The Results on Syntactic Analogy
Question: a : b as c : ?  Answer: (d)
Syntactic analogy performance (%):

  CBOW   Skip-gram  GloVe  EMM    LMM-A  LMM-S  LMM-M
  13.46  13.14      13.94  17.34  20.38  17.59  18.30

The Results on Text Classification
Average text classification accuracy across the 4 tasks (%):

  CBOW   Skip-gram  GloVe  EMM    LMM-A  LMM-S  LMM-M
  78.26  79.40      77.01  80.00  80.67  80.59  81.28

The Impact of Corpus Size
[Figure: Results on the Wordsim-353 task with different corpus sizes]

The Impact of Context Window Size
[Figure: Results on the Wordsim-353 task with different context window sizes]

Word Embedding Visualization
[Figure: PCA-based visualization of word embeddings and the latent meanings of morphemes]

05 Conclusions

Conclusions
- We employ the latent meanings of morphemes, rather than the morphological compositions themselves, to train word embeddings.
- By modifying the input layer and update rules of CBOW, we proposed three latent meaning models (LMM-A, LMM-S, LMM-M).
- The comprehensive quality of word embeddings is enhanced by incorporating the latent meanings of morphemes.
- In the future, we intend to evaluate our models on morphologically rich languages such as Russian and German.

Thank you! Questions?