LATENT SEMANTIC WORD SENSE DISAMBIGUATION USING GLOBAL CO-OCCURRENCE INFORMATION


Minoru Sasaki
Department of Computer and Information Sciences, Faculty of Engineering, Ibaraki University, 4-12-1, Nakanarusawa, Hitachi, Ibaraki, Japan
msasaki@mx.ibaraki.ac.jp

David C. Wyld et al. (Eds): CCSIT, SIPP, AISC, PDCA, NLP - 2014, pp. 463-468, 2014. CS & IT-CSCP 2014. DOI: 10.5121/csit.2014.4240

ABSTRACT

In this paper, I propose a novel word sense disambiguation (WSD) method based on global co-occurrence information using non-negative matrix factorization (NMF). When the dependency relation matrix is calculated, the existing method tends to produce a very sparse co-occurrence matrix from a small training set, so the NMF algorithm sometimes does not converge to a desired solution. To obtain a larger number of co-occurrence relations, I propose using the co-occurrence frequencies of dependency relations between word features over the whole training set. This alleviates the data sparseness problem and induces more effective latent features. To evaluate the efficiency of the method, I conduct experiments comparing it with two baseline methods. The results show that the method outperforms both baselines for word sense disambiguation. Moreover, the proposed method obtains a stable effect by analyzing the global co-occurrence information.

KEYWORDS

Word Sense Disambiguation, Global Co-occurrence Information, Dependency Relations, Non-Negative Matrix Factorization

1. INTRODUCTION

In natural language processing, acquiring sense examples from an example set that contains a given target word makes it possible to construct an extensive data set of tagged examples for a wide range of semantic analyses. For example, using such a data set, we can construct a classifier that identifies the sense of a target word by analyzing its co-occurrence statistics. We can also build a wide-coverage case frame dictionary automatically and construct a thesaurus for each meaning of a polysemous word. To construct large-sized training data, language dictionaries and thesauri, it is increasingly important to improve the selection of the most appropriate meaning of an ambiguous word.

If training data are available, the word sense disambiguation (WSD) task reduces to a classification problem based on supervised learning. This approach constructs a classifier from a set of manually sense-tagged training data; the classifier is then used to identify the appropriate sense of new examples. A typical method for this approach is the classical bag-of-words (BOW) approach [5], where each document is represented as a feature vector counting the number of occurrences of different words as features. With such features, it is easy to apply many existing supervised learning methods, such as Support Vector Machines (SVM) [1], to the WSD task. However, when the general vector space model, in which a document is represented as a vector using term-frequency-based weighting, is applied to WSD, previous research typically uses only the local context words as features and does not employ global context information without dictionary information.

In this paper, I propose a novel word sense disambiguation method based on global co-occurrence information using NMF. In previous research, [3] proposes a WSD method for particular word instances using automatically extracted sense information. When the dependency relation matrix is calculated, that method tends to produce a very sparse co-occurrence matrix from a small training set, so the NMF algorithm sometimes does not converge to a desired solution. To obtain a larger number of co-occurrence relations, I propose using the co-occurrence frequencies of dependency relations between word features over the whole training set. This alleviates the data sparseness problem and induces more effective latent features.

2. NON-NEGATIVE MATRIX FACTORIZATION

Non-negative matrix factorization (NMF) is a popular decomposition method for multivariate data [4]. NMF decomposes an m x n non-negative matrix X into an m x k matrix W and a k x n matrix H, where neither factor matrix has negative elements. Usually, k is chosen to be smaller than m and n:

X ≈ WH    (1)

Using NMF on a term-document matrix X, the matrix H represents a clustering result with k topics. To quantify the quality of this approximation, a cost function such as the Kullback-Leibler divergence or the squared Euclidean distance between X and WH is minimized with iterative multiplicative update rules; for the squared Euclidean distance, the updates are

W_{ia} ← W_{ia} (XH^T)_{ia} / (WHH^T)_{ia}    (2)

H_{aj} ← H_{aj} (W^TX)_{aj} / (W^TWH)_{aj}    (3)

where the subscripts ia and aj index the elements of W and H, respectively. The matrices W and H are initialized randomly with non-negative values, and the update rules are applied iteratively until the maximum iteration number (or convergence) is reached.
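As a concrete illustration of equations (2) and (3), here is a minimal NumPy sketch of NMF with multiplicative updates. The function name, the fixed iteration count, the initialization scheme and the small epsilon guarding against division by zero are illustrative choices of this sketch, not details given in the paper.

```python
import numpy as np

def nmf(X, k, n_iter=200, eps=1e-9, seed=0):
    """Factorize a non-negative X (m x n) into W (m x k) and H (k x n)
    using the multiplicative updates of equations (2) and (3)."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, k))                      # random non-negative init
    H = rng.random((k, n))
    for _ in range(n_iter):
        W *= (X @ H.T) / (W @ H @ H.T + eps)    # update rule (2)
        H *= (W.T @ X) / (W.T @ W @ H + eps)    # update rule (3)
    return W, H

# Usage: approximate a random non-negative matrix with k = 5 latent topics.
X = np.abs(np.random.default_rng(1).random((100, 40)))
W, H = nmf(X, k=5)
print(np.linalg.norm(X - W @ H))                # reconstruction error
```

In practice the loop would also monitor the reconstruction error and stop early once it no longer improves, matching the convergence criterion described above.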

3. WSD USING GLOBAL CO-OCCURRENCE INFORMATION

3.1. Latent Semantic WSD Using Local Co-occurrence Information

In previous research, [3] proposes a WSD method for particular word instances using automatically extracted sense information. This method induces latent features from three matrices. The first matrix A contains co-occurrence frequencies of the words that the target word co-occurs with. The second matrix B contains term frequencies of the words that appear in the context window. The third matrix C contains co-occurrence frequencies of the words that the co-occurring context words of the target word co-occur with. NMF is then applied to the three matrices to factorize each into two non-negative matrices, with the result of each factorization used to initialize the next, as shown in Figure 1.

Figure 1. Interleaved NMF algorithm for Latent Semantic WSD

Given non-negative matrices A, B and C, the matrices W, H, G and F are initialized randomly with non-negative values. The method first decomposes the matrix A into the matrices W and H using NMF. In the decomposition of the matrix B, the updated matrix W is copied to the matrix V, and the updated matrices V and G are computed using NMF. In the decomposition of the matrix C, the transpose of the updated matrix G is copied to the matrix U, and the updated matrices U and F are obtained using NMF. At the last step of the iteration, the matrix F is copied to the matrix H. This iteration is repeated until the maximum number of iterations is reached or the objective functions of all NMF decompositions no longer improve.

To perform WSD with this method, each sense of the target word is folded into the semantic space using the matrix H. For each sense in the training data, a centroid vector c is calculated, and this centroid is mapped into the semantic space using the matrix H:

b = cH    (4)

For an input example of the target word, its context words are extracted to construct a vector f, which is mapped into the same semantic space using the matrix G:

d = fG    (5)

The cosine similarity between the vector d and each of the sense vectors b is then calculated, and the sense with the largest cosine similarity is selected.
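The interleaved factorization of Figure 1 and the folding of equations (4) and (5) can be sketched as below. The matrix shapes (A: target words x dependency features, B: target words x context words, C: context words x dependency features) and the transposes used to make the products well-defined are my reading of the description, not specifications from the paper.

```python
import numpy as np

def update(X, W, H, eps=1e-9):
    """One pass of the multiplicative updates (2) and (3), in place."""
    W *= (X @ H.T) / (W @ H @ H.T + eps)
    H *= (W.T @ X) / (W.T @ W @ H + eps)
    return W, H

def interleaved_nmf(A, B, C, k, n_outer=50, seed=0):
    """Interleaved NMF of Figure 1: A ~ WH, B ~ VG, C ~ UF, linked by
    copying W -> V, G^T -> U and F -> H on every outer iteration."""
    rng = np.random.default_rng(seed)
    W = rng.random((A.shape[0], k)); H = rng.random((k, A.shape[1]))
    G = rng.random((k, B.shape[1])); F = rng.random((k, C.shape[1]))
    for _ in range(n_outer):
        W, H = update(A, W, H)   # step 1: factorize A
        V = W.copy()             # step 2: seed from W, factorize B
        V, G = update(B, V, G)
        U = G.T.copy()           # step 3: seed from G^T, factorize C
        U, F = update(C, U, F)
        H = F.copy()             # step 4: feed F back into H
    return W, H, G, F

def disambiguate(f, sense_centroids, H, G):
    """Fold vectors into the latent space (equations (4) and (5)) and pick
    the sense whose mapped centroid is most cosine-similar. The paper
    writes the mappings as b = cH and d = fG; the transposes here make
    the shapes line up for row vectors."""
    d = f @ G.T
    best, best_sim = None, -1.0
    for sense, c in sense_centroids.items():
        b = c @ H.T
        sim = (d @ b) / (np.linalg.norm(d) * np.linalg.norm(b) + 1e-9)
        if sim > best_sim:
            best, best_sim = sense, sim
    return best
```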

3.2. Latent Semantic WSD Using Global Co-occurrence Information

This latent semantic WSD method is efficient for finding a reduced semantic space. However, a problem arises when we apply it: when the third matrix C is calculated, the method tends to produce a very sparse co-occurrence matrix from a small training set. To obtain a larger number of co-occurrence relations, I propose using the co-occurrence frequencies of dependency relations between word features over the whole training set. This alleviates the data sparseness problem and induces more effective latent features.

Like the method above, the proposed method uses three matrices, but the third matrix differs from the previous method. The third matrix D contains co-occurrence frequencies of context words that co-occur in dependency relations with context words in a large document set. The proposed method induces latent features from the three matrices A, B and D.

4. EXPERIMENT

To evaluate the efficiency of the proposed WSD method using global co-occurrence information, I conduct experiments comparing it with the existing methods. This section describes the outline of the experiments.

4.1. Data

I used the SemEval-2010 Japanese WSD task data set, which includes 50 target words comprising 22 nouns, 23 verbs, and 5 adjectives [2]. In this data set, there are 50 training and 50 test instances for each target word. Moreover, to obtain a large number of co-occurrence relations, I use 22,832 documents chosen from the Japanese BCCWJ corpus (http://www.ninjal.ac.jp/english/products/bccwj/).

4.2. Evaluation Method

4.2.1. Baseline System 1

As the first baseline method, I use only the first matrix A described in Section 3.1. To construct the matrix A, I represent each sentence containing the target word in the training set as a high-dimensional vector whose components are the co-occurrence frequencies of words with the target word in that sentence. NMF is then applied to the matrix A to factorize it into two non-negative matrices W and H. Each vector is tagged with the sense of the target word in its sentence, so the centroid c_i of the co-occurrence vectors assigned the same sense i is calculated, and each centroid c_i is mapped into the semantic space using the matrix H:

b_i = c_i H    (6)

For an input example of the target word, its context words are extracted to construct a vector f, which is also mapped into the same semantic space using the matrix H:

d = fH    (7)

The cosine similarity between the vector d and each of the sense vectors b_i is then calculated, and the sense with the largest cosine similarity is selected.
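The key ingredient of the proposed method is the third matrix D, built from dependency relations over the whole document set. A sketch of its construction follows; the (head, dependent) pair interface is an assumption of this sketch, since the paper does not specify the dependency parser or its output format.

```python
from collections import Counter
import numpy as np

def build_global_matrix(dep_pairs, vocab):
    """Build the global co-occurrence matrix D of Section 3.2:
    D[i, j] counts how often context word i occurs in a dependency
    relation with context word j anywhere in the large document set.
    `dep_pairs` is an iterable of (head, dependent) word pairs, e.g.
    produced beforehand by a dependency parser."""
    index = {w: i for i, w in enumerate(vocab)}
    counts = Counter()
    for head, dep in dep_pairs:
        if head in index and dep in index:
            counts[(index[head], index[dep])] += 1
    D = np.zeros((len(vocab), len(vocab)))
    for (i, j), c in counts.items():
        D[i, j] = c
    return D

# Usage with toy pairs; real input would come from the parsed BCCWJ documents.
vocab = ["bank", "money", "river", "deposit"]
pairs = [("deposit", "bank"), ("bank", "money"), ("bank", "money"), ("bank", "river")]
D = build_global_matrix(pairs, vocab)
```

Whether to keep D directional or symmetrize it (D + D^T) is a design choice the paper leaves open; symmetrizing treats the dependency relation as undirected co-occurrence.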

4.2.2. Baseline System 2

As the second baseline system, I use the latent semantic WSD method based on local co-occurrence information described in Section 3.1. I construct the three matrices A, B and C to induce latent semantic dimensions using NMF.

5. EXPERIMENTAL RESULTS

Table 1. Experimental results of each execution

System              Run 1    Run 2    Run 3
Baseline System 1   53.28%   53.88%   51.80%
Baseline System 2   59.68%   61.08%   61.08%
Proposed Method     60.44%   61.48%   61.04%

Figure 2. Highest precision of each system (bar chart comparing Baseline 1, Baseline 2 and the proposed method; precision axis from 52% to 62%)

Figure 2 shows the experimental results of the baseline methods and the proposed method. Reported computational experience shows that the choice of initial point is quite important to the quality of the NMF solution; in practice, the algorithm is run several times with different initial points and the best feasible solution is chosen. In these experiments, each method is executed three times and the average precision over all executions is calculated. As Figure 2 shows, the proposed method achieves higher precision than both baseline methods, so the approach is effective for word sense disambiguation. In comparison with baseline system 1, the proposed method obtains better precision, which shows that using context information and co-occurrence information is effective for WSD. In comparison with baseline system 2, the proposed method provides slightly better precision. As shown in Table 1, the proposed method obtains the highest precision and stays stable at a high precision value, whereas baseline system 2 cannot achieve stable precision because it lacks sufficient co-occurrence information. Therefore, the proposed method is effective for obtaining a stable effect by analyzing the global co-occurrence information.
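Since NMF is sensitive to its random initial point, the experiments run each system three times and average the precision. A minimal sketch of that protocol, where `train_and_eval` is a hypothetical callable wrapping the whole factorize-and-score pipeline (not an interface from the paper):

```python
import numpy as np

def average_precision_over_runs(train_and_eval, n_runs=3):
    """Run the pipeline with different random seeds and average precision,
    as in Section 5. `train_and_eval(seed) -> float` factorizes the
    matrices with that seed and returns test-set precision."""
    scores = [train_and_eval(seed) for seed in range(n_runs)]
    return float(np.mean(scores)), scores

# Usage with a dummy evaluator standing in for the real pipeline.
mean_p, runs = average_precision_over_runs(lambda seed: 0.60 + 0.005 * seed)
print(f"runs: {runs}, average precision: {mean_p:.4f}")
```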

6. CONCLUSION

In this paper, I proposed a novel word sense disambiguation method based on global co-occurrence information using NMF. To evaluate the efficiency of the method, I conducted experiments comparing it with two baseline methods. The results show that the method outperforms both baselines for word sense disambiguation. Moreover, the proposed method is effective for obtaining a stable effect by analyzing the global co-occurrence information. Further work is required on larger training data to obtain a greater amount of co-occurrence information.

REFERENCES

[1] Corinna Cortes & Vladimir Vapnik, (1995) "Support-Vector Networks", Machine Learning, Vol. 20, No. 3, pp. 273-297.
[2] Okumura, Manabu & Shirai, Kiyoaki & Komiya, Kanako & Yokono, Hikaru, (2010) "SemEval-2010 Task: Japanese WSD", Proceedings of the 5th International Workshop on Semantic Evaluation, pp. 69-74.
[3] Van de Cruys, Tim & Apidianaki, Marianna, (2011) "Latent Semantic Word Sense Induction and Disambiguation", Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1, pp. 1476-1485.
[4] Lee, Daniel D. & Seung, H. Sebastian, (2001) "Algorithms for Non-negative Matrix Factorization", Advances in Neural Information Processing Systems 13: Proceedings of the 2000 Conference, MIT Press, pp. 556-562.
[5] Witten, Ian H. & Moffat, Alistair & Bell, Timothy C., (1999) Managing Gigabytes (second edition): Compressing and Indexing Documents and Images, Morgan Kaufmann Publishers Inc.

AUTHORS

Minoru Sasaki received his B.E., M.E. and D.Eng. degrees in information science and intelligent systems from the University of Tokushima in 1996, 1998 and 2000. He is now a lecturer in the Department of Computer and Information Sciences at Ibaraki University. His research interests include natural language processing, information retrieval and text mining.