Word Sense Determination from Wikipedia Data Using a Neural Net

CS 297 Report

Presented to Dr. Chris Pollett
Department of Computer Science
San Jose State University

by Qiao Liu

May 2017

Table of Contents

Introduction
Deliverable 1
    The MNIST Dataset
    Softmax Regression
    Implementation
    Manually Visualize Learning
    TensorBoard
Deliverable 2
    Introduction to Word Embedding
    One Application Example
    Apply to the Project
Deliverable 3
    The Dictionary of Ambiguous Words
    Preprocessing Data
Conclusion
References

Introduction

Many words carry different meanings depending on their context. For instance, "apple" could refer to a fruit, a company, or a film. The ability to identify an entity (such as "apple") from the context in which it occurs has been established as an important task in several areas, including topic detection and tracking, machine translation, and information retrieval. Our aim is to build an entity disambiguation system.

The Wikipedia data set has been used in many research projects. One project similar to ours is Large-Scale Named Entity Disambiguation Based on Wikipedia Data, by Silviu Cucerzan [1]. In that work, information is extracted from the titles of entity pages, the titles of redirecting pages, the disambiguation pages, and the references to entity pages in other Wikipedia articles [1]. The disambiguation process employs a vector space model in which a vector representation of the processed document is compared with the vector representations of the Wikipedia entities [1].

In our project, we use the English Wikipedia dataset as a source of word senses, and word embeddings to determine the sense of a word within a given context. Word embeddings were introduced by Bengio et al. [2]. A word embedding is a parameterized function mapping words in some language to high-dimensional vectors. Methods to generate this mapping include neural networks, dimensionality reduction on the word co-occurrence matrix, probabilistic models, and explicit representation in terms of the contexts in which words appear. Using a neural network to learn word embeddings is one of the most exciting areas of research in deep learning [3]. Unlike previous work, our project will use a neural network to learn the word embeddings.

The following deliverables were completed this semester to build an understanding of machine learning, neural networks, word embeddings, and the TensorFlow workflow. In Deliverable 1, I developed a program that recognizes handwritten digits using TensorFlow, softmax regression, and the MNIST dataset; TensorBoard is practiced in Deliverable 1 as well. In Deliverable 2, I present an introduction to word embedding and some thoughts on the approach of the project. In Deliverable 3, I created a dictionary of the ambiguous entities in Wikipedia and extracted their pages to a plain text file. More details of these three deliverables are discussed in the following sections.

Deliverable 1

My first deliverable is an example program implemented in TensorFlow. The program uses softmax regression to recognize handwritten digits from the MNIST dataset. Prior to the implementation, I studied machine learning, neural networks, and Python.

The MNIST Dataset

The MNIST data is hosted on Yann LeCun's website. MNIST consists of 70,000 data points, each consisting of a label and an image of a handwritten digit. The label is a digit from 0 to 9; the image is 28 pixels by 28 pixels. I split the 70,000 data points into three groups: 55,000 data points in the training set, 10,000 in the test set, and 5,000 in the validation set. We can interpret each image as a 2D matrix. One preprocessing step in the program flattens this 2D matrix into a 1D array of 28 × 28 = 784 numbers. This operation preserves the pixel information of the image and keeps it consistent with its label.
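As a concrete illustration (a sketch, not the deliverable's exact code), this split can be reproduced with the TensorFlow 1.x tutorial helper, which also performs the flattening described above; the data directory name "MNIST_data/" is an arbitrary choice:

    from tensorflow.examples.tutorials.mnist import input_data

    # validation_size=5000 leaves 55,000 training and 10,000 test examples.
    mnist = input_data.read_data_sets("MNIST_data/", one_hot=True,
                                      validation_size=5000)

    print(mnist.train.images.shape)       # (55000, 784)
    print(mnist.test.images.shape)        # (10000, 784)
    print(mnist.validation.images.shape)  # (5000, 784)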

Softmax Regression

Softmax regression (or multinomial logistic regression) is a generalization of logistic regression to the case where we want to handle multiple classes. In logistic regression, the labels are assumed to be binary: y(i) ∈ {0, 1}. Softmax regression allows us to handle y(i) ∈ {1, ..., K}, where K is the number of classes [4], which is the case in this exercise.

The softmax function is given by

    s(z)_j = exp(z_j) / Σ_k exp(z_k)

Each s(z)_j lies in the range (0, 1), and Σ_j s(z)_j = 1. In this problem, z_i = Σ_j W_{i,j} x_j + b_i, where W_i are the weights and b_i is the bias for class i, and j is an index that sums over the pixels of the input image.

Implementation

In the program, x is the flattened array of image pixels, W is the weight matrix, b is the bias (which is independent of the input), and y is the classification outcome from the softmax model.
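The following minimal sketch, in the style of the TensorFlow MNIST tutorial, shows how these pieces fit together. It assumes the mnist object from the loading sketch above; the learning rate (0.5), batch size (100), and iteration count (1,000) are illustrative values, not necessarily the settings used for the graphs below.

    import tensorflow as tf

    x = tf.placeholder(tf.float32, [None, 784])   # flattened images
    y_ = tf.placeholder(tf.float32, [None, 10])   # one-hot labels

    W = tf.Variable(tf.zeros([784, 10]))          # weights
    b = tf.Variable(tf.zeros([10]))               # biases
    y = tf.nn.softmax(tf.matmul(x, W) + b)        # softmax model

    # Cross-entropy loss and one gradient-descent training step.
    cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), axis=1))
    train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

    correct = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
    accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for _ in range(1000):
            batch_xs, batch_ys = mnist.train.next_batch(100)
            sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
        print(sess.run(accuracy, feed_dict={x: mnist.test.images,
                                            y_: mnist.test.labels}))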

Manually Visualize Learning

I trained the model with different numbers of gradient descent iterations and different learning rates, as shown in the graphs below. The accuracy ranged between 89.5% and 92%.

[Figures: accuracy plots for different iteration counts and learning rates]

TensorBoard

Besides manually visualizing the results as in the diagrams above, TensorFlow includes a component named TensorBoard that facilitates visualizing learning. Since this component will be helpful in the future, I experimented with TensorBoard as well. TensorBoard operates by reading TensorFlow events files, which contain summary data that you can generate while running TensorFlow. There are various summary operations, such as scalar, histogram, and merge_all [5]. An example diagram created by TensorBoard is shown below.
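As a sketch of how such an events file can be produced (extending the softmax example above; the log directory name "logs/" is an arbitrary choice):

    # Attach scalar summaries to the loss and accuracy ops defined earlier.
    tf.summary.scalar("cross_entropy", cross_entropy)
    tf.summary.scalar("accuracy", accuracy)
    merged = tf.summary.merge_all()

    with tf.Session() as sess:
        # The FileWriter emits the events file that TensorBoard reads;
        # passing sess.graph also records the computation graph.
        writer = tf.summary.FileWriter("logs/", sess.graph)
        sess.run(tf.global_variables_initializer())
        for step in range(1000):
            batch_xs, batch_ys = mnist.train.next_batch(100)
            summary, _ = sess.run([merged, train_step],
                                  feed_dict={x: batch_xs, y_: batch_ys})
            writer.add_summary(summary, step)
        writer.close()

    # Then start TensorBoard with: tensorboard --logdir=logs/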

[Figure: example computation graph rendered by TensorBoard]

Deliverable 2

Word embedding is a parameterized function mapping words in some language to high-dimensional vectors. Methods to generate this mapping include neural networks, dimensionality reduction on the word co-occurrence matrix, probabilistic models, and explicit representation in terms of the contexts in which words appear. My literature review focused on learning word embeddings with a neural network.

Introduction to Word Embedding

A word embedding is sometimes called a word representation or a word vector. It maps each word to a high-dimensional vector of real numbers, and the meaningful vectors learned can be used to perform downstream tasks:

    W: word → R^n
    W("cat") = [0.3, -0.2, 0.7, ...]
    W("dog") = [0.5, 0.4, -0.6, ...]

Visualizing the representations of words in a two-dimensional projection, we can sometimes see their intuitive sense. For example, in Figure 1, digits cluster together, and there are linear relationships between related words.
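To make this concrete, the toy sketch below measures how close two such vectors are with cosine similarity. The 4-dimensional values are made up for illustration; real embeddings are learned and typically have tens to hundreds of dimensions.

    import numpy as np

    # Hypothetical toy embeddings (real ones are learned by the network).
    W = {
        "cat": np.array([0.3, -0.2, 0.7, 0.1]),
        "dog": np.array([0.5, 0.4, -0.6, 0.2]),
    }

    def cosine(u, v):
        # Cosine similarity: vectors of related words score close to 1.
        return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

    print(cosine(W["cat"], W["dog"]))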

Figure 1. Two-dimensional projection of word embeddings [3]

One Application Example

One task we might train a network for is predicting whether a 5-gram (a sequence of five words) is valid. To predict this accurately, the network needs to learn good parameters for both W, the embedding, and R, the network that consumes the embedded words [3]. A minimal sketch of such a network follows.
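In the sketch below, the vocabulary size, embedding dimension, and hidden-layer width are illustrative assumptions; invalid training 5-grams can be produced by replacing one word at random, following [3].

    import tensorflow as tf

    VOCAB, DIM = 10000, 64  # illustrative sizes

    word_ids = tf.placeholder(tf.int32, [None, 5])  # a batch of 5-grams
    label = tf.placeholder(tf.float32, [None, 1])   # 1 = valid, 0 = corrupted

    # W: the embedding matrix being learned.
    W = tf.Variable(tf.random_uniform([VOCAB, DIM], -1.0, 1.0))
    embedded = tf.nn.embedding_lookup(W, word_ids)  # (batch, 5, DIM)
    context = tf.reshape(embedded, [-1, 5 * DIM])   # concatenate the 5 vectors

    # R: the scoring network that judges whether the 5-gram is valid.
    hidden = tf.layers.dense(context, 128, activation=tf.nn.relu)
    score = tf.layers.dense(hidden, 1)

    loss = tf.reduce_mean(
        tf.nn.sigmoid_cross_entropy_with_logits(labels=label, logits=score))
    train_step = tf.train.AdamOptimizer(1e-3).minimize(loss)

Training this classifier forces useful word vectors into W as a side effect, which is the point of the exercise [3].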

Apply to the Project

It turns out that much more sophisticated relationships are also encoded in word embeddings [3]. In this project, one possible approach is to use a pre-trained word embedding W and train only R to disambiguate words. Another possible approach is to train both W and R. The full analysis, implementation, and evaluation of the learning algorithm will be done in CS298.

Deliverable 3

In Deliverable 3, I extracted the pages of ambiguous words from the Wikipedia data. Since the Wikipedia data is a huge bz2 file, I extracted these pages while decompressing the Wikipedia dump on the fly.

The Dictionary of Ambiguous Words

A word list was first extracted from the disambiguation data at http://wiki.dbpedia.org/downloads-2016-04#h26493-2, and I created a main dictionary based on this file. Many pages whose titles are words in the main dictionary simply redirect to another page. Therefore, an additional dictionary is created while decompressing the Wikipedia bz2 file, using the main dictionary as a filter; the additional dictionary is then used as a filter in a second decompression pass over the Wikipedia bz2 file (a sketch of the filtering appears after this section).

Preprocessing Data

By decompressing the Wikipedia bz2 file twice, a file containing only disambiguation pages was produced. Further data processing will be needed once the data requirements become clearer in CS298.
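The sketch below illustrates one filtering pass over the dump. The file names and the one-title-per-line dictionary format are assumptions for illustration, and the redirect handling of the second pass is omitted.

    import bz2

    # Load the main dictionary of ambiguous words (assumed format: one
    # page title per line).
    with open("ambiguous_words.txt") as f:
        ambiguous_titles = set(line.strip() for line in f)

    in_page, page = False, []
    # BZ2File decompresses the dump on the fly, line by line, so the
    # multi-gigabyte XML never has to be fully materialized.
    with bz2.BZ2File("enwiki-pages-articles.xml.bz2") as dump, \
            open("ambiguous_pages.xml", "w") as out:
        for raw in dump:
            line = raw.decode("utf-8", errors="ignore")
            if "<page>" in line:
                in_page, page = True, []
            if in_page:
                page.append(line)
            if "</page>" in line and in_page:
                in_page = False
                text = "".join(page)
                i = text.find("<title>")
                j = text.find("</title>")
                # Keep the page only if its title is in the dictionary.
                if i != -1 and j != -1 and text[i + 7:j] in ambiguous_titles:
                    out.write(text)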

Conclusion

During CS297, I started by learning about machine learning, neural networks, TensorFlow, and Python, and I practiced programming to solidify my understanding and gain experience. A literature review on disambiguation brought me up to date with the state of the art, and a literature review on word embedding helped me understand what it is and how it can be used in my project. I also started data preprocessing in CS297; however, most of the data processing work cannot be completed until the data requirements become clear from the design of the model. In CS298, I will work on how to define and build the model. To do this, I will need a deeper understanding of how word embeddings and neural networks work. Data processing will remain an important part of CS298, and I will also research how to evaluate the outcome of the model.

References

1. Cucerzan, Silviu. (2007). Large-Scale Named Entity Disambiguation Based on Wikipedia Data. Proceedings of EMNLP-CoNLL 2007.

2. Bengio, Yoshua, Ducharme, Réjean, Vincent, Pascal, and Jauvin, Christian. (2003). A Neural Probabilistic Language Model. Journal of Machine Learning Research, 3, pages 1137-1155.

3. Olah, Christopher. (2014). Deep Learning, NLP, and Representations. http://colah.github.io/posts/2014-07-nlp-rnns-representations/

4. UFLDL Tutorial: Softmax Regression. http://ufldl.stanford.edu/tutorial/supervised/softmaxregression/

5. TensorFlow Tutorial: TensorBoard. https://www.tensorflow.org/get_started/summaries_and_tensorboard
