Extracting emerging knowledge from social media Jae Hee Lee (COMP3740) (Supervisor: Dongwoo Kim) 18 May 2018

2 Motivation Aiming to construct a multimedia knowledge graph as part of the Picturing Knowledge project. Enhancing the existing knowledge graph developed at the ANU Computational Media Lab. Focusing on extracting relations from multimedia sources.

3 Background: Knowledge Graph Representing knowledge in a graph to connect related information

4 Background: Knowledge Base A database for knowledge sharing.

5 Background: Knowledge Base Example: a Barack Obama query to DBpedia.
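Purely as an illustration (the slide only shows the query result), such a lookup can be issued against DBpedia's public SPARQL endpoint; the endpoint URL and query below are assumptions of this sketch, not from the slides:

```python
# Minimal sketch: querying DBpedia for facts about Barack Obama.
# Endpoint and query are illustrative assumptions.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setQuery("""
    SELECT ?property ?value WHERE {
        <http://dbpedia.org/resource/Barack_Obama> ?property ?value .
    } LIMIT 10
""")
sparql.setReturnFormat(JSON)

for result in sparql.query().convert()["results"]["bindings"]:
    print(result["property"]["value"], "->", result["value"]["value"])
```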

6 Background: OpenIE Extraction of relation tuples from text by leveraging the sentence structure. Example: "Born in a small town, she took the midnight train going anywhere." First, split each sentence into a set of entailed clauses ("she took the midnight train going anywhere"; "Born in a small town, she took the midnight train"). Each clause is then maximally shortened ("Born in a town, she took the midnight train"; "She took the midnight train"; "She took midnight train"). Finally, the fragments are segmented into OpenIE triples: (She; took; midnight train).
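The project's own pipeline used the Java Stanford NLP tools (see slide 17); as an illustration only, the same OpenIE annotator can be reached from Python through stanza's CoreNLP client, assuming a local CoreNLP installation:

```python
# Sketch: extracting OpenIE triples with Stanford CoreNLP via the stanza
# client. Assumes CoreNLP is installed locally and CORENLP_HOME is set.
from stanza.server import CoreNLPClient

text = "Born in a small town, she took the midnight train going anywhere."
annotators = ["tokenize", "ssplit", "pos", "lemma",
              "depparse", "natlog", "openie"]

with CoreNLPClient(annotators=annotators, be_quiet=True) as client:
    ann = client.annotate(text)
    for sentence in ann.sentence:
        for triple in sentence.openieTriple:
            print((triple.subject, triple.relation, triple.object))
```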

7 Background: Neural Relation Extraction (NRE) Serves the same purpose as OpenIE but uses neural networks to extract relations. Two neural network models are implemented: a recurrent neural network (RNN) and a convolutional neural network (CNN).

8 Background: Neural Network

9 Convolutional Neural Network (CNN) Often used for classifying input images into categories. Four main layers: convolution layer; non-linearity function layer; max-pooling layer; fully connected layer.

10 CNN Consider each image as a matrix of pixel values.

11 CNN: Convolution Layer Extracts features from an image. Green matrix: 5 x 5 image pixels. Yellow matrix: 3 x 3 matrix (a filter). Starting from the top left, the CNN slides the filter by 1 pixel at a time (the stride) and performs a calculation at each position (element-wise multiplication, summed). The final output is called the feature map.
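A minimal sketch of that sliding-filter computation, with toy random values standing in for the green image and yellow filter:

```python
# A 3x3 filter slides over a 5x5 image with stride 1,
# producing a 3x3 feature map.
import numpy as np

image = np.random.randint(0, 2, size=(5, 5))   # toy 5x5 pixel matrix
filt = np.random.randint(0, 2, size=(3, 3))    # toy 3x3 filter

out = np.zeros((3, 3))
for i in range(3):                 # (5 - 3) / 1 + 1 = 3 positions per axis
    for j in range(3):
        patch = image[i:i+3, j:j+3]
        out[i, j] = np.sum(patch * filt)   # element-wise product, summed

print(out)                                 # the feature map
```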

12 CNN: Non-linearity Layer The Rectified Linear Unit (ReLU) returns 0 for negative values and leaves positive values unchanged. After applying it, the feature map becomes the rectified feature map.

13 CNN: Max-pooling Layer This layer further reduces the dimensionality of each feature map while retaining the most important information. Max-pooling takes the maximum value in each window. As in the convolution layer, the window slides over the feature map.
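A minimal sketch of these two layers (slides 12 and 13) on a toy 4 x 4 feature map, assuming a 2 x 2 pooling window with stride 2:

```python
# ReLU zeroes out negatives, then 2x2 max-pooling with stride 2
# keeps the largest value in each window.
import numpy as np

feature_map = np.array([[ 1.0, -2.0,  3.0, -1.0],
                        [-4.0,  5.0, -6.0,  2.0],
                        [ 7.0, -8.0,  9.0, -3.0],
                        [-1.0,  2.0, -2.0,  4.0]])

rectified = np.maximum(feature_map, 0)                    # ReLU
pooled = rectified.reshape(2, 2, 2, 2).max(axis=(1, 3))   # 2x2 max-pooling
print(pooled)                                             # 2x2 result
```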

14 CNN: Fully Connected Layer The output of the max-pooling layer represents the features of the input image. The purpose of this layer is to use these features to classify the input into various classes, so it operates on a feature vector for the input. The sum of the output probabilities of the layer is 1, achieved by applying the Softmax function, which generates a probability distribution over N different outcomes.
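A minimal sketch of the Softmax computation described above, on toy class scores:

```python
# Softmax turns the fully connected layer's raw scores into
# probabilities that sum to 1.
import numpy as np

def softmax(scores):
    exps = np.exp(scores - np.max(scores))  # shift for numerical stability
    return exps / exps.sum()

logits = np.array([2.0, 1.0, 0.1])          # toy class scores
probs = softmax(logits)
print(probs, probs.sum())                   # approx. [0.66 0.24 0.10] 1.0
```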

15 Work Done Data preparation for NRE (extracted from an ElasticSearch DB); OpenIE implementation; NRE implementation; graph visualization.

16 OpenIE + Visualization

17 Implementation details (OpenIE) Used Java with the Stanford NLP tools. FrameNet and DBpedia were used to extract the canonical form of entity-relation tuples. Semafor was used as an API for the FrameNet data.

18 Implementation details (Graph) Python was used for the back end (Django framework), with JavaScript and the D3.js library for the front end. Compatible with modern browsers (tested on Chrome and Firefox).
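The slides do not show the actual endpoints, so the following is a hypothetical sketch of a Django view handing node-link graph data to the D3.js front end; the view name and data shape are assumptions:

```python
# Hypothetical Django view serving graph data to the D3.js front end;
# the real project's endpoints and data layout are not shown in the slides.
from django.http import JsonResponse

def graph_data(request):
    # Nodes and links in the node-link shape D3.js force layouts expect.
    data = {
        "nodes": [{"id": "She"}, {"id": "midnight train"}],
        "links": [{"source": "She", "target": "midnight train",
                   "relation": "took"}],
    }
    return JsonResponse(data)
```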

19 Result (OpenIE) Generated candidate tuples for each sentence, ranked using TF-IDF scores combined with a harmonic mean. Extracted canonical forms using DBpedia and FrameNet. Enhanced the knowledge graph.
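The slides do not spell out the ranking formula, so the sketch below shows one plausible reading: score each tuple by the harmonic mean of the TF-IDF weights of its terms, so a tuple ranks highly only if all of its terms do. The scoring function and term handling are assumptions:

```python
# One plausible tuple-ranking scheme: harmonic mean of TF-IDF weights.
from sklearn.feature_extraction.text import TfidfVectorizer
from scipy.stats import hmean

sentences = ["she took the midnight train going anywhere",
             "she was born in a small town"]
vec = TfidfVectorizer()
tfidf = vec.fit_transform(sentences)
vocab = vec.vocabulary_

def tuple_score(tuple_terms, sent_idx):
    weights = [tfidf[sent_idx, vocab[t]] for t in tuple_terms if t in vocab]
    weights = [w for w in weights if w > 0]   # hmean needs positive values
    return hmean(weights) if weights else 0.0

print(tuple_score(["she", "took", "midnight", "train"], 0))
```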

20 Result (OpenIE)

21 Export Process Used an existing Java library to export the result obtained from OpenIE to the .gexf file format. The visualization tool then imports and renders the file.
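The export itself was done with an existing Java library; as an equivalent sketch in Python, networkx can write the same .gexf format from a graph of extracted triples:

```python
# Equivalent sketch in Python: write a graph of OpenIE triples to GEXF.
import networkx as nx

G = nx.DiGraph()
G.add_edge("She", "midnight train", relation="took")  # one OpenIE triple
nx.write_gexf(G, "openie_result.gexf")
```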

22 Result (Visualization)

23 Neural Relation Extraction

24 CNN on sentences Sentence Encoding Step. Given: pre-trained word vectors (New York Times dataset) and a set of sentences with their corresponding entities. To do: generate a vector representation of each word in a sentence, comprising word and position embeddings. In the convolution layer, prepare filters that slide over the sequence of word vectors; for example, if there are 6 words in a sentence (w1 ... w6), a filter of size 3 slides four times. In the max-pooling layer, we select the maximum value from the feature map generated by the convolution layer.
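A minimal sketch of this encoding step with toy dimensions (the embedding sizes and filter count are assumptions, not the project's settings):

```python
# 6 word vectors, each concatenated with position embeddings, convolved
# with size-3 filters that slide 4 times, then max-pooled over positions.
import numpy as np

n_words, d_word, d_pos = 6, 8, 2
word_emb = np.random.randn(n_words, d_word)
pos_emb = np.random.randn(n_words, 2 * d_pos)   # offsets to both entities
tokens = np.concatenate([word_emb, pos_emb], axis=1)   # shape (6, 12)

window, n_filters = 3, 5
filters = np.random.randn(n_filters, window * tokens.shape[1])

# Each filter slides over the 6 tokens: (6 - 3) + 1 = 4 positions.
conv = np.stack([filters @ tokens[i:i+window].ravel()
                 for i in range(n_words - window + 1)])  # shape (4, 5)
sentence_vec = conv.max(axis=0)                          # max-pooling, (5,)
print(sentence_vec.shape)
```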

25 CNN on sentences Selective Attention Step. Given: the sentence representations (sentence embeddings) obtained in the previous step, grouped into a set S of n sentences for each entity pair (subject, object). To do: generate a set vector s as a weighted sum of the sentence vectors. Each weight is computed with selective attention, using scores that indicate how well the input sentence matches a relation. Using the Softmax function, a score between 0 and 1 is determined for every relation over the set S.
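A minimal sketch of the selective-attention step on toy vectors; the relation query vector and dimensions are assumptions:

```python
# Selective attention over a bag of sentence vectors: score each sentence
# against a relation query, Softmax the scores, take the weighted sum.
import numpy as np

S = np.random.randn(3, 5)          # 3 sentence vectors for one entity pair
relation_query = np.random.randn(5)

scores = S @ relation_query                      # sentence-relation match
weights = np.exp(scores) / np.exp(scores).sum()  # Softmax over the bag
bag_vector = weights @ S                         # weighted sum, shape (5,)
```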

26 Implementation details (NRE) Python (with TensorFlow) was used. Implemented RNN and CNN models with an attention layer. Precision and the ROC-AUC score are used as performance measures.
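A minimal sketch of the two measures on toy predictions; precision@k is written by hand here, as the slides do not show the project's own implementation:

```python
# Precision@k over score-ranked predictions, plus scikit-learn's ROC-AUC.
import numpy as np
from sklearn.metrics import roc_auc_score

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_score = np.array([0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2])

def precision_at_k(y_true, y_score, k):
    top_k = np.argsort(y_score)[::-1][:k]   # indices of the k highest scores
    return y_true[top_k].mean()

print(precision_at_k(y_true, y_score, 4))   # 0.75 on this toy data
print(roc_auc_score(y_true, y_score))
```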

27 Result (NRE)
CNN: Precision@100: 0.05; Precision@200: 0.05; Precision@300: 0.06; ROC-AUC score: 0.93. Parameters used: 64 filters (per filter size); 3 epochs; maximum 100 sentences per batch; batch size 4; 30% of the training set used.
RNN: Precision@100: 0.59; Precision@200: 0.58; Precision@300: 0.57; ROC-AUC score: 0.99. Parameters used: 16 hidden units in a hidden layer; 3 epochs; maximum 100 sentences per batch.

28 Evaluation Given limited resources (a single Tesla K40C GPU), there were constraints on setting the parameters for a proper experiment. In this limited environment, the RNN works better than the CNN.

29 Future Work Experiment on a better system to properly compare the performance of the RNN and CNN. Visualize the NRE result using the graph constructed for the OpenIE work. Compare the results of OpenIE and NRE quantitatively. Find methods to enhance the performance. Use the prepared data to measure the performance of NRE. Try neural network models other than the RNN and CNN.

30 Conclusion As part of extracting emerging knowledge from social media, relation extraction over sentences is an important task. OpenIE extracts relations and, together with the visualization, yields useful information. NRE extracts relations given sentences and their corresponding entities. The performance of both methods needs to be analysed quantitatively and qualitatively to determine which one performs better.

31 Acknowledgement I would like to express my special thanks to my supervisor, Dongwoo Kim, who gave me guidance throughout the semester. I would also like to thank the researchers at the ANU Computational Media Lab, who gave invaluable advice.

32 THANK YOU