Knowledge Representation and Reasoning with Deep Neural Networks. Arvind Neelakantan


Knowledge Representation and Reasoning with Deep Neural Networks
Arvind Neelakantan
UMass Amherst: David Belanger, Rajarshi Das, Andrew McCallum and Benjamin Roth
Google Brain: Martin Abadi, Dario Amodei, Quoc Le and Ilya Sutskever

Knowledge Representation and Reasoning
- Represent world knowledge so that computers can use it
- Manipulate available knowledge to produce desired behavior
- Applications: language understanding, robotics, ...

Early Systems
- Symbolic representation; reasoning/inference with search
- General Problem Solver (Simon et al., 1959), Cyc (Lenat et al., 1986), ...
- Precise

Early Systems
- Knowledge: permissible transformations
- Reasoning: search algorithm

Example

Example: Which venue had the biggest turnout?

Example: Which venue had the biggest turnout?
1. Pick column Attendance
2. Get position of max entry
3. Print corresponding entry from column Site
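The three steps above can be sketched in plain Python over a hypothetical table (the venue names and attendance figures here are invented for illustration):

```python
# Hypothetical game table as a list of records; column names follow the slide.
table = [
    {"Site": "Stadium A", "Attendance": 45000},
    {"Site": "Stadium B", "Attendance": 61000},
    {"Site": "Stadium C", "Attendance": 38000},
]

# 1. Pick column Attendance.
attendance = [row["Attendance"] for row in table]
# 2. Get position of max entry.
pos = attendance.index(max(attendance))
# 3. Print corresponding entry from column Site.
print(table[pos]["Site"])  # -> Stadium B
```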

Example: Which venue had the biggest turnout?
select site (max attendance)
Manipulating symbols and discrete processing!

Early Systems: Issues
- Real-world data is challenging
- Lack of generalization to a large number of symbols
- No learning

Recent Work
- Markov Logic Networks (Richardson & Domingos, 2006), Probabilistic Soft Logic (Kimmig et al., 2012), Semantic Parsers (Zelle & Mooney, 1996), ...
- Some components are learned
- Still symbolic; most of the problems remain

Deep Neural Networks
- Speech recognition: ~5% absolute accuracy improvement (Dahl et al., 2012)
- Image recognition: ~10% absolute accuracy improvement (Krizhevsky et al., 2012)

Deep Neural Networks
- Input and output are real-valued vectors (distributed representations)
- Continuous data and processing through real numbers
- The transformation from input to output is learned from data using the backpropagation algorithm

Perception vs Reasoning
- Input: continuous data vs discrete symbols
- Processing: fuzzy vs programs containing discrete operations, rules, ...

Deep Neural Networks for Knowledge Representation and Reasoning

Deep Neural Networks for Knowledge Representation and Reasoning
1. Can we represent symbols with distributed representations and learn them?
2. Can we train neural networks to perform reasoning with these representations?

Deep Neural Networks for Knowledge Representation and Reasoning
1. Generalization via distributed representations
2. Powerful non-linear models
3. Learn end-to-end, handle messy real-world data

Deep Neural Networks for Knowledge Representation and Reasoning
1. Can we represent symbols with distributed representations and learn them?
2. Can we train neural networks to perform reasoning with these representations?
Settings: a massive structured knowledge base (Task 1) and semi-structured web tables (Task 2)

Knowledge Graphs
Example: Melinda Gates --ChairOf--> Gates Foundation --Headquarters--> Seattle

Knowledge Graph Path Queries (Task 1)
Melinda Gates --ChairOf--> Gates Foundation --Headquarters--> Seattle
ChairOf(A, X) ^ Headquarters(X, B) => LivesIn(A, B)
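The rule can be illustrated by following the two relations in a toy knowledge graph (the triples below are a minimal made-up example, not real data):

```python
# Toy knowledge graph as (subject, relation) -> object triples.
triples = {
    ("Melinda Gates", "ChairOf"): "Gates Foundation",
    ("Gates Foundation", "Headquarters"): "Seattle",
}

def follow(entity, *relations):
    """Follow a chain of relations from an entity; None if a hop is missing."""
    for rel in relations:
        entity = triples.get((entity, rel))
        if entity is None:
            return None
    return entity

# ChairOf(A, X) ^ Headquarters(X, B) => LivesIn(A, B):
print(follow("Melinda Gates", "ChairOf", "Headquarters"))  # -> Seattle
```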

Program Induction/Semantic Parsing

Program Induction/Semantic Parsing
Which venue had the biggest turnout? => select site (max attendance)
how many games were telecasted in CBS? => count(location == CBS)

Program Induction/Semantic Parsing (Task 2)
Which venue had the biggest turnout? => select site (max attendance)
how many games were telecasted in CBS? => count(location == CBS)

Related Work in Reasoning
- Natural Language Inference/Textual Entailment
- Visual Question Answering
- Reading Comprehension

Task 1: Knowledge Graph Path Queries
- Arvind Neelakantan, Benjamin Roth, and Andrew McCallum. Knowledge base completion using compositional vector space models. Workshop on Automated Knowledge Base Construction at NIPS, 2014.
- Arvind Neelakantan, Benjamin Roth, and Andrew McCallum. Compositional vector space models for knowledge base completion. ACL, 2015.
- Rajarshi Das, Arvind Neelakantan, David Belanger, and Andrew McCallum. Chains of reasoning over entities, relations, and text using recurrent neural networks. EACL, 2017.

Path Queries
Single-hop (query relation: heads): Melinda Gates --ChairOf--> Gates Foundation
Multi-hop (query relation: LivesIn): Melinda Gates --ChairOf--> Gates Foundation --Headquarters--> Seattle

Motivation
The same relation appears under many surface forms: heads, leads, leader of, chairperson of (for ChairOf); headquartered in, headquarters located in, founded in, based in (for Headquarters).
Previous work is symbolic: Path Ranking Algorithm (Lao et al., 2011) & Sherlock (Schoenmackers et al., 2010). Combinatorial explosion => poor generalization.

Multi-hop Reasoning: current methods do not generalize to unseen paths

Model (Neelakantan, Roth, McCallum, 2014)
An RNN composes the relations along the path Melinda Gates --ChairOf--> Gates Foundation --Headquarters--> Seattle into a vector, which is scored against the target relation (LivesIn) by vector similarity.
Generalizes to unseen paths!
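A minimal NumPy sketch of the idea: compose relation embeddings along a path with a simple RNN, then score the result against the target relation's embedding. In the actual model all of these parameters are learned with backpropagation; the dimensions, random initialization, and tanh cell here are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # embedding dimension (illustrative)

# Relation embeddings and RNN weights; learned in the real model.
rel_emb = {r: rng.standard_normal(d)
           for r in ["ChairOf", "Headquarters", "LivesIn"]}
W_h = 0.1 * rng.standard_normal((d, d))
W_x = 0.1 * rng.standard_normal((d, d))

def encode_path(path):
    """Compose the relation embeddings along a path with a simple RNN."""
    h = np.zeros(d)
    for rel in path:
        h = np.tanh(W_h @ h + W_x @ rel_emb[rel])
    return h

# Score the path ChairOf -> Headquarters against the target relation LivesIn.
path_vec = encode_path(["ChairOf", "Headquarters"])
score = float(path_vec @ rel_emb["LivesIn"])
```

Because the path is encoded compositionally, a path never seen during training still gets a vector, and hence a score, which is the source of the generalization claimed on the slide.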

Selection/Attention
Many paths connect an entity pair: Melinda Gates --ChairOf--> Gates Foundation --Headquarters--> Seattle; Melinda Gates --Spouse--> Bill Gates; Melinda Gates --Friends--> Warren Buffett --visited--> ...; etc. Each path receives a similarity score against the target relation; combine the scores with: 1. Max. Train with backprop!

Data
- Entity pairs: 3.2M
- Facts: 52M
- Relations: 51K
- Relation types tested: 46
- Total # paths: 191M
- Average path length: 4.7
- Maximum path length: 7

Results - Attention (Mean Average Precision)
- Path Ranking Algorithm: 64.4
- Path Ranking Algorithm + bigram: 64.9
- RNN (max): 65.2

Selection/Attention (Das, Neelakantan, Belanger, McCallum, 2016)
Each path connecting the entity pair (Melinda Gates --Spouse--> Bill Gates; Melinda Gates --ChairOf--> Gates Foundation --Headquarters--> Seattle; ...) receives a similarity score against the target relation; combine the scores with: 1. Max 2. Average 3. Top-k 4. LogSumExp. Train with backprop!

Results - Attention (Mean Average Precision)
- Path Ranking Algorithm: 64.4
- Path Ranking Algorithm + bigram: 64.9
- RNN (max): 65.2
- RNN (avg): 55.0
- RNN (top-k): 68.2
- RNN (logsumexp): 70.1

Predictive Paths
Seen paths for /people/person/place_of_birth(A, B):
- A --was born in--> X --/location/mailing_address/citytown--> Y --/location/mailing_address/state_province_region--> B
- A --from--> X --/location/location/contains^-1--> B
Unseen paths for /people/person/place_of_birth(A, B):
- A --was born in / born in near--> X --commonly known as--> B

Multi-hop Reasoning:
- Current methods do not generalize to unseen paths
- Recurrent Neural Networks achieve state-of-the-art results on answering path queries

Zero-Shot
The same model (RNN path composition scored by vector similarity against the target relation, e.g. LivesIn for Melinda Gates --ChairOf--> Gates Foundation --Headquarters--> Seattle) can predict relations it was not explicitly trained on!

Results (Mean Average Precision)
- Random: 7.6
- RNN (zero-shot): 20.6
- RNN (supervised): 50.1

Multi-hop Reasoning:
- Current methods do not generalize to unseen paths
- Recurrent Neural Networks achieve state-of-the-art results on answering path queries
- RNNs can perform zero-shot learning!

Deep Neural Networks for Knowledge Representation and Reasoning
- Recurrent Neural Networks achieve state-of-the-art results on answering knowledge graph path queries

Task 2: Program Induction/Semantic Parsing
- Arvind Neelakantan, Quoc V. Le, and Ilya Sutskever. Neural Programmer: Inducing latent programs with gradient descent. ICLR, 2016.
- Arvind Neelakantan, Quoc V. Le, Martin Abadi, Andrew McCallum, and Dario Amodei. Learning a natural language interface with neural programmer. ICLR, 2017.

Program Induction/Semantic Parsing

Program Induction/Semantic Parsing
- Lookup question: Which venue had the biggest turnout?
- Number question: how many games were telecasted in CBS?

Program Induction/Semantic Parsing
Which venue had the biggest turnout? => select site (max attendance)
how many games were telecasted in CBS? => count(location == CBS)

Challenges
- Multi-step reasoning: Which section is the longest? => select name (max kilometers)
- Weak supervision: Which section is the longest? => IDF Checkpoint (only the final answer is given; the program select name (max kilometers) is not observed)

Motivation
- Non-neural network, strong supervision: Zelle & Mooney (1996); Zettlemoyer & Collins (2005)
- Non-neural network, weak supervision (dataset-specific rules to guide program search): Liang et al. (2011); Kwiatkowski et al. (2013); Pasupat & Liang (2015)

End-to-End Neural Networks
Learning discrete functions is notoriously challenging! (Joulin & Mikolov, 2015)

Semantic Parsing: multi-step reasoning with discrete functions; weak supervision

Neural Programmer (Neelakantan, Le, Sutskever, 2016)
Question: What was the total number of goals scored in 2005
At each timestep t = 1, ..., T, a neural network performs an operation selection and a column selection over the table.
Operations: Count, Select, ArgMax, ArgMin, >, <, Print
Inputs at step t include data from the table and the row selector from step t-1; outputs are a scalar answer and a lookup answer.

Neural Programmer
- A question RNN encodes the question into a vector q
- A history RNN tracks previous decisions: at timestep t it takes the input c_t and updates h_{t-1} to h_t with an RNN step
- [h_t; q] drives the operation selector and the column selector over the operations and the table, for t = 1, 2, ..., T
- The output at step t feeds into the input at step t+1; Final Output = Output_T
- Outputs: scalar answer, lookup answer, row selector

Operations
- Row selector (vector with size equal to the number of rows):
  - Comparison: >, <, >=, <=
  - Superlative: argmax, argmin
  - Table ops: select, first, last, prev, next, group_by_max
  - Reset/No-Op
- Scalar answer (real number):
  - Aggregation: count
- Lookup answer (matrix with the same dimensions as the table):
  - Print
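A rough sketch of how a few of these operations act on table columns and a row selector. The column values and the question's condition are invented for this example, and hard (0/1) selections are used for clarity where the model would use soft values:

```python
import numpy as np

# Hypothetical columns and a row selector with one entry per table row.
goals = np.array([3.0, 5.0, 4.0])
season = np.array([2004, 2005, 2005])
selector = np.ones(3)  # every row selected initially

# select: keep rows whose season matches the condition from the question.
selector = selector * (season == 2005)

# count: the scalar answer is the (soft) number of selected rows.
count = selector.sum()  # -> 2.0

# argmax: a new selector marking the max of goals among selected rows.
masked = np.where(selector > 0, goals, -np.inf)
argmax_selector = (masked == masked.max()).astype(float)  # -> [0., 1., 0.]
```

Because every operation consumes and produces vectors like these, the row selector can be threaded from one timestep to the next, which is how multi-step programs are composed.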

Example
Question: What was the total number of goals scored in 2005
Step 1: No-Op (-) | Step 2: No-Op (-) | Step 3: select (season) | Step 4: print (goals)

Weak Supervision
Question: What was the total number of goals scored in 2005
Step 1: No-Op (-) | Step 2: No-Op (-) | Step 3: select (season) | Step 4: print (goals)
Final answer: 12

Soft Selection/Attention (Bahdanau, Cho, Bengio, 2014)
Average the outputs of the different operations, weighted by the probabilities from the model. Train with backprop!

Soft Selection/Attention
Column probabilities: 0.7 (Column A), 0.3 (Column B). Operation probabilities: 0.6 (Operation A), 0.4 (Operation B). Operation outputs: Operation A gives 10 on Column A and -5 on Column B; Operation B gives 100 on Column A and 50 on Column B.
Output = 0.6 x 0.7 x 10 + 0.6 x 0.3 x (-5) + 0.4 x 0.7 x 100 + 0.4 x 0.3 x 50 = 37.3
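The slide's arithmetic is an expectation over operation and column choices; written as a bilinear form it reproduces the same number:

```python
import numpy as np

p_op = np.array([0.6, 0.4])    # P(Operation A), P(Operation B)
p_col = np.array([0.7, 0.3])   # P(Column A), P(Column B)

# outputs[i, j]: result of applying operation i to column j.
outputs = np.array([[10.0, -5.0],
                    [100.0, 50.0]])

# Soft selection: expected output under both distributions.
soft_output = float(p_op @ outputs @ p_col)
print(round(soft_output, 1))  # -> 37.3
```

Since the expectation is differentiable in the probabilities, gradients flow back into the operation and column selectors, which is exactly what replaces the discrete program search.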

Training Objective
Loss on the final answer:
- Number answer: square loss
- Lookup answer: average of the loss on each entry
The answer, simply written down, introduces ambiguity:
- A number could be generated or be a table entry
- Multiple table entries may match the answer
- Take the minimum of the individual losses

Semantic Parsing: multi-step reasoning with discrete functions; weak supervision
- Neural Programmer can be trained end-to-end with backpropagation using weak supervision

Previous Work
- Strong supervision, non-neural network: Zelle & Mooney (1996); Zettlemoyer & Collins (2005)
- Strong supervision, neural network: Jia & Liang (2016); Neural Programmer Interpreter (Reed & De Freitas, 2015); Neural Enquirer (Yin et al., 2016)
- Weak supervision, non-neural network: Liang et al. (2011); Kwiatkowski et al. (2013); Pasupat & Liang (2015)
- Weak supervision, neural network: Dynamic Neural Module Network (Andreas et al., 2016), not end-to-end

Experiments
- WikiTableQuestions dataset (Pasupat & Liang, 2015)
- Tables at test time are unseen during training
- 10k training examples with weak supervision
- Hard selection at test time
- 4 timesteps and 15 operations

Neural Networks
- Seq2Seq (Sutskever, Vinyals & Le, 2014): 8.9% accuracy
- Pointer Networks (Vinyals, Fortunato & Jaitly, 2015): 4.0% accuracy on lookup questions

Results (Neelakantan, Le, Abadi, McCallum, Amodei, 2017)
Method (Dev accuracy / Test accuracy):
- Information Retrieval System: 13.4 / 12.7
- Simple Semantic Parser: 23.6 / 24.3
- Semantic Parser (Pasupat & Liang, 2015): 37.0 / 37.1
- Neural Programmer - {dropout, weight decay}: 30.3 / -
- Neural Programmer: 34.2 / 34.2
- Ensemble of 15 Neural Programmers: 37.5 / 37.7

Training Data Size
Task (# training examples): non-neural network / neural network accuracy
- Textual Entailment (4.5k): 77.8 / 71.3
- Textual Entailment (550k): 78.2 / 88.3
- Reading Comprehension (86k): 51.0 / 82.9

Conversational QA (Iyyer, Yih, Chang, 2017)
Method (Test accuracy):
- Semantic Parser (Pasupat & Liang, 2015): 33.2
- Neural Programmer: 40.2
- DynSP (Iyyer, Yih, Chang, 2017): 44.7

Semantic Parsing: multi-step reasoning with discrete functions; weak supervision
- Neural Programmer can be trained end-to-end with backpropagation using weak supervision
- Neural Programmer works surprisingly well on a small real-world dataset

Example Programs (1)
What is the total number of teams?
  Steps 1-3: No-Op | Step 4: count
how many games had greater than 1500 in attendance?
  Steps 1-2: No-Op | Step 3: >= (attendance) | Step 4: count
what is the total number of runnerups listed on the chart?
  Steps 1-2: No-Op | Step 3: select (outcome) | Step 4: count

Example Programs (2)
which section is longest?
  Steps 1-2: No-Op | Step 3: argmax (kilometers) | Step 4: print (name)
Which engine(s) has the least amount of power?
  Steps 1-2: No-Op | Step 3: argmin (power) | Step 4: print (engine)
Who had more silver medals, cuba or brazil?
  Step 1: argmax (nation) | Step 2: select (nation) | Step 3: argmax (silver) | Step 4: print (nation)

Example Programs (3)
who was the next appointed director after lee p. brown?
  Step 1: select (name) | Step 2: next | Step 3: last | Step 4: print (name)
what team is listed previous to belgium?
  Step 1: select (team) | Step 2: previous | Step 3: first | Step 4: print (team)

Summary

Deep Neural Networks for Knowledge Representation and Reasoning
- Recurrent Neural Networks achieve state-of-the-art results on answering knowledge graph path queries
- Neural Programmer achieves competitive results on a small real-world question answering dataset

Key Components
- Recurrent Neural Networks
- Attention/Selection Mechanism
- Backpropagation

Deep Neural Networks for Knowledge Representation and Reasoning
- Recurrent Neural Networks achieve state-of-the-art results on answering knowledge graph path queries
- Neural Programmer achieves competitive results on a small real-world question answering dataset
Code and data are publicly available!

Acknowledgements: Google PhD Fellowship, UMass Amherst and Google Brain
Thank You!