Bayesian Deep Learning for Integrated Intelligence: Bridging the Gap between Perception and Inference

Similar documents
Python Machine Learning

Lecture 1: Machine Learning Basics

Module 12. Machine Learning. Version 2 CSE IIT, Kharagpur

Unsupervised Learning of Word Semantic Embedding using the Deep Structured Semantic Model

A New Perspective on Combining GMM and DNN Frameworks for Speaker Adaptation

Generative models and adversarial training

Net Perceptions, Inc., West 78th Street, Suite 300, Minneapolis, MN

Deep search. Enhancing a search bar using machine learning. Ilgün Ilgün & Cedric Reichenbach

System Implementation for SemEval-2017 Task 4 Subtask A Based on Interpolated Deep Neural Networks

Probabilistic Latent Semantic Analysis

Knowledge-based expert systems DHANANJAY KALBANDE

The world surrounding us involves multiple modalities

arxiv: v2 [cs.cv] 30 Mar 2017

arxiv: v1 [cs.lg] 15 Jun 2015

Truth Inference in Crowdsourcing: Is the Problem Solved?

BUILDING CONTEXT-DEPENDENT DNN ACOUSTIC MODELS USING KULLBACK-LEIBLER DIVERGENCE-BASED STATE TYING

Training a Neural Network to Answer 8th Grade Science Questions Steven Hewitt, An Ju, Katherine Stasaski

IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 17, NO. 3, MARCH

Lecture 1: Basic Concepts of Machine Learning

CSL465/603 - Machine Learning

Summarizing Answers in Non-Factoid Community Question-Answering

arxiv: v2 [cs.ir] 22 Aug 2016

ADVANCES IN DEEP NEURAL NETWORK APPROACHES TO SPEAKER RECOGNITION

Autoregressive product of multi-frame predictions can improve the accuracy of hybrid models

Predicting Student Attrition in MOOCs using Sentiment Analysis and Neural Networks

The 9th International Scientific Conference eLearning and Software for Education, Bucharest, April 25-26

Attributed Social Network Embedding

Calibration of Confidence Measures in Speech Recognition

A Case Study: News Classification Based on Term Frequency

POS tagging of Chinese Buddhist texts using Recurrent Neural Networks

Machine Learning from Garden Path Sentences: The Application of Computational Linguistics

Speech Emotion Recognition Using Support Vector Machine

Human Emotion Recognition From Speech

arxiv: v1 [cs.cv] 2 Jun 2017

Model Ensemble for Click Prediction in Bing Search Ads

Proposal of Pattern Recognition as a necessary and sufficient principle to Cognitive Science

Axiom 2013 Team Description Paper

Twitter Sentiment Classification on Sanders Data using Hybrid Approach

Introduction to Ensemble Learning Featuring Successes in the Netflix Prize Competition

QuickStroke: An Incremental On-line Chinese Handwriting Recognition System

Introduction to Simulation

Semi-supervised methods of text processing, and an application to medical concept extraction. Yacine Jernite Text-as-Data series September 17.

Course Outline. Course Grading. Where to go for help. Academic Integrity. EE-589 Introduction to Neural Networks

A study of speaker adaptation for DNN-based speech synthesis

Exploration. CS : Deep Reinforcement Learning Sergey Levine

arxiv: v1 [cs.cl] 2 Apr 2017

WE GAVE A LAWYER BASIC MATH SKILLS, AND YOU WON'T BELIEVE WHAT HAPPENED NEXT

Artificial Neural Networks written examination

Learning to Rank with Selection Bias in Personal Search

The Evolution of Random Phenomena

(Sub)Gradient Descent

Learning Human Utility from Video Demonstrations for Deductive Planning in Robotics

Learning From the Past with Experiment Databases

Georgetown University at TREC 2017 Dynamic Domain Track

SINGLE DOCUMENT AUTOMATIC TEXT SUMMARIZATION USING TERM FREQUENCY-INVERSE DOCUMENT FREQUENCY (TF-IDF)

HIERARCHICAL DEEP LEARNING ARCHITECTURE FOR 10K OBJECTS CLASSIFICATION

Rule Learning With Negation: Issues Regarding Effectiveness

Second Exam: Natural Language Parsing with Neural Networks

The Good Judgment Project: A large scale test of different methods of combining expert predictions

Evolutive Neural Net Fuzzy Filtering: Basic Description

state or value to each variable in a given set. We use p(X = x | Y = y) (or p(x | y) as a shorthand) to denote the probability that X = x given Y = y. We al

Word Segmentation of Off-line Handwritten Documents

Corrective Feedback and Persistent Learning for Information Extraction

Comment-based Multi-View Clustering of Web 2.0 Items

A Survey on Unsupervised Machine Learning Algorithms for Automation, Classification and Maintenance

When there is a mismatch between the acoustic

Using the Attribute Hierarchy Method to Make Diagnostic Inferences about Examinees Cognitive Skills in Algebra on the SAT

Australian Journal of Basic and Applied Sciences

Feature Selection based on Sampling and C4.5 Algorithm to Improve the Quality of Text Classification using Naïve Bayes

Knowledge-Based Systems

Transfer Learning Action Models by Measuring the Similarity of Different Domains

A Case-Based Approach To Imitation Learning in Robotic Agents

Maximizing Learning Through Course Alignment and Experience with Different Types of Knowledge

Device Independence and Extensibility in Gesture Recognition

Modeling function word errors in DNN-HMM based LVCSR systems

Active Learning. Yingyu Liang Computer Sciences 760 Fall

Multi-Dimensional, Multi-Level, and Multi-Timepoint Item Response Modeling.

Dual-Memory Deep Learning Architectures for Lifelong Learning of Everyday Human Behaviors

Semi-Supervised GMM and DNN Acoustic Model Training with Multi-system Combination and Confidence Re-calibration

Switchboard Language Model Improvement with Conversational Data from Gigaword

Dublin City Schools Mathematics Graded Course of Study GRADE 4

Deep recurrent neural networks for aspect-based sentiment analysis of user reviews in multiple languages

Automating the E-learning Personalization

Chinese Language Parsing with Maximum-Entropy-Inspired Parser

Bug triage in open source systems: a review

OPTIMIZATION OF TRAINING SETS FOR HEBBIAN-LEARNING-BASED CLASSIFIERS

ADVANCED MACHINE LEARNING WITH PYTHON BY JOHN HEARTY

Hierarchical Linear Modeling with Maximum Likelihood, Restricted Maximum Likelihood, and Fully Bayesian Estimation

Dialog-based Language Learning

Latent Semantic Analysis

Assignment 1: Predicting Amazon Review Ratings

SARDNET: A Self-Organizing Feature Map for Sequences

Computerized Adaptive Psychological Testing A Personalisation Perspective

Chapter 10 APPLYING TOPIC MODELING TO FORENSIC DATA. 1. Introduction. Alta de Waal, Jacobus Venter and Etienne Barnard

Netpix: A Method of Feature Selection Leading to Accurate Sentiment-Based Classification Models

COMPUTER-ASSISTED INDEPENDENT STUDY IN MULTIVARIATE CALCULUS

INVESTIGATION OF UNSUPERVISED ADAPTATION OF DNN ACOUSTIC MODELS WITH FILTER BANK INPUT

Robust Speech Recognition using DNN-HMM Acoustic Model Combining Noise-aware training with Spectral Subtraction

Learning Methods for Fuzzy Systems

Transcription:

1 Bayesian Deep Learning for Integrated Intelligence: Bridging the Gap between Perception and Inference Hao Wang Department of Computer Science and Engineering Joint work with Naiyan Wang, Xingjian Shi, and Dit-Yan Yeung

2 Perception and Inference Perception: see (visual object recognition), read (text understanding), hear (speech recognition). Inference: think (inference and reasoning). Together: comprehensive AI.

3 Bayesian Deep Learning (BDL) Motivation: our goal is both perception and inference/reasoning. Deep learning handles perception; graphical models handle inference/reasoning; Bayesian deep learning unifies the two.

4 Perception and Inference Perception component Content understanding Task-Specific component Target task Bayesian deep learning (BDL) Maximum a posteriori (MAP) Markov chain Monte Carlo (MCMC) Variational inference (VI)

5 Example: Medical Diagnosis Perception component Symptoms Task-Specific component Reasoning and inference Bayesian deep learning (BDL)

6 Example: Movie Recommender Systems Perception component Content understanding Task-Specific component Similarity, preferences Recommendation Bayesian deep learning (BDL)

7 A Principled Probabilistic Framework Perception Component Task-Specific Component Perception Variables Task Variables Hinge Variables [ Wang et al. 2016 ]

8 BDL Models for Different Applications [ Wang et al. 2016 ]

9 Bayesian Deep Learning: Under a Principled Framework Probabilistic Graphical Models

10 Collaborative Deep Learning [ Wang et al. 2015 (KDD) ]

11 Recommender Systems Rating matrix: matrix completion. Given the observed preferences, predict the missing entries (the setup is sketched below).
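A minimal sketch of the matrix-completion setup, in notation of my own choosing since the slide's formulas are not in the transcript: with \(I\) users and \(J\) items,

\[
R \in \mathbb{R}^{I \times J}, \qquad \Omega = \{(i, j) : R_{ij} \text{ is observed}\},
\]

and the task is to predict \(R_{ij}\) for \((i, j) \notin \Omega\), typically under a low-rank assumption \(R \approx U^{\top} V\) with \(U \in \mathbb{R}^{K \times I}\), \(V \in \mathbb{R}^{K \times J}\), and \(K \ll \min(I, J)\).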

12 Recommender Systems with Content Content information: Plots, directors, actors, etc.

13 Modeling the Content Information Handcrafted features Automatically learn features Automatically learn features and adapt for ratings

14 Modeling the Content Information 1. Powerful features for content information Deep learning 2. Feedback from rating information Non-i.i.d. Collaborative deep learning

15 Deep Learning Stacked denoising autoencoders Convolutional neural networks Recurrent neural networks Typically for i.i.d. data

16 Modeling the Content Information 1. Powerful features for content information Deep learning 2. Feedback from rating information Non-i.i.d. Collaborative deep learning (CDL)

17 Contribution Collaborative deep learning: * deep learning for non-i.i.d. data * joint representation learning and collaborative filtering

18 Contribution Collaborative deep learning Complex target: * beyond targets like classification and regression * to complete a low-rank matrix

19 Contribution Collaborative deep learning Complex target First hierarchical Bayesian models for deep hybrid recommender systems

20 Stacked Denoising Autoencoders (SDAE) Corrupted input Clean input [ Vincent et al. 2010 ]
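As a concrete illustration, here is a minimal PyTorch sketch of a denoising autoencoder trained to reconstruct the clean input from a corrupted copy. The layer sizes, masking-noise level, and training loop are illustrative assumptions, not the architecture used in the talk:

```python
# Minimal denoising-autoencoder sketch (illustrative sizes and noise level).
import torch
import torch.nn as nn

class DenoisingAutoencoder(nn.Module):
    def __init__(self, in_dim=8000, hidden_dim=200, code_dim=50):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, hidden_dim), nn.Sigmoid(),
            nn.Linear(hidden_dim, code_dim), nn.Sigmoid(),
        )
        self.decoder = nn.Sequential(
            nn.Linear(code_dim, hidden_dim), nn.Sigmoid(),
            nn.Linear(hidden_dim, in_dim), nn.Sigmoid(),
        )

    def forward(self, x_corrupted):
        code = self.encoder(x_corrupted)   # middle-layer representation
        return self.decoder(code)          # reconstruction of the clean input

def corrupt(x, drop_prob=0.3):
    """Masking noise: randomly zero out a fraction of the input entries."""
    mask = (torch.rand_like(x) > drop_prob).float()
    return x * mask

model = DenoisingAutoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

x_clean = torch.rand(64, 8000)             # e.g. bag-of-words item vectors
for _ in range(10):                        # a few illustrative steps
    x_tilde = corrupt(x_clean)             # corrupted input
    recon = model(x_tilde)                 # network output
    loss = loss_fn(recon, x_clean)         # reconstruct the *clean* input
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```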

21 Probabilistic Matrix Factorization (PMF) Graphical model: Notation: latent vector of item j latent vector of user i rating of item j from user i Generative process: Objective function if using MAP: [ Salakhutdinov et al. 2008 ]
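The generative process and MAP objective of PMF, written here with the per-rating confidence \(C_{ij}\) used throughout the CDL slides:

\[
u_i \sim \mathcal{N}(0, \lambda_u^{-1} I_K), \qquad
v_j \sim \mathcal{N}(0, \lambda_v^{-1} I_K), \qquad
R_{ij} \sim \mathcal{N}(u_i^{\top} v_j, C_{ij}^{-1}),
\]

and MAP estimation maximizes

\[
\mathcal{L} = -\sum_{i,j} \frac{C_{ij}}{2} \left(R_{ij} - u_i^{\top} v_j\right)^2
- \frac{\lambda_u}{2} \sum_i \|u_i\|_2^2
- \frac{\lambda_v}{2} \sum_j \|v_j\|_2^2 .
\]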

22 Probabilistic SDAE Graphical model: Generative process: Generalized SDAE Notation: corrupted input clean input weights and biases
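The generalized SDAE generative process, as given in the CDL paper (Wang et al. 2015): for each layer \(l\) of the \(L\)-layer network and each item \(j\),

\[
W_l \sim \mathcal{N}(0, \lambda_w^{-1} I), \qquad
b_l \sim \mathcal{N}(0, \lambda_w^{-1} I), \qquad
X_{l, j*} \sim \mathcal{N}\!\left(\sigma(X_{l-1, j*} W_l + b_l),\; \lambda_s^{-1} I\right),
\]

with the clean input generated from the network output,

\[
X_{c, j*} \sim \mathcal{N}(X_{L, j*}, \lambda_n^{-1} I).
\]

Here \(X_0\) is the corrupted input and \(\sigma(\cdot)\) the activation function; taking \(\lambda_s \to \infty\) recovers the deterministic SDAE.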

23 Collaborative Deep Learning (CDL) Graphical model: Collaborative deep learning SDAE Two-way interaction More powerful representation Infer missing ratings from content Infer missing content from ratings Notation: rating of item j from user i latent vector of item j latent vector of user i corrupted input clean input weights and biases content representation
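As given in the KDD 2015 paper, CDL couples the probabilistic SDAE with PMF by generating each item latent vector from the middle-layer representation with a Gaussian offset:

\[
u_i \sim \mathcal{N}(0, \lambda_u^{-1} I_K), \qquad
v_j \sim \mathcal{N}\!\left(X_{\frac{L}{2}, j*}^{\top},\; \lambda_v^{-1} I_K\right), \qquad
R_{ij} \sim \mathcal{N}(u_i^{\top} v_j, C_{ij}^{-1}),
\]

so the content representation \(X_{L/2, j*}\) acts as the hinge between the perception and task-specific components.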

24 A Principled Probabilistic Framework (Recap) Perception Component Task-Specific Component Perception Variables Task Variables Hinge Variables [ Wang et al. 2016 ]

25 CDL with Two Components Graphical model: Collaborative deep learning SDAE Two-way interaction More powerful representation Infer missing ratings from content Infer missing content from ratings Notation: rating of item j from user i latent vector of item j latent vector of user i corrupted input clean input weights and biases content representation

26 Collaborative Deep Learning Neural network representation for degenerated CDL

27 Collaborative Deep Learning Information flows from ratings to content

28 Collaborative Deep Learning Information flows from content to ratings

29 Collaborative Deep Learning Representation learning <-> recommendation

30 Learning Maximizing the posterior probability is equivalent to maximizing the joint log-likelihood.

31 Learning Prior (regularization) for user latent vectors, weights, and biases

32 Learning Generating item latent vectors from content representation with Gaussian offset

33 Learning Generating clean input from the output of probabilistic SDAE with Gaussian offset

34 Learning Generating the input of Layer l from the output of Layer l-1 with Gaussian offset

35 Learning The confidence-weighted rating term measures the error of predicted ratings.

36 Learning If the precision λ_s goes to infinity, the likelihood simplifies to the objective below.
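As given in the paper, the \(\lambda_s \to \infty\) limit of the joint log-likelihood is

\[
\mathcal{L} = -\frac{\lambda_u}{2} \sum_i \|u_i\|_2^2
- \frac{\lambda_w}{2} \sum_l \left(\|W_l\|_F^2 + \|b_l\|_2^2\right)
- \frac{\lambda_v}{2} \sum_j \left\|v_j - f_e(X_{0,j*}, W^+)^{\top}\right\|_2^2
- \frac{\lambda_n}{2} \sum_j \left\|f_r(X_{0,j*}, W^+) - X_{c,j*}\right\|_2^2
- \sum_{i,j} \frac{C_{ij}}{2} \left(R_{ij} - u_i^{\top} v_j\right)^2,
\]

where \(f_e(\cdot, W^+)\) is the encoder output (the middle-layer representation) and \(f_r(\cdot, W^+)\) is the reconstruction output of the SDAE.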

37 Update Rules For U and V, use block coordinate descent: For W and b, use a modified version of backpropagation:
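For reference, the block coordinate descent updates from the paper, with \(C_i = \operatorname{diag}(C_{i1}, \ldots, C_{iJ})\) and \(R_i = (R_{i1}, \ldots, R_{iJ})^{\top}\):

\[
u_i \leftarrow \left(V C_i V^{\top} + \lambda_u I_K\right)^{-1} V C_i R_i, \qquad
v_j \leftarrow \left(U C_j U^{\top} + \lambda_v I_K\right)^{-1} \left(U C_j R_j + \lambda_v f_e(X_{0,j*}, W^+)^{\top}\right).
\]

The backpropagation for the weights and biases is "modified" in that its gradient includes not only the reconstruction error but also the \(\lambda_v\) term tying the middle layer to the item latent vectors.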

38 Datasets Content information: citeulike-a (titles and abstracts) [ Wang et al. 2011 ], citeulike-t (titles and abstracts) [ Wang et al. 2013 ], and Netflix (movie plots).

39 Evaluation Metrics Recall and Mean Average Precision (mAP): higher recall and mAP indicate better recommendation performance. Both metrics are sketched below.
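A short Python sketch of both metrics; the per-user data layout and variable names are my own assumptions:

```python
# Recall@M and average precision for one user's ranked recommendation list.
import numpy as np

def recall_at_m(ranked_items, liked_items, m):
    """Fraction of the user's liked items that appear in the top-M list."""
    hits = len(set(ranked_items[:m]) & liked_items)
    return hits / len(liked_items)

def average_precision(ranked_items, liked_items, cutoff=500):
    """Average of precision@k over the ranks k where a liked item appears."""
    hits, precisions = 0, []
    for k, item in enumerate(ranked_items[:cutoff], start=1):
        if item in liked_items:
            hits += 1
            precisions.append(hits / k)
    return np.mean(precisions) if precisions else 0.0

# mAP is the mean of average_precision over all users;
# recall@M is likewise averaged over users.
```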

40 Comparing Methods Hybrid methods using BOW and ratings (e.g., PMF+LDA); loosely coupled, so the interaction is not two-way.

41 Recall@M Results on citeulike-t and Netflix, in both the sparse and the dense rating settings.

42 Mean Average Precision (mAP) Following exactly the protocol of Oord et al. 2013, we set the cutoff point at 500 for each user. CDL yields a relative performance boost of about 50%.

43 Number of Layers Sparse Setting Dense Setting The best performance is achieved when the number of layers is 2 or 3 (4 or 6 layers of generalized neural networks).

44 Example User Romance Movies Moonstruck True Romance Precision: 30% VS 20%

45 Example User Action & Drama Movies Johnny English American Beauty Precision: 50% VS 20%

46 Example User Precision: 90% VS 50%

47 Summary: Collaborative Deep Learning Non-i.i.d. (collaborative) deep learning with a complex target First hierarchical Bayesian models for hybrid deep recommender systems Significantly advances the state of the art

48 Marginalized CDL Both variants pair a transformation to latent factors with a reconstruction error; marginalized CDL marginalizes out the input corruption instead of sampling it. [ Li et al., CIKM 2015 ]
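Marginalized CDL builds on the marginalized denoising autoencoder of Chen et al. 2012: rather than sampling corrupted copies, the expected squared reconstruction error over the corruption distribution is minimized in closed form. For a single linear layer this gives, in that paper's notation,

\[
W = \mathbb{E}[P] \, \mathbb{E}[Q]^{-1}, \qquad
P = \sum_j x_j \tilde{x}_j^{\top}, \qquad
Q = \sum_j \tilde{x}_j \tilde{x}_j^{\top},
\]

where \(\tilde{x}_j\) is the corrupted version of \(x_j\) and both expectations have analytic forms under masking noise. (This is a summary of the underlying technique, not a formula from the slide.)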

49 Collaborative Deep Ranking [ Ying et al., PAKDD 2016 ]

50 Collaborative Deep Ranking: Generative Process

51 Symmetric CDL Both item content and user attributes. User attributes: age, gender, occupation, country, city, geolocation, domain, etc. [ Li et al., CIKM 2015 ]

52 Symmetric CDL Marginalized CDL: Item content Symmetric CDL: Item content User attributes

53 Other Extensions of CDL Word2vec, tf-idf Sampling-based, variational inference Tagging information, networks

54 Relational Stacked Denoising Autoencoders [ Wang et al. 2015 (AAAI) ]

55 BDL for Topic Models and Relational Learning Topic hierarchy Topic generation Word generation Topic-word relation Inter-document relation BDL-Based Topic Models

56 Relational SDAE as Relational Topic Models Perception component Task-Specific component Topic hierarchy Inter-document relation BDL-Based Topic Models [ Wang et al. 2015 (AAAI) ]

57 Relational SDAE: Motivation Unsupervised representation learning Enhance representation power with relational information

58 Probabilistic SDAE Graphical model: Generative process: Generalized SDAE Notation: corrupted input clean input weights and biases

59 Relational SDAE: Graphical Model Notation: corrupted input clean input adjacency matrix

60 Relational SDAE: Two Components Perception Component Task-Specific Component

61 Relational SDAE: Generative Process

62 Relational SDAE: Generative Process

63 Multi-Relational SDAE: Graphical Model Product of Q+1 Gaussians Multiple networks: citation networks co-author networks Notation: corrupted input clean input adjacency matrix
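The "product of Q+1 Gaussians" is the standard precision-weighted combination: if the middle-layer representation receives messages \(\mathcal{N}(\mu_q, \lambda_q^{-1} I)\) for \(q = 0, 1, \ldots, Q\) (one per network, plus the SDAE itself), their normalized product is again Gaussian,

\[
\prod_{q=0}^{Q} \mathcal{N}\!\left(x;\, \mu_q, \lambda_q^{-1} I\right)
\;\propto\;
\mathcal{N}\!\left(x;\; \frac{\sum_q \lambda_q \mu_q}{\sum_q \lambda_q},\; \Big(\sum_q \lambda_q\Big)^{-1} I\right).
\]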

64 Relational SDAE: Objective Function Network A, relational matrix S, and the middle-layer representations.
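In the AAAI 2015 paper, the relational prior on the middle-layer representation amounts to a graph-Laplacian penalty: for an adjacency matrix \(A\) with Laplacian \(L_A = D - A\),

\[
\operatorname{tr}\!\left(X_{L/2}^{\top} L_A X_{L/2}\right)
= \frac{1}{2} \sum_{j, j'} A_{jj'} \left\|X_{\frac{L}{2}, j*} - X_{\frac{L}{2}, j'*}\right\|_2^2,
\]

which pushes linked documents toward similar representations; with multiple networks, one such term (one relational matrix S) is added per network.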

65 Update Rules

66 From Representation to Tag Recommendation

67 Algorithm

68 Datasets

69 Sparse Setting, citeulike-a

70 Dense Setting, citeulike-a

71 Sparse Setting, movielens-plot

72 Dense Setting, movielens-plot

73 Case Study 1: Tagging Scientific Articles Precision: 10% VS 60%

74 Case Study 2: Tagging Movies (SDAE) Precision: 30% VS 60%

75 Case Study 2: Tagging Movies (RSDAE) The tag does not appear in the tag lists of movies linked to E.T. the Extra-Terrestrial, so it is very difficult to discover.

76 Relational SDAE as Deep Relational Topic Models Perception component Task-Specific component Topic hierarchy Inter-document relation BDL-Based Topic Models Unified into a probabilistic relational model for relational deep learning [ Wang et al. 2015 (AAAI) ]

77 Applications of Bayesian Deep Learning: Under a Principled Framework Relational SDAE Collaborative Deep Learning Probabilistic Graphical Models

78 Take-home Messages Probabilistic graphical models for formulating both representation learning and inference/reasoning components Learnable representation serving as a bridge Tight, two-way interaction is crucial

79 Future Goals General Framework: 1. Ability to understand text, images, and videos 2. Ability to perform inference and planning under uncertainty 3. Close the gap between human intelligence and artificial intelligence

80 Thanks! Q&A