Unsupervised Learning Jointly With Image Clustering

Jianwei Yang, Devi Parikh, Dhruv Batra. Virginia Tech. https://filebox.ece.vt.edu/~jw2yang/


Motivation:
- Huge amount of images!
- Learning without annotation effort
- What do we need to learn?
- An open problem
- A hot problem
- Various methodologies

Learning distribution (structure): Clustering
- K-means (image credit: Jesse Johnson)
- Hierarchical clustering
- Spectral clustering (Manor et al., NIPS 2004)
- Graph cut (Shi et al., TPAMI 2000)
- DBSCAN (Ester et al., KDD 1996; image credit: Jesse Johnson)
- EM algorithm (Dempster et al., JRSS 1977)
- NMF (Xu et al., SIGIR 2003; image credit: Conrad Lee)

Jain, Anil K., M. Narasimha Murty, and Patrick J. Flynn. "Data clustering: a review." ACM Computing Surveys (CSUR) 31.3 (1999): 264-323.
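As a reference point for the classical methods listed above, here is a minimal sketch (assuming scikit-learn; the 2-D blob data is synthetic and not from the talk) that runs K-means and agglomerative clustering on the same toy data.

```python
# Minimal sketch of two classical clustering baselines (K-means and
# agglomerative clustering) on synthetic 2-D data; illustrative only.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.metrics import normalized_mutual_info_score

X, y_true = make_blobs(n_samples=600, centers=4, random_state=0)

# K-means: alternates cluster assignment and centroid updates.
kmeans_labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)

# Agglomerative clustering: starts from singletons and greedily merges
# the closest pair of clusters until the target number is reached.
agg_labels = AgglomerativeClustering(n_clusters=4, linkage="ward").fit_predict(X)

print("K-means NMI:      ", normalized_mutual_info_score(y_true, kmeans_labels))
print("Agglomerative NMI:", normalized_mutual_info_score(y_true, agg_labels))
```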

Learning distribution (structure): Subspace Analysis
- PCA (image credit: Jesse Johnson)
- ICA (image credit: Shylaja et al.)
- t-SNE (Maaten et al., JMLR 2008)
- Subspace clustering (Vidal et al.)
- Sparse coding (Olshausen et al., Vision Research 1997)
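A small sketch of the first two linear methods on the list, assuming scikit-learn and synthetic data (the dimensions and source distribution are arbitrary stand-ins for image features):

```python
# Minimal subspace-analysis sketch: PCA and ICA on synthetic features.
import numpy as np
from sklearn.decomposition import PCA, FastICA

rng = np.random.RandomState(0)
X = rng.laplace(size=(1000, 50))             # stand-in for flattened image features

pca = PCA(n_components=10).fit(X)
X_pca = pca.transform(X)                     # projection onto top principal axes
print("explained variance ratio:", pca.explained_variance_ratio_.sum())

ica = FastICA(n_components=10, random_state=0)
X_ica = ica.fit_transform(X)                 # statistically independent components
print("PCA / ICA output shapes:", X_pca.shape, X_ica.shape)
```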

Learning representation (feature) (Bengio et al., TPAMI 2013)
- Autoencoder (Hinton et al., Science 2006; image credit: Jesse Johnson)
- DBN (Hinton et al., Science 2006)
- DBM (Salakhutdinov et al., AISTATS 2009)

Yoshua Bengio, Aaron Courville, and Pierre Vincent. "Representation learning: A review and new perspectives." IEEE Transactions on Pattern Analysis and Machine Intelligence 35.8 (2013): 1798-1828.
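To make the autoencoder idea concrete, here is a minimal sketch in PyTorch (my choice of framework; the toy sizes and random data are assumptions, not part of the talk): the network learns a low-dimensional code purely by reconstructing its input.

```python
# Minimal autoencoder sketch: learn a code by reconstructing the input.
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self, in_dim=784, code_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                     nn.Linear(256, code_dim))
        self.decoder = nn.Sequential(nn.Linear(code_dim, 256), nn.ReLU(),
                                     nn.Linear(256, in_dim))

    def forward(self, x):
        code = self.encoder(x)
        return self.decoder(code), code

model = AutoEncoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.rand(128, 784)                     # stand-in for flattened images

for step in range(100):
    recon, code = model(x)
    loss = nn.functional.mse_loss(recon, x)  # reconstruction error drives the code
    opt.zero_grad()
    loss.backward()
    opt.step()
print("final reconstruction MSE:", loss.item())
```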

Learning representation (feature), continued
- VAE (Kingma et al., arXiv 2013; image credit: Fast Forward Labs)
- GAN (Goodfellow et al., NIPS 2014)
- DCGAN (Radford et al., arXiv 2015; image credit: Mike Swarbrick Jones)

Most Recent CV Works
- Spatial context (Doersch et al., ICCV 2015)
- Temporal context (Wang et al., ICCV 2015)
- Ego-motion (Jayaraman et al., ICCV 2015)
- Solving jigsaw puzzles (Noroozi et al., ECCV 2016)
- Context encoder (Pathak et al., CVPR 2016)

Most Recent CV Works, continued
- TAGnet (Wang et al., SDM 2016)
- Visual concept clustering (Huang et al., CVPR 2016)
- Deep embedding (Xie et al., ICML 2016)
- Graph constraint (Li et al., ECCV 2016)

Our Work: Joint Unsupervised Learning (JULE) of Deep Representations and Image Clusters

Outline: Intuition, Approach, Experiments, Extensions

Intuition
- Meaningful clusters can provide supervisory signals for learning image representations.
- Good representations help to obtain meaningful clusters.

Three possible strategies:
- Cluster images first, and then learn representations.
- Learn representations first, and then cluster images.
- Cluster images and learn representations progressively.

Intuition (diagram): good clusters and good representations reinforce each other, whereas poor clusters lead to poor representations.

Approach: Framework, Objective, Algorithm & Implementation

Approach: Framework

Overall objective: argmin_{y, θ} L(y, θ | I), where I are the input images, y the image cluster labels, and θ the CNN parameters. The two sub-problems are solved alternately:
- Representation learning (convolutional neural network): argmin_θ L(y, θ | I), with cluster labels y fixed.
- Agglomerative clustering: argmin_y L(y, θ | I), with CNN parameters θ fixed.
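To illustrate that alternation, here is a deliberately simplified, hedged sketch (not the authors' implementation): off-the-shelf agglomerative clustering stands in for the affinity-based clustering step, and a linear discriminant projection (LDA) stands in for CNN training, using the current cluster labels as pseudo-supervision.

```python
# Toy sketch of the alternating optimization in JULE-style training:
#   (1) cluster in the current feature space,
#   (2) use the cluster labels as pseudo-supervision to update the
#       representation (LDA stands in for CNN training here).
# Illustration of the idea only, not the authors' implementation.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.cluster import AgglomerativeClustering
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, _ = make_blobs(n_samples=500, centers=10, n_features=30, random_state=0)
features = X.copy()                 # initial "representation" = raw data

for it in range(3):
    # Step 1: cluster the current features (agglomerative, fixed target K).
    labels = AgglomerativeClustering(n_clusters=10).fit_predict(features)
    # Step 2: learn a projection that discriminates the current clusters,
    # i.e. the clusters act as a supervisory signal for representation learning.
    lda = LinearDiscriminantAnalysis(n_components=9).fit(X, labels)
    features = lda.transform(X)
    print(f"iteration {it}: feature dim = {features.shape[1]}")
```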

Approach: Recurrent Framework

Backpropagating after every merge time-step is time-consuming and prone to over-fitting. How about updating only once for multiple time-steps?

Partial unrolling: divide all T time-steps into P periods. In each period, we merge clusters multiple times and update the CNN parameters only at the end of the period. P is determined by a hyper-parameter introduced later.
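The bookkeeping behind partial unrolling can be sketched as below. The unroll rate (fraction of the remaining clusters merged per period) and the two placeholder functions are assumptions for illustration; the slide only states that the number of periods P follows from a hyper-parameter.

```python
# Sketch of partial unrolling: T merge time-steps are split into P periods;
# within a period we only merge clusters, and the network parameters are
# updated once at the end of the period.
def merge_one_step(n_clusters):
    # placeholder for one agglomerative merge (one forward-pass time-step)
    return n_clusters - 1

def update_cnn_parameters(period):
    # placeholder for one network update (backward pass of the period)
    print(f"  backward pass: update CNN at end of period {period}")

n_clusters, n_target, unroll_rate = 1000, 10, 0.2   # assumed values
period = 0
while n_clusters > n_target:
    # number of merges in this period: a fraction of the remaining clusters
    steps = max(1, int(unroll_rate * n_clusters))
    steps = min(steps, n_clusters - n_target)
    for _ in range(steps):                     # forward pass: several merges
        n_clusters = merge_one_step(n_clusters)
    update_cnn_parameters(period)              # one parameter update per period
    period += 1
print("total periods P =", period, "final clusters =", n_clusters)
```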

Approach: Objective Function

Overall loss: argmin_{y, θ} L(y, θ | I), accumulated over all merge time-steps.

Loss at time-step t: it is built from an affinity measure A between clusters. For the i-th cluster, we consider its K_c nearest neighbouring clusters. The conventional agglomerative clustering strategy looks only at the affinity between the i-th cluster and its nearest neighbour and merges these two clusters; the proposed strategy additionally accounts for the differences between that affinity and the affinities to the other neighbouring clusters.

Approach: Objective Function

Loss in the forward pass of period p (merge clusters): the CNN parameters are fixed.
Loss in the backward pass of period p (update the CNN): the cluster labels are fixed.

Approach: Objective Function

Forward pass: a simple greedy algorithm. At each time-step, merge the two clusters that minimize the loss.
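The sketch below shows the structure of such a greedy forward pass. It implements only the conventional criterion (merge the pair with the highest affinity, searching over each cluster's K_c nearest neighbouring clusters); the affinity used here, a negative average pairwise distance, is a stand-in for the paper's measure, and the proposed affinity-difference term is omitted.

```python
# Greedy agglomerative forward pass: at each time-step, merge the pair of
# clusters with the highest affinity, restricted to each cluster's K_c
# nearest neighbouring clusters. Affinity here is a simple stand-in.
import numpy as np
from scipy.spatial.distance import cdist

def affinity(A, B):
    return -cdist(A, B).mean()            # higher = more similar (stand-in)

rng = np.random.RandomState(0)
X = rng.randn(60, 5)
clusters = [[i] for i in range(len(X))]   # start from singleton clusters
K_c, target = 5, 4

while len(clusters) > target:
    best, best_pair = -np.inf, None
    centers = np.array([X[c].mean(axis=0) for c in clusters])
    for i, ci in enumerate(clusters):
        # candidate partners: the K_c nearest clusters by center distance
        d = np.linalg.norm(centers - centers[i], axis=1)
        neighbors = np.argsort(d)[1:K_c + 1]
        for j in neighbors:
            a = affinity(X[ci], X[clusters[j]])
            if a > best:
                best, best_pair = a, (i, int(j))
    i, j = best_pair
    clusters[i] = clusters[i] + clusters[j]   # merge the best pair
    clusters.pop(j)
print("final cluster sizes:", sorted(len(c) for c in clusters))
```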

Approach: Objective Backward Pass: 64

Approach: Objective Consider all previous periods Backward Pass: 65

Approach: Objective Consider all previous periods Backward Pass: Cluster based loss is not proper for batch optimization!!! 66

Approach: Objective Consider all previous periods Backward Pass: Cluster based loss is not proper for batch optimization!!! Approximation: 67

Approach: Objective Consider all previous periods Backward Pass: Recall cluster-based loss: Convert to sample-based loss: Intra-sample affinity Inter-sample affinity 68

Approach: Objective Consider all previous periods Backward Pass: Recall cluster-based loss: Convert to sample-based loss: Weighted triplet loss Intra-sample affinity Inter-sample affinity 69
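As a hedged illustration of what a sample-based weighted triplet loss looks like, here is a generic PyTorch sketch. Cosine similarity stands in for the paper's sample affinity, and the per-triplet weights and margin are assumptions, not the exact formulation used in the talk.

```python
# Generic weighted triplet loss sketch (PyTorch): pull an anchor towards a
# sample from the same cluster and push it away from a sample of a
# neighbouring cluster, with a per-triplet weight.
import torch
import torch.nn.functional as F

def weighted_triplet_loss(anchor, positive, negative, weight, margin=0.2):
    sim_ap = F.cosine_similarity(anchor, positive)   # intra-cluster affinity
    sim_an = F.cosine_similarity(anchor, negative)   # inter-cluster affinity
    # hinge: the positive should be more similar than the negative by a margin
    per_triplet = F.relu(sim_an - sim_ap + margin)
    return (weight * per_triplet).mean()

# toy batch of 16 triplets with 64-D features and per-triplet weights
a, p, n = torch.randn(16, 64), torch.randn(16, 64), torch.randn(16, 64)
w = torch.rand(16)
loss = weighted_triplet_loss(a, p, n, w)
print("weighted triplet loss:", loss.item())
```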

Approach: Algorithm & Implementation
- Input: raw image data.
- The target number of clusters is assumed to be known.
- Randomly initialize the CNN parameters; start from initial clusters with about 4 samples each on average.
- Train the CNN for about 20 epochs.
- We can go back and retrain the model, but it improves only slightly.
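The initialization step (about 4 samples per initial cluster on average) can be sketched as follows. Agglomerative clustering on raw data is used here only as a stand-in for the paper's initial grouping, and the data is synthetic.

```python
# Sketch of the initialization on the slide: pick the number of initial
# clusters so that each holds ~4 samples on average, then over-cluster.
import numpy as np
from sklearn.cluster import AgglomerativeClustering

rng = np.random.RandomState(0)
X = rng.rand(2000, 256)                         # stand-in for flattened raw images

samples_per_cluster = 4                         # from the slide: ~4 samples/cluster
n_initial = max(1, X.shape[0] // samples_per_cluster)

labels = AgglomerativeClustering(n_clusters=n_initial).fit_predict(X)
sizes = np.bincount(labels)
print(f"{n_initial} initial clusters, mean size = {sizes.mean():.2f}")
```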

Experiments: Datasets, Network Architecture, Image Clustering, Representation Learning

Experiments: Datasets (# images, # classes, image size)
- MNIST (70000, 10, 28x28)
- USPS (11000, 10, 16x16)
- COIL20 (1440, 20, 128x128)
- COIL100 (7200, 100, 128x128)
- UMist (575, 20, 112x92)
- FRGC (2462, 20, 32x32)
- CMU-PIE (2856, 68, 32x32)
- YouTube Face (1000, 41, 55x55)

Experiments: Settings. Two important hyper-parameters. The number of layers is set so that the output feature map is about 10x10.

Experiments: Clustering Performance
- +6.43% NMI relative to the best of existing approaches, averaged over all datasets.
- +12.76% AC relative to the best of existing approaches, averaged over all datasets.
- Average +21.5% NMI.
- Average +25.7% NMI.
- Comparisons: our clustering performance vs. that of existing clustering approaches on raw image data, and clustering performance when our learned representation is fed to existing clustering algorithms.
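For reference, the two reported metrics can be computed as sketched below: NMI straight from scikit-learn, and clustering accuracy (AC) via a Hungarian matching between predicted cluster ids and ground-truth classes. The toy labels are illustrative only.

```python
# Computing the two reported metrics: NMI and clustering accuracy (AC).
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import normalized_mutual_info_score

def clustering_accuracy(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    n = max(y_true.max(), y_pred.max()) + 1
    cost = np.zeros((n, n), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        cost[p, t] += 1                         # co-occurrence counts
    row, col = linear_sum_assignment(-cost)     # best cluster-to-class mapping
    return cost[row, col].sum() / len(y_true)

y_true = [0, 0, 1, 1, 2, 2, 2]
y_pred = [1, 1, 0, 0, 2, 2, 0]                  # cluster ids are arbitrary
print("NMI:", normalized_mutual_info_score(y_true, y_pred))
print("AC :", clustering_accuracy(y_true, y_pred))
```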

Experiments: Clustering Visualization (COIL-20, COIL-100, USPS, MNIST-test)

Experiments: Clustering Ablation Study
Experiments: Clustering Verification
Experiments: Clustering Time Cost

Experiments: Representation Learning
- Representation transfer: testing generalization of our learned (unsupervised) representation to LFW face verification.
- Representation learning: evaluation on CIFAR-10 classification.
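A common way to run such an evaluation is a linear probe: freeze the learned encoder, extract features for a labelled dataset, and fit a simple classifier on top. The sketch below uses random features and labels purely to show the protocol; the encoder, dataset loading, and hyper-parameters are assumptions.

```python
# Linear-probe sketch for evaluating a learned (unsupervised) representation:
# frozen features + a simple supervised classifier on top.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(0)
features = rng.randn(5000, 512)            # stand-in for frozen-encoder features
labels = rng.randint(0, 10, size=5000)     # stand-in for CIFAR-10 class labels

X_tr, X_te, y_tr, y_te = train_test_split(features, labels, test_size=0.2,
                                          random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("linear-probe accuracy:", clf.score(X_te, y_te))   # ~chance on random data
```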

Extensions: Data Visualization
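One straightforward recipe for visualizing the learned features is to embed them into 2-D with t-SNE and scatter-plot them coloured by cluster. The sketch below assumes scikit-learn and matplotlib and uses synthetic blobs in place of the learned representations.

```python
# t-SNE visualization sketch: embed high-dimensional features into 2-D and
# colour points by their cluster assignment. Synthetic stand-in data.
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.manifold import TSNE

features, labels = make_blobs(n_samples=500, centers=10, n_features=64,
                              random_state=0)
embedding = TSNE(n_components=2, init="pca", random_state=0).fit_transform(features)

plt.scatter(embedding[:, 0], embedding[:, 1], c=labels, s=8, cmap="tab10")
plt.title("t-SNE of learned features (synthetic stand-in)")
plt.savefig("tsne_features.png", dpi=150)
```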

Conclusion
- A new method for unsupervised learning jointly with image clustering, casting the problem as recurrent optimization.
- In the recurrent framework, clustering is conducted in the forward pass and representation learning in the backward pass.
- A unified loss function is used in both the forward and backward passes.
- It outperforms the state of the art on a number of datasets.
- It also learns plausible representations for image recognition.

Thanks! https://github.com/jwyang/joint-unsupervised-learning