CSE 291: Advances in Computer Vision. Manmohan Chandraker. Lecture 2: Background

Recap

Features have been key SIFT [Lowe IJCV 04] HOG [Dalal and Triggs CVPR 05] SPM [Lazebnik et al. CVPR 06] Textons and many others: SURF, MSER, LBP, GLOH, ...

Learning a Hierarchy of Feature Extractors Hierarchical and expressive feature representations Trained end-to-end, rather than hand-crafted for each task Remarkable in transferring knowledge across tasks

Significant recent impact on the field Big labeled datasets Deep learning GPU technology

Neuron Inputs are feature values Each feature has a weight Sum is the activation If the activation is: Positive, output +1 Negative, output -1 Slide credit: Pieter Abbeel and Dan Klein
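To make this concrete, here is a minimal sketch of the perceptron-style unit just described (weighted sum of feature values, thresholded at zero); the feature and weight values are made up for illustration.

```python
import numpy as np

def neuron(features, weights, bias=0.0):
    """Perceptron-style unit: weighted sum of inputs, thresholded at zero."""
    activation = np.dot(weights, features) + bias
    return 1 if activation > 0 else -1

# Illustrative values (not from the slides).
x = np.array([0.5, -1.2, 3.0])
w = np.array([0.4, 0.1, -0.2])
print(neuron(x, w))  # -> -1 here, since the weighted sum is negative
```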

Two-layer neural network Slide credit: Pieter Abbeel and Dan Klein

Activation functions

From fully connected to convolutional networks image Fully connected layer Slide: Lazebnik

From fully connected to convolutional networks feature map learned weights image Convolutional layer Slide: Lazebnik

From fully connected to convolutional networks image Convolutional layer next layer Slide: Lazebnik

Learnable filters

Number of weights

Filters over the whole image

Weight sharing Insight: Images have similar features at various spatial locations!
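A minimal sketch of weight sharing, assuming a single-channel image and one 3x3 filter: the same kernel weights are reused at every spatial location to produce the feature map (a naive loop, for illustration only).

```python
import numpy as np

def conv2d_single(image, kernel):
    """Valid-mode 2D convolution (cross-correlation) with one shared kernel."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # The same kernel weights are reused at every (i, j): weight sharing.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.random.rand(8, 8)        # toy single-channel image
kernel = np.random.randn(3, 3)      # one 3x3 learnable filter
print(conv2d_single(image, kernel).shape)  # (6, 6) feature map
```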

Pooling operations Aggregate multiple values into a single value Invariance to small transformations Keep only most important information for next layer Reduces the size of the next layer Fewer parameters, faster computations Observe larger receptive field in next layer Hierarchically extract more abstract features

Convolutional Neural Networks

Architectural details of AlexNet Similar framework to LeCun 1998 but: Bigger model (7 hidden layers, 650k units, 60M parameters) More data (10^6 images instead of 10^3 images) GPU implementation (50 times speedup over CPU)

VGGNet architecture Much more accurate AlexNet: 18.2% top-5 error VGGNet: 6.8% top-5 error More than twice as many layers Filters are much smaller Harder and slower to train

Deep residual learning Plain Net Simple design Use only 3x3 conv (like VGG) No hidden FC

Key ideas for CNN architectures Convolutional layers Same local functions evaluated everywhere Far fewer parameters Pooling Larger receptive field ReLU Maintains a gradient signal over a large portion of the domain Limit parameters Sequence of 3x3 filters instead of large filters 1x1 convolutions to reduce feature dimensions Skip connections Easier optimization with greater depth

Optimization in CNNs

A 3-layer network for digit recognition MNIST dataset

Cost function The network tries to approximate the function y(x); its output is denoted a. We use a quadratic cost function (MSE, or L2 loss): C = (1/2n) Σ_x ||y(x) − a||².

Gradient descent

Stochastic gradient descent Update rules for each parameter: w → w − η ∂C/∂w, b → b − η ∂C/∂b. Cost function is a sum over all the training samples: C = (1/n) Σ_x C_x. Gradient from entire training set: ∇C = (1/n) Σ_x ∇C_x. Usually, n is very large.

Stochastic gradient descent Gradient from entire training set: for large training data, gradient computation takes a long time and leads to slow learning. Instead, consider a mini-batch with m samples. If m is large enough, the mini-batch gradient approximates the gradient over the full training set.
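A minimal sketch of mini-batch SGD, using a toy linear least-squares model rather than a network so the gradient is easy to read; the learning rate, batch size, and data are illustrative placeholders.

```python
import numpy as np

def sgd(X, y, lr=0.1, batch_size=32, epochs=10):
    """Minimal mini-batch SGD for a linear least-squares model (illustration only)."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        idx = np.random.permutation(n)
        for start in range(0, n, batch_size):
            batch = idx[start:start + batch_size]
            pred = X[batch] @ w
            # Gradient of 0.5 * mean((pred - y)^2) over the mini-batch only.
            grad = X[batch].T @ (pred - y[batch]) / len(batch)
            w -= lr * grad
    return w

X = np.random.randn(1000, 5)
true_w = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ true_w + 0.01 * np.random.randn(1000)
print(sgd(X, y))  # should approach true_w
```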

Stochastic gradient descent with momentum Build up velocity as a running mean of gradients.
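That running mean is the momentum update; a minimal sketch below, where the momentum coefficient mu = 0.9 and the toy objective are illustrative choices.

```python
import numpy as np

def momentum_sgd(grad_fn, w, lr=0.01, mu=0.9, steps=100):
    """SGD with momentum: velocity is a decaying running sum of gradients."""
    v = np.zeros_like(w)
    for _ in range(steps):
        v = mu * v - lr * grad_fn(w)   # build up velocity
        w = w + v                      # step in the smoothed direction
    return w

# Example: minimize f(w) = ||w||^2, whose gradient is 2w (illustration only).
w0 = np.array([5.0, -3.0])
print(momentum_sgd(lambda w: 2 * w, w0))  # approaches [0, 0]
```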

Backpropagation This is all you need to know to get the gradients in a neural network! Backpropagation: an application of the chain rule in a certain order, taking advantage of values computed during forward propagation to compute gradients efficiently.

Backpropagation example [Slides credit: Fei-Fei Li]

Backpropagation example Add gate: gradient distributor Mul gate: gradient switcher
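A tiny worked instance of these two gate rules, for f = (x + y) * z with made-up input values (not taken from the slides):

```python
# Forward pass for f = (x + y) * z, with illustrative values.
x, y, z = -2.0, 5.0, -4.0
q = x + y        # add gate
f = q * z        # mul gate

# Backward pass (chain rule), reusing values from the forward pass.
df_df = 1.0
df_dq = z * df_df        # mul gate "switches": gradient w.r.t. q is the other input z
df_dz = q * df_df        # ... and gradient w.r.t. z is q
df_dx = 1.0 * df_dq      # add gate distributes its incoming gradient unchanged
df_dy = 1.0 * df_dq
print(df_dx, df_dy, df_dz)   # -4.0 -4.0 3.0
```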

Convolutional layer is differentiable

Max Pooling
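A minimal sketch of 2x2 max pooling with stride 2 on one feature map; the comment notes how the gradient is routed in the backward pass.

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on a single feature map."""
    H, W = x.shape
    blocks = x[:H - H % 2, :W - W % 2].reshape(H // 2, 2, W // 2, 2)
    return blocks.max(axis=(1, 3))

x = np.arange(16, dtype=float).reshape(4, 4)
print(max_pool_2x2(x))
# [[ 5.  7.]
#  [13. 15.]]
# Backward pass: the gradient flows only to the argmax in each 2x2 window;
# the other three positions receive zero gradient.
```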

Loss Functions and Regularizations

Slow learning with sigmoid neurons

Slow learning with sigmoid neurons When the neuron's output is close to 1 (or 0), the derivative of the sigmoid is nearly zero, so learning becomes slow.

Cross-Entropy Loss

Cross-Entropy Loss Rate of learning depends on the error in the prediction! Prevents the learning slowdown caused by the derivative of the sigmoid.
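To see why the slowdown disappears, the sketch below compares the weight gradient of a single sigmoid neuron under the quadratic cost and under the cross-entropy cost; the input, weight, and bias are made-up values chosen so the neuron saturates.

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# One sigmoid neuron with a badly wrong prediction: target y = 0, output a close to 1.
x, w, b, y = 1.0, 5.0, 2.0, 0.0
z = w * x + b
a = sigmoid(z)
sigma_prime = a * (1 - a)                    # nearly zero when the neuron saturates

grad_quadratic = (a - y) * sigma_prime * x   # quadratic cost: tiny gradient -> slow learning
grad_cross_entropy = (a - y) * x             # cross-entropy: sigma'(z) cancels, gradient stays large
print(grad_quadratic, grad_cross_entropy)
```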

Better activation functions ReLU: computes f(x) = max(0, x) Does not saturate (in positive region) Computationally efficient Converges faster than sigmoid Leaky ReLU: same advantages as ReLU, but stays alive when x < 0
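The two activations contrasted here (ReLU and leaky ReLU), as a minimal sketch; the leaky slope of 0.01 is a common but arbitrary choice.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)                  # f(x) = max(0, x)

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)       # small slope keeps the unit "alive" for x < 0

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(x), leaky_relu(x))
```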

Over-fitting

More data prevents over-fitting But it is not always feasible to obtain more data that is relevant.

Regularization reduces over-fitting

L2 regularization L2 regularization: C = C_0 + (λ/2n) Σ_w w². Partial derivatives: ∂C/∂w = ∂C_0/∂w + (λ/n) w, ∂C/∂b = ∂C_0/∂b. Update rule: w → (1 − ηλ/n) w − η ∂C_0/∂w.

L1 regularization L1 regularization: C = C_0 + (λ/n) Σ_w |w|. Partial derivatives: ∂C/∂w = ∂C_0/∂w + (λ/n) sgn(w). C_0 is the cross-entropy term. Update rule: w → w − (ηλ/n) sgn(w) − η ∂C_0/∂w.
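A minimal sketch of the two regularized update rules above, applied to a weight vector w given the gradient of the unregularized cost C_0; eta (learning rate), lam (regularization strength), and n (training-set size) are placeholders.

```python
import numpy as np

def l2_update(w, grad_c0, eta, lam, n):
    """w -> (1 - eta*lam/n) * w - eta * dC0/dw  (multiplicative weight decay)."""
    return (1 - eta * lam / n) * w - eta * grad_c0

def l1_update(w, grad_c0, eta, lam, n):
    """w -> w - (eta*lam/n) * sign(w) - eta * dC0/dw  (shrink toward zero by a constant)."""
    return w - (eta * lam / n) * np.sign(w) - eta * grad_c0
```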

L2 or L1 regularization

Regularization reduces over-fitting

Dropout as a regularization Modify the network itself Randomly delete half the hidden neurons in the network Repeat several times to learn weights and biases At runtime, twice as many neurons, so halve the weights outgoing from a neuron

Dropout as a regularization Modify the network itself Randomly delete half the hidden neurons in the network Repeat several times to learn weights and biases At runtime, twice as many neurons, so halve the weights outgoing from a neuron Averaging or voting scheme to decide output Same training data, but random initializations Each network over-fits in a different way Average output not sensitive to particular mode

Dropout as a regularization A useful insight from the AlexNet paper Reduces complex co-adaptations of neurons, since a neuron cannot rely on the presence of particular others Each neuron is forced to learn features that are useful in conjunction with random subsets of other neurons Dropout ensures the model can make robust predictions.
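A minimal sketch of dropout on one layer of hidden activations, following the recipe above: drop roughly half the units during training, and scale by the keep probability at test time (equivalent to halving the outgoing weights when the drop probability is 0.5).

```python
import numpy as np

def dropout_forward(h, p_drop=0.5, train=True):
    """Apply dropout to a layer of hidden activations h."""
    if train:
        mask = np.random.rand(*h.shape) >= p_drop   # randomly delete ~half the units
        return h * mask
    # At test time all units are active, so scale by the keep probability
    # (equivalent to halving the outgoing weights when p_drop = 0.5).
    return h * (1.0 - p_drop)
```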

Data augmentation as regularization

Data augmentation as regularization Horizontal flips

Data augmentation as regularization Random crops and scales

Data augmentation as regularization Color jitter

Data augmentation as regularization Color jitter Can do a lot more: rotation, shear, non-rigid deformations, motion blur, lens distortions, ...
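A minimal sketch of the augmentations just listed (horizontal flip, random crop, color jitter) for an H x W x 3 image with values in [0, 1]; the crop size and jitter magnitude are arbitrary illustration choices.

```python
import numpy as np

def augment(img, crop=24, jitter=0.1):
    """Random horizontal flip, random crop, and simple color jitter (illustration only)."""
    if np.random.rand() < 0.5:
        img = img[:, ::-1, :]                           # horizontal flip
    H, W, _ = img.shape
    top = np.random.randint(0, H - crop + 1)
    left = np.random.randint(0, W - crop + 1)
    img = img[top:top + crop, left:left + crop, :]      # random crop
    scale = 1.0 + jitter * (2 * np.random.rand(3) - 1)  # per-channel color jitter
    return np.clip(img * scale, 0.0, 1.0)

img = np.random.rand(32, 32, 3)
print(augment(img).shape)   # (24, 24, 3)
```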

Transfer Learning in CNNs

Transfer Learning Improvement of learning in a new task through the transfer of knowledge from a related task that has already been learned. Weight initialization for CNN Learning and Transferring Mid-Level Image Representations using Convolutional Neural Networks [Oquab et al. CVPR 2014] Slide: Jia-Bin Huang

Transfer Learning

CNNs are good at transfer learning

Fine-tune h_T using h_S as initialization

Initializing h_T with h_S

Strategy for fine-tuning Amount of data needed

Use h_S as a feature extractor for h_T
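A hedged sketch of both strategies using PyTorch/torchvision (assumed available; the backbone, number of target classes, and layer choice are placeholders): freeze the pretrained h_S and train only a new classifier head, or unfreeze layers for full fine-tuning.

```python
import torch.nn as nn
import torchvision.models as models

num_classes = 10                                      # placeholder for the target task h_T
model = models.resnet18(weights="IMAGENET1K_V1")      # pretrained source network h_S
                                                      # (pretrained=True on older torchvision)

# Option 1: use h_S as a fixed feature extractor -- freeze all pretrained weights.
for param in model.parameters():
    param.requires_grad = False

# Replace the source classifier with a new head for the target task; it trains from scratch.
model.fc = nn.Linear(model.fc.in_features, num_classes)

# Option 2 (fine-tuning): also unfreeze some or all backbone layers
# and train them, typically with a smaller learning rate.
```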

Transfer learning is a common choice

Training a Good CNN

Verifying that CNN is Trained Well [M. Ranzato]
