
ELEC 576: Training Convnets Lecture 5 Ankit B. Patel Baylor College of Medicine (Neuroscience Dept.) Rice University (ECE Dept.) 10-04-2016

Administrivia: RCSG will be giving us a 30-minute tutorial today on how to use their commodity computing services. Please start Assignment #1 ASAP!!!

Latest News

Better Generative Models for Images of Products https://people.eecs.berkeley.edu/~junyanz/projects/gvm/

Google Brain Residency Program

Training Convnets: Problems and Solutions

Training on CIFAR10 http://cs.stanford.edu/people/karpathy/convnetjs/demo/cifar10.html

Data Preprocessing

Zero-Center & Normalize Data
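
A minimal NumPy sketch of this step (the data matrix X and its dimensions are hypothetical placeholders):

import numpy as np

X = np.random.rand(100, 3072)        # hypothetical design matrix: N examples x D features
X -= np.mean(X, axis=0)              # zero-center: subtract the per-feature mean
X /= (np.std(X, axis=0) + 1e-8)      # normalize: roughly unit scale per feature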

PCA & Whitening
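
A sketch of PCA and whitening in NumPy, assuming zero-centered data (the shapes and the 1e-5 fudge factor are illustrative choices):

import numpy as np

X = np.random.rand(500, 64)          # hypothetical data: N examples x D features
X -= np.mean(X, axis=0)              # PCA assumes zero-centered data
cov = np.dot(X.T, X) / X.shape[0]    # D x D covariance matrix
U, S, _ = np.linalg.svd(cov)         # U: eigenvectors, S: eigenvalues
Xrot = np.dot(X, U)                  # decorrelate the data (PCA)
Xwhite = Xrot / np.sqrt(S + 1e-5)    # whiten: unit variance in every direction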

In Practice, for Images: Center Only

Data Augmentation. During training: random crops of the original image and horizontal reflections. During testing: average the predictions over 10 augmentations of the image (the four corner patches plus the center patch, each with its horizontal flip). Data augmentation reduces overfitting.
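
A hedged sketch of the training-time augmentations (the 28-pixel crop size and the CIFAR10-shaped image are arbitrary choices for illustration):

import numpy as np

def augment(img, crop=28):
    # Random crop plus random horizontal reflection of one H x W x C image.
    H, W, _ = img.shape
    y = np.random.randint(0, H - crop + 1)
    x = np.random.randint(0, W - crop + 1)
    out = img[y:y + crop, x:x + crop]
    if np.random.rand() < 0.5:
        out = out[:, ::-1]           # horizontal reflection
    return out

img = np.random.rand(32, 32, 3)      # hypothetical CIFAR10-sized image
patch = augment(img)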

Weight Initialization

Interesting Question: What happens when the weights are initialized to 0? (2 min)

Answer: every neuron in a layer computes the same output and therefore receives the same gradient, so all weights get identical updates and the symmetry is never broken.

Random Initialization W = 0.01 * np.random.randn(d, H) Works fine for small networks, but can lead to non-homogeneous distributions of activations across the layers of a network.

Look at Some Activation Statistics. Setup: 10-layer net with 500 neurons in each layer, using tanh nonlinearities, and initialized as described in the last slide.
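
A sketch of this experiment in NumPy (the 1000-example random input batch is an arbitrary stand-in for real data):

import numpy as np

X = np.random.randn(1000, 500)       # hypothetical input batch
for layer in range(10):              # 10 layers, 500 units each
    W = 0.01 * np.random.randn(500, 500)   # the init from the last slide
    X = np.tanh(X.dot(W))
    print('layer %d: mean %+.5f, std %.5f' % (layer + 1, X.mean(), X.std()))
# The printed stds shrink layer by layer: with this scaling the
# activations collapse toward zero in the deeper layers.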

Random Initialization (plots of the activation statistics across the 10 layers)

Random Initialization Interesting Question: What will the gradients look like in the backward pass when all activations become zero?

Answer: The gradients in the backward pass will become zero!

Xavier Initialization W = np.random.randn(fan_in, fan_out) / np.sqrt(fan_in) Reasonable initialization (Mathematical derivation assumes linear activations)

Xavier Initialization W = np.random.randn(fan_in, fan_out) / np.sqrt(fan_in) but it breaks down when using a ReLU non-linearity
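
A sketch contrasting the two initializations; the sqrt(2/fan_in) correction for ReLUs follows He et al., 2015 (listed on the next slide):

import numpy as np

fan_in, fan_out = 500, 500
# Xavier: assumes roughly linear activations around zero.
W_xavier = np.random.randn(fan_in, fan_out) / np.sqrt(fan_in)
# He et al. variant: ReLU zeroes half its inputs, so the lost
# variance is compensated with an extra factor of 2.
W_he = np.random.randn(fan_in, fan_out) * np.sqrt(2.0 / fan_in)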

More Initialization Techniques
Understanding the difficulty of training deep feedforward neural networks by Glorot and Bengio, 2010
Exact solutions to the nonlinear dynamics of learning in deep linear neural networks by Saxe et al., 2013
Random walk initialization for training very deep feedforward networks by Sussillo and Abbott, 2014
Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification by He et al., 2015
Data-dependent Initializations of Convolutional Neural Networks by Krähenbühl et al., 2015
All you need is a good init by Mishkin and Matas, 2015

Choosing an Activation Function that Helps Training

Sigmoid

Tanh

ReLU: dead in the negative region

Leaky ReLU

Exponential Linear Unit

Maxout
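
For reference, minimal NumPy definitions of these nonlinearities (the maxout parameters w1, b1, w2, b2 are hypothetical, for the k = 2 case):

import numpy as np

def sigmoid(x): return 1.0 / (1.0 + np.exp(-x))      # squashes to (0, 1)
def tanh(x): return np.tanh(x)                        # squashes to (-1, 1), zero-centered
def relu(x): return np.maximum(0, x)                  # zero for all negative inputs
def leaky_relu(x, a=0.01): return np.where(x > 0, x, a * x)       # small negative slope
def elu(x, a=1.0): return np.where(x > 0, x, a * (np.exp(x) - 1)) # smooth negative saturation

def maxout(x, w1, b1, w2, b2):
    # Max over k = 2 affine pieces instead of a fixed shape.
    return np.maximum(x.dot(w1) + b1, x.dot(w2) + b2)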

In Practice

Training Algorithms

Stochastic Gradient Descent

Stochastic Gradient Descent for Neural Networks

Batch GD vs Stochastic GD

Mini-batch SGD
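
A self-contained toy example of the mini-batch SGD update, using a least-squares problem rather than a convnet purely so the sketch runs as-is:

import numpy as np

X = np.random.randn(1000, 20)        # toy inputs
w_true = np.random.randn(20)
y = X.dot(w_true) + 0.1 * np.random.randn(1000)   # noisy targets

w = np.zeros(20)
lr, batch_size = 1e-2, 64
for it in range(500):
    idx = np.random.choice(X.shape[0], batch_size, replace=False)  # sample a mini-batch
    Xb, yb = X[idx], y[idx]
    grad = 2.0 / batch_size * Xb.T.dot(Xb.dot(w) - yb)   # gradient of the batch MSE
    w -= lr * grad                                       # vanilla SGD step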

Momentum Update
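
A sketch of the momentum update on a toy quadratic objective (mu = 0.9 is a typical but arbitrary choice):

import numpy as np

x = np.random.randn(10)              # hypothetical parameter vector
v = np.zeros_like(x)                 # velocity, initialized to zero
mu, lr = 0.9, 1e-2                   # momentum coefficient and learning rate

def grad(x):                         # toy gradient: minimizes ||x||^2 / 2
    return x

for it in range(100):
    v = mu * v - lr * grad(x)        # velocity integrates a decaying sum of gradients
    x = x + v                        # step along the velocity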

Nesterov Momentum Update

Nesterov Momentum Update: express the update in terms of x_ahead, instead of x
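
The same toy setup with the Nesterov look-ahead, evaluating the gradient at x_ahead = x + mu * v:

import numpy as np

x = np.random.randn(10)
v = np.zeros_like(x)
mu, lr = 0.9, 1e-2

def grad(x):                         # same toy gradient as above
    return x

for it in range(100):
    x_ahead = x + mu * v             # peek ahead to where momentum will carry us
    v = mu * v - lr * grad(x_ahead)  # evaluate the gradient at the look-ahead point
    x = x + v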

Per-Parameter Adaptive Learning Rate Methods: Adagrad, RMSprop, Adam
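
One illustrative step of each update rule on a toy gradient (the hyperparameter values are common defaults, not prescriptions; Adam's bias correction is omitted for brevity):

import numpy as np

x = np.random.randn(10)
def grad(x): return x                # toy gradient
dx = grad(x)
lr, eps = 1e-2, 1e-8

# Adagrad: accumulate squared gradients; per-parameter steps shrink over time.
cache = np.zeros_like(x)
cache += dx ** 2
x -= lr * dx / (np.sqrt(cache) + eps)

# RMSprop: leaky accumulation, so the effective step does not decay to zero.
decay = 0.99
cache = decay * cache + (1 - decay) * dx ** 2
x -= lr * dx / (np.sqrt(cache) + eps)

# Adam: RMSprop-style second moment plus a momentum-like first moment.
beta1, beta2 = 0.9, 0.999
m, v = np.zeros_like(x), np.zeros_like(x)
m = beta1 * m + (1 - beta1) * dx         # first moment (mean of gradients)
v = beta2 * v + (1 - beta2) * dx ** 2    # second moment (uncentered variance)
x -= lr * m / (np.sqrt(v) + eps)         # bias correction omitted for brevity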

Annealing the Learning Rate
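
Sketches of three common annealing schedules (all constants here are arbitrary):

import numpy as np

lr0, k = 1e-2, 0.05

def step_decay(epoch, drop=0.5, every=10):
    return lr0 * drop ** (epoch // every)    # halve the rate every 10 epochs

def exp_decay(epoch):
    return lr0 * np.exp(-k * epoch)          # lr = lr0 * e^(-k t)

def inv_decay(epoch):
    return lr0 / (1 + k * epoch)             # lr = lr0 / (1 + k t)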

Compare Learning Methods http://cs231n.github.io/neural-networks-3/#sgd

In Practice: Adam is the default choice in most cases. SGD variants based on (Nesterov's) momentum are more standard than second-order methods because they are simpler and scale more easily. If you can afford full-batch updates, try L-BFGS (the limited-memory version of the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm), but don't forget to disable all sources of noise.

Regularization

DropOut
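
A minimal sketch of inverted dropout, assuming p = 0.5 and a hypothetical batch of ReLU activations; rescaling by 1/p at training time means the test-time code is unchanged:

import numpy as np

p = 0.5                                   # probability of keeping a unit

def dropout_train(H):
    mask = (np.random.rand(*H.shape) < p) / p   # drop units and rescale in one go
    return H * mask

def dropout_predict(H):
    return H                              # nothing to do at test time

H = np.maximum(0, np.random.randn(128, 500))    # hypothetical ReLU activations
H_train = dropout_train(H)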

Normalization

Batch Normalization
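
A sketch of the training-time forward pass (test time would instead use running averages of the batch statistics):

import numpy as np

def batchnorm_forward(x, gamma, beta, eps=1e-5):
    mu = x.mean(axis=0)                   # per-feature batch mean
    var = x.var(axis=0)                   # per-feature batch variance
    x_hat = (x - mu) / np.sqrt(var + eps) # normalize to zero mean, unit variance
    return gamma * x_hat + beta           # learned scale and shift

x = np.random.randn(128, 500)             # hypothetical mini-batch of activations
out = batchnorm_forward(x, gamma=np.ones(500), beta=np.zeros(500))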

Ensembles

Model Ensembles

Hyperparameter Optimization
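
A sketch of random search over log-spaced hyperparameters; val_accuracy is a hypothetical stand-in for training briefly and evaluating on the validation set:

import numpy as np

def val_accuracy(lr, reg):
    return np.random.rand()               # placeholder: train briefly, evaluate on val set

best = (None, -1.0)
for trial in range(100):
    lr = 10 ** np.random.uniform(-6, -3)  # sample the learning rate log-uniformly
    reg = 10 ** np.random.uniform(-5, 5)  # sample the regularization strength log-uniformly
    acc = val_accuracy(lr, reg)
    if acc > best[1]:
        best = ((lr, reg), acc)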

Synaptic Pruning

Monitoring the Learning Process

Double-check that the Loss is Reasonable
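
A quick sanity check, assuming a softmax classifier: with small random weights the first loss should be near -log(1/C), which is about 2.30 for the 10 CIFAR10 classes:

import numpy as np

num_classes = 10                          # e.g. CIFAR10
scores = 0.01 * np.random.randn(128, num_classes)    # small random weights => near-uniform scores
probs = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
y = np.random.randint(num_classes, size=128)
loss = -np.log(probs[np.arange(128), y]).mean()
print(loss, 'vs expected', np.log(num_classes))      # both should be close to 2.30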

Overfit a Very Small Portion of the Training Data

Transfer Learning
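
The lecture doesn't name a framework; here is a minimal fine-tuning sketch assuming PyTorch/torchvision, with the 10-class output layer as an arbitrary example:

import torch.nn as nn
import torchvision.models as models

model = models.resnet18(pretrained=True)  # convnet pretrained on ImageNet

for param in model.parameters():
    param.requires_grad = False           # freeze the pretrained features

# Replace the final classifier with a fresh layer for the new task;
# only this layer (and optionally the last few conv blocks) is trained.
model.fc = nn.Linear(model.fc.in_features, 10)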

Species of Convnets

AlexNet

VGGNet

GoogLeNet

ResNet

MDNet: Convnet for Object Tracking

Convnet for Brain Tumor Segmentation (Top 4 in BRATS 2015)

U-Net: Convnet for Segmentation of Neuronal Structures in Electron Microscopic Stacks (Won the ISBI Cell Tracking Challenge 2015)

DeepBind: Convnet for Predicting the Sequence Specificities of DNA- and RNA-Binding Proteins