Deep Learning
Mohammad Ali Keyvanrad
Lecture 5: A Review of Artificial Neural Networks (4)
10/15/2017

OUTLINE
- Model Ensembles
- Regularization
- Dropout
- Regularization: A common pattern


Model Ensembles
- One reliable approach to improving the performance of neural networks:
  - Train multiple independent models.
  - At test time, average their predictions.
- Disadvantage: it takes longer to evaluate each test example.
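
A minimal sketch of test-time prediction averaging. The `predict_proba` interface is an assumption for illustration, not something from the slides; any per-model function returning class probabilities would do.

```python
import numpy as np

def ensemble_predict(models, x):
    """Average the class-probability predictions of independently trained models.

    `models` is any iterable of trained models; `predict_proba(x)` is an assumed
    interface returning a vector of class probabilities for input `x`.
    """
    probs = np.mean([m.predict_proba(x) for m in models], axis=0)
    return np.argmax(probs)  # predicted class = highest average probability
```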

Model Ensembles
1. Same model, different initializations
   - Use cross-validation to determine the best hyperparameters.
   - Train multiple models with different random initializations.
   - Danger: variety is only due to initialization.
2. Top models discovered during cross-validation
   - Use cross-validation to determine the best hyperparameters.
   - Pick the top few (e.g. 10) models to form the ensemble.
   - Danger: the ensemble may include suboptimal models.

Model Ensembles
3. Different checkpoints of a single model
   - Take different checkpoints of a single network over time.
   - Useful when training is very expensive.
   - Danger: lack of variety.

Model Ensembles
4. Running average of parameters during training
   - Average the state of the network over the last several iterations.
   - Maintain a second copy of the network's weights as an exponentially decaying sum of the previous weights.
   - This smoothed version of the weights over the last few steps almost always achieves better validation error.
   - Why? The network jumps around near a mode of the objective, so the averaged weights have a higher chance of being nearer the mode.
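
A sketch of the running-average idea: keep a second, exponentially decayed copy of the weights and evaluate that copy on the validation set. The dict-of-arrays representation and the 0.999 decay are illustrative assumptions, not from the slides.

```python
import numpy as np

def update_ema_weights(ema_weights, weights, decay=0.999):
    """Update an exponentially decaying average of the training weights.

    ema_weights, weights: dicts mapping parameter names to NumPy arrays.
    Call this once per training iteration; evaluate with ema_weights.
    """
    for name, w in weights.items():
        ema_weights[name] = decay * ema_weights[name] + (1.0 - decay) * w
    return ema_weights
```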


Regularization
- Definition: a process of introducing additional information in order to solve an ill-posed problem or to prevent overfitting.
- Usage:
  - Learn simpler models.
  - Induce models to be sparse.
  - Introduce group structure into the learning problem.

Regularization
- A regularization term (or regularizer) R(f) is added to the loss, giving an objective of the form
  min_f  sum_i V(f(x_i), y_i) + λ R(f)
  - V: loss function
  - f(x): predicted value
  - λ: a parameter that controls the importance of the regularization term
- Regularization introduces a penalty for exploring certain regions of the function space used to build the model, which can improve generalization.

Controlling the capacity of neural networks to prevent overfitting
1. L2 regularization (Tikhonov regularization or weight decay)
   - The most common form of regularization.
   - Adds a penalty of (1/2) λ w^2 for every weight w in the network (see the code sketch after item 5 below).

Controlling the capacity of neural networks to prevent overfitting
2. L1 regularization
   - A relatively common form of regularization; adds a penalty of λ|w| for each weight.
   - Leads the weight vectors to become sparse (very close to exactly zero).
   - Neurons end up using only a sparse subset of their most important inputs.

Controlling the capacity of neural networks to prevent overfitting
3. Elastic net regularization
   - Combines the L1 and L2 penalties.
4. Max norm constraints
   - Enforce an absolute upper bound on the magnitude of the weight vector of every neuron.
   - Clamp the weight vector w of every neuron to satisfy ||w||_2 < c.
   - The network cannot "explode" even when the learning rate is set too high.
5. Dropout (covered in the next section)
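
A minimal NumPy sketch of the L2 and L1 penalties and the max-norm constraint from the items above. The λ values, the constant c, and the row-per-neuron layout of W are illustrative assumptions.

```python
import numpy as np

def l2_penalty(W, lam=1e-4):
    # 1/2 * lambda * sum of squared weights (weight decay)
    return 0.5 * lam * np.sum(W * W)

def l1_penalty(W, lam=1e-4):
    # lambda * sum of absolute weights; encourages sparse weight vectors
    return lam * np.sum(np.abs(W))

def max_norm_constraint(W, c=3.0):
    # Clamp each neuron's incoming weight vector (assumed to be a row of W)
    # so that its L2 norm never exceeds c.
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    factor = np.minimum(1.0, c / (norms + 1e-12))
    return W * factor
```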


Dropout
- Dropout can be considered a bagging technique:
  - It averages over a large number of models with tied parameters.
- Dropout can generate a smoother objective surface.
- Dropout as a pretraining technique:
  - We may pretrain a DNN using dropout to quickly find a relatively good initial point,
  - then fine-tune the DNN without using dropout.

Dropout
- Deep neural nets with a large number of parameters are very powerful machine learning systems.
- Overfitting is a serious problem in such deep networks.
- Ensembles of large networks are slow to use, so it is difficult to deal with overfitting by combining many different large neural nets.
- Dropout is a technique for addressing this problem.

Dropout
- The term "dropout" refers to dropping out units: randomly set some neurons to zero.
- The probability of retaining a unit, p, is a hyperparameter; p = 0.5 is common. [Srivastava et al., 2014]
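
A sketch of the training-time forward pass for one fully connected layer with dropout; p is the retention probability from the slide, and the ReLU layer, shapes, and variable names are illustrative assumptions.

```python
import numpy as np

p = 0.5  # probability of retaining a unit (hyperparameter)

def dropout_forward_train(x, W, b):
    """Train-time pass through one fully connected layer with dropout."""
    h = np.maximum(0, x @ W + b)           # ReLU activations
    mask = np.random.rand(*h.shape) < p    # keep each unit with probability p
    return h * mask                        # dropped units are set to zero
```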

Dropout
- How can this possibly be a good idea?
  - It forces the network to have a redundant representation.
  - It prevents co-adaptation of features.

Dropout
- How can this possibly be a good idea?
  - A neural net with n units can be seen as a collection of 2^n possible "thinned" neural networks: a large ensemble of models.
  - These networks all share weights; each binary mask is one model.
  - For an FC layer with 4096 units, there are 2^4096 ≈ 10^1233 possible masks.

Dropout
- In the simplest case, each unit is retained with a fixed probability p, independent of the other units.
- p can be chosen using a validation set or can simply be set to 0.5.
- For the input units, however, the optimal probability of retention is usually closer to 1 than to 0.5.

Dropout
- At test time:
  - It is not feasible to explicitly average the predictions from exponentially many thinned models.
  - We want to average out the randomness at test time, but this integral seems hard to evaluate.

Dropout
- We want to approximate the integral. Consider a single neuron:
  - With dropout on its inputs, the neuron's expected output is p times its full (no-dropout) output; for example, with two inputs, E[a] = p (w1 x + w2 y).

Dropout
- Idea: use a single neural net at test time, without dropout.
- Multiply each weight by the retention probability p, so the expected test-time output matches the expected training-time output.
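
The corresponding test-time pass, continuing the sketch from the training slide; scaling the activations by p is equivalent to scaling the outgoing weights by p. The layer structure and names are the same illustrative assumptions as before.

```python
import numpy as np

p = 0.5  # same retention probability used during training

def dropout_forward_test(x, W, b):
    """Test-time pass: no random mask; scale activations by the retention probability p."""
    h = np.maximum(0, x @ W + b)
    return h * p  # equivalent to multiplying the outgoing weights by p
```

In practice, "inverted dropout" is common: divide by p at training time instead, so the test-time pass needs no scaling at all.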

Dropout (MNIST)

Dropout (TIMIT)


Regularization: A common pattern
- Training: add some kind of randomness (stochastic behavior) in the forward pass.
- Testing: marginalize out (average out) the noise:
  - Analytically, as is the case with dropout when multiplying by p.
  - Numerically, e.g. via sampling: perform several forward passes with different random decisions and average them.
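
A sketch of the numerical option: run several stochastic forward passes and average them. The `forward(x, train=True)` callable that keeps its random behavior is an assumption for illustration.

```python
import numpy as np

def mc_average_predict(forward, x, n_samples=10):
    """Approximate marginalizing the noise by averaging stochastic forward passes."""
    preds = [forward(x, train=True) for _ in range(n_samples)]
    return np.mean(preds, axis=0)
```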

Regularization: A common pattern
- Example: Batch Normalization
  - Training (kind of randomness): normalize using statistics from random minibatches.
  - Testing (average out randomness): use fixed statistics to normalize.
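
A minimal sketch of the train/test asymmetry in batch normalization; the momentum value, epsilon, and variable names are illustrative assumptions.

```python
import numpy as np

def batchnorm_forward(x, gamma, beta, running_mean, running_var,
                      train=True, momentum=0.9, eps=1e-5):
    """Batch normalization: minibatch statistics at train time, fixed running statistics at test time."""
    if train:
        mu, var = x.mean(axis=0), x.var(axis=0)        # stats of this random minibatch
        running_mean = momentum * running_mean + (1 - momentum) * mu
        running_var = momentum * running_var + (1 - momentum) * var
    else:
        mu, var = running_mean, running_var            # fixed stats at test time
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta, running_mean, running_var
```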

Regularization: A common pattern
- Example: Data Augmentation
  - Training (kind of randomness): transform the image (horizontal flips, random crops, ...).
  - Testing (average out randomness): average predictions over sampled transforms (e.g. a fixed set of crops, as in the ResNet example below).

Regularization: A common pattern
- ResNet
  - Training: sample random crops / scales
    - Pick a random L in the range [256, 480].
    - Resize the training image so its short side = L.
    - Sample a random 224 x 224 patch.
  - Testing: average a fixed set of crops
    - Resize the image at 5 scales: {224, 256, 384, 480, 640}.
    - For each size, use 10 crops of 224 x 224: 4 corners + center, plus flips.
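
A sketch of the train-time random 224 x 224 crop (the resizing step and the test-time 10-crop averaging are omitted); the H x W x C image layout is an assumption.

```python
import numpy as np

def random_crop(img, size=224):
    """Sample a random size x size patch from an H x W x C image (train-time augmentation)."""
    h, w = img.shape[:2]
    top = np.random.randint(0, h - size + 1)
    left = np.random.randint(0, w - size + 1)
    return img[top:top + size, left:left + size]
```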

Regularization: A common pattern
- Get creative for your problem! Use random mixes/combinations of:
  - translation
  - contrast and brightness
  - rotation
  - stretching
  - shearing
  - lens distortions, ...

Regularization: A common pattern
- Other examples:
  - Wan et al., "Regularization of Neural Networks using DropConnect", ICML 2013
  - Huang et al., "Deep Networks with Stochastic Depth", ECCV 2016
  - Graham, "Fractional Max Pooling", arXiv 2014

References
- Stanford Convolutional Neural Networks for Visual Recognition course (Neural Nets notes 2).
- Stanford Convolutional Neural Networks for Visual Recognition course (Neural Nets notes 3).
- Srivastava, Nitish, et al. "Dropout: a simple way to prevent neural networks from overfitting." Journal of Machine Learning Research 15.1 (2014).
- https://en.wikipedia.org/wiki/Overfitting
- https://en.wikipedia.org/wiki/Regularization_(mathematics)
