Deep neural networks III

June 5th, 2018, Yong Jae Lee, UC Davis
Many slides from Rob Fergus, Svetlana Lazebnik, Jia-Bin Huang, Derek Hoiem, Adriana Kovashka

Announcements
PS due 6/ (Thurs), 11:59 pm
Review session during Thurs lecture
Post questions on Piazza

Convolutional Neural Networks (CNN)
Neural network with specialized connectivity structure
Stack multiple stages of feature extractors
Higher stages compute more global, more invariant, more abstract features
Classification layer at the end
Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE 86(11): 2278-2324, 1998.
Adapted from Rob Fergus

Convolutional Neural Networks (CNN)
Feed-forward feature extraction:
1. Convolve input with learned filters
2. Apply non-linearity
3. Spatial pooling (downsample)
Supervised training of convolutional filters by back-propagating classification error
Pipeline (bottom to top): Input Image -> Convolution (learned) -> Non-linearity -> Spatial pooling -> Output (class probs)
Adapted from Lana Lazebnik
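As a rough sketch of one such convolution/non-linearity/pooling stage (the filter count and sizes below are illustrative choices, not taken from the slides), in PyTorch:

```python
import torch
import torch.nn as nn

# One feed-forward feature-extraction stage: convolution -> non-linearity -> spatial pooling.
stage = nn.Sequential(
    nn.Conv2d(in_channels=3, out_channels=6, kernel_size=5),  # convolve input with learned filters
    nn.ReLU(),                                                 # non-linearity
    nn.MaxPool2d(kernel_size=2, stride=2),                     # spatial pooling (downsample)
)

x = torch.randn(1, 3, 32, 32)   # a dummy 32x32x3 input image (batch of 1)
print(stage(x).shape)           # torch.Size([1, 6, 14, 14]): 32 -> 28 after conv, 28 -> 14 after pooling
```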

Convolution Layer
32x32x3 image (width 32, height 32, depth 3), 5x5x3 filter
Convolve the filter with the image, i.e. slide it over the image spatially, computing dot products.
1 number: the result of taking a dot product between the filter and a small 5x5x3 chunk of the image (i.e. a 5*5*3 = 75-dimensional dot product + bias)

Convolution Layer
32x32x3 image, 5x5x3 filter: convolve (slide) over all spatial locations -> a 28x28x1 activation map

Convolution Layer
Consider a second, green filter: convolving it over all spatial locations gives a second 28x28x1 activation map.
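To make the 75-dimensional dot product concrete, here is a minimal NumPy sketch (the random image, filter, and bias values are placeholders) that computes one entry of the activation map:

```python
import numpy as np

image = np.random.randn(32, 32, 3)   # 32x32x3 input image
filt = np.random.randn(5, 5, 3)      # one 5x5x3 filter
bias = 0.1                           # scalar bias for this filter

# Take the 5x5x3 chunk of the image at spatial position (r, c) and dot it with the filter.
r, c = 0, 0
chunk = image[r:r+5, c:c+5, :]       # shape (5, 5, 3)
value = np.sum(chunk * filt) + bias  # 5*5*3 = 75-dimensional dot product + bias
print(value)                         # one number = one entry of the 28x28 activation map
```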

Convolution Layer
For example, if we had 6 5x5 filters, we'll get 6 separate activation maps (one filter => one activation map).
We stack these up to get a new image of size 28x28x6!
Example 5x5 filters.
We call the layer convolutional because it is related to convolution of two signals: element-wise multiplication and sum of a filter and the signal (image).
Adapted from Kristen Grauman

Preview: ConvNet is a sequence of Convolution Layers, interspersed with activation functions
32x32x3 input -> CONV, ReLU (e.g. 6 5x5x3 filters) -> 28x28x6

Preview: ConvNet is a sequence of Convolutional Layers, interspersed with activation functions
32x32x3 input -> CONV, ReLU (e.g. 6 5x5x3 filters) -> 28x28x6 -> CONV, ReLU (e.g. 10 5x5x6 filters) -> 24x24x10 -> CONV, ReLU -> ...
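The shape bookkeeping in this preview can be checked with a short PyTorch sketch; the filter counts match the example above, while the random input is just scaffolding:

```python
import torch
import torch.nn as nn

conv1 = nn.Conv2d(3, 6, kernel_size=5)    # 6 filters of size 5x5x3
conv2 = nn.Conv2d(6, 10, kernel_size=5)   # 10 filters of size 5x5x6

x = torch.randn(1, 3, 32, 32)             # 32x32x3 input image
h1 = torch.relu(conv1(x))
h2 = torch.relu(conv2(h1))
print(h1.shape, h2.shape)                  # [1, 6, 28, 28] and [1, 10, 24, 24]
```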

A closer look at spatial dimensions:
32x32x3 image, 5x5x3 filter -> 28x28x1 activation map (convolve/slide over all spatial locations)
7x7 input (spatially), assume a 3x3 filter: slide the filter across the input, one spatial position at a time.

7x7 input (spatially), 3x3 filter => 5x5 output
7x7 input (spatially), 3x3 filter applied with stride 2 (the filter now moves two positions at a time)

7x7 input (spatially), 3x3 filter applied with stride 2 => 3x3 output!
7x7 input (spatially), 3x3 filter applied with stride 3?
Doesn't fit! Cannot apply a 3x3 filter on a 7x7 input with stride 3.

Output size: (N - F) / stride + 1, for an NxN input and an FxF filter.
e.g. N = 7, F = 3:
stride 1 => (7 - 3)/1 + 1 = 5
stride 2 => (7 - 3)/2 + 1 = 3
stride 3 => (7 - 3)/3 + 1 = 2.33 :\

Preview: A Common Architecture: AlexNet
Figure from http://www.mdpi.com/2072-4292/7/11/14680/htm
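A tiny helper function (my own naming) checks these cases, including the stride-3 one that doesn't fit:

```python
def conv_output_size(n, f, stride):
    """Spatial output size for an NxN input and an FxF filter (no padding)."""
    out = (n - f) / stride + 1
    if not out.is_integer():
        raise ValueError(f"{f}x{f} filter with stride {stride} does not fit a {n}x{n} input")
    return int(out)

print(conv_output_size(7, 3, 1))   # 5
print(conv_output_size(7, 3, 2))   # 3
try:
    conv_output_size(7, 3, 3)      # (7 - 3)/3 + 1 = 2.33 -> doesn't fit
except ValueError as e:
    print(e)
```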

Case Study: VGGNet [Simonyan and Zisserman, 2014]
Only 3x3 CONV stride 1, pad 1 and 2x2 MAX POOL stride 2
Best model: 11.2% top-5 error in ILSVRC 2013 -> 7.3% top-5 error

Case Study: GoogLeNet [Szegedy et al., 2014]
Inception module
ILSVRC 2014 winner (6.7% top-5 error)

Case Study: ResNet [He et al., 2015]
ILSVRC 2015 winner (3.6% top-5 error)
Slide from Kaiming He's recent presentation: https://www.youtube.com/watch?v=1pglj-ukt1w
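The key architectural idea behind ResNet is the residual (skip) connection. A minimal sketch of one residual block in PyTorch, with illustrative layer sizes and without the batch normalization used in the real network:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Minimal residual block: output = ReLU(F(x) + x), where F is two 3x3 convs."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x):
        out = torch.relu(self.conv1(x))
        out = self.conv2(out)
        return torch.relu(out + x)    # skip connection adds the input back

block = ResidualBlock(64)
x = torch.randn(1, 64, 56, 56)
print(block(x).shape)                 # same shape as the input: [1, 64, 56, 56]
```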

Case Study: ResNet (slide from Kaiming He's recent presentation)
Case Study: ResNet [He et al., 2015]
ILSVRC 2015 winner (3.6% top-5 error)
2-3 weeks of training on an 8-GPU machine (slide from Kaiming He's recent presentation)

Practical matters

Comments on training algorithm
Not guaranteed to converge to zero training error; may converge to local optima or oscillate indefinitely.
However, in practice, it does converge to low error for many large networks on real data.
Thousands of epochs (epoch = network sees all training data once) may be required; hours or days to train.
To avoid local-minima problems, run several trials starting with different random weights (random restarts), and take the results of the trial with lowest training set error.
May be hard to set the learning rate and to select the number of hidden units and layers.
Neural networks had fallen out of fashion in the 90s and early 2000s; they are back with a new name and significantly improved performance (deep networks trained with dropout and lots of data).
Ray Mooney, Carlos Guestrin, Dhruv Batra

Over-training prevention
Running too many epochs can result in over-fitting.
(Plot of error vs. # training epochs: error on training data keeps decreasing, while error on test data eventually starts to rise.)
Keep a hold-out validation set and test accuracy on it after every epoch.
Stop training when additional epochs actually increase validation error.
Adapted from Ray Mooney

Training: Best practices (see the sketch below)
Use mini-batches
Use regularization
Use cross-validation for your hyperparameters
Use ReLU or leaky ReLU, don't use sigmoid
Center (subtract the mean from) your data
Learning rate: too high? too low?
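Putting several of these practices together, here is a minimal sketch of mini-batch SGD with weight-decay regularization and early stopping on a hold-out validation set; the tiny model and synthetic data are placeholders so the loop runs end to end:

```python
import copy
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Stand-in model and synthetic data; in practice these would be your CNN and real train/validation splits.
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
criterion = nn.CrossEntropyLoss()
train_loader = DataLoader(TensorDataset(torch.randn(256, 20), torch.randint(0, 2, (256,))), batch_size=32)
val_loader = DataLoader(TensorDataset(torch.randn(64, 20), torch.randint(0, 2, (64,))), batch_size=32)

optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)  # weight decay as regularization

best_val_loss, best_state = float("inf"), None
for epoch in range(100):
    model.train()
    for inputs, targets in train_loader:                   # mini-batch SGD
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()                                    # backpropagate the error
        optimizer.step()

    model.eval()                                           # check the hold-out validation set each epoch
    with torch.no_grad():
        val_loss = sum(criterion(model(x), y).item() for x, y in val_loader)

    if val_loss < best_val_loss:
        best_val_loss, best_state = val_loss, copy.deepcopy(model.state_dict())
    else:
        break                                              # validation error went up: stop (early stopping)

model.load_state_dict(best_state)                          # keep the weights with lowest validation error
```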

Regularization: Dropout
Randomly turn off some neurons
Allows individual neurons to independently be responsible for performance
"Dropout: A simple way to prevent neural networks from overfitting" [Srivastava et al., JMLR 2014]
Adapted from Jia-Bin Huang

Data Augmentation (Jittering)
Create virtual training samples: horizontal flip, random crop, color casting, geometric distortion
Jia-Bin Huang; Deep Image [Wu et al. 2015]
(A sketch of dropout and jittering follows below.)
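As a rough illustration of both slides (the parameter values are my own choices): dropout is a layer added to the model, and jittering is applied to the training images on the fly, e.g. with torchvision:

```python
import torch.nn as nn
from torchvision import transforms

# Dropout: randomly zero out 50% of the activations during training.
classifier = nn.Sequential(
    nn.Linear(4096, 4096),
    nn.ReLU(),
    nn.Dropout(p=0.5),        # regularization: randomly turn off some neurons
    nn.Linear(4096, 1000),
)

# Data augmentation (jittering): create virtual training samples from each image.
train_transform = transforms.Compose([
    transforms.RandomHorizontalFlip(),                        # horizontal flip
    transforms.RandomCrop(224, padding=8),                    # random crop
    transforms.ColorJitter(brightness=0.4, saturation=0.4),   # color casting
    transforms.ToTensor(),
])
```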

Transfer Learning
You need a lot of data if you want to train/use CNNs.

Transfer Learning with CNNs
The more weights you need to learn, the more data you need.
That's why with a deeper network, you need more data for training than for a shallower network.
One possible solution: set the early layers to the already-learned weights from another network, and learn the remaining layers on your own task.

Transfer Learning with CNNs
Source: classification on ImageNet. Target: some other task/data.
1. Train on ImageNet
2. Small dataset: freeze the pretrained layers ("freeze these"), train only the new classifier layer ("train this")
3. Medium dataset: finetuning; more data = retrain more of the network (or all of it)
(A code sketch of step 2 appears after the summary.)

Summary
We use deep neural networks because of their strong performance in practice.
Convolutional neural networks (CNN): convolution, nonlinearity, max pooling.
Training deep neural nets:
We need an objective function that measures and guides us towards good performance.
We need a way to minimize the loss function: stochastic gradient descent.
We need backpropagation to propagate error through all layers and change their weights.
Practices for preventing overfitting: dropout, data augmentation, transfer learning.
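To close, here is the promised sketch of step 2 of the transfer-learning recipe (small dataset: freeze the pretrained layers and train only a new classifier); the choice of a torchvision ResNet-18 and the 10 target classes are placeholders:

```python
import torch.nn as nn
from torchvision import models

# 1. Start from a network trained on ImageNet.
model = models.resnet18(pretrained=True)

# 2. Small dataset: freeze the pretrained layers ("freeze these")...
for param in model.parameters():
    param.requires_grad = False

# ...and replace/train only the final classification layer for the new task ("train this").
model.fc = nn.Linear(model.fc.in_features, 10)   # 10 target classes is an arbitrary placeholder

# 3. Medium dataset: instead of freezing everything, leave more (or all) layers
#    trainable and finetune them with a small learning rate.
```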