
Deep Learning Programming - Lecture 8
Dr. Hanhe Lin
Universität Konstanz, 11.06.2018

LeNet
- LeCun et al. developed a pioneering ConvNet for handwritten digits:
  - Many hidden layers
  - Many kernels in each layer
  - Pooling of the outputs of nearby replicated units
  - A wide net that can cope with several digits at once, even if they overlap
- This net was used for reading 10% of the checks in North America

Architecture of LeNet-5
- The early layers were convolutional
- The last two layers were fully-connected
- See an impressive demo of LeNet here
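
As a programming illustration, here is a minimal PyTorch sketch of a LeNet-5-style network (two convolutional layers with pooling followed by fully-connected layers, for 32x32 grayscale digits); it is an approximation of the classic design, not the lecture's reference code:

```python
# Minimal LeNet-5-style sketch: early layers convolutional, last layers
# fully-connected, for 32x32 grayscale digit images (10 classes).
import torch
import torch.nn as nn

lenet5 = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5), nn.Tanh(),   # 32x32 -> 28x28, 6 feature maps
    nn.AvgPool2d(2),                             # pool outputs of nearby units -> 14x14
    nn.Conv2d(6, 16, kernel_size=5), nn.Tanh(),  # -> 10x10, 16 feature maps
    nn.AvgPool2d(2),                             # -> 5x5
    nn.Flatten(),
    nn.Linear(16 * 5 * 5, 120), nn.Tanh(),       # fully-connected layers
    nn.Linear(120, 84), nn.Tanh(),
    nn.Linear(84, 10),                           # 10 digit classes
)

x = torch.randn(1, 1, 32, 32)
print(lenet5(x).shape)   # torch.Size([1, 10])
```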

LeNet-5 vs. human
- LeNet misclassified 82 test patterns
- Notice that most of the errors are cases that people find quite easy
- The human error rate is probably 20 to 30 errors, but nobody has had the patience to measure it

Review
- LeNet uses knowledge about invariances to design:
  - the local connectivity
  - the weight sharing
  - the pooling
- It achieved 82 errors; this can be reduced to about 40 errors by creating a whole lot more training data
  - However, this may require a lot of work and may make learning take much longer
- It also introduced a benchmark database, MNIST, with 60,000 training images and 10,000 test images

From handwritten digits to objects
- Recognizing real objects in color images downloaded from the Internet is much more complicated than recognizing handwritten digits:
  - A hundred times as many classes (1000 vs. 10)
  - A hundred times as many pixels (256 x 256 x 3 color vs. 28 x 28 gray)
  - Cluttered scenes requiring segmentation
  - Multiple objects in each image
- Now the question is: will the same type of convolutional neural network work?

What is ILSVRC?
- ImageNet is an image dataset containing 14,197,122 annotated images organized by the semantic hierarchy of WordNet
- The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) uses a subset of the ImageNet images for training the algorithms and some of ImageNet's image collection protocols for annotating additional images for testing the algorithms
- Over the years, ILSVRC has consisted of one or more of the following tasks:
  - Image classification
  - Single-object localization
  - Object detection

ILSVRC tasks
- Image classification (discussed this week)
  - Each image has one ground-truth label out of 1000 object categories
  - Get the correct class in the top 5 bets
- Single-object localization
  - Each image has one ground-truth label out of 1000 object categories; additionally, every instance of this category is annotated with an axis-aligned bounding box
  - For each bet, put a box around the object; the correct localization must have at least 50% overlap with the ground-truth bounding box
- Object detection
  - The images are annotated with axis-aligned bounding boxes indicating the position and scale of every instance of each target object category
  - Evaluation is similar to single-object localization, but with multiple objects
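
To make these criteria concrete, here is a small sketch (my own illustration, not the official ILSVRC evaluation code) of the two checks: whether the ground-truth label is among the top-5 bets, and whether a predicted box overlaps the ground-truth box by at least 50% (intersection over union):

```python
# Sketch of the two evaluation checks described above (illustrative only).
import torch

def top5_correct(logits, target):
    """True if the ground-truth class index is among the 5 highest-scoring classes."""
    top5 = torch.topk(logits, k=5).indices      # indices of the 5 best bets
    return bool((top5 == target).any())

def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

logits = torch.randn(1000)                              # scores for 1000 categories
print(top5_correct(logits, target=42))
print(iou((0, 0, 100, 100), (50, 50, 150, 150)) >= 0.5)  # localization criterion
```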

ILSVRC image classification winners

Architecture of AlexNet

Architecture of AlexNet
- The net contains eight layers with weights; the first five are convolutional and the remaining three are fully-connected
- Number of neurons per layer: 150,528 - 253,440 - 186,624 - 64,896 - 64,896 - 43,264 - 4,096 - 4,096 - 1,000
- Differences from LeNet:
  - Bigger, deeper
  - ReLU: makes training much faster and is more expressive than logistic units
  - Max pooling
  - Local response normalisation
  - Convolutional layers stacked directly on top of each other
- Training on two GPUs: half of the kernels (or neurons) on each GPU, with one additional trick: the GPUs communicate only in certain layers
- 90 epochs, taking five to six days
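
For hands-on inspection, torchvision ships a single-tower re-implementation of AlexNet that can be loaded and examined directly (a rough stand-in; the original paper's two-GPU kernel split is not reproduced there):

```python
# Inspect torchvision's (single-GPU) AlexNet variant: 5 conv layers + 3 FC layers.
import torch
from torchvision import models

net = models.alexnet()                       # random weights by default
print(net)                                   # lists the conv and FC layers

n_params = sum(p.numel() for p in net.parameters())
print(f"{n_params / 1e6:.1f}M parameters")   # roughly 61M for this variant

x = torch.randn(1, 3, 224, 224)              # AlexNet-style input crop
print(net(x).shape)                          # torch.Size([1, 1000]) class scores
```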

Tricks that reduce overfitting
- Data augmentation:
  - Train on random 224x224 patches from the 256x256 images to get more data; also use left-right reflections of the images
  - At test time, combine the opinions from ten different patches: the four 224x224 corner patches, plus the central 224x224 patch, plus the reflections of those five patches
- Dropout:
  - Dropout in the first two fully-connected layers
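
The ten-patch test-time trick can be sketched with torchvision's TenCrop transform (an illustration of the idea, not the authors' original pipeline; `model` stands for any classifier taking 224x224 inputs, and averaging the scores is one reasonable way to "combine the opinions"):

```python
# Test-time augmentation sketch: average the class scores over the four corner
# crops, the center crop, and the horizontal reflections of those five patches.
import torch
from torchvision import transforms
from torchvision.transforms import functional as TF

ten_crop = transforms.Compose([
    transforms.Resize(256),
    transforms.TenCrop(224),                                     # 5 crops + mirrors
    transforms.Lambda(lambda crops: torch.stack([TF.to_tensor(c) for c in crops])),
])

def predict_ten_crop(model, pil_image):
    crops = ten_crop(pil_image)          # shape (10, 3, 224, 224)
    with torch.no_grad():
        scores = model(crops)            # one prediction per patch
    return scores.mean(dim=0)            # combine the ten opinions
```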

AlexNet results

Architecture of VGG16

VGGNet
- VGG is short for Visual Geometry Group, from the University of Oxford
- 16-19 layers
- Only 3x3 kernels with stride 1 and zero-padding 1, and 2x2 max pooling with stride 2

Smaller kernels, deeper network
- What have we gained by using a stack of three 3x3 conv. layers instead of a single 7x7 layer?
  - We incorporate three non-linear rectification layers instead of a single one, which makes the decision function more discriminative
  - We decrease the number of parameters: for C input and output channels, from 7²C² = 49C² to 3·(3²C²) = 27C²
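
The 49C² vs. 27C² count can be checked directly (a sketch assuming C = 256 channels and bias-free convolutions, so the weight counts match the formulas exactly):

```python
# Compare the weight count of one 7x7 convolution with a stack of three 3x3
# convolutions, both mapping C channels to C channels (biases disabled so the
# numbers match 49*C^2 vs. 27*C^2; both have the same 7x7 receptive field).
import torch.nn as nn

C = 256
single_7x7 = nn.Conv2d(C, C, kernel_size=7, padding=3, bias=False)
stack_3x3 = nn.Sequential(
    nn.Conv2d(C, C, kernel_size=3, padding=1, bias=False), nn.ReLU(),
    nn.Conv2d(C, C, kernel_size=3, padding=1, bias=False), nn.ReLU(),
    nn.Conv2d(C, C, kernel_size=3, padding=1, bias=False), nn.ReLU(),
)

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(single_7x7), 49 * C * C)   # 3211264 3211264
print(count(stack_3x3), 27 * C * C)    # 1769472 1769472
```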

Review
- ILSVRC'14: 2nd in classification, 1st in localization
- Similar training procedure to AlexNet
- VGG19 is only slightly better than VGG16, but requires more memory
- FC7 features generalize well to other tasks

Motivation
- The most straightforward way of improving the performance of deep neural networks is to increase their size, both depth and width
- Increasing network size has two drawbacks:
  - A larger number of parameters makes the network more prone to overfitting
  - A dramatically increased use of computational resources
- Goal: increase the depth and width of the network while keeping the computational budget constant

Inception module - naive
- Apply parallel operations to the input from the previous layer:
  - Convolutions with multiple kernel sizes (1x1, 3x3, 5x5)
  - A pooling operation (3x3)
- Concatenate all filter outputs together depth-wise
- What is the problem with this structure?

Inception module - naive
- Let us assume the setup of the inception module is:
  - Input size: 28x28x256
  - 1x1 convolutional kernels: 128, with stride 1
  - 3x3 convolutional kernels: 192, with stride 1 and zero-padding 1
  - 5x5 convolutional kernels: 96, with stride 1 and zero-padding 2
  - 3x3 max pooling: stride 1, zero-padding 1
- The output is 28x28x(128+192+96+256) = 28x28x672

Inception module - naive
- Number of multiply operations (output positions x kernel weights per output):
  - 1x1 conv, 128: 28x28x128 x (1x1x256) ≈ 25.7M
  - 3x3 conv, 192: 28x28x192 x (3x3x256) ≈ 347M
  - 5x5 conv, 96: 28x28x96 x (5x5x256) ≈ 482M
  - Roughly 854M multiplies in total - very expensive to compute
- The pooling layer also preserves the feature depth, which means the total depth always grows dramatically after concatenation
- Solution: bottleneck layers that use 1x1 convolutions to reduce feature depth
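
The branch costs above are simple products and can be reproduced in a few lines (conv_ops is my own helper; it counts multiplies as output positions times kernel area times input depth):

```python
# Multiply counts for the naive inception module on a 28x28x256 input.
H = W = 28
in_depth = 256

def conv_ops(num_kernels, ksize, depth=in_depth):
    return H * W * num_kernels * ksize * ksize * depth

ops = {
    "1x1 conv, 128": conv_ops(128, 1),
    "3x3 conv, 192": conv_ops(192, 3),
    "5x5 conv, 96":  conv_ops(96, 5),
}
for name, n in ops.items():
    print(f"{name}: {n / 1e6:.1f}M multiplies")
print(f"total: {sum(ops.values()) / 1e6:.0f}M")   # roughly 854M
```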

1x1 convolutions
- Preserve the spatial dimensions but reduce the depth
- Project the depth to a lower dimension (a combination of feature maps)
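
As a minimal illustration (sizes borrowed from the inception setup above), a 1x1 convolution maps a 28x28x256 volume to 28x28x64 without touching the spatial dimensions:

```python
# A 1x1 convolution only mixes channels: spatial size stays 28x28,
# depth drops from 256 to 64.
import torch
import torch.nn as nn

x = torch.randn(1, 256, 28, 28)            # (batch, depth, height, width)
reduce_depth = nn.Conv2d(256, 64, kernel_size=1)
print(reduce_depth(x).shape)               # torch.Size([1, 64, 28, 28])
```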

Inception module with dimensionality reduction

Inception module with dimensionality reduction
- If we add three 1x1 conv layers with 64 kernels each (before the 3x3 conv, before the 5x5 conv, and after the pooling), the multiply counts become:
  - 1x1 conv, 128: 28x28x128 x (1x1x256)
  - 1x1 conv, 64: 28x28x64 x (1x1x256)
  - 3x3 conv, 192: 28x28x192 x (3x3x64)
  - 1x1 conv, 64: 28x28x64 x (1x1x256)
  - 5x5 conv, 96: 28x28x96 x (5x5x64)
  - 1x1 conv, 64: 28x28x64 x (1x1x256)
- The expensive 3x3 and 5x5 convolutions now see only 64 input channels instead of 256, cutting the total multiply count to roughly a third of the naive module's
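
A minimal PyTorch sketch of this module, using the kernel counts from the slides (an illustration of the structure, not the exact GoogLeNet implementation):

```python
# Inception module with 1x1 bottlenecks. Each branch keeps the 28x28 spatial
# size; the branch outputs are concatenated depth-wise.
import torch
import torch.nn as nn

class InceptionReduced(nn.Module):
    def __init__(self, in_depth=256):
        super().__init__()
        self.branch1 = nn.Conv2d(in_depth, 128, kernel_size=1)        # 1x1 conv, 128
        self.branch3 = nn.Sequential(
            nn.Conv2d(in_depth, 64, kernel_size=1), nn.ReLU(),        # bottleneck
            nn.Conv2d(64, 192, kernel_size=3, padding=1))             # 3x3 conv, 192
        self.branch5 = nn.Sequential(
            nn.Conv2d(in_depth, 64, kernel_size=1), nn.ReLU(),        # bottleneck
            nn.Conv2d(64, 96, kernel_size=5, padding=2))              # 5x5 conv, 96
        self.branch_pool = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_depth, 64, kernel_size=1))                   # 1x1 conv, 64

    def forward(self, x):
        return torch.cat([self.branch1(x), self.branch3(x),
                          self.branch5(x), self.branch_pool(x)], dim=1)

x = torch.randn(1, 256, 28, 28)
print(InceptionReduced()(x).shape)   # torch.Size([1, 480, 28, 28]): 128+192+96+64
```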

Architecture of GoogLeNet

Part a: stem network

Part b: stacked inception modules

Part c: auxiliary classifiers
- Features produced by the layers in the middle of the network should be very discriminative
- Auxiliary classifiers are connected to these intermediate layers to encourage discrimination in the lower stages of the network
- During training, their loss gets added to the total loss of the network with a discount weight (the losses of the auxiliary classifiers were weighted by 0.3)
- At inference time, these auxiliary networks are discarded
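
In training code this amounts to a weighted sum of cross-entropy losses; a sketch with illustrative names (main_out, aux1_out, aux2_out stand for the logits of the three classifier heads):

```python
# Combining the main loss with two auxiliary losses, discounted by 0.3.
import torch.nn as nn

criterion = nn.CrossEntropyLoss()

def googlenet_loss(main_out, aux1_out, aux2_out, targets):
    loss_main = criterion(main_out, targets)
    loss_aux = criterion(aux1_out, targets) + criterion(aux2_out, targets)
    return loss_main + 0.3 * loss_aux   # auxiliary losses count with weight 0.3

# At inference time only main_out is used; the auxiliary heads are discarded.
```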

Review
- GoogLeNet is deeper, yet computationally efficient
- 22 layers
- Efficient Inception module
- Only 5 million parameters, 12x fewer than AlexNet
- ILSVRC'14 image classification winner (6.7% top-5 error)

Motivation
- Stacking more layers does not automatically mean better performance
- As the network depth increases, accuracy gets saturated and then degrades rapidly
- Such degradation is not caused by overfitting, but by the difficulty of optimizing very deep networks

Residual block
- Hypothesis: the problem is an optimization problem; deeper models are harder to optimize
- A deeper model should be able to perform at least as well as a shallower one: construct it so that the added layers are an identity mapping and the other layers are copied from the learned shallower model
- Solution: use network layers to fit a residual mapping instead of directly trying to fit the desired underlying mapping
- y = F(x, {W_i}) + x, with F(x) = W_2 σ(W_1 x), where σ is the ReLU non-linearity
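
A minimal PyTorch sketch of a basic residual block following the formula above (two 3x3 convolutions stand in for W_1 and W_2; batch normalization, used in the actual ResNet, is omitted for brevity):

```python
# Basic residual block sketch: y = F(x) + x, with F(x) = W2 * relu(W1 * x).
# Two 3x3 convolutions form F; the input is added back via the skip connection.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x):
        residual = self.conv2(F.relu(self.conv1(x)))   # F(x) = W2 * sigma(W1 * x)
        return F.relu(residual + x)                    # y = F(x) + x

x = torch.randn(1, 64, 56, 56)
print(ResidualBlock(64)(x).shape)   # torch.Size([1, 64, 56, 56])
```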

Residual block (cont.)
- Similar to GoogLeNet, use a bottleneck layer (1x1 convolutions) to improve efficiency for deeper networks

Architecture of ResNet

Performance comparison

Summary
- LeNet: pioneering net for digit recognition
- AlexNet: smaller compute, still memory-heavy, lower accuracy
- VGG: highest memory use, most operations
- GoogLeNet: most efficient
- ResNet: moderate efficiency depending on the model, better accuracy
- Inception-v4: hybrid of ResNet and Inception, highest accuracy