
Lecture 3: Neural Network Basics & Architecture Design Xiangyu Zhang Face++ Researcher zhangxiangyu@megvii.com

Visual Recognition A fundamental task in computer vision Classification Object Detection Semantic Segmentation Instance Segmentation Keypoint Detection VQA

Why Is Recognition Difficult? Pose Occlusion Multiple Objects Inter-class Similarity

Any Silver Bullet? Deep Neural Networks

Outline Neural Network Basics Architecture Design

PART 1: Neural Network Basics Motivation Deep neural networks Convolutional Neural Networks (CNNs) ** Special thanks to Marc'Aurelio Ranzato for the tutorial Large-Scale Visual Recognition With Deep Learning at CVPR 2013. All pictures are owned by the authors.

PART 1: Neural Network Basics Motivation Deep neural networks Convolutional Neural Networks (CNNs)

Features for Recognition

Nonlinear Features vs. Linear Classifiers The feature extractor should be nonlinear!

Learning Non-Linear Features Q: which class of non-linear functions shall we consider?

Shallow or Deep Shallow Deep

Linear Combination Kernel learning Boosting Drawback: an exponential number of templates is required!

Composition Main Idea of Deep Learning

Concept Reuse in Deep Learning Zeiler M D, Fergus R. Visualizing and understanding convolutional networks

Concept Reuse in Deep Learning (cont'd) Zeiler M D, Fergus R. Visualizing and understanding convolutional networks

Concept Reuse in Deep Learning (cont'd) Efficiency: intermediate concepts can be re-used

Deep Learning Framework A problem: optimization is difficult: a non-convex, non-linear system

Deep Learning Framework (cont'd)

Deep Learning Framework (cont'd)

Summary: Key Ideas of Deep Learning We need a nonlinear system We need to learn it from data Build feature hierarchies (function composition) End-to-end learning

PART 1: Neural Network Basics Motivation Deep neural networks Convolutional Neural Networks (CNNs)

How to Build a Deep Network? Neuron or Layer Design

Shallow Cases Linear Case: SVM

Shallow Cases (cont'd) Linear Case: Logistic Regression Linear transformation + nonlinear activation

Neuron Design Single Neuron: Linear Projection + Nonlinear Activation
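As a minimal sketch of this neuron in NumPy (the shapes and the choice of ReLU as the activation are illustrative assumptions, not fixed by the slides):

    import numpy as np

    def neuron(x, w, b):
        # Linear projection: a dot product with the weights plus a bias.
        z = np.dot(w, x) + b
        # Nonlinear activation (ReLU here; could be sigmoid, tanh, ...).
        return np.maximum(z, 0.0)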

Deep Neural Network

Deep Neural Network (cont'd)

Gradient-based Training For each iteration: 1. Forward Propagation 2. Backward Propagation 3. Update Parameters (Optimization)

Forward Propagation (FPROP)

Forward Propagation (FPROP) This is the typical processing at test time. At training time, we need to compute an error measure and tune the parameters to decrease the error.
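A sketch of F-PROP through a stack of fully-connected ReLU layers (the layer list and the activation are illustrative):

    import numpy as np

    def fprop(x, layers):
        # layers is a list of (W, b) pairs; apply h = relu(W h + b) in order.
        h = x
        for W, b in layers:
            h = np.maximum(W @ h + b, 0.0)
        return h  # at test time this output is the prediction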

Loss Function

Loss Function Q: how to tune the parameters to decrease the loss? A: If the loss is (a.e.) differentiable, we can compute gradients. We can use the chain rule, a.k.a. back-propagation, to compute the gradients w.r.t. the parameters at the lower layers.
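Spelled out, for layer outputs $h_l = f_l(h_{l-1}; \theta_l)$ and loss $L$, the chain rule gives (standard back-propagation, not specific to these slides):

    \frac{\partial L}{\partial \theta_l}
      = \frac{\partial L}{\partial h_l}\,\frac{\partial h_l}{\partial \theta_l},
    \qquad
    \frac{\partial L}{\partial h_{l-1}}
      = \frac{\partial L}{\partial h_l}\,\frac{\partial h_l}{\partial h_{l-1}}

so each layer only needs its local Jacobians and the gradient arriving from the layer above.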

Backward Propagation (BPROP)

Backward Propagation (BPROP) (cont'd)

Backward Propagation (BPROP) (cont'd)
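A minimal B-PROP sketch for one fully-connected ReLU layer, assuming the gradient dL_dy from the layer above is already available (names and shapes are illustrative):

    import numpy as np

    def bprop_fc_relu(x, W, y, dL_dy):
        # Forward was: y = relu(W @ x). Undo the ReLU first:
        dL_dz = dL_dy * (y > 0)        # gradient passes only where the unit fired
        dL_dW = np.outer(dL_dz, x)     # gradient w.r.t. this layer's weights
        dL_dx = W.T @ dL_dz            # gradient sent to the layer below
        return dL_dW, dL_dx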

Optimization Stochastic Gradient Descent (on mini-batches): Stochastic Gradient Descent with Momentum:
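The two update rules sketched in code (the learning rate and momentum coefficient are illustrative defaults):

    def sgd_step(w, g, lr=0.01):
        # Plain SGD on a mini-batch gradient g: w <- w - lr * g
        return w - lr * g

    def sgd_momentum_step(w, v, g, lr=0.01, mu=0.9):
        # Momentum: accumulate a velocity, then move along it.
        v = mu * v - lr * g
        return w + v, v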

Summary: Key Ideas of Deep Neural Networks Neural Net = stack of feature detectors F-Prop / B-Prop Learning by SGD

PART 1: Neural Network Basics Motivation Deep neural networks Convolutional Neural Networks (CNNs)

Deep Neural Networks on Images How to apply a neural network on 2D or 3D inputs?

Fully-connected Net

Locally-connected Net STATIONARITY? Statistics are similar at different locations (translation invariance)

Convolutional Net

Convolutional Net (cont'd)

Convolutional Net (cont'd)

Convolutional Net (cont'd)

Convolutional Layer

Convolutional Layer (cont'd)

Summary: Key Ideas of Convolutional Nets A standard neural net applied to images: scales quadratically with the size of the input does not leverage stationarity Solution: connect each hidden unit to a small patch of the input share the weights across hidden units This is called a convolutional network.
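A naive single-channel sketch of those two ideas, small patches plus shared weights (stride 1, no padding; like most deep-learning libraries this actually computes cross-correlation):

    import numpy as np

    def conv2d_naive(img, kernel):
        # img: (H, W); kernel: (k, k), the SAME weights for every location.
        H, W = img.shape
        k = kernel.shape[0]
        out = np.zeros((H - k + 1, W - k + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                # Each output unit sees only a small k x k patch.
                out[i, j] = np.sum(img[i:i+k, j:j+k] * kernel)
        return out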

Other Layers Over the years, some new modules have proven to be very effective when plugged into conv-nets:

Pooling Layer

Pooling Layer (cont'd)
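A max-pooling sketch (2x2 window, stride 2; assumes even spatial dimensions):

    import numpy as np

    def max_pool_2x2(x):
        # x: (H, W) with H and W even; keep the max of each 2x2 block.
        H, W = x.shape
        return x.reshape(H // 2, 2, W // 2, 2).max(axis=(1, 3))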

Local Contrast Normalization Layer

Typical Architecture Q: Where is the nonlinearity?

Typical Architecture (cont'd)
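Putting the pieces together, one typical stage, and the usual answer to the question above: the nonlinearity sits between the convolution and the pooling. A self-contained sketch (single channel; the ordering follows the classic conv-net recipe, the details are illustrative):

    import numpy as np

    def conv_stage(img, kernel):
        # Convolution: shared weights applied over local patches...
        H, W = img.shape
        k = kernel.shape[0]
        out = np.zeros((H - k + 1, W - k + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j] = np.sum(img[i:i+k, j:j+k] * kernel)
        # ...then the nonlinearity (ReLU)...
        out = np.maximum(out, 0.0)
        # ...then 2x2 max pooling (crop to even size first).
        h, w = (out.shape[0] // 2) * 2, (out.shape[1] // 2) * 2
        return out[:h, :w].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))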

Conv Architecture Example (AlexNet) Krizhevsky et al. ImageNet Classification with deep CNNs NIPS 2012

Convolutional Nets: Training All layers are differentiable (a.e.). We can use standard backpropagation. Algorithm: Given a small mini-batch 1. F-PROP 2. B-PROP 3. PARAMETER UPDATE

Summary: Key Ideas of Conv Nets Conv. Nets have special layers like: pooling, and local contrast normalization Back-propagation can still be applied. These layers are useful to: reduce computational burden increase invariance ease the optimization

PART 2: Architecture Design Overview Structure design Layer design Architecture for special tasks

PART 2: Architecture Design Overview Structure design Layer design Architecture for special tasks

Architecture Design What? Network topology Layer functions Hyper-parameters Optimization algorithms Why? It is difficult to determine the optimal structure Different applications and datasets impose different requirements and limitations

Architecture Design (cont'd) How? Manually Automatically Objectives Representation capability Robustness, anti-overfitting Computation or parameter efficiency Ease of optimization More accuracy, less complexity

PART 2: Architecture Design Overview Structure design Layer design Architecture for special tasks

Benchmark: ImageNet Dataset 1K classes (for the ILSVRC competition) 1.2M+ training images, 50K validation images, 100K test images Difficulties: fine-grained classes (e.g. Walker hound vs. English foxhound vs. Beagle), large variation, costly training

Recent Nets [chart: ImageNet classification results by year; winning depth grew from 8 layers (AlexNet, 2012) and 8 layers (2013) through 19 layers (VGG) and 22 layers (GoogLeNet) to 152 layers (ResNet)]

AlexNet Krizhevsky A, Sutskever I, Hinton G E. Imagenet classification with deep convolutional neural networks

VGGNet Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition

GoogLeNet Szegedy C, Liu W, Jia Y, et al. Going deeper with convolutions

Deep Residual Network Easy to optimize Enables very deep structures -- over 100 layers for ImageNet models He K, Zhang X, Ren S, et al. Deep residual learning for image recognition

Deep Residual Network (cont'd) Bottleneck design Increasing depth, less complexity He K, Zhang X, Ren S, et al. Deep residual learning for image recognition
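The residual idea in one line of code: the block computes F(x) and adds the input back, so identity mappings come for free and very deep stacks stay easy to optimize. Here F stands for the residual branch (e.g. the 1x1 -> 3x3 -> 1x1 bottleneck); the toy usage is illustrative:

    import numpy as np

    def residual_block(x, F):
        # y = F(x) + x; the shortcut carries x through unchanged.
        return F(x) + x

    # Toy usage with a linear residual branch on a feature vector:
    W = 0.1 * np.eye(4)
    y = residual_block(np.ones(4), lambda h: W @ h)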

Xception Chollet F. Xception: Deep Learning with Depthwise Separable Convolutions

ResNeXt Xie S, Girshick R, Dollár P, et al. Aggregated residual transformations for deep neural networks

ShuffleNet Zhang X, Zhou X, Lin M, et al. ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices
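The channel shuffle at the heart of ShuffleNet is just a reshape-transpose-reshape that lets information cross between the groups of a grouped convolution. A sketch (NCHW layout is an assumption):

    import numpy as np

    def channel_shuffle(x, groups):
        # x: (N, C, H, W) with C divisible by groups.
        n, c, h, w = x.shape
        x = x.reshape(n, groups, c // groups, h, w)
        x = x.transpose(0, 2, 1, 3, 4)  # interleave channels across groups
        return x.reshape(n, c, h, w)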

Densely Connected Convolutional Networks Huang G, Liu Z, Weinberger K Q, et al. Densely connected convolutional networks

Squeeze-and-Excitation Networks Hu J, Shen L, Sun G. Squeeze-and-Excitation Networks

Summary: Ideas of Structure Design Deeper and wider Ease of optimization Multi-path design Residual path Sparse connection

PART 2: Architecture Design Overview Structure design Layer design Architecture for special tasks

Spatial Pyramid Pooling He K, Zhang X, Ren S, et al. Spatial pyramid pooling in deep convolutional networks for visual recognition

Batch Normalization Ioffe S, Szegedy C. Batch normalization: Accelerating deep network training by reducing internal covariate shift
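A minimal training-mode batch-norm sketch for fully-connected features (running statistics for test time and the convolutional NCHW case are omitted; gamma and beta are the learned scale and shift):

    import numpy as np

    def batch_norm(x, gamma, beta, eps=1e-5):
        # x: (batch, features). Normalize each feature over the mini-batch...
        mu = x.mean(axis=0)
        var = x.var(axis=0)
        x_hat = (x - mu) / np.sqrt(var + eps)
        # ...then restore representational power with the learned scale and shift.
        return gamma * x_hat + beta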

Parametric Rectifiers He K, Zhang X, Ren S, et al. Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification

Bilinear CNNs Lin T Y, RoyChowdhury A, Maji S. Bilinear CNN models for fine-grained visual recognition

PART 2: Architecture Design Overview Structure design Layer design Architecture for special tasks

DeepFace Taigman Y, Yang M, Ranzato M A, et al. DeepFace: Closing the gap to human-level performance in face verification

Global Convolutional Networks Peng C, Zhang X, Yu G, et al. Large Kernel Matters--Improve Semantic Segmentation by Global Convolutional Network

Hourglass Networks Newell A, Yang K, Deng J. Stacked hourglass networks for human pose estimation

Summary: Trends on Architecture Design Effectiveness and efficiency Task & data specific ML & optimization perspective Insight & motivation driven

Thanks