Lecture 6 Deep Learning and Computer Vision peimt@bit.edu.cn
Deep Learning Slides adapted from Xin Liu, VIPL (vipl.ict.ac.cn), and http://neuralnetworksanddeeplearning.com/
Deep Learning
Traditional Computer Vision
Human Brain v Humans have a primary visual cortex, also known as V1, containing 140 million neurons, with tens of billions of connections between them. v Human vision involves not just V1, but an entire series of visual cortices - V2, V3, V4, and V5 - doing progressively more complex image processing.
Human Brain v The difficulty of visual pattern recognition becomes apparent if you attempt to write a computer program to recognize digits like those above. v When you try to make such rules precise, you quickly get lost in a morass of exceptions and caveats and special cases. It seems hopeless.
Neural Networks v Neural networks approach the problem in a different way. v Take a large number of handwritten digits, known as training examples v Develop a system which can learn from those training examples v Use the examples to automatically infer rules for recognizing handwritten digits
Perceptrons v A perceptron takes several inputs, and produces a single binary output:
Perceptrons v Perceptron can weigh up different kinds of evidence in order to make decisions v A complex network of perceptrons could make quite subtle decisions
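A perceptron can be sketched in a few lines of NumPy. The weighted-evidence scenario below (weather vs. a friend joining) is a hypothetical example, not one from the slides; the weights and bias are illustrative choices:

```python
import numpy as np

def perceptron(inputs, weights, bias):
    """Binary output: 1 if the weighted sum of the evidence exceeds the threshold."""
    return 1 if np.dot(weights, inputs) + bias > 0 else 0

# Hypothetical decision: attend a festival? Inputs: good weather, friend joins.
weights = np.array([6.0, 2.0])   # weather is weighted as the stronger evidence
bias = -5.0                      # i.e. a decision threshold of 5
print(perceptron(np.array([1, 0]), weights, bias))  # weather alone suffices -> 1
print(perceptron(np.array([0, 1]), weights, bias))  # a friend alone does not -> 0
```

Adjusting the weights and bias changes which combinations of evidence tip the decision, which is exactly how a network of perceptrons encodes subtle decision rules.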
Neural Networks v If it were true that a small change in a weight (or bias) causes only a small change in output, then we could use this fact to modify the weights and biases to get our network to behave more in the manner we want. v Changing the weights and biases over and over to produce better and better output.
Sigmoid neuron
Neural Networks v By using the activation function we get a smoothed out perceptron. v The smoothness means that small changes in the weights and in the bias will produce a small change in the output from the neuron
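A minimal sketch of the smoothness property (the inputs and weights below are arbitrary illustrative values): perturbing the weights of a sigmoid neuron slightly perturbs its output slightly, unlike the perceptron's step function, which can flip from 0 to 1.

```python
import numpy as np

def sigmoid(z):
    """Smooth squashing function: maps any real z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_neuron(inputs, weights, bias):
    return sigmoid(np.dot(weights, inputs) + bias)

x = np.array([1.0, 0.5])
w = np.array([0.6, -0.4])
out1 = sigmoid_neuron(x, w, bias=0.1)
out2 = sigmoid_neuron(x, w + 0.001, bias=0.1)  # tiny change in the weights
print(abs(out2 - out1))  # tiny change in the output
```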
Multilayer Perceptrons
Quadratic cost function
Quadratic cost function v C(w, b) = (1/2n) Σx ‖y(x) − a‖², where w denotes all the weights in the network, b all the biases, n is the number of training inputs, y(x) is the desired output for input x, and a is the network's actual output.
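The quadratic cost can be sketched directly from its definition; the predictions and labels below are made-up toy values:

```python
import numpy as np

def quadratic_cost(outputs, targets):
    """C = (1/2n) * sum over training inputs of ||y(x) - a||^2."""
    n = len(outputs)
    return sum(0.5 * np.sum((a - y) ** 2) for a, y in zip(outputs, targets)) / n

# Toy example: two training inputs, two output neurons each.
preds = [np.array([0.8, 0.2]), np.array([0.1, 0.9])]
labels = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
print(quadratic_cost(preds, labels))  # 0.025
```

The cost is non-negative and shrinks toward zero as the network's outputs approach the desired outputs, which is why minimizing it by gradient descent improves the network.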
Stochastic gradient descent v Estimate the gradient by computing the gradient of the cost only for a small sample of randomly chosen training inputs (a mini-batch). v By averaging over this small sample it turns out that we can quickly get a good estimate of the true gradient, and this helps speed up gradient descent, and thus learning.
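A minimal sketch of mini-batch SGD on a toy one-parameter problem (the data, true weight 3.0, learning rate, and batch size are all assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy problem: fit w in y = w*x by minimizing the squared error.
xs = rng.normal(size=1000)
ys = 3.0 * xs               # assumed true weight: 3.0

w = 0.0
lr = 0.1
for step in range(200):
    batch = rng.choice(len(xs), size=10, replace=False)   # random mini-batch
    xb, yb = xs[batch], ys[batch]
    grad = np.mean(2 * (w * xb - yb) * xb)  # gradient estimated on the batch only
    w -= lr * grad
print(round(w, 2))  # close to 3.0
```

Each step touches only 10 of the 1000 examples, yet the averaged batch gradient points in roughly the same direction as the full gradient, so learning converges at a fraction of the cost.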
Why CNN v For input as a 10 × 10 image: - A 3-layer MLP with 200 hidden units and 10 output units contains ~22k parameters v For input as a 100 × 100 image: - A 3-layer MLP with 20k hidden units and 10 output units contains ~200m parameters
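The parameter counts above follow from simple arithmetic over a fully connected network (each layer contributes in×out weights plus one bias per unit):

```python
def mlp_params(n_in, n_hidden, n_out):
    """Parameters of a fully connected input -> hidden -> output network."""
    return n_in * n_hidden + n_hidden + n_hidden * n_out + n_out

print(mlp_params(10 * 10, 200, 10))      # 22210      (~22k)
print(mlp_params(100 * 100, 20_000, 10)) # 200220010  (~200m)
```

The hidden-layer weight matrix dominates: growing the image from 10 × 10 to 100 × 100 multiplies the parameter count by roughly 10,000, which is what makes fully connected networks impractical for images.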
Why CNN v The MLP can be improved in two ways: - Locally connected instead of fully connected - Sharing weights between neurons v We achieve both by using convolutional neurons
Local receptive fields v The local receptive field is slid across the entire input image with a stride length of 1
Shared weights and biases v Each hidden neuron has a bias and 5 × 5 weights connected to its local receptive field. v Use the same weights and bias for each of the 24 × 24 hidden neurons
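The two ideas combine into a convolution: one shared 5 × 5 kernel slides across the image with stride 1. A plain-NumPy sketch (the 28 × 28 image and the kernel values are randomly generated for illustration):

```python
import numpy as np

def conv2d_valid(image, kernel, bias):
    """Slide one shared kernel over the image with stride 1 (no padding)."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Every hidden neuron reuses the same weights and bias.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel) + bias
    return out

image = np.random.default_rng(0).normal(size=(28, 28))
kernel = np.random.default_rng(1).normal(size=(5, 5))  # one shared filter
feature_map = conv2d_valid(image, kernel, bias=0.1)
print(feature_map.shape)  # (24, 24)
```

The 24 × 24 = 576 hidden neurons share just 5 × 5 + 1 = 26 parameters, versus 576 × 26 if each neuron had its own weights and bias.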
Pooling layers
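Max-pooling, the most common pooling operation, keeps only the largest activation in each small region. A sketch for non-overlapping 2 × 2 windows (the 4 × 4 input is a toy example):

```python
import numpy as np

def max_pool(feature_map, size=2):
    """Non-overlapping size x size max-pooling (dims assumed divisible by size)."""
    h, w = feature_map.shape
    # Split into (h/size, size, w/size, size) blocks, then take each block's max.
    return feature_map.reshape(h // size, size, w // size, size).max(axis=(1, 3))

fm = np.arange(16, dtype=float).reshape(4, 4)
print(max_pool(fm))  # [[ 5.  7.] [13. 15.]]
```

Pooling halves each spatial dimension here, discarding exact positions while keeping whether a feature was detected, which reduces the parameters needed in later layers.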
Fully-connected layer
Deep Learning v The traditional method: hand-crafted features + classifier v The modern method: unsupervised mid-level feature learning v Deep learning: end-to-end hierarchical feature learning
Understand the Human Brain
Neural Network: concatenation of functions
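The slide title can be made concrete: each layer is a function x → sigmoid(Wx + b), and the network is the composition of those functions. A sketch with randomly generated toy weights:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def layer(W, b):
    """One layer is a function x -> sigmoid(Wx + b)."""
    return lambda x: sigmoid(W @ x + b)

rng = np.random.default_rng(0)
f1 = layer(rng.normal(size=(4, 3)), rng.normal(size=4))  # 3 inputs -> 4 hidden
f2 = layer(rng.normal(size=(2, 4)), rng.normal(size=2))  # 4 hidden -> 2 outputs
network = lambda x: f2(f1(x))       # the network is f2 applied after f1
print(network(np.ones(3)).shape)    # (2,)
```

Because the whole network is one differentiable composed function, the chain rule (backpropagation) gives the gradient of the cost with respect to every weight.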
Activation Functions
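The three most common activation functions can be written in one line each; the sample inputs below are arbitrary:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    return np.tanh(z)

def relu(z):
    return np.maximum(0.0, z)

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z))  # squashed into (0, 1)
print(tanh(z))     # squashed into (-1, 1)
print(relu(z))     # zero for negatives, identity for positives
```

ReLU's constant gradient of 1 for positive inputs avoids the vanishing gradients of sigmoid and tanh, which is one of the architectural improvements credited later in these slides.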
Loss Functions v Euclidean Loss v Cross-entropy loss v Contrastive Loss v Triplet Loss v Moon Loss
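Three of the listed losses sketched in NumPy (the probability vector and distances below are made-up values; the contrastive-loss margin of 1.0 is a common default, not a value from the slides):

```python
import numpy as np

def euclidean_loss(pred, target):
    """Squared-error (Euclidean) loss for regression-style targets."""
    return 0.5 * np.sum((pred - target) ** 2)

def cross_entropy_loss(probs, label):
    """Negative log-likelihood of the true class, given softmax probabilities."""
    return -np.log(probs[label])

def contrastive_loss(d, same, margin=1.0):
    """d: distance between a pair of embeddings; same=1 for matching pairs.
    Pulls matching pairs together, pushes mismatched pairs beyond the margin."""
    return same * d ** 2 + (1 - same) * max(0.0, margin - d) ** 2

probs = np.array([0.1, 0.7, 0.2])
print(cross_entropy_loss(probs, 1))  # small: the true class is already likely
```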
Why do CNNs work v Faster heterogeneous parallel computing: CPU clusters, GPUs, etc. v Large datasets: ImageNet (1.2m images of 1,000 object classes), COCO (300k images with 2m object instances) v Improvements in model architecture: ReLU, dropout, inception, etc.
Case Study: LeNet-5
Case Study: ResNet
Case Study: ResNet
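The core of ResNet is the residual block y = relu(F(x) + x), where the skip connection adds the input back to the transformed output. A simplified fully-connected sketch (real ResNet blocks use convolutions and batch normalization; the weights here are toy values):

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def residual_block(x, W1, W2):
    """y = relu(F(x) + x), with F(x) = W2 @ relu(W1 @ x)."""
    return relu(W2 @ relu(W1 @ x) + x)

rng = np.random.default_rng(0)
x = rng.normal(size=8)
W1, W2 = rng.normal(size=(8, 8)), rng.normal(size=(8, 8))
y = residual_block(x, W1, W2)

# With zero weights the block degenerates to relu(x): the skip connection
# lets each block learn only a residual on top of the identity, which is
# why very deep stacks of such blocks remain trainable.
print(np.allclose(residual_block(x, np.zeros((8, 8)), np.zeros((8, 8))), relu(x)))
```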
Other Deep Models - Siamese Net
Other Deep Models - C3D
Other Deep Models - RNN
Other Deep Models - LSTM
Deep Learning in Face Recognition
DeepID Sun Y, et al CVPR 2014
DeepID Sun Y, et al CVPR 2014
DeepID2 Sun Y, et al NIPS 2014
DeepID2+ Sun Y, et al CVPR 2015
DeepID3 Sun Y, et al arXiv 2015
DeepFace Yaniv Taigman, et al CVPR 2014
FaceNet Florian Schroff, et al CVPR 2015
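FaceNet trains with the triplet loss: an anchor face embedding should be closer to another image of the same person (positive) than to any other person (negative), by at least a margin. A sketch with made-up 2-D embeddings (FaceNet's actual margin and embedding dimension differ):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """max(0, ||a - p||^2 - ||a - n||^2 + margin):
    pull the anchor toward the positive, push it away from the negative."""
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return max(0.0, d_pos - d_neg + margin)

a = np.array([0.0, 0.0])
p = np.array([0.1, 0.0])   # same identity: embedded close to the anchor
n = np.array([1.0, 1.0])   # different identity: embedded far away
print(triplet_loss(a, p, n))  # 0.0: the margin is already satisfied
```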
Deep Learning in Face Recognition Slide from Xin Liu VIPL
Deep Learning in Object Detection
R-CNN Girshick, CVPR 2014
SPP-Net K He, et al, ECCV 2014
Fast R-CNN Girshick, ICCV 2015
Faster R-CNN Ren S, et al, NIPS 2015
YOLO: You Only Look Once Redmon J, et al, arXiv 2015
SSD: Single Shot MultiBox Detector Wei Liu, et al, ECCV 2016
Deep Learning in Object Detection Slide from Xin Liu VIPL
Deep Learning in Image Classification Slide from Xin Liu VIPL
Deep Learning in Face Retrieval
Deep CNN based Binary Hash Video Representations Zhen Dong, et al, AAAI 2016
Deep Learning in Object Tracking
DeepTrack Hanxi Li, et al, TIP 2016