Lecture 6 Deep Learning and Computer Vision peimt@bit.edu.cn
Deep Learning Slides adapted from Xin Liu, VIPL (vipl.ict.ac.cn), and http://neuralnetworksanddeeplearning.com/
Deep Learning
Traditional Computer Vision
Human Brain v Humans have a primary visual cortex, also known as V1, containing 140 million neurons, with tens of billions of connections between them. v Human vision involves not just V1, but an entire series of visual cortices - V2, V3, V4, and V5 - doing progressively more complex image processing.
Human Brain v The difficulty of visual pattern recognition becomes apparent if you attempt to write a computer program to recognize digits like those above. v When you try to make such rules precise, you quickly get lost in a morass of exceptions and caveats and special cases. It seems hopeless.
Neural Networks v Neural networks approach the problem in a different way. v Take a large number of handwritten digits, known as training examples v Develop a system which can learn from those training examples v Use the examples to automatically infer rules for recognizing handwritten digits
Perceptrons v A perceptron takes several inputs, and produces a single binary output:
Perceptrons v Perceptron can weigh up different kinds of evidence in order to make decisions v A complex network of perceptrons could make quite subtle decisions
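A perceptron can be sketched in a few lines of NumPy. The weighted-evidence scenario below (weather vs. a friend joining) is a hypothetical example, not one from the slides; the weights and bias are illustrative choices:

```python
import numpy as np

def perceptron(inputs, weights, bias):
    """Binary output: 1 if the weighted sum of the evidence exceeds the threshold."""
    return 1 if np.dot(weights, inputs) + bias > 0 else 0

# Hypothetical decision: attend a festival? Inputs: good weather, friend joins.
weights = np.array([6.0, 2.0])   # weather is weighted as the stronger evidence
bias = -5.0                      # i.e. a decision threshold of 5
print(perceptron(np.array([1, 0]), weights, bias))  # weather alone suffices -> 1
print(perceptron(np.array([0, 1]), weights, bias))  # a friend alone does not -> 0
```

Adjusting the weights and bias changes which combinations of evidence tip the decision, which is exactly how a network of perceptrons encodes subtle decision rules.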
Neural Networks v If it were true that a small change in a weight (or bias) causes only a small change in output, then we could use this fact to modify the weights and biases to get our network to behave more in the manner we want. v Changing the weights and biases over and over to produce better and better output.
Sigmoid neuron
Neural Networks v By using the activation function we get a smoothed out perceptron. v The smoothness means that small changes in the weights and in the bias will produce a small change in the output from the neuron
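A minimal sketch of the smoothness property (the inputs and weights below are arbitrary illustrative values): perturbing the weights of a sigmoid neuron slightly perturbs its output slightly, unlike the perceptron's step function, which can flip from 0 to 1.

```python
import numpy as np

def sigmoid(z):
    """Smooth squashing function: maps any real z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_neuron(inputs, weights, bias):
    return sigmoid(np.dot(weights, inputs) + bias)

x = np.array([1.0, 0.5])
w = np.array([0.6, -0.4])
out1 = sigmoid_neuron(x, w, bias=0.1)
out2 = sigmoid_neuron(x, w + 0.001, bias=0.1)  # tiny change in the weights
print(abs(out2 - out1))  # tiny change in the output
```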
Multilayer Perceptrons
Quadratic cost function
Quadratic cost function v C(w, b) = (1/2n) Σx ‖y(x) − a‖², where w denotes all the weights in the network, b all the biases, n is the number of training inputs, y(x) is the desired output for input x, and a is the network's actual output.
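The quadratic cost can be sketched directly from its definition; the predictions and labels below are made-up toy values:

```python
import numpy as np

def quadratic_cost(outputs, targets):
    """C = (1/2n) * sum over training inputs of ||y(x) - a||^2."""
    n = len(outputs)
    return sum(0.5 * np.sum((a - y) ** 2) for a, y in zip(outputs, targets)) / n

# Toy example: two training inputs, two output neurons each.
preds = [np.array([0.8, 0.2]), np.array([0.1, 0.9])]
labels = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
print(quadratic_cost(preds, labels))  # 0.025
```

The cost is non-negative and shrinks toward zero as the network's outputs approach the desired outputs, which is why minimizing it by gradient descent improves the network.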
Stochastic gradient descent v Estimate the gradient by computing the gradient of the cost only for a small sample of randomly chosen training inputs (a mini-batch). v By averaging over this small sample it turns out that we can quickly get a good estimate of the true gradient, and this helps speed up gradient descent, and thus learning.
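A minimal sketch of mini-batch SGD on a toy one-parameter problem (the data, true weight 3.0, learning rate, and batch size are all assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy problem: fit w in y = w*x by minimizing the squared error.
xs = rng.normal(size=1000)
ys = 3.0 * xs               # assumed true weight: 3.0

w = 0.0
lr = 0.1
for step in range(200):
    batch = rng.choice(len(xs), size=10, replace=False)   # random mini-batch
    xb, yb = xs[batch], ys[batch]
    grad = np.mean(2 * (w * xb - yb) * xb)  # gradient estimated on the batch only
    w -= lr * grad
print(round(w, 2))  # close to 3.0
```

Each step touches only 10 of the 1000 examples, yet the averaged batch gradient points in roughly the same direction as the full gradient, so learning converges at a fraction of the cost.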
Why CNN v For input as a 10 × 10 image: - A 3-layer MLP with 200 hidden units and 10 output units contains ~22k parameters v For input as a 100 × 100 image: - A 3-layer MLP with 20k hidden units and 10 output units contains ~200m parameters
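The parameter counts above follow from simple arithmetic over a fully connected network (each layer contributes in×out weights plus one bias per unit):

```python
def mlp_params(n_in, n_hidden, n_out):
    """Parameters of a fully connected input -> hidden -> output network."""
    return n_in * n_hidden + n_hidden + n_hidden * n_out + n_out

print(mlp_params(10 * 10, 200, 10))      # 22210      (~22k)
print(mlp_params(100 * 100, 20_000, 10)) # 200220010  (~200m)
```

The hidden-layer weight matrix dominates: growing the image from 10 × 10 to 100 × 100 multiplies the parameter count by roughly 10,000, which is what makes fully connected networks impractical for images.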
Why CNN v The MLP can be improved in two ways: - Locally connected instead of fully connected - Sharing weights between neurons v We achieve both by using convolutional neurons
Local receptive fields v The local receptive field is slid across the entire input image with a stride length of 1
Shared weights and biases v Each hidden neuron has a bias and 5 × 5 weights connected to its local receptive field. v Use the same weights and bias for each of the 24 × 24 hidden neurons
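The two ideas combine into a convolution: one shared 5 × 5 kernel slides across the image with stride 1. A plain-NumPy sketch (the 28 × 28 image and the kernel values are randomly generated for illustration):

```python
import numpy as np

def conv2d_valid(image, kernel, bias):
    """Slide one shared kernel over the image with stride 1 (no padding)."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Every hidden neuron reuses the same weights and bias.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel) + bias
    return out

image = np.random.default_rng(0).normal(size=(28, 28))
kernel = np.random.default_rng(1).normal(size=(5, 5))  # one shared filter
feature_map = conv2d_valid(image, kernel, bias=0.1)
print(feature_map.shape)  # (24, 24)
```

The 24 × 24 = 576 hidden neurons share just 5 × 5 + 1 = 26 parameters, versus 576 × 26 if each neuron had its own weights and bias.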
Pooling layers
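Max-pooling, the most common pooling operation, keeps only the largest activation in each small region. A sketch for non-overlapping 2 × 2 windows (the 4 × 4 input is a toy example):

```python
import numpy as np

def max_pool(feature_map, size=2):
    """Non-overlapping size x size max-pooling (dims assumed divisible by size)."""
    h, w = feature_map.shape
    # Split into (h/size, size, w/size, size) blocks, then take each block's max.
    return feature_map.reshape(h // size, size, w // size, size).max(axis=(1, 3))

fm = np.arange(16, dtype=float).reshape(4, 4)
print(max_pool(fm))  # [[ 5.  7.] [13. 15.]]
```

Pooling halves each spatial dimension here, discarding exact positions while keeping whether a feature was detected, which reduces the parameters needed in later layers.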
Fully-connected layer
Deep Learning v The traditional method: hand-crafted features + classifier v The modern method: unsupervised mid-level feature learning v Deep learning: end-to-end hierarchical feature learning
Understand the Human Brain
Neural Network: concatenation of functions
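The slide title can be made concrete: each layer is a function x → sigmoid(Wx + b), and the network is the composition of those functions. A sketch with randomly generated toy weights:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def layer(W, b):
    """One layer is a function x -> sigmoid(Wx + b)."""
    return lambda x: sigmoid(W @ x + b)

rng = np.random.default_rng(0)
f1 = layer(rng.normal(size=(4, 3)), rng.normal(size=4))  # 3 inputs -> 4 hidden
f2 = layer(rng.normal(size=(2, 4)), rng.normal(size=2))  # 4 hidden -> 2 outputs
network = lambda x: f2(f1(x))       # the network is f2 applied after f1
print(network(np.ones(3)).shape)    # (2,)
```

Because the whole network is one differentiable composed function, the chain rule (backpropagation) gives the gradient of the cost with respect to every weight.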
Activation Functions
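The three most common activation functions can be written in one line each; the sample inputs below are arbitrary:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    return np.tanh(z)

def relu(z):
    return np.maximum(0.0, z)

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z))  # squashed into (0, 1)
print(tanh(z))     # squashed into (-1, 1)
print(relu(z))     # zero for negatives, identity for positives
```

ReLU's constant gradient of 1 for positive inputs avoids the vanishing gradients of sigmoid and tanh, which is one of the architectural improvements credited later in these slides.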
Loss Functions v Euclidean Loss v Cross-entropy loss v Contrastive Loss v Triplet Loss v Moon Loss
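Three of the listed losses sketched in NumPy (the probability vector and distances below are made-up values; the contrastive-loss margin of 1.0 is a common default, not a value from the slides):

```python
import numpy as np

def euclidean_loss(pred, target):
    """Squared-error (Euclidean) loss for regression-style targets."""
    return 0.5 * np.sum((pred - target) ** 2)

def cross_entropy_loss(probs, label):
    """Negative log-likelihood of the true class, given softmax probabilities."""
    return -np.log(probs[label])

def contrastive_loss(d, same, margin=1.0):
    """d: distance between a pair of embeddings; same=1 for matching pairs.
    Pulls matching pairs together, pushes mismatched pairs beyond the margin."""
    return same * d ** 2 + (1 - same) * max(0.0, margin - d) ** 2

probs = np.array([0.1, 0.7, 0.2])
print(cross_entropy_loss(probs, 1))  # small: the true class is already likely
```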
Why do CNNs work v Faster heterogeneous parallel computing: CPU clusters, GPUs, etc. v Large datasets: ImageNet (1.2m images of 1,000 object classes), COCO (300k images with 2m object instances) v Improvements in model architecture: ReLU, dropout, inception, etc.
Case Study: LeNet-5
Case Study: ResNet
Case Study: ResNet
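The core of ResNet is the residual block y = relu(F(x) + x), where the skip connection adds the input back to the transformed output. A simplified fully-connected sketch (real ResNet blocks use convolutions and batch normalization; the weights here are toy values):

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def residual_block(x, W1, W2):
    """y = relu(F(x) + x), with F(x) = W2 @ relu(W1 @ x)."""
    return relu(W2 @ relu(W1 @ x) + x)

rng = np.random.default_rng(0)
x = rng.normal(size=8)
W1, W2 = rng.normal(size=(8, 8)), rng.normal(size=(8, 8))
y = residual_block(x, W1, W2)

# With zero weights the block degenerates to relu(x): the skip connection
# lets each block learn only a residual on top of the identity, which is
# why very deep stacks of such blocks remain trainable.
print(np.allclose(residual_block(x, np.zeros((8, 8)), np.zeros((8, 8))), relu(x)))
```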
Other Deep Models - Siamese Net
Other Deep Models - C3D
Other Deep Models - RNN
Other Deep Models - LSTM
Deep Learning in Face Recognition
DeepID Sun Y, et al CVPR 2014
DeepID Sun Y, et al CVPR 2014
DeepID2 Sun Y, et al NIPS 2014
DeepID2+ Sun Y, et al CVPR 2015
DeepID3 Sun Y, et al arXiv 2015
DeepFace Yaniv Taigman, et al CVPR 2014
FaceNet Florian Schroff, et al CVPR 2015
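FaceNet trains with the triplet loss: an anchor face embedding should be closer to another image of the same person (positive) than to any other person (negative), by at least a margin. A sketch with made-up 2-D embeddings (FaceNet's actual margin and embedding dimension differ):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """max(0, ||a - p||^2 - ||a - n||^2 + margin):
    pull the anchor toward the positive, push it away from the negative."""
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return max(0.0, d_pos - d_neg + margin)

a = np.array([0.0, 0.0])
p = np.array([0.1, 0.0])   # same identity: embedded close to the anchor
n = np.array([1.0, 1.0])   # different identity: embedded far away
print(triplet_loss(a, p, n))  # 0.0: the margin is already satisfied
```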
Deep Learning in Face Recognition Slide from Xin Liu VIPL
Deep Learning in Object Detection
R-CNN Girshick, CVPR 2014
SPP-Net K He, et al, ECCV 2014
Fast R-CNN Girshick, ICCV 2015
Faster R-CNN Ren S, et al, NIPS 2015
YOLO: You Only Look Once Redmon J, et al, arXiv 2015
SSD: Single Shot MultiBox Detector Wei Liu, et al, ECCV 2016
Deep Learning in Object Detection Slide from Xin Liu VIPL
Deep Learning in Image Classification Slide from Xin Liu VIPL
Deep Learning in Face Retrieval
Deep CNN based Binary Hash Video Representations Zhen Dong, et al, AAAI 2016
Deep Learning in Object Tracking
DeepTrack Hanxi Li, et al, TIP 2016