Lecture 3: Neural Network Basics & Architecture Design. Xiangyu Zhang Face++ Researcher

2 Visual Recognition. A fundamental task in computer vision: classification, object detection, semantic segmentation, instance segmentation, keypoint detection, visual question answering (VQA).

3 Why Is Recognition Difficult? Pose, occlusion, multiple objects, inter-class similarity.

4 Any Silver Bullet? Deep Neural Networks

5 Outline: Neural Network Basics; Architecture Design.

6 PART 1: Neural Network Basics. Motivation; Deep neural networks; Convolutional Neural Networks (CNNs). Special thanks to Marc'Aurelio Ranzato for the tutorial "Large-Scale Visual Recognition With Deep Learning" in CVPR. All pictures are owned by the authors.

7 PART 1: Neural Network Basics Motivation Deep neural networks Convolutional Neural Networks (CNNs)

8 Features for Recognition

9 Nonlinear Features vs. Linear Classifiers. The feature extractor should be nonlinear!

10 Learning Non-Linear Features. Q: Which class of non-linear functions shall we consider?

11 Shallow or Deep Shallow Deep

12 Linear Combination: kernel learning, boosting. Drawback: an exponential number of templates is required!

13 Composition Main Idea of Deep Learning

14 Concepts Reuse in Deep Learning. Zeiler M D, Fergus R. Visualizing and understanding convolutional networks.

15 Concepts Reuse in Deep Learning (cont'd). Zeiler M D, Fergus R. Visualizing and understanding convolutional networks.

16 Concepts Reuse in Deep Learning (cont'd). Efficiency: intermediate concepts can be re-used.

17 Deep Learning Framework. A problem: optimization is difficult (non-convex, non-linear system).

18 Deep Learning Framework (cont'd)

19 Deep Learning Framework (cont'd)

20 Summary: Key Ideas of Deep Learning. We need a nonlinear system; we need to learn it from data; build feature hierarchies (function composition); end-to-end learning.

21 PART 1: Neural Network Basics Motivation Deep neural networks Convolutional Neural Networks (CNNs)

22 How to Build a Deep Network? Neuron or layer design.

23 Shallow Cases. Linear case: SVM.

24 Shallow Cases (cont'd). Linear case: logistic regression (linear transformation + nonlinear activation).

25 Neuron Design Single Neuron: Linear Projection + Nonlinear Activation
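
The neuron described above can be sketched in a few lines of Python. This is a minimal illustration, assuming a sigmoid as the nonlinear activation (the slide does not name one); the function name `neuron` and the sample inputs are hypothetical.

```python
import math

def neuron(x, w, b):
    """Single neuron: linear projection w.x + b followed by a sigmoid."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b   # linear projection
    return 1.0 / (1.0 + math.exp(-z))              # nonlinear activation (assumed: sigmoid)

# Hypothetical inputs and weights, chosen only for illustration.
y = neuron(x=[1.0, 2.0], w=[0.5, -0.25], b=0.0)
```

With these particular values the projection is zero, so the sigmoid sits at its midpoint.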

26 Deep Neural Network

27 Deep Neural Network (cont'd)

28 Gradient-based Training. For each iteration: (1) forward propagation; (2) backward propagation; (3) update parameters (optimization).
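
The three steps per iteration can be sketched on a deliberately tiny problem. The example below assumes a one-parameter model `y = w * x` with a mean-squared-error loss; the data, learning rate, and the name `train` are hypothetical and only serve to make the loop runnable.

```python
def train(batches, lr=0.01, epochs=50):
    """Fit y = w * x by repeating forward, backward, and update steps."""
    w = 0.0
    for _ in range(epochs):
        for xs, ys in batches:
            # 1. Forward propagation: predictions and loss on the mini-batch.
            preds = [w * x for x in xs]
            loss = sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(xs)
            # 2. Backward propagation: gradient of the loss w.r.t. w.
            grad = sum(2 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / len(xs)
            # 3. Parameter update: one gradient-descent step.
            w -= lr * grad
    return w, loss

# Hypothetical data generated from y = 3x, split into two fixed mini-batches.
batches = [([1.0, 2.0], [3.0, 6.0]), ([3.0, 4.0], [9.0, 12.0])]
w, final_loss = train(batches)
```

On this noise-free data the learned `w` approaches the generating slope of 3.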

29 Forward Propagation (FPROP)

30 Forward Propagation (FPROP). This is the typical processing at test time. At training time, we need to compute an error measure and tune the parameters to decrease the error.

31 Loss Function

32 Loss Function. Q: How do we tune the parameters to decrease the loss? A: If the loss is differentiable almost everywhere (a.e.), we can compute gradients. We can use the chain rule, a.k.a. back-propagation, to compute the gradients w.r.t. the parameters at the lower layers.
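
To make the chain-rule argument concrete, here is a small sketch that back-propagates through a two-layer scalar network and compares the result with finite differences. The network (`tanh` hidden unit, squared-error loss), the parameter values, and all function names are assumptions made for illustration, not the lecture's own example.

```python
import math

def forward(w1, w2, x, t):
    """Two-layer scalar network: h = tanh(w1*x), y = w2*h, loss = 0.5*(y-t)^2."""
    h = math.tanh(w1 * x)
    y = w2 * h
    loss = 0.5 * (y - t) ** 2
    return h, y, loss

def backward(w1, w2, x, t):
    """Apply the chain rule from the loss back to each parameter."""
    h, y, _ = forward(w1, w2, x, t)
    dy = y - t                      # dL/dy
    dw2 = dy * h                    # dL/dw2 = dL/dy * dy/dw2
    dh = dy * w2                    # dL/dh  = dL/dy * dy/dh
    dw1 = dh * (1.0 - h * h) * x    # dL/dw1 = dL/dh * dh/dz * dz/dw1
    return dw1, dw2

def numeric_grad(f, v, eps=1e-5):
    """Central finite difference, used only to check the analytic gradient."""
    return (f(v + eps) - f(v - eps)) / (2 * eps)

# Hypothetical parameter and data values.
w1, w2, x, t = 0.3, -0.7, 1.5, 0.2
dw1, dw2 = backward(w1, w2, x, t)
num_dw1 = numeric_grad(lambda v: forward(v, w2, x, t)[2], w1)
num_dw2 = numeric_grad(lambda v: forward(w1, v, x, t)[2], w2)
```

The gradient for the lower-layer parameter `w1` reuses `dh`, the quantity already computed for the layer above; that reuse is what makes back-propagation efficient.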

33 Backward Propagation (BPROP)

34 Backward Propagation (BPROP) (cont'd)

35 Backward Propagation (BPROP) (cont'd)

36 Optimization: Stochastic Gradient Descent (on mini-batches); Stochastic Gradient Descent with Momentum.

37 Summary: Key Ideas of Deep Neural Networks. Neural net = stack of feature detectors; F-Prop / B-Prop; learning by SGD.
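
The update equations on this slide did not survive extraction, so the sketch below uses one common formulation of SGD with momentum (velocity `v = momentum * v - lr * grad`, then `theta = theta + v`). This is an assumption, not necessarily the exact form shown in the lecture; the objective `f(theta) = theta**2` and the hyper-parameter values are hypothetical.

```python
def sgd_momentum_step(theta, velocity, grad, lr=0.1, momentum=0.9):
    """One SGD-with-momentum update (a common formulation, assumed here)."""
    velocity = momentum * velocity - lr * grad   # accumulate a decaying average of gradients
    return theta + velocity, velocity            # move along the accumulated direction

# Minimize the hypothetical objective f(theta) = theta**2, whose gradient is 2*theta.
theta, velocity = 5.0, 0.0
for _ in range(300):
    grad = 2.0 * theta
    theta, velocity = sgd_momentum_step(theta, velocity, grad)
```

Setting `momentum=0.0` recovers plain gradient descent, `theta = theta - lr * grad`.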

38 PART 1: Neural Network Basics Motivation Deep neural networks Convolutional Neural Networks (CNNs)

39 Deep Neural Networks on Images. How do we apply a neural network to 2D or 3D inputs?

40 Fully-connected Net

41 Locally-connected Net. Stationarity? Statistics are similar at different locations (translation invariance).

42 Convolutional Net

43 Convolutional Net (cont'd)

44 Convolutional Net (cont'd)

45 Convolutional Net (cont'd)

46 Convolutional Layer

47 Convolutional Layer (cont'd)

48 Summary: Key Ideas of Convolutional Nets. A standard neural net applied to images: (a) scales quadratically with the size of the input; (b) does not leverage stationarity. Solution: (a) connect each hidden unit to a small patch of the input; (b) share the weights across hidden units. This is called a convolutional network.
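
A rough sketch of why local connectivity and weight sharing matter. The sizes below (a 200x200 input, as many hidden units as pixels, 10x10 patches, 100 filters) are illustrative assumptions, and `conv2d_valid` is a hypothetical helper that computes an unpadded, stride-1 cross-correlation, which is what deep learning libraries usually call convolution.

```python
def conv2d_valid(image, kernel):
    """Slide one shared kernel over the image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

# Weight counts under the assumed sizes.
inputs = 200 * 200           # pixels in the input image
hidden = 200 * 200           # hidden units (assumed equal to the number of pixels)
patch = 10 * 10              # pixels each hidden unit sees when locally connected
filters = 100                # number of shared kernels in the convolutional case

fully_connected = inputs * hidden     # every unit sees every pixel
locally_connected = hidden * patch    # every unit sees one patch, no sharing
convolutional = filters * patch       # weights shared across all locations

# Tiny usage example: a 2x2 kernel that responds to diagonal differences.
feature_map = conv2d_valid([[1, 2, 3], [4, 5, 6], [7, 8, 9]], [[1, 0], [0, -1]])
```

Under these assumptions the weight count drops from 1.6 billion to 4 million with local connectivity, and to 10 thousand once the weights are shared.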

49 Other Layers. Over the years, some new modules have proven to be very effective when plugged into conv-nets:

50 Pooling Layer

51 Pooling Layer
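
As an illustration of a pooling layer, the following sketch implements max pooling over non-overlapping windows. The slides do not say which pooling variant is shown, so max pooling with a 2x2 window and stride 2 is an assumption; `max_pool2d` and the sample feature map are hypothetical.

```python
def max_pool2d(feature_map, size=2, stride=2):
    """Keep the maximum of each size x size window, moving by `stride`."""
    out_h = (len(feature_map) - size) // stride + 1
    out_w = (len(feature_map[0]) - size) // stride + 1
    return [[max(feature_map[i * stride + di][j * stride + dj]
                 for di in range(size) for dj in range(size))
             for j in range(out_w)]
            for i in range(out_h)]

# Hypothetical 4x4 feature map reduced to 2x2.
pooled = max_pool2d([[1, 3, 2, 0],
                     [4, 2, 1, 5],
                     [0, 1, 7, 2],
                     [3, 2, 4, 6]])
```

Each output value depends only on the largest activation in its window, so small shifts of a feature inside the window leave the output unchanged.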

52 Local Contrast Normalization Layer

53 Typical Architecture Q: Where is the nonlinearity?

54 Typical Architecture (cont'd)

55 Conv Architecture Example (AlexNet). Krizhevsky et al. ImageNet Classification with deep CNNs. NIPS 2012.

56 Convolutional Nets: Training. All layers are differentiable (a.e.), so we can use standard backpropagation. Algorithm: given a small mini-batch, (1) F-PROP; (2) B-PROP; (3) PARAMETER UPDATE.

57 Summary: Key Ideas of Conv Nets. Conv nets have special layers such as pooling and local contrast normalization. Back-propagation can still be applied. These layers are useful to: reduce the computational burden; increase invariance; ease the optimization.

58 PART 2: Architecture Design Overview Structure design Layer design Architecture for special tasks

59 PART 2: Architecture Design Overview Structure design Layer design Architecture for special tasks

60 Architecture Design. What? Network topology; layer functions; hyper-parameters; optimization algorithms. Why? It is difficult to determine the optimal structures; requirements of different applications, datasets, or limitations.

61 Architecture Design (cont'd). How? Manually; automatically. Objectives: representation capability; robustness, anti-overfitting; computation or parameter efficiency; ease of optimization; more accuracy, less complexity.

62 PART 2: Architecture Design Overview Structure design Layer design Architecture for special tasks

63 Benchmark: ImageNet. Dataset: 1K classes (for the ILSVRC competition); 1.2M+ training images, 50K validation images, 100K test images. ILSVRC competition. Difficulty: fine-grained classes, large variation, costly training.

64 Benchmark: ImageNet (cont'd). Fine-grained classes, for example: Walker hound, English foxhound, Beagle.

67 Recent Nets: ImageNet Classification Scores. [Chart; bar labels: 152 layers, 8 layers, 8 layers, 19 layers, 22 layers]

68 AlexNet. Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks.

69 VGGNet. Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition.

70 GoogLeNet. Szegedy C, Liu W, Jia Y, et al. Going deeper with convolutions.

71 Deep Residual Network. Easy to optimize; enables very deep structures (over 100 layers for the ImageNet model). He K, Zhang X, Ren S, et al. Deep residual learning for image recognition.

72 Deep Residual Network (cont'd). Bottleneck design: increasing depth, less complexity. He K, Zhang X, Ren S, et al. Deep residual learning for image recognition.
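
The "increasing depth, less complexity" point can be checked with simple weight arithmetic. The channel widths below (a 256-channel bottleneck block squeezed to 64 channels, compared with two 3x3 layers) follow the configuration described in the cited ResNet paper rather than anything stated on the slide, so treat them as an assumption; biases and normalization parameters are ignored and `conv_weights` is a hypothetical helper.

```python
def conv_weights(k, c_in, c_out):
    """Number of weights in a k x k convolution (bias ignored)."""
    return k * k * c_in * c_out

# Two stacked 3x3 convolutions kept at 256 channels.
plain_256 = 2 * conv_weights(3, 256, 256)

# Two stacked 3x3 convolutions at 64 channels (a basic residual block).
basic_64 = 2 * conv_weights(3, 64, 64)

# Bottleneck: 1x1 reduces 256 -> 64, 3x3 works at 64, 1x1 restores 64 -> 256.
bottleneck_256 = (conv_weights(1, 256, 64)
                  + conv_weights(3, 64, 64)
                  + conv_weights(1, 64, 256))
```

Under these assumptions the three-layer bottleneck on 256 channels needs slightly fewer weights than the two-layer block on 64 channels, and roughly 17 times fewer than two full 3x3 layers on 256 channels.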

73 Xception. Chollet F. Xception: Deep Learning with Depthwise Separable Convolutions.
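
A depthwise separable convolution replaces one dense k x k convolution with a per-channel (depthwise) k x k convolution plus a 1x1 (pointwise) convolution. The sketch below only counts weights for the two options; the layer sizes are hypothetical, biases are ignored, and the helper names are made up for this example.

```python
def standard_conv_weights(k, c_in, c_out):
    """Dense k x k convolution: every output channel sees every input channel."""
    return k * k * c_in * c_out

def depthwise_separable_weights(k, c_in, c_out):
    """One k x k filter per input channel, then a 1x1 convolution to mix channels."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise

# Hypothetical layer: 3x3 kernel, 128 input and 128 output channels.
dense = standard_conv_weights(3, 128, 128)
separable = depthwise_separable_weights(3, 128, 128)
```

For this layer the separable version uses about 8 times fewer weights.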

74 ResNeXt. Xie S, Girshick R, Dollár P, et al. Aggregated residual transformations for deep neural networks.

75 ShuffleNet. Zhang X, Zhou X, Lin M, et al. ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices.
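
The slide only cites the paper; one of its central operations is the channel shuffle, which interleaves channels from different groups so that stacked group convolutions can exchange information. Below is a minimal sketch on a flat list of channel indices, equivalent to reshaping to (groups, n), transposing, and flattening. The function name and the sample input are hypothetical.

```python
def channel_shuffle(channels, groups):
    """Interleave channels so each new group draws from every old group."""
    n = len(channels) // groups
    if n * groups != len(channels):
        raise ValueError("number of channels must be divisible by groups")
    # Read the (groups x n) layout column by column.
    return [channels[g * n + i] for i in range(n) for g in range(groups)]

# Six channels in two groups: [0, 1, 2] and [3, 4, 5].
shuffled = channel_shuffle(list(range(6)), groups=2)
```

After the shuffle, consecutive channels alternate between the two original groups.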

76 Densely Connected Convolutional Networks. Huang G, Liu Z, Weinberger K Q, et al. Densely connected convolutional networks.

77 Squeeze-and-Excitation Networks. Hu J, Shen L, Sun G. Squeeze-and-Excitation Networks.
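
A squeeze-and-excitation block can be sketched as three steps: squeeze each channel to a scalar by global average pooling, pass the result through a small two-layer gating network ending in a sigmoid, and rescale each channel by its gate. The version below omits biases, and the weight shapes, sample values, and the name `se_block` are assumptions for illustration.

```python
import math

def se_block(feature_maps, w1, w2):
    """feature_maps: list of C channels, each a 2D list; w1: (C/r) x C; w2: C x (C/r)."""
    # Squeeze: global average pooling, one scalar per channel.
    squeezed = [sum(sum(row) for row in fm) / (len(fm) * len(fm[0]))
                for fm in feature_maps]
    # Excitation: fully connected -> ReLU -> fully connected -> sigmoid.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w2]
    # Scale: multiply every value in a channel by that channel's gate.
    return [[[gate * v for v in row] for row in fm]
            for fm, gate in zip(feature_maps, gates)]

# Hypothetical input: 2 channels of size 2x2, reduced to 1 hidden unit.
feature_maps = [[[1.0, 2.0], [3.0, 4.0]],
                [[0.0, 2.0], [2.0, 0.0]]]
w1 = [[0.0, 0.0]]        # zero weights, so every gate equals sigmoid(0) = 0.5
w2 = [[0.0], [0.0]]
scaled = se_block(feature_maps, w1, w2)
```

With zero weights the gates are all 0.5, which makes the rescaling easy to verify by hand; trained weights would give each channel its own input-dependent gate.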

78 Summary: Ideas of Structure Design. Deeper and wider; ease of optimization; multi-path design; residual path; sparse connection.

79 PART 2: Architecture Design Overview Structure design Layer design Architecture for special tasks

80 Spatial Pyramid Pooling. He K, Zhang X, Ren S, et al. Spatial pyramid pooling in deep convolutional networks for visual recognition.

81 Batch Normalization. Ioffe S, Szegedy C. Batch normalization: Accelerating deep network training by reducing internal covariate shift.
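
As a rough sketch of the training-time computation, batch normalization standardizes each feature over the mini-batch and then applies a learned scale and shift. The code below covers only that forward pass; running statistics for inference and the backward pass are left out, and the sample batch and the name `batch_norm` are hypothetical.

```python
import math

def batch_norm(batch, gamma, beta, eps=1e-5):
    """Normalize each feature over the mini-batch, then scale by gamma and shift by beta."""
    n = len(batch)
    dims = len(batch[0])
    means = [sum(x[d] for x in batch) / n for d in range(dims)]
    variances = [sum((x[d] - means[d]) ** 2 for x in batch) / n for d in range(dims)]
    return [[gamma[d] * (x[d] - means[d]) / math.sqrt(variances[d] + eps) + beta[d]
             for d in range(dims)]
            for x in batch]

# Hypothetical mini-batch: 3 examples, 2 features on very different scales.
batch = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
normalized = batch_norm(batch, gamma=[1.0, 1.0], beta=[0.0, 0.0])
```

With `gamma = 1` and `beta = 0`, both features end up with approximately zero mean and unit variance despite their different input scales.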

82 Parametric Rectifiers. He K, Zhang X, Ren S, et al. Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification.
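
A parametric rectifier (PReLU) behaves like ReLU for positive inputs but uses a learnable slope `a` for negative inputs. The sketch below shows the activation and its gradient with respect to `a`; the slope value 0.25 and the sample inputs are chosen only for illustration.

```python
def prelu(x, a):
    """Identity for positive inputs, slope `a` for non-positive inputs."""
    return x if x > 0 else a * x

def prelu_grad_a(x):
    """Derivative of prelu(x, a) with respect to the slope a."""
    return 0.0 if x > 0 else x

# Hypothetical slope and inputs.
outputs = [prelu(x, a=0.25) for x in [-2.0, 0.0, 3.0]]
slope_grads = [prelu_grad_a(x) for x in [-2.0, 0.0, 3.0]]
```

Setting `a = 0` gives the ordinary ReLU, and a small fixed `a` gives a leaky ReLU; PReLU instead learns `a` by back-propagation.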

83 Bilinear CNNs. Lin T Y, RoyChowdhury A, Maji S. Bilinear CNN models for fine-grained visual recognition.

84 PART 2: Architecture Design Overview Structure design Layer design Architecture for special tasks

85 DeepFace. Taigman Y, Yang M, Ranzato M A, et al. DeepFace: Closing the gap to human-level performance in face verification.

86 Global Convolutional Networks. Peng C, Zhang X, Yu G, et al. Large Kernel Matters: Improve Semantic Segmentation by Global Convolutional Network.

87 Hourglass Networks. Newell A, Yang K, Deng J. Stacked hourglass networks for human pose estimation.

88 Summary: Trends on Architecture Design. Effectiveness and efficiency; task and data specific; ML and optimization perspective; insight and motivation driven.

89 Thanks
