ELEC 576: Training Convnets, Lecture 5. Ankit B. Patel, Baylor College of Medicine (Neuroscience Dept.), Rice University (ECE Dept.)


1 ELEC 576: Training Convnets Lecture 5 Ankit B. Patel Baylor College of Medicine (Neuroscience Dept.) Rice University (ECE Dept.)

2 Administrivia RCSG will be giving us a 30 minute tutorial today on how to use their commodity computing services. Please start Assignment #1 ASAP!!!

3 Latest News

4 Better Generative Models for Images of Products

5 Google Brain Residency Program

6 Training Convnets: Problems and Solutions

7 Training on CIFAR10 demo/cifar10.html

8 Data Preprocessing

9 Zero-Center & Normalize Data

10 PCA & Whitening

11 In Practice, for Images: Center Only
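A minimal sketch of the "center only" preprocessing named on this slide, assuming images are stored as an array of shape (N, height, width, channels). The function name `center_images` and the random stand-in data are hypothetical; the key point is that the mean is computed on the training split only and then reused for the test split.

```python
import numpy as np

def center_images(train, test):
    """Subtract the per-channel training mean from both splits."""
    # Mean over images, height, and width: one value per color channel.
    channel_mean = train.mean(axis=(0, 1, 2), keepdims=True)
    return train - channel_mean, test - channel_mean, channel_mean

rng = np.random.default_rng(0)
# Hypothetical stand-in data: 8 training and 4 test images, 32x32 RGB.
train = rng.uniform(0, 255, size=(8, 32, 32, 3))
test = rng.uniform(0, 255, size=(4, 32, 32, 3))
train_c, test_c, mean = center_images(train, test)
```

Subtracting a full mean image (averaging over axis 0 only) is a common alternative; which of the two the lecture intended is not stated on the slide.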


12 Data Augmentation During training: random crops of the original image and horizontal reflections. During testing: average the predictions over the four corner patches and the center patch of the image, plus the horizontal flip of each (10 augmentations of the image). Data augmentation reduces overfitting.
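The two augmentation schemes on this slide can be sketched as follows. This is an illustration under assumptions, not the lecture's code: images are (height, width, channels) arrays, the crop size of 28 is arbitrary, and `dummy_predict` is a hypothetical stand-in for a trained model that returns class probabilities.

```python
import numpy as np

def random_crop_flip(img, crop, rng):
    """Training-time augmentation: random crop plus random horizontal flip."""
    h, w = img.shape[:2]
    top = rng.integers(0, h - crop + 1)
    left = rng.integers(0, w - crop + 1)
    patch = img[top:top + crop, left:left + crop]
    if rng.random() < 0.5:
        patch = patch[:, ::-1]  # horizontal reflection
    return patch

def ten_crop(img, crop):
    """Test-time views: four corners and the center, each with its flip."""
    h, w = img.shape[:2]
    ct, cl = (h - crop) // 2, (w - crop) // 2
    offsets = [(0, 0), (0, w - crop), (h - crop, 0), (h - crop, w - crop), (ct, cl)]
    patches = [img[t:t + crop, l:l + crop] for t, l in offsets]
    patches += [p[:, ::-1] for p in patches]
    return np.stack(patches)

def predict_averaged(predict_fn, img, crop):
    """Average the model's class probabilities over the 10 views."""
    return predict_fn(ten_crop(img, crop)).mean(axis=0)

def dummy_predict(batch):
    # Hypothetical model: uniform probabilities over 10 classes.
    return np.full((batch.shape[0], 10), 0.1)

rng = np.random.default_rng(0)
img = rng.uniform(size=(32, 32, 3))
train_patch = random_crop_flip(img, 28, rng)
views = ten_crop(img, 28)
avg = predict_averaged(dummy_predict, img, 28)
```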

13 Weight Initialization

14 Interesting Question: What happens when the weights are initialized to 0? (2 min)

15 Answer
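The transcript does not preserve the answer itself. One standard way to explore the question is to compute the gradients directly. The sketch below assumes a small two-layer tanh network without biases and a squared-error loss (all hypothetical choices). With all-zero weights, both weight gradients are exactly zero in this setup; with a constant nonzero initialization, gradients exist but every hidden unit receives the same one, so the units can never become different from each other.

```python
import numpy as np

def grads_two_layer(X, y, W1, W2):
    """Weight gradients of mean squared error for a tanh hidden layer."""
    h = np.tanh(X @ W1)                  # hidden activations
    out = h @ W2                         # network output
    d_out = 2 * (out - y) / len(X)       # dLoss/d_out
    dW2 = h.T @ d_out
    d_h = (d_out @ W2.T) * (1 - h ** 2)  # backprop through tanh
    dW1 = X.T @ d_h
    return dW1, dW2

rng = np.random.default_rng(0)
X = rng.standard_normal((16, 4))
y = rng.standard_normal((16, 1))

# All-zero weights: hidden activations are zero, so both gradients vanish.
dW1_zero, dW2_zero = grads_two_layer(X, y, np.zeros((4, 5)), np.zeros((5, 1)))

# Constant weights: nonzero gradients, but identical for every hidden unit.
dW1_const, dW2_const = grads_two_layer(
    X, y, np.full((4, 5), 0.1), np.full((5, 1), 0.1)
)
```

Random initialization is what breaks this symmetry, which motivates the slides that follow.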

16 Random Initialization W = 0.01 * np.random.randn(d, H) Works fine for small networks, but can lead to non-homogeneous distributions of activations across the layers of a network.

17 Look at Some Activation Statistics Setup: 10-layer net with 500 neurons on each layer, using tanh nonlinearities, and initializing as described in last slide.
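The experiment described in this setup can be reproduced roughly as below. It is a sketch under assumptions: the inputs are unit Gaussian, 1000 input samples are used, and the statistic tracked is the standard deviation of each layer's activations. The helper names are hypothetical.

```python
import numpy as np

def small_random(rng, fan_in, fan_out):
    """The initialization from the previous slide: 0.01 * randn."""
    return 0.01 * rng.standard_normal((fan_in, fan_out))

def activation_stds(init_fn, n_layers=10, width=500, n_inputs=1000, seed=0):
    """Forward random data through a tanh net and record per-layer std."""
    rng = np.random.default_rng(seed)
    h = rng.standard_normal((n_inputs, width))
    stds = []
    for _ in range(n_layers):
        W = init_fn(rng, width, width)
        h = np.tanh(h @ W)
        stds.append(h.std())
    return stds

stds_small = activation_stds(small_random)
for i, s in enumerate(stds_small, start=1):
    print(f"layer {i}: activation std = {s:.2e}")
```

With this initialization each layer shrinks the activation scale by a roughly constant factor, so the deeper layers output values very close to zero.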

18 Random Initialization

19 Random Initialization

20 Random Initialization Interesting Question: What will the gradients look like in the backward pass when all activations become zero?

21 Answer: The gradients in the backward pass will become zero!

22 Xavier Initialization W = np.random.randn(fan_in, fan_out) / np.sqrt(fan_in) Reasonable initialization (Mathematical derivation assumes linear activations)

23 Xavier Initialization W = np.random.randn(fan_in, fan_out) / np.sqrt(fan_in) but it breaks when using ReLU non-linearity
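To illustrate why the Xavier scaling struggles with ReLU, the sketch below compares it against the scaling proposed by He et al., 2015 (cited on the next slide), which uses `sqrt(2 / fan_in)` instead of `1 / sqrt(fan_in)`. The network size matches the earlier 10-layer, 500-unit setup; the unit Gaussian inputs and the function name are assumptions made for the example.

```python
import numpy as np

def relu_activation_stds(scale_fn, n_layers=10, width=500, n_inputs=1000, seed=0):
    """Forward random data through a ReLU net and record per-layer std."""
    rng = np.random.default_rng(seed)
    h = rng.standard_normal((n_inputs, width))
    stds = []
    for _ in range(n_layers):
        W = rng.standard_normal((width, width)) * scale_fn(width)
        h = np.maximum(0, h @ W)
        stds.append(h.std())
    return stds

# Xavier scaling from the slide.
xavier = relu_activation_stds(lambda fan_in: 1.0 / np.sqrt(fan_in))
# He et al. (2015) scaling, which compensates for ReLU zeroing half the units.
he = relu_activation_stds(lambda fan_in: np.sqrt(2.0 / fan_in))

for i, (a, b) in enumerate(zip(xavier, he), start=1):
    print(f"layer {i}: xavier std = {a:.4f}, he std = {b:.4f}")
```

Under Xavier scaling the activation scale decays steadily with depth, while the extra factor of 2 keeps it roughly constant.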

24 More Initialization Techniques
Understanding the difficulty of training deep feedforward neural networks by Glorot and Bengio, 2010
Exact solutions to the nonlinear dynamics of learning in deep linear neural networks by Saxe et al., 2013
Random walk initialization for training very deep feedforward networks by Sussillo and Abbott, 2014
Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification by He et al., 2015
Data-dependent Initializations of Convolutional Neural Networks by Krähenbühl et al., 2015
All you need is a good init by Mishkin and Matas, 2015

25 Choosing an Activation Function that Helps the Training

26 Sigmoid

27 Tanh

28 ReLU (dead in the negative region: the gradient is zero for x < 0)

29 Leaky ReLU

30 Exponential Linear Unit

31 Maxout

32 In Practice

33 Training Algorithms

34 Stochastic Gradient Descent

35 Stochastic Gradient Descent

36 Stochastic Gradient Descent for Neural Networks

37 Batch GD vs Stochastic GD

38 Mini-batch SGD

39 Momentum Update

40 Nesterov Momentum Update

41 Nesterov Momentum Update: express the update in terms of x_ahead instead of x
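The change of variables on this slide can be checked numerically. The sketch below runs Nesterov momentum in two forms on a toy quadratic: the "lookahead" form, which evaluates the gradient at `x + mu * v`, and the rewritten form that stores `x_ahead` directly. The objective, learning rate, and momentum value are assumptions chosen for illustration.

```python
import numpy as np

A = np.array([1.0, 10.0])  # curvatures of the toy objective

def grad(x):
    """Gradient of f(x) = 0.5 * sum(A * x**2)."""
    return A * x

def nesterov_lookahead(x0, lr, mu, steps):
    """Evaluate the gradient at the lookahead point x + mu * v."""
    x, v = x0.copy(), np.zeros_like(x0)
    for _ in range(steps):
        x_ahead = x + mu * v
        v = mu * v - lr * grad(x_ahead)
        x = x + v
    return x, v

def nesterov_x_ahead(x0, lr, mu, steps):
    """Same method, but the stored parameter vector is x_ahead itself."""
    x_ahead, v = x0.copy(), np.zeros_like(x0)
    for _ in range(steps):
        v_prev = v
        v = mu * v - lr * grad(x_ahead)
        x_ahead = x_ahead - mu * v_prev + (1 + mu) * v
    return x_ahead, v

x0 = np.array([1.0, 1.0])
x_look, v_look = nesterov_lookahead(x0, lr=0.05, mu=0.9, steps=200)
z, v_z = nesterov_x_ahead(x0, lr=0.05, mu=0.9, steps=200)
```

The two loops track the same trajectory: at every step the stored `x_ahead` equals `x + mu * v` from the lookahead form, so the gradient is always taken at the parameters that are actually stored.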

42 Per-parameter adaptive learning rate methods: Adagrad, RMSprop, Adam
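The three methods named here differ mainly in how they accumulate squared gradients. The sketch below writes each update as a single step function in the commonly published form; the hyperparameter defaults and function names are assumptions, not values taken from the lecture.

```python
import numpy as np

def adagrad_step(x, dx, cache, lr=0.1, eps=1e-8):
    """Adagrad: divide by the root of the running SUM of squared gradients."""
    cache = cache + dx ** 2
    x = x - lr * dx / (np.sqrt(cache) + eps)
    return x, cache

def rmsprop_step(x, dx, cache, lr=0.01, decay=0.9, eps=1e-8):
    """RMSprop: use a leaky (exponential) average instead of a sum."""
    cache = decay * cache + (1 - decay) * dx ** 2
    x = x - lr * dx / (np.sqrt(cache) + eps)
    return x, cache

def adam_step(x, dx, m, v, t, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam: RMSprop-style scaling plus momentum, with bias correction."""
    m = beta1 * m + (1 - beta1) * dx
    v = beta2 * v + (1 - beta2) * dx ** 2
    m_hat = m / (1 - beta1 ** t)  # t is the 1-based step count
    v_hat = v / (1 - beta2 ** t)
    x = x - lr * m_hat / (np.sqrt(v_hat) + eps)
    return x, m, v

# Two parameters whose gradients differ by five orders of magnitude.
dx = np.array([0.001, 100.0])
x0 = np.zeros(2)
x_ada, cache_ada = adagrad_step(x0, dx, np.zeros(2))
x_rms, cache_rms = rmsprop_step(x0, dx, np.zeros(2))
x_adam, m, v = adam_step(x0, dx, np.zeros(2), np.zeros(2), t=1)
```

On the first step all three methods move both parameters by nearly the same amount despite the very different gradient magnitudes, which is the sense in which the learning rate is adapted per parameter.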

43 Annealing the Learning Rates

44 Compare Learning Methods

45 In Practice Adam is the default choice in most cases. SGD variants based on (Nesterov's) momentum are more standard than second-order methods because they are simpler and scale more easily. If you can afford to do full-batch updates, then try out L-BFGS (the limited-memory version of the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm), and don't forget to disable all sources of noise.

46 Regularization

47 DropOut

48 DropOut

49 DropOut

50 DropOut

51 DropOut
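The slide bodies for dropout are not preserved in this transcript. As a point of reference, a minimal sketch of the widely used "inverted dropout" variant is given below; whether the lecture presented this exact variant is an assumption. Units are kept with probability `p_keep` during training and rescaled by `1 / p_keep`, so that no rescaling is needed at test time.

```python
import numpy as np

def dropout_forward(h, p_keep, rng, train=True):
    """Inverted dropout: random mask and rescale in training, identity at test."""
    if not train:
        return h
    # Keep each unit with probability p_keep; scale survivors by 1 / p_keep.
    mask = (rng.random(h.shape) < p_keep) / p_keep
    return h * mask

rng = np.random.default_rng(0)
h = np.ones((1000, 100))
h_train = dropout_forward(h, p_keep=0.5, rng=rng, train=True)
h_test = dropout_forward(h, p_keep=0.5, rng=rng, train=False)
```

Because of the rescaling, the expected value of each activation is the same in training and test mode.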

52 Normalization

53 Batch Normalization

54 Batch Normalization

55 Batch Normalization

56 Batch Normalization
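The slide bodies for batch normalization are also missing from this transcript. The sketch below shows the forward pass for fully connected activations in its commonly published form: normalize each feature over the mini-batch, then apply a learned scale `gamma` and shift `beta`. The running-average momentum of 0.9 and the function signature are assumptions.

```python
import numpy as np

def batchnorm_forward(x, gamma, beta, running_mean, running_var,
                      train=True, momentum=0.9, eps=1e-5):
    """Batch norm over axis 0 (the batch) for inputs of shape (N, D)."""
    if train:
        mu = x.mean(axis=0)
        var = x.var(axis=0)
        # Running statistics are kept for use at test time.
        running_mean = momentum * running_mean + (1 - momentum) * mu
        running_var = momentum * running_var + (1 - momentum) * var
    else:
        mu, var = running_mean, running_var
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta, running_mean, running_var

rng = np.random.default_rng(0)
x = 3.0 * rng.standard_normal((128, 4)) + 5.0  # badly scaled activations
out, run_mean, run_var = batchnorm_forward(
    x, gamma=np.ones(4), beta=np.zeros(4),
    running_mean=np.zeros(4), running_var=np.ones(4),
)
```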

57 Ensembles

58 Model Ensembles

59 Model Ensembles

60 Hyperparameter Optimization

61 Hyperparameter Optimization

62 Hyperparameter Optimization

63 Hyperparameter Optimization

64 Hyperparameter Optimization

65 Hyperparameter Optimization

66 Synaptic Pruning

67 Monitoring the Learning Process

68 Double-check that the Loss is Reasonable

69 Double-check that the Loss is Reasonable
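One common form of this sanity check, sketched below under assumptions, is to compare the initial loss against the value expected from chance. For a softmax classifier with small random weights and no regularization, the predicted distribution is nearly uniform, so the cross-entropy loss should be close to log(number of classes). The data here is random and the linear model is a stand-in for a real network.

```python
import numpy as np

def softmax_cross_entropy(scores, labels):
    """Mean cross-entropy loss from raw class scores (numerically stable)."""
    shifted = scores - scores.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

rng = np.random.default_rng(0)
num_classes = 10
X = rng.standard_normal((256, 3072))       # stand-in for flattened images
y = rng.integers(0, num_classes, size=256)
W = 1e-4 * rng.standard_normal((3072, num_classes))  # small initial weights

loss = softmax_cross_entropy(X @ W, y)
expected = np.log(num_classes)
print(f"initial loss = {loss:.4f}, log(num_classes) = {expected:.4f}")
```

A starting loss far from this value usually points to a bug in the loss, the initialization, or the data pipeline.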

70 Overfit Very Small Portion of the Training Data

79 Transfer Learning

80 Transfer Learning

81 Transfer Learning

82 Species of Convnets

83 AlexNet

84 VGG Net

85 GoogLeNet

86 ResNet

87 MDNet: Convnet for Object Tracking

88 Convnet for Brain Tumor Segmentation (Top 4 in BRATS 2015)

89 U-Net: Convnet for Segmentation of Neuronal Structures in Electron Microscopic Stacks (Won the ISBI Cell Tracking Challenge 2015)

90 DeepBind: Convnet for Predicting the Sequence Specificities of DNA- and RNA- Binding Proteins
