Deep Learning Theory and Applications

1 Deep Learning Theory and Applications Kevin Moon Guy Wolf CPSC/AMTH 663

2 Outline 1. Course logistics 2. What is Deep Learning? 3. Deep learning examples: CNNs, word embeddings, RNNs, autoencoders, ultra deep learning (ResNet), generative models (e.g. GANs), deep reinforcement learning, Boltzmann machines

3 Course Logistics Textbooks (available online) Neural Networks and Deep Learning by Michael Nielsen Deep Learning by Goodfellow, Bengio, and Courville Required background Basic probability Basic linear algebra & calculus Programming experience Python and TensorFlow will be used in this course Look at the textbooks and HW 1 for an idea Course Website: cpsc663.guywolf.org Course info, lecture slides, & HW Canvas Announcements & HW

4 Course Logistics Office hours: TBD 5-6 HW assignments Assigned about every 2 weeks, due on Thursdays All/most will include some programming (Python & TensorFlow) Final project (details forthcoming) In groups of 3-4

5 Goals of the Course A solid understanding of supervised feedforward neural networks Stochastic gradient descent, backpropagation, etc. Cost functions, regularizers, etc. The ability to design and train novel architectures An understanding of optimization strategies in training deep architectures Understanding of important deep architectures (e.g. CNN, RNN, autoencoders, GANs, deep reinforcement learning)

6 What is deep learning? Big Data: Extremely large data sets that may be analyzed computationally to reveal patterns, trends, and associations, especially relating to human behavior and interactions. Machine learning: Field of study that gives computers the ability to learn without being explicitly programmed. Artificial neural network (ANN): "A computing system made up of a number of simple, highly interconnected processing elements, which process information by their dynamic state response to external inputs." (Dr. Robert Hecht-Nielsen) Deep learning: A set of algorithms that attempt to model high-level abstractions in data by using multiple processing layers, composed of multiple linear and non-linear transformations. Often an ANN with many layers. A tool in machine learning and big data analysis.

7 What is deep learning? CPSC/AMTH 663 (Kevin Moon/Guy Wolf) Deep Learning Overview Yale Spring 2018

8 Deep learning is hot

9 Deep learning is hot

10 Deep learning is hot

11 Recent success in deep learning Image colorization (Zhang et al., 2016)

12 Recent success in deep learning Image colorization (Zhang et al., 2016) Colorized classical photographs by Ansel Adams

13 Recent success in deep learning Real-time visual translation on smartphones 1. Find the letters 2. Recognize the letters 3. Translate 4. Render the translation in the same style Google blog, 2015

14 Recent success in deep learning Object classification/detection in images (Krizhevsky et al., 2012)

15 Recent success in deep learning Automatic text generation (Andrej Karpathy blog, 2015)

16 Recent success in deep learning Automatic image caption generation (Karpathy & Fei-Fei, 2015)

17 Recent success in deep learning Automatic game playing: AlphaGo Zero, AlphaZero

18 What is a neural network? Multi-layer perceptron

19 The perceptron Developed in the 1950s and 1960s by Frank Rosenblatt Binary inputs Single binary output Example: Nielsen, 2015

20 The perceptron Computing the output: Assign weights to each input Determine if weighted sum of inputs is greater than some threshold $\text{output} = \begin{cases} 0 & \text{if } \sum_j w_j x_j \le \text{threshold} \\ 1 & \text{if } \sum_j w_j x_j > \text{threshold} \end{cases}$ Nielsen, 2015

21 The perceptron Example: Decide whether to attend a cheese festival Three factors: 1. Is the weather good? ($x_1$) 2. Does your boyfriend or girlfriend want to accompany you? ($x_2$) 3. Is the festival near public transit? (you don't own a car) ($x_3$) $x_j = 0$ if no, $x_j = 1$ if yes Nielsen, 2015

22 The perceptron Example: Decide whether to attend a cheese festival Three factors: 1. Is the weather good? ($x_1$) 2. Does your boyfriend or girlfriend want to accompany you? ($x_2$) 3. Is the festival near public transit? (you don't own a car) ($x_3$) Case 1: Love cheese but hate bad weather $w_1 = 6$, $w_2 = 2$, $w_3 = 2$, threshold $= 5$ $\sum_j w_j x_j > \text{threshold}$ whenever weather is good ($x_1 = 1$) $\sum_j w_j x_j < \text{threshold}$ whenever weather is bad ($x_1 = 0$)

23 The perceptron Example: Decide whether to attend a cheese festival Three factors: 1. Is the weather good? ($x_1$) 2. Does your boyfriend or girlfriend want to accompany you? ($x_2$) 3. Is the festival near public transit? (you don't own a car) ($x_3$) Case 2: Love cheese but don't hate bad weather as much $w_1 = 6$, $w_2 = 2$, $w_3 = 2$, threshold $= 3$ $\sum_j w_j x_j > \text{threshold}$ whenever weather is good ($x_1 = 1$), or when both your boyfriend or girlfriend will go ($x_2 = 1$) and the festival is near public transit ($x_3 = 1$)
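The two cheese-festival cases can be checked by enumerating all eight inputs. The following is a minimal sketch using the weights and thresholds from the slides; the function names `perceptron` and `decisions` are illustrative and not part of the course material.

```python
from itertools import product


def perceptron(weights, inputs, threshold):
    """Return 1 if the weighted sum of the inputs exceeds the threshold, else 0."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return 1 if weighted_sum > threshold else 0


# Weights for (weather, companion, public transit), as given on the slides.
WEIGHTS = (6, 2, 2)


def decisions(threshold):
    """Map every binary input (x1, x2, x3) to the perceptron's decision."""
    return {x: perceptron(WEIGHTS, x, threshold) for x in product((0, 1), repeat=3)}


case1 = decisions(threshold=5)  # go exactly when the weather is good
case2 = decisions(threshold=3)  # also go when x2 = 1 and x3 = 1 together

for x in sorted(case1):
    print(x, "case 1:", case1[x], "case 2:", case2[x])
```

Lowering the threshold from 5 to 3 changes only one row of the table: the input (0, 1, 1), whose weighted sum of 4 now exceeds the threshold.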

24 The multilayer perceptron (MLP) A single perceptron is pretty simple A complex network of perceptrons can make subtle decisions First Layer Second Layer Nielsen, 2015

25 Notation Simplification $w \cdot x = \sum_j w_j x_j$, where $w$ and $x$ are the weight and input vectors, respectively Replace the threshold with perceptron bias Bias $b = -\text{threshold}$ $\text{output} = \begin{cases} 0 & \text{if } w \cdot x + b \le 0 \\ 1 & \text{if } w \cdot x + b > 0 \end{cases}$ Bias is a measure of ease in firing the perceptron
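Moving the threshold to the other side of the inequality gives a bias equal to the negative of the threshold. As a small sanity check, the sketch below compares the threshold form and the bias form on every binary input. The weights and threshold are borrowed from the cheese-festival example; the function names are illustrative.

```python
from itertools import product


def fires_with_threshold(w, x, threshold):
    """Threshold form: output 1 when w . x > threshold."""
    return int(sum(wj * xj for wj, xj in zip(w, x)) > threshold)


def fires_with_bias(w, x, b):
    """Bias form: output 1 when w . x + b > 0."""
    return int(sum(wj * xj for wj, xj in zip(w, x)) + b > 0)


w = (6, 2, 2)
threshold = 5
b = -threshold  # the bias is the negative of the threshold

forms_agree = all(
    fires_with_threshold(w, x, threshold) == fires_with_bias(w, x, b)
    for x in product((0, 1), repeat=3)
)
print("threshold and bias forms agree:", forms_agree)
```

A large positive bias makes the perceptron easy to fire; a very negative bias makes it hard.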

26 Logic circuits with perceptrons $w_1 = w_2 = -2$, $b = 3$ Nielsen, 2015 What is the output of this perceptron for each possible input? What logic circuit is this? Input 00 produces 1 Input 01 or 10 produces 1 Input 11 produces 0 This is a NAND gate!
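The truth table can be verified directly. This minimal sketch assumes the values from Nielsen's example, weights of -2 on both inputs and a bias of 3; the names `perceptron` and `nand` are illustrative.

```python
def perceptron(w, x, b):
    """Bias-form perceptron: output 1 when w . x + b > 0, else 0."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else 0


def nand(x1, x2):
    """Perceptron with weights (-2, -2) and bias 3."""
    return perceptron((-2, -2), (x1, x2), 3)


truth_table = {(x1, x2): nand(x1, x2) for x1 in (0, 1) for x2 in (0, 1)}

for inputs, output in truth_table.items():
    print(inputs, "->", output)
```

Only the input 11 drives the weighted sum plus bias to -1, which is not positive, so it is the only input that produces 0.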

27 Logic circuits with perceptrons NAND gates are universal for computation Any computation can be built from NAND gates Therefore, perceptrons are universal for computation Bitwise addition: Nielsen, 2015
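To illustrate the bitwise addition circuit, here is a sketch of a half adder wired entirely from NAND perceptrons. The five-gate wiring follows the circuit shown in Nielsen's book; the function names and the inlined NAND perceptron (weights -2, -2 and bias 3) are assumptions made for this example.

```python
def nand(a, b):
    """NAND as a perceptron with weights (-2, -2) and bias 3."""
    return 1 if -2 * a - 2 * b + 3 > 0 else 0


def half_adder(x1, x2):
    """Add two bits using only NAND perceptrons; return (sum bit, carry bit)."""
    n1 = nand(x1, x2)
    n2 = nand(x1, n1)
    n3 = nand(x2, n1)
    bit_sum = nand(n2, n3)  # equals x1 XOR x2
    carry = nand(n1, n1)    # equals x1 AND x2
    return bit_sum, carry


for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, "+", x2, "->", half_adder(x1, x2))
```

Chaining such adders gives multi-bit addition, which is one concrete sense in which networks of perceptrons can compute anything a logic circuit can.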

28 So what? We can create learning algorithms that automatically tune the weights and biases Tuning occurs in response to external stimuli and w/o direct intervention Creates a circuit designed for the problem at hand
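One classic example of such a learning algorithm is the perceptron learning rule, which nudges the weights and bias after every misclassified example. It is not named on the slide and is shown here only as a hedged sketch; the course itself focuses on stochastic gradient descent and backpropagation. The function names, the learning rate, and the choice of NAND as the target are assumptions made for this example.

```python
def predict(w, b, x):
    """Bias-form perceptron output for input x."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else 0


def train_perceptron(examples, learning_rate=1, max_epochs=100):
    """Perceptron learning rule: adjust w and b only on misclassified examples."""
    w = [0] * len(examples[0][0])
    b = 0
    for _ in range(max_epochs):
        mistakes = 0
        for x, target in examples:
            error = target - predict(w, b, x)  # one of -1, 0, +1
            if error != 0:
                mistakes += 1
                w = [wj + learning_rate * error * xj for wj, xj in zip(w, x)]
                b += learning_rate * error
        if mistakes == 0:  # every example is classified correctly
            break
    return w, b


# Target behaviour: the NAND gate from the earlier slide.
NAND_EXAMPLES = [((0, 0), 1), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

w, b = train_perceptron(NAND_EXAMPLES)
learned = {x: predict(w, b, x) for x, _ in NAND_EXAMPLES}
print("weights:", w, "bias:", b, "outputs:", learned)
```

The learned weights need not match the hand-picked values; any weights and bias that separate the input 11 from the other three inputs implement the same gate.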

29 Why go deep? Representations matter Goodfellow et al., 2016

30 Why go deep? Representations matter Goodfellow et al., 2016

31 Increasing # of neurons 1. Perceptron (Rosenblatt, 1958) 4. Early backpropagation network 6. MLP for speech recognition (Bengio et al., 1991) 11. GPU-accelerated convolutional network (Chellapilla et al., 2006) 20. GoogLeNet (Szegedy et al., 2014a) Goodfellow et al., 2016

32 Design choices for an ANN Learning algorithms Backpropagation Stochastic gradient descent (SGD) Activation function (e.g. threshold) Cost functions Number and dimension of layers Connections between layers Regularizations Layers Batches More

33 Deep learning examples CNNs, word embeddings, RNNs, autoencoders, Ultra deep learning, generative models, deep reinforcement learning, restricted Boltzmann machines

34 Fully connected network Every feature interacts with every other feature Weight matrix at every level allowed to be dense
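A fully connected layer can be sketched as a dense matrix-vector product plus a bias. The example below is plain Python with made-up weights; `dense_layer` is an illustrative name, and no activation function is applied.

```python
def dense_layer(weights, bias, x):
    """Compute W x + b, where every output depends on every input feature."""
    return [
        sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
        for row, b_i in zip(weights, bias)
    ]


# Hypothetical 2x3 weight matrix: 3 input features, 2 output units, all entries nonzero.
W = [[1, 2, 3],
     [4, 5, 6]]
b = [0.5, -0.5]
x = [1, 0, -1]

output = dense_layer(W, b, x)
print(output)
```

Because no entry of the weight matrix is forced to zero, the number of parameters grows with the product of the input and output sizes.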

35 Convolutional Neural Networks (CNNs) Very successful in images

36 Convolutional Neural Networks (CNNs) Only pixels that are close to each other in the image interact with each other (convolution layer) Weight matrices are highly structured Pooling helps to simplify output of convolution layer Yann LeCun
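The local interaction and the pooling step can be sketched in one dimension. The code below is a simplified illustration, not the course's implementation: it uses a 1-D signal instead of an image, a hand-picked two-tap kernel, and non-overlapping max pooling. As in most deep learning libraries, the "convolution" is computed without flipping the kernel.

```python
def conv1d(signal, kernel):
    """Slide one small kernel over the signal: each output sees only nearby inputs."""
    k = len(kernel)
    return [
        sum(kernel[j] * signal[i + j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def max_pool1d(values, size):
    """Keep the maximum of each non-overlapping window of the given size."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]


signal = [0, 0, 1, 1, 1, 0, 0, 0]
edge_kernel = [-1, 1]  # responds to a rising step, a crude edge detector

features = conv1d(signal, edge_kernel)
pooled = max_pool1d(features, size=2)
print(features)
print(pooled)
```

The same two weights are reused at every position, which is what makes the equivalent weight matrix highly structured: mostly zeros, with the kernel repeated along a diagonal band.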

37 Convolutional Neural Networks (CNNs) Weights from the first layer tend to look like directional filters after training Detects edges, color change, etc.

38 Convolutional Neural Networks (CNNs) Goodfellow et al., 2016

39 Word2Vec Organization of words via neural networks Next word in a sentence can be predicted based on organization

40 Recurrent Neural Networks (RNNs) Useful when time is important

41 Recurrent Neural Networks (RNNs) In feedforward nets (everything we've considered so far), activations of later layers are completely determined by the input RNNs allow the hidden layers to be affected by activations at earlier times (i.e. feedback) E.g. a neuron's activation may include as input its activation at an earlier time Cycles are now included in the network This time-varying behavior makes RNNs useful for analyzing data that change over time (e.g. speech) Training can be difficult for long-term dependencies
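The feedback idea can be sketched with a single recurrent neuron whose activation at time t depends on the current input and on its own activation at time t-1. The tanh activation, the scalar weights, and the function names below are assumptions chosen for illustration; the slides do not specify them.

```python
import math


def rnn_step(x_t, h_prev, w_x, w_h, b):
    """One time step: the new activation depends on the input and the previous activation."""
    return math.tanh(w_x * x_t + w_h * h_prev + b)


def run_rnn(inputs, w_x=1.0, w_h=0.5, b=0.0, h0=0.0):
    """Unroll the recurrence over a sequence and return the activation at each step."""
    h = h0
    states = []
    for x_t in inputs:
        h = rnn_step(x_t, h, w_x, w_h, b)
        states.append(h)
    return states


# A single pulse followed by silence: the activation keeps echoing the pulse.
states = run_rnn([1.0, 0.0, 0.0])
print(states)
```

The input is zero at the last two steps, yet the activation stays positive and decays gradually. That fading memory also hints at why long-term dependencies are hard to learn.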

42 Fully Recurrent Network By Chrislb - created by Chrislb, CC BY-SA 3.0

43 Autoencoders Attempts to compress the data and then reconstruct the input Bottleneck layer Reconstruction By Chervinskii - Own work, CC BY-SA 4.0
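As a toy illustration of the bottleneck, the sketch below compresses 2-D points to a single number and reconstructs them. The encoder and decoder weights are set by hand, not learned, and are chosen so that points on the line y = 2x are reconstructed exactly; a real autoencoder would learn its weights by minimizing reconstruction error. All names are illustrative.

```python
def encode(point):
    """Encoder: compress a 2-D point to a 1-D code (the bottleneck)."""
    x, y = point
    return (x + 2.0 * y) / 5.0


def decode(code):
    """Decoder: map the 1-D code back to a 2-D point on the line y = 2x."""
    return (code, 2.0 * code)


def reconstruction_error(point):
    """Squared distance between a point and its reconstruction."""
    rx, ry = decode(encode(point))
    return (point[0] - rx) ** 2 + (point[1] - ry) ** 2


on_line = reconstruction_error((1.0, 2.0))   # lies on y = 2x
off_line = reconstruction_error((2.0, 0.0))  # does not lie on y = 2x
print(on_line, off_line)
```

Data that follows the structure captured by the bottleneck passes through with no loss, while anything else is projected onto that structure. Dimensionality reduction and denoising both rely on this behavior.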

44 Autoencoder Applications Pretraining Dimensionality reduction Information retrieval Denoising Data compression Generative modeling Batch correction Goodfellow et al., 2016

45 Ultra Deep Learning (e.g. ResNet) Very deep neural nets are difficult to train Accuracy can degrade with deeper networks ResNet developed a framework to address this degradation Successfully trained a 152-layer network Won the ILSVRC 2015 image classification task arxiv.org/abs/
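The core idea in ResNet is a skip connection: a block outputs its input plus a learned residual, so extra layers can fall back to the identity mapping. The sketch below is a simplified assumption-laden version with two small linear layers, made-up weights, and no activation after the addition; it is not the architecture from the paper.

```python
def relu(v):
    """Elementwise rectified linear unit."""
    return [max(0.0, a) for a in v]


def linear(W, v):
    """Matrix-vector product W v (bias omitted for brevity)."""
    return [sum(w * a for w, a in zip(row, v)) for row in W]


def residual_block(x, W1, W2):
    """Return x + F(x), where F is a small two-layer network."""
    fx = linear(W2, relu(linear(W1, x)))
    return [xi + fi for xi, fi in zip(x, fx)]


x = [1.0, -2.0]
identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]

unchanged = residual_block(x, identity, zeros)    # F(x) = 0, so the block is the identity
adjusted = residual_block(x, identity, identity)  # F(x) = relu(x) is added to x
print(unchanged, adjusted)
```

With the residual weights at zero the block passes its input through untouched, so stacking more such blocks should not, in principle, make the representable function worse.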

46 Generative Models Create a map from random noise into distribution of training data to generate samples Generative Adversarial Net (GAN) Generative model is pitted against a discriminative model that determines whether a sample is from the model or the data Improves both generation and discrimination
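The contest between the two models is commonly written as a two-player minimax game, following the original GAN paper (Goodfellow et al., 2014). The symbols below are not defined on the slide: $G$ is the generator, $D$ the discriminator, $p_{\text{data}}$ the data distribution, and $p_z$ the noise distribution.

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right]
```

The discriminator tries to make $D(x)$ large on real data and $D(G(z))$ small on generated samples, while the generator tries to make $D(G(z))$ large.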

47 Generative Models

48 Deep Reinforcement Learning AlphaGo Zero, AlphaZero

49 Deep Reinforcement Learning What is reinforcement learning? CS 294, Berkeley, Sergey Levine

50 Deep Reinforcement Learning Examples CS 294, Berkeley, Sergey Levine

51 Restricted Boltzmann Machines A type of stochastic recurrent neural network and Markov Random Field Models probability distribution of input variables using input and hidden layer Trained using unlabeled data Useful in unsupervised or semisupervised setting Uses: Feature learning Initializing other deep networks Components in other models Wikipedia: Restricted Boltzmann Machine

52 Next time Machine learning background
