Special Topic: Deep Learning


Hello! We are Zach Jones and Sohan Nipunage You can find us at: zdj21157@uga.edu smn57958@uga.edu 2

Outline I. What is Deep Learning? II. Why Deep Learning? III. Common Problems IV. Popular Use Cases A. Convolutional Nets B. Recurrent Nets C. Deep RL D. Unsupervised V. Current Research VI. Q & A 3

1. What is Deep Learning? More than just a buzzword! 4

Neural Networks Single-layer (shallow) Neural Network 5

Deep Neural Networks Deep (but not that deep) Neural Network 6

Deep Neural Networks Deeper Neural Network 7

2. Why Deep Learning? Is there a point to all of this? 8

History of Learning Systems In the olden days: Expert Systems. Knowledge from Experts -> Hand-Crafted Program -> The Answer. Problem: This takes a lot of time and effort 9

History of Learning Systems Next Step: Classical Machine Learning. Input Data -> Hand-Designed Features -> Mapping from Features -> The Answer. Problem: This takes a lot of time and effort 10

History of Learning Systems Next Step: Representation Learning. Input Data -> Feature Learning -> Mapping from Features -> The Answer. Problem: This is hard to do for some domains 11

History of Learning Systems The Present: Deep Learning. Input Data -> Simple Features -> More Complex Features -> Mapping from High-Level Features -> The Answer 12

Why Deep Learning More sophisticated models learn very complex non-linear functions Layers as a mechanism for abstraction Automatic feature extraction Works well in practice 13

Why Deep Learning Loads of data Very flexible model can represent complex functions Powerful feature extraction Defeat the curse of dimensionality 14

Multiple Levels of Abstraction Capturing high-level abstractions allows us to achieve amazing results in difficult domains 15

No Free Lunch Anything you can do, I can do better! I can do anything better than you! Yes, including overfitting... 16

3. Common Problems Vanishing Gradients, Parameter Explosion, Overfitting, Long Training Time, and other disasters! 17

Problem: Vanishing Gradients Towards either end of the sigmoid function, the Y values respond very little to changes in X, so the gradient in that region becomes vanishingly small. 18

Problem: Vanishing Gradients Backpropagation: o = sigmoid(w·x + b), with derivative ∂o/∂w = o(1 - o)·x. Chains of sigmoid derivatives eating the gradient; narrow range. 19
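A tiny illustration of that chain (a hedged sketch, not from the slides): even at its steepest point the sigmoid derivative is at most 0.25, so a ten-layer chain of sigmoid derivatives already shrinks the gradient by a factor of about a million.

```python
import numpy as np

sig = lambda x: 1.0 / (1.0 + np.exp(-x))

grad = 1.0
for _ in range(10):          # ten stacked sigmoid layers
    o = sig(0.0)             # sigmoid'(x) = o(1 - o), maximal (0.25) at x = 0
    grad *= o * (1.0 - o)    # each layer multiplies the gradient by at most 0.25
print(grad)                  # ~9.5e-07: the gradient has all but vanished
```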

Solution: Rectified Linear Units Rectifier: 20

Solution: Rectified Linear Units Rectified Linear Units (ramp): f(x) = max(0, x). Derivative: all in or all out (unit step): f'(x) = 1 if x > 0 else 0. First proposed as an activation by Hahnloser et al. (2000); popularized by Hinton for RBMs (2010). Dead ReLUs: LeakyReLU: f(x) = max(x, 0.01x); PReLU: f(x) = max(x, ax) 21
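A minimal NumPy sketch of the activations named above (illustrative only; the 0.01 slope and the parameter a follow the slide's definitions):

```python
import numpy as np

def relu(x):
    # f(x) = max(0, x); derivative is the unit step (1 if x > 0 else 0)
    return np.maximum(0.0, x)

def leaky_relu(x, slope=0.01):
    # keeps a small negative-side slope so units cannot "die" completely
    return np.where(x > 0, x, slope * x)

def prelu(x, a):
    # like LeakyReLU, but the negative slope a is a learned parameter
    return np.where(x > 0, x, a * x)
```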

Solution: Rectified Linear Units 22

Solution: Rectified Linear Units All You Need Is A Good Init (2015): Initialize from N(0,1) or U[-1,1] Orthonormalize the weights (Singular Value Decomposition-SVD) Unit singular values in all directions Keep scaling down until unit variance 23
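A rough sketch of that init recipe, under stated assumptions (batch is a sample of inputs to the layer; the tolerance and iteration cap are illustrative, not from the paper):

```python
import numpy as np

def good_init(fan_in, fan_out, batch, tol=0.05, max_iter=10):
    w = np.random.randn(fan_in, fan_out)              # draw from N(0, 1)
    u, _, vt = np.linalg.svd(w, full_matrices=False)
    w = u @ vt                                        # orthonormalize: unit singular values in all directions
    for _ in range(max_iter):                         # keep rescaling until outputs have unit variance
        var = np.var(batch @ w)
        if abs(var - 1.0) < tol:
            break
        w /= np.sqrt(var)
    return w
```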

Problem: Parameter Explosion 24

Solution: Shared Weights Each filter h_i is replicated across the entire visual field. These replicated units share the same parameterization (weight vector and bias) and form a feature map. 25

Solution: Regularization, Dropout, and Normalization Regularization: Make some minima more appealing than others; smooth the search space (less jagged). Norm-based: L1 (sparse weights), L2 (weight decay) 26
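As a quick sketch, the norm-based penalties are just terms added to the training loss (the lambda weights below are illustrative hyperparameters):

```python
import numpy as np

def regularized_loss(data_loss, weights, l1=0.0, l2=0.0):
    # L1 pushes weights toward exact zeros (sparsity); L2 is classic weight decay
    return data_loss + l1 * np.sum(np.abs(weights)) + l2 * np.sum(weights ** 2)
```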

Solution: Regularization, Dropout, and Normalization Dropout: Randomly deactivating units in feature maps Forces all parts to be responsible for the output Practically becomes an Ensemble of networks 27
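A hedged sketch of an inverted-dropout forward pass (the drop probability p is an assumption; rescaling keeps the expected activation unchanged so nothing special is needed at test time):

```python
import numpy as np

def dropout(activations, p=0.5, training=True):
    if not training:
        return activations                            # identity at test time
    mask = np.random.rand(*activations.shape) > p     # randomly deactivate units
    return activations * mask / (1.0 - p)             # rescale the surviving units
```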

Solution: Regularization, Dropout, and Normalization Batch Normalization: Learns to adjust the mean and variance of the data Helps combat overfitting by removing circumstantial data statistics Helps keep the gradients strong 28
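A minimal sketch of the batch-norm forward pass: normalize each feature over the batch, then let the learned gamma/beta re-adjust mean and variance (names here are illustrative):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    mu = x.mean(axis=0)                    # per-feature batch mean
    var = x.var(axis=0)                    # per-feature batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)  # normalized activations
    return gamma * x_hat + beta            # learned scale and shift
```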

Problem: Long Training Time Training can take up to days of computation. 29

Solution: Modern GPUs and TPUs GPUs allow for much faster training (days down to hours). The NVIDIA CUDA Deep Neural Network library (cuDNN) is a GPU-accelerated library of primitives for deep neural networks. cuDNN provides highly tuned implementations for standard routines such as forward and backward convolution, pooling, normalization, and activation layers. cuDNN is part of the NVIDIA Deep Learning SDK. 30

Solution: Modern GPUs and TPUs A tensor processing unit (TPU) is an AI accelerator application-specific integrated circuit (ASIC) developed by Google specifically for neural network machine learning. The chip was designed specifically for Google's TensorFlow framework. 31

4. Popular Use Cases Let's see what all the cool kids are doing... 32

Convolutional Neural Networks Image and Video Processing 33

Image Processing Computer vision Explosive spatial domain 256 x 256 RGB image 256 x 256 x 3 = 196,608 inputs! Traditional Image processing: 34

What if we could learn the filters automatically? Enter: Convolutional Neural Nets 35

Convolution Operation 36

Convolutional Layers Layer parameters consist of a set of learnable filters Key idea: neurons only look at a small region of the input Convolutional layer maps from 3D input to 3D output Output size determined by hyperparameters: receptive field = n x m x l region of the previous layer; depth = number of filters to apply to a region; stride = by how many pixels we slide the receptive field 37
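A small helper for those hyperparameters, assuming the standard output-size formula (W - F + 2P) / S + 1 with zero padding P, which the slide does not spell out:

```python
def conv_output_size(input_size, field, stride=1, padding=0):
    # spatial output size of a convolutional layer along one dimension
    return (input_size - field + 2 * padding) // stride + 1

# e.g. a 256-wide input with 5x5 filters, stride 1, padding 2 keeps width 256
assert conv_output_size(256, 5, stride=1, padding=2) == 256
```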

LeNet (1998) 38

AlexNet (2012) 39

AlexNet Classifications Top-5 Error Rate: 15.3% 40

Google Inception Network (2015) Top-5 Error Rate: 6.67% 41

U-Net (2015) 42

More Applications Text Classification [5] Words are also spatially correlated! Music Recommendation [6] 43

Deep Reinforcement Learning Decision Making in complex, unsearchable domains 44

Reinforcement Learning 45

Reinforcement Learning If we know the reward function, then it is easy! What if we don't? Idea: Learn the reward function using a deep neural network Capable of inferring complicated reward structure 46

DQN (2015) Deep Q-Learning for Arcade Games 47

AlphaGo Zero Policy Network: Where should I search? Value Network: What is the value of each state? Trained through self-play Beat reigning Go champions after four days of training 48

Recurrent Neural Networks Making sense of sequential data 49

Recurrent Neural Networks For visual datasets: features are spatially correlated What if features are correlated over time? Text Classification Speech Recognition Handwriting Recognition Solution: Recurrent Neural Networks 50

Recurrent Neural Networks Recurrent Neural Networks have back-connections 51

Recurrent Neural Networks Recurrent Neural Network unrolled over time 52
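A minimal sketch of that unrolling (the weight names and tanh nonlinearity are assumptions of this sketch): the same weights are applied at every time step, with the hidden state carried forward.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    # h_t = tanh(x_t W_xh + h_{t-1} W_hh + b)
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

def rnn_unroll(xs, h0, W_xh, W_hh, b_h):
    # "unrolled over time": reuse the same parameters at each step
    h, states = h0, []
    for x_t in xs:
        h = rnn_step(x_t, h, W_xh, W_hh, b_h)
        states.append(h)
    return states
```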

Basic Recurrent Neural Nets work well for short term dependencies 53 Image source: http://colah.github.io/posts/2015-08-understanding-lstms/

Basic Recurrent Neural Nets break down when data has long term dependencies 54 Image source: http://colah.github.io/posts/2015-08-understanding-lstms/

Long Short-Term Memory (LSTM) Solution: Long short-term memory cells Image source: http://colah.github.io/posts/2015-08-understanding-lstms/ 55
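A hedged sketch of one LSTM cell step using the standard gating equations (the stacked weight matrix W and bias b are assumptions of this sketch):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, b):
    # W maps the concatenated [x_t, h_prev] to the four gates stacked together
    z = np.concatenate([x_t, h_prev]) @ W + b
    i, f, o, g = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)   # input, forget, output gates
    c = f * c_prev + i * np.tanh(g)                # cell state carries long-term memory
    h = o * np.tanh(c)                             # hidden state / output
    return h, c
```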

Unsupervised Learning Dimensionality Reduction, Generative Models, and Clustering 56

Unsupervised- Dimensionality reduction Autoencoders Impose constraints on the code (e.g., sparse) 57
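A minimal autoencoder sketch (assuming TensorFlow/Keras is available; the 784-dimensional input and 32-unit bottleneck are illustrative): the narrow middle layer is the learned low-dimensional code.

```python
from tensorflow import keras

inputs = keras.Input(shape=(784,))
code = keras.layers.Dense(32, activation="relu")(inputs)        # bottleneck: the code
outputs = keras.layers.Dense(784, activation="sigmoid")(code)   # reconstruction of the input
autoencoder = keras.Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="mse")                # trained to reconstruct its input
```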

Unsupervised- Dimensionality reduction Denoising Autoencoders 58

Unsupervised- Generative models Generative Adversarial Networks (2014) 59

Unsupervised- Generative models Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks, 2015 60

Unsupervised- Generative models Variational Autoencoders (2014) More concerned with modeling the distributions 61

Unsupervised- Clustering Spectral clustering: Formulate pairwise similarity between datapoints (kernel matrix) Eigendecompose the kernel matrix Retain only the k largest eigenvectors (Laplacian eigenmaps) Apply k-means Eckart-Young-Mirsky theorem: the first k eigenvectors of a matrix M reconstruct the optimal rank-k version of M Autoencoders are all about reconstruction 62
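A sketch of that recipe (assumes scikit-learn for k-means; the RBF kernel and the direct kernel eigendecomposition follow the slide's outline rather than the normalized-Laplacian variant):

```python
import numpy as np
from sklearn.cluster import KMeans

def spectral_cluster(X, k, gamma=1.0):
    # 1. pairwise similarity between datapoints (RBF kernel matrix)
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-gamma * sq_dists)
    # 2. eigendecompose and keep the k largest eigenvectors
    vals, vecs = np.linalg.eigh(K)        # eigenvalues come back in ascending order
    embedding = vecs[:, -k:]
    # 3. apply k-means in the spectral embedding
    return KMeans(n_clusters=k, n_init=10).fit_predict(embedding)
```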

Unsupervised- Clustering 63

5. Current Research This could be you! 64

Adversarial Attacks CNN classifiers are easy to trick 65

Dense Nets Deep Neural Nets have tons of parameters Can we reduce the parameters without hurting accuracy? 66

Distributed Learning Learning involves updating weights Can we avoid the expensive gradient broadcast every iteration? 67

Memory-Augmented Neural Nets Meta-learning Can we learn to learn? Make use of long-term external memory One-shot Learning 68

Memory-Augmented Neural Nets MANN structure 69

Thanks! Any questions? You can find us at: zdj21157@uga.edu smn57958@uga.edu 70

Credits Papers referenced (in order of appearance): 1. LeNet (Yann LeCun) 2. AlexNet (Krizhevsky et al.) 3. Inception (Szegedy et al.) 4. U-Net (Ronneberger et al.) 5. CNNs for Sentence Classification (Yoon Kim) 6. Deep Content-Based Music Recommendation (van den Oord et al.) 7. Playing Atari Games with DQN (Mnih et al.) 8. AlphaGo Zero (Silver et al.) 71

Credits Materials used: Presentation template by SlidesCarnival Bahaa's Original Deep Learning Presentation Yoshua Bengio's Lecture on Deep Learning 72