Deep Learning Theory and Applications

Deep Learning Theory and Applications Kevin Moon (kevin.moon@yale.edu) Guy Wolf (guy.wolf@yale.edu) CPSC/AMTH 663

Outline
1. Course logistics
2. What is deep learning?
3. Deep learning examples
   - CNNs
   - Word embeddings
   - RNNs
   - Autoencoders
   - Ultra deep learning (ResNet)
   - Generative models (e.g. GANs)
   - Deep reinforcement learning
   - Boltzmann machines

Course Logistics
Textbooks (available online):
- Neural Networks and Deep Learning by Michael Nielsen
- Deep Learning by Goodfellow, Bengio, and Courville
Required background: basic probability, basic linear algebra & calculus, and programming experience
- Python and TensorFlow will be used in this course
- Look at the textbooks and HW 1 for an idea of the expected background
Course website: cpsc663.guywolf.org (course info, lecture slides, & HW)
Canvas: announcements & HW

Course Logistics
Office hours: TBD
5-6 HW assignments
- Assigned about every 2 weeks, due on Thursdays
- All or most will include some programming (Python & TensorFlow)
Final project (details forthcoming), in groups of 3-4

Goals of the Course
- A solid understanding of supervised feedforward neural networks: stochastic gradient descent, backpropagation, cost functions, regularizers, etc.
- The ability to design and train novel architectures
- An understanding of optimization strategies for training deep architectures
- An understanding of important deep architectures (e.g. CNNs, RNNs, autoencoders, GANs, deep reinforcement learning)

What is deep learning?
- Big data: extremely large data sets that may be analyzed computationally to reveal patterns, trends, and associations, especially relating to human behavior and interactions
- Machine learning: the field of study that gives computers the ability to learn without being explicitly programmed
- Artificial neural network (ANN): "a computing system made up of a number of simple, highly interconnected processing elements, which process information by their dynamic state response to external inputs" (Dr. Robert Hecht-Nielsen)
- Deep learning: a set of algorithms that attempt to model high-level abstractions in data by using multiple processing layers, composed of multiple linear and non-linear transformations. Often an ANN with many layers; a tool in machine learning and big data analysis

What is deep learning?

Deep learning is hot

Recent success in deep learning Image colorization (Zhang et al., 2016)

Recent success in deep learning Image colorization (Zhang et al., 2016) Colorized classical photographs by Ansel Adams

Recent success in deep learning
Real-time visual translation on smartphones (Google blog, 2015):
1. Find the letters
2. Recognize the letters
3. Translate
4. Render the translation in the same style

Recent success in deep learning Object classification/detection in images (Krizhevsky et al., 2012)

Recent success in deep learning Automatic text generation (Andrej Karpathy blog, 2015)

Recent success in deep learning Automatic image caption generation (Karpathy & Fei-Fei, 2015)

Recent success in deep learning
Automatic game playing: AlphaGo Zero, AlphaZero

What is a neural network? Multi-layer perceptron

The perceptron
Developed in the 1950s and 1960s by Frank Rosenblatt
- Binary inputs
- A single binary output
Example: Nielsen, 2015

The perceptron
Computing the output:
- Assign a weight to each input
- Determine whether the weighted sum of the inputs is greater than some threshold:

output = 0 if Σ_j w_j x_j ≤ threshold
output = 1 if Σ_j w_j x_j > threshold

(Nielsen, 2015)

The perceptron
Example: decide whether to attend a cheese festival. Three factors:
1. Is the weather good? (x_1)
2. Does your boyfriend or girlfriend want to accompany you? (x_2)
3. Is the festival near public transit? (You don't own a car.) (x_3)
Each x_j = 0 if no, 1 if yes. (Nielsen, 2015)

The perceptron
Example (continued): decide whether to attend the cheese festival.
Case 1: You love cheese but hate bad weather.
w_1 = 6, w_2 = 2, w_3 = 2, threshold = 5
- Σ_j w_j x_j > threshold whenever the weather is good (x_1 = 1)
- Σ_j w_j x_j < threshold whenever the weather is bad (x_1 = 0)

The perceptron
Example (continued): decide whether to attend the cheese festival.
Case 2: You love cheese but don't hate bad weather as much.
w_1 = 6, w_2 = 2, w_3 = 2, threshold = 3
- Σ_j w_j x_j > threshold whenever the weather is good (x_1 = 1), or whenever your boyfriend or girlfriend will go (x_2 = 1) and the festival is near public transit (x_3 = 1)
Both cases are checked in the sketch below.
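
To make the threshold arithmetic concrete, here is a minimal Python sketch of this perceptron. The function and input vectors are illustrative; the weights and thresholds are the ones from the slides.

```python
# A minimal sketch of the cheese-festival perceptron from the slides.
def perceptron(x, w, threshold):
    # Fire (output 1) only if the weighted sum of the inputs exceeds the threshold
    return 1 if sum(w_j * x_j for w_j, x_j in zip(w, x)) > threshold else 0

w = [6, 2, 2]  # you love cheese, so the weather weight dominates

# Case 1: threshold = 5 -- only good weather can push you over
print(perceptron([1, 0, 0], w, 5))  # good weather -> 1 (go)
print(perceptron([0, 1, 1], w, 5))  # bad weather, company + transit -> 0 (stay home)

# Case 2: threshold = 3 -- company plus transit is now enough
print(perceptron([0, 1, 1], w, 3))  # -> 1 (go)
```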

The multilayer perceptron (MLP)
A single perceptron is pretty simple, but a complex network of perceptrons can make subtle decisions.
(Figure: first layer, second layer; Nielsen, 2015)

Notation
Simplification: write w · x = Σ_j w_j x_j, where w and x are the weight and input vectors, respectively.
Replace the threshold with a perceptron bias, b = -threshold:

output = 0 if w · x + b ≤ 0
output = 1 if w · x + b > 0

The bias measures how easy it is to get the perceptron to fire.

Logic circuits with perceptrons
w_1 = w_2 = -2, b = 3 (Nielsen, 2015)
What is the output of this perceptron for each possible input? What logic circuit is this?
- Input 00 produces 1
- Inputs 01 and 10 produce 1
- Input 11 produces 0
This is a NAND gate!

Logic circuits with perceptrons
NAND gates are universal for computation: any computation can be built out of NAND gates. Therefore, perceptrons are universal for computation. Example: bitwise addition (Nielsen, 2015). A quick check of the NAND perceptron is sketched below.
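
A minimal sketch verifying the NAND perceptron, using the bias formulation from the Notation slide (the function name is illustrative):

```python
# Check that the perceptron with w_1 = w_2 = -2, b = 3 computes NAND.
def perceptron(x, w, b):
    return 1 if sum(w_j * x_j for w_j, x_j in zip(w, x)) + b > 0 else 0

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, "->", perceptron(x, (-2, -2), 3))
# (0, 0) -> 1, (0, 1) -> 1, (1, 0) -> 1, (1, 1) -> 0: exactly a NAND gate
```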

So what?
We can create learning algorithms that automatically tune the weights and biases. Tuning occurs in response to external stimuli, without direct intervention. This creates a circuit designed for the problem at hand.

Why go deep? Representations matter (Goodfellow et al., 2016)

Increasing # of neurons (Goodfellow et al., 2016)
1. Perceptron (Rosenblatt, 1958)
4. Early backpropagation network
6. MLP for speech recognition (Bengio et al., 1991)
11. GPU-accelerated convolutional network (Chellapilla et al., 2006)
20. GoogLeNet (Szegedy et al., 2014a)

Design choices for an ANN
- Learning algorithms: backpropagation, stochastic gradient descent (SGD)
- Activation function (e.g. threshold)
- Cost functions
- Number and dimension of layers
- Connections between layers
- Regularization: layers, batches, and more

Deep learning examples
CNNs, word embeddings, RNNs, autoencoders, ultra deep learning, generative models, deep reinforcement learning, restricted Boltzmann machines

Fully connected network
- Every feature interacts with every other feature
- The weight matrix at every level is allowed to be dense
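
A minimal Keras sketch of a fully connected network (the layer sizes and 784-dim input are illustrative assumptions): each Dense layer has an unconstrained, dense weight matrix, so every unit interacts with every unit in the next layer.

```python
from tensorflow import keras

model = keras.Sequential([
    keras.Input(shape=(784,)),                     # e.g. a flattened 28x28 image
    keras.layers.Dense(128, activation="relu"),    # dense weight matrix: 784 x 128
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(10, activation="softmax"),  # e.g. 10 class probabilities
])
model.summary()
```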

Convolutional Neural Networks (CNNs)
Very successful on images

Convolutional Neural Networks (CNNs)
- Only pixels that are close to each other in the image interact with each other (convolution layer)
- Weight matrices are highly structured
- Pooling helps to simplify the output of the convolution layer
(Yann LeCun)
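
A minimal Keras sketch of a small CNN (all layer sizes are illustrative assumptions): the convolutions use local, shared weights, and pooling simplifies their output.

```python
from tensorflow import keras

model = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),                             # e.g. grayscale images
    keras.layers.Conv2D(32, kernel_size=3, activation="relu"),  # local interactions only
    keras.layers.MaxPooling2D(pool_size=2),                     # pooling simplifies output
    keras.layers.Conv2D(64, kernel_size=3, activation="relu"),
    keras.layers.MaxPooling2D(pool_size=2),
    keras.layers.Flatten(),
    keras.layers.Dense(10, activation="softmax"),
])
model.summary()
```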

Convolutional Neural Networks (CNNs)
Weights from the first layer tend to look like directional filters after training, detecting edges, color changes, etc.

Convolutional Neural Networks (CNNs) (Goodfellow et al., 2016)

Word2Vec
- Organization of words via neural networks
- The next word in a sentence can be predicted based on this organization
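
A hedged sketch of training word embeddings with the gensim library (an assumption: the slide does not name a library, and the three-sentence corpus here is a toy stand-in):

```python
from gensim.models import Word2Vec

sentences = [
    ["deep", "learning", "uses", "neural", "networks"],
    ["neural", "networks", "learn", "word", "embeddings"],
    ["word", "embeddings", "capture", "word", "meaning"],
]
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, epochs=200)

# Each word now has a vector; nearby vectors suggest related words,
# which is what makes next-word prediction possible.
print(model.wv["word"][:5])
print(model.wv.most_similar("word", topn=3))
```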

Recurrent Neural Networks (RNNs) Useful when time is important

Recurrent Neural Networks (RNNs)
- In feedforward nets (everything we've considered so far), the activations of later layers are completely determined by the input
- RNNs allow the hidden layers to be affected by activations at earlier times (i.e. feedback); e.g. a neuron's activation may include its own activation at an earlier time as an input
- Cycles are now included in the network
- This time-varying behavior makes RNNs useful for analyzing data that change over time (e.g. speech); a minimal sketch of the recurrence follows below
- Training can be difficult for long-term dependencies
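
A minimal numpy sketch of the recurrence in a vanilla RNN; all sizes and variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_input, n_hidden = 4, 8
W_xh = rng.normal(0, 0.1, (n_hidden, n_input))   # input-to-hidden weights
W_hh = rng.normal(0, 0.1, (n_hidden, n_hidden))  # hidden-to-hidden feedback weights
b = np.zeros(n_hidden)

h = np.zeros(n_hidden)                    # hidden state carried across time
sequence = rng.normal(size=(5, n_input))  # five time steps of input
for x_t in sequence:
    # The new activation depends on the current input AND the previous activation
    h = np.tanh(W_xh @ x_t + W_hh @ h + b)
print(h)
```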

Fully Recurrent Network
(Image by Chrislb, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=224513)

Autoencoders
An autoencoder attempts to compress the data and then reconstruct the input: a bottleneck layer forces the compression, and the output layer attempts the reconstruction.
(Image by Chervinskii, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=45555552)
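
A minimal Keras sketch of an autoencoder (the 784-dim input and 32-unit bottleneck are illustrative assumptions): compress, then reconstruct the input.

```python
from tensorflow import keras

inputs = keras.Input(shape=(784,))
encoded = keras.layers.Dense(32, activation="relu")(inputs)       # bottleneck layer
decoded = keras.layers.Dense(784, activation="sigmoid")(encoded)  # reconstruction
autoencoder = keras.Model(inputs, decoded)
autoencoder.compile(optimizer="adam", loss="mse")
# autoencoder.fit(x, x, ...)  # note: the input is also the target
```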

Autoencoder Applications (Goodfellow et al., 2016)
- Pretraining
- Dimensionality reduction
- Information retrieval
- Denoising
- Data compression
- Generative modeling
- Batch correction

Ultra Deep Learning (e.g. ResNet)
- Very deep neural nets are difficult to train; accuracy can degrade with deeper networks
- ResNet developed a framework to address this degradation (arxiv.org/abs/1512.03385)
- Successfully trained a 152-layer network
- Won the ILSVRC 2015 image classification task
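
A minimal Keras sketch of a residual block in the spirit of ResNet (arxiv.org/abs/1512.03385); the filter count and input shape are illustrative assumptions.

```python
from tensorflow import keras

def residual_block(x, filters=64):
    shortcut = x  # identity skip connection
    y = keras.layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    y = keras.layers.Conv2D(filters, 3, padding="same")(y)
    # The layers learn a residual F(x); the block outputs F(x) + x,
    # which keeps accuracy from degrading as more blocks are stacked.
    y = keras.layers.Add()([y, shortcut])
    return keras.layers.Activation("relu")(y)

inputs = keras.Input(shape=(32, 32, 64))
outputs = residual_block(inputs)
model = keras.Model(inputs, outputs)
model.summary()
```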

Generative Models
- Create a map from random noise into the distribution of the training data in order to generate samples
- Generative Adversarial Net (GAN): the generative model is pitted against a discriminative model that determines whether a sample came from the model or from the data
- Adversarial training improves both generation and discrimination
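
To make the adversarial setup concrete, here is a minimal TensorFlow/Keras sketch of GAN training on toy 1-D Gaussian data; every architecture, size, and learning-rate choice is an illustrative assumption.

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras

latent_dim, batch = 8, 64
generator = keras.Sequential([
    keras.Input(shape=(latent_dim,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1),                        # maps noise to a 1-D "sample"
])
discriminator = keras.Sequential([
    keras.Input(shape=(1,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),  # P(sample came from the data)
])
d_opt, g_opt = keras.optimizers.Adam(1e-3), keras.optimizers.Adam(1e-3)
bce = keras.losses.BinaryCrossentropy()

for step in range(1000):
    real = np.random.normal(4.0, 1.0, (batch, 1)).astype("float32")  # data: N(4, 1)
    z = np.random.normal(size=(batch, latent_dim)).astype("float32")
    # Discriminator step: tell real samples (label 1) from generated ones (label 0)
    with tf.GradientTape() as tape:
        fake = generator(z)
        d_loss = (bce(tf.ones((batch, 1)), discriminator(real))
                  + bce(tf.zeros((batch, 1)), discriminator(fake)))
    grads = tape.gradient(d_loss, discriminator.trainable_variables)
    d_opt.apply_gradients(zip(grads, discriminator.trainable_variables))
    # Generator step: try to make the discriminator call fakes real
    with tf.GradientTape() as tape:
        g_loss = bce(tf.ones((batch, 1)), discriminator(generator(z)))
    grads = tape.gradient(g_loss, generator.trainable_variables)
    g_opt.apply_gradients(zip(grads, generator.trainable_variables))
```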


Deep Reinforcement Learning
AlphaGo Zero, AlphaZero

Deep Reinforcement Learning
What is reinforcement learning? (CS 294, Berkeley, Sergey Levine)

Deep Reinforcement Learning
Examples (CS 294, Berkeley, Sergey Levine)

Restricted Boltzmann Machines
- A type of stochastic recurrent neural network and Markov random field
- Models the probability distribution of the input variables using an input layer and a hidden layer
- Trained using unlabeled data; useful in unsupervised or semi-supervised settings
- Uses: feature learning, initializing other deep networks, as components in other models
(Wikipedia: Restricted Boltzmann Machine)
A one-step training update is sketched below.
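
RBMs are commonly trained with contrastive divergence; here is a minimal numpy sketch of a single CD-1 update on one binary example. The layer sizes, learning rate, and toy input are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_visible, n_hidden, lr = 6, 3, 0.1
W = rng.normal(0.0, 0.01, (n_visible, n_hidden))
b_v, b_h = np.zeros(n_visible), np.zeros(n_hidden)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

v0 = rng.integers(0, 2, n_visible).astype(float)  # one binary training example

# Up-pass: probabilities and samples of the hidden units given the data
p_h0 = sigmoid(v0 @ W + b_h)
h0 = (rng.random(n_hidden) < p_h0).astype(float)
# Down-pass: reconstruct the visible units, then recompute hidden probabilities
p_v1 = sigmoid(h0 @ W.T + b_v)
p_h1 = sigmoid(p_v1 @ W + b_h)

# CD-1 update: data correlations minus reconstruction correlations
W += lr * (np.outer(v0, p_h0) - np.outer(p_v1, p_h1))
b_v += lr * (v0 - p_v1)
b_h += lr * (p_h0 - p_h1)
```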

Next time Machine learning background