An Introduction to Deep Learning

An Introduction to Deep Learning. Patrick Emami, University of Florida, Department of Computer and Information Science and Engineering. September 7, 2017.

Overview
1. What is Deep Learning? The General Framework; A Brief History of Deep Neural Networks
2. Why is Deep Learning so successful? Big Data Era
3. Applications and Architectures: Computer Vision; Natural Language Processing; Training Deep Neural Networks

What is Deep Learning?

Simple Definition: Deep Learning can be viewed as the composition of many functions, for the purpose of mapping input values to output values in a way that encourages the discovery of representations of the data.
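
To make the composition idea concrete, here is a minimal NumPy sketch, an illustration of this definition rather than code from the talk; the layer sizes and ReLU nonlinearity are arbitrary choices.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def layer(x, W, b):
    # one function in the composition: an affine map followed by a nonlinearity
    return relu(W @ x + b)

rng = np.random.default_rng(0)
x = rng.normal(size=4)                        # input vector
params = [(rng.normal(size=(8, 4)), np.zeros(8)),
          (rng.normal(size=(8, 8)), np.zeros(8)),
          (rng.normal(size=(2, 8)), np.zeros(2))]

h = x
for W, b in params:                           # f(x) = f3(f2(f1(x)))
    h = layer(h, W, b)
print(h)                                      # the network's output
```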

Function Approximation. Many machine learning problems can be framed as function approximation. Example: given a sample of data points $x_i \in \mathbb{R}^n$, $i = 1, \ldots, N$, and binary labels $y_i \in \{0, 1\}$ from a dataset, find parameters $\theta$ such that $L(y, f(x, \theta))$ is minimized over all other data points $x$ and true labels $y$ in the dataset, for some loss function $L$ and some family of parameterized functions $f$. Source: http://people.cs.uchicago.edu/~amr/122-w12/assignments/hw1a/index.html
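
As a sketch of this setup, here the parameterized family $f$ is logistic regression and $L$ is the log loss, illustrative choices not taken from the talk; the synthetic data is made up for the example.

```python
import numpy as np

def f(X, theta):
    # parameterized family: logistic regression, f(x, theta) = P(y = 1 | x)
    return 1.0 / (1.0 + np.exp(-(X @ theta)))

def empirical_loss(theta, X, y):
    # log loss L(y, f(x, theta)), averaged over the dataset
    p = f(X, theta)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))                 # N = 100 points in R^3
y = (X[:, 0] > 0).astype(float)               # binary labels in {0, 1}
theta = np.zeros(3)
print(empirical_loss(theta, X, y))            # equals log(2) at theta = 0
```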

Multi-Layer Perceptron (MLP). In Deep Learning, we try to approximate functions with Deep Neural Networks. Source: https://www.researchgate.net/publication/287209604 (Prediction of Final Concentrate Grade Using Artificial Neural Networks from Gol-E-Gohar Iron Ore Plant)
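
A minimal sketch of an MLP in Keras (one of the libraries listed in the Resources slide at the end); the input size, layer widths, and binary output here are arbitrary assumptions, not from the talk.

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(10,)),                # 10 input features (assumed)
    layers.Dense(32, activation="relu"),      # hidden layer 1
    layers.Dense(32, activation="relu"),      # hidden layer 2
    layers.Dense(1, activation="sigmoid"),    # output: P(y = 1)
])
model.compile(optimizer="sgd", loss="binary_crossentropy")
model.summary()
```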

Universal Function Approximation. It was shown in [Hornik, 1991] that a multi-layer perceptron is a universal function approximator. This means that, given enough hidden units, it can model any suitably smooth function to any desired level of accuracy.
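
A toy illustration of this claim: one hidden layer of tanh units with random hidden weights and least-squares output weights (a random-features shortcut for the demo, not how deep networks are actually trained). The fit to a smooth target tightens as hidden units are added.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 200)[:, None]
target = np.sin(2 * x).ravel()                # a suitably smooth target

for hidden in (2, 10, 100):
    W = 3 * rng.normal(size=(1, hidden))      # random hidden-layer weights
    b = 3 * rng.normal(size=hidden)
    H = np.tanh(x @ W + b)                    # hidden-unit activations
    w_out, *_ = np.linalg.lstsq(H, target, rcond=None)
    err = np.max(np.abs(H @ w_out - target))
    print(f"{hidden:4d} hidden units -> max error {err:.3f}")
```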

History of Neural Networks. Source: https://beamandrew.github.io/deeplearning/2017/02/23/deep_learning_101_part1.html

Why is Deep Learning so successful?

Scalability. Source: https://machinelearningmastery.com/what-is-deep-learning/

Scalability (continued).

NVIDIA's graphics cards and CUDA library allow for extremely fast matrix operations on DNNs with millions of parameters. Source: https://devblogs.nvidia.com/parallelforall/nvidia-ibm-cloud-support-imagenet-large-scale-visual-recognition-challenge/

Deep Learning Frameworks

Applications and Architectures

Computer Vision: Object Detection. Source: https://www.kaggle.com/c/imagenet-object-detection-challenge

Computer Vision: Semantic Segmentation. Source: http://nicolovaligi.com/deep-learning-models-semantic-segmentation.html

Computer Vision: Multi-Object Tracking. Source: https://www.youtube.com/watch?v=c4ztzg4ckzs

Convolutional Neural Networks. A CNN [Krizhevsky, 2012] for multi-class classification. CNNs can also be used for many other learning tasks, such as regression, by changing the output layer. Source: https://www.mathworks.com/discovery/convolutional-neural-network.html
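
A minimal Keras sketch of a small CNN classifier in this spirit; the input shape, filter counts, and 10-class output are illustrative assumptions rather than the architecture from [Krizhevsky, 2012].

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(32, 32, 3)),              # small RGB images (assumed)
    layers.Conv2D(16, 3, activation="relu"),      # learnable convolution filters
    layers.MaxPooling2D(),                        # spatial downsampling
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),       # multi-class output
])
# For regression instead, swap the output layer, e.g. layers.Dense(1)
# with a mean-squared-error loss.
model.compile(optimizer="sgd", loss="sparse_categorical_crossentropy")
model.summary()
```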

Learned Representations. Source: https://stats.stackexchange.com/questions/146413/why-convolutional-neural-networks-belong-to-deep-learning

Binary Classification with CNNs. The negative log-likelihood for 0-1 binary classification with CNNs: the model is

$p(y \mid x, \theta) = \mathrm{Bernoulli}(y \mid \sigma(w^\top g(x, \theta) + b)) = p^y (1 - p)^{1 - y},$

where we set $p = \sigma(w^\top g(x, \theta) + b)$, $p \in (0, 1)$. The negative log-likelihood is then

$\mathrm{NLL}(x, y, \theta) = -(y \log p + (1 - y) \log(1 - p)).$

So for $p \geq 0.5$ your CNN should predict $y = 1$, and for $p < 0.5$ it should predict $y = 0$. Minimizing this is a nonlinear and non-convex optimization problem!
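
A quick numeric check of this NLL; the feature values below are made up, standing in for the CNN features $g(x, \theta)$.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

features = np.array([0.2, -1.3, 0.7])         # stand-in for g(x, theta)
w = np.array([1.0, 0.5, -0.3])                # made-up output-layer weights
b = 0.1
p = sigmoid(w @ features + b)                 # predicted P(y = 1)

for y in (0, 1):
    nll = -(y * np.log(p) + (1 - y) * np.log(1 - p))
    print(f"y = {y}: p = {p:.3f}, NLL = {nll:.3f}")
# Decision rule: predict y = 1 when p >= 0.5, else y = 0.
```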

Natural Language Processing: Distributed Word Representations. Source: http://colah.github.io/posts/2014-07-NLP-RNNs-Representations/

Natural Language Processing: Machine Translation. Source: https://opensource.googleblog.com/2017/04/tf-seq2seq-sequence-to-sequence-framework-in-tensorflow.html

Natural Language Processing: Text Summarization. Source: http://www.kdnuggets.com/2016/09/deep-learning-august-update-part-2.html

Recurrent Neural Networks. Source: https://leonardoaraujosantos.gitbooks.io/artificial-inteligence/content/recurrent_neural_networks.html
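
A minimal NumPy sketch of a vanilla RNN unrolled over a short sequence (sizes and random inputs are arbitrary): the same cell weights are reused at every time step, and the hidden state carries context forward.

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden_dim, steps = 5, 8, 4
W_xh = 0.1 * rng.normal(size=(hidden_dim, input_dim))   # input-to-hidden weights
W_hh = 0.1 * rng.normal(size=(hidden_dim, hidden_dim))  # hidden-to-hidden weights
b_h = np.zeros(hidden_dim)

h = np.zeros(hidden_dim)                      # initial hidden state
for t in range(steps):
    x_t = rng.normal(size=input_dim)          # input at time step t
    h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)  # recurrent update, same weights each step
print(h)                                      # final hidden state
```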

Long Short-Term Memory. The LSTM cell, well suited for large bodies of text [Hochreiter, 1997]. Source: https://commons.wikimedia.org/wiki/File:Long_Short_Term_Memory.png
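
A NumPy sketch of a single LSTM cell step. Note this includes the forget gate, which was added after the original [Hochreiter, 1997] formulation; the gate names and stacked-parameter layout follow the now-standard convention, not the talk.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    # W, U, b hold the parameters for all four gates, stacked row-wise.
    z = W @ x + U @ h + b
    f, i, o, g = np.split(z, 4)
    f, i, o = sigmoid(f), sigmoid(i), sigmoid(o)   # forget / input / output gates
    c = f * c + i * np.tanh(g)                     # cell state: long-term memory
    h = o * np.tanh(c)                             # hidden state: cell output
    return h, c

rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
W = 0.1 * rng.normal(size=(4 * n_hid, n_in))
U = 0.1 * rng.normal(size=(4 * n_hid, n_hid))
b = np.zeros(4 * n_hid)
h, c = np.zeros(n_hid), np.zeros(n_hid)
h, c = lstm_step(rng.normal(size=n_in), h, c, W, U, b)
print(h, c)
```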

Backpropagation. Goal: find the set of parameters for the Deep Neural Network that minimizes the loss on the training set without overfitting. Solution: with your training set, compute the gradient of the loss with respect to the parameters in each layer using the chain rule. Setting this gradient to zero has no closed-form solution for deep networks, so in practice we follow the gradient downhill step by step. Gradients flow backwards from the output layer to the input layer. Auto-differentiation engines, like TensorFlow, handle this for us nowadays.
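
To make the chain rule concrete, here is a hand-worked backward pass for a tiny two-layer network with one weight per layer (all values made up); an autodiff engine would produce these same gradients automatically.

```python
import numpy as np

x, y = 1.5, 1.0                   # one training example
w1, w2 = 0.4, -0.6                # one weight per layer

# Forward pass
h = np.tanh(w1 * x)               # hidden activation
p = 1 / (1 + np.exp(-w2 * h))     # output probability
loss = -(y * np.log(p) + (1 - y) * np.log(1 - p))

# Backward pass: gradients flow from the loss back toward the input.
dloss_dp = -(y / p) + (1 - y) / (1 - p)
dp_dw2 = p * (1 - p) * h          # sigmoid derivative times its input
dp_dh = p * (1 - p) * w2
dh_dw1 = (1 - h**2) * x           # tanh derivative times its input
grad_w2 = dloss_dp * dp_dw2               # chain rule, output layer
grad_w1 = dloss_dp * dp_dh * dh_dw1       # chain rule, hidden layer
print(grad_w1, grad_w2)
```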

Stochastic Gradient Descent. Use mini-batch stochastic gradient descent to update parameters, since using the full dataset can be too expensive. The following is an example of updating a single weight $w$ using our negative log-likelihood loss from earlier:

$\Delta = \frac{1}{B} \sum_{i=1}^{B} \frac{\partial}{\partial w} \mathrm{NLL}(x_i, \theta), \qquad w \leftarrow w - \alpha \Delta,$

where $B$ is the mini-batch size and $\alpha$ is the learning rate.
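
A minimal mini-batch SGD loop for the logistic-regression NLL sketched earlier; the synthetic data, batch size $B$, and learning rate $\alpha$ are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
y = (X @ np.array([1.0, -2.0, 0.5]) > 0).astype(float)   # synthetic labels

w = np.zeros(3)
alpha, B = 0.1, 32                            # learning rate, mini-batch size
for step in range(500):
    idx = rng.choice(len(X), size=B, replace=False)      # sample a mini-batch
    p = 1.0 / (1.0 + np.exp(-X[idx] @ w))
    grad = X[idx].T @ (p - y[idx]) / B        # (1/B) * sum of per-example gradients
    w -= alpha * grad                         # w <- w - alpha * Delta
print(w)                                      # roughly aligned with [1, -2, 0.5]
```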

Resources
1. http://www.fast.ai/
2. https://www.udacity.com/course/deep-learning--ud730
3. http://www.deeplearningbook.org/
4. https://keras.io/

References
Hornik, Kurt (1991). Approximation capabilities of multilayer feedforward networks. Neural Networks, 4(2), 251-257.
Krizhevsky, Alex; Sutskever, Ilya; Hinton, Geoffrey E. (2012). ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems.
Hochreiter, Sepp; Schmidhuber, Jürgen (1997). Long Short-Term Memory. Neural Computation, 9(8), 1735-1780.

Questions?