Deep Learning Techniques and Applications. Georgiana Neculae


Outline
1. Why Deep Learning?
2. Applications and specialized Neural Networks
3. Neural Networks basics and training
4. Potential issues
5. Preventing overfitting
6. Research directions
7. Implementing your own!

Why Deep Learning?

Why is it important? Impressive performance on tasks that were perceived as exclusively human:
- Playing games
- Artistic creativity
- Verbal communication
- Problem solving

Applications

Speech Recognition
Aim: input speech recordings and receive text. Why? (translation, AI assistants, automatic subtitles)
Challenges come from the differences between pronunciations:
- Intonation
- Accent
- Speed
- Cadence or inflection

Recurrent Neural Networks (RNNs) Make use of internal memory to predict the most likely future sequence based on what they have seen so far
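As a concrete illustration, here is a minimal NumPy sketch of the vanilla RNN recurrence (the dimensions and random weights are toy values assumed purely for illustration): the hidden state h is the internal memory that summarises the sequence seen so far.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One step of a vanilla RNN: the hidden state h acts as the
    network's internal memory, summarising the sequence so far."""
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

# Toy dimensions (assumed for illustration): 3 input features, 4 hidden units.
rng = np.random.default_rng(0)
W_xh, W_hh, b_h = rng.normal(size=(3, 4)), rng.normal(size=(4, 4)), np.zeros(4)

h = np.zeros(4)                      # empty memory before the sequence starts
for x_t in rng.normal(size=(5, 3)):  # a sequence of 5 input vectors
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)
# h now encodes the whole sequence and can be used to predict what comes next.
```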

WaveNet Generates speech that sounds more natural than any existing techniques. Also used to synthesize and generate music. https://deepmind.com/blog/wavenet-generative-model-raw-audio/

Object Detection and Recognition Why? (face detection for cameras, counting, visual search engine) What features are important when learning to understand an image?

Object Detection and Recognition
Difficulty arises from:
- Multiple objects can be identified in a photo
- Objects can be occluded by the environment
- The object of interest could be too small
- Examples of the same class could look very different

Convolutional Neural Networks (CNNs)

Object Recognition

Object Recognition http://extrapolated-art.com/ https://deepdreamgenerator.com/feed

Reinforcement Learning
- Learning is done through trial and error, based on rewards or punishments (see the sketch after this list)
- Agents independently develop successful strategies that lead to the greatest long-term reward
- No hand-engineered features or domain heuristics are provided; the agents are capable of learning directly from raw inputs
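The reward-driven update at the heart of many of these methods can be seen in tabular Q-learning, a much simpler relative of the deep methods named on the next slide. This is a minimal sketch; the state and action counts, learning rate and discount factor are toy values assumed for illustration.

```python
import numpy as np

# Tabular Q-learning sketch: Q[s, a] estimates the long-term reward of
# taking action a in state s (5 states and 2 actions are toy sizes).
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.1, 0.99   # learning rate and discount factor (assumed values)

def q_update(s, a, reward, s_next):
    """Trial-and-error update: move Q[s, a] towards the reward received
    plus the best value reachable from the next state (long-term reward)."""
    target = reward + gamma * Q[s_next].max()
    Q[s, a] += alpha * (target - Q[s, a])
```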

Reinforcement Learning AlphaGo, a deep neural network trained using reinforcement learning, defeated Lee Sedol (the strongest Go player of the last decade) by 4 games to 1. https://deepmind.com/blog/deep-reinforcement-learning/

Neural Networks Basics

Perceptron
"the embryo of an electronic computer that [the Navy] expects will be able to walk, talk, see, write, reproduce itself and be conscious of its existence" (Frank Rosenblatt, 1957)

Perceptron to Logistic Regression (recap)

Logistic Regression (recap)
A linear model capable of solving 2-class problems. Uses the sigmoid function $\sigma$ to scale the output to $[0, 1]$: $f(\mathbf{x}) = \sigma(\mathbf{w}^T\mathbf{x} - t)$, where $t$ is a threshold.
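In code, the model is just a dot product followed by the sigmoid; a minimal NumPy sketch (the example weights and threshold below are made up for illustration):

```python
import numpy as np

def sigmoid(z):
    """Squashes any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict(x, w, t):
    """Logistic regression: a linear score w.x - t passed through the sigmoid."""
    return sigmoid(x @ w - t)

# Example with a 2-feature input (toy numbers).
print(predict(np.array([1.0, 2.0]), np.array([0.5, -0.3]), t=0.1))  # ~0.45
```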

Logistic Regression (recap)
Uses the log-loss (cross-entropy) function to measure the error to be minimised: $E(\mathbf{w}) = -\sum_n \left[ y_n \log f(\mathbf{x}_n) + (1 - y_n) \log(1 - f(\mathbf{x}_n)) \right]$

Gradient Descent (recap)
Update rule: update each parameter in the negative direction of the gradient, $w_i \leftarrow w_i - \eta \frac{\partial E}{\partial w_i}$.
A negative gradient means the value of $w_1$ should be increased; a positive gradient means it should be decreased.

Gradient Descent (recap)
For the log-loss, the gradient is given by the partial derivative with respect to parameter $w_i$: $\frac{\partial E}{\partial w_i} = \sum_n \left( f(\mathbf{x}_n) - y_n \right) x_{n,i}$
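Putting the recap together, one gradient-descent step on the log-loss takes only a few lines of NumPy. A minimal sketch, assuming X is an n-by-d matrix of inputs, y a vector of 0/1 labels, and a made-up default learning rate:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent_step(X, y, w, lr=0.1):
    """One log-loss update: dE/dw_i = sum_n (f(x_n) - y_n) * x_{n,i}."""
    y_hat = sigmoid(X @ w)       # predictions for all n examples at once
    grad = X.T @ (y_hat - y)     # vector of partial derivatives, one per w_i
    return w - lr * grad         # step in the negative gradient direction
```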

Gradient Descent
The gradient is given by the same partial derivative with respect to each parameter $w_i$ as above.

Gradient Descent What if we add another unit (neuron)? How do we update the parameters?

Gradient Descent Gradient is computed in the same way. How do we combine the outputs of the two neurons?

Multi-layer Perceptron Two neurons can only be combined by using another neuron:

Error Function
Regression (network predicts real values): squared error, $E = \frac{1}{2}\sum_n (y_n - \hat{y}_n)^2$
Classification (network predicts class probability estimates): cross-entropy, $E = -\sum_n \sum_k y_{n,k} \log \hat{y}_{n,k}$

Gradient Descent For a real-valued output, note the use of the chain rule to compute the derivative.

BackProp How do we update the first-layer weights w(0,0) and w(0,1)? We propagate the error backwards through the network.

BackProp Descend to the next layer and compute the gradient with respect to w(0,0).
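The two BackProp slides condense to a few lines of NumPy: a forward pass, then the chain rule applied layer by layer on the way back. This is a minimal sketch, assuming two sigmoid hidden units, one linear output predicting a real value, random toy weights, and the squared error from the Error Function slide:

```python
import numpy as np

rng = np.random.default_rng(0)
W0 = rng.normal(size=(2, 2))   # first-layer weights (the w(0,0), w(0,1), ...)
W1 = rng.normal(size=(2, 1))   # weights into the output neuron

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_step(x, y, lr=0.5):
    global W0, W1
    # Forward pass.
    h = sigmoid(x @ W0)                # hidden activations
    y_hat = h @ W1                     # linear output: a real value
    # Backward pass: propagate the error back through the network.
    d_out = y_hat - y                  # dE/dy_hat for E = 0.5 * (y_hat - y)^2
    d_W1 = np.outer(h, d_out)          # chain rule at the output layer
    d_h = (W1 @ d_out) * h * (1 - h)   # chain rule through the sigmoid
    d_W0 = np.outer(x, d_h)            # descend to the next layer: gradient w.r.t. W0
    W1 -= lr * d_W1
    W0 -= lr * d_W0
```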

Deep Neural Network More layers, and more neurons in each layer, can be added. A bias neuron can be used to shift the decision boundary, as in the Perceptron.

Activation Functions Commonly used functions: [plots of f(x) against x omitted]

Activation Functions
- Sigmoid: output can be interpreted as probabilities
- Tanh (Hyperbolic Tangent): converges faster than the sigmoid function
- ReLU (Rectified Linear Unit): no vanishing or exploding gradient
- SoftMax: generalisation of the logistic function; outputs can be interpreted as probabilities
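Minimal NumPy definitions of these four functions, for reference:

```python
import numpy as np

def sigmoid(z):                 # outputs in (0, 1), interpretable as probabilities
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):                    # zero-centred; often converges faster than sigmoid
    return np.tanh(z)

def relu(z):                    # no saturation for z > 0, so gradients do not vanish
    return np.maximum(0.0, z)

def softmax(z):                 # generalises the logistic function to many classes
    e = np.exp(z - z.max())     # subtract the max for numerical stability
    return e / e.sum()
```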

Decision boundary The XOR problem is non-linear; Neural Networks are non-linear models and can solve it. (Image: Takashi J. Ozaki, Decision Boundaries for Deep Learning and other Machine Learning classifiers)

Potential Issues

Local minima In the high-dimensional parameter spaces of deep networks, most critical points turn out to be saddle points rather than local minima. Because of this, local minima are not an issue in practice.

Vanishing gradient problem Appears when a change in a parameter's value causes only a very small change in the network's output. Manifests as very small gradient values when the parameter update is computed.

Vanishing gradient problem Appears in gradient-based methods and is caused by saturating activation functions (sigmoid or tanh), which squash a wide range of inputs into a small output range. The effect is magnified by the addition of hidden layers.
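A back-of-the-envelope illustration of why depth magnifies the problem: the sigmoid's derivative is at most 0.25, so backpropagating through many sigmoid layers multiplies the gradient by at most 0.25 per layer.

```python
# sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z)) peaks at 0.25, so each extra
# sigmoid layer can shrink the backpropagated gradient by a factor of 4 or more.
grad = 1.0
for _ in range(10):   # 10 hidden layers
    grad *= 0.25      # upper bound on the sigmoid's derivative per layer
print(grad)           # ~9.5e-07: the gradient has all but vanished
```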

Overfitting (figure from Bishop 2006, Pattern Recognition and Machine Learning)

Preventing Overfitting

Early Stopping Monitor the error on a held-out validation set and stop training when it starts to rise, even while the training error is still falling (sketched below).
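A minimal sketch of the pattern; train_epoch and validation_loss are hypothetical helpers introduced only for illustration, and the patience value is an assumed default:

```python
def early_stopping(model, patience=5):
    """Stop when validation loss has not improved for `patience` epochs.
    train_epoch and validation_loss are hypothetical helpers."""
    best, waited = float("inf"), 0
    while waited < patience:
        train_epoch(model)               # one pass over the training data
        loss = validation_loss(model)    # error on the held-out set
        if loss < best:
            best, waited = loss, 0       # still improving: keep going
        else:
            waited += 1                  # no improvement: closer to stopping
    return model
```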

Weight sharing Parameters are shared by storing their values in the same memory location. Decreases the number of parameters at the cost of reduced model complexity. Mostly used in convolutional and recurrent networks.

Dropout Randomly omit some units of the network for each training batch (group of training examples). Encourages the generated sub-network to specialise on that batch.

Dropout It is a form of regularization, akin to training an ensemble of networks, each trained on a single batch.
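A minimal sketch of "inverted" dropout applied to a layer's activations (the 1/(1-p) scaling is one common convention, assumed here: it keeps the expected activation unchanged, so nothing needs rescaling at test time):

```python
import numpy as np

def dropout(h, p=0.5, training=True):
    """Randomly omit units with probability p during training."""
    if not training:
        return h                  # at test time, use the full network
    mask = (np.random.rand(*h.shape) >= p) / (1.0 - p)
    return h * mask               # surviving units are scaled up by 1/(1-p)
```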

Conclusions

Summary
- Impressive performance on difficult tasks has made Deep Learning very popular
- Based on the Perceptron and Logistic Regression
- Training is done using Gradient Descent and Backprop
- Error function, activation function and architecture are problem-dependent
- Easy to overfit, but there are ways to avoid it

Research Directions
- Understanding more about how Neural Networks learn
- Applications to vision, speech and problem solving
- Improving computational performance: specialised hardware such as Tensor Processing Units (TPUs)
- Moving towards more biologically inspired neurons: Spiking Neurons

Libraries and Resources
- Tensorflow: great support and lots of resources
- Theano: one of the first deep learning libraries; no multi-GPU support (development discontinued)
- Keras: very high-level library that works on top of Theano or Tensorflow (see the minimal example below)
- Lasagne: similar to Keras, but only compatible with Theano
- Caffe: specialised more for computer vision than general deep learning
- Torch: uses the Lua programming language; has a Python wrapper
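As a starting point for implementing your own, here is a minimal Keras example, using the tensorflow.keras packaging; the layer sizes and toy data are made up purely for illustration:

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# A small feed-forward network for a 2-class problem.
model = Sequential([
    Dense(16, activation="relu", input_shape=(4,)),  # hidden layer
    Dense(1, activation="sigmoid"),                  # probability of class 1
])
model.compile(optimizer="sgd", loss="binary_crossentropy")

X = np.random.rand(100, 4)                 # toy inputs, for illustration only
y = (X.sum(axis=1) > 2).astype(float)      # toy 0/1 labels
model.fit(X, y, epochs=5, batch_size=10)   # gradient descent on the log-loss
```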

Thank You!