A very brief overview of deep learning


Maarten Grachten, Austrian Research Institute for Artificial Intelligence
http://www.ofai.at/research/impml
Lrn2Cre8 (Learning to Create), co-funded by the FP7 Programme of the European Union

Table of contents
- What is deep learning?
- Backpropagation and beyond
- Selected deep learning topics for music processing

Deep learning
There is no single definition of deep learning, but most definitions emphasize:
- Branch of machine learning
- Models are graph structures (networks) with multiple layers (deep)
- Models are typically non-linear
- Both supervised and unsupervised methods are used for fitting models to data


An example of deep models: Deep neural networks
- A neuron is a non-linear transformation of a linear sum of inputs: y = f(w^T x + b)
- An array of neurons taking the same input x forms a new layer on top of the input in a neural network: y = f(W^T x + b)
- Third layer: y_2 = f(W_2^T f(W_1^T x + b_1) + b_2) (see the sketch below)
[Diagram: a single neuron j with inputs x_1, ..., x_n, weights w_1j, ..., w_nj, bias b_j, and activation y_j]
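As a concrete illustration of these formulas, here is a minimal NumPy sketch of a two-layer forward pass; the layer sizes and the choice of tanh for the non-linearity f are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def layer(x, W, b, f=np.tanh):
    """One layer of neurons: y = f(W^T x + b)."""
    return f(W.T @ x + b)

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 4, 3, 2          # illustrative layer sizes

x  = rng.normal(size=n_in)            # input vector
W1 = rng.normal(size=(n_in, n_hid))   # weights and bias of the first layer
b1 = np.zeros(n_hid)
W2 = rng.normal(size=(n_hid, n_out))  # weights and bias of the second layer
b2 = np.zeros(n_out)

y1 = layer(x, W1, b1)                 # y_1 = f(W_1^T x + b_1)
y2 = layer(y1, W2, b2)                # y_2 = f(W_2^T f(W_1^T x + b_1) + b_2)
print(y2)
```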

Relation to other machine learning approaches
How is deep learning different from 1980s NN research?
- Training methods derived from a probabilistic interpretation of networks as generative models
- Greedy layer-wise training
- More powerful optimization methods
- More computing power, larger data sets
Feature design vs. feature learning
- The success of most machine learning approaches critically depends on appropriately designed features
- Deep learning reduces the need for manual feature design:
  - Models learn features as non-linear transformations of data
  - Deep models learn hierarchies of features
  - Unsupervised (pre-)training prevents overfitting


What are deep models used for?
Tasks
- Prediction: classification, regression problems
  - Prediction as part of the model (output layer, input layer)
  - Use the model to obtain feature vectors for the data, then use any classifier for prediction (e.g. WEKA); see the sketch below
- Generation: e.g. facial expressions, gait, music
  - Denoising of data
  - Reconstruction/completion of partial data
  - Generation of new data by sampling
Successful application domains
- Image: object recognition, optical character recognition
- Audio: speech recognition, music retrieval, transcription
- Text: parsing, sentiment analysis, machine translation
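The "obtain feature vectors, then use any classifier" idea can be sketched as follows; the network here is just a random projection standing in for a trained deep model, and scikit-learn's LogisticRegression stands in for whatever external classifier (e.g. WEKA) one would actually use.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))              # toy data: 200 samples, 10 dimensions
y = (X[:, 0] + X[:, 1] > 0).astype(int)     # toy binary labels

# Stand-in for a trained deep model: here just a random projection + tanh.
W, b = rng.normal(size=(10, 16)), np.zeros(16)
features = np.tanh(X @ W + b)               # hidden-layer activations used as feature vectors

clf = LogisticRegression().fit(features, y) # any off-the-shelf classifier on top
print("training accuracy:", clf.score(features, y))
```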


Traditional learning in neural networks: Backpropagation
- Given data D, define a loss function L_D(θ) on targets and actual output, for example:
  - summed squared error between output and targets
  - cross-entropy between output and targets
- Use gradient descent to iteratively find better weights θ (see the sketch below)
- Compute the gradient of L with respect to θ, either:
  - Batch gradient: ∇L_D(θ)
  - Stochastic gradient: ∇L_d(θ) for d ∈ D
- Update each weight w ∈ θ by subtracting α ∂L/∂w (α: learning rate)
- Continue descent until some stopping criterion, e.g.:
  - Convergence of θ
  - Early stopping (stop when the error on a validation set starts to increase)
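A compact sketch of this training loop: stochastic gradient descent with early stopping on a single linear layer under the squared-error loss. The toy data, learning rate, and patience value are made up for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
w_true = rng.normal(size=5)
y = X @ w_true + 0.1 * rng.normal(size=100)             # toy regression data
X_tr, y_tr, X_va, y_va = X[:80], y[:80], X[80:], y[80:]

theta, alpha = np.zeros(5), 0.01                         # weights theta, learning rate alpha
best_err, best_theta, patience = np.inf, theta.copy(), 10

for epoch in range(200):
    for i in rng.permutation(len(X_tr)):                 # stochastic gradient: one d in D at a time
        grad = 2 * (X_tr[i] @ theta - y_tr[i]) * X_tr[i] # gradient of L_d(theta) for squared error
        theta -= alpha * grad                            # theta <- theta - alpha * grad
    val_err = np.mean((X_va @ theta - y_va) ** 2)        # error on the validation set
    if val_err < best_err:                               # still improving: remember these weights
        best_err, best_theta, patience = val_err, theta.copy(), 10
    else:                                                # validation error increased
        patience -= 1
        if patience == 0:                                # early stopping
            break

theta = best_theta
print("validation MSE:", best_err)
```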

Limitations of backpropagation (BP)
- Does not scale well to deep networks (including recurrent networks): gradients further away from the outputs tend to either vanish or explode [Hochreiter and Schmidhuber, 1997] (illustrated numerically below)
- Likely to settle at (poor) local minima of the loss function
- Since BP used to be the state-of-the-art training algorithm: limited success with deep neural networks
[Figure: the loss L(θ), with the global optimum θ_opt and the poorer estimate θ̂_BP at which BP gets stuck]
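The vanishing-gradient effect is easy to reproduce numerically. The sketch below pushes a gradient backwards through a chain of sigmoid layers (depth and weight scale chosen arbitrarily for the illustration): each layer multiplies the gradient by f'(wx)·w with f'(·) ≤ 0.25, so its magnitude typically shrinks roughly geometrically with depth.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x, grad = rng.normal(), 1.0                  # scalar "network" for clarity
for depth in range(1, 31):
    w = rng.normal()                         # weight of this layer
    a = sigmoid(w * x)                       # forward pass through one sigmoid layer
    grad *= a * (1 - a) * w                  # chain rule: multiply by f'(w*x) * w
    x = a
    if depth % 10 == 0:
        print(f"gradient magnitude after {depth} layers: {abs(grad):.2e}")
```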

Modern approaches to train deep neural networks
- Long short-term memory [Hochreiter and Schmidhuber, 1997]: specialized recurrent structure + gradient descent to explicitly preserve error gradients over long distances
- Training networks by second-order optimization: Hessian-free training [Martens, 2010]
- Greedy layer-wise training [Hinton et al., 2006]: train layers individually, supervised or unsupervised; higher layers take as input the output from lower layers; layers often trained as Restricted Boltzmann Machines or Autoencoders
- Data-specific models and training: Convolutional neural networks [Lecun et al., 1998]
- Dropout [Hinton et al., 2012]: randomly ignore hidden units during training; avoids overfitting (see the sketch below)
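A minimal sketch of the dropout idea from the last bullet, using the common "inverted dropout" scaling so that nothing changes at test time; the drop probability of 0.5 and the layer size are illustrative assumptions.

```python
import numpy as np

def dropout(activations, p_drop=0.5, training=True, rng=np.random.default_rng()):
    """Randomly zero out hidden units during training; rescale the survivors so
    expected activations match between training and test time."""
    if not training:
        return activations
    mask = rng.random(activations.shape) >= p_drop   # keep each unit with probability 1 - p_drop
    return activations * mask / (1.0 - p_drop)

h = np.tanh(np.random.default_rng(0).normal(size=8))  # some hidden-layer activations
print(dropout(h))                                      # training: roughly half the units zeroed, rest scaled up
print(dropout(h, training=False))                      # test time: unchanged
```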

Topics covered in following talks
- Recurrent Neural Networks: Beat-tracking with LSTM (Sebastian Böck); Hessian-free training (Carlos Cancino)
- (Stacked) Autoencoders: Learning binary codes for fast music retrieval (Jan Schlüter)
- (Stacked) Restricted Boltzmann Machines: Speech/Music classification (Jan Schlüter); Learning tonal structure from melodies (Carlos Cancino)
- Convolutional Neural Networks and dropout: Onset detection / Audio segmentation (Jan Schlüter); High-dimensional aspects of CNNs (Karen Ullrich)

References
Hinton, G. E., Osindero, S., and Teh, Y. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18:1527-1554.
Hinton, G. E., Srivastava, N., Krizhevsky, A., Sutskever, I., and Salakhutdinov, R. (2012). Improving neural networks by preventing co-adaptation of feature detectors. CoRR, abs/1207.0580.
Hochreiter, S. and Schmidhuber, J. (1997). Long Short-Term Memory. Neural Computation, 9(8):1735-1780.
LeCun, Y., Bottou, L., Bengio, Y., and Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, pages 2278-2324.
Martens, J. (2010). Deep learning via Hessian-free optimization. In Proceedings of the 27th International Conference on Machine Learning (ICML-10), June 21-24, 2010, Haifa, Israel, pages 735-742.