Introduction to Deep Learning


Introduction to Deep Learning
M S Ram, Dept. of Computer Science & Engg., Indian Institute of Technology Kanpur
Reading of Chap. 1 from "Learning Deep Architectures for AI"; Yoshua Bengio; FTML Vol. 2, No. 1 (2009), pp. 1-127
Date: 12 Nov, 2015

A Motivational Task: Percepts → Concepts
Create algorithms that can understand scenes and describe them in natural language, and that can infer semantic concepts to allow machines to interact with humans using these concepts. This requires creating a series of abstractions: Image (Pixel Intensities) → Objects in Image → Object Interactions → Scene Description. Deep learning aims to automatically learn these abstractions with little supervision.
Courtesy: Yoshua Bengio, Learning Deep Architectures for AI

Deep Visual-Semantic Alignments for Generating Image Descriptions (Karpathy, Fei-Fei; CVPR 2015)
Example generated captions: "two young girls are playing with lego toy." "boy is doing backflip on wakeboard." "construction worker in orange safety vest is working on road." "man in black shirt is playing guitar."
http://cs.stanford.edu/people/karpathy/deepimagesent/

Challenge in Modelling Complex Behaviour
There are too many concepts to learn: too many object categories, and too many ways objects in those categories can interact. Behaviour is a highly varying function of underlying factors, f: L → V, where L is the space of latent factors of variation (low-dimensional latent factor space), V is the space of visible behaviour (high-dimensional observable space), and f is a highly non-linear function.

Example: Learning the Configuration Space of a Robotic Arm

C-Space Discovery using Isomap

How do We Train Deep Architectures?
Inspiration comes from the mammal brain: multiple layers of neurons (Rumelhart et al. 1986). Train each layer to compose the representations of the previous layer, learning a higher-level abstraction. Ex: Pixels → Edges → Contours → Object parts → Object categories (local features → global features). Train the layers one-by-one (Hinton et al. 2006): a greedy strategy.

Multilayer Perceptron with Back-propagation
The first deep learning model (Rumelhart, Hinton, Williams 1986). Compare outputs with the correct answer to get an error signal, then back-propagate the error signal to get derivatives for learning. [Figure: a stack of layers, from input vector through hidden layers to outputs.]
Source: Hinton's 2009 tutorial on Deep Belief Networks
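As a concrete illustration (an added sketch, not from the lecture), below is a minimal numpy implementation of one back-propagation step for a one-hidden-layer MLP. The sigmoid units, squared-error loss, layer sizes, and learning rate are all illustrative assumptions; biases are omitted for brevity.

    import numpy as np

    rng = np.random.default_rng(0)
    n_in, n_hid, n_out, lr = 784, 500, 10, 0.1

    W1 = rng.normal(0, 0.01, (n_in, n_hid))   # input -> hidden weights
    W2 = rng.normal(0, 0.01, (n_hid, n_out))  # hidden -> output weights

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def train_step(x, t):
        """One SGD step on a single example x (n_in,) with target t (n_out,)."""
        global W1, W2
        # Forward pass
        h = sigmoid(x @ W1)          # hidden activations
        y = sigmoid(h @ W2)          # output activations
        # Error signal at the output (squared-error loss, sigmoid derivative)
        delta_out = (y - t) * y * (1 - y)
        # Back-propagate the error signal to the hidden layer
        delta_hid = (delta_out @ W2.T) * h * (1 - h)
        # Gradient-descent updates
        W2 -= lr * np.outer(h, delta_out)
        W1 -= lr * np.outer(x, delta_hid)
        return 0.5 * np.sum((y - t) ** 2)  # loss, for monitoring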

Drawbacks of Back-propagation based Deep Neural Networks
They are discriminative models: they get all their information from the labels, and the labels don't carry that much information. They therefore need a substantial amount of labeled data. Moreover, gradient descent with random initialization leads to poor local minima.

Hand-written Digit Recognition
Classification of MNIST hand-written digits: 10 digit classes; input image: 28x28 gray scale → 784-dimensional input.

A Deeper Look at the Problem
One hidden layer with 500 neurons => 784 * 500 + 500 * 10 = 397,000 ≈ 0.4 million weights. Fitting a model that best explains the training data is an optimization problem in a 0.4-million-dimensional space. It is almost impossible for gradient descent with random initialization to arrive at the global optimum.
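The parameter count works out as follows (a quick check of the arithmetic above; bias terms, which the slide does not count, would add a few hundred more):

    # Weight count of the MNIST MLP described above (biases not counted)
    n_in, n_hid, n_out = 784, 500, 10
    weights = n_in * n_hid + n_hid * n_out
    print(weights)  # 397000, i.e. ~0.4 million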

A Solution: Deep Belief Networks (Hinton et al. 2006)
[Figure: in a very high-dimensional parameter space, back-propagation from a random initial position is very slow and often gets stuck at poor local minima; fast unsupervised pre-training yields pre-trained network weights from which slow fine-tuning (using back-propagation) reaches a good solution.]

A Solution: Deep Belief Networks (Hinton et al. 2006)
Before applying back-propagation, pre-train the network as a series of generative models. Use the weights of the pre-trained network as the initial point for traditional back-propagation. This leads to quicker convergence to a good solution. Pre-training is fast; fine-tuning can be slow.
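To make the greedy layer-wise idea concrete, here is a minimal sketch (an illustration under simplifying assumptions, not Hinton et al.'s exact recipe): each layer is pre-trained as a restricted Boltzmann machine with one step of contrastive divergence (CD-1), and the learned weights then initialize the corresponding layer before back-propagation fine-tuning. Layer sizes, learning rate, and epoch count are assumptions; biases are omitted for brevity.

    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def pretrain_rbm(data, n_hidden, lr=0.05, epochs=5):
        """Pre-train one layer as an RBM with CD-1.
        data: (n_samples, n_visible) array with values in [0, 1]."""
        n_visible = data.shape[1]
        W = rng.normal(0, 0.01, (n_visible, n_hidden))
        for _ in range(epochs):
            for v0 in data:
                # Positive phase: sample hidden units given the data
                p_h0 = sigmoid(v0 @ W)
                h0 = (rng.random(n_hidden) < p_h0).astype(float)
                # Negative phase: one Gibbs step (CD-1 reconstruction)
                p_v1 = sigmoid(h0 @ W.T)
                p_h1 = sigmoid(p_v1 @ W)
                # Approximate log-likelihood gradient
                W += lr * (np.outer(v0, p_h0) - np.outer(p_v1, p_h1))
        return W

    def pretrain_stack(data, layer_sizes):
        """Greedy strategy: train each RBM on the hidden representation
        produced by the layer below it."""
        weights, x = [], data
        for n_hidden in layer_sizes:
            W = pretrain_rbm(x, n_hidden)
            weights.append(W)
            x = sigmoid(x @ W)  # propagate data up to train the next layer
        return weights  # initial weights for back-propagation fine-tuning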

Quick Check: MLP vs DBN on MNIST (test error)
MLP (1 hidden layer):  1 hour: 2.18%;  14 hours: 1.65%
DBN:                   1 hour: 1.65%;  14 hours: 1.10%;  21 hours: 0.97%
Hardware: Intel QuadCore 2.83 GHz, 4 GB RAM. Implementations: MLP in Python, DBN in Matlab.

Intermediate Representations in the Brain
Disentanglement of the factors of variation underlying the data. Distributed representations: the activation of each neuron is a function of multiple features of the previous layer, and the feature combinations of different neurons are not necessarily mutually exclusive. Sparse representations: only 1-4% of neurons are active at a time. [Figure: localized representation vs. distributed representation.]

Local vs. Distributed in Input Space
Local methods assume a smoothness prior: g(x) = f(g(x_1), g(x_2), ..., g(x_k)), where {x_1, x_2, ..., x_k} are neighbours of x. They require a metric space, i.e., a notion of distance or similarity in the input space, and they fail when the target function is highly varying. Examples: nearest-neighbour methods, kernel methods with a Gaussian kernel. Distributed methods make no assumption of smoothness and need no notion of similarity. Ex: neural networks. A sketch of a local method follows.
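To make the "local" idea concrete (an added illustration, not from the slides), a Gaussian-kernel smoother predicts at a query point x by averaging the training targets of nearby points, weighted by distance. Because it can only interpolate between neighbours, a target that varies rapidly between nearby inputs gets blurred away, which is the failure mode the slide describes.

    import numpy as np

    def gaussian_kernel_predict(x, X_train, y_train, bandwidth=0.5):
        """Nadaraya-Watson estimate: a weighted average of training targets,
        with weights that decay with distance from the query point x."""
        d2 = np.sum((X_train - x) ** 2, axis=1)
        w = np.exp(-d2 / (2 * bandwidth ** 2))
        return np.sum(w * y_train) / np.sum(w)

    # Toy usage: a highly varying target defeats the smoothness prior.
    X = np.array([[0.0], [0.1], [0.2], [0.3]])
    y = np.array([0.0, 1.0, 0.0, 1.0])  # flips between nearby inputs
    print(gaussian_kernel_predict(np.array([0.15]), X, y))  # ~0.5, not 0 or 1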

Multi-task Learning
Source: https://en.wikipedia.org/wiki/multi-task_learning

Desiderata for Learning AI
- Ability to learn complex, highly-varying functions
- Ability to learn multiple levels of abstraction with little human input
- Ability to learn from a very large set of examples, with training time linear in the number of examples
- Ability to learn from mostly unlabeled data (unsupervised and semi-supervised)
- Multi-task learning: sharing of representations across tasks
- Fast predictions

References
Primary:
Yoshua Bengio. Learning Deep Architectures for AI. Foundations and Trends in Machine Learning, Vol. 2, No. 1 (2009), pp. 1-127.
Hinton, G. E., Osindero, S., and Teh, Y. A fast learning algorithm for deep belief nets. Neural Computation 18 (2006), pp. 1527-1554.
Rumelhart, David E., Geoffrey E. Hinton, and R. J. Williams. Learning Internal Representations by Error Propagation. In David E. Rumelhart, James L. McClelland, and the PDP Research Group (editors), Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Volume 1: Foundations. MIT Press, 1986.
Secondary:
Hinton, G. E. Learning Multiple Layers of Representation. Trends in Cognitive Sciences, Vol. 11 (2007), pp. 428-434.
Hinton, G. E. Tutorial on Deep Belief Networks. Machine Learning Summer School, Cambridge, 2009.
Andrej Karpathy, Li Fei-Fei. Deep Visual-Semantic Alignments for Generating Image Descriptions. CVPR 2015.