CS519: Deep Learning 1. Introduction

CS519: Deep Learning 1. Introduction
Winter 2017, Fuxin Li
With materials from Pierre Baldi, Geoffrey Hinton, Andrew Ng, Honglak Lee, Aditya Khosla, Joseph Lim

Cutting Edge of Machine Learning: Deep Learning in Neural Networks
Engineering applications:
- Computer vision
- Speech recognition
- Natural language understanding
- Robotics

Computer Vision: Image Classification on ImageNet
Over 1 million images, 1,000 classes, varying sizes (average 482x415), color.
Error rates:
- 16.42%: deep CNN with dropout (2012)
- 6.66%: 22-layer CNN, GoogLeNet (2014)
- 3.6%: Microsoft Research Asia, super-human performance (2015)
Sources: Krizhevsky et al., ImageNet Classification with Deep Convolutional Neural Networks; Lee et al., Deeply-Supervised Nets, 2014; Szegedy et al., Going Deeper with Convolutions, ILSVRC 2014; Sanchez & Perronnin, CVPR 2011; http://www.clarifai.com/; Benenson, http://rodrigob.github.io/are_we_there_yet/build/classification_datasets_results.html

Speech recognition on Android (2013)

Impact on speech recognition

Deep Learning in science, e.g. protein contact map prediction:
P. Di Lena, K. Nagata, and P. Baldi. Deep Architectures for Protein Contact Map Prediction. Bioinformatics, 28, 2449-2457, 2012.

Deep Learning Applications
Engineering:
- Computer vision (e.g. image classification, segmentation)
- Speech recognition
- Natural language processing (e.g. sentiment analysis, translation)
Science:
- Biology (e.g. protein structure prediction, analysis of genomic data)
- Chemistry (e.g. predicting chemical reactions)
- Physics (e.g. detecting exotic particles)
...and many more to come

Penetration into mainstream media

Aha

Machine learning before deep learning

Typical goal of machine learning
(Supervised) machine learning: find f, so that f(x) ≈ y.
Input X → Output Y:
- Images/video → label ("Motorcycle"), suggested tags, image search
- Audio → speech recognition, music classification, speaker identification
- Text → web search, anti-spam, machine translation
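
To make the setup concrete, here is a minimal sketch of "find f such that f(x) ≈ y" using scikit-learn; the data and model choice are purely illustrative, not from the lecture.

```python
# Minimal sketch of supervised learning: find f such that f(x) ~ y.
# Synthetic data and logistic regression are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))               # inputs x
y = (X[:, 0] + X[:, 1] > 0).astype(int)     # targets y

f = LogisticRegression().fit(X, y)          # learn f from (x, y) pairs
print(f.predict([[1.0, 1.0]]))              # f(x) -> predicted label
```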

e.g. [slide: an image is fed to an ML system, which outputs the label "motorcycle"]

Why is this hard?
You see this: [photo of a motorcycle]
But the camera sees this: [grid of raw pixel intensity values]

Raw representation
Input: raw image → raw pixel features (pixel 1, pixel 2, ...) → learning algorithm.
[Slide: motorbikes vs. non-motorbikes plotted in the (pixel 1, pixel 2) plane; the two classes do not separate in raw pixel space.]

What we want
Input: raw image → feature representation (e.g., does it have handlebars? wheels?) → learning algorithm.
[Slide: motorbikes vs. non-motorbikes plotted in (handlebars, wheels) feature space, where the classes separate cleanly.]

Some feature representations
SIFT, Spin image, HoG, RIFT, Textons, GLOH
Coming up with features is often difficult, time-consuming, and requires expert knowledge.
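
As an illustration of such hand-designed features, here is a sketch of extracting a HoG descriptor with scikit-image; the sample image and parameter values are common defaults, assumed for illustration rather than taken from the slides.

```python
# Sketch of hand-designed feature extraction (HoG) with scikit-image.
# The sample image and parameters are illustrative assumptions.
from skimage import data
from skimage.feature import hog

image = data.camera()                  # sample grayscale image
features = hog(image,
               orientations=9,         # gradient-orientation bins
               pixels_per_cell=(8, 8),
               cells_per_block=(2, 2))
print(features.shape)                  # fixed-length descriptor
```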

Deep Learning: Let's learn the representation!
Hierarchy of learned features: pixels → edges → object parts (combinations of edges) → object models

Neural Networks
Neuron: a weighted sum of inputs passed through a nonlinearity. Many stacked neurons form a network! (See the sketch below.)
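
A minimal NumPy sketch of one neuron and a stack of neurons forming a hidden layer; the sizes and the sigmoid nonlinearity are illustrative assumptions.

```python
# One neuron, then many stacked neurons (one hidden layer), in plain NumPy.
# Sizes and the sigmoid nonlinearity are illustrative assumptions.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.normal(size=3)                           # 3 input values

W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)    # 4 stacked neurons
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)    # one output neuron

h = sigmoid(x @ W1 + b1)    # hidden layer: 4 weighted sums + nonlinearity
y = sigmoid(h @ W2 + b2)    # output built on top of the hidden layer
print(h, y)
```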

Historical Remarks
The high and low tides of neural networks

1950s-1960s: The Perceptron
The Perceptron was introduced in 1957 by Frank Rosenblatt.
[Slide: diagram of a perceptron, with an input layer connected directly to an output layer of destinations.]
Activation function: a hard threshold, ŷ = sign(w·x + b).
Learning: after each example, update the weights, w ← w + η(y − ŷ)x.
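
A minimal sketch of the classic perceptron learning rule in NumPy; the toy data, learning rate, and number of passes are illustrative assumptions.

```python
# Rosenblatt's perceptron learning rule on toy, linearly separable data.
# Data, learning rate, and epoch count are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)   # separable labels

w, b, lr = np.zeros(2), 0.0, 0.1
for _ in range(20):                          # passes over the data
    for xi, yi in zip(X, y):
        if yi * (xi @ w + b) <= 0:           # misclassified?
            w += lr * yi * xi                # nudge weights toward yi
            b += lr * yi

pred = np.sign(X @ w + b)
print("training accuracy:", (pred == y).mean())
```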

1970s: Hiatus
Perceptrons, Minsky and Papert, 1969:
- Revealed the fundamental limitation of linear perceptron models (e.g., they cannot represent XOR)
- Stalled research on the topic for more than 10 years

1980s: nonlinear neural networks (Werbos 1974; Rumelhart, Hinton, Williams 1986)
Compare the outputs with the correct answer to get an error signal, then back-propagate the error signal through the network to get derivatives for learning.
[Slide: input vector → hidden layers → outputs, with the error signal flowing backward.]
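
A minimal NumPy sketch of backpropagation on a tiny two-layer network: the forward pass computes outputs, the output error is propagated backward to get derivatives, and the weights take a gradient step. The architecture, data, and step size are illustrative assumptions.

```python
# Backpropagation on a tiny two-layer tanh network (NumPy sketch).
# Architecture, data, and step size are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 2))
y = (X[:, :1] * X[:, 1:2] > 0).astype(float)   # XOR-like target

W1, b1 = rng.normal(scale=0.5, size=(2, 8)), np.zeros(8)
W2, b2 = rng.normal(scale=0.5, size=(8, 1)), np.zeros(1)

for step in range(2000):
    # Forward pass
    h = np.tanh(X @ W1 + b1)
    out = h @ W2 + b2
    err = out - y                              # error signal at the output
    # Backward pass: propagate the error to get derivatives
    dW2 = h.T @ err / len(X)
    db2 = err.mean(0)
    dh = (err @ W2.T) * (1 - h**2)             # back through tanh
    dW1 = X.T @ dh / len(X)
    db1 = dh.mean(0)
    # Gradient step on every parameter
    for p, g in ((W1, dW1), (b1, db1), (W2, dW2), (b2, db2)):
        p -= 0.5 * g

print("final MSE:", float((err**2).mean()))
```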

1990s: Universal approximators
Glorious times for neural networks (1986-1999):
- Success on handwritten digits
- Boltzmann machines
- Networks of all sorts
- Complex mathematical techniques
Kernel methods (1992-2010): (Cortes and Vapnik 1995), (Vapnik 1995), (Vapnik 1998)
- Fixed basis functions
- The first paper was forced to be published under the title "Support-Vector Networks"

Recognizing Handwritten Digits
MNIST database: 60,000 training images, 10,000 test images. Large enough for digits; the battlefield of the 90s.

Algorithm                                            Error rate (%)
Linear classifier (perceptron)                       12.0
K-nearest-neighbors                                  5.0
Boosting                                             1.26
SVM                                                  1.4
Neural network                                       1.6
Convolutional neural network                         0.95
  + automatic distortions + ensemble + many tricks   0.23

What's wrong with backpropagation?
- It requires a lot of labeled training data
- The learning time does not scale well
- It is theoretically the same as kernel methods: both are universal approximators
- It can get stuck in poor local optima, whereas kernel methods give a globally optimal solution
- It overfits, especially with many hidden layers, whereas kernel methods have proven approaches to control overfitting

Caltech-101: computer vision's long struggle without enough data
Caltech-101 dataset: around 10,000 images. Certainly not enough! ~80% accuracy was widely considered to be the limit on this dataset.

Algorithm                                            Accuracy (%)
SVM with Pyramid Matching Kernel (2005)              58.2
Spatial Pyramid Matching (2006)                      64.6
SVM-KNN (2006)                                       66.2
Sparse Coding + Pyramid Matching (2009)              73.2
SVM Regression w/ object proposals (2010)            81.9
Group-Sensitive MKL (2009)                           84.3
Deep Learning (pretrained on ImageNet) (2014)        91.4

2010s: Deep representation learning
Comeback: make it deep! Learn many, many layers simultaneously.
How did this happen?
- Max-pooling (Weng, Ahuja, Huang 1992)
- Stochastic gradient descent (Hinton 2002)
- ReLU nonlinearity (Nair and Hinton 2010), (Krizhevsky, Sutskever, Hinton 2012)
- Better understanding of subgradients
- Dropout (Hinton et al. 2012)
- WAY more labeled data: Amazon Mechanical Turk (https://www.mturk.com/mturk/welcome), 1 million+ labeled examples
- Much better computing power: GPU processing
(A sketch of ReLU and dropout follows this list.)
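
To illustrate two of these ingredients, here is a minimal NumPy sketch of the ReLU nonlinearity and inverted dropout; the shapes and keep-probability are illustrative assumptions.

```python
# ReLU and (inverted) dropout in NumPy; shapes and keep probability
# are illustrative assumptions.
import numpy as np

def relu(z):
    # ReLU: zero for negative inputs, identity for positive ones.
    return np.maximum(0.0, z)

def dropout(h, keep_prob=0.5, train=True):
    # Inverted dropout: randomly zero units during training and rescale,
    # so nothing changes at test time.
    if not train:
        return h
    mask = np.random.default_rng().random(h.shape) < keep_prob
    return h * mask / keep_prob

h = relu(np.array([-1.0, 0.5, 2.0]))
print(h, dropout(h, keep_prob=0.5))
```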

Convolutions: Utilize Spatial Locality
[Slide: convolving an image with a Sobel filter highlights its edges.]
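
A sketch of the slide's idea: convolving a tiny synthetic image with a Sobel filter produces strong responses at an edge. The toy image is an assumption for illustration.

```python
# Convolving an image with a Sobel filter (SciPy sketch).
# The toy 8x8 image with a vertical edge is an illustrative assumption.
import numpy as np
from scipy.signal import convolve2d

sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)   # horizontal-gradient filter

image = np.zeros((8, 8))
image[:, 4:] = 1.0                              # vertical edge at column 4

edges = convolve2d(image, sobel_x, mode="same")
print(edges)                                    # large magnitudes along the edge
```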

Convolutional Neural Networks
Learning the filters: a CNN makes sense because locality is important for visual processing.

A Convolutional Neural Network Model
[Slide: feature maps shrink through the network, 224x224 → 224x224 → 112x112 → 56x56 → 28x28 → 14x14 → 7x7, ending in output classes such as Airplane, Dog, Car, SUV, Minivan, Sign, Pole.]
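
A hypothetical PyTorch sketch of a conv stack whose feature maps halve from 224x224 down to 7x7, matching the resolutions on the slide; the channel counts and layer choices are assumptions, not the lecture's model.

```python
# A conv stack that halves resolution from 224x224 to 7x7 (PyTorch sketch).
# Channel counts and layer choices are illustrative assumptions.
import torch
import torch.nn as nn

def block(c_in, c_out):
    # conv keeps the resolution; max-pooling halves it
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1),
                         nn.ReLU(),
                         nn.MaxPool2d(2))

net = nn.Sequential(block(3, 16),     # 224 -> 112
                    block(16, 32),    # 112 -> 56
                    block(32, 64),    # 56  -> 28
                    block(64, 128),   # 28  -> 14
                    block(128, 128))  # 14  -> 7

x = torch.randn(1, 3, 224, 224)
print(net(x).shape)                   # torch.Size([1, 128, 7, 7])
```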

Images that respond to various filters
(Zeiler and Fergus 2014)

Recurrent Neural Networks
Temporal stability: history always repeats itself.
Parameters are shared across time. (See the sketch below.)
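
A minimal NumPy sketch of a recurrent network unrolled over time, showing that the same weights are reused at every step; the sizes and random data are illustrative assumptions.

```python
# A recurrent network unrolled over time (NumPy sketch): the SAME weights
# are applied at every step. Sizes and data are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
Wxh = rng.normal(scale=0.1, size=(4, 8))   # input -> hidden
Whh = rng.normal(scale=0.1, size=(8, 8))   # hidden -> hidden (shared)
b = np.zeros(8)

h = np.zeros(8)                            # initial hidden state
xs = rng.normal(size=(10, 4))              # a sequence of 10 inputs

for x_t in xs:
    # Same Wxh, Whh, b at every time step: parameter sharing across time.
    h = np.tanh(x_t @ Wxh + h @ Whh + b)

print(h)                                   # final state summarizes the sequence
```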

What is the hidden assumption in your problem?
- Image understanding: spatial locality
- Temporal models: (partial) temporal stationarity
How about your problem?

References
(Weng, Ahuja, Huang 1992) J. Weng, N. Ahuja and T. S. Huang. Cresceptron: a self-organizing neural network which grows adaptively. Proc. International Joint Conference on Neural Networks, Baltimore, Maryland, vol. I, pp. 576-581, June 1992.
(Hinton 2002) G. E. Hinton. Training Products of Experts by Minimizing Contrastive Divergence. Neural Computation, 14, pp. 1771-1800.
(Hinton, Osindero and Teh 2006) G. E. Hinton, S. Osindero and Y. Teh. A fast learning algorithm for deep belief nets. Neural Computation, 18, pp. 1527-1554.
(Cortes and Vapnik 1995) C. Cortes and V. Vapnik. Support-vector networks. Machine Learning, 20(3), pp. 273-297.
(Vapnik 1995) V. Vapnik. The Nature of Statistical Learning Theory. Springer, 1995.
(Vapnik 1998) V. Vapnik. Statistical Learning Theory. Wiley, 1998.
(Krizhevsky, Sutskever, Hinton 2012) A. Krizhevsky, I. Sutskever and G. E. Hinton. ImageNet Classification with Deep Convolutional Neural Networks. NIPS 2012.
(Nair and Hinton 2010) V. Nair and G. E. Hinton. Rectified linear units improve restricted Boltzmann machines. Proc. 27th International Conference on Machine Learning, 2010.
(Hinton et al. 2012) G. E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever and R. R. Salakhutdinov. Improving neural networks by preventing co-adaptation of feature detectors. arXiv, 2012.
(Zeiler and Fergus 2014) M. D. Zeiler and R. Fergus. Visualizing and Understanding Convolutional Networks. ECCV 2014.