Lecture 6: Deep Learning and Computer Vision


peimt@bit.edu.cn

Sources: the Deep Learning slides of Xin Liu (vipl.ict.ac.cn) and http://neuralnetworksanddeeplearning.com/

Deep Learning

Traditional Computer Vision

Human Brain
- Humans have a primary visual cortex, also known as V1, containing 140 million neurons, with tens of billions of connections between them.
- Human vision involves not just V1 but an entire series of visual cortices - V2, V3, V4, and V5 - doing progressively more complex image processing.

Human Brain
- The difficulty of visual pattern recognition becomes apparent if you attempt to write a computer program to recognize handwritten digits.
- When you try to make such rules precise, you quickly get lost in a morass of exceptions, caveats, and special cases. It seems hopeless.

Neural Networks
- Neural networks approach the problem in a different way.
- Take a large number of handwritten digits, known as training examples.
- Develop a system which can learn from those training examples.
- Use the examples to automatically infer rules for recognizing handwritten digits.

Perceptrons
- A perceptron takes several inputs and produces a single binary output.
- A perceptron can weigh up different kinds of evidence in order to make decisions.
- A complex network of perceptrons could make quite subtle decisions.
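As a minimal sketch of this decision rule (the weights, inputs, and threshold convention below are illustrative assumptions, not values from the slides), a perceptron fires when the weighted sum of its inputs plus a bias is positive:

```python
# Minimal perceptron: weighted evidence -> binary decision.
# Weights, inputs, and bias are illustrative values, not from the slides.

def perceptron(inputs, weights, bias):
    """Return 1 if the weighted sum of inputs plus bias is positive, else 0."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return 1 if weighted_sum + bias > 0 else 0

# Example: three binary pieces of evidence, weighted by importance.
print(perceptron([1, 0, 1], weights=[0.6, 0.2, 0.3], bias=-0.5))  # -> 1
```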


Neural Networks
- If it were true that a small change in a weight (or bias) caused only a small change in the output, we could use this fact to modify the weights and biases so that the network behaves more in the manner we want.
- Change the weights and biases over and over to produce better and better output.

Sigmoid neuron

Neural Networks
- By using the activation function we get a smoothed-out perceptron.
- The smoothness means that small changes in the weights and in the bias produce a small change in the output of the neuron.
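A small sketch of such a neuron, assuming the standard logistic sigmoid σ(z) = 1/(1 + e^(−z)) used in the cited book; the inputs and weights are illustrative:

```python
import math

def sigmoid(z):
    """Logistic function: a smooth, differentiable step from 0 to 1."""
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_neuron(inputs, weights, bias):
    """Like a perceptron, but the output varies smoothly with w and b."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)

# A small change in a weight now produces a small change in the output.
print(sigmoid_neuron([1.0, 0.5], [0.40, 0.6], -0.3))  # ~0.599
print(sigmoid_neuron([1.0, 0.5], [0.41, 0.6], -0.3))  # ~0.601
```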

Multilayer Perceptrons

Quadratic cost function
- C(w, b) = (1/2n) Σ_x ||y(x) − a||², where w and b denote the network's weights and biases, n is the number of training inputs, y(x) is the desired output for input x, and a is the network's actual output.

Stochastic gradient descent
- Estimate the gradient by computing it for a small sample of randomly chosen training inputs (a mini-batch).
- By averaging over this small sample it turns out that we can quickly get a good estimate of the true gradient; this speeds up gradient descent, and thus learning.
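A minimal numpy sketch of mini-batch gradient estimation, assuming a toy linear model with quadratic cost (the model, data, and learning rate are illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: inputs X and targets y generated by a known linear rule.
X = rng.normal(size=(1000, 3))
true_w = np.array([0.5, -1.0, 2.0])
y = X @ true_w

w = np.zeros(3)      # weights to learn
eta = 0.1            # learning rate
batch_size = 32

for step in range(200):
    # Estimate the gradient of the quadratic cost on a random mini-batch.
    idx = rng.integers(0, len(X), size=batch_size)
    xb, yb = X[idx], y[idx]
    grad = xb.T @ (xb @ w - yb) / batch_size
    w -= eta * grad  # gradient descent step using the noisy estimate

print(w)  # approaches [0.5, -1.0, 2.0]
```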

Why CNN
- For input as a 10 × 10 image: a 3-layer MLP with 200 hidden units and 10 output units contains ~22k parameters.
- For input as a 100 × 100 image: a 3-layer MLP with 20k hidden units and 10 output units contains ~200m parameters.
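These counts can be reproduced with quick arithmetic (assuming weights plus biases are both counted; the helper below is illustrative):

```python
def mlp_params(n_in, n_hidden, n_out):
    """Weights + biases for a fully connected net with one hidden layer."""
    return (n_in * n_hidden + n_hidden) + (n_hidden * n_out + n_out)

print(mlp_params(10 * 10, 200, 10))       # 22,210      (~22k)
print(mlp_params(100 * 100, 20_000, 10))  # 200,220,010 (~200m)
```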

Why CNN
- An MLP can be improved in two ways: locally connected instead of fully connected layers, and sharing weights between neurons.
- We achieve both by using convolutional neurons.

Local receptive fields
- The stride length is 1.

Shared weights and biases
- Each hidden neuron has a bias and 5 × 5 weights connected to its local receptive field.
- Use the same weights and bias for each of the 24 × 24 hidden neurons.
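A sketch of this weight sharing, assuming the 28 × 28 input implied by the 24 × 24 grid of hidden neurons (one 5 × 5 kernel, stride 1, no padding; the values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.normal(size=(28, 28))   # e.g. a handwritten digit
kernel = rng.normal(size=(5, 5))    # one shared set of 5x5 weights
bias = 0.1                          # one shared bias

# Every hidden neuron applies the SAME weights to its own 5x5 patch.
feature_map = np.empty((24, 24))
for i in range(24):
    for j in range(24):
        patch = image[i:i + 5, j:j + 5]
        feature_map[i, j] = np.sum(patch * kernel) + bias

print(feature_map.shape)  # (24, 24): 28 - 5 + 1 in each dimension
```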

Pooling layers
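The slides illustrate pooling with figures only; as a minimal sketch, 2 × 2 max-pooling keeps the strongest activation in each non-overlapping 2 × 2 block, halving the width and height of a feature map:

```python
import numpy as np

def max_pool_2x2(feature_map):
    """2x2 max-pooling with stride 2: keep the max of each 2x2 block."""
    h, w = feature_map.shape
    blocks = feature_map.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

fm = np.arange(16, dtype=float).reshape(4, 4)
print(max_pool_2x2(fm))  # [[ 5.  7.]
                         #  [13. 15.]]
```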

Fully-connected layer

Deep Learning
- The traditional method: hand-crafted features + classifier.
- The modern method: unsupervised mid-level feature learning.
- Deep learning: end-to-end hierarchical feature learning.

Understand the Human Brain

Neural Network: concatenation of functions

Activation Functions

Loss Functions
- Euclidean loss
- Cross-entropy loss
- Contrastive loss
- Triplet loss
- MOON loss
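As one concrete example from this list, a minimal sketch of cross-entropy loss over a softmax distribution (the logits and label are illustrative):

```python
import numpy as np

def cross_entropy(logits, target_index):
    """Cross-entropy of a softmax distribution against the true class."""
    shifted = logits - logits.max()  # subtract max for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -log_probs[target_index]

logits = np.array([2.0, 0.5, -1.0])           # unnormalized class scores
print(cross_entropy(logits, target_index=0))  # ~0.24: class 0 dominates
```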

Why do CNNs work
- Faster heterogeneous parallel computing: CPU clusters, GPUs, etc.
- Large datasets: ImageNet (1.2m images of 1,000 object classes), COCO (300k images with 2m object instances).
- Improvements in model architecture: ReLU, dropout, inception, etc.
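Two of the architectural improvements named here, sketched minimally (the keep probability is an illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    """Rectified linear unit: cheap nonlinearity that avoids saturation."""
    return np.maximum(0.0, x)

def dropout(x, keep_prob=0.5):
    """Training-time dropout: randomly zero units, rescale the survivors."""
    mask = rng.random(x.shape) < keep_prob
    return x * mask / keep_prob

h = relu(rng.normal(size=5))
print(dropout(h))  # roughly half the activations are zeroed
```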

Case Study: LeNet-5

Case Study: ResNet

Other Deep Models: Siamese Net

Other Deep Models: C3D

Other Deep Models: RNN

Other Deep Models: LSTM

Deep Learning in Face Recognition

DeepID (Sun et al., CVPR 2014)

DeepID2 (Sun et al., NIPS 2014)

DeepID2+ (Sun et al., CVPR 2015)

DeepID3 (Sun et al., arXiv 2015)

DeepFace (Taigman et al., CVPR 2014)

FaceNet (Schroff et al., CVPR 2015)

Deep Learning in Face Recognition (slide from Xin Liu, VIPL)

Deep Learning in Object Detection

R-CNN (Girshick et al., CVPR 2014)

SPP-Net (He et al., ECCV 2014)

Fast R-CNN (Girshick, ICCV 2015)

Faster R-CNN (Ren et al., NIPS 2015)

YOLO: You Only Look Once (Redmon et al., arXiv 2015)

SSD: Single Shot MultiBox Detector (Liu et al., ECCV 2016)

Deep Learning in Object Detection (slide from Xin Liu, VIPL)

Deep Learning in Image Classification (slide from Xin Liu, VIPL)

Deep Learning in Face Retrieval

Deep CNN-based Binary Hash Video Representations (Dong et al., AAAI 2016)

Deep Learning in Object Tracking

DeepTrack (Li et al., TIP 2016)

peimt@bit.edu.cn