Deep Learning for Computer Vision. commercial-in-confidence



Introduction to Computer Vision & Deep Learning
Presented by Hayden Faulkner

What Is Computer Vision?

What Is Computer Vision?
Using computers to understand (process) imagery.

What Is Deep Learning?

What Is Deep Learning?
Part of a broader set of Machine Learning methods: Reinforcement Learning, Clustering, Decision Tree Learning, Support Vector Machines, Rule-based Machine Learning, Similarity and Metric Learning, Artificial Neural Networks, Association Rule Learning, Deep Learning, Genetic Algorithms, Bayesian Networks, Sparse Dictionary Learning, Inductive Logic Programming, Representation Learning.

What Is Deep Learning?
Deep Learning methods focus on learning data representations through many sequential operations.*
*Many experts have their own definition.

Image Classification: A Fundamental Computer Vision Problem
[Diagram: input image -> Process / Model -> label "Dog"]

Image Classification: A Fundamental Computer Vision Problem
[Diagram: input image -> Feature Extractor -> Trainable Classifier -> "Dog"]
Feature Extractor: processes the image into a lower-dimensional space that is more useful for classification.
The features are hand-crafted (designed) by researchers to pick up image properties such as edges or patterns. Some examples: SIFT, HOG, LBP, MSER, Color-SIFT.
Trainable Classifier: uses the image features to decide a label.
It uses machine learning to learn the classifier parameters, but it's not deep learning.
Different classifiers learn and classify in different ways; a popular choice has been Support Vector Machines (SVMs).
SVMs attempt to find hyperplanes in the high-dimensional feature space that separate features from different classes.
This worked okay, but it wasn't very scalable: the hand-crafted features weren't rich enough to handle many different object types and variations in object appearance (pose, lighting, orientation, scene).
Left image from: https://docs.opencv.org/3.0-beta/doc/py_tutorials/py_feature2d/py_sift_intro/py_sift_intro.html
Right image from: https://docs.opencv.org/2.4/doc/tutorials/ml/introduction_to_svm/introduction_to_svm.html
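The two-stage pipeline (hand-crafted features, then a separately trained classifier) can be sketched in a few lines. This is a minimal illustration assuming NumPy, not any library's implementation: `orientation_histogram` is a hypothetical, heavily simplified stand-in for HOG-style features (real HOG uses cells, blocks and block normalisation), `train_linear_svm` is a plain subgradient-descent solver for the hinge loss, and the vertical-versus-horizontal stripe images are synthetic toy data.

```python
import numpy as np

def orientation_histogram(image, n_bins=4):
    """Hypothetical hand-crafted feature: a histogram of gradient orientations,
    weighted by gradient magnitude, over the whole image (a heavily simplified,
    HOG-like descriptor with no cells, blocks or block normalisation)."""
    gy, gx = np.gradient(image.astype(float))        # derivatives along rows, columns
    magnitude = np.hypot(gx, gy)
    orientation = np.mod(np.arctan2(gy, gx), np.pi)  # unsigned angle in [0, pi)
    bins = np.minimum((orientation / np.pi * n_bins).astype(int), n_bins - 1)
    hist = np.bincount(bins.ravel(), weights=magnitude.ravel(), minlength=n_bins)
    return hist / (np.linalg.norm(hist) + 1e-8)

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Linear SVM (labels in {-1, +1}) trained by full-batch subgradient descent
    on lam/2 * |w|^2 + mean(max(0, 1 - y * (w.x + b)))."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        active = y * (X @ w + b) < 1                 # inside the margin or misclassified
        grad_w = lam * w - (y[active][:, None] * X[active]).sum(axis=0) / len(y)
        grad_b = -y[active].sum() / len(y)
        w, b = w - lr * grad_w, b - lr * grad_b
    return w, b

def stripes(size, vertical, rng):
    """Synthetic image: 2-pixel-wide stripes plus a little Gaussian noise."""
    on = (np.arange(size) // 2) % 2 == 0
    img = np.zeros((size, size))
    if vertical:
        img[:, on] = 1.0
    else:
        img[on, :] = 1.0
    return img + 0.05 * rng.standard_normal((size, size))

rng = np.random.default_rng(0)
labels = np.array([1 if i % 2 == 0 else -1 for i in range(40)])  # +1 = vertical stripes
images = [stripes(16, vertical=(label == 1), rng=rng) for label in labels]

X = np.stack([orientation_histogram(img) for img in images])    # stage 1: fixed features
w, b = train_linear_svm(X, labels)                              # stage 2: learnt classifier
accuracy = float((np.sign(X @ w + b) == labels).mean())
print("training accuracy:", accuracy)
```

Only the classifier's `w` and `b` are learnt here; the feature extractor is fixed by hand, which is exactly the limitation the slide describes.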

Image Classification: A Fundamental Computer Vision Problem
[Diagram: input image -> Convolutional Neural Network -> "Dog"]
The filters (which produce the features) are learnt by the computer so that they are most useful for classification.
The classifier is built into the Convolutional Network architecture, so there is no need for two separate stages.
Learnt end-to-end: the entire process from input image to label is learnt together, so the features are tuned for the classifier that uses them.

A Deeper Look at Convolutional Neural Networks: Structure
[Diagram: image -> Convolution -> Pooling -> Convolution -> Pooling -> Fully Connected -> predictions: Cat (0.15), Bird (0.02), Car (0.01), Dog (0.82); predicted label: Dog]
Built of many layers that process the image from pixels to label in a hierarchical, sequential manner.
What makes it deep learning is the sequence of layer operations, each learning a different data representation based on the output of the previous layer.
There are so many parameters to learn that it needs lots of data and lots of compute power; the availability of both is a key reason for its recent rise.
Diagram adapted from: https://www.clarifai.com/technology
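A forward pass through the structure in the diagram (convolution, pooling, convolution, pooling, fully connected, then a softmax over Cat/Bird/Car/Dog) can be sketched in NumPy. The layer sizes, the 32x32 input and the ReLU activations are assumptions chosen for illustration, biases are omitted, and the weights are random, so the printed probabilities are meaningless. The point is how the shape of the representation changes from layer to layer.

```python
import numpy as np

def conv2d(x, weights):
    """'Valid' convolution, stride 1 (cross-correlation, as in most CNN libraries).
    x: (C_in, H, W), weights: (C_out, C_in, k, k) -> (C_out, H-k+1, W-k+1)."""
    c_out, _, k, _ = weights.shape
    out_h, out_w = x.shape[1] - k + 1, x.shape[2] - k + 1
    out = np.zeros((c_out, out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = x[:, i:i + k, j:j + k]                  # (C_in, k, k)
            out[:, i, j] = np.tensordot(weights, patch, axes=([1, 2, 3], [0, 1, 2]))
    return out

def relu(x):
    return np.maximum(x, 0.0)

def max_pool(x, size=2):
    """Non-overlapping max pooling; ragged edges are dropped."""
    c, height, width = x.shape
    height, width = height // size * size, width // size * size
    blocks = x[:, :height, :width].reshape(c, height // size, size, width // size, size)
    return blocks.max(axis=(2, 4))

def softmax(z):
    e = np.exp(z - z.max())        # subtract the max for numerical stability
    return e / e.sum()

rng = np.random.default_rng(0)
classes = ["Cat", "Bird", "Car", "Dog"]
image = rng.random((3, 32, 32))                 # stand-in for a 32x32 RGB image
w1 = 0.1 * rng.standard_normal((8, 3, 3, 3))    # 8 filters of size 3x3x3
w2 = 0.1 * rng.standard_normal((16, 8, 3, 3))   # 16 filters of size 3x3x8

h1 = max_pool(relu(conv2d(image, w1)))          # (8, 15, 15)
h2 = max_pool(relu(conv2d(h1, w2)))             # (16, 6, 6)
w_fc = 0.01 * rng.standard_normal((len(classes), h2.size))
probs = softmax(w_fc @ h2.ravel())              # fully connected layer + softmax

for name, p in zip(classes, probs):
    print(f"{name}: {p:.2f}")
```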

A Deeper Look at Convolutional Neural Networks: Operations
[Diagram: Convolution -> Pooling -> Convolution -> Pooling -> Fully Connected -> Dog, with close-ups of the two operations below]
2D Convolution Operation: a filter (weights) slides over the image (or a previous representation), producing a convolved feature (a new representation).
2D Max Pooling Operation: downsamples a representation by keeping the maximum value in each window.
Left from: http://deeplearning.stanford.edu/wiki/index.php/feature_extraction_using_convolution
Right from: http://cs231n.github.io/convolutional-networks/
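Both operations can be worked through on small arrays. This NumPy sketch uses a 5x5 binary image with a 3x3 filter, and then a 4x4 array with 2x2 max pooling at stride 2. The values are chosen for illustration, and stride 1 with no padding is assumed for the convolution.

```python
import numpy as np

def conv2d_single(image, kernel):
    """'Valid' 2D convolution of one channel with one filter, stride 1.
    As in most CNN libraries, the kernel is not flipped (cross-correlation)."""
    kh, kw = kernel.shape
    out_h, out_w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=image.dtype)
    for i in range(out_h):
        for j in range(out_w):
            # Multiply the filter element-wise with the patch under it, then sum.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool2d(x, size=2, stride=2):
    """Keep the maximum value in each size x size window."""
    out_h = (x.shape[0] - size) // stride + 1
    out_w = (x.shape[1] - size) // stride + 1
    out = np.empty((out_h, out_w), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = x[i * stride:i * stride + size, j * stride:j * stride + size].max()
    return out

image = np.array([[1, 1, 1, 0, 0],
                  [0, 1, 1, 1, 0],
                  [0, 0, 1, 1, 1],
                  [0, 0, 1, 1, 0],
                  [0, 1, 1, 0, 0]])
kernel = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 0, 1]])
feature = conv2d_single(image, kernel)
print(feature)
# [[4 3 4]
#  [2 4 3]
#  [2 3 4]]

x = np.array([[1, 1, 2, 4],
              [5, 6, 7, 8],
              [3, 2, 1, 0],
              [1, 2, 3, 4]])
pooled = max_pool2d(x)
print(pooled)
# [[6 8]
#  [3 4]]
```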

A Deeper Look at Convolutional Neural Networks: Features
[Diagram: Convolution -> Pooling -> Convolution -> Pooling -> Fully Connected -> Dog, with visualised features from Layer 1, Layer 2 and Layer 5]
Learns hierarchical features: simple patterns such as edges in early layers, and increasingly complex patterns and object parts in later layers.
Images: Matthew D. Zeiler, Rob Fergus, Visualizing and Understanding Convolutional Networks
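One way to see why later layers can respond to larger structures is to compute each layer's receptive field, which is the size of the input patch that a single unit depends on. The short sketch below uses the standard recurrence r_l = r_(l-1) + (k_l - 1) * j_(l-1) and j_l = j_(l-1) * s_l, where k is the kernel size, s the stride and j the spacing between adjacent units measured in input pixels. The layer configuration is hypothetical. This is not the visualisation method used in the cited paper.

```python
def receptive_fields(layers):
    """layers: list of (name, kernel_size, stride).
    Returns (name, receptive field in input pixels) after each layer."""
    r, jump = 1, 1   # a raw pixel sees itself; adjacent pixels are 1 apart
    result = []
    for name, k, s in layers:
        r = r + (k - 1) * jump   # the kernel widens the view by (k - 1) steps
        jump = jump * s          # striding spreads later units further apart
        result.append((name, r))
    return result

# Hypothetical stack: 3x3 convolutions (stride 1) and 2x2 pooling (stride 2).
layers = [("conv1", 3, 1), ("pool1", 2, 2), ("conv2", 3, 1),
          ("pool2", 2, 2), ("conv3", 3, 1)]
for name, r in receptive_fields(layers):
    print(f"{name}: {r}x{r} pixels")
# conv1: 3x3, pool1: 4x4, conv2: 8x8, pool2: 10x10, conv3: 18x18
```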

A Deeper Look at Convolutional Neural Networks: Training
[Diagram: Convolution -> Pooling -> Convolution -> Pooling -> Fully Connected -> predictions: Cat (0.55), Bird (0.02), Car (0.01), Dog (0.42); the high Cat score is the error]
These models have huge numbers of parameters that need to be tuned: all of the convolutional filters, and all of the connections between the fully connected layers.
All of these parameters are trained using backpropagation with gradient descent.
Pre-labelled images are fed through the network, and at the end of the network a loss function measures how wrong the prediction was.
This error is then back-propagated through the network, slightly adjusting all the filter weights to make them more correct for that example.
Over time, this minimises the loss function across a large dataset of labelled examples.
Training needs many examples (thousands, or better still millions) and takes a long time (days or even weeks) with heavy use of GPU resources.
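The loss-then-update loop can be sketched with cross-entropy, a common loss for classification. The slide does not name a specific loss. The first part scores the slide's prediction and assumes the labelled image is a dog, as in the earlier slides. The second part trains a one-layer softmax classifier on synthetic data with gradient descent. With a single layer, backpropagation reduces to one chain-rule step: the gradient with respect to the logits is (probabilities - one-hot label). In a CNN, the same signal would be passed further back through every layer. The data, learning rate and step count are assumptions.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)     # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probs, labels):
    """Mean of -log(probability assigned to the correct class)."""
    return -np.mean(np.log(probs[np.arange(len(labels)), labels]))

# Slide prediction (Cat, Bird, Car, Dog), assuming the true class is Dog (index 3).
slide_probs = np.array([[0.55, 0.02, 0.01, 0.42]])
print(round(cross_entropy(slide_probs, np.array([3])), 3))   # 0.868

# Toy training: a linear softmax classifier on synthetic "features".
rng = np.random.default_rng(0)
n, d, c = 200, 10, 4
X = rng.standard_normal((n, d))
y = (X @ rng.standard_normal((d, c))).argmax(axis=1)   # labels from a hidden linear rule

W = np.zeros((d, c))
lr = 0.5
losses = []
for step in range(100):
    probs = softmax(X @ W)
    losses.append(cross_entropy(probs, y))
    grad_logits = probs.copy()
    grad_logits[np.arange(n), y] -= 1        # d(loss)/d(logits) = probs - one_hot
    grad_W = X.T @ grad_logits / n           # chain rule back to the weights
    W -= lr * grad_W                         # gradient descent step
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```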

So How Good Are They at Classification?
Image adapted from: https://www.dsiac.org/resources/journals/dsiac/winter-2017-volume-4-number-1/real-time-situ-intelligent-video-analytics

Deep Learning Applications
Presented by Adrian Johnston

Object Detection
Rather than just classifying images as Car or Road, we can train the neural network to predict bounding boxes for the objects of interest.
State of the art: Faster R-CNN, SSD, YOLO9000
Image: https://shaoanlu.wordpress.com/2017/05/07/vihicle-detection-using-ssd-on-floybhub-udacity-self-driving-car-nano-degree/
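Detectors that predict bounding boxes usually compare boxes by intersection-over-union (IoU). They also typically prune overlapping predictions with non-maximum suppression (NMS) at inference time. The plain-Python sketch below shows both. The (x_min, y_min, x_max, y_max) box format and the 0.5 threshold are assumptions. This simple greedy NMS is illustrative and is not the exact post-processing of Faster R-CNN, SSD or YOLO9000.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ix_min, iy_min = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix_max, iy_max = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes overlapping it
    by more than `threshold` IoU, repeat. Returns indices of kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= threshold for k in keep):
            keep.append(i)
    return keep

print(round(iou((0, 0, 2, 2), (1, 1, 3, 3)), 3))   # 0.143 (overlap 1, union 7)

boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))   # [0, 2]: box 1 overlaps box 0 with IoU 0.81
```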

Semantic Segmentation
[Diagram: image -> Convolutional Layers -> Pixel Softmax -> Pixel Labels]
We can also perform semantic segmentation: train the network to classify each pixel in the image, separating regions into semantic classes, e.g. Road, Car, Sky, Person.
State of the art: FCN, SegNet, RefineNet, DeepLabv3, PSPNet
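The "Pixel Softmax -> Pixel Labels" step applies the same softmax used for image classification independently at every pixel. This NumPy sketch assumes the convolutional layers have already produced a (num_classes, H, W) array of scores. The tiny hand-set 2x2 scores and the class order are illustrative. Training would typically minimise the per-pixel cross-entropy shown here.

```python
import numpy as np

CLASSES = ["Road", "Car", "Sky", "Person"]

def pixel_softmax(logits):
    """logits: (num_classes, H, W). Softmax over the class axis at every pixel."""
    z = logits - logits.max(axis=0, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=0, keepdims=True)

def pixel_cross_entropy(probs, target):
    """Mean over pixels of -log p(correct class); target is an (H, W) index map."""
    h, w = target.shape
    rows, cols = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    return -np.mean(np.log(probs[target, rows, cols]))

# Hand-set scores for a 2x2 image: Sky top-left, Car top-right, Road along the bottom.
logits = np.zeros((len(CLASSES), 2, 2))
logits[2, 0, 0] = 5.0
logits[1, 0, 1] = 5.0
logits[0, 1, :] = 5.0

probs = pixel_softmax(logits)
labels = probs.argmax(axis=0)                 # per-pixel class indices
print([[CLASSES[k] for k in row] for row in labels])
print(round(pixel_cross_entropy(probs, labels), 4))
```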

Instance Segmentation
[Diagram: image -> Convolutional Layers -> Pixel Softmax -> Pixel Labels]
Instance segmentation classifies pixels into specific instances of a category (e.g. Car 1, Car 2) rather than just the semantic category.
One way to do this is to combine object detection with semantic segmentation: Mask R-CNN.

Instance Segmentation Image : Mask R-CNN, https://arxiv.org/pdf/1703.06870.pdf
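The following toy NumPy sketch illustrates the idea of combining detection with segmentation. It is not Mask R-CNN, which predicts a separate mask for each detected box inside the network. The hypothetical helper `instances_from_boxes` takes a semantic map, in which both cars share one label, and assigns each car pixel inside a detected box that box's instance id. The box format (x_min, y_min, x_max, y_max, exclusive) and the data are assumptions.

```python
import numpy as np

def instances_from_boxes(semantic, boxes, class_id):
    """Hypothetical helper: pixels of `class_id` inside detected box n get
    instance id n (1, 2, ...). Pixels in overlapping boxes keep the first id.
    0 means background / no instance."""
    instance_map = np.zeros(semantic.shape, dtype=int)
    for inst_id, (x0, y0, x1, y1) in enumerate(boxes, start=1):
        region = (slice(y0, y1), slice(x0, x1))
        mask = (semantic[region] == class_id) & (instance_map[region] == 0)
        instance_map[region][mask] = inst_id   # the slice is a view, so this writes through
    return instance_map

CAR = 1
semantic = np.array([[0, 1, 1, 0, 1, 1],       # semantic labels: both cars are "Car"
                     [0, 1, 1, 0, 1, 1],
                     [0, 0, 0, 0, 0, 0],
                     [0, 0, 0, 0, 0, 0]])
boxes = [(1, 0, 3, 2), (4, 0, 6, 2)]           # two detected car boxes
print(instances_from_boxes(semantic, boxes, CAR))
# [[0 1 1 0 2 2]
#  [0 1 1 0 2 2]
#  [0 0 0 0 0 0]
#  [0 0 0 0 0 0]]
```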

Depth Estimation
We don't always want to classify things.
Depth regression: predict the (continuous) depth of every pixel in the image.
Supervised: capture ground-truth depth from sensors (Microsoft Kinect, Lidar, a stereo/multi-camera rig), then train the network to minimise the distance between the predicted depth and the ground-truth depth from the sensor data.
Unsupervised, using geometry: train the network to predict depth from a video or stereo images with known or predicted camera pose. This is more difficult, but the network can be trained without ground-truth depth.
Image: https://github.com/tinghuiz/sfmlearner/blob/master/misc/cityscapes_sample_results.gif
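Two pieces of this slide can be made concrete in NumPy. The first is a supervised loss that measures the distance between predicted and sensor depth only where the sensor returned a value. It uses L1 distance, which is one common choice, and assumes that missing readings are stored as 0, although conventions vary. The second is the rectified-stereo relation depth = focal length x baseline / disparity, which is the kind of geometry that self-supervised stereo methods build on. The focal length, baseline and values are illustrative.

```python
import numpy as np

def masked_l1_depth_loss(pred, gt):
    """Mean absolute depth error over pixels with valid ground truth
    (assumes missing sensor readings are stored as 0)."""
    valid = gt > 0
    if not valid.any():
        return 0.0
    return float(np.abs(pred[valid] - gt[valid]).mean())

def depth_from_disparity(disparity, focal_px, baseline_m):
    """Rectified stereo: depth = focal_length * baseline / disparity
    (zero disparity means a point at infinity)."""
    disparity = np.asarray(disparity, dtype=float)
    with np.errstate(divide="ignore"):
        return np.where(disparity > 0, focal_px * baseline_m / disparity, np.inf)

pred = np.array([[2.0, 3.0], [4.0, 10.0]])      # predicted depth in metres
gt = np.array([[2.5, 3.0], [0.0, 9.0]])         # 0.0 = no sensor reading
print(masked_l1_depth_loss(pred, gt))           # 0.5 (errors 0.5, 0.0, 1.0)

print(depth_from_disparity([70, 35, 0], focal_px=700, baseline_m=0.5))
# [ 5. 10. inf]
```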

Simple Self-Driving Car in GTA V
Image: https://github.com/sentdex/pygta5/blob/master/self-driving-car-grand-theft-auto-5.gif

Imitation Learning on Real Data
Image: https://github.com/commaai/research/blob/master/images/selfsteer.gif

Generative Adversarial Network (GAN)
[Diagram: Generator Network produces images; Discriminator Network classifies its input as 1: Real Image or 0: Fake Image]
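The 1 (real) / 0 (fake) labels in the diagram correspond to binary cross-entropy objectives for the two networks. This plain-Python sketch computes both losses from discriminator outputs. The generator loss uses the commonly used non-saturating form, in which the generator tries to get its fakes labelled 1. The example probabilities are made up. In training, the two losses are minimised alternately, each with respect to its own network's parameters.

```python
import math

def bce(p, target):
    """Binary cross-entropy for a predicted probability p and a 0/1 target."""
    eps = 1e-12   # avoids log(0)
    return -(target * math.log(p + eps) + (1 - target) * math.log(1 - p + eps))

def discriminator_loss(d_real, d_fake):
    # The discriminator should output 1 for real images and 0 for fakes.
    return bce(d_real, 1) + bce(d_fake, 0)

def generator_loss(d_fake):
    # Non-saturating generator loss: push the discriminator towards 1 on fakes.
    return bce(d_fake, 1)

# Example: D(real image) = 0.9, D(generated image) = 0.2
print(round(discriminator_loss(0.9, 0.2), 4))   # 0.3285
print(round(generator_loss(0.2), 4))            # 1.6094
```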

Conditional GAN
[Diagram: Generator Network and Discriminator Network, as in the GAN diagram; the discriminator outputs 1: Real Image or 0: Fake Image]

Conditional GAN
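A conditional GAN gives extra information to both networks, so that the generator can be asked for a particular kind of output. The slides do not say what the condition is: it could be a class label or an input image. This NumPy sketch shows one common scheme for a class label. The label is one-hot encoded and appended to the generator's noise vector, and it is also appended to the discriminator's image input as constant channels. The sizes (100-d noise, 10 classes, 3x32x32 images) are assumptions.

```python
import numpy as np

def one_hot(label, num_classes):
    v = np.zeros(num_classes)
    v[label] = 1.0
    return v

def generator_input(noise, label, num_classes):
    """Condition the generator: noise vector followed by the one-hot class."""
    return np.concatenate([noise, one_hot(label, num_classes)])

def discriminator_input(image, label, num_classes):
    """Condition the discriminator: append one constant channel per class
    (all ones for the given class, zeros otherwise) to a (C, H, W) image."""
    _, h, w = image.shape
    planes = np.broadcast_to(one_hot(label, num_classes)[:, None, None], (num_classes, h, w))
    return np.concatenate([image, planes], axis=0)

rng = np.random.default_rng(0)
z = rng.standard_normal(100)
print(generator_input(z, 3, 10).shape)                         # (110,)
print(discriminator_input(np.zeros((3, 32, 32)), 3, 10).shape)  # (13, 32, 32)
```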

So is Computer Vision a solved problem?

Is Computer Vision a Solved Problem?
No! Lots of challenges still remain:
High-level reasoning, e.g. understanding how other drivers behave on the road.
Interpretability: how do we interpret the decisions made by an AI system? This is useful after an accident, for example.
Uncertainty estimation: teaching our models to understand what they don't know.
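A very crude starting point for uncertainty is the entropy of the predicted class distribution. It is 0 when all probability sits on one class and log(number of classes) when the distribution is uniform. This plain-Python sketch compares the two predictions from the earlier CNN slides with a uniform guess. Softmax confidence is known to be unreliable, so this only illustrates the idea. It does not replace the dedicated uncertainty-estimation methods this slide refers to.

```python
import math

def predictive_entropy(probs):
    """Entropy (in nats) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

confident = [0.15, 0.02, 0.01, 0.82]   # Cat, Bird, Car, Dog (structure slide)
torn = [0.55, 0.02, 0.01, 0.42]        # training slide: Cat vs Dog
uniform = [0.25] * 4                   # no idea at all

for name, probs in [("confident", confident), ("torn", torn), ("uniform", uniform)]:
    print(f"{name}: {predictive_entropy(probs):.3f}")
```

A higher value means a less certain prediction. A network can still be confidently wrong, however, and this measure alone would not detect that.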

Other Challenges
Data: these models are data hungry and need thousands of examples. We have lots of data, but it is still not enough in many domains. We need improved algorithms that can learn from smaller amounts of data.
Compute resources: these models use immense amounts of computing resources, such as Graphics Processing Units (GPUs) and other hardware.

Thanks for Listening!