Semantic Segmentation for Driving Scenarios: On Virtual Worlds and Embedded Platforms. German Ros


Semantic Segmentation for Driving Scenarios: On Virtual Worlds and Embedded Platforms German Ros gros@cvc.uab.es

Contents About myself Understanding Driving Scenes Hungry for Data: MDRS3, SYNTHIA & Beyond On Training Methods & Knowledge Transfer Closing Remarks 2

About myself

About myself I come from Barcelona, Spain Finishing a PhD in Computer Science at the Autonomous Univ. of Barcelona Background in: Computer Vision, Machine Learning, Deep Learning, Autonomous systems, Optimization 4

About myself Predoctoral researcher at Computer Vision Center ~150 researchers (PhDs + students) 8+ research groups: Advanced Driver Assistance Systems (ADAS) Color In Context (CIC) Document Analysis (DAG) Human Pose Recovery and Behavior Analysis (HuPBA) Image Sequence Evaluation (ISE) Learning and Machine Perception Team (LAMP) Machine Vision (MV) Object Recognition (OR) 5

About myself Advanced Driver Assistance Systems Group 10 PhDs (researchers & assoc. prof.) 7 PhD students 5 MSc students 10 bachelor students Working on: Autonomous cars Object detection Domain Adaptation Semantic segmentation Deep learning Visual SLAM GPU optimization Synthetic Worlds 6

About myself When I'm not at CVC 7

About myself My research focus(es) Perception for Intelligent Vehicles Visual localization and mapping Semantic segmentation Change detection Machine (Deep) Learning Synthetic environments Boosting training methods Domain adaptation Task/Knowledge transference Applied Mathematics Manifold optimization Compressed regression Robust decompositions (R-PCA) 8

Understanding Driving Scenes

Understanding Driving Scenes The importance of AD for society Reduce accidents to ~0% Decrease congestion in urban areas Reduce emissions Improve road usage efficiency Increase human efficiency (more time) 10

Understanding Driving Scenes Three fundamental approaches Mapping & Retrieval Scene Understanding End-to-end driving 11

Understanding Driving Scenes Three fundamental approaches Mapping & Retrieval Scene Understanding End-to-end driving 12

Understanding Driving Scenes: Semantic Segmentation Definition of Semantic Segmentation Given an image I ∈ R^(H×W) and a collection of classes L = {L_1, …, L_K}, produce a map M: I → L^(H×W) An important step towards full Scene Understanding 13
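This mapping can be made concrete with a minimal NumPy sketch (random scores stand in for a real model's output; the dimensions are illustrative): the label map M is the per-pixel argmax over K class scores.

```python
import numpy as np

# Toy dimensions: an H x W image and K classes L_1, ..., L_K.
H, W, K = 4, 6, 3
rng = np.random.default_rng(0)

# Stand-in for a model's per-pixel class scores over an image I.
scores = rng.random((H, W, K))

# The semantic segmentation map M: I -> L^(HxW) labels every pixel.
M = scores.argmax(axis=-1)

print(M.shape)  # (4, 6): one class label per pixel
```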

Understanding Driving Scenes: Semantic Segmentation How can we create the map M? Traditionally: Feature crafting: SIFT, HOG, Histograms of colors, Textons Mid-level representations: Spatial-pyramids, SIFTflow, etc. Pixel-wise classifiers: SVM, Random Forest, Logistic regression Structure prediction: MRF and CRF Currently: Artificial Neural Networks: CNNs, DeconvNets, RNNs, etc. Deep Learning Learn a hierarchy of representations 14
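A toy version of the traditional pixel-wise classification stage (a sketch only: the nearest-class-mean rule on raw colour features stands in for the SVM / Random Forest / logistic-regression classifiers named above, and the class means are made up):

```python
import numpy as np

# Hand-picked colour prototypes standing in for trained class models.
class_means = np.array([[200.0, 50.0, 50.0],   # hypothetical "building"
                        [50.0, 200.0, 50.0],   # hypothetical "vegetation"
                        [50.0, 50.0, 200.0]])  # hypothetical "sky"

rng = np.random.default_rng(1)
H, W = 3, 4
image = rng.integers(0, 256, size=(H, W, 3)).astype(float)

# Distance of every pixel feature to every class mean; label = nearest.
dists = np.linalg.norm(image[:, :, None, :] - class_means, axis=-1)
labels = dists.argmin(axis=-1)
```

A real pipeline would replace the raw colours with crafted features (SIFT, HOG, Textons) and the nearest-mean rule with a learned classifier, but the per-pixel decision structure is the same.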

Understanding Driving Scenes: Deep Semantic Segmentation Deep Learning: feed input-output data and (hopefully) get a correlation (diagram: Training data → Deep Learning → Model) Nowadays this correlation procedure requires a high volume of annotated data Do we have enough data for driving scenes? 15

Hungry for Data: MDRS3, SYNTHIA & Beyond

Hungry for Data: MDRS3, SYNTHIA & Beyond Multi-Domain Road Scene Semantic Segmentation (MDRS3) More than 31,000 images 17

Hungry for Data: MDRS3, SYNTHIA & Beyond Multi-Domain Road Scene Semantic Segmentation (MDRS3) More than 31,000 images 18

Hungry for Data: MDRS3, SYNTHIA & Beyond Multi-Domain Road Scene Semantic Segmentation (MDRS3) Testing set examples 19

Hungry for Data: MDRS3, SYNTHIA & Beyond Introducing SYNTHIA

Hungry for Data: MDRS3, SYNTHIA & Beyond A promising alternative: SYNTHIA 21

Hungry for Data: MDRS3, SYNTHIA & Beyond A promising alternative: SYNTHIA Realistic simulation of driving environments: rural, highway, city, etc. Different seasons and lighting conditions Dynamic objects and high variability Multiple sensors: omni-cameras, depth sensors, lasers Automatic ground truth for semantic segmentation, depth estimation, localization & mapping Easy to extend 22

Hungry for Data: MDRS3, SYNTHIA & Beyond SYNTHIA: Multiple seasons 23

Hungry for Data: MDRS3, SYNTHIA & Beyond SYNTHIA: Multiple sensors 24

Hungry for Data: MDRS3, SYNTHIA & Beyond SYNTHIA: Dynamic objects 25

Hungry for Data: MDRS3, SYNTHIA & Beyond SYNTHIA: Automatic ground truth 26

Hungry for Data: MDRS3, SYNTHIA & Beyond YouTube, StreetView and Crowdsourcing

Hungry for Data: MDRS3, SYNTHIA & Beyond The YouTube Driving Collection YouTube videos from: Morocco, China, Japan, India and Australia Very challenging conditions: noisy, different optics, reflections Used for qualitative evaluation How well do models generalize? 28

Hungry for Data: MDRS3, SYNTHIA & Beyond Google Global StreetView Driving Collection 182 countries All types of roads 60,000 images 29

Hungry for Data: MDRS3, SYNTHIA & Beyond Google Global StreetView Driving Collection How do we label it? Vertical crowdsourcing 33

On Training Methods & Knowledge Transfer

On Training Methods & Knowledge Transfer Defining a suitable model for Semantic Segmentation Target Net (T-Net) 1.4 M parameters, very compact Efficient (real-time) Suitable for embedded context 35

On Training Methods & Knowledge Transfer Defining a suitable model for Semantic Segmentation Results on the MDRS3 domain Training on the dense domain, then using the testing domain for verification However, T-Net trained in the standard way (end-to-end) is not competitive Alternatives are required 36

On Training Methods & Knowledge Transfer Defining a suitable model for Semantic Segmentation FCN 134 M parameters 500 MB SRAM Not suitable for embedded context Good as a reference for T-Net 37

On Training Methods & Knowledge Transfer Defining a suitable model for Semantic Segmentation Results on the MDRS3 domain Training on the dense + sparse domain, then using the testing domain for verification Standard end-to-end training with Adam diverges Further *control* is needed 38

On Training Methods & Knowledge Transfer Alternative training approaches Domain Adaptation by Data Projection: data from the sparse domain is injected into random images of the dense domain, creating a unique domain (a sophisticated method of data augmentation) Balanced Gradient Contribution: elements from both domains are mixed in a controlled fashion, using the dense domain for stability and the large variability of the sparse domain as a regularizer Ensemble of modalities: different networks are specialized on a domain, then combined with residual blocks to produce a unified model 39

On Training Methods & Knowledge Transfer Domain Adaptation by Data Projection (FlyingCars) Sparse-domain masks are used to crop RGB data This data is injected (after transforming) into a random background Main advantage: gradient directions become more stable and informative 40
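A rough NumPy sketch of this projection step (the function and array names are illustrative, not the talk's implementation): a masked sparse-domain object crop is pasted into a dense-domain background, moving only the masked pixels.

```python
import numpy as np

def project_object(background, obj_rgb, obj_mask, top, left):
    """Paste a masked sparse-domain object crop into a dense-domain
    background (FlyingCars-style augmentation); only masked pixels move."""
    out = background.copy()
    h, w = obj_mask.shape
    patch = out[top:top + h, left:left + w]  # view into the copy
    patch[obj_mask] = obj_rgb[obj_mask]      # overwrite masked pixels only
    return out

# Toy data: a grey background and a 2x2 red "car" with an L-shaped mask.
bg = np.full((6, 8, 3), 128, dtype=np.uint8)
car = np.zeros((2, 2, 3), dtype=np.uint8)
car[..., 0] = 255
mask = np.array([[True, False],
                 [True, True]])

aug = project_object(bg, car, mask, top=2, left=3)
```

In practice the crop would also be transformed (scaled, colour-adjusted) before injection, as the slide notes.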

On Training Methods & Knowledge Transfer Balanced Gradient Contribution (BGC) Data from the sparse domain is more informative but noisy Let's then add it as a regularizer of our cost function 41
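A minimal sketch of the BGC cost function under that reading (the weight `lam` and all names are illustrative, not the talk's values): the dense domain drives the loss while the sparse domain enters as a down-weighted regularizing term.

```python
import numpy as np

def cross_entropy(probs, labels):
    """Mean cross-entropy of predicted class probabilities vs. labels."""
    return float(-np.mean(np.log(probs[np.arange(labels.size), labels] + 1e-12)))

def bgc_loss(probs_dense, y_dense, probs_sparse, y_sparse, lam=0.3):
    """Balanced Gradient Contribution: the dense domain for stability,
    the sparse domain as a regularizer weighted by lam (illustrative)."""
    return (cross_entropy(probs_dense, y_dense)
            + lam * cross_entropy(probs_sparse, y_sparse))

# Toy per-pixel predictions over 3 classes for each domain.
p_dense = np.array([[0.7, 0.2, 0.1],
                    [0.1, 0.8, 0.1],
                    [0.6, 0.3, 0.1]])
p_sparse = np.array([[0.5, 0.3, 0.2],
                     [0.3, 0.4, 0.3]])
loss = bgc_loss(p_dense, np.array([0, 1, 0]),
                p_sparse, np.array([0, 1]), lam=0.3)
```

In a training loop the same proportion would typically be enforced per mini-batch, so both terms contribute gradients at every step.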

On Training Methods & Knowledge Transfer Ensemble of modalities (S-Net) Dense Net trained on MDRS3 dense Sparse Net trained on MDRS3 sparse Great recognition performance 269 M parameters >1000 MB SRAM Not suitable for embedded context (automotive chips) 42

On Training Methods & Knowledge Transfer Results of the new training methods on MDRS3 FCN produces better results than T-Net in all its modalities But it is too big for embedded systems Can we reach a trade-off? 43

On Training Methods & Knowledge Transfer Training by Knowledge Transference The original optimization space is too complex for a shallow model when training from ground-truth data (noisy!) It is easier to optimize a deeper net: Source Net (S-Net) Then use the refined/simplified output of S-Net to train a new (compact) Target Net 44

On Training Methods & Knowledge Transfer Training by Knowledge Transference We designed three approaches following this general idea Transferring knowledge through labels TK-L Transferring knowledge through softmax probabilities TK-SMP Transferring knowledge through softmax with weighted cross-entropy TK-SMP-WCR 45

On Training Methods & Knowledge Transfer Training by Knowledge Transference TK-L: original data and its ground truth are ignored Knowledge is distilled from S-Net directly through predicted labels Dense and sparse modalities are used along with further data from Google Street View to better mimic the behaviour of S-Net TK-SMP: knowledge is distilled from the probability distributions of S-Net This soft assignment is very informative and simplifies the training Cross-entropy is used as the transference loss function TK-SMP-WCR: the previous approach is extended to balance the different distributions according to their importance Weighted cross-entropy is used to this end 46
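The TK-SMP loss can be sketched in a few lines of NumPy (a toy version, not the talk's code): the compact net is trained against S-Net's softmax outputs with cross-entropy, and optional per-class weights turn it into TK-SMP-WCR.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def tk_smp_loss(student_logits, teacher_probs, class_weights=None):
    """Cross-entropy against S-Net soft assignments (TK-SMP); optional
    per-class weights give the weighted variant (TK-SMP-WCR)."""
    log_q = np.log(softmax(student_logits) + 1e-12)
    ce = -(teacher_probs * log_q)            # per-pixel, per-class terms
    if class_weights is not None:
        ce = ce * class_weights              # re-balance class contributions
    return float(ce.sum(axis=-1).mean())

teacher = np.array([[0.9, 0.05, 0.05],       # S-Net soft assignments
                    [0.1, 0.10, 0.80]])
student_logits = np.array([[2.0, 0.1, 0.1],  # compact net's raw outputs
                           [0.0, 0.0, 1.5]])
plain = tk_smp_loss(student_logits, teacher)
weighted = tk_smp_loss(student_logits, teacher, np.array([1.0, 2.0, 2.0]))
```

TK-L is the degenerate case where `teacher_probs` is replaced by one-hot vectors of S-Net's predicted labels.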

On Training Methods & Knowledge Transfer Training by Knowledge Transference Results on the MDRS3 test set Adding raw random data from Google Street View 47

On Training Methods & Knowledge Transfer Quantitative results & examples Better performance than an FCN Performance comparable to the Ensemble 0.5% of the memory footprint w.r.t. the Ensemble 48

On Training Methods & Knowledge Transfer Qualitative results 49

On Training Methods & Knowledge Transfer Results on the YouTube Driving Collection How do models generalize? FCN vs T-Net 50

On Training Methods & Knowledge Transfer Improving Architectures

On Training Methods & Knowledge Transfer Improving the T-Net: SMART-Net 0.6 M parameters 700 KB SRAM 10x more efficient Perfect for embedded context Visconti (1 MB of SRAM) 52

On Training Methods & Knowledge Transfer SMART-Net (New Target Net) Results on MDRS3 53

On Training Methods & Knowledge Transfer FCN-BNDrop 134 M parameters 500 MB SRAM 54

On Training Methods & Knowledge Transfer New FCN-Ensemble Dense FCN trained on MDRS3 dense Sparse FCN trained on MDRS3 sparse Dense FCN-BNDrop trained on MDRS3 Great recognition performance 403 M parameters >1600 MB SRAM Not suitable for embedded context 55

On Training Methods & Knowledge Transfer Results for FCN-BND and the FCN-BND Ensemble on MDRS3 56

On Training Methods & Knowledge Transfer Results on the YouTube Driving Collection How do models generalize? FCN-BND Ensemble 57

On Training Methods & Knowledge Transfer Working with Synthetic Data

On Training Methods & Knowledge Transfer Training using SYNTHIA From MDRS3 (dense domains) we create training/validation splits T-Net and FCN are evaluated for each domain SYNTHIA is used to help training Training uses BGC to mix real and synthetic data (30% vs 70%) 59
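The 30% real / 70% synthetic mixing above can be sketched as a simple batch sampler (the function name, batch size, and dataset sizes are illustrative):

```python
import numpy as np

def mixed_batch(n_real, n_synth, batch_size=10, real_frac=0.3, seed=0):
    """Draw one mini-batch of indices mixing real and synthetic samples
    in a fixed proportion (30% real vs 70% synthetic, as on the slide)."""
    rng = np.random.default_rng(seed)
    k_real = round(batch_size * real_frac)
    real_idx = rng.choice(n_real, size=k_real, replace=False)
    synth_idx = rng.choice(n_synth, size=batch_size - k_real, replace=False)
    return real_idx, synth_idx

real_idx, synth_idx = mixed_batch(n_real=1000, n_synth=9000)
```

Fixing the proportion per batch (rather than sampling from the pooled data) is what keeps the gradient contribution of each domain balanced at every step.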

On Training Methods & Knowledge Transfer Training using SYNTHIA T-Net and FCN on MDRS3 (dense domains) 60

On Training Methods & Knowledge Transfer Training using SYNTHIA T-Net and FCN on MDRS3 (dense domains) 61

On Training Methods & Knowledge Transfer Training using SYNTHIA T-Net on the CamVid domain 62

On Training Methods & Knowledge Transfer Training using SYNTHIA FCN on the YouTube Driving Collection 63

Closing Remarks

Closing Remarks Conclusions ML engines are data driven & require large data volumes Dealing with heterogeneous sources of data becomes a requirement The training method needs to be aware of the different data sources to unlock its full potential Transferring knowledge seems to be the most effective way of producing models with a low memory footprint and high recognition capabilities It can be combined with orthogonal methods such as pruning, dictionaries, etc. Synthetic data is currently realistic enough to boost the accuracy and generalization of semantic segmentation Domain adaptation of synthetic data can be achieved painlessly using BGC Further efforts should go into gathering and labelling new data worldwide for proper evaluation 68

Acknowledgments 69

Thank you! Questions?