Semantic Segmentation for Driving Scenarios: On Virtual Worlds and Embedded Platforms German Ros gros@cvc.uab.es
Contents About myself Understanding Driving Scenes Hungry for Data: MDRS3, SYNTHIA & Beyond On Training Methods & Knowledge Transfer Closing Remarks
About myself
About myself I come from Barcelona, Spain. Finishing a PhD in Computer Science at the Autonomous University of Barcelona. Background in: Computer Vision, Machine Learning, Deep Learning, Autonomous Systems, Optimization
About myself Predoctoral researcher at the Computer Vision Center: ~150 researchers (PhDs + students), 8+ research groups: Advanced Driver Assistance Systems (ADAS), Color in Context (CIC), Document Analysis (DAG), Human Pose Recovery and Behavior Analysis (HuPBA), Image Sequence Evaluation (ISE), Learning and Machine Perception Team (LAMP), Machine Vision (MV), Object Recognition (OR)
About myself Advanced Driver Assistance Systems Group: 10 PhDs (researchers & assoc. profs.), 7 PhD students, 5 MSc students, 10 bachelor students. Working on: autonomous cars, object detection, domain adaptation, semantic segmentation, deep learning, visual SLAM, GPU optimization, synthetic worlds
About myself When I'm not at CVC
About myself My research focus(es): Perception for Intelligent Vehicles (visual localization and mapping, semantic segmentation, change detection), Machine (Deep) Learning (synthetic environments, boosting training methods, domain adaptation, task/knowledge transfer), Applied Mathematics (manifold optimization, compressed regression, robust decompositions such as R-PCA)
Understanding Driving Scenes
Understanding Driving Scenes The importance of AD for society: reduce accidents to ~0%, decrease congestion in urban areas, reduce emissions, improve road usage efficiency, increase human efficiency (more free time)
Understanding Driving Scenes Three fundamental approaches: Mapping & Retrieval, Scene Understanding, End-to-end driving
Understanding Driving Scenes: Semantic Segmentation Definition of Semantic Segmentation: given an image I ∈ R^(H×W) and a collection of classes L = {L_1, …, L_K}, produce a map M: I → L^(H×W). An important step towards full Scene Understanding
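Not from the talk, but as a minimal sketch of this mapping: a segmentation model produces per-class scores at every pixel, and the map M assigns each pixel its highest-scoring label. All shapes here are toy values.

```python
import numpy as np

H, W, K = 4, 6, 3                      # image height/width, number of classes
rng = np.random.default_rng(0)
scores = rng.random((K, H, W))         # hypothetical per-class score maps

# M: each pixel -> a label in {0, ..., K-1}
semantic_map = scores.argmax(axis=0)
assert semantic_map.shape == (H, W)
assert semantic_map.min() >= 0 and semantic_map.max() < K
```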
Understanding Driving Scenes: Semantic Segmentation How can we create the map M? Traditionally: feature crafting (SIFT, HOG, color histograms, Textons), mid-level representations (spatial pyramids, SIFT Flow, etc.), pixel-wise classifiers (SVM, Random Forest, logistic regression), structured prediction (MRF and CRF). Currently: artificial neural networks (CNNs, DeconvNets, RNNs, etc.), i.e. Deep Learning, which learns a hierarchy of representations
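A hedged sketch of the classical pipeline, not the talk's implementation: hand-crafted per-pixel features (plain RGB here, standing in for SIFT/HOG/Textons) fed to a simple pixel-wise classifier (nearest class mean, standing in for an SVM or Random Forest). The class prototypes are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
H, W = 8, 8
image = rng.random((H, W, 3))                   # toy RGB image
class_means = np.array([[0.2, 0.2, 0.2],        # hypothetical "road" prototype
                        [0.8, 0.8, 0.8]])       # hypothetical "sky" prototype

feats = image.reshape(-1, 3)                    # per-pixel feature vectors
# distance of every pixel's feature to every class prototype
dists = np.linalg.norm(feats[:, None, :] - class_means[None, :, :], axis=2)
labels = dists.argmin(axis=1).reshape(H, W)     # pixel-wise prediction
assert labels.shape == (H, W)
```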
Understanding Driving Scenes: Deep Semantic Segmentation Deep Learning: feed input-output data and (hopefully) learn the correlation (Training data → Model). Nowadays this procedure requires a high volume of annotated data. Do we have enough data for driving scenes?
Hungry for Data: MDRS3, SYNTHIA & Beyond
Hungry for Data: MDRS3, SYNTHIA & Beyond Multi-Domain Road Scene Semantic Segmentation (MDRS3) More than 31,000 images
Hungry for Data: MDRS3, SYNTHIA & Beyond Multi-Domain Road Scene Semantic Segmentation (MDRS3) Testing set examples
Hungry for Data: MDRS3, SYNTHIA & Beyond Introducing SYNTHIA
Hungry for Data: MDRS3, SYNTHIA & Beyond A promising alternative: SYNTHIA
Hungry for Data: MDRS3, SYNTHIA & Beyond A promising alternative: SYNTHIA. Realistic simulation of driving environments (rural, highway, city, etc.), different seasons and lighting conditions, dynamic objects and high variability, multiple sensors (omni-cameras, depth sensors, lasers), automatic ground truth for semantic segmentation, depth estimation, localization & mapping. Easy to extend
Hungry for Data: MDRS3, SYNTHIA & Beyond SYNTHIA: Multiple seasons
Hungry for Data: MDRS3, SYNTHIA & Beyond SYNTHIA: Multiple sensors
Hungry for Data: MDRS3, SYNTHIA & Beyond SYNTHIA: Dynamic objects
Hungry for Data: MDRS3, SYNTHIA & Beyond SYNTHIA: Automatic ground truth
Hungry for Data: MDRS3, SYNTHIA & Beyond YouTube, StreetView and Crowdsourcing
Hungry for Data: MDRS3, SYNTHIA & Beyond The YouTube Driving Collection: YouTube videos from Morocco, China, Japan, India and Australia. Very challenging conditions: noisy, different optics, reflections. Used for qualitative evaluation: how well do models generalize?
Hungry for Data: MDRS3, SYNTHIA & Beyond Google Global StreetView Driving Collection: 182 countries, all types of roads, 60,000 images
Hungry for Data: MDRS3, SYNTHIA & Beyond Google Global StreetView Driving Collection: how do we label it? Vertical crowdsourcing
On Training Methods & Knowledge Transfer
On Training Methods & Knowledge Transfer Defining a suitable model for Semantic Segmentation: Target Net (T-Net), 1.4M parameters (very compact), efficient (real-time), suitable for embedded contexts
On Training Methods & Knowledge Transfer Defining a suitable model for Semantic Segmentation: results on the MDRS3 domain. Trained on the dense domain, then evaluated on the test domain for verification. However, a T-Net trained in the standard way (end-to-end) is not competitive; alternatives are required
On Training Methods & Knowledge Transfer Defining a suitable model for Semantic Segmentation: FCN, 134M parameters, 500 MB SRAM. Not suitable for embedded contexts, but a good reference for T-Net
On Training Methods & Knowledge Transfer Defining a suitable model for Semantic Segmentation: results on the MDRS3 domain. Trained on the dense + sparse domains, then evaluated on the test domain for verification. Standard end-to-end training with Adam diverges; further *control* is needed
On Training Methods & Knowledge Transfer Alternative training approaches. Domain Adaptation by data projection: data from the sparse domain is injected into random images of the dense domain, creating a single unified domain (a sophisticated form of data augmentation). Balanced Gradient Contribution: elements from both domains are mixed in a controlled fashion, using the dense domain for stability and the large variability of the sparse domain as a regularizer. Ensemble of modalities: different networks are specialized on each domain, then combined with residual blocks to produce a unified model
On Training Methods & Knowledge Transfer Domain Adaptation by data projection (FlyingCars): sparse-domain masks are used to crop RGB data, which is injected (after a transformation) into a random background. Main advantage: gradient directions become more stable and informative
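The cut-and-paste step above can be sketched as follows. This is an illustrative toy, not the talk's code: a real implementation would also rescale or shift the crop before injecting it; all array shapes are invented.

```python
import numpy as np

rng = np.random.default_rng(2)
H, W = 16, 16
background = rng.random((H, W, 3))          # random dense-domain image
sparse_img = rng.random((H, W, 3))          # sparse-domain image with an object
mask = np.zeros((H, W), dtype=bool)
mask[4:10, 5:12] = True                     # annotation mask of the object

composite = background.copy()
composite[mask] = sparse_img[mask]          # inject the masked RGB data
# pixels outside the mask are untouched; inside, they come from the sparse domain
assert np.allclose(composite[~mask], background[~mask])
assert np.allclose(composite[mask], sparse_img[mask])
```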
On Training Methods & Knowledge Transfer Balanced Gradient Contribution (BGC): data from the sparse domain is more informative but noisy, so let's add it as a regularizer of our cost function
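As I read the BGC idea, the combined objective can be sketched as the dense-domain loss plus a down-weighted sparse-domain term. The mixing weight `lam` is a hypothetical hyperparameter, not a value from the talk.

```python
import numpy as np

def bgc_loss(loss_dense, loss_sparse, lam=0.1):
    """Balanced Gradient Contribution sketch: the dense domain drives the
    optimization (stability), while the noisy-but-varied sparse domain
    enters as a regularization term scaled by `lam`."""
    return loss_dense + lam * loss_sparse

# toy per-batch losses from each domain
assert np.isclose(bgc_loss(1.0, 2.0, lam=0.1), 1.2)
```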
On Training Methods & Knowledge Transfer Ensemble of modalities (S-Net): a Dense Net trained on MDRS3-dense and a Sparse Net trained on MDRS3-sparse. Great recognition performance, but 269M parameters and >1000 MB SRAM: not suitable for embedded contexts (automotive chips)
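A heavily simplified sketch of the ensemble idea, under my own assumptions: each specialist emits per-class score maps, and a residual-style fusion combines them. The real S-Net uses learned residual blocks; here the fusion is reduced to a random 1x1 linear map over the concatenated scores.

```python
import numpy as np

rng = np.random.default_rng(3)
K, H, W = 3, 4, 4
dense_scores = rng.random((K, H, W))    # output of the dense specialist
sparse_scores = rng.random((K, H, W))   # output of the sparse specialist

concat = np.concatenate([dense_scores, sparse_scores], axis=0)  # (2K, H, W)
fusion_weights = rng.random((K, 2 * K)) * 0.1   # toy stand-in for a 1x1 conv
residual = np.einsum('kc,chw->khw', fusion_weights, concat)
fused = dense_scores + residual         # residual combination of modalities
assert fused.shape == (K, H, W)
```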
On Training Methods & Knowledge Transfer Results of the new training methods on MDRS3: FCN produces better results than T-Net in all modalities, but it is too big for embedded systems. Can we reach a trade-off?
On Training Methods & Knowledge Transfer Training by Knowledge Transfer: the original optimization space is too complex for a shallow model when training from (noisy!) ground-truth data. It is easier to optimize a deeper net, the Source Net (S-Net), and then use its refined/simplified output to train a new, compact Target Net
On Training Methods & Knowledge Transfer Training by Knowledge Transfer: we designed three approaches following this general idea: transferring knowledge through labels (TK-L), through softmax probabilities (TK-SMP), and through softmax with weighted cross-entropy (TK-SMP-WCR)
On Training Methods & Knowledge Transfer Training by Knowledge Transfer. TK-L: the original data and its ground truth are ignored; knowledge is distilled from S-Net directly through its predicted labels. Dense and sparse modalities are used along with further data from Google Street View to better mimic the behaviour of S-Net. TK-SMP: knowledge is distilled from the probability distributions of S-Net; this soft assignment is very informative and simplifies training. Cross-entropy is used as the transfer loss function. TK-SMP-WCR: the previous approach is extended to balance the different distributions according to their importance, using weighted cross-entropy
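The three transfer losses can be sketched on toy logits. This is my reading of the slide, not the talk's code: shapes, the uniform class weights, and variable names are all hypothetical.

```python
import numpy as np

def softmax(z, axis=0):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(4)
K, N = 3, 5                                   # classes, pixels
student = rng.random((K, N))                  # T-Net logits (toy)
teacher = rng.random((K, N))                  # S-Net logits (toy)
p_s, p_t = softmax(student), softmax(teacher)

# TK-L: train on the teacher's hard labels (ground truth is ignored)
hard = p_t.argmax(axis=0)
tk_l = -np.log(p_s[hard, np.arange(N)]).mean()

# TK-SMP: cross-entropy against the teacher's full softmax distribution
tk_smp = -(p_t * np.log(p_s)).sum(axis=0).mean()

# TK-SMP-WCR: per-class weights balance the distributions' importance
w = np.ones(K) / K                            # hypothetical uniform weights
tk_smp_wcr = -(w[:, None] * p_t * np.log(p_s)).sum(axis=0).mean()
assert all(v > 0 for v in (tk_l, tk_smp, tk_smp_wcr))
```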
On Training Methods & Knowledge Transfer Training by Knowledge Transfer: results on the MDRS3 test set, adding raw random data from Google Street View
On Training Methods & Knowledge Transfer Quantitative results & examples: better performance than the FCN and comparable to the ensemble, with 0.5% of the ensemble's memory footprint
On Training Methods & Knowledge Transfer Qualitative results
On Training Methods & Knowledge Transfer Results on the YouTube Driving Collection: how do models generalize? FCN vs T-Net
On Training Methods & Knowledge Transfer Improving Architectures
On Training Methods & Knowledge Transfer Improving the T-Net: SMART-Net, 0.6M parameters, 700 KB SRAM, 10x more efficient. Perfect for embedded contexts such as Visconti (1 MB of SRAM)
On Training Methods & Knowledge Transfer SMART-Net (new Target Net): results on MDRS3
On Training Methods & Knowledge Transfer FCN-BNDrop: 134M parameters, 500 MB SRAM
On Training Methods & Knowledge Transfer New FCN ensemble: a Dense FCN trained on MDRS3-dense, a Sparse FCN trained on MDRS3-sparse, and a Dense FCN-BNDrop trained on MDRS3. Great recognition performance, but 403M parameters and >1600 MB SRAM: not suitable for embedded contexts
On Training Methods & Knowledge Transfer Results for FCN-BND and the FCN-BND ensemble on MDRS3
On Training Methods & Knowledge Transfer Results on the YouTube Driving Collection: how do models generalize? FCN-BND ensemble
On Training Methods & Knowledge Transfer Working with Synthetic Data
On Training Methods & Knowledge Transfer Training using SYNTHIA: from MDRS3 (dense domains) we create training/validation splits; T-Net and FCN are evaluated on each domain; SYNTHIA is used to help training. Training uses BGC to mix real and synthetic data (30% vs 70%)
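The batch composition above can be sketched as a fixed-ratio sampler. Assumptions, hedged: the 30%/70% split is taken to mean 30% real and 70% synthetic per mini-batch, following the order on the slide; the function name and batch size are invented.

```python
import numpy as np

def mixed_batch(real_pool, synth_pool, batch_size=10, real_frac=0.3, rng=None):
    """BGC-style batch composition sketch: each mini-batch draws a fixed
    fraction from the real (dense) pool and fills the rest from SYNTHIA."""
    rng = rng or np.random.default_rng()
    n_real = int(round(batch_size * real_frac))
    real_idx = rng.choice(len(real_pool), n_real, replace=False)
    synth_idx = rng.choice(len(synth_pool), batch_size - n_real, replace=False)
    return [real_pool[i] for i in real_idx] + [synth_pool[i] for i in synth_idx]

# toy pools: real samples are ids 0-99, synthetic samples are ids 1000-1999
batch = mixed_batch(list(range(100)), list(range(1000, 2000)),
                    rng=np.random.default_rng(5))
assert len(batch) == 10
assert sum(x < 100 for x in batch) == 3       # 30% real samples
```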
On Training Methods & Knowledge Transfer Training using SYNTHIA: T-Net and FCN on MDRS3 (dense domains)
On Training Methods & Knowledge Transfer Training using SYNTHIA: T-Net on the CamVid domain
On Training Methods & Knowledge Transfer Training using SYNTHIA: FCN on the YouTube Driving Collection
Closing Remarks
Closing Remarks Conclusions: ML engines are data-driven and require large data volumes, so dealing with heterogeneous sources of data becomes a requirement; training methods need to be aware of the different data sources to unlock their full potential. Transferring knowledge seems to be the most effective way of producing models with a low memory footprint and high recognition capabilities, and it can be combined with orthogonal methods such as pruning, dictionaries, etc. Synthetic data is currently realistic enough to boost the accuracy and generalization of semantic segmentation, and domain adaptation of synthetic data can be painlessly achieved using BGC. Further efforts should go towards gathering and labelling new data worldwide for proper evaluation
Acknowledgments
Thank you! Questions?