Lecture 1: Introduction

CSED703R: Deep Learning for Visual Recognition (2017F)
Bohyung Han
Computer Vision Lab.
bhhan@postech.ac.kr

Administration

CSED703R: Deep Learning for Visual Recognition
- Instructor: Prof. Bohyung Han (bhhan@postech.ac.kr, B4-123)
- Time & Location: Tue/Thu 12:30-13:45, B2-105
- Office hour: by appointment
- Textbooks (for reference)
  - Deep Learning by I. Goodfellow, Y. Bengio, and A. Courville
  - Neural Networks and Deep Learning by M. Nielsen
- Prerequisites
  - Coursework: probability theory, linear algebra, computer vision
  - Substantial programming experience, including scripting languages in Unix/Linux/macOS environments

Class Coverage
- Introduction and preliminaries
- Unsupervised representation learning
- Convolutional Neural Networks (CNNs)
  - Image classification, object detection and localization
  - Visual tracking, action recognition and localization
  - Semantic segmentation
  - CNN optimization and analysis
- Recurrent Neural Networks (RNNs)
- Vision and language
  - Image caption generation
  - Visual question answering
- Generative Adversarial Networks (GANs)
- Deep reinforcement learning
- Deep learning applications and others

Grading
- Assignments (30%): problem solving and programming projects
- Mid-term exam (20%)
- Presentations (10%)
- Final project (40%): research project and final report
- Note: there is no Pass/Fail grading. Individual percentages are subject to change.

Final Project
- Team organization: individual project
- Deliverables
  - Demo, source code, and presentation
  - Frequent intermediate reports
  - Final report
- Guidelines
  - You decide the theme of your own project.
  - The final report should meet the quality and format standards of reputable conferences and journals.
  - Top venues in machine learning: ICML, NIPS, AISTATS, ICLR, JMLR
  - Top venues in computer vision: CVPR, ICCV, ECCV, TPAMI, IJCV

Course Policy
- Assignment submission: late assignments are accepted for up to three days, with a score deduction.
- Programming platform: TensorFlow. Other platforms such as Caffe, Torch, Theano, and MatConvNet are also allowed, but you must provide proper documentation to keep grading as simple as possible.
- Academic integrity: make sure to observe the POSTECH academic integrity policy. Violating academic integrity results in automatic failure (F) in this class, with NO exceptions.

Course Policy (continued)
- Course identity: this is NOT an introductory course in machine learning or deep learning.
- The major requirement of this course is the final project. Students who are willing and able to take the project very seriously are encouraged to take this course.
- The instructor reserves the right to evaluate students based solely on their performance in the final project if necessary.

Deep Learning

Pipeline of Visual Recognition
Components of standard visual recognition:
- Data: images and videos
- Computer vision algorithms
  - Representation of visual data: visual features
    - Hand-crafted features: HOG, BoW, GIST, LBP, MSER, SIFT, SURF, ...
    - Learned features: CNN, RNN, auto-encoder
  - Classifiers
    - Discriminative methods: NN, SVM, random forest, boosting, ...
    - Generative methods: naïve Bayes

Features vs. Classifiers
- Good features are key ingredients of recent progress in recognition.
- Various classification algorithms have been proposed so far.

What is deep learning?
- A learning method that models high-level abstractions in data using model architectures composed of multiple layers of non-linear operations
- Representation learning
- A buzzword for neural networks with many layers

Deep Learning Applications
- Computer vision
- Natural language processing
- Speech recognition
- Bioinformatics
- Medical imaging
- And many others
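The definition above describes deep models as stacks of non-linear operations. As a rough, hypothetical sketch (plain Python rather than the course's TensorFlow setup), a forward pass through such a stack alternates affine maps with a ReLU non-linearity and ends with a softmax over class scores. The layer sizes, the initialization scheme, and all function names are illustrative assumptions.

```python
import math
import random


def linear(W, b, x):
    # Affine map: y[i] = sum_j W[i][j] * x[j] + b[i]
    return [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]


def relu(v):
    # Element-wise non-linearity; without it, stacked linear layers collapse into one linear map
    return [max(0.0, x) for x in v]


def init_layer(n_in, n_out, rng):
    # Hypothetical initialization: Gaussian weights scaled by 1/sqrt(n_in), zero biases
    W = [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out


def softmax(z):
    # Convert class scores into probabilities (shifted by the max for numerical stability)
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def forward(layers, x):
    # Hidden layers: linear -> ReLU; output layer: linear -> softmax
    h = x
    for W, b in layers[:-1]:
        h = relu(linear(W, b, h))
    W, b = layers[-1]
    return softmax(linear(W, b, h))


rng = random.Random(0)
sizes = [8, 16, 16, 3]  # assumed: 8-D input feature, two hidden layers, 3 classes
layers = [init_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes[:-1], sizes[1:])]
probs = forward(layers, [1.0] * 8)
print(probs)  # three class probabilities
```

Each extra (linear, ReLU) pair adds one more level of non-linear abstraction; in a learned-feature pipeline, the hidden activations play the role that hand-crafted features play in the classical pipeline.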

Speech Recognition

Machine Translation
[Johnson16] M. Johnson, M. Schuster, Q. V. Le, M. Krikun, Y. Wu, Z. Chen, N. Thorat, F. B. Viégas, M. Wattenberg, G. Corrado, M. Hughes, J. Dean: Google's Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation. arXiv:1611.04558, 2016

Image Classification
[Figure: top-5 classification errors (%)]

Object Detection
R-CNN: regions with CNN features
- Supervised pre-training using large-scale data
- Domain-specific fine-tuning
- Linear SVM applied to pool5, fc6, and fc7 features
- Pipeline: input image → extract region proposals (any proposal method) → compute CNN features (any architecture) → classification (softmax or SVM)
[Girshick2014] R. Girshick, J. Donahue, T. Darrell, J. Malik: Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. CVPR 2014
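The image classification slide reports top-5 error rates. As a hedged illustration (our own sketch, not code from the lecture), top-k error counts a prediction as correct when the true label appears among the k highest-scoring classes. The function name and the toy scores are made up for the example.

```python
def top_k_error(scores, labels, k=5):
    """Fraction of samples whose true label is NOT among the k highest-scoring classes."""
    misses = 0
    for s, y in zip(scores, labels):
        # Class indices sorted by descending score; keep the first k
        top_k = sorted(range(len(s)), key=lambda c: s[c], reverse=True)[:k]
        if y not in top_k:
            misses += 1
    return misses / len(labels)


# Toy scores for 3 images over 6 classes (made-up numbers)
scores = [
    [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],  # true class 0 is ranked last
    [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],  # true class 1 is ranked 5th
    [0.9, 0.1, 0.1, 0.1, 0.1, 0.1],  # true class 0 is ranked 1st
]
labels = [0, 1, 0]
print(top_k_error(scores, labels, k=5))  # 1 miss out of 3
print(top_k_error(scores, labels, k=1))  # 2 misses out of 3
```

The second image shows why top-5 error is more forgiving than top-1 error: its true class is not the top prediction, but it still falls within the top five.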

Semantic Segmentation
[Figure: input image, ground truth, and FCN vs. DeconvNet segmentation results]
[Noh15] H. Noh, S. Hong, B. Han: Learning Deconvolution Network for Semantic Segmentation. ICCV 2015

Visual Tracking
MDNet (Multi-Domain Network)
- Multi-domain learning
- Separating shared and domain-specific layers
- The winner of the Visual Object Tracking Challenge 2015
[Nam16] H. Nam, B. Han: Learning Multi-Domain Convolutional Neural Networks for Visual Tracking. CVPR 2016

Face Verification
- FaceNet [Schroff15] F. Schroff, D. Kalenichenko, J. Philbin: FaceNet: A Unified Embedding for Face Recognition and Clustering. CVPR 2015
- DeepFace [Taigman14] Y. Taigman, M. Yang, M. Ranzato, L. Wolf: DeepFace: Closing the Gap to Human-Level Performance in Face Verification. CVPR 2014

Pose Estimation
[Insafutdinov16] E. Insafutdinov, L. Pishchulin, B. Andres, M. Andriluka, B. Schiele: DeeperCut: A Deeper, Stronger, and Faster Multi-person Pose Estimation Model. ECCV 2016
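The MDNet bullets describe multi-domain learning with shared and domain-specific layers. The toy sketch below illustrates only that structure, following our reading of [Nam16]: one shared feature layer, one two-way (target vs. background) head per training domain, a round-robin schedule so each training iteration uses a single domain, and a fresh head attached for a new test sequence. The class and function names, layer sizes, and the single ReLU layer are hypothetical simplifications; there is no training loop, and the real network uses convolutional layers.

```python
import random


def _rand_matrix(rows, cols, rng):
    # Small random weights (illustrative initialization only)
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class MultiDomainNet:
    """Toy multi-domain model: shared layer + one 2-way head per domain (hypothetical sizes)."""

    def __init__(self, n_in, n_hidden, n_domains, seed=0):
        self.rng = random.Random(seed)
        self.n_hidden = n_hidden
        self.shared = _rand_matrix(n_hidden, n_in, self.rng)  # shared across all domains
        self.heads = [_rand_matrix(2, n_hidden, self.rng) for _ in range(n_domains)]

    def features(self, x):
        # Shared representation: ReLU of a linear map
        return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in self.shared]

    def scores(self, x, domain):
        # Only the head belonging to `domain` produces [background, target] scores
        h = self.features(x)
        return [sum(w * hi for w, hi in zip(row, h)) for row in self.heads[domain]]

    def reset_for_new_sequence(self):
        # For a new test sequence: drop the training heads and attach one fresh head
        self.heads = [_rand_matrix(2, self.n_hidden, self.rng)]


def domain_schedule(n_domains, n_iters):
    # Round-robin: each training iteration draws its mini-batch from a single domain
    return [k % n_domains for k in range(n_iters)]


net = MultiDomainNet(n_in=4, n_hidden=6, n_domains=3)
print(net.scores([1.0, 0.5, -0.5, 2.0], domain=2))
print(domain_schedule(3, 7))  # [0, 1, 2, 0, 1, 2, 0]
net.reset_for_new_sequence()
print(len(net.heads))  # 1
```

The design choice being illustrated is that gradients from every domain would flow into the shared layer, while each head only ever sees its own domain, so the shared layer is pushed toward domain-independent features.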

Image Caption Generation
[Vinyals15] O. Vinyals, A. Toshev, S. Bengio, D. Erhan: Show and Tell: A Neural Image Caption Generator. CVPR 2015

Image Question Answering
[Figure: for the question "What is in the cabinet?", a GRU-based parameter prediction network produces candidate weights, which are mapped by hashing into the dynamic parameter layer of an image classification network; the network answers "teddy bear".]
[Noh15] H. Noh, P. H. Seo, B. Han: Image Question Answering using Convolutional Neural Network with Dynamic Parameter Prediction. arXiv:1511.05756, 2015

Neural Artistic Style
[Figure: a source image rendered in Style 1 (The Starry Night) and Style 2 (The Scream)]
[Gatys15] L. A. Gatys, A. S. Ecker, M. Bethge: A Neural Algorithm of Artistic Style. arXiv:1508.06576, 2015

Image Generation
Generative Adversarial Networks (GANs)
[Goodfellow14] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. C. Courville, Y. Bengio: Generative Adversarial Nets. NIPS 2014
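The image question answering slide shows candidate weights being mapped into the dynamic parameter layer by hashing. The following is a hedged sketch of one way such a mapping can work, in the style of the hashing trick: each weight-matrix entry (i, j) reuses one of K candidate weights selected by a hash and is multiplied by a hashed sign. The MD5-based hash functions, the sign hash, and all names are our assumptions, not the paper's implementation; only the candidate values are copied from the figure.

```python
import hashlib


def _hash(i, j, salt, mod):
    # Deterministic hash of (i, j); MD5 is an arbitrary, reproducible choice (assumption)
    digest = hashlib.md5(f"{salt}:{i}:{j}".encode()).hexdigest()
    return int(digest, 16) % mod


def build_dynamic_weights(candidates, n_out, n_in):
    """Expand K candidate weights into an n_out x n_in weight matrix.

    Entry (i, j) = sign(i, j) * candidates[index(i, j)], where both index and sign are hashed.
    """
    K = len(candidates)
    W = []
    for i in range(n_out):
        row = []
        for j in range(n_in):
            idx = _hash(i, j, "index", K)
            sign = 1.0 if _hash(i, j, "sign", 2) == 0 else -1.0
            row.append(sign * candidates[idx])
        W.append(row)
    return W


def dynamic_parameter_layer(features, candidates, n_out):
    # Fully connected layer whose weights change with the question (through the candidates)
    W = build_dynamic_weights(candidates, n_out, len(features))
    return [sum(w * f for w, f in zip(row, features)) for row in W]


# Candidate weights as shown in the figure; in the model they would come from the GRU-based network
candidates = [-0.2, 0.3, -0.7, 1.2, 0.1]
image_features = [0.5, 1.0, -0.3, 0.8]  # hypothetical image feature vector
answer_scores = dynamic_parameter_layer(image_features, candidates, n_out=3)
print(answer_scores)
```

Hashing lets a small number of predicted values (K = 5 here) parameterize a much larger weight matrix, which is what keeps the parameter prediction network's output size manageable.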

Deep Reinforcement Learning: Atari Games
[Mnih2013] V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, M. A. Riedmiller: Playing Atari with Deep Reinforcement Learning. arXiv:1312.5602, 2013

Deep Reinforcement Learning: AlphaGo
[Silver16] D. Silver et al.: Mastering the Game of Go with Deep Neural Networks and Tree Search. Nature, 2016
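The Atari work [Mnih2013] trains a Q-network by regressing its outputs toward one-step Q-learning targets, y = r + γ max_a' Q(s', a'), with y = r at the end of an episode. A minimal sketch of how targets and the squared error could be formed for a mini-batch, assuming the next-state Q-values have already been computed (in the paper they come from the network itself); the function names and numbers are hypothetical.

```python
def q_learning_targets(rewards, next_q_values, terminals, gamma=0.99):
    """One-step targets: y = r + gamma * max_a' Q(s', a'), or y = r if the episode ended."""
    targets = []
    for r, q_next, done in zip(rewards, next_q_values, terminals):
        targets.append(r if done else r + gamma * max(q_next))
    return targets


def squared_td_loss(q_taken, targets):
    # Mean squared difference between Q(s, a) for the actions taken and their targets
    return sum((q - y) ** 2 for q, y in zip(q_taken, targets)) / len(targets)


rewards = [1.0, 0.0]
next_q = [[0.5, 2.0], [3.0, 1.0]]  # hypothetical Q(s', a') for two actions
terminals = [False, True]
y = q_learning_targets(rewards, next_q, terminals, gamma=0.9)
print(y)  # approximately [2.8, 0.0]
print(squared_td_loss([2.0, 1.0], y))  # approximately 0.82
```

The second transition ends the episode, so its target ignores the (large) next-state value 3.0 and uses only the reward.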