Lecture 1: Introduction

CSED703R: Deep Learning for Visual Recognition (2017F)
Bohyung Han, Computer Vision Lab.
bhhan@postech.ac.kr

Administration

CSED703R: Deep Learning for Visual Recognition
- Instructor: Prof. Bohyung Han (bhhan@postech.ac.kr, B4-123)
- Time & Location: TuTh 12:30 ~ 13:45 PM, B2-105
- Office hour: by appointment
- Textbook (for reference)
  - Deep Learning by I. Goodfellow, Y. Bengio, and A. Courville
  - Neural Networks and Deep Learning by M. Nielsen
- Prerequisite
  - Coursework: probability theory, linear algebra, computer vision
  - Substantial programming experience, including scripting languages in a Unix/Linux/macOS environment

Class Coverage
- Introduction and preliminaries
- Unsupervised representation learning
- Convolutional Neural Networks (CNNs)
  - Image classification, object detection and localization
  - Visual tracking, action recognition and localization
  - Semantic segmentation
  - CNN optimization and analysis
- Recurrent Neural Networks (RNNs)
- Vision and languages
  - Image caption generation
  - Visual question answering
- Generative Adversarial Networks (GANs)
- Deep reinforcement learning
- Deep learning applications and others

Grading
- Assignments (30%)
  - Problem solving
  - Programming projects
- Mid-term exam (20%)
- Presentations (10%)
- Final project (40%)
  - Research project
  - Final report
- Note
  - No Pass/Fail grading
  - Individual percentages are subject to change.

Final Project
- Team organization: individual project
- Deliverables
  - Demo, source code, and presentation
  - Frequent intermediate reports
  - Final report
- Guideline
  - You should decide the theme of your project.
  - The final report should adhere to the standard quality and format of reputable conferences and journals.
  - Top venues in machine learning: ICML, NIPS, AISTATS, ICLR, JMLR
  - Top venues in computer vision: CVPR, ICCV, ECCV, TPAMI, IJCV

Course Policy
- Assignment submission
  - Late assignments will be accepted for three days with a score deduction.
- Programming platform
  - TensorFlow
  - Other platforms such as Caffe, Torch, Theano, and MatConvNet are also allowed, but you should minimize grading complexity by providing proper documentation.
- Academic integrity
  - Make sure to acknowledge the POSTECH academic integrity policy.
  - Violating academic integrity means automatic failure (F) in this class, with NO exception.
- Course identity
  - This is NOT an introductory course in machine learning or deep learning.
  - The major requirement of this course is the final project.
  - Students who are willing or competent to take on the project very seriously are encouraged to take this course.
  - The instructor has the right to evaluate students based only on their performance in the final project if necessary.

Deep Learning

Pipeline of Visual Recognition
- Data: images and videos
- Computer vision algorithms
- Representation of visual data

Components of Standard Visual Recognition
- Visual features
  - Hand-crafted features: HOG, BoW, GIST, LBP, MSER, SIFT, SURF, ...
  - Learned features: CNN, RNN, auto-encoder
- Classifiers
  - Discriminative methods: NN, SVM, random forest, boosting, ...
  - Generative methods: naïve Bayes
- Features vs. classifiers
  - Good features are key ingredients of recent progress in recognition.
  - Various classification algorithms have been proposed so far.

What is deep learning?
- A learning method that models high-level abstractions in data using architectures composed of multiple layers of non-linear operations
- Representation learning
- A buzzword for neural networks with many layers

Deep Learning Applications
- Computer vision
- Natural language processing
- Speech recognition
- Bioinformatics
- Medical imaging
- And many others
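To make the "multi-layer non-linear operations" part of the definition above concrete, here is a minimal sketch of a forward pass through a small stack of layers. It is an illustration only, not material from the slides: the layer sizes, the random weights, the choice of ReLU and softmax, and all function names are assumptions, and nothing is trained.

```python
import math
import random


def dense(inputs, weights, biases):
    """Affine map: one output per row of `weights`."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    """Element-wise non-linearity applied after each hidden layer."""
    return [max(0.0, v) for v in values]


def softmax(values):
    """Turn raw scores into probabilities that sum to 1."""
    shift = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(n_in, n_out, rng):
    """Random weights for illustration; a real model would learn them."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def forward(x, layers):
    """Dense + ReLU for every hidden layer, then dense + softmax."""
    h = x
    for weights, biases in layers[:-1]:
        h = relu(dense(h, weights, biases))
    weights, biases = layers[-1]
    return softmax(dense(h, weights, biases))


rng = random.Random(0)
sizes = [4, 8, 8, 3]  # assumed: 4 input features, two hidden layers, 3 classes
layers = [init_layer(n_in, n_out, rng)
          for n_in, n_out in zip(sizes, sizes[1:])]
probs = forward([0.5, -1.2, 0.3, 0.9], layers)
print(probs)
```

Each hidden layer re-represents the output of the previous one, which is the sense in which a deep model builds up higher-level abstractions; adding entries to `sizes` makes the stack deeper.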

Speech Recognition

Machine Translation
[Johnson16] M. Johnson, M. Schuster, Q. V. Le, M. Krikun, Y. Wu, Z. Chen, N. Thorat, F. B. Viégas, M. Wattenberg, G. Corrado, M. Hughes, J. Dean: Google's Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation. arXiv:1611.04558, 2016

Image Classification
[Figure: image classification top-5 errors (%)]

Object Detection
R-CNN: regions with CNN features
- Supervised pretraining using large-scale data
- Domain-specific fine-tuning
- Linear SVM applied to pool5, fc6, and fc7
- Pipeline
  1. Input image
  2. Extract region proposals (any proposal method)
  3. Compute CNN features (any architecture)
  4. Classification (softmax or SVM)
[Girshick2014] R. Girshick, J. Donahue, S. Guadarrama, T. Darrell, J. Malik: Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. CVPR 2014
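The R-CNN stages listed above (propose regions, compute features per region, classify each region) can be sketched as a toy pipeline. This is a structural illustration under loud assumptions, not the method of [Girshick2014]: the sliding-window proposer, the two-number "feature extractor", and the hand-set linear weights are all hypothetical stand-ins for a real proposal method, a pretrained CNN, and learned SVMs.

```python
def propose_regions(width, height, size=2, stride=2):
    """Stand-in for "any proposal method": a sliding-window grid of boxes."""
    return [(x, y, x + size, y + size)
            for y in range(0, height - size + 1, stride)
            for x in range(0, width - size + 1, stride)]


def crop(image, box):
    """Cut the box (x0, y0, x1, y1) out of a row-major image."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]


def cnn_features(patch):
    """Stand-in for "any architecture": mean and max intensity, not a real CNN."""
    pixels = [p for row in patch for p in row]
    return [sum(pixels) / len(pixels), max(pixels)]


def classify(features, class_weights):
    """Linear score per class (the SVM or softmax stage); returns the best label."""
    scores = {label: sum(w * f for w, f in zip(weights, features)) + bias
              for label, (weights, bias) in class_weights.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


def detect(image, class_weights):
    """Run every proposed region through feature extraction and classification."""
    height, width = len(image), len(image[0])
    detections = []
    for box in propose_regions(width, height):
        label, score = classify(cnn_features(crop(image, box)), class_weights)
        if label != "background":
            detections.append((box, label, score))
    return detections


# Toy 4x4 "image" with one bright 2x2 blob in the top-right corner.
image = [
    [0.0, 0.0, 0.9, 1.0],
    [0.0, 0.1, 0.8, 0.9],
    [0.0, 0.0, 0.0, 0.1],
    [0.1, 0.0, 0.0, 0.0],
]
# Hypothetical hand-set weights; in R-CNN these would be learned per class.
class_weights = {
    "object": ([1.0, 1.0], -1.0),
    "background": ([0.0, 0.0], 0.0),
}
detections = detect(image, class_weights)
print(detections)
```

On this toy input only the top-right box is kept as an "object"; the other three regions score higher as background. Because each stage is a separate function, the proposer, the feature extractor, or the classifier can be swapped independently, which mirrors the "any proposal method", "any architecture", and "softmax or SVM" annotations on the slide.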

Semantic Segmentation
[Figure: input image, ground truth, FCN result, DeconvNet result]
[Noh15] H. Noh, S. Hong, B. Han: Learning Deconvolution Network for Semantic Segmentation. ICCV 2015

Visual Tracking
MDNet (Multi-Domain Network)
- Multi-domain learning
- Separating shared and domain-specific layers
- The winner of the Visual Object Tracking Challenge 2015
[Nam16] H. Nam, B. Han: Learning Multi-Domain Convolutional Neural Networks for Visual Tracking. CVPR 2016

Face Verification
- FaceNet
  [Schroff15] F. Schroff, D. Kalenichenko, J. Philbin: FaceNet: A Unified Embedding for Face Recognition and Clustering. CVPR 2015
- DeepFace
  [Taigman14] Y. Taigman, M. Yang, M. Ranzato, L. Wolf: DeepFace: Closing the Gap to Human-Level Performance in Face Verification. CVPR 2014

Pose Estimation
[Insafutdinov16] E. Insafutdinov, L. Pishchulin, B. Andres, M. Andriluka, B. Schiele: DeeperCut: A Deeper, Stronger, and Faster Multi-person Pose Estimation Model. ECCV 2016

Image Caption Generation
[Vinyals15] O. Vinyals, A. Toshev, S. Bengio, D. Erhan: Show and Tell: A Neural Image Caption Generator. CVPR 2015

Image Question Answering
[Figure: classification network with a dynamic parameter layer; parameter prediction network (GRU) with hashing over candidate weights; example question "What is in the cabinet?", answer "teddy bear"]
[Noh15] H. Noh, P. H. Seo, B. Han: Image Question Answering using Convolutional Neural Network with Dynamic Parameter Prediction. arXiv:1511.05756, 2015

Neural Artistic Style
[Figure: source image; Style 1: The Starry Night; Style 2: The Scream]
[Gatys15] L. A. Gatys, A. S. Ecker, M. Bethge: A Neural Algorithm of Artistic Style. arXiv:1508.06576, 2015

Image Generation
Generative Adversarial Networks (GANs)
[Goodfellow14] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. C. Courville, Y. Bengio: Generative Adversarial Nets. NIPS 2014

Deep Reinforcement Learning: Atari Games
[Mnih2013] V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, M. A. Riedmiller: Playing Atari with Deep Reinforcement Learning. arXiv:1312.5602, 2013

Deep Reinforcement Learning: AlphaGo
[Silver16] D. Silver et al.: Mastering the Game of Go with Deep Neural Networks and Tree Search. Nature, 2016