Day 2 Lecture 5. Transfer learning and domain adaptation

Semi-supervised and transfer learning

Myth: you can't do deep learning unless you have a million labelled examples for your problem.

Reality:
- You can learn useful representations from unlabelled data
- You can transfer learned representations from a related task
- You can train on a nearby surrogate objective for which it is easy to generate labels

Transfer learning: idea

Instead of training a deep network from scratch for your task:
- Take a network trained on a different domain for a different source task
- Adapt it for your domain and your target task

This lecture covers how to do this. Variations:
- Same domain, different task
- Different domain, same task

Transfer learning: idea

[Diagram: a source model is trained on a large amount of labelled source data (e.g. ImageNet); the learned knowledge is transferred to a target model, which is then adapted using a small amount of labelled target data (e.g. PASCAL).]

Example: PASCAL VOC 2007
- Standard classification benchmark: 20 classes, ~10K images, 50% train, 50% test
- Deep networks can have many parameters (e.g. 60M in AlexNet)
- Direct training (from scratch) using only 5K training images is problematic: the model overfits
- How can we use deep networks in this setting?

Off-the-shelf

Idea: use the outputs of one or more layers of a network trained on a different task as generic feature detectors. Train a new shallow model (e.g. an SVM) on these features.

[Diagram: the conv and fc layers of a network trained on ImageNet data and labels are transferred; their activations serve as features for a shallow classifier trained on the target data and labels.]
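As a concrete illustration, here is a minimal sketch of the off-the-shelf recipe, assuming PyTorch/torchvision and scikit-learn. The choice of AlexNet's penultimate fc layer, the batch size, and the dataset folder paths are illustrative assumptions, not part of the lecture.

```python
# Sketch: use a frozen ImageNet-pretrained CNN as a fixed feature extractor
# and train a shallow SVM on top (off-the-shelf features).
import torch
from torchvision import models, transforms, datasets
from sklearn.svm import LinearSVC

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Pretrained AlexNet; keep everything up to the penultimate fully connected layer.
alexnet = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)
alexnet.classifier = torch.nn.Sequential(*list(alexnet.classifier.children())[:-1])
alexnet.eval()

def extract_features(folder):
    """Run every image through the frozen network and collect its activations."""
    dataset = datasets.ImageFolder(folder, transform=preprocess)
    loader = torch.utils.data.DataLoader(dataset, batch_size=32)
    feats, labels = [], []
    with torch.no_grad():
        for images, targets in loader:
            feats.append(alexnet(images))
            labels.append(targets)
    return torch.cat(feats).numpy(), torch.cat(labels).numpy()

# Hypothetical target-domain folders with one subfolder per class.
X_train, y_train = extract_features("target_data/train")
X_test, y_test = extract_features("target_data/test")

# Shallow classifier on top of the transferred features.
svm = LinearSVC(C=1.0)
svm.fit(X_train, y_train)
print("test accuracy:", svm.score(X_test, y_test))
```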

Off-the-shelf features

Works surprisingly well in practice! Surpassed or on par with state-of-the-art in several tasks in 2014.

Image classification:
- PASCAL VOC 2007
- Oxford 102 Flowers
- CUB Birds
- MIT Indoors

Image retrieval:
- Paris 6k
- Holidays
- UKBench

Razavian et al., "CNN Features off-the-shelf: an Astounding Baseline for Recognition", CVPRW 2014. http://arxiv.org/abs/1403.6382

Can we do better than off-the-shelf features? Domain adaptation.

Fine-tuning: supervised domain adaptation

- Train a deep net on a nearby (surrogate) task for which it is easy to get labels, using standard backprop:
  - E.g. ImageNet classification
  - Pseudo-classes from augmented data
  - Slow feature learning, ego-motion
- Cut off the top layer(s) of the network and replace with a supervised objective for the target domain
- Fine-tune the network using backprop with labels for the target domain until the validation loss starts to increase

[Diagram: the surrogate network's top layer and softmax (surrogate loss) are replaced by a new top layer and softmax (real loss); the lower layers are reused and trained on the real data and labels.]
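A hedged sketch of this fine-tuning recipe, assuming PyTorch/torchvision. The 20-class target task, the learning rate, and the train_loader / val_loader objects are placeholders for your own data, not prescribed by the lecture.

```python
# Sketch: replace the top layer of a pretrained network and fine-tune with
# backprop on the target domain, stopping when validation loss increases.
import torch
from torch import nn, optim
from torchvision import models

model = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)

# Cut off the ImageNet-specific top layer and replace it with a new
# classifier for the target domain (here: 20 classes as a placeholder).
model.classifier[-1] = nn.Linear(4096, 20)

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

best_val = float("inf")
for epoch in range(50):
    model.train()
    for images, labels in train_loader:          # your target-domain loader
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()

    # Stop fine-tuning once validation loss starts to increase.
    model.eval()
    with torch.no_grad():
        val_loss = sum(criterion(model(x), y).item() for x, y in val_loader)
    if val_loss > best_val:
        break
    best_val = val_loss
```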

Freeze or fine-tune?

The bottom n layers can be frozen or fine-tuned:
- Frozen: not updated during backprop (LR = 0)
- Fine-tuned: updated during backprop (LR > 0)

Which to do depends on the target task:
- Freeze: target-task labels are scarce and we want to avoid overfitting
- Fine-tune: target-task labels are more plentiful

In general, we can set a different learning rate for each layer to find a tradeoff between freezing and fine-tuning.

[Diagram: the upper layers (fc1, fc2 + softmax) are fine-tuned with LR > 0, while the lower convolutional layers (conv1-conv3) are frozen with LR = 0.]
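A minimal sketch of freezing versus per-layer learning rates, again assuming PyTorch/torchvision; the particular layer split and learning rates are illustrative choices.

```python
# Sketch: freeze the convolutional trunk (LR = 0) and give the remaining
# layers different learning rates via optimizer parameter groups.
import torch
from torch import optim
from torchvision import models

model = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)

# Freeze: the convolutional layers are not updated during backprop.
for param in model.features.parameters():
    param.requires_grad = False

# Replace the top layer for the target task (placeholder: 20 classes).
model.classifier[-1] = torch.nn.Linear(4096, 20)

# Fine-tune: small LR for the pretrained fully connected layers,
# larger LR for the freshly initialised top layer.
optimizer = optim.SGD([
    {"params": model.classifier[:-1].parameters(), "lr": 1e-4},
    {"params": model.classifier[-1].parameters(),  "lr": 1e-2},
], momentum=0.9)
```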

How transferable are features?

- Lower layers: more general features. Transfer very well to other tasks.
- Higher layers: more task-specific.
- Fine-tuning improves generalization when sufficient examples are available.
- Transfer learning and fine-tuning often lead to better performance than training from scratch on the target dataset.
- Even features transferred from distant tasks are often better than random initial weights!

Yosinski et al., "How transferable are features in deep neural networks?", NIPS 2014. https://arxiv.org/abs/1411.1792

Unsupervised domain adaptation

It is also possible to do domain adaptation without labels in the target set.

Y. Ganin and V. Lempitsky, "Unsupervised Domain Adaptation by Backpropagation", ICML 2015. https://arxiv.org/abs/1409.7495
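Ganin and Lempitsky do this by training a domain classifier on the shared features through a gradient reversal layer, so the feature extractor learns to make source and target feature distributions indistinguishable. Below is a minimal sketch of such a layer, assuming PyTorch; the module names in the usage comment (feature_extractor, label_classifier, domain_classifier) are hypothetical.

```python
# Sketch of a gradient reversal layer: the forward pass is the identity,
# the backward pass flips the sign of the gradient, so the feature extractor
# is trained to confuse the domain classifier.
import torch

class GradientReversal(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reverse (and scale) the gradient flowing into the features.
        return -ctx.lambd * grad_output, None

def grad_reverse(x, lambd=1.0):
    return GradientReversal.apply(x, lambd)

# Usage sketch: features feed both a label classifier (source labels only)
# and, through the reversal layer, a domain classifier (source vs. target).
# features = feature_extractor(images)
# class_logits = label_classifier(features)
# domain_logits = domain_classifier(grad_reverse(features, lambd=0.1))
```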

Summary
- It is possible to train very large models on small data by using transfer learning and domain adaptation
- Off-the-shelf features work very well in various domains and tasks
- Lower layers of the network contain very generic features; higher layers contain more task-specific features
- Supervised domain adaptation via fine-tuning almost always improves performance
- It is possible to do unsupervised domain adaptation by matching feature distributions