Deep Learning Basics
Lecture 11: Practical Methodology
Princeton University COS 495
Instructor: Yingyu Liang


Designing process

Practical methodology
- It is important to know a variety of techniques and understand their pros and cons.
- In practice, you can often do much better by correctly applying a commonplace algorithm than by sloppily applying an obscure one.

Practical designing process
1. Determine your goals: input and output; evaluation metrics
2. Establish an end-to-end pipeline
3. Determine bottlenecks in performance
4. Repeatedly make incremental changes based on findings
(From Andrew Ng's lecture and the book Deep Learning)

Practical designing process
1. Determine your goals: input and output; evaluation metrics
   - What is the input of the system? What is the output of the system?
   - What can be regarded as a good system? Accuracy? Speed? Memory?
2. Establish an end-to-end pipeline
3. Determine bottlenecks in performance
4. Repeatedly make incremental changes based on findings

Practical designing process
1. Determine your goals: input and output; evaluation metrics
2. Establish an end-to-end pipeline
   - Get a working system up as soon as possible; it does not need to be perfect
   - It can be based on existing systems built for similar goals
3. Determine bottlenecks in performance
4. Repeatedly make incremental changes based on findings

Practical designing process
1. Determine your goals: input and output; evaluation metrics
2. Establish an end-to-end pipeline
3. Determine bottlenecks in performance
   - Divide the system into components
   - Diagnose which component is performing worse than expected
   - Overfitting? Underfitting? Bugs in the software? Bad or too small a dataset?
4. Repeatedly make incremental changes based on findings

Practical designing process
1. Determine your goals: input and output; evaluation metrics
2. Establish an end-to-end pipeline
3. Determine bottlenecks in performance
4. Repeatedly make incremental changes based on findings
   - Do not make big changes (unless the system is performing very badly)
   - Replace a system component? Change the optimization algorithm? Adjust hyperparameters? Get more/new data?

To begin with

Deep learning?
- First question: do you really need a deep learning system?
- Maybe simple (shallow) models like logistic regression or SVM suffice for your goals
- Choose deep learning if:
  - The task falls into an area where deep learning is known to perform well
  - The task is complicated enough that deep models have a better chance to win

Which networks to choose? Based on the input and the goal
- Vector input, supervised learning: feedforward networks
- If the input has known topological structure (e.g., images), use convolution
- Activation function: typically ReLU
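As a concrete illustration (not from the slides), a minimal feedforward network with ReLU activations might look like the sketch below, assuming PyTorch; the layer widths and the input/output dimensions are placeholder values.

```python
# Minimal feedforward network with ReLU activations (PyTorch assumed).
# Input dimension 100 and 10 output classes are illustrative placeholders.
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(100, 256),
    nn.ReLU(),
    nn.Linear(256, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)
```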

Which networks to choose? Based on the input and the goal
- Vector input, unsupervised: generative model; autoencoder; energy-based model
- Highly depends on your goal
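For instance, a bare-bones autoencoder for 784-dimensional inputs (e.g., flattened 28x28 images) could be sketched as below, assuming PyTorch; the bottleneck size and layer widths are illustrative.

```python
# Minimal autoencoder sketch (PyTorch assumed); dimensions are illustrative.
import torch.nn as nn

autoencoder = nn.Sequential(
    nn.Linear(784, 128), nn.ReLU(),
    nn.Linear(128, 32),  nn.ReLU(),   # bottleneck code
    nn.Linear(32, 128),  nn.ReLU(),
    nn.Linear(128, 784),              # reconstruction of the input
)
# Train by minimizing a reconstruction loss, e.g. nn.MSELoss()(autoencoder(x), x)
```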

Which networks to choose? Based on the input and the goal
- Sequential input: recurrent networks
  - LSTM (long short-term memory network)
  - GRU (gated recurrent unit)
  - Memory networks
  - Attention-based variants
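A minimal LSTM usage sketch, assuming PyTorch; the batch size, sequence length, and feature dimensions are placeholders, and nn.GRU exposes essentially the same interface.

```python
# Minimal LSTM sketch for sequential input (PyTorch assumed); sizes are placeholders.
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=50, hidden_size=128, batch_first=True)
x = torch.randn(8, 20, 50)        # batch of 8 sequences, 20 time steps, 50 features
outputs, (h_n, c_n) = lstm(x)     # outputs: (8, 20, 128); h_n: final hidden state
# nn.GRU has essentially the same interface and is a lighter-weight alternative.
```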

Which optimization algorithm?
- SGD with momentum and a decaying learning rate
- Momentum: 0.5 at the beginning and 0.9 at the end
- Learning rate decay schemes:
  - decay linearly until reaching a fixed minimum learning rate
  - decay exponentially
  - decrease the learning rate by a factor of 2-10 each time the validation error plateaus
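A sketch of this recipe, assuming PyTorch and the model from the feedforward sketch above; train_one_epoch and evaluate are hypothetical helpers, and the decay factor, patience, and momentum ramp are illustrative choices.

```python
# SGD with momentum plus plateau-based learning rate decay (PyTorch assumed).
import torch

optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.5)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode='min', factor=0.1, patience=5)

for epoch in range(100):
    train_one_epoch(model, optimizer)   # hypothetical: one pass over the training data
    val_error = evaluate(model)         # hypothetical: compute validation error
    scheduler.step(val_error)           # cut the learning rate when validation error plateaus
    for group in optimizer.param_groups:
        group['momentum'] = min(0.9, group['momentum'] + 0.004)  # ramp 0.5 -> 0.9
```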

What regularizations?
- L2 regularization
- Early stopping
- Dropout
- Batch normalization (can replace dropout)
- Data augmentation, if the relevant transformations are known/easy to implement
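A sketch combining a few of these, assuming PyTorch: L2 regularization via the optimizer's weight_decay, plus dropout (or batch normalization in its place); the specific values are illustrative.

```python
# Dropout (or BatchNorm) plus L2 weight decay (PyTorch assumed); values are illustrative.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(100, 256),
    nn.ReLU(),
    nn.Dropout(p=0.5),        # or nn.BatchNorm1d(256) in place of dropout
    nn.Linear(256, 10),
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=1e-4)  # weight_decay = L2 penalty
```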

Reusing models
- If your task is similar to another task that has been studied: copy the model/optimization algorithm/hyperparameters, then improve them
- You can even copy the trained model and fine-tune it
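A sketch of the "copy the trained model and fine-tune it" option, assuming torchvision: load a pretrained network, freeze the copied weights, and replace the final layer for the new task; the 10-class head is illustrative.

```python
# Fine-tuning a pretrained model (torchvision assumed); the 10-class head is illustrative.
import torch.nn as nn
import torchvision

model = torchvision.models.resnet18(pretrained=True)
for p in model.parameters():
    p.requires_grad = False                         # freeze the copied weights
model.fc = nn.Linear(model.fc.in_features, 10)      # new output layer for the new task
# Train only model.fc first; optionally unfreeze everything later with a small learning rate.
```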

Whether to use unsupervised pretraining?
- NLP: yes, use word embeddings almost all the time
- Computer vision: not quite; unsupervised pretraining is now mainly useful for semi-supervised learning (a few labeled data points, a lot of unlabeled data)
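For the NLP case, this typically means initializing an embedding layer from pretrained word vectors; a minimal sketch assuming PyTorch, where pretrained_vectors stands in for vectors loaded from, e.g., word2vec or GloVe files.

```python
# Using pretrained word embeddings (PyTorch assumed); pretrained_vectors is a placeholder
# for a (vocab_size x embedding_dim) tensor loaded from word2vec/GloVe files.
import torch
import torch.nn as nn

pretrained_vectors = torch.randn(10000, 300)                      # placeholder values
embedding = nn.Embedding.from_pretrained(pretrained_vectors, freeze=False)
```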

Tuning hyperparameters

Why?
- Performance: training/test errors; reconstruction; generative ability
- Resources: training time; test time; memory

Two types of approaches
- Manually tune: need to understand the hyperparameters and their effects on the goals
- Automatically tune: need resources

Manually tune
- Need to know the relationship between hyperparameters and training/test errors and computational resources (memory and runtime)
- Example: increasing the number of hidden units in each layer will
  - increase the model capacity
  - increase the generalization error (= test error - training error)
  - increase memory and runtime

Automatically tune
- Grid search
- Random search
- Model-based optimization (another level of optimization)
  - Variables: hyperparameters
  - Objective: validation error
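A minimal random-search sketch, assuming NumPy; train_and_validate is a hypothetical function that trains a model with the given hyperparameters and returns its validation error, and the search ranges are illustrative.

```python
# Random search over hyperparameters (NumPy assumed); train_and_validate is hypothetical.
import numpy as np

rng = np.random.default_rng(0)
best = None
for _ in range(20):                                    # number of trials is illustrative
    params = {
        'lr': 10 ** rng.uniform(-4, -1),               # log-uniform learning rate
        'hidden_units': int(rng.integers(64, 513)),    # uniform over layer widths
        'dropout': rng.uniform(0.0, 0.7),
    }
    val_error = train_and_validate(params)             # hypothetical helper
    if best is None or val_error < best[0]:
        best = (val_error, params)
```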

Debugging strategies

Difficulties
- Do not know a priori what performance/behavior to expect
- Components of the model can adapt to each other: one component fails, but the other components adapt to cover up the failure

Debugging
- Try a small dataset: faster, saves time
- Inspect components
  - Monitor histograms of activations and gradients
  - Compare symbolic derivatives to numerical derivatives
- Compare training/validation/test errors: overfitting or underfitting?
- Focus on the worst mistakes: on which data points does the model perform worst? Why?
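For the "compare symbolic derivatives to numerical derivatives" check, a central-difference gradient check can be sketched as below, assuming NumPy; loss_fn is a hypothetical scalar loss as a function of a flat parameter vector.

```python
# Numerical gradient via central differences, for checking backprop (NumPy assumed).
import numpy as np

def numerical_gradient(loss_fn, w, eps=1e-5):
    grad = np.zeros_like(w)
    for i in range(w.size):
        w_plus, w_minus = w.copy(), w.copy()
        w_plus[i] += eps
        w_minus[i] -= eps
        grad[i] = (loss_fn(w_plus) - loss_fn(w_minus)) / (2 * eps)
    return grad

# Compare against the analytic gradient your code computes, e.g.:
# max_abs_diff = np.abs(analytic_grad - numerical_gradient(loss_fn, w)).max()
```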