Adapting Deep Learning to New Data Using ORNL's Titan Supercomputer


Adapting Deep Learning to New Data Using ORNL's Titan Supercomputer. Steven R. Young, Travis Johnston, Oak Ridge National Laboratory. ORNL is managed by UT-Battelle for the US Department of Energy.

Overview
- Deep Learning for Problems of National Interest
- Challenges
- Tools
- Next Steps
2 Adapting DL to New Data

Deep Learning for National Interest Problems
- Commercial interest (state-of-the-art results): object recognition, face recognition. Characteristics: data is easy to collect; labels are inexpensive.
- National interest (challenging new domains): materials science, high-energy physics, remote sensing. Characteristics: data is difficult to collect; few labels are available.

Problem: The Adaptability Challenge
Premise: for every data set, there exists a corresponding neural network that performs ideally on that data. What is the ideal neural network architecture (i.e., set of hyper-parameters) for a particular data set?
The widely used approach is intuition:
1. Pick some deep learning software (Caffe, Torch, Theano, etc.).
2. Design a set of parameters that defines your deep learning network.
3. Try it on your data.
4. If it doesn't work as well as you want, go back to step 2 and try again.
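Loosely, that trial-and-error loop looks like the sketch below; `design_network` and `evaluate` are hypothetical stand-ins for human intuition and an actual training run, not code from the talk.

```python
import random

def design_network(rng):
    # Step 2: pick a parameter set (here at random; a human would use intuition).
    return {"layers": rng.randint(2, 8), "learning_rate": rng.choice([0.1, 0.01])}

def evaluate(params):
    # Step 3: stand-in for training on your data and measuring accuracy.
    return 0.5 + 0.05 * params["layers"] * (1 if params["learning_rate"] == 0.01 else 0.5)

rng = random.Random(0)
params, accuracy = None, 0.0
while accuracy < 0.8:            # Step 4: not good enough? Back to step 2.
    params = design_network(rng)
    accuracy = evaluate(params)
print(accuracy >= 0.8)  # True
```

Each pass through the loop costs a full training run, which is exactly why this process does not scale to new domains.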

The Challenge
[Diagram: a deep learning "toolbox" of building blocks (convolutional, pooling, and fully connected layers) plus training hyper-parameters (learning rate, batch size, momentum, weight decay), which must be assembled into a complete network from input to output. A second diagram shows a smaller network built from the same toolbox.]

Current Approaches to Hyper-parameter Optimization
- Use an out-of-the-box network. Why spend time creating your own network when there are already so many good ones available? Surely one of those networks will also solve your problem.
- Tune an out-of-the-box network:
  - Hyper-parameter sweeps: assume the hyper-parameters are independent.
  - Grid search: requires training an exponential number of networks (infeasible).
  - Random search: a significant improvement over grid search, but does not make use of information learned during training (Bergstra, J. and Bengio, Y., "Random Search for Hyper-Parameter Optimization," Journal of Machine Learning Research, Feb. 2012).
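The grid-versus-random trade-off can be sketched in a few lines; the hyper-parameter names and values below are illustrative assumptions, not the ones used in the talk.

```python
import itertools
import random

# Hypothetical hyper-parameter space (names and values are illustrative).
space = {
    "learning_rate": [0.1, 0.01, 0.001],
    "batch_size": [32, 64, 128],
    "momentum": [0.8, 0.9, 0.99],
}

def grid_search(space):
    """Enumerate every combination: cost grows exponentially with the
    number of hyper-parameters."""
    keys = list(space)
    for values in itertools.product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))

def random_search(space, n_trials, seed=0):
    """Sample n_trials independent configurations: cost is fixed no
    matter how many hyper-parameters there are."""
    rng = random.Random(seed)
    for _ in range(n_trials):
        yield {k: rng.choice(v) for k, v in space.items()}

grid = list(grid_search(space))
rand = list(random_search(space, n_trials=10))
print(len(grid))  # 27 (3 * 3 * 3) configurations, all of which must be trained
print(len(rand))  # 10, a budget we choose
```

Neither method uses the results of earlier trials to guide later ones, which is the gap the approaches below address.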

What can we do with Titan? 18,688 GPUs.

Two Approaches
- RAvENNA (RApidly Evolving Neural Network Architecture): optimizes the hyper-parameters of a pre-existing network.
- MENNDL (Multi-node Evolutionary Neural Networks for Deep Learning): constructs neural networks from scratch, choosing the number of layers, the layer types, and each layer's hyper-parameters.

RAvENNA: Improved Random Search
[Figure: sampled configurations, contrasting bad hyper-parameters with good hyper-parameters.]

RAvENNA: Does smart searching help?
[Figure: results of random search vs. smart search.]

RAvENNA: Current Status, Quick Stats
- Implemented in Apache Spark and Caffe.
- Running on Titan: typical jobs use 1,000-4,000 nodes (1 GPU per node); optimizations have run on up to 18,000 nodes.
- Applied to several data sets/problems: image segmentation (cloud detection in overhead imagery), model prediction (neutron scattering data), and crystal lattice structure prediction.

MENNDL: Multi-node Evolutionary Neural Networks for Deep Learning
- An evolutionary algorithm (EA) for searching the hyper-parameter space of deep learning, with a focus on convolutional neural networks. Only the topology is evolved with the EA; training uses the typical SGD process.
- Goals: provide scalability and adaptability across many data sets and compute platforms; leverage more GPUs (ORNL's Titan has 18k GPUs, and the next-generation system, Summit, will have increased GPU capability); and provide the ability to apply DL to new data sets quickly (climate science, materials science, physics, etc.).
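A minimal sketch of this kind of evolutionary topology search follows, with a toy fitness function standing in for SGD training and validation accuracy; the genome layout and operators here are illustrative assumptions, not MENNDL's actual implementation.

```python
import random

LAYER_TYPES = ["conv", "pool"]

def random_genome(rng, max_layers=6):
    """A genome is a list of (layer_type, size) genes describing a topology."""
    return [(rng.choice(LAYER_TYPES), rng.choice([16, 32, 64]))
            for _ in range(rng.randint(2, max_layers))]

def fitness(genome):
    """Stand-in for validation accuracy after SGD training; here we just
    reward alternating conv/pool structure so the example is self-contained."""
    score = sum(1 for a, b in zip(genome, genome[1:]) if a[0] != b[0])
    return score / max(len(genome) - 1, 1)

def evolve(pop_size=20, generations=10, seed=0):
    """Select the fittest half each generation and refill with mutants."""
    rng = random.Random(seed)
    pop = [random_genome(rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]
        children = []
        for parent in survivors:
            child = list(parent)
            i = rng.randrange(len(child))      # point mutation on one gene
            child[i] = (rng.choice(LAYER_TYPES), rng.choice([16, 32, 64]))
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)

best = evolve()
print(fitness(best))
```

In MENNDL the expensive step is the fitness evaluation (a full training run per candidate network), which is why each evaluation is farmed out to its own GPU node.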

Designing the Genetic Code
- Individual = network; population = group of networks. Each population member is a network whose genome consists of sets of genes.
- A fixed-width set of genes corresponds to one layer; each layer contains multiple distinct parameters.
- Layer types are restricted by section: feature-extraction layers vs. classification layers.
- Goal: facilitate exploration of complete network definitions. Apart from this minor guided design, we attempt to fully encompass all layer types.
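A fixed-width gene per layer, with types restricted by section, might be encoded like this hypothetical sketch (the field names, layer types, and validity rule are our assumptions, not MENNDL's actual genome):

```python
from dataclasses import dataclass

@dataclass
class LayerGene:
    section: str      # "feature" or "classifier": restricts the legal types
    layer_type: str   # e.g., "conv"/"pool" for feature, "fc" for classifier
    size: int         # filters for conv, units for fc
    kernel: int       # kernel size (unused by fc layers)

FEATURE_TYPES = {"conv", "pool"}
CLASSIFIER_TYPES = {"fc"}

def is_valid(genome):
    """Enforce the section restriction: feature-extraction genes must precede
    classification genes, and each type must match its section."""
    seen_classifier = False
    for g in genome:
        if g.section == "classifier":
            seen_classifier = True
            if g.layer_type not in CLASSIFIER_TYPES:
                return False
        else:
            if seen_classifier or g.layer_type not in FEATURE_TYPES:
                return False
    return True

genome = [
    LayerGene("feature", "conv", size=32, kernel=3),
    LayerGene("feature", "pool", size=0, kernel=2),
    LayerGene("classifier", "fc", size=10, kernel=0),
]
print(is_valid(genome))  # True
```

Because every gene has the same width, mutation and crossover can operate uniformly on the genome without layer-specific bookkeeping.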

MENNDL: Communication
- Master (genetic algorithm): holds the population; each gene encodes one network's parameters; fitness metric: accuracy.
- Workers (one per node): each receives a network's parameters over MPI, trains the model, and returns its predictions and performance metrics for networks 1 through N.
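The master/worker exchange can be sketched as plain function calls; a real implementation would send these messages over MPI (e.g., with mpi4py), which is elided here, and the "training" inside the worker is a deterministic stand-in.

```python
def worker_evaluate(network_params):
    """Stand-in for a worker node: train the network described by
    network_params and report its fitness. Training is faked with a
    deterministic score peaking at learning_rate == 0.01."""
    score = 1.0 / (1.0 + abs(network_params["learning_rate"] - 0.01) * 100)
    return {"params": network_params, "accuracy": score}

def master(population):
    """The master farms out one network per worker and collects
    (parameters, performance) pairs to drive the genetic algorithm."""
    results = [worker_evaluate(p) for p in population]
    results.sort(key=lambda r: r["accuracy"], reverse=True)
    return results

population = [{"learning_rate": lr} for lr in (0.1, 0.01, 0.001)]
best = master(population)[0]
print(best["params"])  # {'learning_rate': 0.01}
```

One worker per node keeps the protocol simple: each message carries a full network definition out and a scalar fitness (plus predictions) back, so the master never touches training data.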

Hyper-parameter Values Evolved and Improved Performance
Currently testing and evaluating the latest code, which changes all possible parameters (e.g., number of layers, layer types, etc.). Using just 4 nodes, accuracy improved from 27% to 65%.

Hyper-parameter Values Evolved and Improved Performance
Improved performance over a known good network: using just 4 nodes, accuracy improved from 75% to 82%.

Unusual Layers (limited training examples)

MINERvA Detector Vertex Reconstruction Goal: Classify which segment the vertex is located in. Challenge: Events can have very different characteristics.

Application: 3D Electron Microscopy
St. Jude Children's Research Hospital is interested in developing tools that will aid biologists in labeling and analyzing new image volumes for the location, density, shape, and other characteristics of sub-cellular structures such as mitochondria. Segmentation of 3D electron microscopy (EM) imagery is an important initial characterization task, as mitochondria are relatively distinct but occur in a variety of locations, shapes, and sizes. MENNDL evaluated nearly 900k convolutional networks on over 18k of Titan's nodes for 24 consecutive hours, achieving a classification accuracy of 93.8%, a 30% reduction in error vs. a human-expert-defined network configuration.
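As a sanity check on those figures (our arithmetic, not numbers stated in the talk): 93.8% accuracy corresponds to a 6.2% error rate, and a 30% error reduction implies a baseline error of about 6.2% / (1 - 0.30) ≈ 8.9% for the expert-defined network.

```python
# Implied baseline error behind the reported "30% reduction in error".
menndl_error = 1.0 - 0.938            # 93.8% accuracy -> 6.2% error
baseline_error = menndl_error / (1.0 - 0.30)
print(round(menndl_error * 100, 1))   # 6.2
print(round(baseline_error * 100, 1)) # 8.9
```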

MENNDL Current Status
- Scaled to 18,000 nodes of Titan: 460,000 networks evaluated in 24 hours.
- Expanding to more complex topologies.
- Evaluating on a wide range of science data sets.
- Preparing for Summit (6 Volta GPUs per node, 4,600 nodes).

Acknowledgements: Gabriel Perdue (FNAL) and Sohini Upadhyay (University of Chicago); Adam Terwilliger (Grand Valley State University) and David Isele (University of Pennsylvania); Robert Patton, Seung-Hwan Lim, Thomas Karnowski, and Derek Rose (ORNL); Devin White and David Hughes (ORNL).

Questions?