Perspective on HPC-enabled AI Tim Barr September 7, 2017

AI is Everywhere

Deep Learning Component of AI
The punchline: deep learning is a high-performance computing (HPC) problem. It delivers benefits similar to those HPC delivers in other disciplines, where the value is in the decisions that are enabled. It is characterized by the same underlying factors: a large amount of computation and a large amount of data motion (I/O and network). And the same methods work: HPC technology and HPC best practice apply directly to DL.

Deep Learning Training: Behind the Scenes
Training is a computationally intensive phase. For each mini-batch, every process (P1, P2, ..., Pn) processes its share of the samples and computes gradients locally; the gradients are then averaged globally across all processes, and the cycle repeats for the next mini-batch. Deploying lots of computational power requires lots of communication.
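The loop described above can be sketched as synchronous data-parallel stochastic gradient descent. This is a minimal, single-process simulation under stated assumptions: the "model" is a hypothetical linear fit y = w*x + b trained on mean squared error, the n workers are simulated sequentially, and `global_average` stands in for the communication step (in a real cluster this would typically be an allreduce over the network). Averaging per-worker means equals the full mini-batch mean only when the shards are the same size, which holds for the sizes used here.

```python
import random

def local_gradient(w, b, shard):
    """One worker's gradient of the mean squared error for the model y = w*x + b."""
    gw = gb = 0.0
    for x, y in shard:
        err = (w * x + b) - y
        gw += 2 * err * x
        gb += 2 * err
    return gw / len(shard), gb / len(shard)

def global_average(grads):
    """Stand-in for the global averaging step; every worker receives this result."""
    n = len(grads)
    return sum(g[0] for g in grads) / n, sum(g[1] for g in grads) / n

def train_data_parallel(data, n_workers=4, batch_size=32, lr=0.1, epochs=50, seed=0):
    """Synchronous data-parallel SGD, simulated sequentially in one process."""
    rng = random.Random(seed)
    data = list(data)   # shuffle a copy, not the caller's list
    w, b = 0.0, 0.0     # model replica, identical on every worker
    for _ in range(epochs):
        rng.shuffle(data)
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # Split the mini-batch across workers P1 ... Pn.
            shards = [batch[i::n_workers] for i in range(n_workers)]
            grads = [local_gradient(w, b, s) for s in shards if s]
            gw, gb = global_average(grads)
            # Every replica applies the same averaged update, so replicas stay in sync.
            w -= lr * gw
            b -= lr * gb
    return w, b

data = [(x / 100, 3 * (x / 100) + 1) for x in range(-100, 100)]
w, b = train_data_parallel(data)
print(f"w = {w:.3f}, b = {b:.3f}")  # should approach the true values w = 3, b = 1
```

Each mini-batch costs one round of communication per update, which is why scaling out the computation also scales up the communication.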

Why Are We Here?
High-performance simulation and high-performance machine and deep learning share two pressures: faster is better (communication intensive) and more accurate is better (computationally intensive).

Let's Use Weather as an Example
More accurate is better: comparing a simulation at 100 km resolution (top) with one at 25 km resolution (bottom), the coarser run missed tropical cyclones and extreme waves up to 30 meters high.
Faster is better: the higher-resolution simulation requires 64X more computation.
Source: http://www.nersc.gov/news-publications/nersc-news/science-news/2017-2/researchers-catch-extreme-waves-with-high-resolution-modeling
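One plausible accounting for the 64X figure, offered as an assumption rather than the source's own derivation: going from 100 km to 25 km grid spacing is a 4X refinement in each of the two horizontal dimensions (16X more grid columns), and a stability limit such as the CFL condition typically forces the time step to shrink by the same factor, giving 4 x 4 x 4 = 64X. The helper name `refinement_cost_factor` is hypothetical, and it assumes the number of vertical levels stays fixed.

```python
def refinement_cost_factor(coarse_km, fine_km, spatial_dims=2, time_step_scales=True):
    """Relative cost of refining a grid-based simulation (hypothetical helper).

    Assumes cost is proportional to (grid points) x (time steps), that only
    `spatial_dims` dimensions are refined, and that the time step shrinks in
    proportion to the grid spacing (a CFL-style stability limit).
    """
    refinement = coarse_km / fine_km
    factor = refinement ** spatial_dims
    if time_step_scales:
        factor *= refinement
    return factor

print(refinement_cost_factor(100, 25))  # 4**2 horizontal x 4 time steps -> 64.0
```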

HPC and AI Will Converge
Figure: big data, machine learning and deep learning converging with HPC.
2x: digital data is doubling in size every two years, and by 2020 the digital universe will reach 44 zettabytes (source 2).
28% of respondents believe HPC will allow them to scale computationally to build deep learning algorithms that can take advantage of high volumes of data (source 1).
40% reduction in error rates when 10x more data is used in coordination with AI in speech recognition (source 1).
Sources: 1. "Are AI/Machine Learning/Deep Learning in Your Company's Future?", insideBIGDATA + NVIDIA. 2. EMC Digital Universe, with research and analysis by IDC.

What is Deep Learning?
Artificial intelligence: the design of intelligent systems that augment human productivity; systems that help decision makers do what they do best, while leveraging computers for what computers do best (sense, comprehend, predict, act and adapt).
Analytics: the search for the what, when, where and why, leveraging domain and data science to query datasets for insights. It progresses from descriptive (What happened?) to diagnostic (Why did it happen?), predictive (What will happen?) and prescriptive (How to make it happen?).
Machine learning: learning patterns from the past to predict the future. Unsupervised learning groups, clusters and organizes content with domain-specific heuristic models; supervised learning trains mathematical predictive models with labelled data.
Deep learning: training and using neural networks as predictive models, for example in vision, speech and language.
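To make the unsupervised/supervised distinction concrete, here is a minimal sketch with two deliberately simple stand-ins chosen for illustration (not methods named on the slide): one-dimensional k-means (Lloyd's algorithm) groups unlabelled points into clusters, while ordinary least squares fits a predictive model to labelled (x, y) pairs. Both function names are hypothetical.

```python
def kmeans_1d(points, k=2, iters=20):
    """Unsupervised: group unlabelled 1-D points into k clusters (Lloyd's algorithm)."""
    pts = sorted(points)
    # Initialise centroids spread evenly across the sorted data.
    centroids = [pts[(i * (len(pts) - 1)) // max(k - 1, 1)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in pts:
            nearest = min(range(k), key=lambda j: abs(p - centroids[j]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [sum(c) / len(c) if c else centroids[j] for j, c in enumerate(clusters)]
    return centroids

def fit_line(xs, ys):
    """Supervised: fit y = a*x + b to labelled examples by ordinary least squares."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx

print(kmeans_1d([1.0, 1.2, 0.8, 9.0, 9.5, 8.5]))  # two groups, centred near 1 and 9
print(fit_line([0, 1, 2, 3], [1, 3, 5, 7]))       # slope 2, intercept 1
```

The clustering never sees a label; the line fit is driven entirely by the labels it is given.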

Performance Will Be an AI Innovation and Adoption Driver
"AI and machine learning have reached a critical tipping point and will increasingly augment and extend virtually every technology enabled service, thing or application. The combination of extensive parallel processing power, advanced algorithms and massive data sets to feed the algorithms has unleashed this new era." (Gartner's Top 10 Strategic Technology Trends for 2017)
"Fast data is just as important as big data. In 2016, we'll witness the emergence of a new class of real-time applications in e-commerce and financial technology services powered by superspeedy data analytics. Fast data is the second iteration of big data, and it will create a lot of value." (Fortune Magazine, December 2015)
In a competitive international economy, advanced AI combined with supercomputing is an essential ingredient for solving strategically important problems, maintaining global leadership in industry, government and academia, and creating next-generation technologies, products and services.

Deep Learning Will Require Supercomputing
"An AI Revolution Started For Courageous Enterprises"
"Yes, Deep Learning Warrants All The Fuss"
"Expect To Need Thousands Of Cores"

Deep Learning with Supercomputers
NERSC, Deep Learning in Science: there are opportunities to apply DL widely in support of classic HPC simulation and modelling.

Deep Learning in Automotive: Noise, Vibration and Harshness at Daimler
Noise, vibration and harshness (NVH) analysis is a traditional HPC application used in automotive and aerospace. Deep learning has the potential to evaluate results automatically in complex, multicomponent, nonlinear applications.

Deep Learning Examples in Manufacturing
Aerospace drones: a 10-fold increase in the commercial drone fleet by 2021 (FAA, 2017).
Digital twin: one of the top 10 technologies for 2017 (Gartner).
Autonomous vehicles: OEMs will invest $7 billion in development (Frost & Sullivan, 2016).
Leveraging data analytics and deep learning between engineering disciplines and across the enterprise has great potential for product quality and innovation.

When Should You Start? A Sample from the Financial Services Sector
ROI payoff will be 1-2 years; the time to begin experimentation is now.
Chart: ROI timeline (<1 year, 1 year, 1 to 2 years, 3 to 4 years, 5 to 7 years), with responses grouped as "see significant ROI", "beginning to see ROI", "will not see ROI imminently" and "will not see ROI for some time"; values shown: 10%, 25%, 46%, 17%.
Source: Innovita Partners, 7/2017, exclusively for Cray

Why Deep Learning Now?
Deep learning now rests on four ingredients: "large enough" data to train on, compute power, advanced algorithms and software frameworks, and data science expertise.
Figure: timeline from the electronic brain through the perceptron, ADALINE, the XOR problem, backpropagation and SVMs to deep learning, spanning a golden age and an AI winter. Annotations: adjustable weights, but weights are not learned (electronic brain); learnable weights and threshold (perceptron); the XOR problem; a solution to nonlinearly separable problems, at the cost of big computation, local optima and overfitting (backpropagation); limitations of learning prior knowledge and kernel functions requiring human intervention (SVM); hierarchical feature learning (deep learning).
Image source: Andrew L. Beam. (2017, February 13). Deep Learning 101 - Part 1: History and Background [Blog post]. Retrieved from https://beamandrew.github.io/deeplearning/2017/02/23/deep_learning_101_part1.html
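The XOR problem in this timeline can be reproduced in a few lines. The sketch below is a classic perceptron with learnable weights and threshold (bias), trained with the perceptron learning rule; the function names are hypothetical. It learns AND, which is linearly separable, but can never classify all four XOR cases, because no single line separates them. That limitation is what multi-layer networks trained with backpropagation later overcame.

```python
def predict(w, bias, x1, x2):
    """Threshold unit: output 1 if the weighted sum exceeds zero."""
    return 1 if w[0] * x1 + w[1] * x2 + bias > 0 else 0

def train_perceptron(samples, epochs=100, lr=1.0):
    """Perceptron learning rule: nudge weights and threshold on every mistake."""
    w, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        errors = 0
        for (x1, x2), target in samples:
            update = lr * (target - predict(w, bias, x1, x2))
            if update:
                w[0] += update * x1
                w[1] += update * x2
                bias += update
                errors += 1
        if errors == 0:  # converged: every sample classified correctly
            break
    return w, bias

def accuracy(samples, w, bias):
    hits = sum(predict(w, bias, x1, x2) == t for (x1, x2), t in samples)
    return hits / len(samples)

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

for name, task in (("AND", AND), ("XOR", XOR)):
    w, bias = train_perceptron(task)
    print(name, "accuracy:", accuracy(task, w, bias))
```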

Deep Learning Challenges
"AI systems still demand considered design, knowledge engineering and model building." (Forrester AI TechRadar, Q1 2017)
There is a lot to learn for practitioners and end users: large, complex workflows; different toolkits, data movement and networking; and defining the value returned to the business.
Training times grow with data size and model complexity, to days or weeks, and are compounded by hyperparameter optimization, where O(1000) training runs is not unrealistic.
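A quick back-of-the-envelope calculation shows why hyperparameter optimization compounds training time. This sketch assumes, for illustration only, that every trial takes the same time and that the system can run a fixed number of trials side by side; `search_wall_clock_days` is a hypothetical helper, and the 2-day run length is an arbitrary example.

```python
import math

def search_wall_clock_days(trials, days_per_trial, concurrent_trials):
    """Wall-clock time for a hyperparameter search run in waves of concurrent trials.

    Hypothetical helper: assumes equal-length trials and that `concurrent_trials`
    training jobs can run at the same time on the system.
    """
    waves = math.ceil(trials / concurrent_trials)
    return waves * days_per_trial

# 1000 trials of a 2-day training run:
print(search_wall_clock_days(1000, 2, 1))    # one trial at a time: 2000 days
print(search_wall_clock_days(1000, 2, 100))  # 100 trials at once: 20 days
```

Running trials concurrently does not shorten any single training run; it is the capacity to run many at once that turns an impossible search into a feasible one.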

HPC and AI
Enabling resource-intensive training by delivering performance efficiencies and scalability across architectures, platforms, software and expertise.
Deep learning platforms: from dense GPU systems to scalable platforms with optimized software stacks.
Apply HPC best practices and expertise to improve deep learning frameworks and core algorithms.

Reduce Total Workflow Time: Why?
The deep neural net training problem: a DNN model with weights on all connections (the largest models now have hundreds of layers and millions to billions of nodes), plus a large set of labeled training data.
Figure: a (not particularly deep) neural net.
Idealized training algorithm: for every mini-batch of training samples, run the samples forward through the model, compute the error against the training data, and back-propagate the error through the network to update the weights (gradient descent). After all the data has been processed, iteratively optimize the hyperparameters until the required accuracy is achieved.
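The idealized algorithm above can be sketched end to end in plain Python. This is a toy under explicit assumptions: a hypothetical one-input network with a single tanh hidden layer and a linear output, mean squared error, a synthetic target of x squared, and an outer loop that tunes only the learning rate against a validation set. All names are illustrative; real frameworks compute these gradients automatically and at far larger scale.

```python
import math
import random

def init_params(n_hidden, rng):
    # One input, one tanh hidden layer, one linear output; weights on all connections.
    return {"w1": [rng.uniform(-1, 1) for _ in range(n_hidden)], "b1": [0.0] * n_hidden,
            "w2": [rng.uniform(-1, 1) for _ in range(n_hidden)], "b2": 0.0}

def forward(p, x):
    h = [math.tanh(w * x + b) for w, b in zip(p["w1"], p["b1"])]
    return h, sum(w * hj for w, hj in zip(p["w2"], h)) + p["b2"]

def mse(p, data):
    return sum((forward(p, x)[1] - t) ** 2 for x, t in data) / len(data)

def sgd_step(p, batch, lr):
    """Forward pass, error, back-propagation and a gradient-descent update for one mini-batch."""
    n = len(p["w1"])
    g = {"w1": [0.0] * n, "b1": [0.0] * n, "w2": [0.0] * n, "b2": 0.0}
    for x, t in batch:
        h, y = forward(p, x)
        dy = 2 * (y - t) / len(batch)                # d(MSE)/d(output)
        g["b2"] += dy
        for j in range(n):
            g["w2"][j] += dy * h[j]
            dz = dy * p["w2"][j] * (1 - h[j] ** 2)   # chain rule back through tanh
            g["w1"][j] += dz * x
            g["b1"][j] += dz
    for k in ("w1", "b1", "w2"):
        p[k] = [v - lr * gv for v, gv in zip(p[k], g[k])]
    p["b2"] -= lr * g["b2"]

def train(data, lr, epochs=200, batch_size=8, n_hidden=8, seed=0):
    rng = random.Random(seed)
    p = init_params(n_hidden, rng)
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)
        for i in range(0, len(data), batch_size):
            sgd_step(p, data[i:i + batch_size], lr)
    return p

def tune_learning_rate(train_set, val_set, candidates=(0.01, 0.05, 0.1)):
    """Outer loop: train once per hyperparameter value, keep the lowest validation error."""
    results = {lr: mse(train(train_set, lr), val_set) for lr in candidates}
    return min(results, key=results.get), results

samples = [(x, x * x) for x in (i / 20 for i in range(-20, 21))]
train_set, val_set = samples[0::2], samples[1::2]
best_lr, val_errors = tune_learning_rate(train_set, val_set)
print("validation MSE by learning rate:", val_errors)
print("chosen learning rate:", best_lr)
```

Note that the outer loop multiplies the cost of the inner one: every candidate value repeats the entire training run.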

Reduce Total Workflow Time
Workflow: data acquisition, data preparation, model training, model testing. Apply HPC best practices and expertise to improve deep learning frameworks and core algorithms.
How training time shapes research:
Minutes to hours: interactive research, instant gratification.
1-4 days: tolerable; interactivity is replaced by running many experiments in parallel.
1-4 weeks: high-value experiments only; progress stalls.
More than 1 month: don't even try.
Source: Large-Scale Deep Learning for Intelligent Computer Systems, Jeff Dean, Google

Cray Focus: Deep Learning Training at Scale
Apply HPC best practices and Cray expertise to improve DL systems and core algorithms with real-world use cases, through collaborations across Cray customers and other stakeholders. Toolkits currently being optimized: CNTK, TensorFlow and MXNet.
Figure: CNTK distributed version vs. Cray MPI parallel implementation; epoch elapsed time (seconds, axis 0 to 700) on 64, 128, 256, 512, 1024 and 2048 nodes.
"Applying a supercomputing approach to optimize deep learning workloads represents a powerful breakthrough for training and evaluating deep learning algorithms at scale. Our collaboration with Cray and CSCS has demonstrated how the Microsoft Cognitive Toolkit can be used to push the boundaries of deep learning." (Dr. Xuedong Huang, distinguished engineer, Microsoft AI and Research)
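A common way to read an epoch-time-versus-node-count comparison like this one is strong-scaling speedup and efficiency. The sketch below assumes the epoch workload stays fixed as nodes are added (strong scaling) and measures everything relative to the smallest node count; `strong_scaling` is a hypothetical helper, and the timings in the usage lines are invented for illustration, not read from the slide.

```python
def strong_scaling(times_by_nodes):
    """Speedup and parallel efficiency relative to the smallest node count.

    times_by_nodes maps node count -> elapsed time for a fixed workload
    (for example, one training epoch). Ideal scaling halves the time
    whenever the node count doubles, which gives an efficiency of 1.0.
    """
    base_nodes = min(times_by_nodes)
    base_time = times_by_nodes[base_nodes]
    report = {}
    for nodes, t in sorted(times_by_nodes.items()):
        speedup = base_time / t
        efficiency = speedup / (nodes / base_nodes)
        report[nodes] = (speedup, efficiency)
    return report

# Hypothetical epoch times in seconds, for illustration only.
for nodes, (s, e) in strong_scaling({64: 640.0, 128: 330.0, 256: 180.0}).items():
    print(f"{nodes:5d} nodes: speedup {s:.2f}x, efficiency {e:.0%}")
```

Efficiency typically falls as nodes are added because the gradient-averaging communication grows relative to the per-node computation, which is where optimized communication libraries earn their keep.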

HPC Focus: Comprehensive Systems
Only a small fraction of real-world ML systems is composed of the ML code, as shown by the small black box in the middle of the figure. The required surrounding infrastructure is vast and complex.
Figure: ML code surrounded by configuration, data collection, data verification, feature extraction, machine resource management, analysis tools, process management tools, serving infrastructure and monitoring.
Adapted from "Hidden Technical Debt in Machine Learning Systems", Sculley et al., NIPS '15.

HPC Supports the Entire AI Workflow
Deep learning workflows are not limited to training. Similar to other HPC and analytics workloads, significant portions of DL jobs are devoted to data collection, preparation and management.
Workflow: data acquisition; data preparation (cleansing, shaping, enrichment, and data annotation to establish ground truth); iterative model training (split the data into training, validation and test sets, train the model, evaluate performance and optimize the model, with cross-validation); model testing.
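The data-splitting step of this workflow can be sketched as follows. This is a minimal illustration, assuming a single seeded shuffle followed by fixed fractions for the held-out test and validation sets, plus k-fold cross-validation that assigns items to folds round-robin; both function names and the default fractions are hypothetical choices, not part of the source.

```python
import random

def train_val_test_split(items, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle once, then carve off held-out test and validation sets."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_test = round(len(items) * test_frac)
    n_val = round(len(items) * val_frac)
    test = items[:n_test]
    val = items[n_test:n_test + n_val]
    train = items[n_test + n_val:]
    return train, val, test

def k_fold(items, k=5):
    """Yield (train, held_out) pairs; every item is held out exactly once."""
    items = list(items)
    folds = [items[i::k] for i in range(k)]  # round-robin assignment to folds
    for i in range(k):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, folds[i]

train, val, test = train_val_test_split(range(100))
print(len(train), len(val), len(test))
for train_fold, held_out in k_fold(train, k=5):
    print(len(train_fold), len(held_out))
```

The test set is set aside before any tuning so that the final evaluation stays independent of the choices made during cross-validation.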

AI is everywhere
Even the grocery store.

Thank You