Keep Those GPUs Busy: Deep Learning and Storage
Igor Ostrovsky, igor@purestorage.com

THREE PILLARS OF DEEP LEARNING
Expertise: techniques and tools
Compute: from CPU to GPU servers
Data: massive training sets

DEEP LEARNING AND DATA

MODEL SIZE GROWTH (source: Ben Taylor, Ziff)
[Figure: model size over time, 1987 to 2015, on a kB / MB / GB scale]

BIG MODELS NEED BIG DATA SETS
[Figure: dog and cat images, "Dog or cat?" classification example]
Powerful models have many parameters.
Many parameters require big training sets.

DEEP LEARNING AND TRAINING SET: OBSERVATION BY ANDREW NG, AI LEADER
[Figure: performance vs. amount of data, comparing deep learning with older learning algorithms]

TRAINING SETS FOR DEEP LEARNING
[Figure: growth of training-set sizes from GBs to TBs to PBs; labels include 2011, 2012, and flickr30k]

TRAINING SETS FOR DEEP LEARNING
[Figure: growth of training-set sizes from GBs to TBs to PBs, now including synthetic data; labels include 2011, 2012, and flickr30k]

"We don't have better algorithms. We just have more data."
Peter Norvig, Google Research Director

DEEP LEARNING INFRASTRUCTURE FOR TRAINING

COMPLEXITIES OF AI IN PRODUCTION
1. Ingest: from sensors, machines, and user-generated sources
2. Clean & transform (CPU servers): label, anomaly detection, ETL, prep, stage
3. Explore (GPU server): quickly iterate to converge on models
4. Train (GPU production cluster): run for hours to days
Each hand-off between stages requires its own copy-and-transform step.

REAL-WORLD PIPELINE IN AN AUTONOMOUS CAR COMPANY
1. Ingest
2. Clean, label, resize (CPU servers)
3. Explore (GPU server)
4. Train (GPU production cluster)
5. Inference in a virtual world (GPU production cluster)
Backed by tens of PB of cold storage

SOFTWARE PIPELINE EXAMPLE

WIDE RANGE OF NEEDS IN THE PIPELINE: A SIGNIFICANT CHALLENGE
1. Ingest: sensors, synthetic data
2. Clean & transform (CPU servers)
3. Explore (GPU server)
4. Train & validate (GPU cluster)

WHAT IS FLASHBLADE?
17 TB or 52 TB blades
1.5M IOPS and 16 GB/s performance
Elastic fabric: fast and simple, scales linearly
Protocols: NFS, S3, SMB, HTTP
1.6 PB capacity (assuming 3:1 data reduction)
N+2 redundancy
Power: 1,850 W maximum, fully loaded

FROM THE EXPERTS
"Building and managing data pipelines is typically one of the most costly pieces of a complete machine learning solution."
Jeremy Hermann & Mike Del Balso, Uber Machine Learning Platform (https://eng.uber.com/michelangelo/)
"If your boss asks you, tell them that I said [to] build a unified data warehouse."
Andrew Ng, former head of Baidu AI/Google Brain, "Nuts and Bolts of Applying Deep Learning"

TRAINING BENCHMARKS

AI SYSTEMS DESIGN PATTERNS: HOW MUCH PERFORMANCE PENALTY DOES SHARED STORAGE IMPOSE?
Full training workflow: I/O, then CPU work (decode, scale), then GPU work (forward-propagation, back-propagation, update, evaluate)
Benchmark setup:
Setup #1: DGX-1 with 4x local SSDs
Setup #2: DGX-1 with 1x FlashBlade
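
Why storage speed shows up as GPU idle time can be sketched with the workflow's three stages. In the minimal sketch below, `sleep()` calls with made-up durations stand in for I/O, CPU decode/scale, and GPU work; the function names and timings are hypothetical, not the benchmark's code. A serial loop pays for every stage on every batch, while a bounded prefetch queue lets a producer thread prepare the next batches while the GPU step runs.

```python
import queue
import threading
import time

# Hypothetical per-batch stage costs in seconds; sleep() stands in for real work.
IO_SECONDS = 0.01   # read raw images from storage
CPU_SECONDS = 0.01  # decode and scale on the CPU
GPU_SECONDS = 0.03  # forward-propagation, back-propagation, update


def load_and_preprocess(batch_id):
    """Simulated I/O followed by CPU preprocessing for one batch."""
    time.sleep(IO_SECONDS)
    time.sleep(CPU_SECONDS)
    return batch_id


def train_step(batch):
    """Simulated GPU work for one batch."""
    time.sleep(GPU_SECONDS)


def run_serial(num_batches):
    """Each stage waits for the previous one, so the GPU idles during I/O and decode."""
    start = time.perf_counter()
    for i in range(num_batches):
        train_step(load_and_preprocess(i))
    return time.perf_counter() - start


def run_prefetched(num_batches, depth=4):
    """A producer thread prepares batches ahead while the main thread trains."""
    ready = queue.Queue(maxsize=depth)  # bounded: the producer blocks when it is `depth` batches ahead
    end_of_data = object()              # sentinel marking the end of the input

    def producer():
        for i in range(num_batches):
            ready.put(load_and_preprocess(i))
        ready.put(end_of_data)

    start = time.perf_counter()
    worker = threading.Thread(target=producer, daemon=True)
    worker.start()
    while (batch := ready.get()) is not end_of_data:
        train_step(batch)
    worker.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"serial:     {run_serial(20):.2f} s")
    print(f"prefetched: {run_prefetched(20):.2f} s")
```

With these made-up timings the serial loop costs roughly the sum of all three stages per batch, while the prefetched loop is bounded by the slowest stage (here the simulated GPU step). If I/O were the slowest stage, the queue would drain and the GPU would wait, which is the penalty the shared-storage benchmark sets out to measure. TensorFlow's `tf.data` API applies the same idea through `Dataset.prefetch`.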

IMAGENET TRAINING ON DGX-1 (8x P100)
ImageNet image classification challenge: 1.28M labeled images, 1000 categories

Model          Year   Top-5 accuracy   DGX-1 images/s
AlexNet        2012   84.7%            9968
VGG-16         2014   92.5%            1093
Inception V3   2015   94.4%            1052
ResNet-50      2016   94.8%            1542
ResNet-152     2016   95.5%            673
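
The throughput column converts directly into wall-clock time per epoch: epoch time = training-set size / throughput. A minimal sketch, assuming the GPUs sustain the listed rate for the whole epoch (no I/O stalls) and using the slide's rounded 1.28M-image figure; the dictionary and function names are illustrative.

```python
# Throughput from the ImageNet training table (DGX-1, 8x P100), in images/s.
DGX1_IMAGES_PER_SEC = {
    "AlexNet": 9968,
    "VGG-16": 1093,
    "Inception V3": 1052,
    "ResNet-50": 1542,
    "ResNet-152": 673,
}

# ImageNet training-set size as stated on the slide ("1.28M labeled images").
IMAGENET_TRAIN_IMAGES = 1_280_000


def epoch_minutes(images_per_sec, num_images=IMAGENET_TRAIN_IMAGES):
    """Minutes for one full pass over the training set at a steady throughput."""
    return num_images / images_per_sec / 60


if __name__ == "__main__":
    for model, ips in DGX1_IMAGES_PER_SEC.items():
        print(f"{model:<13} {epoch_minutes(ips):5.1f} min/epoch")
```

For ResNet-50 this gives about 830 seconds (roughly 14 minutes) per epoch; sustaining it means the storage and CPU stages must deliver about 1,542 images every second.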

TIME TO RESULTS
TensorFlow ResNet-50 training, 200 kB images
NVIDIA DGX-1 with local SSDs: 1.8 hours to process 10M images
NVIDIA DGX-1 with Pure FlashBlade: 1.8 hours to process 10M images

TIME TO RESULTS
TensorFlow ResNet-50 training, 200 kB images
NVIDIA DGX-1 with local SSDs: 0.9 hours to load 2 TB, then 1.8 hours to process 10M images
NVIDIA DGX-1 with Pure FlashBlade: 1.8 hours to process 10M images, with no load step (33% faster)
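
The figures on this slide are mutually consistent, and the arithmetic behind them can be checked in a few lines. A minimal sketch, assuming decimal units (1 kB = 1,000 bytes, 1 TB = 10^12 bytes) and that the 1.8-hour training run reads every image from storage once; the variable names are illustrative.

```python
# Inputs as stated on the slide; decimal units are an assumption.
NUM_IMAGES = 10_000_000       # "process 10M images"
IMAGE_BYTES = 200 * 1000      # "200kB images"
LOAD_HOURS = 0.9              # local SSDs: copy 2 TB before training
TRAIN_HOURS = 1.8             # training time, same on both setups

dataset_tb = NUM_IMAGES * IMAGE_BYTES / 1e12            # total data to read or copy
local_ssd_hours = LOAD_HOURS + TRAIN_HOURS              # copy first, then train
shared_storage_hours = TRAIN_HOURS                      # train directly from shared storage
time_saved = 1 - shared_storage_hours / local_ssd_hours

images_per_sec = NUM_IMAGES / (TRAIN_HOURS * 3600)      # rate at which the GPUs consume images
read_mb_per_sec = images_per_sec * IMAGE_BYTES / 1e6    # storage read rate during training
copy_mb_per_sec = dataset_tb * 1e6 / (LOAD_HOURS * 3600)  # implied copy rate onto local SSDs

if __name__ == "__main__":
    print(f"dataset size:        {dataset_tb:.1f} TB")
    print(f"local SSD workflow:  {local_ssd_hours:.1f} h")
    print(f"shared storage:      {shared_storage_hours:.1f} h ({time_saved:.0%} less time)")
    print(f"training throughput: {images_per_sec:,.0f} images/s")
    print(f"required read rate:  {read_mb_per_sec:,.0f} MB/s")
    print(f"implied copy rate:   {copy_mb_per_sec:,.0f} MB/s")
```

Two observations fall out. "33% faster" means end-to-end time drops by one third (1.8 h instead of 2.7 h); equivalently, end-to-end throughput is 1.5x higher. And 10M images in 1.8 hours is about 1,543 images/s, essentially the ResNet-50 rate in the ImageNet table, which under these assumptions requires the storage to sustain roughly 309 MB/s of reads to keep a single DGX-1 busy.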

TIME TO FIRST RESULT
TensorFlow ResNet-50 training, 200 kB images
NVIDIA DGX-1 with local SSDs: 0.9 hours to load 2 TB before training can start
NVIDIA DGX-1 with Pure FlashBlade: instant

"[M]achine learning is [...] an iterative process of running the learner, analyzing the results, modifying the data and/or the learner, and repeating."
Pedro Domingos, professor at the University of Washington, author of The Master Algorithm

CASE STUDIES

MAKING AUTONOMOUS CARS POSSIBLE BY 2021
Zenuity, a joint venture of Volvo and Autoliv, chose NVIDIA DGX-1 and Pure FlashBlade systems for its deep learning infrastructure.

10X FASTER INVESTMENT DECISIONS WITH PURE FLASHBLADE
"Our quants want to test a model, get the results, and then test another one all day long."
Gary Collier, Co-CTO, Man AHL

LESSONS FROM CUSTOMERS: HOW IS STORAGE IMPORTANT?
1. Data copy elimination: dramatically improve time to results and time to first result
2. Performance and scalability: support an evolving data pipeline with varying I/O patterns
3. Simplicity: focus more on AI, less on infrastructure
