Deep Learning: An Overview. Bradley J Erickson, MD PhD Mayo Clinic, Rochester


1 Deep Learning: An Overview Bradley J Erickson, MD PhD Mayo Clinic, Rochester Medical Imaging Informatics and Teleradiology Conference 1:30-2:05pm June 17, 2016

2 Disclosures Relationships with commercial interests: Board of OneMedNet Board of VoiceIT

3 What is Machine Learning? It is a part of Artificial Intelligence Finds patterns in data Patterns that reflect properties of examples (supervised) Patterns that separate examples (unsupervised) (Other types of artificial intelligence include rules systems)

4 Machine Learning Classes Supervised Unsupervised Reinforced ANN Clusters SVM Adaptive Resonance Random Forest Bayes DNN

5 Machine Learning History Artificial Neural Networks (ANN) Starting point of machine learning Early versions didn't work well Other Machine Learning Methods Naïve Bayes Support Vector Machine (SVM) Random Forest Classifier (RFC)

6 Artificial Neural Network/Perceptron [Diagram: Input Layer (T1 Pre, T1 Post, T2), Hidden Layer of f(σ) nodes, Output Layer (Tumor, Brain)]

7 Artificial Neural Network/Perceptron [Same diagram with example input values: T1 Pre = 45, T1 Post = 322, T2 = 128]

8 Artificial Neural Network/Perceptron [Same diagram with the value 57 shown]

9 Artificial Neural Network/Perceptron [Same diagram with the value 57 shown and no f(σ) labels]

10 How ANNs Learn Propagation Multiply prior layer node value by weight Activation function, e.g. threshold the sum Weight Update Compute error = actual output - expected output Weight gradient = error * input value New weight = old weight - gradient * learning rate
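A minimal sketch of the propagation and weight-update steps for a single node with a threshold activation. The function names, the learning rate of 0.1 and the toy values are assumptions for illustration, not taken from the talk. The update uses the standard gradient-descent form: new weight = old weight minus learning rate times gradient.

```python
def propagate(inputs, weights, threshold=0.0):
    """Weighted sum of prior-layer values followed by a threshold activation."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1.0 if total > threshold else 0.0


def update_weights(inputs, weights, expected, learning_rate=0.1):
    """One update step: error = actual - expected, gradient = error * input."""
    actual = propagate(inputs, weights)
    error = actual - expected
    return [w - learning_rate * error * x for x, w in zip(inputs, weights)]


# Toy example (illustrative values only).
inputs = [1.0, 0.5]
weights = [0.2, -0.1]
new_weights = update_weights(inputs, weights, expected=0.0)
```

Here the node fires (output 1.0) when the expected output is 0.0, so each weight is nudged down in proportion to its input.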

11 Learning = Optimization Problem Learning depends on: Correct gradient directions Correct gradient multiplier (learning rate) Global Minimum Local Minimum Small Gradient
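To illustrate why the learning rate matters, here is a small sketch of gradient descent on a toy one-dimensional loss. The loss f(x) = (x - 3)^2, the step counts and both learning rates are assumptions chosen for illustration. It does not cover local minima or small-gradient plateaus.

```python
def gradient_descent(grad, start, learning_rate, steps):
    """Repeatedly step against the gradient and return the final position."""
    x = start
    for _ in range(steps):
        x -= learning_rate * grad(x)
    return x


def grad(x):
    # Gradient of the toy loss f(x) = (x - 3)^2, whose minimum is at x = 3.
    return 2.0 * (x - 3.0)


# A modest learning rate converges toward the minimum.
good = gradient_descent(grad, start=0.0, learning_rate=0.1, steps=100)

# A learning rate that is too large overshoots further on every step.
too_large = gradient_descent(grad, start=0.0, learning_rate=1.1, steps=100)
```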

12 Support Vector Machines Maps input data to new space Creates hyperplane that separates classes in that space f(x)
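The idea of mapping data to a new space where a hyperplane separates the classes can be sketched with a hand-picked feature map. This is not an SVM implementation: a real SVM learns the hyperplane by maximizing the margin, whereas the map (x, x^2), the weights `w` and the offset `b` below are fixed by hand as assumptions.

```python
def feature_map(x):
    """Map a 1-D input to the 2-D space (x, x^2)."""
    return (x, x * x)


def classify(x, w=(0.0, 1.0), b=-1.0):
    """Report which side of the hyperplane w . phi(x) + b = 0 the point is on."""
    phi = feature_map(x)
    score = w[0] * phi[0] + w[1] * phi[1] + b
    return 1 if score > 0 else -1


# Class -1 sits near zero and class +1 sits on both sides of it, so no single
# threshold on x separates them. In the mapped space the line x^2 = 1 does.
inner = [-0.5, 0.0, 0.5]
outer = [-2.0, -1.5, 1.5, 2.0]
labels = [classify(x) for x in inner + outer]
```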

13 Deep Learning: Why the Hype? Performance in ImageNet Challenge [Table: Team / Software, Year, Error Rate; rows: XRCE (not Deep Learning), SuperVision (AlexNet), Clarifai, GoogLeNet (Inception), Andrej Karpathy (human comparison), BN-Inception (arXiv), Inception-v3 (arXiv); year and error-rate values not preserved]

14 What is Deep Learning Deep because it uses many layers ANN typically had 3 or fewer layers

15 DNNs have 15+ layers

16 Types of DNNs Convolutional Neural Network (CNN) Early layers have windows of image as input Multiplied by a kernel to get output Known as a convolution
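A minimal sketch of the windowed multiply-and-sum described here, written in plain Python. The image and kernel values are made up for illustration. As in most deep learning libraries, the kernel is not flipped, so strictly speaking this is a cross-correlation.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # Multiply the window element-wise by the kernel and sum.
            total = sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            row.append(total)
        output.append(row)
    return output


image = [
    [1, 2, 3, 0],
    [4, 5, 6, 0],
    [7, 8, 9, 0],
]
kernel = [
    [1, 0],
    [0, -1],
]
result = convolve2d(image, kernel)
```

Each output value summarizes one window of the input, which is why the output is smaller than the image when no padding is used.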

21 Why the Excitement Now? Advances That Addressed Problems Many layers -> Overfitting Implement sparsity in weights: Dropout
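A sketch of dropout in its common "inverted" form, which randomly zeroes node outputs during training and rescales the survivors. The function name, the rate of 0.5 and the fixed seed are assumptions for illustration.

```python
import random


def dropout(values, rate, rng, training=True):
    """Inverted dropout: zero each value with probability `rate`.

    Surviving values are scaled by 1 / (1 - rate) so the expected sum is
    unchanged. At inference time the values pass through untouched.
    """
    if not training or rate == 0.0:
        return list(values)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
activations = [1.0] * 10
dropped = dropout(activations, rate=0.5, rng=rng)
```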

22 Why the Excitement Now? Advances That Addressed Problems Many layers -> Vanishing Gradients Dropout partially addresses this Can use pre-trained weights for early layers, and fix those, with weights of later layers for learning higher level features

23 Typical CNNs Convolution Pooling Pooling Convolution Pooling Fully Connected
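The pooling step can be sketched as follows. Max pooling with a 2x2 non-overlapping window is assumed here as one common choice, and the input values are made up.

```python
def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling: keep the largest value in each window."""
    pooled = []
    for r in range(0, len(feature_map) - size + 1, size):
        row = []
        for c in range(0, len(feature_map[0]) - size + 1, size):
            window = [
                feature_map[r + i][c + j]
                for i in range(size)
                for j in range(size)
            ]
            row.append(max(window))
        pooled.append(row)
    return pooled


pooled = max_pool2d([
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 9, 8],
    [2, 6, 7, 3],
])
```

Pooling halves each spatial dimension here, so later layers see a coarser summary of the image.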

24 Typical CNNs Andrej Karpathy:

25 Why the Excitement Now? Batch Normalization What should be the initial set of weights connecting nodes? All the same = no gradients Random. But what range of values? BatchNorm: After each Convolutional layer Subtract mean / divide by standard deviation Simple but effective
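The "subtract mean / divide by standard deviation" step can be sketched as below. This is a simplified view: the small `eps` term for numerical stability is an assumption, and the learnable scale and shift that batch normalization usually adds afterwards are omitted.

```python
import math


def batch_normalize(batch, eps=1e-5):
    """Subtract the batch mean and divide by the batch standard deviation."""
    mean = sum(batch) / len(batch)
    variance = sum((x - mean) ** 2 for x in batch) / len(batch)
    std = math.sqrt(variance + eps)
    return [(x - mean) / std for x in batch]


# Toy batch of activations for one node (illustrative values).
normalized = batch_normalize([2.0, 4.0, 6.0, 8.0])
```

After normalization the batch has a mean of about 0 and a variance of about 1, regardless of the scale of the incoming values.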

26 Why the Excitement Now? Residual Networks Residual defines if and how to pass data through from layer to layer. Makes deep network construction reliable *Targ, ICLR 2016
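One common way to express a residual connection is output = x + F(x), where F stands for the layers inside the block. The sketch below assumes that form. `toy_transform` is a hypothetical stand-in for the learned layers.

```python
def relu(values):
    """Rectified linear unit applied element-wise."""
    return [max(0.0, v) for v in values]


def residual_block(x, transform):
    """Return x + F(x): the input skips around the transform and is added back."""
    fx = transform(x)
    return [a + b for a, b in zip(x, fx)]


def toy_transform(x):
    # Stand-in for the block's learned layers (hypothetical).
    return relu([0.5 * v - 1.0 for v in x])


out = residual_block([1.0, 4.0], toy_transform)

# If the transform contributes nothing, the block passes its input through.
identity = residual_block([1.0, 2.0], lambda x: [0.0] * len(x))
```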

27 Why the Excitement Now? Deep Neural Network Theory Exponential Compute Power Growth

28 Moore's Law Computing performance doubles approximately every 18 months

29 Exponentials In Real Life If you put 1 drop of water into a football stadium, and then double the number of drops each minute: At 5 minutes, you will have 32 drops At 45 minutes, you will cover the field to a depth of 1" At 55 minutes, the stadium will be full It is not natural for humans to grasp exponential growth
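The doubling arithmetic can be checked in a few lines. The function name is an assumption. Only the drop counts are computed, not the stadium volume.

```python
def drops_after(minutes, start=1):
    """Number of drops after doubling once per minute."""
    return start * 2 ** minutes


five_minutes = drops_after(5)

# The last ten minutes multiply the amount of water by 2**10.
growth_45_to_55 = drops_after(55) // drops_after(45)
```

The factor of 1024 between minute 45 and minute 55 is what turns a thin layer on the field into a full stadium.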

30 Deep Learning Works Well on GPUs Naturally parallel Less precision (single precision FP) can actually be an advantage Now building cards with no video output and optimized for Deep Learning (P-100)

31 GPUs are Beating Moore's Law [Chart: labels CPU, GPU, TPU, FPGA, Ice Age; axis values not preserved]

32 Deep Learning Myths You Need Millions of Exams to Train and Use Deep Learning Methods

34 Ways To Avoid Need For Large Data Sets Data Augmentation Essentially, creating variants of data that are different enough that they are learnable Similar enough that the teaching point is kept Mirror/Flip/Rotate/Contrast/Crop
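A minimal sketch of the geometric augmentations (mirror, flip, rotate) on an image stored as a list of rows. The function names and the 2x2 toy image are assumptions. Contrast changes and cropping are left out for brevity.

```python
def mirror(image):
    """Flip left-to-right."""
    return [list(reversed(row)) for row in image]


def flip(image):
    """Flip top-to-bottom."""
    return [list(row) for row in reversed(image)]


def rotate90(image):
    """Rotate 90 degrees clockwise."""
    return [list(row) for row in zip(*reversed(image))]


def augment(image):
    """Return the original plus simple geometric variants."""
    return [image, mirror(image), flip(image), rotate90(image)]


image = [
    [1, 2],
    [3, 4],
]
variants = augment(image)
```

Each variant keeps the same label as the original, so one labeled exam yields several training examples.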

35 Ways To Avoid Need For Large Data Sets Data Augmentation Transfer Learning Train on Large Corpus like ImageNet [Diagram: Image, Conv, Conv, MaxPool, Conv, Conv, MaxPool, Conv, Conv, MaxPool, Fully Connected, Fully Connected, Fully Connected, SoftMax]

36 Ways To Avoid Need For Large Data Sets Data Augmentation Transfer Learning Freeze These Layers [Same network diagram]

37 Ways To Avoid Need For Large Data Sets Data Augmentation Transfer Learning Freeze These Layers Train this [Same network diagram]
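The freeze-then-train idea can be sketched without any deep learning framework. The `Layer` class, the layer names, the weights and the gradients below are all hypothetical. Real frameworks expose their own mechanisms for marking layers as non-trainable.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Layer:
    name: str
    weights: List[float]
    trainable: bool = True


def freeze_early_layers(layers, n_trainable):
    """Freeze every layer except the last `n_trainable`."""
    cutoff = len(layers) - n_trainable
    for index, layer in enumerate(layers):
        layer.trainable = index >= cutoff


def apply_gradients(layers, gradients, learning_rate=0.01):
    """Update only the trainable layers; frozen weights stay as pre-trained."""
    for layer, grads in zip(layers, gradients):
        if layer.trainable:
            layer.weights = [
                w - learning_rate * g for w, g in zip(layer.weights, grads)
            ]


# Pretend these weights came from training on a large corpus.
layers = [
    Layer("conv1", [0.5, -0.5]),
    Layer("conv2", [0.25, 0.75]),
    Layer("fully_connected", [1.0, 1.0]),
]
freeze_early_layers(layers, n_trainable=1)
apply_gradients(layers, gradients=[[1.0, 1.0]] * 3)
```

Only the final layer changes, which is why far fewer labeled exams are needed than when training every layer from scratch.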

38 Take Home Point Deep Learning Learns Features and Connections vs Just Connections Hand-Crafted Feature Extraction Classifier Learning Feature Extractor Classifier

39 Examples of CNN in Medical Imaging: Body Part *Roth, Arxiv 2016

40 Moeskops, IEEE-TMI, 2016 Examples of CNN in Medical Imaging: Segmentation

41 Mayo: AutoEncoder for Segmentation Dataset Trained on BRATS 2015 FLAIR enhancing signal Preprocessing N4 bias correction Nyúl intensity normalization Autoencoders trained on ROIs (size=12) Time 1 hour for 155 slices (DNN would be days or weeks) Korfiatis, Submitted

42 What is an AutoEncoder? Korfiatis, Submitted

43 Dice = 0.92 over BRATS dataset Korfiatis, Submitted
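For reference, the Dice coefficient measures overlap between a predicted mask and a reference mask as twice the intersection divided by the sum of the two mask sizes. The sketch below uses made-up flattened binary masks. It does not reproduce the reported result.

```python
def dice_coefficient(predicted, reference):
    """Dice = 2 * |A and B| / (|A| + |B|) for two binary masks."""
    overlap = sum(1 for p, r in zip(predicted, reference) if p and r)
    total = sum(1 for p in predicted if p) + sum(1 for r in reference if r)
    if total == 0:
        return 1.0  # both masks empty: treated here as perfect agreement
    return 2.0 * overlap / total


# Flattened toy masks (illustrative values only).
predicted = [0, 1, 1, 1, 0, 0]
reference = [0, 1, 1, 0, 1, 0]
score = dice_coefficient(predicted, reference)
```

A score of 1.0 means the masks match exactly and 0.0 means they do not overlap at all.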

44 Machine Learning & Radiomics Computers find textures reflecting genomics: 1p19q 85 Subjects with FISH results, computed multiple textures, SVM [Table: columns Abstract, # Features, Sens, Spec, F-score, Accuracy; rows SVM, SVM, Naïve Bayes; values not preserved] Erickson, Proc ASNR, 2016
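The reported measures (sensitivity, specificity, F-score, accuracy) all derive from the four cells of a confusion matrix. The sketch below shows the standard definitions. The counts are invented for illustration and are not results from the cited abstract.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary classification measures from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # true positive rate (recall)
    specificity = tn / (tn + fp)  # true negative rate
    precision = tp / (tp + fp)
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_score": f_score,
        "accuracy": accuracy,
    }


# Hypothetical counts, not data from the study.
metrics = classification_metrics(tp=40, fp=10, tn=30, fn=20)
```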

45 Machine Learning & Radiomics 155 Subjects, GBM, MGMT Methylation Compute textures (T2 was best) -> SVM Korfiatis, Med Phys, 2016

46 Deep Learning: MGMT Methylation Same set of patients, use VGGNet / Xfer: Az=0.86 Autoencoder is giving nearly as good performance and trains about 10x faster Now testing DeepMedic and RNN Korfiatis, unpublished
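Az is read here as the area under the ROC curve. One way to compute an empirical version is as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case. The scores below are invented for illustration. The talk's value may come from a fitted curve rather than this empirical estimate.

```python
def empirical_auc(positive_scores, negative_scores):
    """Fraction of positive/negative pairs ranked correctly (ties count half)."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical classifier outputs, not data from the study.
methylated = [0.9, 0.8, 0.4]
unmethylated = [0.7, 0.3, 0.2]
area = empirical_auc(methylated, unmethylated)
```

A value of 0.5 corresponds to chance and 1.0 to perfect separation of the two groups.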

47 The Pace of Change

48 Will Computers Replace Radiologists? Deep Learning will likely be able to create reports for diagnostic images in the future. 5 years: Mammo & CXR 10 years: CT Head, Chest, Abd, Pelvis, MR head, knee, shoulder, US: liver, thyroid, carotids years: most diagnostic imaging Will likely see more than we do today Will allow radiologists to focus on patient interaction and invasive procedures

49 How Might Medicine Best Embrace Deep Learning

50 How Might Medicine Best Embrace Deep Learning Algorithms for Machine Learning are rapidly improving. CNNs are not the only game in town Hardware for Machine Learning is REALLY rapidly improving The amount of change in 20 years will be unbelievable

51 How Might Medicine Best Embrace Deep Learning Medicine needs to remain flexible about hardware and software The VALUE is in the data and metadata Physicians are OBLIGATED to make sure the data are properly handled. Improper interpretation of data will lead to bad implementations and poor patient care Non-cooperation is also counter-productive
