Deep Learning: An Overview Bradley J Erickson, MD PhD Mayo Clinic, Rochester Medical Imaging Informatics and Teleradiology Conference 1:30-2:05pm June 17, 2016

Disclosures Relationships with commercial interests: Board of OneMedNet Board of VoiceIT

What is Machine Learning? It is a part of Artificial Intelligence. It finds patterns in data: patterns that reflect properties of examples (supervised); patterns that separate examples (unsupervised). (Other types of artificial intelligence include rule-based systems.)

Machine Learning Classes: Supervised (ANN, SVM, Random Forest, Bayes, DNN), Unsupervised (Clusters, Adaptive Resonance), Reinforced

Machine Learning History: Artificial Neural Networks (ANN), the starting point of machine learning; early versions didn't work well. Other machine learning methods: Naïve Bayes, Support Vector Machine (SVM), Random Forest Classifier (RFC)

Artificial Neural Network/Perceptron [Diagram, repeated over four slides: an input layer (T1 Pre, T1 Post, T2), a hidden layer of nodes applying an activation function f(σ), and an output layer (Tumor, Brain); example node values (e.g., inputs 45, 322, 128) are shown propagating through the network until the outputs become 1 (Tumor) and 0 (Brain).]

How ANNs Learn. Propagation: multiply prior-layer node values by weights, then apply an activation function, e.g. threshold the sum. Weight Update: compute error = actual output - expected output; weight gradient = error * input value; new weight = old weight - gradient * learning rate
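
A minimal sketch of this propagate-and-update loop for a single neuron, using NumPy. The inputs, initial weights, and learning rate are made-up illustrative values, and a simple threshold stands in for the activation function; this assumes the standard gradient-descent style update rather than the exact formulation on the slide.

```python
import numpy as np

# Illustrative values only: three inputs (e.g., T1-pre, T1-post, T2 intensities)
x = np.array([45.0, 322.0, 128.0])
w = np.array([0.2, -0.1, 0.05])       # initial weights (arbitrary)
learning_rate = 0.01
expected = 1.0                        # desired output (e.g., "tumor")

# Propagation: weighted sum of prior-layer values, then activation (threshold)
s = np.dot(x, w)
actual = 1.0 if s > 0 else 0.0

# Weight update: error times input value gives the gradient for each weight
error = actual - expected
gradient = error * x
w = w - learning_rate * gradient      # step the weights against the gradient
```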

Learning = Optimization Problem. Learning depends on: correct gradient directions; correct gradient multiplier (learning rate). [Diagram of a loss surface with labels: Global Minimum, Local Minimum, Small Gradient]

Support Vector Machines: Maps input data to a new space with a function f(x); creates a hyperplane that separates classes in that space
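
As a rough illustration (not from the talk), scikit-learn's SVC fits such a separating hyperplane; the RBF kernel performs the implicit mapping to a new space. The toy feature values and labels below are invented.

```python
import numpy as np
from sklearn.svm import SVC

# Invented toy data: two texture features per case, labels 0 = brain, 1 = tumor
X = np.array([[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]])
y = np.array([0, 0, 1, 1])

# The RBF kernel maps the inputs to a higher-dimensional space, where a
# maximum-margin separating hyperplane is fit
clf = SVC(kernel="rbf", C=1.0)
clf.fit(X, y)
print(clf.predict([[0.85, 0.75]]))   # -> [1]
```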

Deep Learning: Why the Hype? Performance in the ImageNet Challenge (Team / Software, Year, Error Rate):
XRCE (not Deep Learning), 2011: 25.8%
SuperVision (AlexNet), 2012: 16.4%
Clarifai, 2013: 11.7%
GoogLeNet (Inception), 2014: 6.66%
Andrej Karpathy (human comparison), 2014: 5.1%
BN-Inception (arXiv), 2015: 4.9%
Inception-v3 (arXiv), 2015: 3.46%

What is Deep Learning? Deep because it uses many layers: ANNs typically had 3 or fewer layers; DNNs have 15+ layers

Types of DNNs: Convolutional Neural Network (CNN). Early layers have windows of the image as input, multiplied by a kernel to get the output; this is known as a convolution. [Animation over several slides: a 3x3 kernel (1 2 1 / 2 4 2 / 1 2 1) slides across a 5x5 image, and each output value is the sum of the element-wise products of the kernel with the window beneath it, divided by 9.]
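
A minimal NumPy sketch of the convolution step shown on these slides: the 5x5 image and 3x3 kernel values are taken from the slide, and each output pixel is the sum of the element-wise products of the kernel with the window under it (the slide also divides by 9, the number of pixels in the window).

```python
import numpy as np

image = np.array([[22, 13,  0, 31, 71],
                  [14, 27, 28, 43, 21],
                  [18, 64, 89, 65, 32],
                  [44, 55, 32, 41,  4],
                  [21, 32, 15, 33,  7]], dtype=float)

kernel = np.array([[1, 2, 1],
                   [2, 4, 2],
                   [1, 2, 1]], dtype=float)

out = np.zeros((3, 3))               # "valid" convolution of 5x5 with 3x3 -> 3x3
for i in range(3):
    for j in range(3):
        window = image[i:i + 3, j:j + 3]
        # Weighted sum of the window; divide by 9 as on the slide
        out[i, j] = np.sum(window * kernel) / 9.0
```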

Why the Excitement Now? Advances That Addressed Problems. Many layers -> overfitting; implement sparsity in weights: Dropout
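
A rough sketch of the dropout idea (illustrative, not the talk's implementation): during training, each activation is zeroed with some probability, so the network cannot rely on any single unit; survivors are rescaled so the expected activation is unchanged.

```python
import numpy as np

def dropout(activations, rate=0.5, training=True):
    """Randomly zero a fraction `rate` of the activations during training."""
    if not training:
        return activations
    keep_mask = np.random.rand(*activations.shape) >= rate
    # Scale the survivors so the expected sum of activations stays the same
    return activations * keep_mask / (1.0 - rate)

h = np.array([0.8, 1.2, 0.3, 2.1, 0.5])   # invented hidden-layer activations
print(dropout(h, rate=0.5))
```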

Why the Excitement Now? Advances That Addressed Problems. Many layers -> vanishing gradients. Dropout partially addresses this. Can also use pre-trained weights for the early layers and fix (freeze) those, training only the weights of the later layers to learn higher-level features

Typical CNNs [Diagram: a stack of Convolution and Pooling layers followed by a Fully Connected layer]

Typical CNNs Andrej Karpathy: http://karpathy.github.io/2015/10/25/selfie/
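
A sketch of the "convolution, pooling, ..., fully connected" layout described above, written with the Keras API; the layer sizes and two-class output are arbitrary placeholders, not the architecture from the talk.

```python
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(64, 64, 1)),           # e.g., a single-channel image patch
    layers.Conv2D(16, 3, activation="relu"),   # convolution
    layers.MaxPooling2D(2),                    # pooling
    layers.Conv2D(32, 3, activation="relu"),   # convolution
    layers.MaxPooling2D(2),                    # pooling
    layers.Flatten(),
    layers.Dense(64, activation="relu"),       # fully connected
    layers.Dense(2, activation="softmax"),     # e.g., tumor vs. brain
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```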

Why the Excitement Now? Batch Normalization. What should be the initial set of weights connecting nodes? All the same = no gradients. Random, but what range of values? BatchNorm: after each convolutional layer, subtract the mean and divide by the standard deviation. Simple but effective
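
A minimal NumPy sketch of the BatchNorm computation described here, applied to one batch of activations; the learned scale and shift parameters that a full implementation adds are omitted, and the batch values are invented.

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    """Subtract the batch mean and divide by the batch standard deviation."""
    mean = x.mean(axis=0)
    std = x.std(axis=0)
    return (x - mean) / (std + eps)

batch = np.array([[45.0, 322.0], [50.0, 310.0], [40.0, 335.0]])
print(batch_norm(batch))   # each column now has roughly zero mean, unit variance
</code>
```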

Why the Excitement Now? Residual Networks. A residual connection defines if and how data is passed through from layer to layer; this makes deep network construction reliable. *Targ, ICLR 2016
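
A rough sketch (not from the cited paper) of the residual idea: the layer's output is added back to its input, so data can pass through even when the layer itself contributes little. The toy layer and weights below are invented.

```python
import numpy as np

def layer(x, w):
    """A toy layer: linear transform followed by ReLU."""
    return np.maximum(0.0, x @ w)

def residual_block(x, w):
    # Output = transformed input + the input itself (the shortcut connection)
    return layer(x, w) + x

x = np.ones((1, 4))
w = np.random.randn(4, 4) * 0.01
print(residual_block(x, w))   # stays close to x while the layer's weights are small
```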

Why the Excitement Now? Deep Neural Network Theory Exponential Compute Power Growth

Moore's Law: Computing performance doubles approximately every 18 months

Exponentials In Real Life: If you put 1 drop of water into a football stadium, and then double the number of drops each minute: at 5 minutes, you will have 32 drops; at 45 minutes, you will cover the field 1 inch deep; at 55 minutes, the stadium will be full. It is not natural for humans to grasp exponential growth

Deep Learning Works Well on GPUs: Naturally parallel. Lower precision (single-precision floating point) can actually be an advantage. Now building cards with no video output, optimized for deep learning (e.g., the NVIDIA P100)

GPUs are Beating Moore's Law [Chart: performance on a log scale (10 to 1,000,000) versus year (2000-2020) for CPU, GPU, FPGA, and TPU; the CPU line is labeled "Ice Age"]

Deep Learning Myths You Need Millions of Exams to Train and Use Deep Learning Methods

Ways To Avoid Need For Large Data Sets: Data Augmentation. Essentially, creating variants of the data that are different enough that they are learnable, yet similar enough that the teaching point is kept: Mirror/Flip/Rotate/Contrast/Crop
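
A small NumPy sketch of the mirror/flip/rotate/crop idea for a 2D image array (contrast changes and other variants would follow the same pattern); this is illustrative, not the augmentation pipeline used in the talk.

```python
import numpy as np

def augment(image, rng):
    """Return a randomly mirrored/rotated/cropped variant of a 2D image."""
    if rng.random() < 0.5:
        image = np.fliplr(image)                 # mirror left-right
    image = np.rot90(image, k=rng.integers(4))   # rotate by 0/90/180/270 degrees
    # Random crop: keep a slightly smaller window at a random offset
    h, w = image.shape
    top, left = rng.integers(h // 8 + 1), rng.integers(w // 8 + 1)
    return image[top:top + 7 * h // 8, left:left + 7 * w // 8]

rng = np.random.default_rng(0)
img = np.arange(64 * 64, dtype=float).reshape(64, 64)    # invented image
variants = [augment(img, rng) for _ in range(5)]          # five training variants
```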

Ways To Avoid Need For Large Data Sets: Data Augmentation; Transfer Learning. Train on a large corpus like ImageNet, then freeze the early layers and train only the final layers on the new task. [Diagram, repeated over three slides: Image -> Conv, Conv, MaxPool, Conv, Conv, MaxPool, Conv, Conv, MaxPool -> Fully Connected x3 -> SoftMax, with the convolutional layers marked "Freeze These Layers" and the final layers marked "Train this"]
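
A sketch of the freeze-and-retrain recipe from these slides, using a Keras model pre-trained on ImageNet. VGG16 is used here as a stand-in for the corpus-trained network, and the small two-class head is a placeholder; neither is necessarily what the talk used.

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

# Network trained on a large corpus (ImageNet); drop its classification head
base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False                      # freeze the early (feature) layers

# Train only this new head on the small medical data set
model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(2, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```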

Take Home Point: Deep Learning learns features and connections, vs. just connections. [Diagram: traditional pipeline = hand-crafted feature extraction followed by classifier learning; deep learning pipeline = both the feature extractor and the classifier are learned]

Examples of CNN in Medical Imaging: Body Part *Roth, arXiv 2016

Examples of CNN in Medical Imaging: Segmentation *Moeskops, IEEE-TMI, 2016

Mayo: AutoEncoder for Segmentation. Dataset: trained on BRATS 2015, FLAIR enhancing signal. Preprocessing: N4 bias correction, Nyul intensity normalization. Autoencoders trained on 110,000 ROIs (size = 12). Time: 1 hour for 155 slices (a DNN would be days or weeks). Korfiatis, Submitted

What is an AutoEncoder? Korfiatis, Submitted
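
As a rough illustration of the autoencoder idea (not the Mayo model), a network is trained to reconstruct its own input through a narrow bottleneck, so the bottleneck learns a compact representation. The layer sizes below are placeholders; only the 12x12 ROI size is taken from the slide.

```python
from tensorflow.keras import layers, models

patch_size = 12 * 12                       # e.g., a flattened 12x12 ROI
autoencoder = models.Sequential([
    layers.Input(shape=(patch_size,)),
    layers.Dense(64, activation="relu"),   # encoder
    layers.Dense(16, activation="relu"),   # bottleneck (compact representation)
    layers.Dense(64, activation="relu"),   # decoder
    layers.Dense(patch_size, activation="sigmoid"),
])
# Trained to reproduce its own input: the inputs also serve as the targets
autoencoder.compile(optimizer="adam", loss="mse")
# autoencoder.fit(rois, rois, epochs=10)   # `rois` would hold the ROI patches
```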

Dice = 0.92 over BRATS dataset Korfiatis, Submitted
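
For reference, the Dice score quoted here measures overlap between the predicted and reference segmentations; a minimal computation might look like the following (the masks are invented).

```python
import numpy as np

def dice(pred, truth):
    """Dice coefficient: 2 * |intersection| / (|pred| + |truth|)."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    return 2.0 * np.logical_and(pred, truth).sum() / (pred.sum() + truth.sum())

pred = np.array([[1, 1, 0], [0, 1, 0], [0, 0, 0]])
truth = np.array([[1, 1, 0], [0, 0, 0], [0, 0, 0]])
print(dice(pred, truth))   # -> 0.8
```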

Machine Learning & Radiomics: Computers find textures reflecting genomics: 1p19q. 85 subjects with FISH results; computed multiple textures; SVM. (Classifier, # Features, Sens, Spec, F-score, Accuracy):
SVM: 10, 0.91, 0.87, 0.93, 0.91
SVM Abstract: 10, 0.95, 0.93, 0.96, 0.95
Naïve Bayes: 12, 0.95, 0.77, 0.92, 0.89
Erickson, Proc ASNR, 2016

Machine Learning & Radiomics 155 Subjects, GBM, MGMT Methylation Compute textures (T2 was best) -> SVM Korfiatis, Med Phys, 2016

Deep Learning: MGMT Methylation. Same set of patients, using VGGNet with transfer learning: Az = 0.86. The autoencoder gives nearly as good performance and trains about 10x faster. Now testing DeepMedic and RNN. Korfiatis, unpublished

The Pace of Change

Will Computers Replace Radiologists? Deep Learning will likely be able to create reports for diagnostic images in the future. 5 years: mammo & CXR. 10 years: CT head, chest, abdomen, pelvis; MR head, knee, shoulder; US liver, thyroid, carotids. 15-20 years: most diagnostic imaging. It will likely see more than we do today. This will allow radiologists to focus on patient interaction and invasive procedures

How Might Medicine Best Embrace Deep Learning

How Might Medicine Best Embrace Deep Learning: Algorithms for machine learning are rapidly improving; CNNs are not the only game in town. Hardware for machine learning is REALLY rapidly improving. The amount of change in 20 years will be unbelievable

How Might Medicine Best Embrace Deep Learning: Medicine needs to remain flexible about hardware and software. The VALUE is in the data and metadata. Physicians are OBLIGATED to make sure the data are properly handled. Improper interpretation of data will lead to bad implementations and poor patient care. Non-cooperation is also counter-productive