Introduction to Machine Learning & Its Application in Healthcare Lecture 4 Oct 3, 2018 Presentation by: Leila Karimi


Introduction to Machine Learning & Its Application in Healthcare Lecture 4 Oct 3, 2018 Presentation by: Leila Karimi 1

What Is Machine Learning? A branch of artificial intelligence concerned with the design and development of algorithms that allow computers to evolve behaviors based on empirical data. Arthur Samuel (1959): machine learning is the field of study that gives computers the ability to learn without being explicitly programmed. Tom Mitchell (1998), well-posed learning problem: "A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E." 2

What Is Machine Learning? Example: A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E. Suppose your email program watches which emails you do or do not mark as spam, and based on that learns how to better filter spam. Classifying emails as spam or not spam ---> task T. Watching you label emails as spam or not spam ---> experience E. The number (or fraction) of emails correctly classified as spam/not spam ---> performance measure P. Slide credit: Andrew Ng 3

ML Applications Slide credit: Lior Rokach 4

The Learning Setting. Imagine a learning algorithm is trying to decide which loan applicants are bad credit risks. We might represent each person by n features (e.g., income range, debt load, employment history, etc.). Take a sample S of data, labeled according to whether the applicants were or weren't good risks. The goal of the algorithm is to use the data seen so far to produce a good prediction rule (a "hypothesis") h(x) for future data. Slide credit: Avrim Blum 5

The learning setting: example. Given this data, some reasonable rules might be: Predict YES iff (!recent delinq) AND (%down > 5). Predict YES iff 100*[mmp/inc] - 1*[%down] < 25.... Slide credit: Avrim Blum 6
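As a minimal sketch (the feature names are my own shorthand for the quantities on the slide: recent delinquency, percent down payment, monthly mortgage payment, income), these rules could be written as simple Python predicates:

```python
def rule_1(recent_delinq: bool, pct_down: float) -> bool:
    """Predict YES iff no recent delinquency AND down payment > 5%."""
    return (not recent_delinq) and (pct_down > 5)

def rule_2(mmp: float, inc: float, pct_down: float) -> bool:
    """Predict YES iff 100*(monthly mortgage payment / income) - 1*(% down) < 25."""
    return 100 * (mmp / inc) - 1 * pct_down < 25

# Hypothetical applicant: no recent delinquency, 10% down, payment is 20% of income.
print(rule_1(recent_delinq=False, pct_down=10))  # True
print(rule_2(mmp=600, inc=3000, pct_down=10))    # 100*0.2 - 10 = 10 < 25 -> True
```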

Big Questions (A) How might we automatically generate rules that do well on observed data? ---> Algorithms (B) What kind of confidence do we have that they will do well in the future? ---> Performance Evaluation Slide credit: Avrim Blum 7

The machine learning framework: y = f(x), where y is the output, f the prediction function, and x the input. Training: given a training set of labeled examples {(x_1, y_1), ..., (x_n, y_n)}, estimate the prediction function f by minimizing the prediction error on the training set. Testing: apply f to a never-before-seen test example x and output the predicted value y = f(x). 8
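To make the training and testing steps concrete, here is a minimal sketch (my own example, assuming a linear f and toy data) that estimates f by minimizing squared prediction error with NumPy:

```python
import numpy as np

# Toy training set {(x_1, y_1), ..., (x_n, y_n)}.
x_train = np.array([1.0, 2.0, 3.0, 4.0])
y_train = np.array([2.1, 3.9, 6.2, 8.1])

# Estimate f(x) = w*x + b by least squares (minimizing training error).
A = np.vstack([x_train, np.ones_like(x_train)]).T
w, b = np.linalg.lstsq(A, y_train, rcond=None)[0]

# Testing: apply f to a never-before-seen example.
x_test = 5.0
print(f"f({x_test}) = {w * x_test + b:.2f}")
```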

ML in a Nutshell Every machine learning algorithm has three components: Representation Evaluation Optimization 9

Representation Decision trees Sets of rules / Logic programs Graphical models (Bayes/Markov nets) Neural networks Support vector machines 10

Evaluation Accuracy Precision and recall Squared error Likelihood Posterior probability Cost / Utility Margin Entropy K-L divergence 11

Optimization Combinatorial optimization E.g.: Greedy search Convex optimization E.g.: Gradient descent Constrained optimization E.g.: Linear programming 12
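Gradient descent, named above as the example of convex optimization, can be sketched in a few lines (an illustration of my own, not from the slides):

```python
import numpy as np

# Minimize the convex objective J(w) = mean((X @ w - y)**2) by gradient descent.
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])  # bias column + one feature
y = np.array([2.0, 4.0, 6.0])

w = np.zeros(2)
learning_rate = 0.05
for _ in range(1000):
    grad = 2 * X.T @ (X @ w - y) / len(y)  # gradient of the mean squared error
    w -= learning_rate * grad              # step downhill

print(w)  # approaches [0, 2], i.e., y = 2*x
```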

Machine Learning Algorithms Supervised Learning Training data includes desired outputs Unsupervised Learning Training data does not include desired outputs Semi-supervised learning Training data includes a few desired outputs Others: Reinforcement learning, recommender systems 13

Supervised Learning Slide credit: Yi-Fan Chang 14

Supervised learning process: two steps. Learning (training): learn a model using the training data. Testing: test the model using unseen test data to assess the model accuracy: Accuracy = (number of correct classifications) / (total number of test cases). Slide credit: Bing Liu 15
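A minimal sketch of the two-step process (the dataset and model choice are my own illustration using scikit-learn, not the slide's):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)

# Step 1 (learning): fit a model on the training data.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
model = DecisionTreeClassifier().fit(X_train, y_train)

# Step 2 (testing): evaluate on unseen test data.
y_pred = model.predict(X_test)
print(accuracy_score(y_test, y_pred))  # correct classifications / total test cases
```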

Unsupervised Learning: learning patterns from unlabeled data. Tasks: understanding and visualization, anomaly detection, information retrieval, data compression. 16

Unsupervised Learning (Cont.) Slide credit: Yi-Fan Chang 17

Supervised Learning (Cont.) Supervised learning categories and techniques: linear classifiers (numerical functions); parametric (probabilistic functions): naïve Bayes, Gaussian discriminant analysis (GDA), hidden Markov models (HMM), probabilistic graphical models; non-parametric (instance-based functions): k-nearest neighbors, kernel regression, kernel density estimation, local regression; non-metric (symbolic functions): classification and regression trees (CART), decision trees; aggregation: bagging (bootstrap + aggregation), AdaBoost, random forests. 18

Unsupervised Learning (Cont.) Unsupervised learning categories and techniques: clustering: k-means clustering, spectral clustering; density estimation: Gaussian mixture models (GMM), graphical models; dimensionality reduction: principal component analysis (PCA), factor analysis. 19
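As a hedged illustration (toy data of my own, not from the slides), k-means clustering with scikit-learn:

```python
import numpy as np
from sklearn.cluster import KMeans

# Two obvious groups of unlabeled 2-D points.
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [8.0, 8.0], [8.1, 7.9], [7.8, 8.2]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)           # cluster assignment for each point
print(kmeans.cluster_centers_)  # learned centroids
```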

Supervised Learning: Linear Classifier. f(x) = sign(w·x + b), where w is a d-dimensional vector (learned). Find a linear function to separate the classes. Techniques: Perceptron Logistic regression Support vector machine (SVM) Ada-line Multi-layer perceptron (MLP) 20
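As an illustrative sketch (toy data of my own), the perceptron listed above learns such a weight vector w by updating on misclassified examples:

```python
import numpy as np

# Toy linearly separable data; labels in {-1, +1}.
X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])

w = np.zeros(2)
b = 0.0
for _ in range(10):                 # a few passes over the data
    for xi, yi in zip(X, y):
        if yi * (w @ xi + b) <= 0:  # misclassified: update toward the example
            w += yi * xi
            b += yi

print(np.sign(X @ w + b))  # all four points now classified correctly
```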

Supervised Learning: Non-Linear Classification Techniques: Support vector machine (SVM) Neural Networks 21

Supervised Learning: Decision Trees Should I wait at this restaurant? Slide credit: SRI International 22

Decision Tree Induction: (recursively) partition examples according to the most important attribute. Key concepts: entropy, the impurity of a set of examples (entropy = 0 if perfectly homogeneous; the number of bits needed to encode the class of an arbitrary example), and information gain, the expected reduction in entropy caused by partitioning. Slide credit: SRI International 23
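A short sketch (with made-up label counts) of how entropy and information gain are computed:

```python
import numpy as np

def entropy(labels):
    """Entropy in bits of a collection of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

parent = np.array(["yes"] * 5 + ["no"] * 5)  # maximally impure: entropy = 1 bit
left   = np.array(["yes"] * 4 + ["no"] * 1)  # children after a candidate split
right  = np.array(["yes"] * 1 + ["no"] * 4)

gain = entropy(parent) - (len(left) / len(parent)) * entropy(left) \
                       - (len(right) / len(parent)) * entropy(right)
print(round(gain, 3))  # expected reduction in entropy from this split: ~0.278
```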

Decision Tree Induction: Decision Boundary Slide credit: SRI International 24

Supervised Learning: Neural Networks. Motivation: the human brain is massively parallel (~10^11 neurons, ~20 types), built from small computational units with simple, low-bandwidth communication (~10^14 synapses, 1-10 ms cycle time). Realization: a neural network has units ("neurons") connected by directed weighted links, each applying an activation function from inputs to output. Slide credit: SRI International 25

Neural Networks (continued). Neural network = a parameterized family of nonlinear functions. (Figure: network types.) Slide credit: SRI International 26

Neural Network Learning. Key idea: adjusting the weights changes the function represented by the neural network (learning = optimization in weight space). Iteratively adjust weights to reduce error (the difference between network output and target output). Weight update methods: perceptron training rule, linear programming, delta rule, backpropagation. Slide credit: SRI International 27
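A hedged sketch of one of these update rules, the delta rule, on a single linear unit (my own toy example):

```python
import numpy as np

# Single linear unit trained with the delta rule on a toy mapping.
X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
t = np.array([1.0, 0.5, 1.5])           # target outputs

w = np.zeros(2)
eta = 0.1                               # learning rate
for _ in range(200):
    for xi, ti in zip(X, t):
        output = w @ xi                 # network output
        w += eta * (ti - output) * xi   # delta rule: shrink (target - output) error

print(w)  # approaches [0.5, 1.0], the weights that generated the targets
```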

Neural Network Learning: Decision Boundary single-layer perceptron multi-layer network Slide credit: SRI International 28

Supervised Learning: Support Vector Machines. Kernel trick: map data to a higher-dimensional space where they become linearly separable, Φ: x → φ(x). Learning a classifier: the optimal linear separator is the one with the largest margin between positive examples on one side and negative examples on the other. Slide credit: SRI International & Andrew Moore 29
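An illustrative sketch (toy XOR-style data of my own) of a kernelized SVM in scikit-learn, where the RBF kernel supplies the implicit mapping φ:

```python
import numpy as np
from sklearn.svm import SVC

# XOR-like data: not linearly separable in the original 2-D space.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])

# The RBF kernel implicitly maps x -> phi(x) into a higher-dimensional space.
clf = SVC(kernel="rbf", gamma=2.0, C=10.0).fit(X, y)
print(clf.predict(X))  # [0 1 1 0]: separable after the implicit mapping
```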

Support Vector Machines: Decision Boundary. (Figure: decision boundary under the mapping Φ.) 30

Supervised Learning: Nearest Neighbor Models Key Idea: Properties of an input x are likely to be similar to those of points in the neighborhood of x. Basic Idea: Find (k) nearest neighbor(s) of x and infer target attribute value(s) of x based on corresponding attribute value(s). Slide credit: SRI International 31
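A brief sketch (assumed toy data) of k-nearest-neighbor classification with scikit-learn:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X = np.array([[1.0], [1.2], [0.8], [6.0], [6.3], [5.8]])
y = np.array([0, 0, 0, 1, 1, 1])

knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
# Infer the label of each new point from its 3 nearest neighbors.
print(knn.predict([[1.1], [6.1]]))  # [0 1]
```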

Nearest Neighbor Model: Decision Boundary Slide credit: SRI International 32

Evaluating classification methods. Predictive accuracy: Accuracy = (number of correct classifications) / (total number of test cases). Efficiency: time to construct the model and time to use the model. Robustness: handling of noise and missing values. Scalability: efficiency on disk-resident databases. Interpretability: understandability of, and insight provided by, the model. Compactness of the model. Slide credit: Bing Liu 33

Performance Evaluation Randomly split examples into training set U and test set V. Use training set to learn a hypothesis H. Measure % of V correctly classified by H. Repeat for different random splits and average results. Slide credit: SRI International 34
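A minimal sketch of this repeated random-split protocol using scikit-learn's ShuffleSplit (the model and dataset are my own illustration):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import ShuffleSplit, cross_val_score

X, y = load_iris(return_X_y=True)

# Repeat for 10 different random train/test splits and average the accuracy.
splits = ShuffleSplit(n_splits=10, test_size=0.3, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=splits)
print(scores.mean())  # average fraction of the test set correctly classified
```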

Generalization. Components of generalization error: Bias: how much the average model over all training sets differs from the true model; error due to inaccurate assumptions/simplifications made by the model. Variance: how much models estimated from different training sets differ from each other. Underfitting: the model is too simple to represent all the relevant class characteristics; high bias and low variance; high training error and high test error. Overfitting: the model is too complex and fits irrelevant characteristics (noise) in the data; low bias and high variance; low training error and high test error. 35
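To see underfitting and overfitting concretely, here is a small sketch (my own example) comparing polynomial models of increasing complexity on noisy data:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 15)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)  # noisy samples
x_test = np.linspace(0, 1, 100)
y_test = np.sin(2 * np.pi * x_test)                             # true curve

for degree in (1, 3, 9):
    coeffs = np.polyfit(x, y, degree)  # fit a polynomial of the given degree
    train_err = np.mean((np.polyval(coeffs, x) - y) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    # Degree 1 tends to underfit (high train & test error); degree 9 tends to
    # overfit (low train error, higher test error); degree 3 balances the two.
    print(degree, round(train_err, 3), round(test_err, 3))
```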

Bias-Variance Trade-off Models with too few parameters are inaccurate because of a large bias (not enough flexibility). Models with too many parameters are inaccurate because of a large variance (too much sensitivity to the sample). Slide credit: L. Lazebnik 36

Machine Learning for Healthcare 37

Applying Machine Learning to Healthcare. The healthcare sector is being transformed by the ability to record massive amounts of information. Machine learning provides a way to automatically find patterns in, and reason about, data. It enables healthcare professionals to move to personalized care, known as precision medicine. 38

Why use ML? Adoption of Electronic Health Records (EHR) has increased 9x since 2008 [Henry et al., ONC Data Brief, May 2016] 39

Why use ML? Large datasets. MIT Laboratory for Computational Physiology: de-identified health data from ~40K critical care patients (demographics, vital signs, laboratory tests, medications, notes, ...). Available data on nearly 230 million unique patients since 1995. Slide credit: David Sontag 40

Why use ML? Diversity of digital health data Slide credit: David Sontag 41

Why use ML? Standardization. Diagnosis codes: ICD-9 and ICD-10 (International Classification of Diseases). Laboratory tests: LOINC codes. Pharmacy: National Drug Codes (NDCs). Unified Medical Language System (UMLS): millions of medical concepts. [https://blog.curemd.com/the-most-bizarreicd-10-codes-infographic/] 42

Industry interest in AI & healthcare Slide credit: David Sontag 43

What can machine learning do for the healthcare industry? Improve accuracy of diagnosis, prognosis, and risk prediction. Reduce medication errors and adverse events. Model and prevent the spread of hospital-acquired infections. Improve quality of care and population health outcomes, while reducing healthcare costs. Optimize hospital processes such as resource allocation and patient flow. Identify patient subgroups for personalized and precision medicine. Discover new medical knowledge (clinical guidelines, best practices). Automate detection of relevant findings in pathology, radiology, etc. 44

Example Application: Improve accuracy of diagnosis and risk prediction. New methods are being developed for chronic disease risk prediction and visualization. These methods give clinicians a comprehensive view of their patient population, risk levels, and risk factors, along with the estimated effects of potential interventions. 45

Example Application: Optimize hospital processes. By early and accurate prediction of each patient's Diagnosis Related Group (DRG), demand for scarce hospital resources such as beds and operating rooms can be better predicted. 46

Example Application: Automate detection of relevant findings. Pattern detection approaches have been successfully applied to detect regions of interest in digital pathology slides, and work surprisingly well at detecting cancers. Automatic detection of anomalies and patterns is especially valuable when the key to diagnosis is a tiny piece of the patient's health data. 47

Example Application: Breast Cancer Diagnosis. Research by Mangasarian, Street, and Wolberg.

Breast Cancer Diagnosis: Separation. Research by Mangasarian, Street, and Wolberg.

Example Application: ICU Admission. An emergency room in a hospital measures 17 variables (e.g., blood pressure, age, etc.) of newly admitted patients. A decision is needed: whether to put a new patient in an intensive-care unit. Due to the high cost of the ICU, patients who may survive less than a month are given higher priority. Problem: predict high-risk patients and discriminate them from low-risk patients. Slide credit: Bing Liu 50

What is unique about ML in healthcare? Life-or-death decisions: we need robust algorithms, with checks and balances built into ML deployment (this also arises in other applications of AI, such as autonomous driving), and we need fair and accountable algorithms. Many questions are about unsupervised learning: discovering disease subtypes, or answering questions such as "characterize the types of people that are highly likely to be readmitted to the hospital." Many of the questions we want to answer are causal: naïve use of supervised machine learning is insufficient. Slide credit: Bing Liu 51

What makes healthcare different? Often very little labeled data (e.g., for clinical NLP), which motivates semi-supervised learning algorithms. Sometimes small numbers of samples (e.g., a rare disease): learn as much as possible from other data (e.g., healthy patients) and model the problem carefully. Lots of missing data, varying time intervals, censored labels. Slide credit: Bing Liu 52

What makes healthcare different? Difficulty of de-identifying data: need for data-sharing agreements and sensitivity. Difficulty of deploying ML: commercial electronic health record software is difficult to modify; data is often in silos (everyone recognizes the need for interoperability, but progress is slow); careful testing and iteration are needed. Slide credit: Bing Liu 53