Machine Learning for Chemoinformatics: An Introduction


Francesca Grisoni
University of Milano-Bicocca, Dept. of Earth and Environmental Sciences, Milan, Italy
ETH Zurich, Dept. of Chemistry and Applied Biosciences, Zurich, Switzerland
francesca.grisoni@unimib.it
BigChem online course, 17.05.2017

Presentation Outline
Introduction: definition, elements of Machine Learning
Additional considerations: the NFL theorem, validation, applicability
Machine Learning approaches, some examples: local methods, tree-like approaches, neural networks

Introduction - Machine learning in chemoinformatics: prediction of biological activity, toxicity and physico-chemical properties, where the property is modelled as a function of the chemical structure, P = f(structure); multi-objective optimization; rational drug design.

Introduction - Machine Learning (ML). "Machine Learning is the field of study that gives computers the ability to learn without being explicitly programmed." [1] "A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E." [2] Three elements: (1) data, (2) task, (3) performance. See also: https://www.toptal.com/machine-learning/machinelearning-theory-an-introductory-primer
[1] Samuel, A. L. (1959). Some studies in machine learning using the game of checkers. IBM Journal of Research and Development, 3(3), 210-229.
[2] Mitchell, T. M. (1997). Machine Learning. Burr Ridge, IL: McGraw Hill.

Machine Learning Elements (1) - The data and the G-I-G-O principle. The data are arranged in a matrix X (n x p), with n samples (rows) and p variables (columns); each molecule is encoded as a vector of descriptor values, e.g. f(0.1, 1, 0, 3, 3.5, 2, ...). For supervised tasks, a response vector Y (n x 1) is added and the property is modelled as P = f(x_1, x_2, ..., x_p). Garbage In = Garbage Out: the quality of the model is bounded by the quality of the structures (X) and of the experimental responses (Y).
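As a concrete illustration of how an X matrix can be built from structures, the minimal sketch below computes a few molecular descriptors from SMILES strings. It assumes the RDKit package is available; the molecules, the descriptor choice and the response values are made up for illustration only.

```python
# Minimal sketch: building an n x p descriptor matrix X from SMILES strings.
# Assumes RDKit is installed; molecules, descriptors and responses are illustrative.
import numpy as np
from rdkit import Chem
from rdkit.Chem import Descriptors

smiles = ["CCO", "c1ccccc1", "CC(=O)O"]          # hypothetical small dataset
mols = [Chem.MolFromSmiles(s) for s in smiles]

# Each molecule becomes a row of p descriptor values (here p = 3).
X = np.array([[Descriptors.MolWt(m),
               Descriptors.MolLogP(m),
               Descriptors.NumHDonors(m)] for m in mols])

y = np.array([0.5, 1.2, 0.9])                    # hypothetical experimental responses
print(X.shape)                                   # (n samples, p variables)
```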

Machine Learning Elements (2) - Machine Learning Tasks. Unsupervised learning: only X (n x p) is used, and the algorithm looks for structure (e.g., groupings of the samples) in the data.

Machine Learning Elements (2) - Machine Learning Tasks. Supervised learning: X (n x p) is used together with a known response Y (n x 1), and the algorithm learns the relationship between the two (e.g., a boundary separating the classes).

Machine Learning Elements (2) - Machine Learning Tasks. Supervised tasks split into classification (categorical response) and regression (continuous response).

Machine Learning Elements (3) - Performance: Classification. Confusion matrix (rows: experimental class, columns: predicted class):
      N    P
 N   TN   FP
 P   FN   TP
Single-class measures: Sensitivity, or True Positive Rate (TPR) = TP / (TP + FN); Specificity, or True Negative Rate (TNR) = TN / (TN + FP); Precision = TP / (TP + FP).

Machine Learning Elements (3) - Performance: Classification. Global performance measures: Non-Error Rate, or Balanced Accuracy, = (TPR + TNR) / 2, in [0, 1]; Accuracy = (TP + TN) / (TP + TN + FP + FN), in [0, 1]; Matthews Correlation Coefficient (MCC), in [-1, 1].

Machine Learning Elements (3) - Performance: Classification. Example with a strongly imbalanced dataset: N = 990, P = 10, and a model that predicts every sample as negative, so TN = 990 (100%) and TP = 0 (0%). Then Sn_P = 0/10 = 0%, Sn_N = 990/990 = 100%, Accuracy = 99% but NER = 50%: the high accuracy hides the fact that no positive sample is recognised.
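The imbalanced example above can be reproduced with a short script. The sketch below computes accuracy, sensitivity, specificity, non-error rate (balanced accuracy) and MCC directly from the confusion-matrix counts; the counts are taken from the example and everything is written out explicitly for illustration.

```python
import math

# Confusion-matrix counts from the example: 990 negatives, 10 positives,
# and a model that predicts everything as negative.
TN, FP, FN, TP = 990, 0, 10, 0

sensitivity = TP / (TP + FN)                      # Sn_P, true positive rate
specificity = TN / (TN + FP)                      # Sn_N, true negative rate
accuracy    = (TP + TN) / (TP + TN + FP + FN)
ner         = (sensitivity + specificity) / 2     # non-error rate / balanced accuracy

mcc_den = math.sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))
mcc = (TP * TN - FP * FN) / mcc_den if mcc_den else 0.0

print(f"Acc = {accuracy:.2%}, NER = {ner:.2%}, MCC = {mcc:.2f}")
# Acc = 99.00%, NER = 50.00%, MCC = 0.00
```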

Machine Learning Elements (3) - Performance: Regression. Predicted vs. experimental values are compared through the Root Mean Squared Error in Prediction, RMSEP = sqrt( sum_i (y_i - ŷ_i)^2 / n ), where y_i is the experimental value and ŷ_i the predicted one.
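A minimal sketch of the RMSEP calculation, using made-up experimental and predicted values:

```python
import numpy as np

y_true = np.array([1.2, 0.8, 2.5, 1.9])   # hypothetical experimental values
y_pred = np.array([1.0, 1.1, 2.2, 2.0])   # hypothetical predictions

# RMSEP = square root of the mean squared prediction error
rmsep = np.sqrt(np.mean((y_true - y_pred) ** 2))
print(f"RMSEP = {rmsep:.3f}")
```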

Considerations on ML - Additional Considerations

Considerations on ML - Additional Considerations. 1. Choice of the learner. No Free Lunch theorem: "For every learner, there exists a task on which it fails, even though that task can be successfully learned by another learner." [Shalev-Shwartz, S., & Ben-David, S. (2014). Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press.]

Considerations on ML - Additional Considerations. 2. Bias-Variance Trade-off. Bias relates to generalization (too much bias → underfitting); variance relates to descriptive ability (too much variance → overfitting). The prediction error, plotted against model complexity, is minimised between the two extremes.
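One way to see the trade-off is to fit models of increasing complexity and compare training and validation errors. The sketch below does this with polynomial fits of increasing degree on synthetic data; the data, split and degrees are arbitrary choices for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 40)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=x.size)  # noisy synthetic data

train, val = np.arange(0, 40, 2), np.arange(1, 40, 2)           # simple split

for degree in (1, 3, 10):                                       # increasing complexity
    coefs = np.polyfit(x[train], y[train], degree)
    err_tr = np.sqrt(np.mean((np.polyval(coefs, x[train]) - y[train]) ** 2))
    err_va = np.sqrt(np.mean((np.polyval(coefs, x[val]) - y[val]) ** 2))
    print(f"degree {degree:2d}: train RMSE {err_tr:.2f}, validation RMSE {err_va:.2f}")
# Low degrees tend to underfit (high bias); high degrees tend to overfit (high variance).
```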

Considerations on ML - Additional Considerations. 3. Validation. The initial dataset is divided into groups: a training set used to build the model and a test set used only to assess it; when model selection is needed, a further validation set is held out (training / validation / test). Alternatively, in k-fold cross-validation the data are split into k groups (e.g., 5) and each group serves in turn as the validation set while the remaining groups form the training set.
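A minimal sketch of these splitting schemes, assuming scikit-learn is available; the data, model and split sizes are made up for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split, KFold, cross_val_score
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))          # hypothetical descriptor matrix
y = X[:, 0] + rng.normal(size=100)     # hypothetical response

# Hold-out validation: training set vs. external test set.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# 5-fold cross-validation: each group is used once as the validation set.
scores = cross_val_score(KNeighborsRegressor(n_neighbors=3), X_tr, y_tr,
                         cv=KFold(n_splits=5, shuffle=True, random_state=0))
print(scores.mean())
```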

Considerations on ML - Additional Considerations. 4. Applicability. The "No Free Dessert (either!)" theorem: machine learning models are reductionist with respect to the types of chemical structures, the physicochemical properties and the mechanisms of action considered. Applicability Domain (AD): the region of chemical space where the property can be reliably predicted.

Considerations on ML - Additional Considerations. 4. Applicability. Common AD criteria: descriptor ranges, [min(x_1), max(x_1)], [min(x_2), max(x_2)], ...; leverage, computed from the hat matrix H = X (X^T X)^(-1) X^T; distance to the training compounds, e.g. the Manhattan distance D_xy = sum_{j=1}^{p} |x_j - y_j|.
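A minimal sketch of a leverage-based applicability-domain check on made-up data: the leverage of a query compound is compared against the commonly used warning threshold h* = 3(p+1)/n, and the Manhattan distance to the closest training compound is shown as an alternative criterion. Both the data and the threshold convention are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))               # hypothetical training descriptors (n x p)
x_new = rng.normal(size=4)                 # hypothetical query compound

XtX_inv = np.linalg.inv(X.T @ X)
h_new = x_new @ XtX_inv @ x_new            # leverage of the query compound
h_star = 3 * (X.shape[1] + 1) / X.shape[0] # common warning threshold h* = 3(p+1)/n

print(f"h = {h_new:.3f}, h* = {h_star:.3f}, inside AD: {h_new <= h_star}")

# Manhattan distance to the closest training compound (an alternative AD criterion).
d_min = np.min(np.sum(np.abs(X - x_new), axis=1))
print(f"closest Manhattan distance: {d_min:.3f}")
```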

Considerations on ML - Standard Machine Learning workflow (in chemoinformatics): information extraction (from structures and data) → applicability & predictivity assessment → application & new knowledge.

Machine Learning methods (overview)
1. Decision Tree-based learning: Decision Trees, Random Forest
2. Local methods: k-means algorithm, k-NN algorithm
3. Artificial Neural Networks: Feed-Forward NN, Kohonen Maps

(1) Decision Tree Learning. A tree is composed of a root node, decision node(s) and leaves.

(1) Decision Tree Learning. Advantages:
1. Easy to interpret
2. No data pretreatment needed
3. Handles numerical and categorical variables
4. Works for classification and regression
5. Non-parametric
6. Automatic variable selection

(1) Decision Tree Learning - Random Forest. Bagging (Bootstrap Aggregating) = the power of the crowd: many trees are grown on bootstrap samples of the data and their predictions are aggregated.
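A minimal sketch of a single decision tree and a bagged random forest on made-up data, assuming scikit-learn is available:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))                  # hypothetical descriptors
y = (X[:, 0] + X[:, 1] > 0).astype(int)        # hypothetical class labels

tree = DecisionTreeClassifier(max_depth=3).fit(X, y)

# Random forest = bagging (bootstrap aggregating) of many decision trees.
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

print(tree.score(X, y), forest.score(X, y))
print(forest.feature_importances_)             # built-in variable importance
```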

(2) Local approaches - k-means clustering. Algorithm:
1. Select a k (e.g., k = 3)
2. Randomly assign each sample to one of the k clusters
3. Compute the cluster centroids
4. Reassign each sample to its closest centroid, then repeat steps 3-4 until the assignments no longer change
5. End
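The steps above translate almost directly into code. The sketch below is a bare-bones k-means loop on random 2-D points: initialisation by random assignment, then alternating centroid calculation and closest-centroid reassignment until the labels stop changing. Data, k and the empty-cluster handling are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 2))                      # hypothetical 2-D data
k = 3                                             # 1. select k

labels = rng.integers(0, k, size=len(X))          # 2. random assignment
while True:
    # 3. centroid calculation (an empty cluster is reseeded from a random point)
    centroids = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                          else X[rng.integers(len(X))] for j in range(k)])
    # 4. reassign each point to its closest centroid
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    new_labels = dists.argmin(axis=1)
    if np.array_equal(new_labels, labels):        # 5. end when nothing changes
        break
    labels = new_labels

print(np.bincount(labels, minlength=k))           # cluster sizes
```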

(2) Local approaches - k-nearest Neighbor (kNN). To predict a new sample ("?"):
1. Calculate its distance to every training sample (n_train distances)
2. Select a number of neighbors (k) and predict the response from the k closest training samples (illustrated for k = 1 to 4)

(2) Local approaches - k-nearest Neighbor (kNN). Properties:
1. Good for large training sets with localized differences
2. Difficult to interpret
3. Which k?
4. Which distance measure?
5. Curse of dimensionality → variable selection is needed
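A minimal sketch of kNN classification on made-up data, assuming scikit-learn; the choice of k and of the distance metric are exactly the tuning decisions listed above.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 4))              # hypothetical training descriptors
y_train = (X_train[:, 0] > 0).astype(int)        # hypothetical class labels
x_query = rng.normal(size=(1, 4))                # the "?" compound to predict

for k in (1, 3, 5):
    knn = KNeighborsClassifier(n_neighbors=k, metric="manhattan").fit(X_train, y_train)
    print(k, knn.predict(x_query)[0])            # prediction from the k nearest neighbors
```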

(3) Neural Networks - Artificial Neurons. The inputs x_1, ..., x_p are combined and passed through an activation function f(x) to produce the output y. [Marini, F. Neural Networks. In: Comprehensive Chemometrics: Chemical and Biochemical Data Analysis, Vol. 3.]
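A minimal sketch of a single artificial neuron: the inputs are combined through a weighted sum and passed through an activation function (here a sigmoid); the weights, bias and inputs are made up.

```python
import numpy as np

def neuron(x, w, b):
    """Weighted sum of the inputs followed by a sigmoid activation."""
    z = np.dot(w, x) + b
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.1, 1.0, 0.0, 3.0])     # hypothetical input vector (x_1 ... x_p)
w = np.array([0.5, -0.2, 0.8, 0.1])    # hypothetical weights
print(neuron(x, w, b=0.0))             # output y = f(w . x + b)
```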

(3) Neural Networks - Feed-Forward NN. Neurons are organised into an input layer, hidden layer(s) and an output layer. Training:
1. Start from an untrained network
2. Compute the outcome
3. Compute the error
4. Back-propagation learning (update the weights)
5. Repeat until a stop criterion is met

(3) Neural Networks - Feed-Forward NN. During training, the training-set error keeps decreasing with the number of epochs, while the validation-set error first decreases and then rises again when the network starts to overfit; training is stopped around the minimum of the validation error (early stopping).
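A minimal sketch of a feed-forward network trained with back-propagation and early stopping, using scikit-learn's MLPRegressor on made-up data; an internal validation fraction is held out and training stops when the validation score no longer improves.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8))                      # hypothetical descriptors
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=300)   # hypothetical response

net = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000,
                   early_stopping=True, validation_fraction=0.2,
                   random_state=0)
net.fit(X, y)                                      # back-propagation training

print(len(net.loss_curve_), "epochs run before the stop criterion")
```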

(3) Neural Networks - Kohonen Maps. An unsupervised, non-linear, topology-preserving mapping from the p-dimensional input space onto a 2-dimensional map.

(3) Neural Networks - Kohonen Maps. The input neurons are connected to the Kohonen layer through weights. Learning proceeds in two steps:
1. Competitive learning: the similarity of the input to each neuron is computed and the winner takes all
2. Collaborative learning: the winning neuron is updated, together with the neurons close to it on the map
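A minimal from-scratch sketch of these two learning steps (competitive: find the winning neuron; collaborative: update the winner and its neighbours) on random data with a small square grid; grid size, learning rate and neighbourhood width are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(200, 5))                    # hypothetical p-dimensional inputs
grid = 6                                            # 6 x 6 Kohonen layer
W = rng.normal(size=(grid, grid, data.shape[1]))    # one weight vector per neuron

# Grid coordinates of every neuron, used by the neighbourhood function.
coords = np.stack(np.meshgrid(np.arange(grid), np.arange(grid), indexing="ij"), axis=-1)

lr, sigma = 0.5, 2.0
for epoch in range(20):
    for x in data[rng.permutation(len(data))]:
        # 1. competitive learning: the most similar neuron wins ("winner takes all")
        dists = np.linalg.norm(W - x, axis=2)
        winner = np.unravel_index(dists.argmin(), dists.shape)
        # 2. collaborative learning: the winner and its close neighbours move towards x
        grid_dist = np.linalg.norm(coords - np.array(winner), axis=2)
        h = np.exp(-(grid_dist ** 2) / (2 * sigma ** 2))      # neighbourhood function
        W += lr * h[..., None] * (x - W)
    lr, sigma = lr * 0.9, sigma * 0.9                          # shrink over the epochs

print(W.shape)   # trained 2-D map of weight vectors
```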

(3) Neural Networks - Kohonen Maps. A trained map can be inspected through the top map (placement of the compounds on the 2-D grid) and the p weight maps (one per input variable).

Which ML algorithm? Consider: the purpose (clustering, regression, classification); performance vs. interpretability; the covered chemical space (e.g., AD); the types of included variables. See also: http://scikit-learn.org/stable/tutorial/machine_learning_map/

Summary
Machines can learn from our data.
No ML algorithm always outperforms the others.
Validation and Applicability Domain assessment are crucial.
Pay attention to what the performance metric is telling you!

Supplementary reading
Theory and algorithms:
Shalev-Shwartz, S., & Ben-David, S. (2014). Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press.
Marini, F. (2009). Neural Networks. In: Comprehensive Chemometrics: Chemical and Biochemical Data Analysis, Vol. 3.
Online resources:
[Coursera] Ng, A. Machine Learning, Stanford University. https://www.coursera.org/learn/machine-learning
[Online book] Neural Networks and Deep Learning. http://neuralnetworksanddeeplearning.com/