Machine Learning: Opportunities and Limitations
Holger H. Hoos, LIACS, Universiteit Leiden, The Netherlands
LCDS Conference, 2017-11-28
The age of computation
- clear, precise instructions, flawlessly executed
- algorithms = recipes for data processing
- predictable results and behaviour
- performance guarantees
- trusted, effective solutions to complex problems
The age of advanced computation (AI)
- vast amounts of cheap computation
- automatically designed algorithms
- effective but complex, heuristic, black-box methods
Key idea: from explicit programming to learning / automatic adaptation to data

Success stories:
- game playing (e.g., Go, poker)
- medical diagnosis (lung disease)
- transportation (autonomous driving)
- energy (demand prediction and trading)
The Machine Learning Revolution
- machine learning (ML) = automatic construction of software that works well on given data
- ideas reach back to the 1950s (Alan Turing)
- based on statistics, mathematical optimisation and principled experimentation (heuristic mechanisms)
- key ingredient of artificial intelligence (AI)
- but: AI is more than ML
Supervised vs unsupervised ML
- unsupervised: discover patterns in data; data mining (e.g., clustering)
- supervised: make predictions based on known training examples; statistical modelling
- Key assumption: training data is representative of the application scenario
- other types of ML exist (e.g., semi-supervised learning, reinforcement learning)
Regression
- Example: predict plant growth for a given set of environmental conditions
- Given: set of training examples = feature values + numerical outputs
- Objective: predict output for new feature values
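A minimal sketch of this setting, using a single feature and ordinary least squares; the data and names here are purely illustrative, not from the talk:

```python
def fit_linear(xs, ys):
    """Learn slope and intercept minimising squared error (one feature)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return slope, intercept

# Training examples: feature value (e.g., temperature) -> numerical output (growth)
xs = [10.0, 15.0, 20.0, 25.0]
ys = [2.0, 3.0, 4.0, 5.0]
slope, intercept = fit_linear(xs, ys)

def predict(x):
    """The learned model: predicts an output for a new feature value."""
    return slope * x + intercept

print(predict(30.0))  # close to 6.0 for this toy data
```

Real regression methods handle many features and non-linear relationships, but the shape of the problem is the same: training pairs in, a prediction function out.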
Classification
- Example: predict whether someone takes a loan, based on demographic + personal financial data
- Given: set of training examples = feature values + classes
- Objective: predict class for new feature values
- Important special case: binary classification = 2 classes (e.g., yes/no)
Example: Binary classification with decision trees [Source: www.simafore.com] 8
Random forests (state-of-the-art method) [Source: blog.citizennet.com] 9
Key distinction:
- Classification procedure (classifier; model): algorithm used for solving a classification problem, e.g., a decision tree
  Input: feature values; Output: class (yes/no)
- Learning procedure: algorithm used for constructing a classifier, e.g., C4.5 (a well-known decision tree learning algorithm)
  Input: set of training data; Output: classification procedure (decision tree)
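The distinction can be made concrete in a few lines. The sketch below uses a decision "stump" (a one-split tree) rather than full C4.5: the learning procedure takes training data and returns a classifier, and the returned classifier maps feature values to classes. All data and names are illustrative:

```python
def learn_stump(examples):
    """Learning procedure: training data in, classifier out.
    examples: list of (feature_value, label) pairs with labels 'yes'/'no'."""
    best = None
    for threshold, _ in examples:
        def classify(x, t=threshold):
            return 'yes' if x >= t else 'no'
        errors = sum(classify(x) != label for x, label in examples)
        if best is None or errors < best[0]:
            best = (errors, threshold)
    t = best[1]
    # The returned function is the classifier: feature value in, class out.
    return lambda x: 'yes' if x >= t else 'no'

train = [(20, 'no'), (35, 'no'), (50, 'yes'), (65, 'yes')]
classifier = learn_stump(train)   # run the learning procedure once ...
print(classifier(55))             # ... then apply the classifier many times -> 'yes'
print(classifier(25))             # -> 'no'
```

Note that understanding `classifier` (a single threshold test) requires no knowledge of how `learn_stump` found that threshold, which is exactly the point of the distinction.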
Evaluation and Bias
How to evaluate supervised ML algorithms?
Key idea: assess the quality of the predictions obtained (e.g., from a trained binary classifier).

Prediction quality of binary classifiers:
- accuracy: expected rate of correct classifications
- false positive rate: expected rate of incorrect yes predictions
- false negative rate: expected rate of incorrect no predictions
- trade-offs between these (weighted average; ROC curve)
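These three metrics can be computed directly from the confusion counts (true/false positives and negatives). A minimal sketch, with illustrative data:

```python
def binary_metrics(predictions, truths):
    """Compute accuracy, false positive rate and false negative rate
    from paired yes/no predictions and ground-truth labels."""
    pairs = list(zip(predictions, truths))
    tp = sum(p == 'yes' and t == 'yes' for p, t in pairs)
    tn = sum(p == 'no' and t == 'no' for p, t in pairs)
    fp = sum(p == 'yes' and t == 'no' for p, t in pairs)
    fn = sum(p == 'no' and t == 'yes' for p, t in pairs)
    return {
        'accuracy': (tp + tn) / len(pairs),
        'false_positive_rate': fp / (fp + tn),  # incorrect yes, among true no cases
        'false_negative_rate': fn / (fn + tp),  # incorrect no, among true yes cases
    }

preds  = ['yes', 'yes', 'no', 'no', 'yes']
truths = ['yes', 'no',  'no', 'yes', 'yes']
m = binary_metrics(preds, truths)
print(m)  # accuracy 0.6, FPR 0.5, FNR 1/3
```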
Caution: Typically, there is no single correct evaluation metric.
- evaluation metrics can introduce unfairness / bias
- especially when training sets are unbalanced (many more no than yes cases; prevalence or lack of certain input feature combinations)
- use great care when constructing training sets
- use multiple evaluation metrics
- perform detailed evaluations (beyond simple metrics)
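A small illustration (with made-up numbers) of why a single metric misleads on unbalanced data: a trivial "classifier" that always predicts the majority class scores high accuracy while missing every positive case.

```python
# 95 true "no" cases, 5 true "yes" cases -- a strongly unbalanced set.
truths = ['no'] * 95 + ['yes'] * 5
preds = ['no'] * 100  # trivial majority-class "classifier"

accuracy = sum(p == t for p, t in zip(preds, truths)) / len(truths)
false_negative_rate = (sum(p == 'no' and t == 'yes'
                           for p, t in zip(preds, truths)) / 5)

print(accuracy)             # 0.95 -- looks excellent
print(false_negative_rate)  # 1.0  -- yet every yes case is missed
```

Only looking at the false negative rate alongside accuracy reveals that the classifier is useless, which is the point of using multiple metrics.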
The problem of overfitting
- good performance on training data may not generalise to previously unseen data: overfitting (a well-known problem)
- detect overfitting using validation techniques:
  - hold-out validation: evaluate on a set of test cases kept strictly separate from the training set
  - cross-validation: like hold-out, but with many different training/test splits
- prevent overfitting using regularisation techniques (= modification / specific setting of the ML method used)
- Caution: Overfitting can introduce bias!
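The splitting step behind cross-validation can be sketched as follows: partition the examples into k folds, and use each fold once for testing while training on the rest. This simplified version omits the shuffling that is usually applied first; function names are illustrative:

```python
def k_fold_splits(n_examples, k):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation:
    each example is used for testing exactly once, trained on the rest."""
    indices = list(range(n_examples))
    fold_size = n_examples // k
    for i in range(k):
        test = indices[i * fold_size:(i + 1) * fold_size]
        train = indices[:i * fold_size] + indices[(i + 1) * fold_size:]
        yield train, test

for train, test in k_fold_splits(6, 3):
    print(train, test)  # e.g. first split: [2, 3, 4, 5] [0, 1]
```

Averaging the evaluation metric over all k test folds gives a more robust estimate of generalisation than a single hold-out split.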
Problematic features
- certain (input) features can help improve performance, but are inappropriate to use
- examples: race, gender, sexual orientation
- using problematic features in machine learning can cause (unintentional) discrimination
- Easy solution: do not use problematic features. Wrong!! Combinations of other, harmless features can yield equivalent information
- especially problematic for deep learning and other powerful black-box methods
- Better solution: careful, detailed evaluation
Explainability & Transparency
Challenge: How can we trust an ML system?
- carefully evaluate performance; identify strengths and weaknesses (requires detailed evaluation = computational experiments)
- understand how it works
- understand its output
Key distinction: understanding a classifier (e.g., a decision tree) vs understanding the training procedure that produced it
- Note: to understand a given classifier (and its output), we do not need to understand how it was built
- understanding what happens at every step does not mean understanding the behaviour of an algorithm
- some classifiers are easier to understand than others
Neural networks [Source: www.texsample.net] 17
Deep learning
- uses neural networks with many layers (AlphaGo Zero: 84 layers)
- idea + research date back to the 1960s/1970s
- successful real-world applications since the 1980s
- very popular since 2012
- impressive results in an increasing number of application areas
- requires large amounts of data, specialised hardware, considerable human expertise + experimentation
- Caution! deep learning ⊊ machine learning ⊊ AI
Deep neural networks are black-box methods
- easy to understand the function of each neuron in the network; very hard / impossible to understand the behaviour of the network as a whole
- lack of transparency / explainability

Possible remedies:
- principled, detailed evaluation of behaviour
- use alternative methods with similar performance (e.g., random forests)
- trade off performance against explainability
- frugal learning (new research direction)
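The "easy per neuron, opaque as a whole" point can be seen in code: each neuron is just a weighted sum followed by an activation function. The sketch below is a toy forward pass with made-up (not trained) weights, purely to show how small the per-neuron function is:

```python
import math

def neuron(inputs, weights, bias):
    """A single neuron: weighted sum of inputs, then a sigmoid activation.
    Each neuron's function is this simple -- the opacity of deep networks
    comes from composing huge numbers of them across many layers."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weight_matrix, biases):
    """One layer = one neuron per row of the weight matrix."""
    return [neuron(inputs, w, b) for w, b in zip(weight_matrix, biases)]

# Two-layer forward pass with illustrative weights
hidden = layer([0.5, -1.0], [[1.0, 0.5], [-0.5, 2.0]], [0.0, 0.1])
output = layer(hidden, [[1.5, -1.0]], [0.2])
print(output)  # a single value in (0, 1)
```

Tracing why a network with millions of such neurons produces a particular output is what resists human understanding, not any individual step.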
Automated Machine Learning
Machine learning is powerful, but successful application is far from trivial.

Fundamental problem: Which of the many available algorithms (models) applicable to a given machine learning problem should be used, and with which hyper-parameter settings?
- Example: WEKA contains 39 classification algorithms and 3 × 8 feature selection methods

Solution: automatically select ML methods and hyper-parameter settings
-> automated machine learning (AutoML)
AutoML...
- achieves substantial performance improvements over solutions hand-crafted by human experts
- enables frugal learning (explainable / transparent ML)
- helps non-experts apply ML techniques effectively
- intense international research focus (academia + industry)
- ongoing research focus at LIACS (Leiden Institute of Advanced Computer Science); see ada.liacs.nl/projects, Auto-WEKA
Take-Home Message
Machine learning can (help to) solve many problems... but is no panacea.
Methods and results strongly depend on the quantity + quality of input data.

Challenges:
- risk of overfitting training data, hidden bias
- lack of transparency, explainability

Human expertise: crucial for successful, responsible use
Current + future research (these problems are far from solved)
AI should augment, not replace, human expertise! (Likewise for machine learning.)