Machine Learning: Opportunities and Limitations
Holger H. Hoos, LIACS, Universiteit Leiden, The Netherlands
LCDS Conference, 2017-11-28
The age of computation
- clear, precise instructions, flawlessly executed
- algorithms = recipes for data processing
- predictable results and behaviour
- performance guarantees
- trusted, effective solutions to complex problems
The age of advanced computation (AI)
- vast amounts of cheap computation
- automatically designed algorithms
- effective but complex, heuristic, black-box methods
Key idea: from explicit programming to learning / automatic adaptation to data

Success stories:
- game playing (e.g., Go, poker)
- medical diagnosis (lung disease)
- transportation (autonomous driving)
- energy (demand prediction and trading)
The Machine Learning Revolution
- machine learning (ML) = automatic construction of software that works well on given data
- ideas reach back to the 1950s (Alan Turing)
- based on statistics, mathematical optimisation and principled experimentation (heuristic mechanisms)
- key ingredient of artificial intelligence (AI)
- but: AI is more than ML
Supervised vs unsupervised ML
- unsupervised: discover patterns in data; data mining (e.g., clustering)
- supervised: make predictions based on known training examples; statistical modelling
- Key assumption: training data is representative of the application scenario
- other types of ML exist (e.g., semi-supervised learning, reinforcement learning)
Regression
- Example: predict plant growth for a given set of environmental conditions
- Given: set of training examples = feature values + numerical outputs
- Objective: predict output for new feature values
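A minimal sketch of this setting, using a single feature and ordinary least squares; the data and names here are purely illustrative, not from the talk:

```python
def fit_linear(xs, ys):
    """Learn slope and intercept minimising squared error (one feature)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return slope, intercept

# Training examples: feature value (e.g., temperature) -> numerical output (growth)
xs = [10.0, 15.0, 20.0, 25.0]
ys = [2.0, 3.0, 4.0, 5.0]
slope, intercept = fit_linear(xs, ys)

def predict(x):
    """The learned model: predicts an output for a new feature value."""
    return slope * x + intercept

print(predict(30.0))  # close to 6.0 for this toy data
```

Real regression methods handle many features and non-linear relationships, but the shape of the problem is the same: training pairs in, a prediction function out.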
Classification
- Example: predict whether someone takes a loan, based on demographic + personal financial data
- Given: set of training examples = feature values + classes
- Objective: predict class for new feature values
- Important special case: binary classification = 2 classes (e.g., yes/no)
Example: Binary classification with decision trees [Source: www.simafore.com] 8
Random forests (state-of-the-art method) [Source: blog.citizennet.com] 9
Key distinction:
- Classification procedure (classifier; model): algorithm used for solving a classification problem, e.g., a decision tree
  Input: feature values; Output: class (yes/no)
- Learning procedure: algorithm used for constructing a classifier, e.g., C4.5 (a well-known decision tree learning algorithm)
  Input: set of training data; Output: classification procedure (decision tree)
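The distinction can be made concrete in a few lines. The sketch below uses a decision "stump" (a one-split tree) rather than full C4.5: the learning procedure takes training data and returns a classifier, and the returned classifier maps feature values to classes. All data and names are illustrative:

```python
def learn_stump(examples):
    """Learning procedure: training data in, classifier out.
    examples: list of (feature_value, label) pairs with labels 'yes'/'no'."""
    best = None
    for threshold, _ in examples:
        def classify(x, t=threshold):
            return 'yes' if x >= t else 'no'
        errors = sum(classify(x) != label for x, label in examples)
        if best is None or errors < best[0]:
            best = (errors, threshold)
    t = best[1]
    # The returned function is the classifier: feature value in, class out.
    return lambda x: 'yes' if x >= t else 'no'

train = [(20, 'no'), (35, 'no'), (50, 'yes'), (65, 'yes')]
classifier = learn_stump(train)   # run the learning procedure once ...
print(classifier(55))             # ... then apply the classifier many times -> 'yes'
print(classifier(25))             # -> 'no'
```

Note that understanding `classifier` (a single threshold test) requires no knowledge of how `learn_stump` found that threshold, which is exactly the point of the distinction.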
Evaluation and Bias
How to evaluate supervised ML algorithms?
Key idea: assess the quality of the predictions obtained (e.g., from a trained binary classifier).

Prediction quality of binary classifiers:
- accuracy: expected rate of correct classifications
- false positive rate: expected rate of incorrect yes predictions
- false negative rate: expected rate of incorrect no predictions
- trade-offs between these (weighted average; ROC curve)
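These three metrics can be computed directly from the confusion counts (true/false positives and negatives). A minimal sketch, with illustrative data:

```python
def binary_metrics(predictions, truths):
    """Compute accuracy, false positive rate and false negative rate
    from paired yes/no predictions and ground-truth labels."""
    pairs = list(zip(predictions, truths))
    tp = sum(p == 'yes' and t == 'yes' for p, t in pairs)
    tn = sum(p == 'no' and t == 'no' for p, t in pairs)
    fp = sum(p == 'yes' and t == 'no' for p, t in pairs)
    fn = sum(p == 'no' and t == 'yes' for p, t in pairs)
    return {
        'accuracy': (tp + tn) / len(pairs),
        'false_positive_rate': fp / (fp + tn),  # incorrect yes, among true no cases
        'false_negative_rate': fn / (fn + tp),  # incorrect no, among true yes cases
    }

preds  = ['yes', 'yes', 'no', 'no', 'yes']
truths = ['yes', 'no',  'no', 'yes', 'yes']
m = binary_metrics(preds, truths)
print(m)  # accuracy 0.6, FPR 0.5, FNR 1/3
```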
Caution: Typically, there is no single correct evaluation metric.
- evaluation metrics can introduce unfairness / bias
- especially when training sets are unbalanced (many more no than yes cases; prevalence or lack of certain input feature combinations)
- use great care when constructing training sets
- use multiple evaluation metrics
- perform detailed evaluations (beyond simple metrics)
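A small illustration (with made-up numbers) of why a single metric misleads on unbalanced data: a trivial "classifier" that always predicts the majority class scores high accuracy while missing every positive case.

```python
# 95 true "no" cases, 5 true "yes" cases -- a strongly unbalanced set.
truths = ['no'] * 95 + ['yes'] * 5
preds = ['no'] * 100  # trivial majority-class "classifier"

accuracy = sum(p == t for p, t in zip(preds, truths)) / len(truths)
false_negative_rate = (sum(p == 'no' and t == 'yes'
                           for p, t in zip(preds, truths)) / 5)

print(accuracy)             # 0.95 -- looks excellent
print(false_negative_rate)  # 1.0  -- yet every yes case is missed
```

Only looking at the false negative rate alongside accuracy reveals that the classifier is useless, which is the point of using multiple metrics.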
The problem of overfitting
- good performance on training data may not generalise to previously unseen data: overfitting (a well-known problem)
- detect overfitting using validation techniques:
  - hold-out validation: evaluate on a set of test cases kept strictly separate from the training set
  - cross-validation: like hold-out, but with many different training/test splits
- prevent overfitting using regularisation techniques (= modification / specific setting of the ML method used)
- Caution: Overfitting can introduce bias!
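The splitting step behind cross-validation can be sketched as follows: partition the examples into k folds, and use each fold once for testing while training on the rest. This simplified version omits the shuffling that is usually applied first; function names are illustrative:

```python
def k_fold_splits(n_examples, k):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation:
    each example is used for testing exactly once, trained on the rest."""
    indices = list(range(n_examples))
    fold_size = n_examples // k
    for i in range(k):
        test = indices[i * fold_size:(i + 1) * fold_size]
        train = indices[:i * fold_size] + indices[(i + 1) * fold_size:]
        yield train, test

for train, test in k_fold_splits(6, 3):
    print(train, test)  # e.g. first split: [2, 3, 4, 5] [0, 1]
```

Averaging the evaluation metric over all k test folds gives a more robust estimate of generalisation than a single hold-out split.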
Problematic features
- certain (input) features can help improve performance, but are inappropriate to use
- examples: race, gender, sexual orientation
- using problematic features in machine learning can cause (unintentional) discrimination
- Easy solution: do not use problematic features. Wrong!! Combinations of other, harmless features can yield equivalent information
- especially problematic for deep learning and other powerful black-box methods
- Better solution: careful, detailed evaluation
Explainability & Transparency
Challenge: How can we trust an ML system?
- carefully evaluate performance; identify strengths and weaknesses (requires detailed evaluation = computational experiments)
- understand how it works
- understand its output
Key distinction: understanding a classifier (e.g., a decision tree) vs understanding the training procedure that produced it
- Note: to understand a given classifier (and its output), we do not need to understand how it was built
- understanding what happens at every step does not mean understanding the behaviour of an algorithm
- some classifiers are easier to understand than others
Neural networks [Source: www.texsample.net] 17
Deep learning
- uses neural networks with many layers (AlphaGo Zero: 84 layers)
- idea + research date back to the 1960s/1970s
- successful real-world applications since the 1980s
- very popular since 2012
- impressive results in an increasing number of application areas
- requires large amounts of data, specialised hardware, considerable human expertise + experimentation
- Caution! deep learning ⊊ machine learning ⊊ AI
Deep neural networks are black-box methods
- easy to understand the function of each neuron in the network; very hard / impossible to understand the behaviour of the network as a whole
- lack of transparency / explainability

Possible remedies:
- principled, detailed evaluation of behaviour
- use alternative methods with similar performance (e.g., random forests)
- trade off performance against explainability
- frugal learning (new research direction)
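The "easy per neuron, opaque as a whole" point can be seen in code: each neuron is just a weighted sum followed by an activation function. The sketch below is a toy forward pass with made-up (not trained) weights, purely to show how small the per-neuron function is:

```python
import math

def neuron(inputs, weights, bias):
    """A single neuron: weighted sum of inputs, then a sigmoid activation.
    Each neuron's function is this simple -- the opacity of deep networks
    comes from composing huge numbers of them across many layers."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weight_matrix, biases):
    """One layer = one neuron per row of the weight matrix."""
    return [neuron(inputs, w, b) for w, b in zip(weight_matrix, biases)]

# Two-layer forward pass with illustrative weights
hidden = layer([0.5, -1.0], [[1.0, 0.5], [-0.5, 2.0]], [0.0, 0.1])
output = layer(hidden, [[1.5, -1.0]], [0.2])
print(output)  # a single value in (0, 1)
```

Tracing why a network with millions of such neurons produces a particular output is what resists human understanding, not any individual step.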
Automated Machine Learning
Machine learning is powerful, but successful application is far from trivial.

Fundamental problem: Which of the many available algorithms (models) applicable to a given machine learning problem should be used, and with which hyper-parameter settings?
- Example: WEKA contains 39 classification algorithms and 3 × 8 feature selection methods

Solution: automatically select ML methods and hyper-parameter settings
-> automated machine learning (AutoML)
AutoML...
- achieves substantial performance improvements over solutions hand-crafted by human experts
- enables frugal learning (explainable / transparent ML)
- helps non-experts apply ML techniques effectively
- intense international research focus (academia + industry)
- ongoing research focus at LIACS (Leiden Institute of Advanced Computer Science); see ada.liacs.nl/projects, Auto-WEKA
Take-Home Message
Machine learning can (help to) solve many problems... but is no panacea.
Methods and results strongly depend on the quantity + quality of input data.

Challenges:
- risk of overfitting training data, hidden bias
- lack of transparency, explainability

Human expertise: crucial for successful, responsible use
Current + future research (these problems are far from solved)
AI should augment, not replace, human expertise! (Likewise for machine learning.)