Practical considerations about the implementation of some Machine Learning LGD models in companies


Practical considerations about the implementation of some Machine Learning LGD models in companies September 15th 2017 Louvain-la-Neuve Sébastien de Valeriola Please read the important disclaimer at the end of this presentation

Storytelling In this talk, I would like to tell you a tale: 2

Storytelling In this talk, I would like to tell you a tale: in a not so far away land, 3

Storytelling In this talk, I would like to tell you a tale: in a not so far away land, 1. An actuary was asked by his employer, a mid-sized European bank, to improve their current Loss Given Default (LGD) model, 4

Credit Risk Assessment under Basel II/III Under Basel II/III*, banks may calculate credit risk using the internal ratings-based approach (IRB). For corporate credit, the directives distinguish between two possible alternatives: o Foundation IRB: rating scales are estimated based only on probabilities of default, and each corporate loan is adequately allocated to a specific rating class, o Advanced IRB: the rating scale is established considering not only the PDs but also all other credit parameters, including Loss Given Default (LGD), maturity adjustments, EAD with CFs, etc. Expected Loss (EL) = Probability of Default (PD) × Loss Given Default (LGD) × Exposure at Default (EAD), where the PD is the likelihood of a borrower being unable to repay, the LGD is the fraction of the exposure at default that is lost in the case of default, and the EAD is the exposure at risk in the case of default. 5 (*) i.e. CRR & CRD IV
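As a purely illustrative numeric example (the figures below are hypothetical and not taken from the presentation), a loan with PD = 2%, LGD = 45% and EAD = EUR 1,000,000 carries an expected loss of

EL = PD × LGD × EAD = 0.02 × 0.45 × 1,000,000 EUR = 9,000 EUR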

Storytelling In this talk, I would like to tell you a tale: in a not so far away land, 1. An actuary was asked by his employer, a mid-sized European bank, to improve their current Loss Given Default (LGD) model, 2. Our actuary then began to explore the LGD models, and eventually opted for a Machine Learning model, 6

LGD models From the modelling point of view, LGD is the «poor little brother» of PD, as the literature about LGD models is rather scarce compared to that about PD models. In particular, no «standard benchmark» exists for LGD models. One property of LGD may at least partially explain this scarcity: its observed distribution is generally bimodal. 7

Choosing a Machine Learning model for its predictive ability The deviance/error scores of the tested models clearly suggest choosing a Machine Learning model. 8

Storytelling In this talk, I would like to tell you a tale: in a not so far away land, 1. An actuary was asked by his employer, a mid-sized European bank, to improve their current Loss Given Default (LGD) model, 2. Our actuary then began to explore the LGD models, and eventually opted for a Machine Learning model, 3. Once all the technical stuff was complete, an important question arose: how to convince management (among others ) to effectively put the chosen model into production? 9

Understanding the machine learning methods The «quants» of the company are generally able to understand the technical details of the chosen model, and therefore trust its outputs, as they are convinced by cross-validation, error measures and assessment plots. However, in order to be accepted and effectively put into production, the model and its outputs should also be understood and trusted by a large set of other protagonists, who are not necessarily «quantitative people»: decision-makers «above» the technical team (e.g. management/committee/board), who take decisions based on the output of the model, other departments of the company (e.g. the Pricing department), the validators and regulators, Moreover, the European Parliament has adopted the General Data Protection Regulation, which comes into force in 2018. Among the new rules introduced by this legal text is the creation of a «right to explanation»: [... a data subject has the right to] meaningful information about the logic involved. (Article 13) This sentence is not crystal clear (for details, see GOODMAN and FLAXMAN (2016)), but 10

The problem When we say «understanding» a model, we mean «understanding how predictions are made» (for example, it is not necessary for the management to fully understand the calibration process). In the case of regression trees, understanding how the model predicts LGD values for new data points is not a problem, as it is very intuitive. However, some interpretation may still be required, as this model (and its output) is very different from the distribution-based models which are generally in production. In the case of more complex methods such as Bagging and Random Forests, even understanding how the model predicts LGD values for new data points is rather difficult. Things may be even worse for Gradient Boosting Machines, Support Vector Machines and Neural Networks. In the remaining part of this talk, I will give some pointers/ideas to overcome this difficulty. 11

Ideas to understand/interpret Machine Learning techniques 1. Implement ML techniques on top of traditional, in-production models: a. Use an ML technique to perform variable selection, b. Use an ML technique to segment data before applying a traditional model. 2. Put some parameters in the hands of the user: a. Handle the complexity parameter of the pruning process, b. Show and handle the trees behind an ensemble method such as Random Forest. 3. Mimic properties of traditional models: a. Force monotonicity in an ML technique, b. Build confidence intervals in order to stick to traditional models' output. 4. Local interpretation of the ML methods: a. Explain local predictions as linear combinations, b. Local Interpretable Model-agnostic Explanations. 5. Global interpretation of the ML methods: a. Simplified Tree Ensemble Learner. 12

Variable selection First solution: stick to the existing (e.g. in-production) model and use machine learning techniques as guiding tools. For example, one could use a random forest to explore the data and decide which features to select in the model (using the importance score produced by the Random Forest). In particular, this could be very useful to detect the needed interactions between variables, as it is generally difficult to select cross terms in traditional models. Example of such a process: 1. Fit a GLM to the data (without any interaction) and extract the residuals, 2. Train a regression tree on these residuals in order to see which pair of variables comes first in the splits, 3. Assess the significance of this interaction, 4. If it is significant, fit the GLM once again, this time specifying the interaction that was detected. 13
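A minimal R sketch of this process, assuming a data frame lgd_data with an lgd column and a handful of illustrative covariates (all names here are hypothetical, not the presentation's actual variables):

library(rpart)

# 1. Fit a GLM without interactions (a quasi-binomial link is one common
#    choice for an LGD bounded in [0, 1]) and extract its residuals
glm_fit <- glm(lgd ~ loan_amount + collateral + seniority,
               data = lgd_data, family = quasibinomial())
lgd_data$res <- residuals(glm_fit, type = "response")

# 2. Train a shallow regression tree on the residuals; the variables that
#    appear together near the root are candidate interactions
tree_fit <- rpart(res ~ loan_amount + collateral + seniority,
                  data = lgd_data, control = rpart.control(maxdepth = 2))
print(tree_fit)

# 3./4. If the detected interaction is significant, refit the GLM with it
glm_fit2 <- glm(lgd ~ loan_amount * collateral + seniority,
                data = lgd_data, family = quasibinomial())
anova(glm_fit, glm_fit2, test = "F")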

Segmentation Another possibility is to use the output of some ML technique to segment the data before applying traditional models on each segment. 14
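As an illustration only (same hypothetical data frame and variable names as above), a shallow tree can define the segments and a traditional GLM can then be fitted within each of them:

library(rpart)

# Shallow regression tree: its leaves define the segments
seg_tree <- rpart(lgd ~ loan_amount + collateral + seniority,
                  data = lgd_data, control = rpart.control(maxdepth = 2))
lgd_data$segment <- factor(seg_tree$where)   # leaf in which each observation falls

# One traditional model per segment
segment_models <- lapply(split(lgd_data, lgd_data$segment), function(d)
  glm(lgd ~ loan_amount + collateral, data = d, family = quasibinomial()))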

Ideas to understand/interpret Machine Learning techniques 1. Implement ML techniques on top of traditional, in-production models: a. Use an ML technique to perform variable selection, b. Use an ML technique to segment data before applying a traditional model. 2. Put some parameters in the hands of the user: a. Handle the complexity parameter of the pruning process, b. Show and handle the trees behind an ensemble method such as Random Forest. 3. Mimic properties of traditional models: a. Force monotonicity in an ML technique, b. Build confidence intervals in order to stick to traditional models' output. 4. Local interpretation of the ML methods: a. Explain local predictions as linear combinations, b. Local Interpretable Model-agnostic Explanations. 5. Global interpretation of the ML methods: a. Simplified Tree Ensemble Learner. 15

Putting some parameters in the hands of the user: the idea A nice way to give users insights about how the ML techniques work, and thus give them reasons to trust them, is to let them «play» with some parameters. For example, 1. We can implement the pruning process in a Shiny tool and let the user set the complexity parameter themselves, 2. We can let the user explore the forest of regression trees which is the output of a Random Forest, select subsets of them, compute predictions, etc. 16
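A minimal sketch of the first idea, as a Shiny app exposing the complexity parameter of the pruning (the data set, formula and slider range are placeholders):

library(shiny)
library(rpart)

# Grow a deliberately large tree once; the user then prunes it interactively
full_tree <- rpart(lgd ~ ., data = lgd_data, control = rpart.control(cp = 0.001))

ui <- fluidPage(
  sliderInput("cp", "Complexity parameter", min = 0.001, max = 0.1, value = 0.01),
  plotOutput("tree_plot")
)

server <- function(input, output) {
  output$tree_plot <- renderPlot({
    pruned <- prune(full_tree, cp = input$cp)   # prune with the user's cp
    plot(pruned); text(pruned, use.n = TRUE)
  })
}

shinyApp(ui, server)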

Putting some parameters in the hands of the user 17

Putting some parameters in the hands of the user 18

Ideas to understand/interpret Machine Learning techniques 1. Implement ML techniques on top of traditional, in-production models: a. Use an ML technique to perform variable selection, b. Use an ML technique to segment data before applying a traditional model. 2. Put some parameters in the hands of the user: a. Handle the complexity parameter of the pruning process, b. Show and handle the trees behind an ensemble method such as Random Forest. 3. Mimic properties of traditional models: a. Force monotonicity in an ML technique, b. Build confidence intervals in order to stick to traditional models' output. 4. Local interpretation of the ML methods: a. Explain local predictions as linear combinations, b. Local Interpretable Model-agnostic Explanations. 5. Global interpretation of the ML methods: a. Simplified Tree Ensemble Learner. 19

Force monotonicity One of the hard-to-understand features of the ML techniques is their non-monotonicity. Let us for example consider the case of the business size of the debtor: Globally, this variable has a positive effect on the LGD, i.e. the corresponding parameter in a traditional model (e.g. the coefficient β in an LM) would be significantly positive. However, it is possible that the effect of this variable is negative on a subset of the dataset, and that a regression tree would detect this. In order to «mimic» the monotonicity of the traditional models, it is possible to force monotonicity of ML techniques. In the case of regression trees (and other tree methods), we can constrain the calibration process: every split based on the chosen variable will be such that the prediction in, say, the left child node is larger and the prediction in the right child node is smaller. Some R packages, such as XGBoost (available on GitHub), allow imposing such constraints. 20
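A minimal sketch with the xgboost R package (feature matrix, variable names and tuning values are illustrative; a constraint of +1 forces an increasing effect, -1 a decreasing one, 0 leaves the feature unconstrained):

library(xgboost)

X <- as.matrix(lgd_data[, c("business_size", "loan_amount", "collateral")])
y <- lgd_data$lgd

# Force a monotonically increasing effect of the first feature (business size)
# on the predicted LGD; depending on the package version, the constraint can
# also be passed as a numeric vector c(1, 0, 0)
bst <- xgboost(data = X, label = y,
               nrounds = 200, max_depth = 3, eta = 0.1,
               objective = "reg:squarederror",   # "reg:linear" in older versions
               monotone_constraints = "(1,0,0)")

pred <- predict(bst, X)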

Computing confidence intervals for random forests As the outputs of most traditional models include confidence intervals, practitioners are sometimes disappointed that ML methods do not produce such output. While one indeed very rarely sees them, it is possible to build such confidence intervals. For example, EFRON, HASTIE and WAGER (2014) use the jackknife and the infinitesimal jackknife to do so: jackknife: systematically recompute the statistic estimate, leaving out one observation at a time from the sample set, infinitesimal jackknife: instead of leaving one observation out, introduce weights and assign a slightly smaller weight to one observation (i.e. weights are equal for all observations except this particular one). The R package randomForestCI (available on GitHub) implements this method. 21
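A minimal usage sketch of that package (data objects are placeholders; the package is distributed on GitHub and expects a forest grown with keep.inbag = TRUE):

library(randomForest)
library(randomForestCI)   # e.g. devtools::install_github("swager/randomForestCI")

rf <- randomForest(x = X_train, y = y_train, ntree = 500, keep.inbag = TRUE)

# Infinitesimal-jackknife estimate of the variance of each prediction
ij <- randomForestInfJack(rf, X_test, calibrate = TRUE)

# Approximate 95% confidence interval around each predicted LGD
lower <- ij$y.hat - 1.96 * sqrt(ij$var.hat)
upper <- ij$y.hat + 1.96 * sqrt(ij$var.hat)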

Computing confidence intervals for random forests 22

Ideas to understand/interpret Machine Learning techniques 1. Implement ML techniques on top of traditional, in-production models: a. Use an ML technique to perform variable selection, b. Use an ML technique to segment data before applying a traditional model. 2. Put some parameters in the hands of the user: a. Handle the complexity parameter of the pruning process, b. Show and handle the trees behind an ensemble method such as Random Forest. 3. Mimic properties of traditional models: a. Force monotonicity in an ML technique, b. Build confidence intervals in order to stick to traditional models' output. 4. Local interpretation of the ML methods: a. Explain local predictions as linear combinations, b. Local Interpretable Model-agnostic Explanations. 5. Global interpretation of the ML methods: a. Simplified Tree Ensemble Learner. 23

Explain predictions as a linear combination Local interpretations focus on a limited set of data points, for which they give explanations which are only locally correct. In the case of a regression tree, it is easy to give an interpretation as a «traditional model» (whether it is additive or multiplicative) by following the branches from the root to the chosen leaf. The resulting explanation is of course only valid for one leaf of the tree. This rather naive process can be extended to other tree-based methods. Linear explanation of the prediction for the sixth leaf: LGD = 0.43 (general) - 0.22 (loan or credit card) + 0.35 (closed loan) - 0.10 (large recoverable amount) + 0.12 (non-revolving loan) = 0.58 24
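A rough sketch of this decomposition for a single rpart tree (data, formula and the observation index are illustrative): the contribution of each split is read as the change in the mean LGD between consecutive nodes on the path from the root to the leaf.

library(rpart)

tree_fit <- rpart(lgd ~ ., data = lgd_data)

# Node numbers on the path from the root (node 1) to the leaf of observation i
i    <- 6
leaf <- as.integer(row.names(tree_fit$frame))[tree_fit$where[i]]
path <- leaf
while (tail(path, 1) > 1) path <- c(path, tail(path, 1) %/% 2)
path <- rev(path)

# Mean LGD in each node of the path; successive differences = contributions
frame_nodes   <- as.integer(row.names(tree_fit$frame))
node_means    <- tree_fit$frame[as.character(path), "yval"]
contributions <- c(node_means[1], diff(node_means))
names(contributions) <- labels(tree_fit)[match(path, frame_nodes)]

contributions        # "linear" explanation, one term per split
sum(contributions)   # equals the predicted LGD for observation i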

Local Interpretable Model-agnostic Explanations Local Interpretable Model-agnostic Explanations (LIME) were introduced by RIBEIRO, SINGH and GUESTRIN (2016). This method is model-agnostic because it considers the explained model as a black box, so that it can explain all types of models. It strikes a trade-off between interpretability and fidelity. The process is the following: 1. Choose a limited set of features (the most important ones), 2. Sample points around the point we want to explain, 3. Use the black-box model to obtain predictions for these neighbouring points, 4. Fit an interpretable model on these predictions, using weights that decrease with the distance to the point we want to explain. This interpretable model could be e.g. a linear model or a regression tree. The R package lime (available on GitHub) implements this method. 25
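A minimal usage sketch of that package (data objects and the caret-trained random forest are placeholders; lime supports caret models out of the box, while other model objects may need small model_type/predict_model wrappers):

library(caret)
library(lime)

# Black-box model: a random forest trained through caret
rf_model <- train(lgd ~ ., data = train_data, method = "rf")

# Build the explainer on the training features, then explain one new default
features    <- setdiff(names(train_data), "lgd")
explainer   <- lime(train_data[, features], rf_model)
explanation <- explain(new_default[, features], explainer, n_features = 4)

plot_features(explanation)   # bar plot of the four local feature weights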

Local Interpretable Model-agnostic Explanations Explaining a default corresponding to an LGD of 0.4674411 using the four most important features of the Random Forest model: 26

Ideas to understand/interpret Machine Learning techniques 1. Implement ML techniques on top of traditional, in-production models: a. Use an ML technique to perform variable selection, b. Use an ML technique to segment data before applying a traditional model. 2. Put some parameters in the hands of the user: a. Handle the complexity parameter of the pruning process, b. Show and handle the trees behind an ensemble method such as Random Forest. 3. Mimic properties of traditional models: a. Force monotonicity in an ML technique, b. Build confidence intervals in order to stick to traditional models' output. 4. Local interpretation of the ML methods: a. Explain local predictions as linear combinations, b. Local Interpretable Model-agnostic Explanations. 5. Global interpretation of the ML methods: a. Simplified Tree Ensemble Learner. 27

Simplified Tree Ensemble Learner The Simplified Tree Ensemble Learner (STEL) can be applied to any ensemble method, and gives a global interpretation of the model. The process of the inTrees method, presented in DENG (2014), is the following: 1. Transform the output of the ensemble method into a large set of rules (e.g. «Y1 > 0 & Y2 = yes => predict 0.29») whose length can be rather large, 2. Compute rule metrics, such as importance, predictive ability, etc. 3. Prune the rules (i.e. reduce their length), 4. Select a set of efficient and non-redundant rules (building a new database and fitting a regularized random forest on it). The output of this process is a STEL, which consists of a list of rules ordered by priority. To obtain a prediction, the rules are applied, from top to bottom, to a new data point, until a rule condition is satisfied by the point. The prediction value corresponding to that rule then becomes the prediction for the new point. The prediction quality is of course worse than that of the original ensemble method, but can be rather close to it. The R package inTrees (interpretable trees) implements this process. 28
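A minimal sketch of this pipeline with the inTrees package (the random forest rf and the objects X, y are placeholders; the workflow below follows the package documentation and, depending on the version, the regression case may need extra handling such as discretising the target):

library(randomForest)
library(inTrees)

rf <- randomForest(X, y, ntree = 100)

# 1. Transform the forest into a large set of rules
tree_list <- RF2List(rf)
rule_exec <- extractRules(tree_list, X)

# 2. Compute rule metrics (frequency, error, ...)
rule_metric <- getRuleMetric(rule_exec, X, y)

# 3. Prune the rules (reduce their length)
rule_metric <- pruneRule(rule_metric, X, y)

# 4. Select a compact, non-redundant rule set ordered by priority
learner <- buildLearner(rule_metric, X, y)
presentRules(learner, colnames(X))   # human-readable rule list
pred <- applyLearner(learner, X)     # predictions from the rule list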

Simplified Tree Ensemble Learner From a 100-tree random forest, R extracts 5,292 rules. It then prunes the rules, as one can see in the length distribution:

length   before   after
1           0      1720
2           2      2436
3          23       879
4         144       210
5         703        46
6        4420         1

The selected rules of the resulting STEL:

n   Length   Frequence   Error   Condition                                                   Prediction   Importance
1   2        0.45        0.03    defoffbalsheet <= 176.005 & loanstatus %in% c('closed')     0.04         1.00
2   2        0.11        0.03    loanrevolving %in% c('yes') & loanrecovamount > 2115.555    0.07         0.05
3   2        0.12        0.06    defexposureat > 76818.65 & loanrecovamount <= 86292.95      0.88         0.05
4   2        0.03        0.05    defoffbalsheet > 295.355 & loanrecovamount > 30987.605      0.10         0.02
5   2        0.04        0.01    loannominalamount <= 9700 & loanrecovamount > 4293.385      0.01         0.02
6   2        0.06        0.01    loannominalamount <= 7250 & loanrecovamount > 2115.555      0.02         0.02
7   2        0.29        0.05    defexposureat > 6443.915 & loanrecovamount <= 8654.235      0.92         0.02
8   2        0.27        0.05    defexposureat <= 62184.495 & loanrecovamount > 2360.735     0.09         0.02
9   1        0.24        0.04    loanrecovamount > 74918.68                                  0.07         0.01
10  2        0.30        0.03    loanmaturitydate <= 41477.5 & loanrecovamount > 7372.245    0.05         0.01

29

End of the story? In this talk, I would like to tell you a tale: in a not so far away land, 1. An actuary was asked by his employer, a mid-sized European bank, to improve their current Loss Given Default (LGD) model, 2. Our actuary then began to explore the LGD models, and eventually opted for a Machine Learning model, 3. Once all the technical stuff was complete, an important question arose: how to convince management (among others ) to effectively put the chosen model into production? 4. Happy end? 30

Thank you for your attention! Some references: GOODMAN, B., and FLAXMAN, S., European Union regulations on algorithmic decision-making and a right to explanation, in ArXiv e-prints 1606.08813 (2016). EFRON, B., HASTIE, T. and WAGER, S., Confidence Intervals for Random Forests: The Jackknife and the Infinitesimal Jackknife, in Journal of Machine Learning Research 15 (2014), p. 1625-1651. RIBEIRO, M. T., SINGH, S., and GUESTRIN, C., Why Should I Trust You? Explaining the Predictions of Any Classifier, in KDD '16 Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2016), p. 1135-1144. DENG, H., Interpreting Tree Ensembles with inTrees, technical report (2014), online: http://arxiv.org/pdf/1408.5456v1.pdf. 31