Practical considerations about the implementation of some Machine Learning LGD models in companies
September 15th, 2017, Louvain-la-Neuve
Sébastien de Valeriola
Please read the important disclaimer at the end of this presentation
Storytelling
In this talk, I would like to tell you a tale: in a not so far away land,
1. An actuary was asked by his employer, a mid-sized European bank, to improve their current Loss Given Default (LGD) model.
Credit Risk Assessment under Basel II/III
Under Basel II/III*, banks may calculate credit risk using the internal ratings-based (IRB) approach. For corporate credit, the directives distinguish between two possible alternatives:
o Foundation IRB: rating scales are estimated only based on probabilities of default, and each corporate loan is adequately allocated to a specific rating class,
o Advanced IRB: the rating scale is established considering not only the PDs but also all other credit parameters, including Loss Given Default (LGD), maturity adjustments, EAD with CFs, etc.
Expected Loss (EL) = Probability of Default (PD) × Loss Given Default (LGD) × Exposure at Default (EAD)
PD: the likelihood of a borrower being unable to repay. LGD: the fraction of the exposure at default that is lost in the case of default. EAD: the exposure at risk in the case of default.
(*) i.e. CRR & CRD IV
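The EL formula above is straightforward to compute; here is a minimal sketch in which all figures (a 2% PD, 45% LGD and EUR 1,000,000 EAD) are purely illustrative:

```python
def expected_loss(pd, lgd, ead):
    """EL = PD x LGD x EAD, as in the Basel IRB decomposition above."""
    return pd * lgd * ead

# Illustrative loan: 2% default probability, 45% loss given default,
# EUR 1,000,000 exposure at default.
el = expected_loss(0.02, 0.45, 1_000_000)
print(el)  # 9000.0, i.e. EUR 9,000 of expected loss
```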
Storytelling
In this talk, I would like to tell you a tale: in a not so far away land,
1. An actuary was asked by his employer, a mid-sized European bank, to improve their current Loss Given Default (LGD) model,
2. Our actuary then began to explore the LGD models, and eventually opted for a Machine Learning model.
LGD models
From the modelling point of view, LGD is the «poor little brother» of PD: the literature about LGD models is rather scarce when compared to that about PD models. In particular, no «standard benchmark» exists for LGD models. One property of LGD may at least partially explain this scarcity: the observed distribution is generally bimodal.
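To illustrate this bimodality, a sample with the typical LGD shape can be simulated as a mixture of two Beta distributions, one with mass near 0 (mostly recovered) and one with mass near 1 (mostly lost). The mixture weight and Beta parameters below are assumptions chosen for illustration, not estimates from any real portfolio:

```python
import random

def simulate_bimodal_lgd(n, p_low=0.6, seed=42):
    """Draw n LGD observations from a two-component Beta mixture:
    with probability p_low a loss near 0, otherwise a loss near 1.
    All parameters are illustrative assumptions."""
    rng = random.Random(seed)
    sample = []
    for _ in range(n):
        if rng.random() < p_low:
            sample.append(rng.betavariate(2, 8))   # mode near 0
        else:
            sample.append(rng.betavariate(8, 2))   # mode near 1
    return sample

lgd = simulate_bimodal_lgd(10_000)
print(min(lgd), max(lgd))  # all observations lie in [0, 1]
```

A histogram of such a sample shows the two humps that make a single-mode distributional model (e.g. one Beta regression) a poor fit.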
Choosing a Machine Learning model for its predictive ability
The deviance/error scores of the tested models clearly suggest choosing a Machine Learning model.
Storytelling
In this talk, I would like to tell you a tale: in a not so far away land,
1. An actuary was asked by his employer, a mid-sized European bank, to improve their current Loss Given Default (LGD) model,
2. Our actuary then began to explore the LGD models, and eventually opted for a Machine Learning model,
3. Once all the technical stuff was complete, an important question arose: how to convince management (among others) to effectively put the chosen model into production?
Understanding the machine learning methods
The «quants» of the company are generally able to understand the technical details of the chosen model, and therefore trust its outputs, as they are convinced by cross-validation, error measures and assessment plots. However, in order to be accepted and effectively put into production, the model and its outputs should also be understood and trusted by a large set of other protagonists, who are not necessarily «quantitative people»:
o decision-makers «above» the technical team (e.g. management/committee/board), who take decisions based on the output of the model,
o other departments of the company (e.g. the Pricing department),
o the validators and regulators.
Moreover, the European Parliament has adopted the General Data Protection Regulation, which will come into force in 2018. Among the new rules introduced by this legal text is the creation of a «right to explanation»: [... a data subject has the right to] meaningful information about the logic involved. (Article 13) This sentence is not crystal clear (for details, see GOODMAN and FLAXMAN (2016)), but…
The problem
When we say «understanding» a model, we mean «understanding how predictions are made» (for example, it is not necessary for the management to fully understand the calibration process). In the case of regression trees, understanding how the model predicts LGD values for new data points is not a problem, as it is very intuitive. However, some interpretation may still be required, as this model (and its output) is very different from the distribution-based models which are generally in production. In the case of more complex methods such as Bagging and Random Forests, even understanding how the model predicts LGD values for new data points is rather difficult. Things may be even worse for Gradient Boosting Machines, Support Vector Machines and Neural Networks. In the remaining part of this talk, I will give some pointers/ideas to overcome this difficulty.
Ideas to understand/interpret Machine Learning techniques
1. Implement ML techniques on top of traditional, in-production models:
a. Use a ML technique to perform variable selection,
b. Use a ML technique to segment data before applying a traditional model.
2. Put some parameters in the hands of the user:
a. Handle the complexity parameter of the pruning process,
b. Show and handle the trees behind an ensemble method such as Random Forest.
3. Mimic properties of traditional models:
a. Force monotonicity in a ML technique,
b. Build confidence intervals in order to stick to traditional models' output.
4. Local interpretation of the ML methods:
a. Explain local predictions as linear combinations,
b. Locally Interpretable Model-agnostic Explanations.
5. Global interpretation of the ML methods:
a. Simplified Tree Ensemble Learner.
Variable selection
First solution: stick to the existing (e.g. in-production) model and use machine learning techniques as guiding tools. For example, one could use a random forest to explore the data and decide which features to select in the model (using the importance score produced by the Random Forest). In particular, this can be very useful to detect the needed interactions between variables, as it is generally difficult to select cross terms in traditional models. Example of such a process:
1. Fit a GLM to the data (without any interaction) and extract the residuals,
2. Train a regression tree on these residuals in order to see which pair of variables comes first in the splits,
3. Assess the significance of this interaction,
4. If it is significant, refit the GLM, this time specifying the interaction that was detected.
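The detection steps above can be sketched in miniature: synthetic data in which the response depends on an x1·x2 interaction that a main-effects-only linear fit misses, followed by a depth-2 stump on the residuals in place of a full regression tree. The balanced binary design (so that OLS main effects reduce to group-mean differences) and all numbers are illustrative assumptions:

```python
import random

# Synthetic data: y = 0.3 + 0.5 * x1 * x2 + noise; the interaction is
# deliberately omitted from the linear fit below.
rng = random.Random(0)
data = []
for x1 in (0, 1):
    for x2 in (0, 1):
        for _ in range(250):  # balanced design: main effects are orthogonal
            data.append((x1, x2, 0.3 + 0.5 * x1 * x2 + rng.gauss(0, 0.05)))

def mean(v):
    return sum(v) / len(v)

# Step 1: main-effects fit. In a balanced binary design the OLS main
# effects reduce to differences of group means.
b1 = mean([y for a, _, y in data if a == 1]) - mean([y for a, _, y in data if a == 0])
b2 = mean([y for _, b, y in data if b == 1]) - mean([y for _, b, y in data if b == 0])
b0 = mean([y for _, _, y in data]) - 0.5 * b1 - 0.5 * b2
resid = [(a, b, y - (b0 + b1 * a + b2 * b)) for a, b, y in data]

# Step 2: depth-2 stump on the residuals.
def sse(v):
    m = mean(v)
    return sum((t - m) ** 2 for t in v)

def best_split(rows):
    """Feature index (0 or 1) whose binary split most reduces the SSE."""
    best, best_gain = None, None
    base = sse([r for _, _, r in rows])
    for j in (0, 1):
        left = [row[2] for row in rows if row[j] == 0]
        right = [row[2] for row in rows if row[j] == 1]
        if not left or not right:
            continue  # degenerate split: feature is constant in this node
        gain = base - sse(left) - sse(right)
        if best_gain is None or gain > best_gain:
            best, best_gain = j, gain
    return best

first = best_split(resid)
second = best_split([row for row in resid if row[first] == 0])
print((first, second))  # the pair flags a candidate x1:x2 interaction term
```

Within a child node of the first split, the other variable separates the residuals strongly, which is exactly the signal step 3 would then test for significance.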
Segmentation
Another possibility is to use the output of some ML technique to segment the data before applying traditional models on each segment.
Putting some parameters in the user's hands: the idea
A nice way to give users insights about how the ML techniques work, and thus reasons to trust them, is to let them «play» with some parameters. For example,
1. We can implement the pruning process in a Shiny tool and let the user set the complexity parameter themselves,
2. We can let the user explore the forest of regression trees which is the output of a Random Forest, select subsets of them, compute predictions, etc.
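The second idea can be sketched without any UI: if the forest is represented as a list of per-tree predictors, recomputing the prediction for a user-chosen subset of trees is a one-liner. The «trees» below are made-up linear stubs, not a real fitted forest:

```python
# A toy "forest": each entry stands in for one fitted tree's prediction
# function (all coefficients are invented for illustration).
trees = [
    lambda x: 0.40 + 0.02 * x,
    lambda x: 0.35 + 0.03 * x,
    lambda x: 0.50 - 0.01 * x,
    lambda x: 0.45 + 0.00 * x,
]

def forest_predict(x, selected=None):
    """Average the predictions of the selected trees (all by default),
    which is what an interactive tool recomputes as the user toggles trees."""
    idx = range(len(trees)) if selected is None else selected
    preds = [trees[i](x) for i in idx]
    return sum(preds) / len(preds)

print(forest_predict(1.0))          # full forest
print(forest_predict(1.0, [0, 1]))  # user keeps only the first two trees
```

Letting users see how the prediction moves as they add or remove individual trees makes the averaging mechanism of the ensemble tangible.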
Forcing monotonicity
One of the hard-to-understand features of the ML techniques is their non-monotonicity. Let us for example consider the business size of the debtor:
o Globally, this variable has a positive effect on the LGD, i.e. the corresponding parameter in a traditional model (e.g. the coefficient β in a LM) would be significantly positive,
o However, it is possible that the effect of this variable is negative on a subset of the dataset, and that a regression tree would detect this.
In order to «mimic» the monotonicity of the traditional models, it is possible to force monotonicity in ML techniques. In the case of regression trees (and other tree methods), we can constrain the calibration process: every split based on the chosen variable will be such that the prediction in, say, the child node situated on the left side of the split is larger, and the prediction in the child node situated on the right side of the split is smaller. Some R packages, such as XGBoost (available on GitHub), allow imposing such constraints.
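A toy version of such a constrained split search (a sketch of the idea, not XGBoost's actual implementation) can be written by simply discarding candidate splits whose child means violate the requested direction:

```python
def monotone_stump(xs, ys, increasing=True):
    """Best single split on x under a monotonicity constraint: candidate
    splits whose child means break the requested direction are skipped.
    Illustrative sketch only."""
    def mean(v):
        return sum(v) / len(v)
    def sse(v):
        m = mean(v)
        return sum((t - m) ** 2 for t in v)
    best = None
    for cut in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < cut]
        right = [y for x, y in zip(xs, ys) if x >= cut]
        if increasing and mean(left) > mean(right):
            continue  # this split would break monotonicity: discard it
        if not increasing and mean(left) < mean(right):
            continue
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            best = (cost, cut, mean(left), mean(right))
    return best  # (cost, cut point, left-child mean, right-child mean)

xs = [1, 2, 3, 4, 5, 6]
ys = [0.2, 0.1, 0.3, 0.5, 0.4, 0.6]  # noisy but globally increasing
print(monotone_stump(xs, ys))
```

Applied recursively at every node, this yields a tree whose predictions never decrease in the constrained variable.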
Computing confidence intervals for random forests
As the outputs of most traditional models involve confidence intervals, practitioners are sometimes disappointed that ML methods do not produce such output. While it is true that one very rarely sees them, it is possible to build such confidence intervals. For example, EFRON, HASTIE and WAGER (2014) use the jackknife and the infinitesimal jackknife to do so:
o jackknife: systematically recompute the statistic estimate, leaving out one observation at a time from the sample set,
o infinitesimal jackknife: instead of leaving one observation out, introduce weights and assign a slightly smaller weight to one observation (i.e. weights are equal for all observations except this particular one).
The R package randomForestCI (available on GitHub) implements this method.
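The plain leave-one-out jackknife can be sketched in a few lines; here it is applied to a simple mean as a stand-in for a forest prediction (the sample values are made up for the example):

```python
import math

def jackknife_se(sample, stat):
    """Leave-one-out jackknife standard error of a statistic: recompute
    the statistic n times, each time dropping one observation, then apply
    the jackknife variance formula."""
    n = len(sample)
    loo = [stat(sample[:i] + sample[i + 1:]) for i in range(n)]
    loo_mean = sum(loo) / n
    var = (n - 1) / n * sum((t - loo_mean) ** 2 for t in loo)
    return math.sqrt(var)

sample = [0.1, 0.2, 0.15, 0.3, 0.25]
mean = lambda v: sum(v) / len(v)
se = jackknife_se(sample, mean)
print(mean(sample), se)  # point estimate and its jackknife standard error
```

For the mean, this reproduces the usual standard error; the random-forest versions of EFRON, HASTIE and WAGER apply the same leave-one-out (or reweighting) logic to the forest's prediction at a given point.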
Explaining predictions as a linear combination
Local interpretations focus on a limited set of data points, for which they give explanations which are only locally correct. In the case of a regression tree, it is easy to give an interpretation as a «traditional model» (whether it is additive or multiplicative) by following the branches from the root to the chosen leaf. The resulting explanation is of course only valid for one leaf of the tree. This rather naive process can be extended to other tree-based methods. Linear explanation of the prediction for the sixth leaf: LGD = 0.43 (general) − 0.22 (loan or credit card) + 0.35 (closed loan) − 0.10 (large recoverable amount) + 0.12 (non-revolving loan) = 0.58
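The leaf explanation above is just a baseline plus per-split contributions, which is easy to verify (the labels mirror the example):

```python
# Root-to-leaf explanation of one tree prediction: the root mean plus the
# signed contribution of each split along the path to the leaf.
path = [
    ("baseline (root prediction)", 0.43),
    ("loan or credit card",       -0.22),
    ("closed loan",               +0.35),
    ("large recoverable amount",  -0.10),
    ("non-revolving loan",        +0.12),
]
lgd = sum(contribution for _, contribution in path)
print(round(lgd, 2))  # 0.58
```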
Locally Interpretable Model-agnostic Explanations
Locally Interpretable Model-agnostic Explanations (LIME) were introduced by RIBEIRO, SINGH and GUESTRIN (2016). This method is model-agnostic because it considers the explained model as a black box, so that it can explain all types of models. It strikes a trade-off between interpretability and fidelity. The process is the following:
1. Choose a limited set of features (the most important ones),
2. Sample points around the point we want to explain,
3. Use the black-box model to obtain predictions for these neighbouring points,
4. Fit an interpretable model on these predictions, using weights based on the distance to the point we want to explain. This interpretable model could be e.g. a linear model or a regression tree.
The R package lime (available on GitHub) implements this method.
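The sampling-weighting-fitting loop can be sketched in one dimension (the proximity kernel, sample size and width are illustrative assumptions; a real implementation such as the lime package does considerably more):

```python
import math
import random

def lime_1d(black_box, x0, width=0.5, n=200, seed=1):
    """Toy 1-D LIME: sample around x0, weight samples by proximity to x0,
    and fit a weighted least-squares line -- the local linear
    'explanation' of black_box around x0."""
    rng = random.Random(seed)
    xs = [x0 + rng.gauss(0, width) for _ in range(n)]
    ys = [black_box(x) for x in xs]                        # query the black box
    ws = [math.exp(-((x - x0) / width) ** 2) for x in xs]  # proximity kernel
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    cov = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys)) / sw
    var = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs)) / sw
    slope = cov / var
    return slope, my - slope * mx  # local slope and intercept

# Around x0 = 1, x**2 behaves locally like 2x - 1, so the recovered
# local slope should be close to 2.
slope, intercept = lime_1d(lambda x: x * x, x0=1.0)
print(slope, intercept)
```

The same recipe generalises to many features and to an interpretable tree instead of a line; the key point is that the explanation is only faithful near x0.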
Locally Interpretable Model-agnostic Explanations
Explaining a default corresponding to a LGD of 0.4674411, using the four most important features of the Random Forest model:
Simplified Tree Ensemble Learner
The Simplified Tree Ensemble Learner (STEL) can be applied to any ensemble method and gives a global interpretation of the model. The process of the inTrees method, presented in DENG (2014), is the following:
1. Transform the output of the ensemble method into a large set of rules (e.g. «Y1 > 0 & Y2 = yes ⇒ predict 0.29») whose length can be rather large,
2. Compute rule metrics, such as importance, predictive ability, etc.,
3. Prune the rules (i.e. reduce their length),
4. Select a set of efficient and non-redundant rules (building a new database and fitting a regularized random forest on it).
The output of this process is a STEL, which consists of a list of rules ordered by priority. To obtain a prediction, the rules are applied, from the top to the bottom, to a new data point, until a rule condition is satisfied by the point. The prediction value corresponding to that rule then becomes the prediction for the new point. The prediction quality is of course worse than that of the original ensemble method, but can be rather close to it. The R package inTrees (Interpretable Trees) is an implementation of this process.
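The prediction step of such an ordered rule list is simple to sketch (conditions, thresholds and values below are invented for illustration):

```python
# A tiny ordered rule list in the STEL spirit: rules are tried top to
# bottom and the first satisfied condition yields the prediction.
# All conditions and values are made up for the example.
rules = [
    (lambda d: d["revolving"] == "yes" and d["recov"] > 2000, 0.88),
    (lambda d: d["offbalance"] <= 175 and d["status"] == "closed", 0.04),
    (lambda d: True, 0.30),  # fallback rule: always matches
]

def stel_predict(debtor):
    for condition, prediction in rules:
        if condition(debtor):
            return prediction  # first satisfied rule wins

print(stel_predict({"revolving": "yes", "recov": 5000,
                    "offbalance": 0, "status": "closed"}))
```

Because each prediction can be traced to exactly one human-readable rule, the whole model can be audited line by line, which is the point of the exercise.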
Simplified Tree Ensemble Learner
From a random forest of 100 trees, R extracts 5,292 rules. It then prunes them, as one can see in the length distribution:

Length | Before pruning | After pruning
1      |              0 |          1720
2      |              2 |          2436
3      |             23 |           879
4      |            144 |           210
5      |            703 |            46
6      |           4420 |             1

The resulting top-10 rule list:

n  | Length | Frequence | Error | Condition                                                | Prediction | Importance
1  | 2      | 0.45      | 0.03  | defoffbalsheet <= 176.005 & loanstatus %in% c('closed')  | 0.04       | 1.00
2  | 2      | 0.11      | 0.03  | loanrevolving %in% c('yes') & loanrecovamount > 2115.555 | 0.07       | 0.05
3  | 2      | 0.12      | 0.06  | defexposureat > 76818.65 & loanrecovamount <= 86292.95   | 0.88       | 0.05
4  | 2      | 0.03      | 0.05  | defoffbalsheet > 295.355 & loanrecovamount > 30987.605   | 0.10       | 0.02
5  | 2      | 0.04      | 0.01  | loannominalamount <= 9700 & loanrecovamount > 4293.385   | 0.01       | 0.02
6  | 2      | 0.06      | 0.01  | loannominalamount <= 7250 & loanrecovamount > 2115.555   | 0.02       | 0.02
7  | 2      | 0.29      | 0.05  | defexposureat > 6443.915 & loanrecovamount <= 8654.235   | 0.92       | 0.02
8  | 2      | 0.27      | 0.05  | defexposureat <= 62184.495 & loanrecovamount > 2360.735  | 0.09       | 0.02
9  | 1      | 0.24      | 0.04  | loanrecovamount > 74918.68                               | 0.07       | 0.01
10 | 2      | 0.30      | 0.03  | loanmaturitydate <= 41477.5 & loanrecovamount > 7372.245 | 0.05       | 0.01
End of the story?
In this talk, I would like to tell you a tale: in a not so far away land,
1. An actuary was asked by his employer, a mid-sized European bank, to improve their current Loss Given Default (LGD) model,
2. Our actuary then began to explore the LGD models, and eventually opted for a Machine Learning model,
3. Once all the technical stuff was complete, an important question arose: how to convince management (among others) to effectively put the chosen model into production?
4. Happy end?
Thank you for your attention!
Some references:
GOODMAN, B., and FLAXMAN, S., European Union regulations on algorithmic decision-making and a «right to explanation», ArXiv e-prints 1606.08813 (2016).
EFRON, B., HASTIE, T., and WAGER, S., Confidence Intervals for Random Forests: The Jackknife and the Infinitesimal Jackknife, Journal of Machine Learning Research 15 (2014), p. 1625-1651.
RIBEIRO, M. T., SINGH, S., and GUESTRIN, C., «Why Should I Trust You?»: Explaining the Predictions of Any Classifier, KDD '16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2016), p. 1135-1144.
DENG, H., Interpreting Tree Ensembles with inTrees, technical report (2014), online: http://arxiv.org/pdf/1408.5456v1.pdf.