General Insurance Claims Modelling with Factor Collapsing and Bayesian Model Averaging

Sen Hu, Dr Adrian O'Hagan, Prof Brendan Murphy
June 13, 2017

Motivation

- Model uncertainty in variable selection: how confident should we be about the final model?
- High multi-level factors, i.e. factors with too many levels for a GLM structure:
    - model parsimony and interpretability issues
    - too few observations in some levels
    - insignificant levels should be merged (otherwise there are too many parameters)
- Two questions to answer:
    1. Which categorical predictors should be included in the model?
    2. Which categories within a categorical predictor should be distinguished?

Insight Centre for Data Analytics, June 13, 2017

Motivation (continued)

- Factor collapsing (FC) looks for the optimal grouping of a factor's categories:
    - which categories differ from one another with respect to the dependent variable
    - there is uncertainty about which grouping is optimal
- Bayesian model averaging (BMA) takes this model uncertainty into account:
    - variable selection uncertainty
    - factor level selection uncertainty

Example: a question from the "faraway" package [1]

- Standard GLM output in R for the "Make" predictor in the frequency model
- Standard GLM output in R for the "Kilometres" predictor in the severity model

Factor collapsing

- Set partition: a grouping of the elements of a set into non-empty subsets such that every element belongs to exactly one subset ("partitions" R package [2]).
- Partitions of the 3-element set {1, 2, 3}:
    {{1}, {2}, {3}}
    {{1, 2}, {3}}
    {{1, 3}, {2}}
    {{1}, {2, 3}}
    {{1, 2, 3}}   (variable removed)
- Fit the pre-specified model under each partition (or combination of partitions).
- The number of partitions (the Bell number) grows very rapidly, faster than exponentially, with the number of levels.
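To make the size of the search space concrete, here is a minimal Python sketch that enumerates set partitions and counts them with the Bell triangle. The talk itself refers to the "partitions" R package; the function names `set_partitions` and `bell_number` below are illustrative and are not taken from that package.

```python
from typing import Iterator, List


def set_partitions(items: List[int]) -> Iterator[List[List[int]]]:
    """Yield every partition of `items` into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partial in set_partitions(rest):
        # Put `first` into each existing block in turn ...
        for i in range(len(partial)):
            yield partial[:i] + [[first] + partial[i]] + partial[i + 1:]
        # ... or give it a block of its own.
        yield [[first]] + partial


def bell_number(n: int) -> int:
    """Count the partitions of an n-element set using the Bell triangle."""
    row = [1]
    for _ in range(n):
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
    return row[0]


# The five partitions of {1, 2, 3} listed on the slide.
for partition in set_partitions([1, 2, 3]):
    print(partition)

# Number of candidate groupings for factors with 3, 5 and 9 levels.
print([bell_number(n) for n in (3, 5, 9)])
```

Full enumeration is feasible for a handful of levels, but the count quickly becomes too large to fit one model per partition, which is what motivates a stochastic search.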

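BMA

- Use BMA to average over the best models (where possible):

    \[ \Pr(\Delta \mid D) = \sum_{k=1}^{K} \Pr(\Delta \mid M_k, D) \, \Pr(M_k \mid D) \tag{1} \]

    \[ \Pr(M_k \mid D) \approx \frac{\exp(-0.5 \, \mathrm{BIC}_k)}{\sum_{r=1}^{K} \exp(-0.5 \, \mathrm{BIC}_r)} \tag{2} \]

  where \Delta is the quantity of interest, D is the data and M_1, ..., M_K are the candidate models.
- Average over model predictions
- Average over model coefficients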
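The BIC approximation to the posterior model probabilities can be sketched in a few lines of Python. The BIC values below are made up for illustration and the function name `bic_weights` is an assumption of this sketch, not something from the talk.

```python
import math
from typing import List


def bic_weights(bics: List[float]) -> List[float]:
    """Approximate posterior model probabilities from BIC values."""
    best = min(bics)
    # Subtracting the smallest BIC avoids numerical underflow; the shift
    # cancels between numerator and denominator.
    raw = [math.exp(-0.5 * (b - best)) for b in bics]
    total = sum(raw)
    return [r / total for r in raw]


# Illustrative (made-up) BIC values for three candidate models.
weights = bic_weights([100.0, 102.0, 110.0])
print(weights)
```

A BIC difference of 2 between two models corresponds to a weight ratio of exp(-1), so models more than a few BIC units behind the best one contribute very little to the average.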

Stochastic search

- The number of set partitions grows faster than exponentially with the number of levels.
- Fitting every candidate model is computationally intensive.
- Finding the best partitions therefore becomes an optimisation problem.

Simulated annealing

- A global optimisation technique based on the Monte Carlo method, similar to the MC^3 technique described in Hoeting et al. (1999) [3]:
    1. Start from a random state.
    2. Make random state changes, accepting worse moves with a probability determined by the temperature.
    3. Reduce the temperature after reaching (close to) equilibrium.
    4. Stop once the temperature becomes very small.
- Other stochastic optimisation methods, such as genetic algorithms, also work for this non-linear, non-differentiable objective function.
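The four steps above can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' implementation: the objective is the BIC (up to a constant) of a Gaussian one-way model on simulated data rather than a GLM, and the function names, cooling schedule and tuning values (`t_start`, `t_end`, `cooling`, `sweeps`) are all hypothetical choices.

```python
import math
import random
from typing import Dict, List, Tuple


def bic(labels: List[int], data: Dict[int, List[float]]) -> float:
    """Gaussian BIC (up to a constant); levels sharing a label share a mean."""
    groups: Dict[int, List[float]] = {}
    for level, label in enumerate(labels):
        groups.setdefault(label, []).extend(data[level])
    n = sum(len(values) for values in groups.values())
    rss = 0.0
    for values in groups.values():
        mean = sum(values) / len(values)
        rss += sum((x - mean) ** 2 for x in values)
    return n * math.log(rss / n) + len(groups) * math.log(n)


def anneal(data: Dict[int, List[float]], t_start: float = 5.0,
           t_end: float = 1e-3, cooling: float = 0.95,
           sweeps: int = 40, seed: int = 0) -> Tuple[List[int], float]:
    rng = random.Random(seed)
    n_levels = len(data)
    # Step 1: start from a random assignment of levels to group labels.
    state = [rng.randrange(n_levels) for _ in range(n_levels)]
    cost = bic(state, data)
    best, best_cost = state[:], cost
    temp = t_start
    while temp > t_end:  # Step 4: stop once the temperature is very small.
        for _ in range(sweeps):  # Crude stand-in for "reaching equilibrium".
            # Step 2: move one random level to a random group.
            proposal = state[:]
            proposal[rng.randrange(n_levels)] = rng.randrange(n_levels)
            new_cost = bic(proposal, data)
            delta = new_cost - cost
            # Always accept improvements; accept worse moves with
            # probability exp(-delta / temp).
            if delta <= 0 or rng.random() < math.exp(-delta / temp):
                state, cost = proposal, new_cost
                if cost < best_cost:
                    best, best_cost = state[:], cost
        temp *= cooling  # Step 3: reduce the temperature.
    return best, best_cost


# Simulated data: six levels whose true means form three groups.
data_rng = random.Random(1)
true_means = [0.0, 0.0, 0.0, 5.0, 5.0, 10.0]
data = {level: [data_rng.gauss(mean, 1.0) for _ in range(30)]
        for level, mean in enumerate(true_means)}

best, best_cost = anneal(data)
print(best, best_cost)
```

Levels that receive the same label in `best` are collapsed into one category. Tracking the best state seen so far, rather than returning the final state, is a common safeguard because the chain can move uphill at any positive temperature.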

FC-BMA illustration

Comparing FC-BMA with stepwise selection using BIC/AIC (figures omitted):

- forward selection, starting from the null model
- backward selection, starting from the saturated model

Following up the example...

Table: Results for collapsing the "Make" factor only, in the frequency model. Only the best 5 models (based on their BIC values) are shown.

Make: 1, 2, 3, 4, 5, 6, 7, 8, 9

Combination                   BIC         BMA weight
(1,8)(2)(3)(4)(5)(6)(7,9)     10301.11    0.34579
(1,8)(2,5)(3)(4)(6)(7,9)      10301.81    0.24257
(1,7,8)(2)(3)(4)(5)(6)(9)     10303.44    0.10764
(1,7,8)(2,5)(3)(4)(6)(9)      10304.15    0.07541
(1)(2)(3)(4)(5)(6)(7,8,9)     10304.92    0.05136

Following up the example...

Table: Results for collapsing the "Kilometres" factor only, in the severity model. Only the best 5 models (based on their BIC values) are shown.

Kilometres: 1, 2, 3, 4, 5

Combination         BIC        BMA weight
(1)(2,3)(4,5)       1874293    0.90779
(1)(2)(3)(4,5)      1874299    0.05977
(1)(2,3)(4)(5)      1874300    0.03043
(1)(2)(3)(4)(5)     1874305    0.00200
(1)(2,5)(3)(4)      1874338    0.00000

Irish counties

Irish county-level clustering with data from an Irish GI insurer:

Figure: Frequency
Figure: Severity

County                    Model coef.   New coef.
Waterford City            -6.6556       -6.6415
Unknown                   -6.6130       -6.6415
Waterford County          -6.6073       -6.6415
Donegal County            -6.5959       -6.6415
Offaly County             -6.5787       -6.5733
Monaghan County           -6.5670       -6.5733
Kildare County            -6.5638       -6.5733
Wicklow County            -6.5397       -6.5733
Wexford County            -6.5217       -6.5733
South Tipperary           -6.5063       -6.5001
Cavan County              -6.4809       -6.5001
Clare County              -6.4764       -6.5001
Cork County               -6.4738       -6.5001
Louth County              -6.4720       -6.5001
South Dublin              -6.4708       -6.5001
Dun Laoghaire-Rathdown    -6.4489       -6.4648
Limerick County           -6.4473       -6.4648
Cork City                 -6.4385       -6.4648
Fingal                    -6.4379       -6.4648
North Tipperary           -6.4323       -6.4648
Limerick City             -6.4306       -6.4648
Kilkenny County           -6.4299       -6.4648
Laois County              -6.3923       -6.3766
Carlow County             -6.3865       -6.3766
Longford County           -6.3813       -6.3766
Westmeath County          -6.3808       -6.3766
Dublin City               -6.3694       -6.3766
Galway City               -6.3421       -6.3766
Galway County             -6.3415       -6.3766
Kerry County              -6.3323       -6.3766
Meath County              -6.3282       -6.3766
Roscommon County          -6.3031       -6.3766
Sligo County              -6.2503       -6.2106
Leitrim County            -6.2282       -6.2106
Mayo County               -6.1615       -6.2106

Irish counties

Figure: Frequency, before clustering
Figure: Frequency, after clustering

Table: (Subset of) frequency model coefficients for the baseline standard GLM and results of FC-BMA. Categorical levels are listed in increasing order of their standard GLM coefficient. Only 5 models are shown here for illustration.

                                      Std. GLM    BMA         Model 1      Model 2      Model 3      Model 4      Model 5
BIC                                                           62807.2927   62807.3039   62807.3972   62807.4069   62807.4294
Model weights of all selected models                          0.0233       0.0232       0.0221       0.0220       0.0218
Model weights of the 5 models                                 0.2074       0.2062       0.1968       0.1959       0.1937
Waterford City                        -6.6556     -6.6359     -6.6414      -6.6399      -6.6326      -6.6341      -6.6311
Unknown                               -6.6130     -6.6359     -6.6414      -6.6399      -6.6326      -6.6341      -6.6311
Waterford County                      -6.6073     -6.6359     -6.6414      -6.6399      -6.6326      -6.6341      -6.6311
Donegal County                        -6.5959     -6.6359     -6.6414      -6.6399      -6.6326      -6.6341      -6.6311
Offaly County                         -6.5787     -6.6218     -6.5733      -6.6399      -6.6326      -6.6341      -6.6311
Monaghan County                       -6.5670     -6.6080     -6.5733      -6.5732      -6.6326      -6.6341      -6.6311
Kildare County                        -6.5638     -6.5695     -6.5733      -6.5732      -6.5689      -6.5674      -6.5645
Wicklow County                        -6.5397     -6.5695     -6.5733      -6.5732      -6.5689      -6.5674      -6.5645
Wexford County                        -6.5217     -6.5695     -6.5733      -6.5732      -6.5689      -6.5674      -6.5645
South Tipperary                       -6.5062     -6.5263     -6.5000      -6.5023      -6.5006      -6.5674      -6.5645
Cavan County                          -6.4809     -6.5004     -6.5000      -6.5023      -6.5006      -6.5011      -6.4980
Clare County                          -6.4764     -6.5004     -6.5000      -6.5023      -6.5006      -6.5011      -6.4980
Cork County                           -6.4738     -6.5004     -6.5000      -6.5023      -6.5006      -6.5011      -6.4980
Louth County                          -6.4720     -6.5004     -6.5000      -6.5023      -6.5006      -6.5011      -6.4980
South Dublin                          -6.4708     -6.5004     -6.5000      -6.5023      -6.5006      -6.5011      -6.4980
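As a check on how the BMA column relates to the individual models, the sketch below recomputes two BMA coefficients as weighted averages of the five model coefficients. It assumes the seven numbers in each county row read, in order, Std. GLM, BMA, Model 1 to Model 5, and that the "weights of the 5 models" row is the one used for averaging; the function name `bma_coefficient` is illustrative.

```python
from typing import List

# "Model weights of the 5 models" row from the table.
weights = [0.2074, 0.2062, 0.1968, 0.1959, 0.1937]

# Model 1..5 coefficients for two counties, copied from the table.
waterford_city = [-6.6414, -6.6399, -6.6326, -6.6341, -6.6311]
offaly_county = [-6.5733, -6.6399, -6.6326, -6.6341, -6.6311]


def bma_coefficient(coefs: List[float], model_weights: List[float]) -> float:
    """Weighted average of per-model coefficients (weights renormalised)."""
    total = sum(model_weights)
    return sum(c * w for c, w in zip(coefs, model_weights)) / total


print(round(bma_coefficient(waterford_city, weights), 4))
print(round(bma_coefficient(offaly_county, weights), 4))
```

Under that reading of the columns, both results agree with the tabulated BMA values (-6.6359 and -6.6218) to four decimal places. Offaly County sits in a different group in Model 1 than in Models 2 to 5, which is why its averaged coefficient falls between the two group values.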

Table: Prediction comparison on the Swedish TPML dataset with an 80%/20% split, using MSE, Gini index, concordance correlation coefficient (CCC), Wasserstein distance (Wass.), Kolmogorov-Smirnov test (KS-test) and KL divergence (KL).

Model       Method       MSE         Gini     CCC      Wass.       KS-test           KL
Frequency   no FC-BMA    266.9408    0.8266   0.9968   3.0340      0.0736 (0.3045)   0.0122
Frequency   FC-only      224.7803    0.8267   0.9943   2.9696      0.0788 (0.2358)   0.0114
Frequency   FC-BMA(5)    456.3766    0.8267   0.9973   4.2012      0.0778 (0.2535)   0.0113
Severity    no FC-BMA    14748455    0.0567   0.0409   1948.3340   0.4489 (0)        0.2191
Severity    FC-only      14664567    0.0576   0.0667   1825.0540   0.4067 (0)        0.2178
Severity    FC-BMA(5)    14666355    0.0576   0.0657   1822.9450   0.4033 (0)        0.2178
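For readers unfamiliar with two of the measures in this table, here is a minimal sketch of MSE and of the concordance correlation coefficient in its usual (Lin's) form, using population variances. The talk does not say how the measures were computed, so these definitions and the toy numbers are assumptions of this sketch.

```python
from typing import Sequence


def mse(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Mean squared error between observations and predictions."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)


def ccc(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Concordance correlation coefficient (Lin's form)."""
    n = len(y)
    mean_y = sum(y) / n
    mean_p = sum(y_hat) / n
    var_y = sum((a - mean_y) ** 2 for a in y) / n
    var_p = sum((b - mean_p) ** 2 for b in y_hat) / n
    cov = sum((a - mean_y) * (b - mean_p) for a, b in zip(y, y_hat)) / n
    # Unlike Pearson correlation, the (mean_y - mean_p)^2 term penalises
    # predictions that are systematically shifted.
    return 2.0 * cov / (var_y + var_p + (mean_y - mean_p) ** 2)


observed = [1.0, 2.0, 3.0, 4.0]
shifted = [v + 1.0 for v in observed]  # perfectly correlated but biased
print(mse(observed, shifted), ccc(observed, shifted))
```

A CCC of 1 requires predictions to lie on the 45-degree line through the observations, so it rewards agreement rather than mere correlation.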

Summary

- FC-BMA handles model selection, model uncertainty and categorical level selection simultaneously.
- It helps improve model parsimony, interpretability and prediction.
- Compared with other existing methods in the literature, it does not require choosing extra parameters.
- Finding the optimum through stochastic optimisation can be challenging and may take a long time.

References

[1] J. Faraway. faraway: Functions and datasets for books by Julian Faraway. R package version 1.0.7 (2016).
[2] R. K. S. Hankin. Additive integer partitions in R. In: Journal of Statistical Software, Code Snippets 16.1 (2006).
[3] Jennifer A. Hoeting et al. Bayesian Model Averaging: A Tutorial. In: Statistical Science 14.4 (1999), pp. 382-417. ISSN: 08834237.
[4] Torsten Hothorn, Frank Bretz, and Peter Westfall. Simultaneous Inference in General Parametric Models. In: Biometrical Journal 50.3 (2008), pp. 346-363.

Q & A