Cultivating disaster donors: A case application of scalable analytics on massive data

Ilya O. Ryzhov (Robert H. Smith School of Business, University of Maryland, College Park, MD 20742)
Bin Han (Applied Mathematics, Statistics, and Scientific Computation, University of Maryland, College Park, MD 20742)
Jelena Bradić (Department of Mathematics, University of California San Diego, La Jolla, CA 92093)

M&SOM Conference, INSEAD, July 29, 2013
Outline
1 Introduction
2 Statistical learning for massive data
  - Statistical methodology
  - Results and insights
3 Prescriptive analytics with optimal learning
4 Conclusions
STAART data: donors, disasters and designs
STAART data: layers and contents
(a) All communications. (b) Gifts only.
Segment-specific strategies
- Has this person donated within the past 6 months?
- Did the appeal use dynamic ask amounts?
- Are both of these statements true?
Some designs may have segment-specific effects.
Research objectives
- What are the determinants of campaign success rates?
- Are dynamic donation options an effective donor retention strategy?
- Should the stories emphasize relief or preparedness?
- Do gift items help convince donors to return?
- Does the most effective strategy differ by donor segment?
- How can we predict the effectiveness of the next campaign?
- How can we design the next campaign to be as effective as possible?
Overall goal: provide insights into effective donor retention strategies, and help guide the development of new campaigns.
Research challenges
- Unobservable information: donor behaviour is affected by factors that the Red Cross cannot observe (e.g. personally identifiable information). Such factors have been widely studied using economic panel data (Brown & Minty 2008, Brown et al. 2011, List 2011), but we do not get to see them when actually designing a campaign.
- Massive data: standard statistical methods work poorly when dealing with 8.6 million communications.
A widespread (yet inaccurate) view: "Large sample size is a good thing, and it never causes trouble for statistical analysis."
Predicting the outcome of a single communication
For communication j with donor i, we use the logistic regression model
$$\log\left(\frac{p_{ij}}{1 - p_{ij}}\right) = \beta_0 + \beta^T x_{ij},$$
where
- $p_{ij}$: probability that this communication will be successful
- $x_{ij}$: a p-vector of features of the communication
- $\beta$: effects of the features
We will modify this basic model to deal with the structure of the data.
Regression features
The features of the jth communication with donor i are obtained from the data.
Designs:
1 Does the communication use dynamic donation options?
2 Does the communication include a supporter card?
3 Does the communication have an option to donate online?
4 Which type of story is used for the communication?
Donors:
1 Is the donor classified as Lapsed? (Acquisition? Renewal?)
2 Does the donor belong to the high donation class?
3 What is the recency of the donor?
Cross terms:
1 Are we sending dynamic options to a Lapsed donor?
2 Are we sending a card to a donor with 0-6 mos. recency?
Other features (e.g. previous gifts received from donor i).
Donor-level effects in panel data
We are studying panel data, where communications are grouped by donor. We write
$$\log\left(\frac{p_{ij}}{1 - p_{ij}}\right) = \beta_0 + \beta^T x_{ij} + b_i,$$
where $b_i$ is an effect specific to donor account i. We use a random-effect model, where $b_i \sim \mathcal{N}(0, s^2)$.
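The data-generating process on this slide can be sketched in a few lines. The sketch below simulates from the random-effect logistic model; the dimensions and coefficient values are purely illustrative (none of these numbers come from the STAART data):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 1000 donors, 5 communications each, 3 binary features.
n_donors, n_comms, p = 1000, 5, 3
beta0, beta = -2.0, np.array([0.4, -0.3, 0.25])  # illustrative effects
sigma = 0.8                                      # std dev of donor effects

b = rng.normal(0.0, sigma, size=n_donors)        # b_i ~ N(0, s^2)
x = rng.integers(0, 2, size=(n_donors, n_comms, p))  # design features x_ij

# p_ij = logistic(beta0 + beta' x_ij + b_i); b_i shifts all of donor i's odds.
eta = beta0 + x @ beta + b[:, None]
p_success = 1.0 / (1.0 + np.exp(-eta))
y = rng.binomial(1, p_success)                   # observed success/failure
```

The shared `b_i` is what makes outcomes for the same donor correlated even after conditioning on the design features.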
Why use random effects?
- Donor behaviour is affected by factors that are unobservable to the Red Cross (income, demographics, etc.)
- The dataset represents a sample of a larger population, and the donor pool changes over time
- A fixed-effect model is computationally intractable (there are over 1 million donors)
Penalized maximum-likelihood estimation
We choose $\beta$ and $s^2$ by solving
$$(\beta^*, s^*) = \arg\max_{\beta, s} \log l(\beta, s),$$
where $l$ is the relevant likelihood function. To focus on the key drivers of donor retention, we instead solve
$$(\beta^*, s^*) = \arg\min_{\beta, s} \; -\log l(\beta, s) + \lambda \|\beta\|_1,$$
with an extra penalty for non-zero values of $\beta$.
Lasso method: trade-off between accuracy and conciseness of the model.
Why use Lasso?
- Typically $\lambda$ is chosen to optimize criteria such as AIC, BIC, or cross-validation
- Thus, the regularized model will actually have more predictive power than the original model (with $\lambda = 0$)
- Lasso also addresses the problem of empirical correlation between columns of data
- The output has an intuitive managerial interpretation (identifying the determinants of success)
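As an illustration of Lasso with a cross-validated penalty, the sketch below fits an L1-penalized logistic regression on synthetic data using scikit-learn's `LogisticRegressionCV` (which parameterizes the penalty as C = 1/λ and picks it by cross-validation). The data, dimensions, and sparsity pattern are all made up for the example:

```python
import numpy as np
from sklearn.linear_model import LogisticRegressionCV

rng = np.random.default_rng(1)

# Synthetic data: 2000 "communications", 20 features, only 3 truly active.
n, p = 2000, 20
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:3] = [1.0, -0.8, 0.6]
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ beta_true))))

# L1-penalized logistic regression; the penalty strength is chosen over a
# grid of 10 values by 5-fold cross-validation, as described on the slide.
model = LogisticRegressionCV(penalty="l1", solver="liblinear",
                             Cs=10, cv=5).fit(X, y)

# Non-zero coefficients = the "selected" determinants of success.
selected = np.flatnonzero(model.coef_[0])
```

The selected set should recover the three active features while discarding most of the 17 noise columns, which is the conciseness/accuracy trade-off the slide refers to.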
The challenge of massive data
The likelihood function
$$l(\beta, s) = \prod_{i=1}^{I} \int \prod_{j=1}^{N_i} \left( \frac{e^{x_{ij}^T \beta + b_i}}{1 + e^{x_{ij}^T \beta + b_i}} \right)^{y_{ij}} \left( \frac{1}{1 + e^{x_{ij}^T \beta + b_i}} \right)^{1 - y_{ij}} \frac{e^{-b_i^2 / 2s^2}}{\sqrt{2\pi s^2}} \, db_i$$
is extremely time-consuming to optimize for 8.6 million communications.
- A fixed-effect model avoids numerical integration, but has a much larger p
- The models work on paper, but the software cannot handle massive data
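Each factor of this product requires a one-dimensional integral over the donor effect, typically approximated by Gauss-Hermite quadrature. The sketch below computes one donor's marginal log-likelihood contribution this way (illustrative only; the slide's point is that repeating this inside an optimizer, for over a million donors, is what breaks standard software):

```python
import numpy as np

def donor_loglik(y, x, beta0, beta, sigma, n_nodes=30):
    """Marginal log-likelihood of one donor's communications, integrating
    out b_i ~ N(0, sigma^2) by Gauss-Hermite quadrature."""
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    b = np.sqrt(2.0) * sigma * nodes              # change of variables
    eta = beta0 + x @ beta                        # linear predictor, (n_j,)
    # Success probabilities at each quadrature node: shape (n_nodes, n_j).
    prob = 1.0 / (1.0 + np.exp(-(eta[None, :] + b[:, None])))
    # Bernoulli likelihood of the donor's outcomes at each node.
    lik_at_b = np.prod(prob**y * (1 - prob)**(1 - y), axis=1)
    return np.log(np.sum(weights * lik_at_b) / np.sqrt(np.pi))

# Example: one donor with two communications (illustrative numbers).
x = np.array([[1.0, 0.0], [0.0, 1.0]])
y = np.array([1, 0])
ll = donor_loglik(y, x, beta0=-0.5, beta=np.array([0.3, -0.2]), sigma=0.8)
```

As `sigma` shrinks to zero the integral collapses to the plain logistic likelihood, which is a convenient sanity check on the quadrature.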
Illustration of small-sample analysis
Discussion of small-sample analysis
- The size of each small sample can be around $N^{0.7}$, a small fraction of the overall size of the dataset
- The computational speed-up is much more than 10 times, so it is easy to analyze many samples
- A feature is significant if it is selected in over 50% of small samples
- Theoretical results show that we can control the bias of the procedure and the number of false positives (Kleiner et al. 2012, Bradić 2013)
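A minimal sketch of this procedure on synthetic data: draw many subsamples of size roughly $N^{0.7}$, fit an L1-penalized logistic regression on each, and declare a feature significant if it is selected in over 50% of the subsamples. The dataset, penalty level, and number of subsamples are all hypothetical choices for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)

# Synthetic "full" dataset: N communications, p features, 2 real effects.
N, p = 20000, 10
X = rng.normal(size=(N, p))
beta_true = np.zeros(p)
beta_true[:2] = [0.9, -0.7]
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ beta_true))))

m = int(N**0.7)          # small-sample size ~ N^0.7, as on the slide
n_samples = 40
counts = np.zeros(p)

for _ in range(n_samples):
    idx = rng.choice(N, size=m, replace=False)
    fit = LogisticRegression(penalty="l1", C=0.05,
                             solver="liblinear").fit(X[idx], y[idx])
    counts += (fit.coef_[0] != 0)     # which features survived the penalty?

# A feature is "significant" if selected in over 50% of small samples.
significant = np.flatnonzero(counts / n_samples > 0.5)
```

Each fit touches only about m ≈ N^0.7 rows, which is where the large computational speed-up comes from; the majority vote across subsamples is what controls false positives.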
Summary of our approach
1 Use a logistic regression model to predict the success/failure of a communication based on donor/design characteristics
2 Add random effects to compensate for unobservable variation between donors
3 Reduce model size and extract key determinants through model selection and the Lasso method
4 Handle massive data by considering many small samples from the big dataset
Model I: design information only
Number of selected features over 120 small samples (8.6M communications, 197 total features):
Model I: design information only
Highlights of the analysis (notable positive and negative effects):

Feature                              Avg. coefficient   Std. deviation
Card                                      0.2602            0.0521
Dynamic options/renewal type              0.1362            0.0780
Preparedness story                        0.3209            0.0285
Renewal type                              0.2918            0.0773
Allow choice of fund                     -1.7889            0.1479
Dynamic options/acquisition type         -1.9435            0.1768
Dynamic options/lapsed type              -2.5819            0.4101
Generic story/generic type               -1.0266            0.0448

The effect of dynamic options heavily depends on the campaign type.
Model I: design information only
Breakdown of p-values for selected features across small samples:
Model II: design/segmentation information
Number of selected features over 53 small samples (4.3M communications, 310 total features):
Model II: design/segmentation information
Highlights of the analysis (notable positive and negative effects):

Feature                                  Avg. coefficient   Std. deviation
Allow choice of fund/0-6 mos. recency         0.1112            0.1454
Card                                          0.5766            0.0404
Dynamic options/0-6 mos. recency              0.1044            0.0385
Preparedness story                            0.4080            0.0310
13-18 mos. recency                           -0.2765            0.0434
37-48 mos. recency                           -0.2796            0.1175
Generic story                                -0.7587            0.0493
Specific disaster story                      -0.5989            0.0389

Preparedness stories and supporter cards continue to be effective.
Model II: design/segmentation information
Breakdown of p-values for selected features across small samples:
Model III: campaign-oriented
We also studied the data at an aggregate (campaign) level. The model
$$\log\left(\frac{p_{ij}}{1 - p_{ij}}\right) = \beta_0 + \beta^T x_{ij} + b_i$$
is the same, but $p_{ij}$ is now the success rate of the jth campaign on the ith donor segment.
- There are 60 campaigns on 952 segments, so no small-sample analysis is required
- We use this model to corroborate the results of the first two
Model III: campaign-oriented
Highlights of the analysis (notable positive and negative effects):

Feature                                  Estimate    Std. deviation
Allow choice of fund/0-6 mos. recency     0.61133        0.33262
Card                                      0.66038        0.18892
Dynamic options/0-6 mos. recency          0.22205        0.10655
Preparedness story                        0.21504        0.08808
13-18 mos. recency                       -0.40337        0.06540
Dynamic options/lapsed type              -0.86614        0.43200
Generic story                            -0.33655        0.13033
Specific disaster story                  -0.38501        0.07145

This corroborates our findings on dynamic options, story types, fund choices, and supporter cards.
Model III: reducing empirical correlation
Lasso eliminates columns of data with strong empirical correlation:
Summary of insights
1 Dynamic options: this strategy works well for current supporters of the program, but not for one-time or lapsed donors
2 Relief vs. preparedness: preparedness stories comprise about 10% of Red Cross appeals, but appear to be very effective
3 Gift items: among all the various items, only supporter cards appear to contribute to campaign success
4 Donors, designs, and disasters: donors appear to make little distinction between disaster types
From descriptive to prescriptive
Decision-making with optimal learning
Experience-based learning: improve the belief model in real time, after every new campaign
- Requires a concise model that can be updated quickly and easily
- The empirical results can be used to initialize the model
- We then use the most recent beliefs to design the next campaign
Anticipatory learning: forecasting future changes to the model before they occur
- Requires a way to measure the uncertainty or potential for improvement of the current model
- The margin for error factors into the next action as well
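A generic illustration of the experience-based idea (not the authors' procedure, which the talk says is still under development): keep a Normal belief about each candidate design's success rate, pick the next design with an uncertainty bonus so the margin for error factors into the action, and update the belief after observing the campaign. All numbers are hypothetical:

```python
import numpy as np

mu = np.array([0.05, 0.05, 0.05])       # prior mean success rate per design
var = np.array([0.02, 0.02, 0.02])**2   # prior uncertainty per design
noise_var = 0.01**2                     # noise in an observed campaign rate

def choose_design(mu, var, bonus=1.0):
    # Favor designs that look good OR are still highly uncertain.
    return int(np.argmax(mu + bonus * np.sqrt(var)))

def update(mu, var, d, observed_rate):
    # Standard Normal-Normal update of the chosen design's belief.
    mu, var = mu.copy(), var.copy()
    k = var[d] / (var[d] + noise_var)
    mu[d] += k * (observed_rate - mu[d])
    var[d] *= 1 - k
    return mu, var

d = choose_design(mu, var)              # design the next campaign
mu, var = update(mu, var, d, observed_rate=0.07)
```

Because the belief state is just a mean and a variance per design, the update is instantaneous, which is why the slide stresses the need for a concise model.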
Conclusions
- Model selection and small-sample analysis can help extract key features from a massive dataset
- Statistical learning provides insights into dynamic options, supporter cards, preparedness stories, and fund choice
- Once key features have been selected, we can adapt the model to new information very quickly
- We have an algorithmic procedure for designing new campaigns; experiments are in progress
References
Bradić, J. (2013). Efficient support recovery via weighted maximum-contrast subagging. Submitted for publication.
Brown, P.H. & Minty, J.H. (2008). Media coverage and charitable giving after the 2004 tsunami. Southern Economic Journal 75(1), 9-25.
Brown, S., Harris, M.N. & Taylor, K. (2011). Modeling charitable donations to an unexpected natural disaster: evidence from the U.S. Panel Study of Income Dynamics. Technical report, Department of Economics, University of Sheffield.
Kleiner, A., Talwalkar, A., Sarkar, P. & Jordan, M.I. (2012). A scalable bootstrap for massive data. arXiv preprint, arXiv:1112.5016.
List, J.A. (2011). The market for charitable giving. Journal of Economic Perspectives 25(2), 157-180.
Ryzhov, I.O., Han, B. & Bradić, J. (2013). Cultivating disaster donors: a case application of scalable analytics on massive data. In revision at Management Science.