Cultivating disaster donors
A case application of scalable analytics on massive data

Ilya O. Ryzhov (Robert H. Smith School of Business, University of Maryland, College Park, MD 20742)
Bin Han (Applied Mathematics, Statistics, and Scientific Computation, University of Maryland, College Park, MD 20742)
Jelena Bradić (Department of Mathematics, University of California San Diego, La Jolla, CA 92093)

M&SOM Conference, INSEAD, July 29, 2013

Outline
1. Introduction
2. Statistical learning for massive data
   - Statistical methodology
   - Results and insights
3. Prescriptive analytics with optimal learning
4. Conclusions

STAART data: donors, disasters and designs
[Figure.]

STAART data: layers and contents
[Figures: (a) all communications; (b) gifts only.]

Segment-specific strategies
- Has this person donated within the past 6 months?
- Did the appeal use dynamic ask amounts?
- Are both of these statements true?
Some designs may have segment-specific effects.

Research objectives
- What are the determinants of campaign success rates?
- Are dynamic donation options an effective donor retention strategy?
- Should the stories emphasize relief or preparedness?
- Do gift items help convince donors to return?
- Does the most effective strategy differ by donor segment?
- How can we predict the effectiveness of the next campaign?
- How can we design the next campaign to be as effective as possible?
Overall goal: provide insights into effective donor retention strategies, and help guide the development of new campaigns.

Research challenges
- Unobservable information: donor behaviour is affected by factors that the Red Cross cannot observe (PII). These factors have been widely studied based on economic panel data (Brown & Minty 2008, Brown et al. 2009, List 2011); however, we do not get to see them when actually designing a campaign.
- Massive data: standard statistical methods work poorly when dealing with 8.6 million communications.
A widespread (yet inaccurate) view: "Large sample size is a good thing, and it never causes trouble for statistical analysis."

Statistical learning for massive data

Predicting the outcome of a single communication
For communication j with donor i, we use the logistic regression model
$$\log\left(\frac{p_{ij}}{1 - p_{ij}}\right) = \beta_0 + \beta^T x_{ij},$$
where
- $p_{ij}$ is the probability that this communication will be successful,
- $x_{ij}$ is a p-vector of features of the communication,
- $\beta$ gives the effects of the features.
We will modify this basic model to deal with the structure of the data.
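To make the notation concrete, here is a minimal sketch (not from the paper; all sizes and coefficient values are simulated) that generates communications from this logit model and recovers the coefficients with scikit-learn:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical sizes: n communications, p binary design/donor features.
n, p = 100_000, 10
X = rng.binomial(1, 0.5, size=(n, p)).astype(float)
beta0 = -2.0
beta = rng.normal(0.0, 0.5, size=p)          # "true" effects (simulated)

# Success probability through the logit link, then Bernoulli outcomes y_ij.
prob = 1.0 / (1.0 + np.exp(-(beta0 + X @ beta)))
y = rng.binomial(1, prob)

# An essentially unpenalized fit (very large C) recovers beta0 and beta.
fit = LogisticRegression(C=1e6, max_iter=1000).fit(X, y)
print(fit.intercept_, fit.coef_.round(2))
```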

Regression features
The features of the jth communication with donor i are obtained from the data.
Designs:
1. Does the communication use dynamic donation options?
2. Does the communication include a supporter card?
3. Does the communication have an option to donate online?
4. Which type of story is used for the communication?
Donors:
1. Is the donor classified as Lapsed? (Acquisition? Renewal?)
2. Does the donor belong to the high donation class?
3. What is the recency of the donor?
Cross terms:
1. Are we sending dynamic options to a Lapsed donor?
2. Are we sending a card to a donor with 0-6 mos. recency?
Other features (e.g. previous gifts received from donor i).
A sketch of this feature construction follows.
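As an entirely hypothetical illustration of the feature construction (column names and values are invented, not taken from the STAART data), indicator and cross-term features could be built with pandas like so:

```python
import pandas as pd

# Hypothetical raw table, one row per communication.
df = pd.DataFrame({
    "dynamic_options": [1, 0, 1],
    "supporter_card":  [0, 1, 1],
    "donor_type":      ["Lapsed", "Renewal", "Acquisition"],
    "recency":         ["0-6", "13-18", "37-48"],
})

# Indicator (dummy) features for the categorical donor attributes.
X = pd.get_dummies(df, columns=["donor_type", "recency"], dtype=float)

# Cross terms: products of design and donor indicators.
X["dynamic_x_lapsed"] = X["dynamic_options"] * X["donor_type_Lapsed"]
X["card_x_recent"] = X["supporter_card"] * X["recency_0-6"]
print(X.head())
```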

Donor-level effects in panel data
We are studying panel data, where communications are grouped by donor. We write
$$\log\left(\frac{p_{ij}}{1 - p_{ij}}\right) = \beta_0 + \beta^T x_{ij} + b_i,$$
where $b_i$ is an effect specific to donor account i. We use a random-effect model, where $b_i \sim N(0, s^2)$. A simulation sketch follows.
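A small simulation (sizes and parameters invented) shows how the random intercept $b_i$ shifts every logit for donor i, so that two donors receiving identical designs can respond at very different rates:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical sizes: I donors, N_i communications each, p features.
I, N_i, p = 5_000, 10, 8
s = 0.8                                    # std. dev. of the random effects
b = rng.normal(0.0, s, size=I)             # b_i ~ N(0, s^2), one per donor
beta0, beta = -2.0, rng.normal(0.0, 0.5, size=p)

X = rng.binomial(1, 0.5, size=(I, N_i, p)).astype(float)
logits = beta0 + X @ beta + b[:, None]     # donor effect shifts every logit
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logits)))

# Donors with large positive b_i respond far more often than average,
# even though their campaign designs x_ij have the same distribution.
print(y.mean(axis=1)[b.argsort()][[0, -1]])   # lowest vs. highest b_i donor
```

Fitting this model requires integrating out the $b_i$, which is exactly the computational bottleneck discussed below.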

Why use random effects?
- Donor behaviour is affected by factors that are unobservable to the Red Cross (income, demographics, etc.).
- The dataset represents a sample of a larger population, and the donor pool changes over time.
- A fixed-effect model is computationally intractable (there are over 1 million donors).

Penalized maximum-likelihood estimation
We choose $\beta$ and $s^2$ by solving
$$(\hat\beta, \hat s) = \arg\max_{\beta, s} \log l(\beta, s),$$
where $l$ is the relevant likelihood function. To focus on the key drivers of donor retention, we instead solve
$$(\hat\beta, \hat s) = \arg\min_{\beta, s} \; -\log l(\beta, s) + \lambda \|\beta\|_1,$$
with an extra penalty for non-zero components of $\beta$. This is the Lasso method: a trade-off between accuracy and conciseness of the model.

Why use Lasso?
- Typically λ is chosen to optimize criteria such as AIC, BIC, or cross-validation. Thus, the regularized model will actually have more predictive power than the original model (with λ = 0).
- Lasso also addresses the problem of empirical correlation between columns of the data.
- The output has an intuitive managerial interpretation: it identifies the determinants of success.
A sketch of the Lasso step follows.
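A minimal sketch of the Lasso step in scikit-learn, with λ chosen by cross-validation (via the inverse-regularization grid Cs). Note that this ignores the random effects, and the data are simulated with a sparse "truth", purely for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegressionCV

rng = np.random.default_rng(2)
n, p = 20_000, 50
X = rng.binomial(1, 0.5, size=(n, p)).astype(float)
beta = np.zeros(p)
beta[:5] = [1.0, -1.5, 0.8, -0.6, 1.2]     # only 5 features truly matter
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ beta - 1.0))))

# L1-penalized logistic regression; C = 1/lambda picked by 5-fold CV.
lasso = LogisticRegressionCV(Cs=10, penalty="l1",
                             solver="liblinear", cv=5).fit(X, y)
selected = np.flatnonzero(lasso.coef_.ravel())   # non-zero coefficients
print("selected features:", selected)
```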

The challenge of massive data
The likelihood function
$$l(\beta, s) = \prod_{i=1}^{I} \int \prod_{j=1}^{N_i} \left(\frac{e^{x_{ij}^T \beta + b_i}}{1 + e^{x_{ij}^T \beta + b_i}}\right)^{y_{ij}} \left(\frac{1}{1 + e^{x_{ij}^T \beta + b_i}}\right)^{1 - y_{ij}} \frac{e^{-b_i^2 / 2s^2}}{\sqrt{2\pi s^2}} \, db_i$$
is extremely time-consuming to optimize for 8.6 million communications.
- A fixed-effect model avoids numerical integration, but has a much larger p.
- The models work on paper, but the software cannot handle massive data.
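Each donor contributes a one-dimensional integral over $b_i$, which is typically approximated by Gauss-Hermite quadrature. A sketch for a single donor (simulated inputs, arbitrary parameter values) shows the per-donor cost that must be paid over a million times inside every optimizer iteration:

```python
import numpy as np

def donor_likelihood(y_i, X_i, beta0, beta, s, n_nodes=20):
    """Marginal likelihood of one donor's outcomes, integrating out b_i.

    Gauss-Hermite quadrature: for b ~ N(0, s^2), substituting b = s*sqrt(2)*t
    gives  E[f(b)] ~= (1/sqrt(pi)) * sum_k w_k * f(s*sqrt(2)*t_k).
    """
    t, w = np.polynomial.hermite.hermgauss(n_nodes)
    b = s * np.sqrt(2.0) * t                  # quadrature nodes for b_i
    eta = beta0 + X_i @ beta                  # (N_i,) fixed-effect part
    logits = eta[:, None] + b[None, :]        # (N_i, n_nodes)
    p = 1.0 / (1.0 + np.exp(-logits))
    lik_given_b = np.prod(np.where(y_i[:, None] == 1, p, 1 - p), axis=0)
    return (w * lik_given_b).sum() / np.sqrt(np.pi)

rng = np.random.default_rng(3)
X_i = rng.binomial(1, 0.5, size=(10, 4)).astype(float)  # one donor, 10 messages
y_i = rng.binomial(1, 0.3, size=10)
print(donor_likelihood(y_i, X_i, -1.0, np.array([0.5, -0.5, 0.2, 0.1]), s=0.8))
```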

Illustration of small-sample analysis
[Figure, shown in several animation steps.]

Discussion of small-sample analysis
- The size of each small sample can be around $N^{0.7}$, a small fraction of the overall size of the dataset.
- The computational speed-up is much more than 10x, so it is easy to analyze many samples.
- A feature is declared significant if it is selected in over 50% of the small samples.
- Theoretical results show that we can control the bias of the procedure and the number of false positives (Kleiner et al. 2012, Bradić 2013).
A sketch of the procedure follows.
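Putting the pieces together, here is a hedged sketch of the subsample-and-select loop. The sample size $N^{0.7}$, the number of samples, and the 50% threshold follow the slides; the penalty level and inputs are illustrative, and the random effects are again omitted:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def small_sample_selection(X, y, n_samples=120, threshold=0.5, seed=0):
    """Fit an L1 logistic model on many small subsamples of size ~N**0.7;
    call a feature significant if selected in > threshold of the fits."""
    rng = np.random.default_rng(seed)
    N, p = X.shape
    m = int(N ** 0.7)                        # small-sample size from the slide
    counts = np.zeros(p)
    for _ in range(n_samples):
        idx = rng.choice(N, size=m, replace=False)
        fit = LogisticRegression(penalty="l1", C=0.1,
                                 solver="liblinear").fit(X[idx], y[idx])
        counts += (fit.coef_.ravel() != 0)   # which features survived Lasso
    freq = counts / n_samples
    return np.flatnonzero(freq > threshold), freq

# Usage with the simulated sparse data from the Lasso sketch above:
# selected, freq = small_sample_selection(X, y)
```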

Summary of our approach
1. Use a logistic regression model to predict the success or failure of a communication based on donor/design characteristics.
2. Add random effects to compensate for unobservable variation between donors.
3. Reduce model size and extract key determinants through model selection and the Lasso method.
4. Handle massive data by considering many small samples from the big dataset.

Model I: design information only
[Figure: number of selected features over 120 small samples (8.6M communications, 197 total features).]

Model I: design information only
Highlights of the analysis (notable positive and negative effects):

Feature                             Avg. coefficient   Std. deviation
Card                                          0.2602           0.0521
Dynamic options/renewal type                  0.1362           0.0780
Preparedness story                            0.3209           0.0285
Renewal type                                  0.2918           0.0773
Allow choice of fund                         -1.7889           0.1479
Dynamic options/acquisition type             -1.9435           0.1768
Dynamic options/lapsed type                  -2.5819           0.4101
Generic story/generic type                   -1.0266           0.0448

The effect of dynamic options heavily depends on the campaign type.

Model I: design information only
[Figure: breakdown of p-values for selected features across small samples.]

Model II: design/segmentation information
[Figure: number of selected features over 53 small samples (4.3M communications, 310 total features).]

Model II: design/segmentation information
Highlights of the analysis (notable positive and negative effects):

Feature                                  Avg. coefficient   Std. deviation
Allow choice of fund/0-6 mos. recency              0.1112           0.1454
Card                                               0.5766           0.0404
Dynamic options/0-6 mos. recency                   0.1044           0.0385
Preparedness story                                 0.4080           0.0310
13-18 mos. recency                                -0.2765           0.0434
37-48 mos. recency                                -0.2796           0.1175
Generic story                                     -0.7587           0.0493
Specific disaster story                           -0.5989           0.0389

Preparedness stories and supporter cards continue to be effective.

Model II: design/segmentation information
[Figure: breakdown of p-values for selected features across small samples.]

Model III: campaign-oriented
We also studied the data at an aggregate (campaign) level. The model
$$\log\left(\frac{p_{ij}}{1 - p_{ij}}\right) = \beta_0 + \beta^T x_{ij} + b_i$$
is the same, but $p_{ij}$ is now the success rate of the jth campaign on the ith donor segment. There are 60 campaigns on 952 segments, so no small-sample analysis is required. We use this model to corroborate the results of the first two.

Model III: campaign-oriented
Highlights of the analysis (notable positive and negative effects):

Feature                                  Estimate   Std. deviation
Allow choice of fund/0-6 mos. recency     0.61133          0.33262
Card                                      0.66038          0.18892
Dynamic options/0-6 mos. recency          0.22205          0.10655
Preparedness story                        0.21504          0.08808
13-18 mos. recency                       -0.40337          0.06540
Dynamic options/lapsed type              -0.86614          0.43200
Generic story                            -0.33655          0.13033
Specific disaster story                  -0.38501          0.07145

This corroborates our findings on dynamic options, story types, fund choices, and supporter cards.

Model III: reducing empirical correlation
Lasso eliminates columns of data with strong empirical correlation:
[Figure.]

Summary of insights
1. Dynamic options: this strategy works well for current supporters of the program, but not for one-time or lapsed donors.
2. Relief vs. preparedness: preparedness stories comprise about 10% of Red Cross appeals, but appear to be very effective.
3. Gift items: among all the various items, only supporter cards appear to contribute to campaign success.
4. Donors, designs, and disasters: donors appear to make little distinction between disaster types.

Prescriptive analytics with optimal learning

From descriptive to prescriptive
[Figure.]

Decision-making with optimal learning
Experience-based learning: improve the belief model in real time, after every new campaign.
- Requires a concise model that can be updated quickly and easily.
- The empirical results can be used to initialize the model.
- We then use the most recent beliefs to design the next campaign.
Anticipatory learning: forecast future changes to the model before they occur.
- Requires a way to measure the uncertainty, or potential for improvement, of the current model.
- The margin for error factors into the next action as well.
A toy sketch of this update-then-act cycle follows.
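The design algorithm itself is still experimental (see the conclusions), so the following is not the authors' method, just a generic optimal-learning toy: a Gaussian belief over campaign effects, Thompson sampling to pick the next design, and a conjugate update after each observed campaign. All designs, coefficients, and noise levels here are invented:

```python
import numpy as np

rng = np.random.default_rng(4)

# Candidate campaign designs as 0/1 feature vectors (hypothetical).
designs = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1],
                    [1, 1, 0], [1, 0, 1]], dtype=float)
true_beta = np.array([0.05, 0.12, -0.03])   # unknown to the learner

# Gaussian belief over beta; in practice it would be initialized
# from the empirical (descriptive) fit rather than a vague prior.
mu, Sigma = np.zeros(3), np.eye(3)
noise = 0.02                                 # outcome noise variance

for campaign in range(25):
    # Thompson sampling: draw a belief, act greedily against the draw.
    beta_sample = rng.multivariate_normal(mu, Sigma)
    x = designs[np.argmax(designs @ beta_sample)]

    # Run the campaign; observe a noisy success rate.
    rate = x @ true_beta + rng.normal(0.0, np.sqrt(noise))

    # Conjugate Bayesian linear-regression (Kalman-style) update.
    Sx = Sigma @ x
    gain = Sx / (noise + x @ Sx)
    mu = mu + gain * (rate - x @ mu)
    Sigma = Sigma - np.outer(gain, Sx)

print("posterior mean:", mu.round(3))
```

Because the posterior covariance shrinks only along directions that get tested, the remaining uncertainty is exactly the "margin for error" that anticipatory learning feeds into the next design choice.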

Conclusions
- Model selection and small-sample analysis can help extract key features from a massive dataset.
- Statistical learning provides insights into dynamic options, supporter cards, preparedness stories, and fund choice.
- Once key features have been selected, we can adapt the model to new information very quickly.
- We have an algorithmic procedure for designing new campaigns; experiments are in progress.

References
- Bradić, J. (2013). Efficient support recovery via weighted maximum-contrast subagging. Submitted for publication.
- Brown, P.H. & Minty, J.H. (2008). Media coverage and charitable giving after the 2004 tsunami. Southern Economic Journal 75(1), 9-25.
- Brown, S., Harris, M.N. & Taylor, K. (2011). Modeling charitable donations to an unexpected natural disaster: evidence from the U.S. Panel Study of Income Dynamics. Technical report, Department of Economics, University of Sheffield.
- Kleiner, A., Talwalkar, A., Sarkar, P. & Jordan, M.I. (2012). A scalable bootstrap for massive data. arXiv preprint arXiv:1112.5016.
- List, J.A. (2011). The market for charitable giving. Journal of Economic Perspectives 25(2), 157-180.
- Ryzhov, I.O., Han, B., Bradić, J. & Bradić, A. (2013). Cultivating disaster donors: a case application of scalable analytics on massive data. In revision at Management Science.