Customer Analytics
Data Mining Techniques and Applications to CRM: Decision Trees and Neural Networks

Data mining techniques
Data mining, or knowledge discovery, is the process of discovering valid, novel and useful patterns in large data sets. Many different data mining algorithms exist:
- Statistics
- Decision trees
- Neural networks
- Clustering algorithms
- Rule induction algorithms
- Rough sets
- Genetic algorithms
Decision trees and neural networks are widely used; they are part of most data mining tools.

Supervised vs unsupervised techniques
- Supervised learning techniques are guided by a known output (decision trees, most neural network types).
- Unsupervised learning techniques use inputs only and rely on similarity measures (clustering algorithms, Kohonen feature maps — a type of neural network).
Decision trees
Tree-shaped structures that can be converted to rules.

Decision trees are built by an iterative process of splitting the data into partitions. Many different algorithms exist; the most common are ID3, C4.5, CART and CHAID. The algorithms differ in the number of splits allowed at each node and in the diversity function used to evaluate splits (e.g. Gini index, entropy).

Decision tree example (adapted from Information Discovery Inc., 1996)
Develop a tree to predict Profit:

Manufacturer | State | City        | Product Color | Profit
Smith        | CA    | Los Angeles | Blue          | High
Smith        | AZ    | Flagstaff   | Green         | Low
Adams        | NY    | NYC         | Blue          | High
Adams        | AZ    | Flagstaff   | Red           | Low
Johnson      | NY    | NYC         | Green         | Avg.
Johnson      | CA    | Los Angeles | Red           | Avg.
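The iterative splitting step can be sketched in a few lines of plain Python. This is a minimal illustration, not any particular tool's implementation: the table above is held as a list of dicts, and a hypothetical `split` helper partitions it on one attribute, reporting which child tables are already pure (classified) and which need further splitting.

```python
from collections import defaultdict

# The slide's example table, one dict per row.
rows = [
    {"Manufacturer": "Smith",   "State": "CA", "City": "Los Angeles", "Color": "Blue",  "Profit": "High"},
    {"Manufacturer": "Smith",   "State": "AZ", "City": "Flagstaff",   "Color": "Green", "Profit": "Low"},
    {"Manufacturer": "Adams",   "State": "NY", "City": "NYC",         "Color": "Blue",  "Profit": "High"},
    {"Manufacturer": "Adams",   "State": "AZ", "City": "Flagstaff",   "Color": "Red",   "Profit": "Low"},
    {"Manufacturer": "Johnson", "State": "NY", "City": "NYC",         "Color": "Green", "Profit": "Avg"},
    {"Manufacturer": "Johnson", "State": "CA", "City": "Los Angeles", "Color": "Red",   "Profit": "Avg"},
]

def split(rows, attribute):
    """Partition rows into child tables, one per value of `attribute`."""
    children = defaultdict(list)
    for row in rows:
        children[row[attribute]].append(row)
    return dict(children)

children = split(rows, "State")
for value, table in children.items():
    profits = {r["Profit"] for r in table}
    status = "classified" if len(profits) == 1 else "split further"
    print(value, "->", len(table), "rows,", status)
```

Splitting on State yields three child tables; only the AZ table is pure, which matches the worked example that follows.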
First step: the tree algorithm splits the original table into three tables using the State attribute.

Table 1A: State = AZ (all rows share Profit = Low, so this table is classified)
Manufacturer | State | City      | Product Color | Profit
Smith        | AZ    | Flagstaff | Green         | Low
Adams        | AZ    | Flagstaff | Red           | Low

Table 1B: State = CA (must be split further)
Manufacturer | State | City        | Product Color | Profit
Smith        | CA    | Los Angeles | Blue          | High
Johnson      | CA    | Los Angeles | Red           | Avg.

Table 1C: State = NY (must be split further)
Manufacturer | State | City | Product Color | Profit
Adams        | NY    | NYC  | Blue          | High
Johnson      | NY    | NYC  | Green         | Avg.

A decision tree is derived from these tables.

Corresponding rules
This decision tree can be translated into a set of rules as follows:
1. IF State = AZ THEN Profit = Low
2. IF State = CA AND Manufacturer = Smith THEN Profit = High
3. IF State = CA AND Manufacturer = Johnson THEN Profit = Avg
4. IF State = NY AND Manufacturer = Adams THEN Profit = High
5. IF State = NY AND Manufacturer = Johnson THEN Profit = Avg
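The rule set above maps directly onto nested conditionals. A sketch (the function name `predict_profit` is my own, not from the slides):

```python
def predict_profit(state, manufacturer):
    """Apply the five rules read off the State-first decision tree."""
    if state == "AZ":
        return "Low"                                          # rule 1
    if state == "CA":
        return "High" if manufacturer == "Smith" else "Avg"   # rules 2-3
    if state == "NY":
        return "High" if manufacturer == "Adams" else "Avg"   # rules 4-5
    return None  # a value the tree never saw: it cannot classify it
```

Running every row of the original table through these rules reproduces its Profit column exactly, which is what "the tree can be converted to rules" means in practice.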
Tree 2
(Figure: an alternative tree for the same table — not reproduced here.)

Exercise
Note: different trees are possible.
1. Derive a different tree using the Manufacturer attribute first and then State.
2. Derive a tree starting with the Colour attribute.
Represent the trees as rules and compare the rules.

Classification And Regression Trees (CART) algorithm
Based on binary recursive partitioning:
- looks at all possible splits for all variables and searches through them all
- rank-orders each splitting rule on the basis of a quality-of-split criterion, measured by a diversity function (e.g. the Gini rule or entropy): how well the splitting rule separates the classes contained in the parent node
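A quality-of-split criterion can be made concrete with the Gini diversity function. The sketch below scores a split by the weighted average Gini impurity of its child tables (lower is better). Note one simplification against real CART: CART considers binary splits, while this sketch makes one child per attribute value; the helper names are mine.

```python
from collections import Counter

# The example table again, reduced to the attributes we score.
rows = [
    {"Manufacturer": "Smith",   "State": "CA", "Color": "Blue",  "Profit": "High"},
    {"Manufacturer": "Smith",   "State": "AZ", "Color": "Green", "Profit": "Low"},
    {"Manufacturer": "Adams",   "State": "NY", "Color": "Blue",  "Profit": "High"},
    {"Manufacturer": "Adams",   "State": "AZ", "Color": "Red",   "Profit": "Low"},
    {"Manufacturer": "Johnson", "State": "NY", "Color": "Green", "Profit": "Avg"},
    {"Manufacturer": "Johnson", "State": "CA", "Color": "Red",   "Profit": "Avg"},
]

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def split_quality(rows, attribute, target="Profit"):
    """Weighted-average impurity of the children after splitting; lower is better."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    n = len(rows)
    return sum(len(g) / n * gini(g) for g in groups.values())

for attr in ("State", "Manufacturer", "Color"):
    print(attr, round(split_quality(rows, attr), 3))
```

On this tiny table the parent impurity is 2/3, and State, Manufacturer and Color all score 1/3 after the split — an exact tie, which is precisely why the exercise's alternative trees are all legitimate: the diversity function alone does not single out one of them.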
Entropy
Entropy is a measure of disorder. If an object can be classified into n classes C1, ..., Cn, and the probability of an object belonging to class Ci is p(Ci), the entropy of the classification is

    H = - sum over i = 1..n of  p(Ci) * log2 p(Ci)

CART: Classification And Regression Trees (continued)
Once the best split is found, the search is repeated for each child node until further splitting is impossible or some other stopping criterion is met. The tree is then tested on a testing sample (prediction/classification rate, error rate).

Decision tree applications
Decision trees can be used for:
- Prediction (e.g. churn prediction)
- Classification (e.g. into good and bad accounts)
- Exploration
- Segmentation
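The entropy formula translates directly into code. A minimal sketch (the `entropy` function name is my own):

```python
import math
from collections import Counter

def entropy(labels):
    """H = -sum over classes of p(Ci) * log2 p(Ci), estimated from class counts."""
    n = len(labels)
    return -sum((count / n) * math.log2(count / n)
                for count in Counter(labels).values())
```

A pure partition such as Table 1A's Profit column ["Low", "Low"] has entropy 0 (no disorder), while the original table's Profit column, with three equally frequent classes, has the maximum entropy for three classes, log2(3) ≈ 1.585 bits.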
Decision trees can be used for segmentation
(Figure: a tree splitting All Customers by age (Young / Old) and income (high / low) into Segment 1, Segment 2 and Segment 3; Zikmund et al., 2003)

Decision trees can be used for churn modelling (adapted from Berson et al., 2000):

All customers: 50 churners / 50 non-churners
- New technology = old: 20 churners / 0 non-churners
- New technology = new: 30 churners / 50 non-churners
  - Years as customer > 2.3: 5 churners / 40 non-churners
  - Years as customer <= 2.3: 25 churners / 10 non-churners
    - Age <= 45: 20 churners / 0 non-churners
    - Age > 45: 5 churners / 10 non-churners

Case study: data analysis for marketing
Groth resources: <ftp://ftp.prenhall.com/pub/ptr/c++_programming.w-050/groth/>, case study 8
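A fitted churn tree like this is used for prediction by walking a new customer down the branches to a leaf. A sketch, with leaf counts read off the slide's diagram and a function name (`churn_leaf`) of my own:

```python
def churn_leaf(new_technology, years_as_customer, age):
    """Walk the churn tree; return the leaf's (churners, non_churners)
    training counts, from which a churn probability can be estimated."""
    if not new_technology:           # old-technology branch
        return (20, 0)
    if years_as_customer > 2.3:      # long-standing customers rarely churn
        return (5, 40)
    if age <= 45:
        return (20, 0)
    return (5, 10)

churners, non_churners = churn_leaf(True, 1.0, 30)
print("estimated churn rate:", churners / (churners + non_churners))
```

For example, a new-technology customer of 1 year aged 30 lands in the (20, 0) leaf, i.e. every training customer at that leaf churned, so the tree predicts churn with high confidence.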
Data mining exercises: building models using decision trees
A tutorial in building models using decision trees. Students are to complete the decision tree tutorial from the Groth text, pp. 127-147.
Tool: KnowledgeSEEKER — download and install (next week's tutorial) from <ftp://ftp.prenhall.com/pub/ptr/c++_programming.w-050/groth/>

References
- Groth R. (2000), Data Mining
- Rules are Much More than Decision Trees, Information Discovery Inc.
- www.thearling.com
- Salford Systems White Paper Series, An Overview of CART Methodology
- Berson A., Smith S., Thearling K. (2000), Building Data Mining Applications for CRM, McGraw-Hill