Multi-objective Evolutionary Approaches for ROC Performance Maximization
Ke Tang
USTC-Birmingham Joint Research Institute in Intelligent Computation and Its Applications (UBRI)
School of Computer Science and Technology, University of Science and Technology of China
July 2014 @ USTC 1
Outline Introduction to ROC analysis Related works A Multi-Objective Evolutionary Approach to ROCCH maximization (CH-MOEA) Conclusions 2
Introduction to ROC Analysis Many real-world classification problems are either cost-sensitive or have an imbalanced class distribution. In such situations, a classifier with high classification accuracy might not make sense at all; an alternative performance metric is needed. In the Big Data era, misclassification costs and class distributions may even change over time. 3
Introduction to ROC Analysis Confusion Matrix
             Predicted Positive     Predicted Negative
Positive     True Positive rate     False Negative rate
Negative     False Positive rate    True Negative rate
4
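The four rates in the confusion matrix can be computed directly from predicted and true labels. A minimal sketch (function name and toy labels are ours, not from the slides):

```python
# Hypothetical helper: derive the four confusion-matrix rates from
# true labels and hard predictions (1 = positive, 0 = negative).
def confusion_rates(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos, neg = tp + fn, fp + tn  # actual positives / negatives
    return {
        "TPR": tp / pos,  # true positive rate (sensitivity)
        "FNR": fn / pos,  # false negative rate
        "FPR": fp / neg,  # false positive rate
        "TNR": tn / neg,  # true negative rate (specificity)
    }
```

Note that the rates in each row sum to 1, which is why a single ROC point (FPR, TPR) fully summarizes a hard classifier.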
Introduction to ROC Analysis Receiver Operating Characteristic (ROC) 5
Introduction to ROC Analysis ROC Curve: a curve in the ROC space, generated by tuning the decision threshold of a classifier, e.g., a linear scorer f(x) = w^T x + b. 6
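Sweeping the threshold over a scorer's outputs yields one (FPR, TPR) point per threshold; the polyline through these points is the ROC curve. A minimal sketch (function name and toy scores are ours):

```python
def roc_points(scores, labels):
    """One (FPR, TPR) point per threshold: predict positive iff score >= t."""
    pos = sum(labels)
    neg = len(labels) - pos
    # Sweep thresholds at every distinct score; +inf gives the (0, 0) point.
    thresholds = [float("inf")] + sorted(set(scores), reverse=True)
    points = []
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points

# A perfectly ranked toy set traces the ideal curve through (0, 1):
roc_points([0.9, 0.4, 0.6, 0.2], [1, 0, 1, 0])
# -> [(0.0, 0.0), (0.0, 0.5), (0.0, 1.0), (0.5, 1.0), (1.0, 1.0)]
```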
Introduction to ROC Analysis From ROC analysis to performance measures. Simple version: Area Under the ROC Curve (AUC). Complicated version: ROC Convex Hull (ROCCH). 7
Introduction to ROC Analysis An important characteristic of the ROCCH: under any target misclassification costs and class distribution, the best classifier for those conditions must lie on a vertex or an edge of the convex hull of all available classifiers. 8
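The ROCCH is the upper convex hull of a set of ROC points, anchored at the trivial classifiers (0, 0) and (1, 1). A minimal sketch using Andrew's monotone-chain construction (function name and toy points are ours):

```python
def rocch(points):
    """Upper convex hull of (FPR, TPR) points, from (0, 0) to (1, 1)."""
    # Always include the trivial "all-negative" and "all-positive" classifiers.
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    hull = []
    for p in pts:
        # Pop the last vertex while it does not make a strict right turn,
        # i.e., while it lies on or below the chord to the new point.
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            if (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1) >= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return hull
```

Classifiers below the hull are discarded: any operating point on a hull edge can be realized by randomizing between the edge's two endpoint classifiers.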
Related Work Both AUC and ROCCH can be used as objective functions for training a classifier/learner. When seeking a (soft) classifier with maximum AUC or ROCCH, we actually seek a set of (hard) classifiers, e.g., classifiers with different thresholds. More intuitively, we try to find a classifier that is roughly good (robust) and can easily be adapted to different misclassification costs or class distributions. 9
Related Work AUC maximization is (in some circumstances) equivalent to a bipartite ranking problem, and can be addressed with learning-to-rank approaches, e.g., RankSVM (Joachims, 2005) and RankBoost (Freund et al., 2003). ROCCH maximization is more challenging than AUC maximization and so far can only be tackled with heuristic approaches, e.g., PRIE (Fawcett, 2008). 10
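The link to bipartite ranking comes from the fact that AUC equals the fraction of (positive, negative) pairs the scorer ranks correctly (the Wilcoxon-Mann-Whitney statistic, with ties counted as half). A minimal sketch (function name and toy numbers are ours):

```python
def auc(scores, labels):
    """AUC as the pairwise-ranking (Wilcoxon-Mann-Whitney) statistic."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    # Count each correctly ordered positive/negative pair; ties count 0.5.
    correct = sum(1.0 if p > n else 0.5 if p == n else 0.0
                  for p in pos for n in neg)
    return correct / (len(pos) * len(neg))

auc([0.9, 0.4, 0.6, 0.2], [1, 0, 1, 0])  # -> 1.0: every positive outranks every negative
```

This pairwise view is exactly what RankSVM and RankBoost optimize, which is why learning-to-rank methods apply.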
CH-MOEA Existing approaches try to obtain a set of homogeneous classifiers, in the sense that the classifiers differ only in their thresholds. Question: why must the classifiers be homogeneous? Heterogeneous classifiers might spread better in the ROC space, and the distinction between homogeneous and heterogeneous classifiers makes little difference in practical implementation. 11
CH-MOEA Our Target: train a set of (heterogeneous) classifiers such that the ROCCH is maximized. Such a set-based optimization problem could hardly be solved with existing mathematical programming tools. Evolutionary Algorithms provide a natural way to search for the desired classifier set. 12
CH-MOEA In particular, multi-objective evolutionary algorithms are off-the-shelf tools for this problem. Maximize TP Minimize FP 13
CH-MOEA General framework of EAs:
1. Generate the initial population P(0) at random, and set i ← 0;
2. REPEAT
   (a) Evaluate the fitness of each individual in P(i);
   (b) Select parents from P(i) based on their fitness;
   (c) Generate offspring from the parents using crossover and mutation to form P(i + 1);
   (d) i ← i + 1;
3. UNTIL halting criteria are satisfied
14
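The generic loop above can be sketched on a toy one-max problem (maximize the number of 1-bits); all names, operators, and parameters here are our own illustrative choices, not the CH-MOEA specifics:

```python
import random

def evolve(pop_size=20, length=12, generations=50, seed=0):
    rng = random.Random(seed)
    # 1. Generate the initial population P(0) at random.
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    best = max(pop, key=sum)
    for _ in range(generations):                    # 3. halting criterion
        fitness = [sum(ind) for ind in pop]         # (a) evaluate fitness

        def pick():                                 # (b) binary tournament selection
            a, b = rng.randrange(pop_size), rng.randrange(pop_size)
            return pop[a] if fitness[a] >= fitness[b] else pop[b]

        nxt = []
        while len(nxt) < pop_size:                  # (c) crossover + mutation
            p1, p2 = pick(), pick()
            cut = rng.randrange(1, length)
            child = p1[:cut] + p2[cut:]             # one-point crossover
            child[rng.randrange(length)] ^= 1       # one-bit-flip mutation
            nxt.append(child)
        pop = nxt                                   # (d) i <- i + 1
        best = max(pop + [best], key=sum)
    return best
```

A MOEA replaces the scalar fitness in step (a)/(b) with a multi-objective ranking, which is where NSGA-II and CH-MOEA differ.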
CH-MOEA What is the most famous MOEA so far? Probably NSGA-II (Deb et al., 2002), famous mainly for its selection scheme: 15
CH-MOEA However, direct application of NSGA-II (or any other MOEA) might be inappropriate because: a non-dominated (i.e., Pareto-optimal) solution is not necessarily on the convex hull, and the objective space of the problem is essentially discrete (which may cause redundant solutions). 16
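To see the first point concretely, here is a toy example with our own numbers: point B is Pareto non-dominated in (FPR, TPR) space, so NSGA-II's sorting would keep it, yet it lies strictly below the convex hull spanned by A and C:

```python
A, B, C = (0.1, 0.5), (0.4, 0.6), (0.7, 0.9)

def dominates(p, q):
    # In ROC space, lower FPR and higher TPR are both better.
    return p != q and p[0] <= q[0] and p[1] >= q[1]

# Neither A nor C dominates B, so B is Pareto non-dominated.
b_non_dominated = not any(dominates(p, B) for p in (A, C))

# TPR of the A-C hull edge at B's FPR (linear interpolation): a randomized
# mix of A and C achieves TPR 0.7 at FPR 0.4, strictly beating B's 0.6.
chord_tpr = A[1] + (B[0] - A[0]) * (C[1] - A[1]) / (C[0] - A[0])
```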
CH-MOEA Our approach: Convex Hull-based MOEA (CH-MOEA). New features of CH-MOEA: redundancy elimination, and a new sorting scheme dedicated to ROCCH maximization. 17
CH-MOEA Redundancy Elimination 18
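A minimal sketch of redundancy elimination as we understand it from the slide: since the objective space is discrete, many individuals can map to the same (FPR, TPR) point, and only one representative per objective vector is kept (function name and toy data are ours):

```python
def eliminate_redundant(population, objective):
    """Keep one individual per distinct (FPR, TPR) objective vector."""
    seen, kept = set(), []
    for ind in population:
        point = objective(ind)
        if point not in seen:   # first individual at this ROC point wins
            seen.add(point)
            kept.append(ind)
    return kept

pop = ["c1", "c2", "c3", "c4"]
objs = {"c1": (0.1, 0.6), "c2": (0.1, 0.6), "c3": (0.3, 0.8), "c4": (0.3, 0.8)}
survivors = eliminate_redundant(pop, objs.get)  # -> ["c1", "c3"]
```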
CH-MOEA New sorting scheme for ROCCH maximization 19
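Our reconstruction of the idea, hedged as a sketch: analogous to NSGA-II's non-dominated sorting, but each front is the upper convex hull of the remaining (FPR, TPR) points, peeled off layer by layer (names are ours):

```python
def upper_hull(points):
    """Upper convex hull of 2D points via monotone chain."""
    hull = []
    for p in sorted(points):
        # Pop while the last vertex lies on or below the chord to p.
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            if (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1) >= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return hull

def hull_sort(points):
    """Rank points into successive convex-hull fronts (front 0 is best)."""
    remaining, fronts = list(points), []
    while remaining:
        front = set(upper_hull(remaining))
        fronts.append(sorted(front))
        remaining = [p for p in remaining if p not in front]
    return fronts
```

Selection then prefers lower-indexed fronts, so individuals contributing to the current ROCCH survive first.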
CH-MOEA CH-MOEA can be combined with any learning model that can be evolved: Neural Networks, Decision Trees, SVMs. Genetic Programming is adopted in our work, which can be viewed as evolving a decision tree. 20
CH-MOEA Pseudo-code of CH-MOGP 21
CH-MOEA Dataset for empirical studies 22
CH-MOEA Compared methods 23
CH-MOEA CH-MOGP outperformed state-of-the-art MOEAs 24
CH-MOEA CH-MOGP outperformed other non-EA methods. 25
CH-MOEA CH-MOGP outperformed other non-EA methods (continued). 26
Conclusions Cost-sensitive and class-imbalance learning are commonly encountered in the real world. The ROCCH fits these types of problems very well owing to its insensitivity to misclassification costs and class distributions. ROCCH maximization is formulated as a special MOP that has not been well addressed by existing MOEAs. A new MOEA, namely CH-MOEA, is proposed to tackle this learning problem. CH-MOEA could be extended to any machine learning model. 27
Reference
P. Wang, M. Emmerich, R. Li, K. Tang, T. Baeck and X. Yao, Convex Hull-Based Multi-objective Genetic Programming for Maximizing Receiver Operating Characteristic Performance, IEEE Transactions on Evolutionary Computation, in press (DOI: 10.1109/TEVC.2014.2305671).
P. Wang, K. Tang, T. Weise, E. P. K. Tsang and X. Yao, Multiobjective Genetic Programming for Maximizing ROC Performance, Neurocomputing, 125: 102-118, February 2014. 28
Collaborators Dr. Pu Wang Prof. Xin Yao Prof. Edward Tsang Dr. Thomas Weise Dr. Michael Emmerich Dr. Rui Li Prof. Thomas Baeck 29
Thanks for your time! Q&A? 30