Comprehensible Data Mining: Gaining Insight from Data


Michael J. Pazzani
Information and Computer Science
University of California, Irvine
pazzani@ics.uci.edu
http://www.ics.uci.edu/~pazzani

Outline
- UC Irvine's data mining program
- KDD
  - Goals: Gaining insight from data
  - Methods: Learn predictive and/or descriptive models
  - Conclusion: Not all models provide insight
    » Validate findings
    » Deliver findings
- Comprehensibility and prior knowledge
  - Expert IF/THEN rules
  - Monotonicity constraints
  - Negative interactions

"Knowledge placed in the perspective of what is already known." - Dr. Ruth David

University of California, Irvine
- Ph.D. and M.S. with focus on data mining
  - Rina Dechter: Bayesian Networks
  - Richard Granger: Neural Networks
  - Dennis Kibler: Inductive Learning
  - Richard Lathrop: Learning and Molecular Biology
  - Michael Pazzani: Knowledge-intensive learning
  - Padhraic Smyth: Probabilistic Models & KDD
- Archive of over 100 databases used in learning research: http://www.ics.uci.edu/~mlearn
- Proprietary databases analyzed in conjunction with sponsors

Applications
- Telephone (NYNEX): Diagnosis of local loop.
- Economic Sanctions (RAND): Predict whether economic sanctions will achieve the desired goal.
- Foreign Trade Negotiations (ORD): Predict conditions under which a partner will make a concession.
- Pharmaceutical
- Dementia (UCI and CERAD): Screening for Alzheimer's disease. Cognitive and functional questionnaires
- Supermarket scanner data
- User Profiles: text & demographics

Summary
- A variety of techniques can learn predictive models that rival or exceed the performance of human experts.
- Demonstrating predictive accuracy is not sufficient for adopting a predictive model.
- Experts will not gain any insight from a relationship that they don't believe.
- Signs of acceptance:
  - Publication in peer-reviewed journals
  - Adoption in practice
- Experts give more credence to models that don't unnecessarily violate prior expectations.

Economic Sanctions
- In 1983, Australia refused to sell uranium to France unless France ceased nuclear testing in the South Pacific. France paid a higher price to buy uranium from South Africa.
- In 1980, the US refused to sell grain to the Soviet Union unless the Soviet Union withdrew troops from Afghanistan. The Soviet Union paid a higher price to buy grain from Argentina and did not withdraw from Afghanistan.

Regression
- Predicting the amount of effect of sanctions as a linear combination of variables.
- Hufbauer, Schott & Elliott (1985). Economic Sanctions Reconsidered. Institute for International Economics.
- $\text{Effect} = 12.23 - 0.94\,\text{SCOST} + 0.17\,\text{TCOST} + 10.26\,\text{WW} - 0.16\,\text{Cooperation} - 0.24\,\text{Years}$, with $R^2 = 0.21$
- Selecting and inventing relevant variables
- Equation doesn't always make sense
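As a rough illustration of how such a fitted equation is applied, the sketch below evaluates the linear combination for one case. Only the intercept and coefficients come from the slide; the function name and the input values are hypothetical, and the slide does not define the variables or their units.

```python
# Intercept and coefficients as printed on the slide.
INTERCEPT = 12.23
COEFFICIENTS = {
    "SCOST": -0.94,
    "TCOST": 0.17,
    "WW": 10.26,
    "Cooperation": -0.16,
    "Years": -0.24,
}

def predicted_effect(case):
    """Evaluate the linear model for one sanction episode (a dict of variable values)."""
    return INTERCEPT + sum(coef * case[name] for name, coef in COEFFICIENTS.items())

# Hypothetical input values, chosen only to show the arithmetic.
example_case = {"SCOST": 1, "TCOST": 2, "WW": 1, "Cooperation": 3, "Years": 5}
effect = predicted_effect(example_case)
print(round(effect, 2))
```

With $R^2 = 0.21$ the equation leaves most of the variance unexplained, so any single prediction of this kind should be read with caution.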

Learning Rules and Trees
- Least General Generalization: If an English-speaking democracy that imports oil threatens a country in the Northern Hemisphere that has strong economic health and exports weapons, then the sanction will fail because a country in the Southern Hemisphere will sell them the product.
- Decision Tree (figure): Language of Source (English ... French), Location of Target, Exports of Target

Dementia Screening
- Analysis of data collected by the Consortium to Establish a Registry for Alzheimer's Disease (CERAD)
- Distinguish normal from mildly impaired patients
- Demographic data (age, gender, education, occupation)
- Answers to cognitive questionnaires
  - Mini-Mental Status Exam
  - Blessed Orientation, Memory and Concentration, e.g., remember address: John Brown, 42 Market Street, Chicago
- Current usage is a simple threshold on the number of errors
  - If there are more than 9 mistakes, then the patient is impaired
  - Accuracy 49.0%; sensitivity 13.7%; specificity 99.27%
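The threshold rule and the three reported measures can be sketched as follows. This is a minimal illustration that treats IMPAIRED as the positive class; the function names and the six example patients are hypothetical and are not CERAD data.

```python
def screen(errors, threshold=9):
    """Threshold rule from the slide: more than `threshold` mistakes means impaired."""
    return "IMPAIRED" if errors > threshold else "NORMAL"

def evaluate(error_counts, true_labels, threshold=9):
    """Return (accuracy, sensitivity, specificity), with IMPAIRED as the positive class."""
    tp = fp = tn = fn = 0
    for errors, truth in zip(error_counts, true_labels):
        predicted = screen(errors, threshold)
        if truth == "IMPAIRED":
            if predicted == "IMPAIRED":
                tp += 1
            else:
                fn += 1
        else:
            if predicted == "IMPAIRED":
                fp += 1
            else:
                tn += 1
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)  # share of impaired patients that are caught
    specificity = tn / (tn + fp)  # share of normal patients that are cleared
    return accuracy, sensitivity, specificity

# Hypothetical toy data: error counts and true diagnoses for six patients.
toy_errors = [12, 5, 8, 2, 0, 4]
toy_labels = ["IMPAIRED", "IMPAIRED", "IMPAIRED", "NORMAL", "NORMAL", "NORMAL"]
accuracy, sensitivity, specificity = evaluate(toy_errors, toy_labels)
```

The toy data reproduces the pattern on the slide in miniature: a high threshold almost never flags a normal patient, so specificity is high, but it misses most impaired patients, so sensitivity is low.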

Learning Rules for Dementia Screening
IF the years of education of the patient is > 5
   AND the patient does not know the date
   AND the patient does not know the name of a nearby street
THEN the patient is NORMAL
OTHERWISE IF the number of repetitions before correctly reciting the address is > 2
   AND the age of the patient is > 86
THEN the patient is NORMAL
OTHERWISE IF the years of education of the patient is > 9
   AND the mistakes recalling the address is < 2
THEN the patient is NORMAL
OTHERWISE the patient is IMPAIRED
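A decision list of this kind is evaluated top to bottom, and the first rule whose conditions all hold decides the class. The sketch below encodes the three learned rules under that reading; the dictionary keys are hypothetical field names, not the CERAD variable names.

```python
def classify(patient):
    """Apply the learned decision list; the first matching rule wins."""
    if (patient["education_years"] > 5
            and not patient["knows_date"]
            and not patient["knows_nearby_street"]):
        return "NORMAL"
    if patient["address_repetitions"] > 2 and patient["age"] > 86:
        return "NORMAL"
    if patient["education_years"] > 9 and patient["address_recall_mistakes"] < 2:
        return "NORMAL"
    return "IMPAIRED"  # default class when no rule fires

# Hypothetical patient who misses both orientation questions.
patient_a = {
    "education_years": 8, "knows_date": False, "knows_nearby_street": False,
    "address_repetitions": 1, "age": 70, "address_recall_mistakes": 3,
}
print(classify(patient_a))
```

The example patient is labelled NORMAL by the first rule precisely because two questions were answered wrongly, which is the kind of counterintuitive test that the experts objected to.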

Accuracy of Learned Models

Algorithm              Accuracy (%)
General practitioner   ~60
Neurologists           ~85
C4.5                   86.7
C4.5 rules             82.6
Naïve Bayes            88.7
FOCL                   90.6

- Although accuracy is acceptable, experts were hesitant to accept rules because they violated the intended use of the tests
  - Getting a question right used as a sign of dementia
  - Getting questions wrong used as evidence against dementia
- 2.13 violations for an average rule

Comprehensibility of Learned Models
- Pruning: simplicity bias
  - Delete unnecessarily complex structures
- Visualization
  - Interactive exploration of complex structures
- Iteration
  - Delete, invent variables
  - Change parameters, learning algorithm
- Consistency with existing knowledge
  - Strong domain theories
  - Weak domain theories
  - Association rules

Simpler isn't always better
- Most work in ML and KDD equates understandable with concise
  A. If the native language of the country is English, then the sales of leisure products will be high
  B. If there is a large population with high income and there is a free market economy, then the sales of leisure products will be high
- Problem: There are often many models with similar complexity consistent with the data
  A. If the average height < 6 ft 6 in, then the team will score on fast breaks
  B. If the average time at 40 m is < 4.2 sec, then the team will score on fast breaks

Visualizing Incomprehensible Decision Trees

Comprehensibility and Prior Knowledge
- When creating models from data, there are many possible models with equivalent predictive power.
- Understandability by users should be used to constrain model selection.
- One factor that influences understandability is consistency with domain knowledge.

Explanation-based Learning: Using Strong Domain Knowledge
- Explain why an item belongs to a class
- Retain features of examples used in explanation
  - If the supply of an object decreases, then the price will increase
  - If a country has strong economic health, then it can tolerate a price increase
- If a country that exports a commonly available commodity tries to coerce a wealthy country, the sanction will fail because the country will buy the commodity at a higher price from another supplier
- Constrained to learning implications of existing knowledge

Theory Revision: Revising Expert Rules
- Focus inductive learning on correcting errors in existing knowledge
- Search for revisions to domain theory: add or delete rules or tests from rules
- Experts prefer revision of expert rules to learning new rules

Condition               Original   Revised
None                    NA         68.0
Novice rules            44.0       70.0
Original expert rules   61.3       73.3
Revised expert rules    72.0       81.3

Monotonicity Constraints
- Problem: In some domains, experts know the direction of effect of a variable but not a necessary and sufficient causal account.
- Spurious correlations and uninformed selections from statistically indistinguishable tests resulted in rules that aren't understandable
- Monotonicity constraints: Only use tests in the intended direction
  - For each numeric variable: Specify if increasing values are known to increase likelihood of class membership
  - For each nominal variable: Specify which values are known to increase likelihood of class membership
- No effect on accuracy (90.7 vs. 90.6) or length (4.3 vs. 4.6) in dementia screening
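One way to picture the constraint is as a filter on candidate tests. The sketch below is not the FOCL implementation; the variable names and the directions assigned to them are illustrative assumptions, and in a two-class problem a value that favours NORMAL is assumed to count against IMPAIRED.

```python
# Assumed directions for numeric variables: +1 means larger values make
# NORMAL more likely, -1 means larger values make NORMAL less likely.
NUMERIC_DIRECTION = {
    "education_years": +1,
    "address_recall_mistakes": -1,
    "address_repetitions": -1,
    "age": -1,
}
# Assumed nominal values that make NORMAL more likely.
NOMINAL_FAVOURS_NORMAL = {
    "knows_date": {True},
    "knows_nearby_street": {True},
}

def violates(test, target_class):
    """True if the test (variable, operator, value) points against target_class."""
    variable, op, value = test
    if variable in NOMINAL_FAVOURS_NORMAL:
        favours_normal = value in NOMINAL_FAVOURS_NORMAL[variable]
        return favours_normal != (target_class == "NORMAL")
    direction = NUMERIC_DIRECTION[variable]
    if target_class != "NORMAL":
        direction = -direction
    test_direction = +1 if op in (">", ">=") else -1
    return test_direction != direction

def allowed_tests(candidates, target_class):
    """Keep only the candidate tests that respect the constraints."""
    return [t for t in candidates if not violates(t, target_class)]

# Tests taken from the three rules learned without constraints, all concluding NORMAL.
unconstrained_rules = [
    [("education_years", ">", 5), ("knows_date", "==", False), ("knows_nearby_street", "==", False)],
    [("address_repetitions", ">", 2), ("age", ">", 86)],
    [("education_years", ">", 9), ("address_recall_mistakes", "<", 2)],
]
violation_counts = [sum(violates(t, "NORMAL") for t in rule) for rule in unconstrained_rules]
```

Under these assumed directions the first two rules contain two violations each and the third contains none, which matches the experts' complaint that wrong answers were being used as evidence of normality.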

Learning a Clause with Monotonicity Constraints

Test           Impaired   Normal
(none)         600        400

Age < 68       125        150
Age < 72       170        250
Age >= 68      475        250
Recall < 2     425        350
Months >= 2    500        50
Months < 2     100        350

Age < 68       100        30
Age < 72       170        40
Age >= 68      450        20
Recall < 2     375        300
Gender = F     275        20
Gender = M     225        30

Gender = F     250        5
Gender = M     200        15
Recall >= 2    125        18
Recall < 2     325        2
Count >= 1     400        10
Count < 1      50         10

$p_1 \left( \log_2 \frac{p_1}{p_1 + n_1} - \log_2 \frac{p_0}{p_0 + n_0} \right)$
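The expression on this slide has the form of the information gain heuristic used by FOIL-style rule learners, $p_1 (\log_2 \frac{p_1}{p_1+n_1} - \log_2 \frac{p_0}{p_0+n_0})$, where $p_0, n_0$ are the positive and negative counts before a test is added and $p_1, n_1$ the counts after. The sketch below applies it to the first six tests on the slide, assuming that Impaired is the positive class and that those six rows are candidates evaluated on all 600 impaired and 400 normal examples.

```python
import math

def foil_gain(p0, n0, p1, n1):
    """Gain from adding a test that narrows coverage from (p0, n0) to (p1, n1)."""
    if p1 == 0:
        return 0.0  # a test covering no positives contributes nothing
    return p1 * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))

# (impaired, normal) counts covered by each candidate test, as listed on the slide.
candidates = {
    "Age < 68": (125, 150),
    "Age < 72": (170, 250),
    "Age >= 68": (475, 250),
    "Recall < 2": (425, 350),
    "Months >= 2": (500, 50),
    "Months < 2": (100, 350),
}
p0, n0 = 600, 400
gains = {test: foil_gain(p0, n0, p, n) for test, (p, n) in candidates.items()}
best_test = max(gains, key=gains.get)
print(best_test)
```

Under monotonicity constraints, the same maximisation would simply be run over the subset of tests that point in the intended direction.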

Learning Understandable Rules for Dementia Screening
IF the years of education of the patient is > 5
   AND the mistakes recalling the address is < 2
THEN the patient is NORMAL
OTHERWISE IF the years of education of the patient is > 11
   AND the errors made saying the months backward is < 2
THEN the patient is NORMAL
OTHERWISE IF the years of education of the patient is > 17
THEN the patient is NORMAL
OTHERWISE the patient is IMPAIRED

Do experts prefer rules without constraint violations?
- Procedure: Generated 8 decision lists with and without monotonicity constraints (on different subsets of the CERAD data)
- Asked 2 neurologists to rate each rule on a 1-10 scale: "How willing would you be to follow the decision rule in screening for cognitively impaired patients?"
  - N1: with 5.56, without 3.25; t(15) = 6.60, p < .001
  - N2: with 2.38, without 0.25; t(15) = 5.09, p < .001

Correlation          Neurologist 1   Neurologist 2
Violations           .433            .623
Number of tests      .208            .020
Number of clauses    .278            .011

Learning Monotonicity Constraints
- Q: Where do monotonicity constraints come from?
- A: Learn them from the entire training set
- When considering a test (selection bias), require that it is:
  1. Most informative on the partition of the data set under consideration
  2. Informative on the entire training set
- Rationale: A variable that has the opposite effect under special circumstances is exceptional
- Disadvantage: Cannot detect negative interactions among variables
- Preference bias rather than selection bias: A negative interaction must be significantly superior (using chi-square at the 0.95 level) when used
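The slide does not say how the chi-square comparison is set up, so the following is only one possible reading: a test that goes against the learned direction is accepted when its coverage is both more precise than that of the best conforming test and significantly different from it in a 2x2 table. The function names and example counts are hypothetical; 3.841 is the standard critical value for one degree of freedom at the 0.95 level.

```python
CRITICAL_95_DF1 = 3.841  # chi-square critical value, 1 degree of freedom, 0.95 level

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the contingency table [[a, b], [c, d]]."""
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 0.0  # degenerate table: no evidence of a difference
    return n * (a * d - b * c) ** 2 / denominator

def prefer_violating_test(violating, conforming):
    """Each argument is (positives covered, negatives covered) for one test.

    Assumed reading of the preference bias: keep the conforming test unless
    the violating test is more precise AND the difference is significant.
    """
    pv, nv = violating
    pc, nc = conforming
    more_precise = pv / (pv + nv) > pc / (pc + nc)
    significant = chi_square_2x2(pv, nv, pc, nc) > CRITICAL_95_DF1
    return more_precise and significant

# Hypothetical coverage counts for two pairs of competing tests.
clear_win = prefer_violating_test((325, 2), (125, 18))
marginal_win = prefer_violating_test((50, 10), (48, 12))
```

With the first pair the violating test is kept because the difference is large; with the second pair the small advantage is not significant, so the conforming test would still be preferred.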

Accuracy Results (figure): Selection Bias; Selection Bias with Pruning

Current Research Directions
- Learning user profiles from feedback and demographics
- Explaining difference between models
  - Understand algorithms
  - Spot changes in trends
  - Identify discrepancy between specification and implementation
- Classification of time series data for intruder detection

Conclusion
- Adding knowledge to data mining gives more control over output
- To be understandable, learned concepts should conform to the cognitive biases of human experts.
- Experts prefer rules learned with monotonicity constraints.
- Current work: Explore other constraints
  - Expert judgement on learned monotonicity constraints
  - Consistent contrast
  - Use of abstraction in concept definitions
- UCI wants your data (particularly unstructured)
  - Publicly available archive
  - Work with us under nondisclosure agreements