Exercise 1

Data for the Evaluation exercises. We have run a classifier to learn whether someone will pass a course. In the results below, we have 12 cases with the learned prediction and the actual result.

Case #  Predicted  Actual
01      yes        yes
02      yes        yes
03      yes        yes
04      yes        yes
05      yes        yes
06      yes        yes
07      yes        no
08      no         yes
09      no         yes
10      no         no
11      no         no
12      no         no

Instructions

Exercise 1 Data has a set of predicted and actual results for a system which predicts whether someone will pass a course. Based on this set:

1. What is the accuracy of this classifier?
2. What is the accuracy of the majority classifier for the same data?
3. Draw the confusion matrix for this classifier.
4. If you can admit as many students as you wish, would you use this system to decide who gets in?
5. If you only have room for seven students, would you use this system to decide who gets in?
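Questions 1-3 can be checked by direct computation. A minimal plain-Python sketch over the 12 cases listed above (no libraries beyond the standard library):

```python
# The 12 cases from the Exercise 1 data: cases 1-7 predicted "yes",
# cases 8-12 predicted "no"; actuals follow the table above.
from collections import Counter

predicted = ["yes"] * 7 + ["no"] * 5
actual    = ["yes"] * 6 + ["no"] + ["yes"] * 2 + ["no"] * 3

# Question 1 -- accuracy: fraction of cases where prediction matches actual.
accuracy = sum(p == a for p, a in zip(predicted, actual)) / len(actual)
print(f"classifier accuracy: {accuracy:.3f}")   # 9/12 = 0.750

# Question 2 -- majority classifier: always predict the most common actual label.
majority = max(set(actual), key=actual.count)
maj_acc = actual.count(majority) / len(actual)
print(f"majority ('{majority}') accuracy: {maj_acc:.3f}")   # 8/12 = 0.667

# Question 3 -- confusion matrix: count each (actual, predicted) pair.
cm = Counter(zip(actual, predicted))
for (a, p), n in sorted(cm.items()):
    print(f"actual={a:3s} predicted={p:3s}: {n}")
```

Note that the classifier's accuracy (9/12) beats the majority classifier (8/12), which is part of what questions 4 and 5 probe.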
Exercise 2: Using Weka

Note: Weka provides a lot of information. This module does not go into detail about the more complex data. If students want to explore more on their own, the Weka documentation page (http://www.cs.waikato.ac.nz/ml/weka/documentation.html) provides information.

This exercise assumes that the students have access to Weka. The questions can actually be addressed without running Weka by giving the students copies of Figures 3, 4, 5 and 8.

Instructions

1. Start Weka and the Explorer.

Figure 1: Weka start page
2. In the Preprocess tab, open the file credit-r.arff, which is in the data directory of the Weka installation. This is a set of cases for deciding whether someone has good or bad credit.

3. In the Preprocess tab, with credit-r open:

Figure 2: Opening credit-r

Figure 3: credit-r open
3a. What is the relation being studied? (german_credit)
3b. How many instances are there? (1000)
3c. Which attribute number are we trying to predict? In other words, which is the class? (#21)
3d. How many good cases are there? (700) How many bad? (300)
3e. Looking just at the information in the Preprocess tab, what is the accuracy of the majority classifier? (70%. We will predict good for all thousand cases, and be right for the 700 good cases, so 700/1000, or 70%.)

4. Choose the Classify tab.

Figure 4: Classifier tab with defaults
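The figure in 3e can also be reproduced outside Weka by counting class values in the ARFF file directly. A minimal sketch in plain Python, assuming a simple comma-separated @data section with the class as the last field (Weka's default; the file path is an assumption and should point at your installation's data directory):

```python
# Count the class distribution in an ARFF file and derive the
# majority-classifier accuracy, as in question 3e.
from collections import Counter

def majority_accuracy(arff_path):
    counts = Counter()
    in_data = False
    with open(arff_path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("%"):   # skip blanks and comments
                continue
            if line.lower() == "@data":            # header ends here
                in_data = True
                continue
            if in_data:
                # Assumes dense, comma-separated rows; class is the last field.
                counts[line.split(",")[-1]] += 1
    total = sum(counts.values())
    label, n = counts.most_common(1)[0]
    return label, n / total

# e.g. majority_accuracy("credit-r.arff") should give ("good", 0.7)
# for the 700-good / 300-bad data set described above.
```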
Look first at the defaults.

4a. What is the default classifier? (ZeroR, which is what Weka calls the majority classifier.)
4b. What is the default test protocol? (Cross-validation with a 10-fold split.) What is the default class? (class, #21. Weka defaults to the last attribute.)

5. Run the ZeroR classifier by clicking the Start button.

Figure 5: Result of running ZeroR
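For students unsure what the 10-fold split in 4b means: the data is divided into 10 parts, the classifier is trained on 9 parts and tested on the held-out part, and this repeats 10 times so every case is tested exactly once. A toy plain-Python sketch of the idea, using a trivial majority classifier like ZeroR (this is an illustration of the protocol, not Weka's implementation, which also stratifies and shuffles):

```python
# k-fold cross-validation of a majority classifier over a list of labels.
def cross_validate(labels, k=10):
    n = len(labels)
    fold_scores = []
    for i in range(k):
        # Held-out test fold; everything else is training data.
        test = labels[i * n // k:(i + 1) * n // k]
        train = labels[:i * n // k] + labels[(i + 1) * n // k:]
        majority = max(set(train), key=train.count)
        fold_scores.append(sum(l == majority for l in test) / len(test))
    return sum(fold_scores) / k

# Ten labels, 7 "good" and 3 "bad": the majority guess is right 7 times in 10.
print(cross_validate(["good"] * 7 + ["bad"] * 3, k=10))   # 0.7
```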
5a. How long did it take to run and test the classifier? (No noticeable time, unless your computer is incredibly slow.)
5b. What is the accuracy? (70%)
5c. How many good credit applications are called bad? (0)
5d. How many bad credit applications are called good? (300)
5e. Is this useful? (No.)

6. Change the classifier to SMO. (This is the Weka Support Vector Machine method. It's slower and not as good as SVM, but does not involve installing additional software.) Turn off all options except Output confusion matrix. Run the SMO classifier by choosing Start.

Figure 6: Choosing SMO, under Classifiers->Functions
Figure 7: Setting the options
Figure 8: SMO Classifier results
6a. How long did it take to run and test the classifier? (Typically a few seconds; much longer than ZeroR, even for a few cases.)
6b. What is the accuracy? (75.1%)
6c. How many good credit applications are called bad? (90)
6d. How many bad credit applications are called good? (159)
6e. Is this useful? (Note first that although the accuracy hasn't changed much, the confusion matrix is very different. So maybe. It's certainly more informative than the majority classifier, but you're still going to deny credit to some people who would be good credit risks, and give loans to 159 people who are bad risks. Possibly better than your human evaluators do? It comes down to the cost of making a mistake.)

7. Advanced exercise: can the students do better with other classifiers, or other parameters? This can be an interesting exploration exercise. The typical classifiers with Weka defaults (decision tree, regression, neural nets, simple naive Bayes) do about the same, or slightly worse. Also note that neural nets (multilayer perceptrons in Weka) are very slow. With 10-fold cross-validation, this can take minutes for the credit-r data set.
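The cost point in 6e can be made concrete. A plain-Python sketch comparing the two confusion matrices reported above under assumed mistake costs (the cost values are illustrative assumptions, not part of the exercise; the idea is that a loan to a bad risk usually costs more than turning away a good customer):

```python
# Confusion matrices as {(actual, predicted): count}, taken from the
# ZeroR and SMO answers above (5b-5d and 6b-6d, 1000 cases total).
zeror = {("good", "good"): 700, ("good", "bad"): 0,
         ("bad", "good"): 300, ("bad", "bad"): 0}
smo   = {("good", "good"): 610, ("good", "bad"): 90,
         ("bad", "good"): 159, ("bad", "bad"): 141}

def accuracy(cm):
    total = sum(cm.values())
    correct = cm[("good", "good")] + cm[("bad", "bad")]
    return correct / total

def expected_cost(cm, cost_bad_as_good=5.0, cost_good_as_bad=1.0):
    # cost_bad_as_good: a bad applicant called good (the loan goes bad)
    # cost_good_as_bad: a good applicant called bad (lost business)
    # Both unit costs are assumed for illustration.
    return (cost_bad_as_good * cm[("bad", "good")]
            + cost_good_as_bad * cm[("good", "bad")])

print(accuracy(zeror), accuracy(smo))             # 0.7 vs 0.751
print(expected_cost(zeror), expected_cost(smo))   # 1500.0 vs 885.0
```

Under these assumed costs, SMO's advantage over ZeroR looks much larger than the raw accuracy gap suggests, which is exactly the point of 6e: the right answer depends on the cost of each kind of mistake.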