CSC-272 Exam #2 March 20, 2015


Name _______________________________

Questions are weighted as indicated. Show your work and state your assumptions for partial credit consideration. Unless explicitly stated, there are NO intended errors and NO trick questions. If in doubt, ask! You have 50 minutes to work. Now, take a moment to relax. If you don't immediately see how to do something, THINK! Don't panic!

Multiple Choice (2 points each):

1. Which of the following is true about data mining?
   a. simple algorithms sometimes work surprisingly well
   b. different approaches work better for different data
   c. successful data mining usually involves trying a number of approaches in a series of experiments
   d. all of the above
   e. none of the above

2. Which of these have the potential to result in overfitting?
   a. attributes with a large number of values in 1R
   b. inclusion of an identifying (ID) attribute in any algorithm
   c. too many rules in PRISM
   d. a decision tree with too many leaves
   e. redundant attributes in Naïve Bayes or nearest neighbor
   f. all of the above

3. Which of the following is true about the OneR algorithm?
   a. it considers exactly one attribute
   b. it chooses exactly one attribute to use in making predictions during tests
   c. it works especially well for attributes with many possible values
   d. all of the above

4. A nearest neighbor approach is best used
   a. with large-sized datasets
   b. when irrelevant attributes have been removed from the data
   c. when a generalized model of the data is desirable
   d. with noisy data

5. Which of the following is true of the Naïve Bayes algorithm?
   a. it considers exactly one attribute
   b. it cannot handle numeric values for input attributes
   c. it is able to make numeric predictions
   d. it easily accommodates missing data in training examples
   e. all of the above

6. Which statement is true about the decision tree attribute selection process described in lecture?
   a. a nominal attribute may appear in a tree node several times but a numeric attribute may appear at most once
   b. a numeric attribute may appear in several tree nodes but a nominal attribute may appear at most once
   c. both numeric and nominal attributes may appear in several tree nodes
   d. numeric and nominal attributes may appear in at most one tree node

7. Which of the following is true of the PRISM algorithm?
   a. it generates exactly one rule for every value of the class attribute
   b. it sometimes requires the use of the probability density function
   c. it is an example of a lazy classification algorithm
   d. it generates a rule by adding tests that maximize accuracy while reducing coverage

8. Eric Siegel hypothesizes that data mining was unable to predict the 2008 financial crisis because, among other reasons:
   a. instances of such rare events appear rarely in existing datasets
   b. data mining algorithms don't work well with financial data
   c. it could have, but no one was attempting to predict such an event
   d. insufficient data pertaining to individual investors has been collected

Fill in the Blank (2 points each blank): Use the following list of terms to fill in the blanks with the best possible term. More than one answer might be justifiable. Resulting sentences are not necessarily grammatically correct.

ZeroR, decision tree algorithm, nearest neighbor, OneR, Naïve Bayes, covering, discretize, bucket, overfitting, classification, baseline, instance-based, association, numeric estimation, clustering, rules, a priori, a posteriori, correlation, causation, Anxiety Index, probability density function, Laplace estimator, normal distribution, redundant attributes, rule goodness, Euclidean distance, lazy method, normalization, univariate model, multivariate model, ensemble learning

________ doesn't work well if many of the input attributes have fewer possible values than the class/output attribute does.

________ assumes that all attributes are statistically independent, whether or not they actually are.

________ is used to solve the zero-frequency problem.

Numerical values in a dataset must be ________ before the nearest neighbor algorithm can work correctly.

PRISM is an example of a ________ classification algorithm.

The nearest neighbor algorithm is an example of ________ learning.

If the accuracy of the ________ classification algorithm is very high, you probably should find a dataset with more balance in class attribute values.

________ is the result of data mining blog posts, and correlates with stock market performance.
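One of these terms is mechanical enough to demonstrate in a few lines. Below is a minimal sketch of the zero-frequency problem and the Laplace estimator that fixes it; the counts are made up for illustration, not taken from the exam.

```python
# Laplace estimator sketch: add k to every count so an attribute value never
# seen with a class still gets a nonzero probability, instead of zeroing out
# the entire Naive Bayes product. Counts below are hypothetical.
def laplace(count, total, num_values, k=1):
    return (count + k) / (total + k * num_values)

print(0 / 5)              # unsmoothed estimate: the zero-frequency problem
print(laplace(0, 5, 3))   # smoothed: (0 + 1) / (5 + 1 * 3) = 0.125
```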

Short Answer (8 points each): Give (relatively) short answers to the following questions. You must omit any one question by writing OMIT clearly in the space provided.

1. Consider the following excerpt of data from the Credit Card Promotion dataset. In this case the class attribute is Magazine, which indicates whether or not the credit card holder responded to a magazine promotion.

   Age:      19 27 29 35 38 39 40 41 42 43 43 43 45 55 55 57
   Magazine:  N  Y  N  Y  N  Y  Y  N  Y  Y  Y  N  N  Y  N  N

   a. Discretize the Age attribute using a minimum bucket size of 3. Clearly show the resulting buckets. (Work from left to right.)
   b. Explain what would happen if you discretized Age with no minimum bucket size.

2. Consider the dataset on the last page of the exam, and the following new instance:

   buying-price   maintenance   persons   safety   recommendation
   med            med           4         med      ???

   How would the nearest-neighbor algorithm predict the recommendation, with k=1? Explain your answer. HINT: Since all of the attributes are nominal, I suggest you forgo the Euclidean distance formula for the city-block metric. That is, you don't need to square anything or take the square root of anything.
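A minimal sketch of the discretization in question 1, assuming one simple left-to-right variant: fill each bucket to the minimum size, then keep extending it while the next instance's class matches the last one's. Whether this matches the lecture's exact procedure is an assumption, and the sketch deliberately ignores equal adjacent Age values (such as the three 43s above), which a full answer must handle; the data below is just the first seven instances.

```python
# Left-to-right discretization with a minimum bucket size (simplified sketch;
# equal adjacent attribute values get no special treatment here).
def discretize(values, classes, min_bucket=3):
    buckets, current = [], []
    for v, c in zip(values, classes):
        # Close the bucket once it holds at least min_bucket instances and
        # the incoming class differs from the class of the last instance.
        if len(current) >= min_bucket and c != current[-1][1]:
            buckets.append(current)
            current = []
        current.append((v, c))
    if current:
        buckets.append(current)
    return buckets

ages = [19, 27, 29, 35, 38, 39, 40]
magazine = ["N", "Y", "N", "Y", "N", "Y", "Y"]
for bucket in discretize(ages, magazine):
    print(bucket)
```

And for question 2, a minimal sketch of 1-nearest-neighbor over nominal attributes, counting mismatches as the hint suggests; the three training rows are illustrative, not the full dataset from the last page.

```python
# 1-NN with a mismatch count (the "city-block" idea applied to nominals).
def distance(a, b):
    return sum(1 for x, y in zip(a, b) if x != y)

training = [  # (buying-price, maintenance, persons, safety) -> recommendation
    (("high", "med", "4", "high"), "good"),
    (("med", "med", "more", "med"), "acc"),
    (("low", "high", "2", "high"), "unacc"),
]
query = ("med", "med", "4", "med")
nearest = min(training, key=lambda row: distance(row[0], query))
print(nearest[1])  # prediction = class of the single closest instance
```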

3. Consider the dataset on the last page of the exam, and the following partial summary of 1R results:

   maintenance:  high  unacc    (4 out of 5)
                 med   good *   (2 out of 5)    Overall: 7/12 = 58%
                 low   acc *    (1 out of 2)

   persons:      2     unacc    (3 out of 3)
                 4     acc      (2 out of 5)    Overall: 7/12 = 58%
                 more  unacc    (2 out of 4)

   safety:       high  good *   (2 out of 4)
                 med   unacc    (3 out of 5)    Overall: 7/12 = 58%
                 low   unacc    (2 out of 3)

   a. Show the rules that 1R generates for buying-price, and compute the overall accuracy as a percentage.
   b. Which attribute's rules will be selected by 1R, and why?

4. Using the results from problem #3, draw the first level (only) of a decision tree for the dataset. Show all of your work and carefully explain the decisions you make.
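A minimal sketch of the 1R scoring step for a single attribute, run here on the maintenance column of the dataset from the last page. Ties (marked '*' in the summary above) fall to whichever class was seen first, so the tie-break is arbitrary, just as in the exam's summary.

```python
# 1R for one attribute: predict the majority class per attribute value,
# then count how many training instances those rules get right.
from collections import Counter, defaultdict

maintenance = ["med", "med", "high", "high", "high", "high", "med",
               "high", "low", "med", "med", "low"]
recommendation = ["good", "unacc", "unacc", "unacc", "acc", "unacc",
                  "acc", "unacc", "acc", "unacc", "good", "unacc"]

by_value = defaultdict(list)
for value, cls in zip(maintenance, recommendation):
    by_value[value].append(cls)

correct = 0
for value, classes in by_value.items():
    majority, count = Counter(classes).most_common(1)[0]
    correct += count
    print(f"{value} -> {majority} ({count} out of {len(classes)})")
print(f"Overall: {correct}/{len(maintenance)} = {correct / len(maintenance):.0%}")
```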

5. Consider the dataset on the last page of the exam, and the following partial set of possible tests in the first iteration of the PRISM algorithm:

   buying-price = high   ____
   buying-price = low    ____
   buying-price = med    ____
   maintenance = high    0/5
   maintenance = low     0/2
   maintenance = med     2/5
   persons = 2           0/3
   persons = 4           1/5
   persons = more        1/4
   safety = high         2/4
   safety = low          0/3
   safety = med          0/5

   Generate one rule for recommendation = good using PRISM. Start by completing the ratios for buying-price above. If the rule is complete, stop. If the rule is not complete, finish it. Show all of your work. Explain how you know your rule is finished. (For extra credit, also indicate whether or not another rule is necessary for recommendation = good, and explain.)
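A minimal sketch of PRISM's test-selection step, assuming the usual criterion: maximize the ratio p/t of positive instances covered to total instances covered, breaking ties in favor of larger p. Only the ratios the exam already supplies appear below; completing the buying-price ratios is part of the question, so they are left out.

```python
# One PRISM selection step for the class recommendation = good.
candidates = {                       # test -> (p, t)
    "maintenance = high": (0, 5),
    "maintenance = low":  (0, 2),
    "maintenance = med":  (2, 5),
    "persons = 2":        (0, 3),
    "persons = 4":        (1, 5),
    "persons = more":     (1, 4),
    "safety = high":      (2, 4),
    "safety = low":       (0, 3),
    "safety = med":       (0, 5),
}

test, (p, t) = max(candidates.items(),
                   key=lambda kv: (kv[1][0] / kv[1][1], kv[1][0]))
print(f"best test: {test} ({p}/{t})")
# A rule is finished when p/t == 1: every instance it covers is positive.
# Otherwise PRISM restricts attention to the covered instances and repeats.
```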

6. Eric Siegel introduces the concept of ensemble learning via the example of the Netflix Prize contest.
   a. Define ensemble learning in the context of data mining.
   b. How was it fostered by the Netflix contest?

7. The table below contains counts and ratios for a set of data instances to be used for Naïve Bayesian learning. The output attribute is sex, with possible values male and female. Consider an individual who has said no to the life insurance promotion, yes to the magazine promotion, yes to the watch promotion, and has credit card insurance. Use the values in the table to determine the probability that this individual is male, and the accompanying probability that this individual is female. Give your final answers as normalized percentages and show all work.

                  magazine       watch          life insurance   credit card
                  promotion      promotion      promotion        insurance       sex (total)
                  male  female   male  female   male  female     male  female    male  female
   Yes            4     3        2     2        2     3          2     1         6     4
   No             2     1        4     2        4     1          4     3
   Yes (ratio)    4/6   3/4      2/6   2/4      2/6   3/4        2/6   1/4       6/10  4/10
   No (ratio)     2/6   1/4      4/6   2/4      4/6   1/4        4/6   3/4

   Probability the individual is male: ______________
   Probability the individual is female: ______________
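For the mechanics of question 7, a minimal sketch of the computation: multiply each class's prior by the conditional ratios the table gives for this individual (magazine = yes, watch = yes, life insurance = no, credit card insurance = yes), then normalize the two scores so they sum to 100%.

```python
# Naive Bayes scoring and normalization. Ratio order in each list:
# magazine = yes, watch = yes, life insurance = no, credit card = yes, prior.
from math import prod

ratios = {
    "male":   [4/6, 2/6, 4/6, 2/6, 6/10],
    "female": [3/4, 2/4, 1/4, 1/4, 4/10],
}

scores = {sex: prod(vals) for sex, vals in ratios.items()}
total = sum(scores.values())
for sex, score in scores.items():
    print(f"Probability the individual is {sex}: {score / total:.1%}")
```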

ATTRIBUTES:      POSSIBLE VALUES:
buying-price     {high, med, low}
maintenance      {high, med, low}
persons          {2, 4, more}          % Assumed to be a nominal attribute
safety           {low, med, high}
recommendation   {unacc, acc, good}    % unacceptable, acceptable, good

buying-price   maintenance   persons   safety   recommendation
high           med           4         high     good
low            med           2         med      unacc
low            high          2         high     unacc
low            high          more      med      unacc
med            high          4         low      acc
high           high          4         med      unacc
med            med           more      med      acc
med            high          more      low      unacc
med            low           4         med      acc
high           med           4         low      unacc
low            med           more      high     good
low            low           2         high     unacc
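The same twelve instances, encoded as Python tuples in the representation the earlier sketches assume; the values are exactly those in the table, while the tuple layout itself is an assumption of convenience.

```python
# Exam dataset: (buying-price, maintenance, persons, safety) -> recommendation.
DATASET = [
    (("high", "med", "4", "high"), "good"),
    (("low", "med", "2", "med"), "unacc"),
    (("low", "high", "2", "high"), "unacc"),
    (("low", "high", "more", "med"), "unacc"),
    (("med", "high", "4", "low"), "acc"),
    (("high", "high", "4", "med"), "unacc"),
    (("med", "med", "more", "med"), "acc"),
    (("med", "high", "more", "low"), "unacc"),
    (("med", "low", "4", "med"), "acc"),
    (("high", "med", "4", "low"), "unacc"),
    (("low", "med", "more", "high"), "good"),
    (("low", "low", "2", "high"), "unacc"),
]
```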