CSC-272 Exam #2 March 20, 2015


Name _______________________________

Questions are weighted as indicated. Show your work and state your assumptions for partial credit consideration. Unless explicitly stated, there are NO intended errors and NO trick questions. If in doubt, ask! You have 50 minutes to work. Now, take a moment to relax. If you don't immediately see how to do something, THINK! Don't panic!

Multiple Choice (2 points each):

1. Which of the following is true about data mining?
   a. simple algorithms sometimes work surprisingly well
   b. different approaches work better for different data
   c. successful data mining usually involves trying a number of approaches in a series of experiments
   d. all of the above
   e. none of the above

2. Which of these have the potential to result in overfitting?
   a. attributes with a large number of values in 1R
   b. inclusion of an identifying (ID) attribute in any algorithm
   c. too many rules in PRISM
   d. a decision tree with too many leaves
   e. redundant attributes in Naïve Bayes or nearest neighbor
   f. all of the above

3. Which of the following is true about the OneR algorithm?
   a. it considers exactly one attribute
   b. it chooses exactly one attribute to use in making predictions during tests
   c. it works especially well for attributes with many possible values
   d. all of the above

4. A nearest neighbor approach is best used
   a. with large-sized datasets
   b. when irrelevant attributes have been removed from the data
   c. when a generalized model of the data is desirable
   d. with noisy data

5. Which of the following is true of the Naïve Bayes algorithm?
   a. it considers exactly one attribute
   b. it cannot handle numeric values for input attributes
   c. it is able to make numeric predictions
   d. it easily accommodates missing data in training examples
   e. all of the above

6. Which statement is true about the decision tree attribute selection process described in lecture?
   a. a nominal attribute may appear in a tree node several times but a numeric attribute may appear at most once
   b. a numeric attribute may appear in several tree nodes but a nominal attribute may appear at most once
   c. both numeric and nominal attributes may appear in several tree nodes
   d. numeric and nominal attributes may appear in at most one tree node

7. Which of the following is true of the PRISM algorithm?
   a. it generates exactly one rule for every value of the class attribute
   b. it sometimes requires the use of the probability density function
   c. it is an example of a lazy classification algorithm
   d. it generates a rule by adding tests that maximize accuracy while reducing coverage

8. Eric Siegel hypothesizes that data mining was unable to predict the 2008 financial crisis because, among other reasons:
   a. instances of such rare events appear rarely in existing datasets
   b. data mining algorithms don't work well with financial data
   c. it could have, but no one was attempting to predict such an event
   d. insufficient data pertaining to individual investors has been collected

Fill in the Blank (2 points each blank): Use the following list of terms to fill in the blanks with the best possible term. More than one answer might be justifiable. Resulting sentences are not necessarily grammatically correct.

ZeroR, decision tree algorithm, nearest neighbor, OneR, Naïve Bayes, covering, discretize, bucket, overfitting, classification, baseline, instance-based, association, numeric estimation, clustering, rules, a priori, a posteriori, correlation, causation, Anxiety Index, probability density function, Laplace estimator, normal distribution, redundant attributes, rule goodness, Euclidean distance, lazy method, normalization, univariate model, multivariate model, ensemble learning

________ doesn't work well if many of the input attributes have fewer possible values than the class/output attribute does.

________ assumes that all attributes are statistically independent, whether or not they actually are.

________ is used to solve the zero-frequency problem.

Numerical values in a dataset must be ________ before the nearest neighbor algorithm can work correctly.

PRISM is an example of a ________ classification algorithm.

The nearest neighbor algorithm is an example of ________ learning.

If the accuracy of the ________ classification algorithm is very high, you probably should find a dataset with more balance in class attribute values.

________ is the result of data mining blog posts, and correlates with stock market performance.
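One of these terms is mechanical enough to demonstrate in a few lines. Below is a minimal sketch of the zero-frequency problem and the Laplace estimator that fixes it; the counts are made up for illustration, not taken from the exam.

```python
# Laplace estimator sketch: add k to every count so an attribute value never
# seen with a class still gets a nonzero probability, instead of zeroing out
# the entire Naive Bayes product. Counts below are hypothetical.
def laplace(count, total, num_values, k=1):
    return (count + k) / (total + k * num_values)

print(0 / 5)              # unsmoothed estimate: the zero-frequency problem
print(laplace(0, 5, 3))   # smoothed: (0 + 1) / (5 + 1 * 3) = 0.125
```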

Short Answer (8 points each): Give (relatively) short answers to the following questions. You must omit any one question by writing OMIT clearly in the space provided.

1. Consider the following excerpt of data from the Credit Card Promotion dataset. In this case the class attribute is Magazine, which indicates whether or not the credit card holder responded to a magazine promotion.

   Age:      19 27 29 35 38 39 40 41 42 43 43 43 45 55 55 57
   Magazine:  N  Y  N  Y  N  Y  Y  N  Y  Y  Y  N  N  Y  N  N

   a. Discretize the Age attribute using a minimum bucket size of 3. Clearly show the resulting buckets. (Work from left to right.)
   b. Explain what would happen if you discretized Age with no minimum bucket size.

2. Consider the dataset on the last page of the exam, and the following new instance:

   buying-price   maintenance   persons   safety   recommendation
   med            med           4         med      ???

   How would the nearest-neighbor algorithm predict the recommendation, with k=1? Explain your answer. HINT: Since all of the attributes are nominal, I suggest you forgo the Euclidean distance formula for the city-block metric. That is, you don't need to square anything or take the square root of anything.
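A minimal sketch of the discretization in question 1, assuming one simple left-to-right variant: fill each bucket to the minimum size, then keep extending it while the next instance's class matches the last one's. Whether this matches the lecture's exact procedure is an assumption, and the sketch deliberately ignores equal adjacent Age values (such as the three 43s above), which a full answer must handle; the data below is just the first seven instances.

```python
# Left-to-right discretization with a minimum bucket size (simplified sketch;
# equal adjacent attribute values get no special treatment here).
def discretize(values, classes, min_bucket=3):
    buckets, current = [], []
    for v, c in zip(values, classes):
        # Close the bucket once it holds at least min_bucket instances and
        # the incoming class differs from the class of the last instance.
        if len(current) >= min_bucket and c != current[-1][1]:
            buckets.append(current)
            current = []
        current.append((v, c))
    if current:
        buckets.append(current)
    return buckets

ages = [19, 27, 29, 35, 38, 39, 40]
magazine = ["N", "Y", "N", "Y", "N", "Y", "Y"]
for bucket in discretize(ages, magazine):
    print(bucket)
```

And for question 2, a minimal sketch of 1-nearest-neighbor over nominal attributes, counting mismatches as the hint suggests; the three training rows are illustrative, not the full dataset from the last page.

```python
# 1-NN with a mismatch count (the "city-block" idea applied to nominals).
def distance(a, b):
    return sum(1 for x, y in zip(a, b) if x != y)

training = [  # (buying-price, maintenance, persons, safety) -> recommendation
    (("high", "med", "4", "high"), "good"),
    (("med", "med", "more", "med"), "acc"),
    (("low", "high", "2", "high"), "unacc"),
]
query = ("med", "med", "4", "med")
nearest = min(training, key=lambda row: distance(row[0], query))
print(nearest[1])  # prediction = class of the single closest instance
```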

3. Consider the dataset on the last page of the exam, and the following partial summary of 1R results:

   maintenance:  high  unacc    (4 out of 5)
                 med   good *   (2 out of 5)    Overall: 7/12 = 58%
                 low   acc *    (1 out of 2)

   persons:      2     unacc    (3 out of 3)
                 4     acc      (2 out of 5)    Overall: 7/12 = 58%
                 more  unacc    (2 out of 4)

   safety:       high  good *   (2 out of 4)
                 med   unacc    (3 out of 5)    Overall: 7/12 = 58%
                 low   unacc    (2 out of 3)

   a. Show the rules that 1R generates for buying-price, and compute the overall accuracy as a percentage.
   b. Which attribute's rules will be selected by 1R, and why?

4. Using the results from problem #3, draw the first level (only) of a decision tree for the dataset. Show all of your work and carefully explain the decisions you make.
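A minimal sketch of the 1R scoring step for a single attribute, run here on the maintenance column of the dataset from the last page. Ties (marked '*' in the summary above) fall to whichever class was seen first, so the tie-break is arbitrary, just as in the exam's summary.

```python
# 1R for one attribute: predict the majority class per attribute value,
# then count how many training instances those rules get right.
from collections import Counter, defaultdict

maintenance = ["med", "med", "high", "high", "high", "high", "med",
               "high", "low", "med", "med", "low"]
recommendation = ["good", "unacc", "unacc", "unacc", "acc", "unacc",
                  "acc", "unacc", "acc", "unacc", "good", "unacc"]

by_value = defaultdict(list)
for value, cls in zip(maintenance, recommendation):
    by_value[value].append(cls)

correct = 0
for value, classes in by_value.items():
    majority, count = Counter(classes).most_common(1)[0]
    correct += count
    print(f"{value} -> {majority} ({count} out of {len(classes)})")
print(f"Overall: {correct}/{len(maintenance)} = {correct / len(maintenance):.0%}")
```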

5. Consider the dataset on the last page of the exam, and the following partial set of possible tests in the first iteration of the PRISM algorithm:

   buying-price = high   ____
   buying-price = low    ____
   buying-price = med    ____
   maintenance = high    0/5
   maintenance = low     0/2
   maintenance = med     2/5
   persons = 2           0/3
   persons = 4           1/5
   persons = more        1/4
   safety = high         2/4
   safety = low          0/3
   safety = med          0/5

   Generate one rule for recommendation = good using PRISM. Start by completing the ratios for buying-price above. If the rule is complete, stop. If the rule is not complete, finish it. Show all of your work. Explain how you know your rule is finished. (For extra credit, also indicate whether or not another rule is necessary for recommendation = good, and explain.)
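A minimal sketch of PRISM's test-selection step, assuming the usual criterion: maximize the ratio p/t of positive instances covered to total instances covered, breaking ties in favor of larger p. Only the ratios the exam already supplies appear below; completing the buying-price ratios is part of the question, so they are left out.

```python
# One PRISM selection step for the class recommendation = good.
candidates = {                       # test -> (p, t)
    "maintenance = high": (0, 5),
    "maintenance = low":  (0, 2),
    "maintenance = med":  (2, 5),
    "persons = 2":        (0, 3),
    "persons = 4":        (1, 5),
    "persons = more":     (1, 4),
    "safety = high":      (2, 4),
    "safety = low":       (0, 3),
    "safety = med":       (0, 5),
}

test, (p, t) = max(candidates.items(),
                   key=lambda kv: (kv[1][0] / kv[1][1], kv[1][0]))
print(f"best test: {test} ({p}/{t})")
# A rule is finished when p/t == 1: every instance it covers is positive.
# Otherwise PRISM restricts attention to the covered instances and repeats.
```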

6. Eric Siegel introduces the concept of ensemble learning via the example of the Netflix Prize contest.
   a. Define ensemble learning in the context of data mining.
   b. How was it fostered by the Netflix contest?

7. The table below contains counts and ratios for a set of data instances to be used for Naïve Bayesian learning. The output attribute is sex, with possible values male and female. Consider an individual who has said no to the life insurance promotion, yes to the magazine promotion, yes to the watch promotion, and has credit card insurance. Use the values in the table to determine the probability that this individual is male, and the accompanying probability that this individual is female. Give your final answers as normalized percentages and show all work.

                  magazine       watch          life insurance   credit card
                  promotion      promotion      promotion        insurance       sex (total)
                  male  female   male  female   male  female     male  female    male  female
   Yes            4     3        2     2        2     3          2     1         6     4
   No             2     1        4     2        4     1          4     3
   Yes (ratio)    4/6   3/4      2/6   2/4      2/6   3/4        2/6   1/4       6/10  4/10
   No (ratio)     2/6   1/4      4/6   2/4      4/6   1/4        4/6   3/4

   Probability the individual is male: ______________
   Probability the individual is female: ______________
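For the mechanics of question 7, a minimal sketch of the computation: multiply each class's prior by the conditional ratios the table gives for this individual (magazine = yes, watch = yes, life insurance = no, credit card insurance = yes), then normalize the two scores so they sum to 100%.

```python
# Naive Bayes scoring and normalization. Ratio order in each list:
# magazine = yes, watch = yes, life insurance = no, credit card = yes, prior.
from math import prod

ratios = {
    "male":   [4/6, 2/6, 4/6, 2/6, 6/10],
    "female": [3/4, 2/4, 1/4, 1/4, 4/10],
}

scores = {sex: prod(vals) for sex, vals in ratios.items()}
total = sum(scores.values())
for sex, score in scores.items():
    print(f"Probability the individual is {sex}: {score / total:.1%}")
```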

ATTRIBUTES:      POSSIBLE VALUES:
buying-price     {high, med, low}
maintenance      {high, med, low}
persons          {2, 4, more}          % Assumed to be a nominal attribute
safety           {low, med, high}
recommendation   {unacc, acc, good}    % unacceptable, acceptable, good

buying-price   maintenance   persons   safety   recommendation
high           med           4         high     good
low            med           2         med      unacc
low            high          2         high     unacc
low            high          more      med      unacc
med            high          4         low      acc
high           high          4         med      unacc
med            med           more      med      acc
med            high          more      low      unacc
med            low           4         med      acc
high           med           4         low      unacc
low            med           more      high     good
low            low           2         high     unacc
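The same twelve instances, encoded as Python tuples in the representation the earlier sketches assume; the values are exactly those in the table, while the tuple layout itself is an assumption of convenience.

```python
# Exam dataset: (buying-price, maintenance, persons, safety) -> recommendation.
DATASET = [
    (("high", "med", "4", "high"), "good"),
    (("low", "med", "2", "med"), "unacc"),
    (("low", "high", "2", "high"), "unacc"),
    (("low", "high", "more", "med"), "unacc"),
    (("med", "high", "4", "low"), "acc"),
    (("high", "high", "4", "med"), "unacc"),
    (("med", "med", "more", "med"), "acc"),
    (("med", "high", "more", "low"), "unacc"),
    (("med", "low", "4", "med"), "acc"),
    (("high", "med", "4", "low"), "unacc"),
    (("low", "med", "more", "high"), "good"),
    (("low", "low", "2", "high"), "unacc"),
]
```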