Data Mining Midterm Exam

Date: 10.04.2014

First name:                Last name:
Student number:            Signature:

Instructions for Students

Write your name, student number, and signature on the exam sheet. The whole midterm exam lasts 1 hour and 30 minutes. This is a closed-book exam. The only resources you may use are blank paper, pens, and your head. Good luck!

Reserved for the Teacher
Max. points: 15
Points:

Multiple Choice Questions (4 points)

1. Assume that one of the attributes describing students is Exam, which can take one of two values: pass or fail. If we want to put more emphasis on students who passed the exam when analyzing the data, what should the type of the Exam attribute be?
   [ ] symmetric binary
   [ ] asymmetric binary
   [ ] nominal

2. Normalization is used when attribute values are measured on different scales, to convert them to a common scale. The goal is to prevent one attribute from dominating the others when computing the similarity between data objects. For example, the age attribute ranges from 0 to 100, while incomes are on the order of thousands of euros. What problem might you encounter when normalizing?
   [ ] sensitivity to outliers
   [ ] biased similarity measures
   [ ] unbounded values

3. Imagine you are working in a bank and you are asked to manage loan applications. Your task is to select the criteria that the financial committee should take into account when deciding whether to approve a loan. By analyzing past loan applications, you describe each applicant by a set of features and assign him/her a class label: successful or failed. Which of the following techniques helps you achieve your task?
   [ ] Bayesian classifiers
   [ ] Nearest Neighbor classifiers
   [ ] Decision trees

4. Imagine you are responsible for setting the price of a new mid-range Italian product in the Chinese market. The first thing you need to do is analyze the average amount that Chinese customers spend on similar products. However, the data about such customers is huge, so you need to base your pricing decision on samples. The price you set would be more suitable when:
   [ ] the sample contains customers with representative incomes
   [ ] the sample does not contain customers of high-quality products and customers of low-quality products
   [ ] the sample contains customers with representative spending behaviors
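The scale problem described in question 2 can be illustrated with min-max normalization, one common normalization method. The sketch below is only an illustration: the income figures are hypothetical and the function name `min_max` is not part of the exam.

```python
def min_max(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical monthly incomes in euros (assumed data, not from the exam).
incomes = [1000, 1500, 2000, 2500, 3000]
# The same incomes plus one extreme value.
with_outlier = incomes + [100000]

print(min_max(incomes))       # values spread evenly over [0, 1]
print(min_max(with_outlier))  # the five ordinary incomes are squeezed close to 0
```

With the extreme value present, the ordinary incomes occupy only a small part of the [0, 1] range, so their differences contribute very little to any distance computed on the normalized attribute.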

Classification Algorithms (5 points)

1. Briefly, what are the main steps to build a Bayesian network?

2. In decision trees, attribute selection techniques judge the goodness of a nominal attribute by the purity of the partitions it induces. Each partition contains only tuples that have the same value of that attribute. Explain how attribute selection techniques deal with numeric attributes.

3. Some nonlinear regression models can be converted to linear models by applying transformations to the predictor variables. Show how the nonlinear regression equation y = αx^β can be converted to a linear regression equation solvable by the method of least squares.

4. Given the data in the table below, and given a data tuple having the values systems, 26...30, and 46K...50K for the attributes department, age, and salary, respectively, what would a Naive Bayesian classification of the status for the tuple be?

   department   status   age       salary      count
   sales        senior   31...35   46K...50K   30
   sales        junior   26...30   26K...30K   40
   sales        junior   31...35   31K...35K   40
   systems      junior   21...25   46K...50K   20
   systems      senior   31...35   66K...70K   5
   systems      junior   26...30   46K...50K   3
   systems      senior   41...45   66K...70K   3
   marketing    senior   36...40   46K...50K   10
   marketing    junior   31...35   41K...45K   4
   secretary    senior   46...50   36K...40K   4
   secretary    junior   26...30   26K...30K   6
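The Naive Bayes computation asked for in question 4 can be sketched as follows. This is one possible way to carry it out, not an official solution: it reads the tuple's salary as the 46K...50K band, uses the raw counts without any smoothing, and the function name `naive_bayes_scores` is chosen here for illustration.

```python
from fractions import Fraction

# (department, status, age, salary, count) rows from the exam table.
ROWS = [
    ("sales", "senior", "31...35", "46K...50K", 30),
    ("sales", "junior", "26...30", "26K...30K", 40),
    ("sales", "junior", "31...35", "31K...35K", 40),
    ("systems", "junior", "21...25", "46K...50K", 20),
    ("systems", "senior", "31...35", "66K...70K", 5),
    ("systems", "junior", "26...30", "46K...50K", 3),
    ("systems", "senior", "41...45", "66K...70K", 3),
    ("marketing", "senior", "36...40", "46K...50K", 10),
    ("marketing", "junior", "31...35", "41K...45K", 4),
    ("secretary", "senior", "46...50", "36K...40K", 4),
    ("secretary", "junior", "26...30", "26K...30K", 6),
]


def naive_bayes_scores(rows, query):
    """Return P(status) * product of P(attribute = value | status) per status.

    query maps a column index (0 = department, 2 = age, 3 = salary) to the
    value observed in the tuple to classify. No smoothing is applied.
    """
    total = sum(r[4] for r in rows)
    scores = {}
    for status in sorted({r[1] for r in rows}):
        in_class = [r for r in rows if r[1] == status]
        n = sum(r[4] for r in in_class)
        score = Fraction(n, total)  # class prior
        for col, value in query.items():
            match = sum(r[4] for r in in_class if r[col] == value)
            score *= Fraction(match, n)  # class-conditional probability
        scores[status] = score
    return scores


query = {0: "systems", 2: "26...30", 3: "46K...50K"}
scores = naive_bayes_scores(ROWS, query)
prediction = max(scores, key=scores.get)
print({k: float(v) for k, v in scores.items()}, prediction)
```

Without smoothing, no senior tuple falls in the 26...30 age band, so the senior score is exactly zero and the tuple is labeled junior. A Laplace correction would change the numbers but is a separate design choice not shown here.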

Problem Solving (6 points)

1. Assume we have two classes: positive opinion and negative opinion.
   (a) Describe how you would classify a sentence into these two classes.
   (b) Based on your approach, how would you classify the following sentences?
       Sentence 1: The company is doing a good job by hiring people with excellent competencies.
       Sentence 2: Employees are competent, but they work in bad conditions, which often leads them to depression.
   (c) Do you encounter any problems with the following sentences?
       Sentence 3: Marco is very serious in his work.
       Sentence 4: Maria's health condition is serious.
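One simple approach to problem 1 is a lexicon-based classifier that counts positive and negative words. The sketch below is only one possible answer: the word lists are a hypothetical toy lexicon invented for this illustration, and the exam does not prescribe any particular method.

```python
import re

# Hypothetical toy lexicon (an assumption, not from the exam).
POSITIVE = {"good", "excellent", "competent"}
NEGATIVE = {"bad", "depression"}


def classify(sentence):
    """Label a sentence by counting positive words minus negative words."""
    words = re.findall(r"[a-z']+", sentence.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "undecided"


SENTENCES = [
    "the company is doing a good job by hiring people with excellent competencies",
    "employees are competent but they work in bad conditions which often leads them to depression",
    "Marco is very serious in his work",
    "Maria's health condition is serious",
]

for s in SENTENCES:
    print(classify(s), "<-", s)
```

With this lexicon the first sentence comes out positive and the second negative, while the last two stay undecided. The word "serious" is the difficulty: it is favorable when describing someone's work and unfavorable when describing a health condition, so a context-free word list cannot assign it a single polarity.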

2. Assume the weather changes between sunny and rainy in a random way.
   (a) Draw a Markov model that represents the situation, and give the prior distribution over the states as well as the transition matrix.
   (b) You observe that 70% of people are in a good mood when it is sunny and 40% of people are in a bad mood when it is raining. Assuming that the current mood does not depend on the previous mood, how would you update your model to capture mood changes?
   (c) Draw the corresponding Bayesian network for the first three days.
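The model in problem 2 can be sketched numerically as a two-state Markov chain over the weather with the mood as an observation that depends only on the current weather. The prior and the transition probabilities below are assumed values chosen for illustration, since the exam leaves them open. Only the mood probabilities (70% good when sunny, 40% bad when rainy) come from the problem statement.

```python
STATES = ("sunny", "rainy")
MOODS = ("good", "bad")

# Assumed prior and transition matrix (hypothetical values).
PRIOR = {"sunny": 0.5, "rainy": 0.5}
TRANS = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.4, "rainy": 0.6},
}
# Mood given weather, from the problem statement.
EMIT = {
    "sunny": {"good": 0.7, "bad": 0.3},
    "rainy": {"good": 0.6, "bad": 0.4},
}


def step(dist):
    """Advance the weather distribution by one day."""
    return {t: sum(dist[s] * TRANS[s][t] for s in STATES) for t in STATES}


def mood_dist(dist):
    """Marginal mood distribution for a given weather distribution."""
    return {m: sum(dist[s] * EMIT[s][m] for s in STATES) for m in MOODS}


# Weather and mood distributions for the first three days.
history = []
dist = PRIOR
for day in range(1, 4):
    history.append((day, dist, mood_dist(dist)))
    dist = step(dist)

for day, weather, mood in history:
    print(day, weather, mood)
```

The code mirrors the structure of the corresponding Bayesian network: each day's weather depends only on the previous day's weather, and each day's mood depends only on that day's weather.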