# A Two-Step Data Mining Approach for Graduation Outcomes (2013 CAIR Conference)

## Transcription

1 A Two-Step Data Mining Approach for Graduation Outcomes

2013 CAIR Conference

Afshin Karimi, Ed Sullivan, James Hershey, Sunny Moon

November 21, 2013

2 Data Mining

The science of extracting patterns and knowledge from large data sets to predict future trends and behavior.

- Supervised learning
- Unsupervised learning

3 Two-Step Process

- Step 1: a classification decision tree model to predict six-year graduation of first-time freshmen (FTF) (supervised learning)
- Step 2: cluster analysis (K-Means clustering) on the identified at-risk students to reveal patterns and suggest cluster-level interventions (unsupervised learning)

4 Classification Model Using Decision Tree

- Decision tree vs. neural networks, logistic regression, SVM, etc.
- Decision trees are easy to understand, implement, and visualize

5 Decision Trees Continued

- Used in different disciplines, including Operations Research
- Inverted trees with the root at the top; used to create a model that predicts the target variable
- Generated by recursive partitioning
- An example of a node selection criterion is Information Gain (C5.0), which selects as the node the variable that leaves the least entropy with respect to the target variable
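The node selection criterion on this slide can be illustrated with a small sketch. It computes Shannon entropy and information gain for two candidate split variables; the records and field names (`first_term_fail`, `gender`, `grad`) are hypothetical and are not the study's data, and the code is not a reproduction of C5.0.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy left after the split."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

# Hypothetical toy records, for illustration only.
rows = [
    {"first_term_fail": "yes", "gender": "F", "grad": "no"},
    {"first_term_fail": "yes", "gender": "M", "grad": "no"},
    {"first_term_fail": "no", "gender": "F", "grad": "yes"},
    {"first_term_fail": "no", "gender": "M", "grad": "yes"},
]
gains = {a: information_gain(rows, a, "grad") for a in ("first_term_fail", "gender")}
best = max(gains, key=gains.get)  # variable chosen for the node
```

In this toy data the split on `first_term_fail` leaves no entropy in the target, so it would be picked over `gender`, which leaves the target as mixed as before.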

6 Example decision tree

Play tennis or not? (depending on weather conditions)

- Each internal node tests an attribute
- Each branch corresponds to an attribute value
- Each leaf assigns a classification

Tree:

- Outlook
  - Sunny: Humidity
    - High: No
    - Normal: Yes
  - Overcast: Yes
  - Rainy: Wind
    - Strong: No
    - Weak: Yes

Example taken from Kurt Driessens' slides
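One way to make the example concrete is to encode the tree as nested dictionaries and walk it for a given day. This is only a sketch: it assumes the flattened slide follows the standard play-tennis layout (Sunny tests Humidity, Rainy tests Wind, Overcast is always Yes), and the names `TREE` and `classify` are made up for the illustration.

```python
# An internal node maps an attribute name to {attribute value: subtree};
# a leaf is simply the class label ("Yes" or "No").
TREE = {
    "Outlook": {
        "Sunny": {"Humidity": {"High": "No", "Normal": "Yes"}},
        "Overcast": "Yes",
        "Rainy": {"Wind": {"Strong": "No", "Weak": "Yes"}},
    }
}

def classify(tree, example):
    """Walk from the root to a leaf, following the branch for each tested attribute."""
    while isinstance(tree, dict):
        attribute, branches = next(iter(tree.items()))
        tree = branches[example[attribute]]
    return tree

day = {"Outlook": "Sunny", "Humidity": "Normal", "Wind": "Strong"}
decision = classify(TREE, day)
```

Only the attributes along the path are consulted: for a sunny day the wind value is never tested.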

7 Overfitting

- The generated decision tree relies too much on irrelevant features of the training data.
- The generated model performs poorly on future/unseen data.
- To reduce overfitting, use pruning (a technique in which leaf nodes that do not add to the discriminative power of the decision tree are removed).

8 Training/Building the Tree

- Using 24 predictor variables:
  - 12 socio-economic, demographic, and HS performance variables
  - 12 first-term college variables
  - All converted to nominal variables
- 1 target variable: 6 Yr Degree (with Yes/No values)
- Using the fall 03, 04, 05, 06 FTF cohorts for training

9 Predictor Variables

- Gender
- Under-Represented Status
- Residence (county)
- Parents' Education
- HS GPA
- # of College Prep Math Courses Passed in HS
- # of College Prep Science Courses Passed in HS
- # of College Prep Social Science Courses Passed in HS
- # of College Prep Art Courses Passed in HS
- SAT Math
- SAT Verbal
- Prior Institution Type
- Admission Basis Code
- Pell Grant Recipient
- Freshman Program Participation
- College (Entry)
- Entry Level Math Proficiency
- English Proficiency
- Degree-Applicable Units Earned in First Semester
- F, D or WU Grade in 1st Semester
- First Term GPA
- Math Course (1st term)
- English Course (1st term)

10 Model Validation & Testing

- Total of 14,152 records from the fall 03, 04, 05, 06 cohorts (records with missing HS GPAs or SATs excluded) for model training
- Random 1,000 records removed and set aside for future testing
- Remaining 13,152 records used for training/validation using 5-fold cross-validation

11-15 5-Fold Cross Validation

In each of the five runs, 2,630 records are held out for validation and the remaining 10,522 records are used for training; the held-out fold changes from run to run.
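The rotation of the held-out fold can be sketched in a few lines. This is a generic illustration of k-fold splitting, not RapidMiner's implementation; the function name `k_fold_indices` and the fixed seed are assumptions. Because 13,152 is not divisible by 5, the exact fold sizes are 2,630 or 2,631, so the 2,630 / 10,522 figures on the slides are approximate.

```python
import random

def k_fold_indices(n_records, k=5, seed=0):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation

splits = list(k_fold_indices(13_152, k=5))
sizes = [(len(train), len(validation)) for train, validation in splits]
```

Every record is used for validation exactly once and for training in the other four runs; the reported accuracy is then the average over the five runs.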

16 Model's Accuracy

Classification accuracy is the average accuracy of the 5 runs:

- Classification accuracy: 66.4%
- Sensitivity (true positive rate): 72.4%
- Specificity (true negative rate): 60.3%

17 RapidMiner 5.0

19 Relevance (weights) of the variables based on the Information Gain Ratio

Variable, with normalized weight where shown:

1. F, D or WU Grade in 1st Semester
2. Degree-Applicable Units Earned in First Semester
3. First Term GPA
4. Math Course (1st term)
5. Admission Basis Code
6. HS GPA: 0.01
7. Gender
8. Freshman Program Participation
9. Entry Level Math Proficiency
10. English Course (1st term)
11. Under-Represented Status
12. # of College Prep Math Courses Passed in HS
13. English Proficiency
14. College (Entry)
15. Parents' Education
16. SAT Verbal
17. Pell Grant Recipient
18. SAT Math
19. Prior Institution Type
20. Residence (county)
21. # of College Prep Social Science Courses Passed in HS
22. # of College Prep Science Courses Passed in HS
23. # of College Prep Art Courses Passed in HS: 0.001
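For readers unfamiliar with the weighting criterion, the sketch below computes an information gain ratio (information gain divided by the entropy of the split itself) for two made-up variables. The data, the variable names, and the choice to normalize by dividing by the largest weight are all assumptions for illustration; the slide does not say how the tool normalized its weights.

```python
from collections import Counter
from math import log2

def entropy(values):
    """Shannon entropy (in bits) of a list of nominal values."""
    total = len(values)
    return sum(-(n / total) * log2(n / total) for n in Counter(values).values())

def gain_ratio(attribute_values, labels):
    """Information gain of the split divided by the entropy of the split itself."""
    n = len(labels)
    remainder = 0.0
    for value in set(attribute_values):
        subset = [l for a, l in zip(attribute_values, labels) if a == value]
        remainder += len(subset) / n * entropy(subset)
    gain = entropy(labels) - remainder
    split_info = entropy(attribute_values)
    return gain / split_info if split_info > 0 else 0.0

# Hypothetical toy columns, not the study's data.
graduated = ["no", "no", "yes", "yes", "yes", "no"]
columns = {
    "failed_first_term": ["y", "y", "n", "n", "n", "y"],
    "gender": ["F", "M", "F", "M", "F", "M"],
}
weights = {name: gain_ratio(col, graduated) for name, col in columns.items()}
top = max(weights.values())
normalized = {name: w / top for name, w in weights.items()}  # assumed normalization
```

The division by split entropy penalizes variables with many distinct values, which would otherwise look informative simply because they fragment the data.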

20 Generated Tree

21 Testing

- Tested the model using the 1,000 records that were NOT used in building the model.
- Later (when summer 13 degrees were posted), also tested the model using the Fall 07 cohort.

22 Testing with Fall 07 FTF Cohort (Sept 13)

- Model predicts 1,717 (out of 4,026) students not to graduate in 6 years
- Model's classification accuracy: 68% ((1567 + 1183)/4026)
  - Sensitivity: 1567/2101 = 75%
  - Specificity: 1183/1925 = 61%
- Top half of predicted non-graduates predicted with 82% accuracy
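The figures on this slide fit together as a single confusion matrix, which the following sketch rebuilds from the stated counts. It assumes "graduated in 6 years" is the positive class, as the sensitivity and specificity fractions suggest; the variable names are made up for the illustration.

```python
# Counts taken from the slide: 2,101 actual graduates and 1,925 actual non-graduates.
graduates, non_graduates = 2101, 1925
true_pos = 1567                         # graduates predicted to graduate
true_neg = 1183                         # non-graduates predicted not to graduate
false_neg = graduates - true_pos        # graduates predicted not to graduate
false_pos = non_graduates - true_neg    # non-graduates predicted to graduate

total = graduates + non_graduates
accuracy = (true_pos + true_neg) / total
sensitivity = true_pos / (true_pos + false_neg)
specificity = true_neg / (true_neg + false_pos)
predicted_non_grad = true_neg + false_neg  # everyone the model flags as at risk
```

The derived `predicted_non_grad` comes out to the 1,717 students quoted on the slide, and half of that is roughly the 859 students carried into the clustering step.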

23 Clustering

Place these 859 students who were predicted not to graduate in clusters such that:

- Students in each cluster are as similar as possible (based on their HS and 1st-term college academic performance), and
- Clusters are as different from each other as possible (again based on students' HS and 1st-term college academic performance)

24 K-Means Clustering

- Using Mixed Euclidean Distance (both numeric and nominal variables)
- Focus is on the HS-to-college transition
- Variables used (only pre-college and 1st-term academic performance):
  - HS GPA
  - SAT Verbal
  - SAT Math
  - Number of degree-applicable units earned in 1st term
  - Number of F, D, WU or NC grades in 1st term
  - 1st-term type of math course passed/failed
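A minimal K-Means sketch with a mixed distance is shown below. It assumes one common definition of a mixed Euclidean distance (squared difference for numeric fields, a 0/1 mismatch for nominal fields), which may differ in detail from the tool used in the study; the field names and the four records are hypothetical.

```python
import random
from collections import Counter

NUMERIC = ("hs_gpa", "units_earned")   # hypothetical field names
NOMINAL = ("math_outcome",)

def mixed_distance(a, b):
    """Euclidean distance over numeric fields plus a 0/1 mismatch per nominal field."""
    total = sum((a[f] - b[f]) ** 2 for f in NUMERIC)
    total += sum(a[f] != b[f] for f in NOMINAL)
    return total ** 0.5

def centroid(members):
    """Mean of each numeric field, most common value of each nominal field."""
    center = {f: sum(m[f] for m in members) / len(members) for f in NUMERIC}
    for f in NOMINAL:
        center[f] = Counter(m[f] for m in members).most_common(1)[0][0]
    return center

def k_means(records, k, iterations=20, seed=0):
    """Alternate between assigning records to the nearest center and recomputing centers."""
    centers = random.Random(seed).sample(records, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for r in records:
            nearest = min(range(k), key=lambda i: mixed_distance(r, centers[i]))
            clusters[nearest].append(r)
        # Keep the old center if a cluster ends up empty.
        centers = [centroid(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters

# Hypothetical toy records, for illustration only.
records = [
    {"hs_gpa": 2.8, "units_earned": 1.0, "math_outcome": "failed GE"},
    {"hs_gpa": 2.7, "units_earned": 2.0, "math_outcome": "failed GE"},
    {"hs_gpa": 3.0, "units_earned": 6.0, "math_outcome": "passed remedial"},
    {"hs_gpa": 2.9, "units_earned": 7.0, "math_outcome": "passed remedial"},
]
centers, clusters = k_means(records, k=2)
```

With real data the numeric fields would normally be rescaled first, since a raw SAT score would otherwise dominate a GPA or a 0/1 mismatch term in the distance.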

25 Clusters Centroid Plot

26 Clusters Analysis

Table columns: Cluster, N, and the mean and σ of each of High School GPA, SAT Math, SAT Verbal, Degree-applicable Units Earned, and # of F, D, WU or NC grades.

27 Clusters Analysis Continued

1st Term Math Course Outcome:

| Cluster | Failed Remedial Math | Failed GE Math | Passed Remedial Math | Passed GE Math | None |
|---|---|---|---|---|---|
| 0 | 20% | 57% | 16% | 6% | 2% |
| 1 | 15% | 45% | 29% | 6% | 5% |
| 2 | 18% | 30% | 29% | 20% | 3% |

28 Cluster 0 (The Un-motivated)

- HS GPA 2.8
- SAT Math 493, SAT Verbal […]
- 1st term college:
  - Earned 1.6 degree-applicable units
  - # of F, D, WU or NC grades: […]
  - […]% took & failed GE math, 20% took and failed remedial math
  - 1st term GPA: 0.58
- Mostly men (59% men, 41% women)
- College of major group mode: hierarchical, followed by semi-hierarchical
- Benefits from (Probation) Advisement

Cluster 2 (The Slow Starters)

- HS GPA 2.9
- SAT Math 471, SAT Verbal […]
- 1st term college:
  - Earned 6.3 degree-applicable units
  - # of F, D, WU or NC grades: […]
  - […]% took & failed GE math, 30% took and passed remedial math
  - 1st term GPA: 1.63
- Mostly women (47% men, 53% women)
- College of major group mode: semi-hierarchical, followed by non-hierarchical
- Benefits from Academic Support

29 Cluster 1 (The Disconnected)

- HS GPA: 3.4 (above avg. HS GPA of fall 07 incoming freshmen)
- SAT Math 472, SAT Verbal […]
- 1st term college:
  - Earned 2.4 degree-applicable units
  - # of F, D, WU or NC grades: […]
  - […]% took & failed GE math, 29% took and passed remedial math
  - 1st term GPA: 0.83
- Largely 1st generation college students (40.4%)
- Majority underrepresented students (55.3%)
- Majority from high schools outside the local area (57%)
- Mostly women (36% men, 64% women)
- Benefits from Practices that Promote Campus Engagement, Early Warning System

30 Summary

- Predictive model for early identification of at-risk students using early indicators (not past the 1st term in college)
- Provides insight into clusters of at-risk students; suggests cluster-level intervention
- Don't need expertise in machine learning, AI, or statistics (data mining tools handle the algorithms)
- Need to know the data intimately (data compilation & preparation are the most critical and most time-consuming steps)
