# A Two-Step Data Mining Approach for Graduation Outcomes (2013 CAIR Conference)


## Transcription

Slide 1: A Two-Step Data Mining Approach for Graduation Outcomes. 2013 CAIR Conference, November 21, 2013. Afshin Karimi, Ed Sullivan, James Hershey, and Sunny Moon.

Slide 2: Data Mining. The science of extracting patterns and knowledge from large data sets to predict future trends and behavior. Two broad approaches:
- Supervised learning
- Unsupervised learning

Slide 3: Two-Step Process
1. Build a classification (decision tree) model to predict six-year graduation of first-time freshmen (FTF) (supervised learning).
2. Run a cluster analysis (K-Means clustering) on the students identified as at-risk to reveal patterns and suggest cluster-level interventions (unsupervised learning).

Slide 4: Classification Model Using a Decision Tree. Compared with alternatives such as neural networks, logistic regression, and SVMs, decision trees are easy to understand, implement, and visualize.

Slide 5: Decision Trees (Continued)
- Used in many disciplines, including operations research.
- Drawn as inverted trees with the root at the top; used to build a model that predicts a target variable.
- Generated by recursive partitioning of the training data.
- One example of a node-selection criterion is Information Gain (used in its gain-ratio form by C5.0): at each node it selects the variable whose split leaves the least entropy with respect to the target variable, i.e., the largest information gain.
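
The node-selection step can be sketched in a few lines of Python. This is a minimal illustration of information gain for nominal attributes, not the C5.0 implementation (which uses the gain ratio plus further heuristics); the toy rows and the column names `first_term_fdw`, `gender`, and `grad_6yr` are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, attr, target):
    """Entropy of the target minus the weighted entropy left after splitting on attr."""
    n = len(rows)
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy([r[target] for r in rows]) - remainder


def best_split(rows, attrs, target):
    """Choose the attribute with the highest gain, i.e. the lowest remaining entropy."""
    return max(attrs, key=lambda a: information_gain(rows, a, target))


# Hypothetical toy data: nominal predictors and a Yes/No target.
rows = [
    {"first_term_fdw": "None", "gender": "F", "grad_6yr": "Yes"},
    {"first_term_fdw": "None", "gender": "M", "grad_6yr": "Yes"},
    {"first_term_fdw": "Some", "gender": "F", "grad_6yr": "No"},
    {"first_term_fdw": "Some", "gender": "M", "grad_6yr": "No"},
]
print(best_split(rows, ["first_term_fdw", "gender"], "grad_6yr"))  # first_term_fdw
```

Here `first_term_fdw` separates the target perfectly (gain of 1 bit) while `gender` leaves it as mixed as before (gain of 0). Recursive partitioning repeats this choice inside each resulting subset until the subsets are pure or another stopping rule applies.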

Slide 6: Example Decision Tree: Play Tennis or Not? (Depending on Weather Conditions). Each internal node tests an attribute, each branch corresponds to an attribute value, and each leaf assigns a classification.
- Outlook = Sunny: test Humidity
  - High: No
  - Normal: Yes
- Outlook = Overcast: Yes
- Outlook = Rainy: test Wind
  - Strong: No
  - Weak: Yes

Example taken from Kurt Driessens' slides.
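
To make the traversal concrete, the Play Tennis tree can be encoded as nested Python data and walked from the root. The tuple-based representation and the `classify` helper are illustrative choices, not taken from the slides.

```python
# Internal node: (attribute, {attribute value: subtree}); leaf: class label string.
PLAY_TENNIS = ("Outlook", {
    "Sunny": ("Humidity", {"High": "No", "Normal": "Yes"}),
    "Overcast": "Yes",
    "Rainy": ("Wind", {"Strong": "No", "Weak": "Yes"}),
})


def classify(tree, example):
    """Follow the branch matching the example's attribute value until a leaf is reached."""
    node = tree
    while isinstance(node, tuple):
        attribute, branches = node
        node = branches[example[attribute]]
    return node


day = {"Outlook": "Rainy", "Humidity": "High", "Wind": "Weak"}
print(classify(PLAY_TENNIS, day))  # Yes
```

Each root-to-leaf path can also be read as a rule, for example IF Outlook = Sunny AND Humidity = High THEN No.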

Slide 7: Overfitting
- The generated decision tree relies too heavily on irrelevant features (noise) of the training data.
- As a result, the model performs poorly on future/unseen data.
- To reduce overfitting, use pruning: a technique in which leaf nodes that do not add to the discriminative power of the decision tree are removed.
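
The slide does not say which pruning method the tool applied (C5.0, for example, uses its own error-based pruning). As a simpler illustration, the sketch below implements reduced-error pruning: working bottom-up, a subtree is replaced by its majority-class leaf whenever that does not lower accuracy on held-out validation records. The dict-based node layout (`attr`, `branches`, `majority`) and the cut-down tennis tree are hypothetical.

```python
def classify(node, x):
    """Internal nodes are dicts; an unseen attribute value falls back to the node's majority class."""
    while isinstance(node, dict):
        node = node["branches"].get(x[node["attr"]], node["majority"])
    return node


def prune(node, validation):
    """Reduced-error pruning; `validation` holds the (example, label) pairs that reach `node`.

    Modifies internal nodes in place and returns the (possibly replaced) subtree.
    """
    if not isinstance(node, dict):
        return node  # leaves cannot be pruned further
    for value, child in list(node["branches"].items()):
        reaching = [(x, y) for x, y in validation if x[node["attr"]] == value]
        node["branches"][value] = prune(child, reaching)
    subtree_hits = sum(classify(node, x) == y for x, y in validation)
    leaf_hits = sum(node["majority"] == y for _, y in validation)
    # Keep the subtree only if it beats a single majority-class leaf on validation data.
    return node["majority"] if leaf_hits >= subtree_hits else node


tree = {"attr": "Outlook", "majority": "Yes", "branches": {
    "Sunny": {"attr": "Humidity", "majority": "No",
              "branches": {"High": "No", "Normal": "Yes"}},
    "Overcast": "Yes",
    "Rainy": "Yes",
}}
validation = [
    ({"Outlook": "Sunny", "Humidity": "High"}, "No"),
    ({"Outlook": "Sunny", "Humidity": "Normal"}, "No"),
    ({"Outlook": "Overcast", "Humidity": "High"}, "Yes"),
]
pruned = prune(tree, validation)
print(pruned["branches"]["Sunny"])  # No
```

In this toy run the Humidity test under Sunny misclassifies one of two validation records, so it collapses to a "No" leaf, while the root split is kept because it is more accurate than a single leaf.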

Slide 8: Training/Building the Tree
- 24 predictor variables:
  - 12 socio-economic, demographic, and HS performance variables
  - 12 first-term college variables
  - All converted to nominal variables
- 1 target variable: 6-Yr Degree (Yes/No)
- Training data: the fall 03, 04, 05, and 06 FTF cohorts
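
The slides state that all predictors were converted to nominal variables but do not give the cut points. A minimal binning sketch, assuming hypothetical HS GPA bins, could look like this:

```python
from bisect import bisect_right


def to_nominal(value, edges, labels):
    """Map a numeric value to a bin label; `edges` must be sorted and len(labels) == len(edges) + 1."""
    if len(labels) != len(edges) + 1:
        raise ValueError("need exactly one more label than edges")
    # bisect_right counts how many edges are <= value, which is the bin index.
    return labels[bisect_right(edges, value)]


# Hypothetical cut points for HS GPA; the study's actual bins are not given.
GPA_EDGES = [2.5, 3.0, 3.5]
GPA_LABELS = ["<2.5", "2.5-2.99", "3.0-3.49", "3.5+"]

print([to_nominal(g, GPA_EDGES, GPA_LABELS) for g in (2.4, 2.8, 3.0, 3.9)])
# ['<2.5', '2.5-2.99', '3.0-3.49', '3.5+']
```

Using `bisect_right` places a value equal to a cut point in the higher bin, so a 3.0 GPA lands in "3.0-3.49".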

Slide 9: Predictor Variables
- Gender
- Under-Represented Status
- Residence (county)
- Parents' Education
- HS GPA
- # of College Prep Math Courses Passed in HS
- # of College Prep Science Courses Passed in HS
- # of College Prep Social Science Courses Passed in HS
- # of College Prep Art Courses Passed in HS
- SAT Math
- SAT Verbal
- Prior Institution Type
- Admission Basis Code
- Pell Grant Recipient
- Freshman Program Participation
- College (Entry)
- Entry Level Math Proficiency
- English Proficiency
- Degree-Applicable Units Earned in First Semester
- F, D or WU Grade in 1st Semester
- First Term GPA
- Math Course (1st term)
- English Course (1st term)

Slide 10: Model Validation & Testing
- A total of 14,152 records from the fall 03, 04, 05, and 06 cohorts were used for model building (records with missing HS GPA or SAT scores were excluded).
- A random sample of 1,000 records was removed and set aside for later testing.
- The remaining 13,152 records were used for training/validation with 5-fold cross-validation.

Slides 11 to 15: 5-Fold Cross-Validation (diagram). In each of the five runs, a different fold (2,630 records) is held out for validation and the remaining records (10,522) are used for training.
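
The sampling scheme can be sketched with the standard library. The seed and helper names are hypothetical, and the slides do not say how records were shuffled or whether folds were stratified. Because 13,152 is not divisible by 5, an even split gives two folds of 2,631 records and three of 2,630.

```python
import random


def holdout_split(n_records, n_test, seed=0):
    """Shuffle record indices and set n_test of them aside as a final test set."""
    idx = list(range(n_records))
    random.Random(seed).shuffle(idx)
    return idx[n_test:], idx[:n_test]  # (model-building set, held-out test set)


def k_fold(indices, k=5, seed=0):
    """Yield (train, validation) index lists; fold sizes differ by at most one."""
    idx = list(indices)
    random.Random(seed).shuffle(idx)
    n = len(idx)
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(idx[start:start + size])
        start += size
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


build, test = holdout_split(14152, 1000)
for train, val in k_fold(build):
    print(len(train), len(val))
```

The loop prints 10,521/2,631 for the first two runs and 10,522/2,630 for the other three; every model-building record is used for validation exactly once.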

Slide 16: Model's Accuracy. Classification accuracy is the average accuracy over the 5 runs:
- Classification accuracy: 66.4%
- Sensitivity (true positive rate): 72.4%
- Specificity (true negative rate): 60.3%

Slide 17: RapidMiner 5.0


Slide 19: Relevance (normalized weights) of the variables based on the Information Gain Ratio (chart). Variables in the order shown on the slide:
1. F, D or WU Grade in 1st Semester
2. Degree-Applicable Units Earned in First Semester
3. First Term GPA
4. Math Course (1st term)
5. Admission Basis Code
6. HS GPA
7. Gender
8. Freshman Program Participation
9. Entry Level Math Proficiency
10. English Course (1st term)
11. Under-Represented Status
12. # of College Prep Math Courses Passed in HS
13. English Proficiency
14. College (Entry)
15. Parents' Education
16. SAT Verbal
17. Pell Grant Recipient
18. SAT Math
19. Prior Institution Type
20. Residence (county)
21. # of College Prep Social Science Courses Passed in HS
22. # of College Prep Science Courses Passed in HS
23. # of College Prep Art Courses Passed in HS
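
The Information Gain Ratio divides information gain by the entropy of the attribute's own value distribution (its split information), which penalizes attributes with many distinct values. A sketch of such a ranking follows; dividing every weight by the largest one is an assumed normalization, since the slide does not say how the tool normalized the weights, and the data and column names are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy, in bits, of a list of nominal values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def gain_ratio(rows, attr, target):
    """Information gain of splitting on attr, divided by the split information of attr."""
    n = len(rows)
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == value]
        remainder += len(subset) / n * entropy(subset)
    gain = entropy([r[target] for r in rows]) - remainder
    split_info = entropy([r[attr] for r in rows])
    return 0.0 if split_info == 0 else gain / split_info


def normalized_weights(rows, attrs, target):
    """Gain ratios scaled so the top attribute gets 1.0 (assumed normalization), sorted high to low."""
    raw = {a: gain_ratio(rows, a, target) for a in attrs}
    top = max(raw.values())
    ranked = sorted(raw.items(), key=lambda kv: kv[1], reverse=True)
    return {a: (w / top if top > 0 else 0.0) for a, w in ranked}


rows = [
    {"fdw_grade": "No", "county": "A", "student_id": "1", "grad_6yr": "Yes"},
    {"fdw_grade": "No", "county": "B", "student_id": "2", "grad_6yr": "Yes"},
    {"fdw_grade": "Yes", "county": "A", "student_id": "3", "grad_6yr": "No"},
    {"fdw_grade": "Yes", "county": "B", "student_id": "4", "grad_6yr": "No"},
]
print(normalized_weights(rows, ["county", "student_id", "fdw_grade"], "grad_6yr"))
```

In this toy data `student_id` separates the records perfectly, so its information gain equals that of `fdw_grade`; its split information is 2 bits instead of 1, however, so its gain ratio is halved (weights: `fdw_grade` 1.0, `student_id` 0.5, `county` 0.0).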

Slide 20: Generated Tree (figure)

Slide 21: Testing
- Tested the model on the 1,000 records that were NOT used in building the model.
- Later (once summer 13 degrees were posted), also tested the model on the fall 07 cohort.

Slide 22: Testing with the Fall 07 FTF Cohort (Sept 13)
- The model predicts that 1,717 of the 4,026 students will not graduate in 6 years.
- Classification accuracy: (1,567 + 1,183)/4,026 = 68%
- Sensitivity: 1,567/2,101 = 75%
- Specificity: 1,183/1,925 = 61%
- The top half of the predicted non-graduates was predicted with 82% accuracy.
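
The Fall 07 figures are consistent with treating "graduated within 6 years" as the positive class: 2,101 actual graduates and 1,925 actual non-graduates, so the 1,717 predicted non-graduates are the 1,183 true negatives plus 534 false negatives. The sketch below recomputes the three metrics from those counts; the `evaluate` helper is illustrative, and the same definitions underlie the cross-validation averages on slide 16.

```python
def evaluate(tp, fn, tn, fp):
    """Accuracy, sensitivity (true positive rate) and specificity (true negative rate)."""
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Fall 07 counts derived from the slide; positive class = graduated within 6 years.
graduates, non_graduates = 2101, 1925
tp, tn = 1567, 1183
fn, fp = graduates - tp, non_graduates - tn
metrics = evaluate(tp, fn, tn, fp)
print({name: round(value, 2) for name, value in metrics.items()})
print("predicted non-graduates:", tn + fn)
```

The first line prints accuracy 0.68, sensitivity 0.75, and specificity 0.61, matching the slide, and the second prints 1717.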

Slide 23: Clustering. Place the 859 students who were predicted not to graduate into clusters such that:
- students within each cluster are as similar as possible (based on their HS and 1st-term college academic performance), and
- the clusters are as different from each other as possible (again based on HS and 1st-term college academic performance).
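
The slides do not spell out how the 1,717 predicted non-graduates were narrowed to 859 for clustering. One reading consistent with slide 22 is that the clustering used the "top half" of predicted non-graduates ranked by the model's confidence, since 1,717 / 2 rounded up is 859. The sketch below shows that selection step under this assumption; the `(student_id, predicted_label, confidence)` tuple layout and the synthetic data are hypothetical.

```python
import math


def top_half_at_risk(predictions):
    """Keep predicted non-graduates ("No") and return the more confident half, rounded up."""
    at_risk = [p for p in predictions if p[1] == "No"]
    at_risk.sort(key=lambda p: p[2], reverse=True)  # most confident first
    return at_risk[: math.ceil(len(at_risk) / 2)]


# Synthetic predictions shaped like the Fall 07 test: 1,717 "No" and 2,309 "Yes".
predictions = [(i, "No", 0.5 + i / 4000) for i in range(1717)]
predictions += [(i, "Yes", 0.7) for i in range(1717, 4026)]
selected = top_half_at_risk(predictions)
print(len(selected))  # 859
```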

Slide 24: K-Means Clustering Using Mixed Euclidean Distance (numeric and nominal variables)
- The focus is on the HS-to-college transition.
- Variables used (academic performance only, pre-college and 1st term):
  - HS GPA
  - SAT Verbal
  - SAT Math
  - Number of degree-applicable units earned in the 1st term
  - Number of F, D, WU or NC grades in the 1st term
  - Type of 1st-term math course and whether it was passed or failed
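
The tool's implementation is not shown on the slides. Below is a minimal k-means sketch with a mixed Euclidean distance, assuming the common definition: squared differences for numeric attributes plus a 0/1 mismatch term for each nominal attribute, under one square root. Centroids take the mean of numeric attributes and the mode of nominal ones (a k-prototypes-style choice, also an assumption). Numeric attributes are z-scored first so that SAT-scale values do not swamp GPA-scale values. Field names and data are hypothetical.

```python
import math
import random
from collections import Counter
from statistics import mean, pstdev


def zscore(points, numeric):
    """Return copies of the points with each numeric field standardized to mean 0, sd 1."""
    out = [dict(p) for p in points]
    for f in numeric:
        values = [p[f] for p in points]
        mu, sd = mean(values), pstdev(values) or 1.0
        for p in out:
            p[f] = (p[f] - mu) / sd
    return out


def mixed_distance(a, b, numeric, nominal):
    """Euclidean distance where each nominal mismatch contributes 1 to the squared sum."""
    d2 = sum((a[f] - b[f]) ** 2 for f in numeric)
    d2 += sum(a[f] != b[f] for f in nominal)
    return math.sqrt(d2)


def centroid(members, numeric, nominal):
    """Mean of numeric fields, most frequent value of nominal fields."""
    c = {f: mean(m[f] for m in members) for f in numeric}
    c.update({f: Counter(m[f] for m in members).most_common(1)[0][0] for f in nominal})
    return c


def k_means(points, k, numeric, nominal, iters=50, seed=0):
    """Lloyd-style iterations: assign each point to its nearest center, then recompute centers."""
    centers = random.Random(seed).sample(points, k)
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: mixed_distance(p, centers[j], numeric, nominal))
                  for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old center if a cluster empties out.
            new_centers.append(centroid(members, numeric, nominal) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


NUMERIC = ["hs_gpa", "units_1st_term"]
NOMINAL = ["math_1st_term"]
students = [
    {"hs_gpa": 2.8, "units_1st_term": 1.0, "math_1st_term": "failed_ge"},
    {"hs_gpa": 2.7, "units_1st_term": 2.0, "math_1st_term": "failed_ge"},
    {"hs_gpa": 3.5, "units_1st_term": 12.0, "math_1st_term": "passed_ge"},
    {"hs_gpa": 3.6, "units_1st_term": 13.0, "math_1st_term": "passed_ge"},
]
labels, centers = k_means(zscore(students, NUMERIC), k=2, numeric=NUMERIC, nominal=NOMINAL)
print(labels)
```

With these four toy students the first two and the last two end up in separate clusters; the cluster numbers themselves depend on the random initialization.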

Slide 25: Cluster Centroid Plot (figure)

Slide 26: Cluster Analysis. For each cluster, a table of N and the mean and standard deviation (σ) of HS GPA, SAT Math, SAT Verbal, degree-applicable units earned, and number of F, D, WU or NC grades (table values not captured in this transcript).

Slide 27: Cluster Analysis (Continued): 1st-Term Math Course Outcome

| Cluster | Failed Remedial Math | Failed GE Math | Passed Remedial Math | Passed GE Math | None |
|---|---|---|---|---|---|
| 0 | 20% | 57% | 16% | 6% | 2% |
| 1 | 15% | 45% | 29% | 6% | 5% |
| 2 | 18% | 30% | 29% | 20% | 3% |
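
Summary tables like those on slides 26 and 27 can be produced by grouping students by cluster label and summarizing each group. The sketch below is a hypothetical helper with hypothetical field names; it uses the population standard deviation for σ (the slides do not say which variant was used) and rounds each percentage independently, which can make a row sum to 101%, as cluster 0's does.

```python
from collections import Counter, defaultdict
from statistics import mean, pstdev


def profile(points, labels, numeric, nominal):
    """Per-cluster N, (mean, population sd) of numeric fields, and % shares of nominal values."""
    groups = defaultdict(list)
    for p, lab in zip(points, labels):
        groups[lab].append(p)
    report = {}
    for lab, members in sorted(groups.items()):
        row = {"N": len(members)}
        for f in numeric:
            values = [m[f] for m in members]
            row[f] = (mean(values), pstdev(values))
        for f in nominal:
            counts = Counter(m[f] for m in members)
            row[f] = {v: round(100 * c / len(members)) for v, c in counts.items()}
        report[lab] = row
    return report


students = [
    {"hs_gpa": 2.0, "math_1st_term": "failed_ge"},
    {"hs_gpa": 3.0, "math_1st_term": "passed_remedial"},
    {"hs_gpa": 4.0, "math_1st_term": "failed_ge"},
]
report = profile(students, [0, 0, 1], ["hs_gpa"], ["math_1st_term"])
print(report[0])
# {'N': 2, 'hs_gpa': (2.5, 0.5), 'math_1st_term': {'failed_ge': 50, 'passed_remedial': 50}}
```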

Slide 28: Cluster 0 (The Unmotivated)
- HS GPA: 2.8; SAT Math: 493; SAT Verbal: not captured
- 1st term in college: earned 1.6 degree-applicable units; number of F, D, WU or NC grades: not captured
- Took and failed GE math: see slide 27; 20% took and failed remedial math
- 1st-term GPA: 0.58
- Mostly men (59% men, 41% women)
- Most common college-of-major group: hierarchical, followed by semi-hierarchical
- Benefits from: (probation) advisement

Cluster 2 (The Slow Starters)
- HS GPA: 2.9; SAT Math: 471; SAT Verbal: not captured
- 1st term in college: earned 6.3 degree-applicable units; number of F, D, WU or NC grades: not captured
- Took and failed GE math: see slide 27; 30% took and passed remedial math
- 1st-term GPA: 1.63
- Slightly more women than men (47% men, 53% women)
- Most common college-of-major group: semi-hierarchical, followed by non-hierarchical
- Benefits from: academic support

Slide 29: Cluster 1 (The Disconnected)
- HS GPA: 3.4 (above the average HS GPA of fall 07 incoming freshmen); SAT Math: 472; SAT Verbal: not captured
- 1st term in college: earned 2.4 degree-applicable units; number of F, D, WU or NC grades: not captured
- Took and failed GE math: see slide 27; 29% took and passed remedial math
- 1st-term GPA: 0.83
- A large share of first-generation college students (40.4%)
- Majority underrepresented students (55.3%)
- Majority from high schools outside the local area (57%)
- Mostly women (36% men, 64% women)
- Benefits from: practices that promote campus engagement; an early warning system

Slide 30: Summary
- A predictive model for early identification of at-risk students, using only early indicators (nothing past the 1st term in college).
- Provides insight into clusters of at-risk students and suggests cluster-level interventions.
- Doesn't require expertise in machine learning, AI, or statistics (data mining tools handle the algorithms).
- Does require knowing the data intimately: data compilation and preparation are the most critical and most time-consuming steps.
