ST 562: Data Mining with SAS Enterprise Miner

In Workflow
1. 17ST GR Director of Curriculum (demarti4@ncsu.edu; bondell@stat.ncsu.edu)
2. 17ST Grad Head (demarti4@ncsu.edu; bondell@stat.ncsu.edu; fuentes@ncsu.edu)
3. COS CC Coordinator GR (alun_lloyd@ncsu.edu; clbowma2@ncsu.edu)
4. COS CC Meeting GR (alun_lloyd@ncsu.edu; clbowma2@ncsu.edu)
5. COS CC Chair GR ()
6. COS Final Review GR (clbowma2@ncsu.edu; alun_lloyd@ncsu.edu)
7. COS Dean GR (cohen@math.ncsu.edu)
8. ABGS Coordinator (george_hodge@ncsu.edu; lian_lynch@ncsu.edu; mlnosbis@ncsu.edu)
9. ABGS Meeting (george_hodge@ncsu.edu; lian_lynch@ncsu.edu; mlnosbis@ncsu.edu)
10. ABGS Chair (george_hodge@ncsu.edu; lian_lynch@ncsu.edu; mlnosbis@ncsu.edu)
11. Grad Final Review (george_hodge@ncsu.edu; lian_lynch@ncsu.edu; mlnosbis@ncsu.edu)
12. PeopleSoft (ldmihalo@ncsu.edu; blpearso@ncsu.edu; Charles_Clift@ncsu.edu; jmharr19@ncsu.edu; Tracey_Ennis@ncsu.edu)

Approval Path
1. Thu, 17 Mar 2016 17:01:52 GMT Donald Martin (demarti4): Approved for 17ST GR Director of Curriculum
2. Thu, 17 Mar 2016 17:19:44 GMT Donald Martin (demarti4): Approved for 17ST Grad Head
3. Thu, 17 Mar 2016 17:38:13 GMT Cheryll Bowman-Medhin (clbowma2): Approved for COS CC Coordinator GR
4. Thu, 17 Mar 2016 17:41:43 GMT Cheryll Bowman-Medhin (clbowma2): Approved for COS CC Meeting GR
5. Wed, 06 Apr 2016 12:22:36 GMT Melissa Nosbisch (mlnosbis): Approved for COS CC Chair GR
6. Wed, 06 Apr 2016 12:27:20 GMT Melissa Nosbisch (mlnosbis): Approved for COS Final Review GR
7. Wed, 06 Apr 2016 14:50:45 GMT Jo-Ann Cohen (cohen): Approved for COS Dean GR
8. Mon, 11 Apr 2016 18:01:31 GMT George Hodge (george_hodge): Approved for ABGS Coordinator
9. Thu, 21 Apr 2016 13:27:44 GMT Melissa Nosbisch (mlnosbis): Approved for ABGS Meeting

New Course Proposal
Date Submitted: Thu, 17 Mar 2016 16:48:09 GMT
Viewing: ST 562: Data Mining with SAS Enterprise Miner
Changes proposed by: boos
Change Type: Major
Course Prefix: ST (Statistics)
Course Number: 562
Dual-Level Course:

Cross-listed Course:
Title: Data Mining with SAS Enterprise Miner
Abbreviated Title: Data Mining with SAS
College: College of Sciences
Academic Org Code: Statistics (17ST)
CIP Discipline Specialty Number: 27.0501
CIP Discipline Specialty Title: Statistics, General.
Term Offering: Spring Only
Year Offering: Offered Every Year
Effective Date: Spring 2017
Previously taught as Special Topics? Yes
Number of Offerings within the past 5 years: 5
  Course Prefix/Number: 610, 610, 610, 590, 590
  Semester/Term Offered: Spring
  Enrollment: 18, 27, 25, 49, 44
Course Delivery: Face-to-Face (On Campus); Distance Education (DELTA)
Grading Method: Graded/Audit
Credit Hours: 3
Course Length: 15 weeks
Contact Hours (Per Week): Lecture, 3
Course Is Repeatable for Credit:
Instructor Name: David Dickey
Instructor Title: Professor
Grad Faculty Status: Full

Anticipated On-Campus Enrollment
  Component: Lecture
  Per Semester: 50
  Per Section: 50
  Multiple Sections?
  Comments:

DELTA/Online Enrollment
  Delivery Format: LEC
  Per Semester: 30
  Per Section: 30
  Multiple Sections?
  Comments:

Course Prerequisites, Corequisites, and Restrictive Statement: ST 512 or ST 514 or ST 515 or ST 517
Is the course required or an elective for a Curriculum?

Catalog Description
This is a hands-on course using modeling techniques designed mostly for large observational studies. Estimation topics include recursive splitting, ordinary and logistic regression, neural networks, and discriminant analysis. Clustering and association analysis are covered under the topic of unsupervised learning, and the use of training and validation data sets is emphasized. Model evaluation alternatives to statistical significance include lift charts and receiver operating characteristic curves. SAS Enterprise Miner is used in the demonstrations, and some knowledge of basic SAS programming is helpful.

Justification for new course
We are in an era in which large amounts of data are being collected, sometimes without a particular goal in mind, then later used for decision making. These data are typically observational in nature rather than from controlled studies, and there can be outliers and large chunks of missing values in some of the variables. Such data call for additional tools in the modern analyst's tool bag. Fast methods that accommodate missing values and outliers, such as recursive splitting methods, have arisen, and computer methods for speeding up traditional analyses like logistic regression have been included in software such as SAS Institute's Enterprise Miner package. Flexible models like neural networks have developed a following among analysts. When loyalty cards are scanned at a store, they provide data for association analysis. Learning which items are purchased together and segmenting customers by clustering have also found application in business. The demand for graduates with the skills to analyze such data far exceeds the supply, and demand is growing. Hands-on experience with an industrial-strength data mining package that has all of the above abilities, such as the SAS Enterprise Miner used in the course, empowers our students at NC State to be competitive in the workforce.

Does this course have a fee?
Consultation:

Instructional Resources Statement
Since the course has been taught for 5 years as a special topics course, there will be no need for additional resources.

Course Objectives/Goals
The goal of this course is to introduce the basic elements of data mining techniques to students with backgrounds equivalent to that supplied by the department's statistical methodology sequence. Students will get hands-on experience with the SAS Enterprise Miner product as well as SAS programming through in-class demonstrations and practice with homework data sets.

Student Learning Outcomes
By the end of this course, the students will be able to:
  Use SAS Enterprise Miner to run analyses
  Check for problem data and mitigate the problems
  Use classification and regression trees to perform recursive splitting
  Perform and interpret logistic regression
  Evaluate and compare models with modern tools like lift charts
  Run discriminant analysis and compare it to modern methods
  Fit neural network models to data
  Perform cluster analyses for large data sets
  Use association and sequence analysis on large data sets

Student Evaluation Methods
Evaluation Method: Weighting/Points for Each
  Homework: 20%
  Exam: 20%
  Exam: 20%
  Exam: 20%
  Final Exam: 20%

Topical Outline/Course Schedule
Each entry gives the topic, the time devoted to it, and the activities.

Overview, diagrams, ordinary regression (2 weeks)
  Summary of upcoming topics with examples
  Creating data mining diagrams
  Setting up the environment for running SAS Enterprise Miner
  Linking data sets to be used in examples

Classification Trees, Regression Trees (4 weeks)
  Use of Chi-square tests in recursive splitting
  Interpretation of decision trees using famous Framingham heart study data
  Splitting algorithms for decision trees
  Treatment of missing values
  Simplifying decision trees using validation data
  Building trees for estimates, decisions, or ranking gives different results
  Build several trees on an example data set
  Compare decision trees to regression trees and give a regression tree example
  Compute lift charts for trees
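As an illustration of the lift-chart activity, here is a minimal sketch in plain Python rather than SAS Enterprise Miner. The function name `cumulative_lift` and the scores and outcomes are hypothetical, invented for the example; they stand in for a fitted tree's predicted probabilities and the observed binary responses on validation data. Cumulative lift at a given depth is taken here as the response rate among the top-scored cases divided by the overall response rate.

```python
from typing import List, Sequence


def cumulative_lift(scores: Sequence[float],
                    outcomes: Sequence[int],
                    n_bins: int = 10) -> List[float]:
    """Cumulative lift at depths 1/n_bins, 2/n_bins, ..., 1.

    Assumes at least n_bins cases and at least one positive outcome.
    """
    # Rank cases from highest to lowest predicted score.
    ranked = [y for _, y in sorted(zip(scores, outcomes),
                                   key=lambda pair: pair[0],
                                   reverse=True)]
    n = len(ranked)
    base_rate = sum(ranked) / n  # overall response rate
    lifts = []
    for b in range(1, n_bins + 1):
        depth = round(b * n / n_bins)  # number of top-ranked cases kept
        top = ranked[:depth]
        lifts.append((sum(top) / len(top)) / base_rate)
    return lifts


# Hypothetical validation data: 10 cases, 3 responders.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
outcomes = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
lifts = cumulative_lift(scores, outcomes, n_bins=5)
print([round(v, 2) for v in lifts])
```

A model that ranks no better than chance would give lift near 1 at every depth, and the cumulative lift at full depth is always 1 by construction.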

Discriminant Analysis (2 weeks)
  Review multivariate normal distribution
  Develop discriminant functions from normal distribution definition
  Discuss the role of priors in discriminant analysis
  Interpret posterior probabilities and error rates
  Compare quadratic discriminants to linear ones

Ordinary and Logistic Regression (3 weeks)
  Explain the need for new regression methods when the response is categorical (focus on binary)
  Show the logistic function
  Develop maximum likelihood estimators, graph the likelihood function, and discuss Gauss-Newton estimation
  Show a logistic example within SAS (space shuttle O-ring data)
  Review additional data cleaning steps needed here but not in tree-based methods
  Develop logistic regressions within Enterprise Miner
  Interpret logistic output, including a discussion of concordance

Neural Networks (1 week)
  Relate hyperbolic tangent functions to familiar logistic functions
  Demystify neural nets somewhat by showing them as compositions of hyperbolic tangents
  Explore the flexibility of neural networks
  Control neural network complexity using logistic regression model building as a preliminary variable selection tool

Evaluation Methods (1 week)
  Develop the ROC curve idea
  Show how ROC curves relate to concordance
  Compare several of the above models through their ROC curves and lift charts
  Pick a winner among models and export the model as C, Java, or SAS code

Clustering (1 week)
  Distinguish agglomerative, divisive, and direct clustering
  Describe single, average, and complete linkage, Ward's method, and k-means, and give examples
  Describe the two-step method used in Enterprise Miner
  Cluster some Census Bureau data on U.S. households within Enterprise Miner
  Show graphical depictions of cluster compositions

Association Analysis and other topics as time permits (1 week)
  Relate association analysis to simple conditional probability
  Compute lift for association analysis
  Show association and sequence analysis on some banking data
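The link between association analysis and conditional probability can be sketched in a few lines of plain Python (not SAS Enterprise Miner). The helper names `support`, `confidence`, and `lift` and the banking-style baskets are hypothetical, made up for illustration. For a rule A -> B, confidence estimates P(B | A), and lift divides that by the unconditional P(B).

```python
from typing import FrozenSet, Iterable, List


def support(transactions: List[FrozenSet[str]], items: Iterable[str]) -> float:
    """Fraction of transactions that contain every item in `items`."""
    wanted = frozenset(items)
    return sum(wanted <= t for t in transactions) / len(transactions)


def confidence(transactions: List[FrozenSet[str]],
               antecedent: Iterable[str],
               consequent: Iterable[str]) -> float:
    """Estimate P(consequent | antecedent) from the transactions."""
    both = frozenset(antecedent) | frozenset(consequent)
    return support(transactions, both) / support(transactions, antecedent)


def lift(transactions: List[FrozenSet[str]],
         antecedent: Iterable[str],
         consequent: Iterable[str]) -> float:
    """Confidence of the rule divided by the consequent's overall support."""
    return (confidence(transactions, antecedent, consequent)
            / support(transactions, consequent))


# Hypothetical customers and the banking products each one holds.
baskets = [frozenset(b) for b in (
    {"checking", "savings"},
    {"checking", "savings", "card"},
    {"checking"},
    {"savings"},
    {"checking", "card"},
)]

rule_confidence = confidence(baskets, ["checking"], ["savings"])
rule_lift = lift(baskets, ["checking"], ["savings"])
print(round(rule_confidence, 3), round(rule_lift, 3))
```

A lift above 1 suggests the antecedent raises the chance of the consequent, a lift below 1 suggests it lowers it, and a lift of 1 corresponds to independence.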
Other topics as time permits: multidimensional scaling, bagging and boosting of tree-based models

Syllabus: ST_562_syllabus.pdf
Additional Documentation:
Additional Comments:
  mlnosbis 4/11/2016: No overlapping courses.
  ghodge 4/16/2016: No consultation required, as it does not seem to overlap with any courses. Ready for ABGS reviewers.
  ABGS Reviewer Comments: Good, but the syllabus has no details about grading of assignments.
Course Reviewer Comments:

Key: 10017