COLLEGE OF SCIENCE. School of Mathematical Sciences. NEW (or REVISED) COURSE: COS-STAT-747 Principles of Statistical Data Mining.

ROCHESTER INSTITUTE OF TECHNOLOGY
COURSE OUTLINE FORM
September 2010
COLLEGE OF SCIENCE
School of Mathematical Sciences
NEW (or REVISED) COURSE: COS-STAT-747 Principles of Statistical Data Mining

1.0 Course Designations and Approvals
Required course approvals: Academic Unit Curriculum Committee; College Curriculum Committee
Approval request date:
Approval granted date:
Optional designations (is designation desired?):
General Education: No
Writing Intensive: No
Honors: No
*Approval request date:
**Approval granted date:

2.0 Course information:
Course title: Principles of Statistical Data Mining
Credit hours: 3
Prerequisite(s): One course in basic statistics
Co-requisite(s): None
Course proposed by: Ernest Fokoué
Effective date: August 2013
Contact hours and maximum students per section:
Classroom: 3 contact hours, 25 students maximum
Lab: 0
Studio: 0
Other (specify): 0

2.a Course Conversion Designation*** (Please check which applies to this course.)
For more information on Course Conversion Designations, please see the *** definitions at the end of this form.
Semester Equivalent (SE). Please indicate which quarter course it is equivalent to:
Semester Replacement (SR). Please indicate the quarter course(s) this course is replacing: 0307-846- Principles of Statistical Data Mining

2.b Semester(s) offered (check): Fall (online), Spring (campus), Summer, Other
All courses must be offered at least once every 2 years. If the course will be offered on a biennial (every-other-year) basis, please indicate here:

2.c Student Requirements
Students required to take this course (by program and year, as appropriate): None
Students who might elect to take the course: This is an elective for graduate students in the Advanced Certificate and MS programs in Applied Statistics. Graduate students in other programs who are interested in statistical data mining will also elect to take this class.

In the sections that follow, please use sub-numbering as appropriate (e.g., 3.1, 3.2, etc.).

3.0 Goals of the course (including rationale for the course, when appropriate):
3.1 To achieve a practical understanding of modern statistical data mining techniques.
3.2 To develop the ability to correctly apply modern data mining techniques to a variety of real-world case studies involving massive, high-dimensional, complex data.
3.3 To gain hands-on experience with data mining through case studies such as describing website visitors, market basket analysis, describing customer satisfaction, predicting the credit risk of small businesses, predicting e-learning student performance, predicting customer lifetime value, and operational risk management.

4.0 Course description (as it will appear in the RIT Catalog, including pre- and corequisites, and quarters offered).
Please use the following format:
COS-STAT-747 Principles of Statistical Data Mining I
This course covers topics such as clustering, classification and regression trees, multiple linear regression under various conditions, logistic regression, PCA and kernel PCA, model-based clustering via mixtures of Gaussians, spectral clustering, text mining, neural networks, support vector machines, multidimensional scaling, variable selection, model selection, k-means clustering, k-nearest neighbors classifiers, statistical tools for modern machine learning and data mining, naïve Bayes classifiers, variance reduction methods (bagging), and ensemble methods for predictive optimality.

5.0 Possible resources (texts, references, computer packages, etc.)
Required texts:
5.1 Applied Data Mining for Business and Industry, 2nd ed., Paolo Giudici and Silvia Figini (2009), Wiley, ISBN 978-0-470-74582-3
Recommended texts:
5.2 Statistical Data Mining Using SAS Applications, 2nd ed., George Fernandez (2009), CRC Press, ISBN 978-1-439-81075-3
5.3 Data Mining Using SAS Enterprise Miner, Randall Matignon (2009), Wiley
5.4 Getting Started with SAS Enterprise Miner (from SAS)
5.5 Applied Analytics Using SAS Enterprise Miner (from SAS)

6.0 Topics (outline):
6.1. Complex data structures and the emergence of Data Mining and Machine Learning
6.2. Measures of location and measures of variability
6.3. Distance measures, Similarity Measures and Dependency Measures
6.4. Multiple linear regression and its extensions to Radial Basis Function regression
6.5. Difference of focus between model identification and predictive optimality
6.6. Principles and applications of dimensionality reduction techniques
6.7. Principal Component Analysis and Singular Value Decomposition
6.8. Cluster analysis via Hierarchical and Non-hierarchical Methods
6.9. Factor Analysis and Mixtures of Factor Analyzers
6.10. Multidimensional scaling and its relationship to other techniques
6.11. Model-Based Clustering via Mixtures of Gaussians
6.12. Logistic regression for Pattern Recognition
6.13. Linear and Quadratic Discriminant Analysis
6.14. Classification and Regression Trees
6.15. Neural networks: Multilayer Perceptron and Kohonen networks
6.16. Support Vector Machines for classification and regression
6.17. Nearest-neighbor models: k-means and k-Nearest Neighbors
6.18. Variance Reduction Techniques: Bagging Predictors
6.19. Non-parametric modeling and Bayesian modeling
6.20. Generalized linear models and Log-linear models
6.21. Graphical models and their applications
6.22. Model evaluation and model selection techniques
6.23. Ensemble Methods for Predictive Optimality: Boosting
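Topic 6.17 groups k-means and k-nearest neighbors as nearest-neighbor models, and both rest on the distance measures of topic 6.3. The following is a minimal illustrative sketch of the two methods in plain Python, written under stated assumptions: the course itself names SAS Enterprise Miner as its software, and the function names (`euclidean`, `knn_predict`, `kmeans`), the caller-supplied initial centroids, and the toy data are hypothetical choices made only for illustration.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors (topic 6.3)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, query, k=3):
    """k-nearest neighbors: majority vote among the k training points closest to query."""
    order = sorted(range(len(train_X)), key=lambda i: euclidean(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def kmeans(X, init_centroids, max_iter=100):
    """k-means via Lloyd's algorithm, starting from caller-supplied (hypothetical) centroids."""
    centroids = [list(c) for c in init_centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: euclidean(x, centroids[j]))
            for x in X
        ]
        # Update step: move each centroid to the mean of the points assigned to it.
        updated = []
        for j, c in enumerate(centroids):
            members = [x for x, lab in zip(X, labels) if lab == j]
            if members:
                updated.append([sum(col) / len(members) for col in zip(*members)])
            else:
                updated.append(c)  # an empty cluster keeps its previous centroid
        if updated == centroids:
            break  # centroids stopped moving, so the assignments are stable
        centroids = updated
    return labels, centroids


if __name__ == "__main__":
    X = [[1.0, 1.0], [1.5, 2.0], [1.0, 1.5], [8.0, 8.0], [8.5, 9.0], [9.0, 8.0]]
    y = ["a", "a", "a", "b", "b", "b"]
    labels, centroids = kmeans(X, init_centroids=[X[0], X[3]])
    print(labels)                          # [0, 0, 0, 1, 1, 1]
    print(knn_predict(X, y, [2.0, 2.0]))   # a
```

Seeding k-means with explicit starting centroids keeps the sketch deterministic; in practice the result depends on initialization, which is one reason to compare several runs or methods before drawing conclusions.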

7.0 Intended course learning outcomes and associated assessment methods of those outcomes (please include as many Course Learning Outcomes as appropriate, one outcome and assessment method per row).
Assessment methods: Homework, Exams, Projects

Course Objectives
Level 2: Comprehension
2.1. Understands the central role of model uncertainty in data mining, and maintains a keen awareness of the difference between accurate model identification and optimal prediction.
2.2. Appreciates and takes into account the ever-present bias/variance dilemma in model selection and model building, and strives to find solutions that achieve a good bias/variance trade-off.
2.3. Knows when and how to combine unsupervised learning techniques (e.g., PCA for feature extraction) with supervised learning techniques (e.g., neural networks) to achieve optimality.
2.4. Recognizes when and how to use ensemble methods rather than select a single model, and also knows when to use variance reduction techniques such as bagging.
2.5. Understands the profound meaning of the No Free Lunch theorem, refrains from relying solely on any single data mining method, and always compares various methods before making recommendations.

Level 3: Application
3.1. Identifies an interesting real-world engineering problem during the course of study and formulates it statistically.
3.2. Recognizes, for each real-world case study, which classes of data mining methods are most appropriate.
3.3. Uses statistical software such as SAS Enterprise Miner to perform a thorough data mining analysis of real-world problems.

Level 4: Analysis
4.1. Determines which statistical model(s) appear most appropriate for the task at hand in light of the graphs and descriptive statistics obtained during exploratory data analysis.
4.2. Fits the chosen plausible model(s) using a statistical software package such as SAS Enterprise Miner, then extracts and interprets the parameter estimates.
4.3. Performs additional statistical hypothesis tests wherever needed.
4.4. Checks all the assumptions underlying each method or technique used.
4.5. Interprets the statistical estimation and prediction results produced by the software package.

Level 5: Synthesis
5.1. Selects the best model according to some of the usual model selection criteria.
5.2. Provides any needed or required formal prediction or estimation.
5.3. Uses an ensemble (aggregation) of methods wherever the need arises.
5.4. Draws conclusions and interpretations about the original engineering task based on sound formal analysis such as confidence intervals and the results of hypothesis testing.

Level 6: Evaluation
6.1. Evaluates several potential statistical models and decides on the most appropriate one for a given purpose.
6.2. Provides any needed or required formal prediction or estimation.
6.3. Makes recommendations in clear, non-technical language based on a thorough assessment of the statistical findings.
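Outcomes 2.4 and 5.3 (and topic 6.18) refer to bagging as a variance reduction technique: fit the same learner on bootstrap resamples of the training data and average the resulting predictions. The following is a minimal sketch under stated assumptions: it uses Python rather than SAS Enterprise Miner, one-dimensional inputs, and a least-squares regression stump as the base learner; the function names (`fit_stump`, `bagged_stumps`, `predict_bagged`) and the step-function toy data are hypothetical.

```python
import random


def _mean(values):
    return sum(values) / len(values)


def fit_stump(xs, ys):
    """Least-squares regression stump on 1-D inputs: returns (threshold, left_mean, right_mean)."""
    best = None
    for t in sorted(set(xs))[:-1]:  # a split at the largest x would leave the right side empty
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        lm, rm = _mean(left), _mean(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:  # every x is identical, so fall back to a constant prediction
        m = _mean(ys)
        return (float("inf"), m, m)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def bagged_stumps(xs, ys, n_estimators=25, seed=0):
    """Bagging: fit one stump per bootstrap resample (n draws with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    stumps = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return stumps


def predict_bagged(stumps, x):
    """Aggregate the ensemble by averaging the individual stump predictions."""
    return _mean([predict_stump(s, x) for s in stumps])


if __name__ == "__main__":
    xs = list(range(20))
    ys = [0.0 if x < 10 else 10.0 for x in xs]  # a noiseless step function
    print(fit_stump(xs, ys))                   # (9, 0.0, 10.0)
    ensemble = bagged_stumps(xs, ys)
    for x in (0, 7, 9.5, 19):
        print(x, round(predict_bagged(ensemble, x), 2))
```

Each bootstrap stump places its split at a slightly different threshold, so averaging them softens the single stump's hard cut; this is the sense in which bagging reduces variance, and it helps most when the base learner is unstable.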

8.0 Program outcomes and/or goals supported by this course
Relationship to Program Outcomes (Level of Support: 1 = slightly, 2 = moderately, 3 = significantly)
Program Outcomes and/or Goals for CQAS
8.1 Advanced Certificate in Lean Six Sigma
8.1.1 Demonstrates a solid understanding of statistical thinking and Lean Six Sigma methodology in solving real-world problems.
8.1.2 Leads Lean Six Sigma improvement projects.
8.2 Advanced Certificate and Master of Science in Applied Statistics
8.2.1 Demonstrates a solid understanding of statistical thinking and applied statistics methodology in solving real-world problems.
8.2.2 Designs studies that are efficient and valid.
8.2.3 Analyzes data using appropriate statistical methods.
8.2.4 Communicates the results of statistical analysis with effective reports and presentations.
Note: Students obtaining the Advanced Certificate in Applied Statistics will not be expected to perform at the same level as students obtaining a Master of Science degree.

9.0 General Education Learning Outcomes Supported by the Course, if appropriate: Not Applicable
General Education Learning Outcome / Assessment Method
Communication
Express themselves effectively in common college-level written forms using standard American English
Revise and improve written and visual content
Express themselves effectively in presentations, either in spoken standard American English or sign language (American Sign Language or English-based Signing)
Comprehend information accessed through reading and discussion
Intellectual Inquiry
Review, assess, and draw conclusions about hypotheses and theories
Analyze arguments, in relation to their premises, assumptions, contexts, and conclusions
Construct logical and reasonable arguments that include anticipation of counterarguments
Use relevant evidence gathered through accepted scholarly methods and properly acknowledge sources of information
Ethical, Social and Global Awareness
Analyze similarities and differences in human experiences and consequent perspectives
Examine connections among the world's populations
Identify contemporary ethical questions and relevant stakeholder positions
Scientific, Mathematical and Technological Literacy
Explain basic principles and concepts of one of the natural sciences
Apply methods of scientific inquiry and problem solving to contemporary issues
Comprehend and evaluate mathematical and statistical information
Perform college-level mathematical operations on quantitative data
Describe the potential and the limitations of technology
Use appropriate technology to achieve desired outcomes
Creativity, Innovation and Artistic Literacy
Demonstrate creative/innovative approaches to course-based assignments or projects
Interpret and evaluate artistic expression considering the cultural context in which it was created

10.0 Other relevant information (such as special classroom, studio, or lab needs, special scheduling, media requirements, etc.): None

*Optional course designation; approval request date: This is the date that the college curriculum committee forwards this course to the appropriate optional course designation curriculum committee for review. The chair of the college curriculum committee is responsible for filling in this date.
**Optional course designation; approval granted date: This is the date the optional course designation curriculum committee approves a course for the requested optional course designation. The chair of the appropriate optional course designation curriculum committee is responsible for filling in this date.
***Course Conversion Designations
Please use the following definitions to complete 2.a above.
Semester Equivalent (SE): Closely corresponds to an existing quarter course (e.g., a 4 quarter credit hour (qch) course which becomes a 3 semester credit hour (sch) course). The semester course may develop material in greater depth or length.
Semester Replacement (SR): A semester course (or courses) taking the place of a previous quarter course(s) by rearranging or combining material from a previous quarter course(s) (e.g., a two-semester sequence that replaces a three-quarter sequence).
New (N): No corresponding quarter course(s).