Data Mining (95-791 Z4)

Data Mining (95-791 Z4) Syllabus
Mini 4, Spring 2018

This syllabus is adapted from Dr. Dubrawski's 95-791 Data Mining Syllabus.

Lecture Instructor: Dr. Artur Dubrawski, awd@cs.cmu.edu
Distance Learning Facilitator: Karen (Lujie) Chen, karenchen@cmu.edu
Teaching Assistant: TBD

Prerequisites

95-796 Statistics for IT Managers, or the instructor's permission based on the student's knowledge of the fundamentals of probability and statistics. Previous experience with data analysis is a plus, although it is not strictly necessary.

Course Motivation

Data mining, the intelligent analysis of information stored in data sets, has gained substantial interest among practitioners in a variety of fields and industries. Nowadays, almost every organization collects data, which can be analyzed to support better decisions, improve policies, discover computer network intrusion patterns, design new drugs, detect credit fraud, make accurate medical diagnoses, predict imminent occurrences of important events, monitor and evaluate reliability to preempt failures of complex systems, and so on.

About the Instructor

Artur Dubrawski is a scientist and a practitioner. He has been researching machine intelligence and its applications for twenty-five years. In the past, he was affiliated with an advanced data mining firm, Schenley Park Research, and served as Chief Technology Officer at Aethon, a local high-tech company making autonomous delivery robots. Currently Dr. Dubrawski is a faculty member at the CMU Robotics Institute, where he directs the Auton Lab, a data mining and machine learning research group. The Auton Lab's work has yielded multiple deployments of analytic solutions and software in various government and industrial applications.

About the Course Facilitator

Karen Chen is a PhD student in the information systems program at Heinz College and is also associated with the Auton Lab under the supervision of Dr. Dubrawski. Her research interests are big data analytics, machine learning, and data mining applications, in particular the modeling of temporal dynamics of real-time sensor data with applications to health care and education. Some of her work has involved analyzing physiological signals from continuously monitored patients as well as psychological signals of emotional states from facial expression analysis. Before starting her PhD, she worked as a research staff member at the Auton Lab for about 10 years on a variety of data mining and analytics projects in public health, food safety, health insurance, and fuel efficiency. She holds an MISM degree and an M.S. in statistics, both from Carnegie Mellon University, and a B.Eng. degree in business and computer science from Shanghai Jiaotong University in China.

Course Objectives

This course will provide participants with an understanding of fundamental data mining methodologies and with the ability to formulate and solve problems with them. Particular attention will be paid to practical, efficient, and statistically sound techniques, capable of providing not only the requested discoveries but also estimates of their utility. The lectures will be complemented with hands-on experience with data mining software, primarily R, to allow development of basic execution skills.

The scope of the course covers the following groups of topics.

Foundations: how to make data mining practical? (approximately 40% of class time)

- Learning from data: why, what, and how? Fundamental tasks, issues, and paradigms of learning models from data.
- Real-world data is noisy and uncertain. How much can we trust the results of our analyses?
- Model selection
- Reduction of dimensionality and data engineering
- Measures of association between data attributes: information theoretic, correlational

Pragmatic methodologies for mining data (approximately 60% of class time)

- Predictive analytics: classification and regression
- Cost-sensitive model selection using the ROC approach
- Compression of data and models for improved reliability, understandability, and tractability of large sets of highly dimensional data
- Association rule learning and decision list learning, decision trees
- Introduction to density estimation, anomaly detection, and clustering
- Overview of mining complex types of data
- Illustrative examples of real-world applications

Reading Material

Unfortunately, the ideal textbook for this course does not exist. Instead, we will use a selection of readings excerpted from a variety of sources. These readings are intended to complement the material presented in class. Selected issues covered by the required readings will become topics of graded assignments and the final examination. All required material will be distributed electronically through the course site, or pointers will be given to resources available on the internet for free download.

Note that many of the readings are protected under copyright law. In order to use them in this course it was necessary to purchase official permissions from the copyright holders. Each enrolled student may have their HUB account charged with an equal share of the copyright fees. Although the exact amount of the individual share is not known at the time of writing, it is estimated not to exceed $30.00. Please note that it is illegal to distribute copies of the copyrighted materials without obtaining permission from their legal owners.

Interested students are welcome to go beyond the scope of the required readings. In particular, the following books are recommended but not required (listed in no particular order):

1. Hand, Mannila and Smyth: Principles of Data Mining, MIT Press, 2001.
2. Witten and Frank: Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, Morgan Kaufmann, 2000 (newer editions are available).
3. Hastie, Tibshirani, Friedman: The Elements of Statistical Learning, Springer, 2001.
4. Mitchell: Machine Learning, McGraw-Hill, 1997.

Software and Hands-on Exercises

We will primarily rely on R, which is free software, to demonstrate and operationalize concepts presented during lectures. Students are expected to download and install the software, and to learn basic usage skills on their own using tutorials available online. Appropriate resources will be recommended during the first lecture and/or recitation session.

Recitations will review concepts taught in lectures and connect them to homework problems through examples. Recitation sessions in which software tools are introduced will provide an opportunity for hands-on experience: students will be asked to follow the presenter on their laptops and will work on assigned exercises during the session.

Assignments and Deadlines

All assignments will be distributed electronically through Piazza. All reports (including homework) must be submitted electronically through email (TBD).

Each homework has two deadlines, a soft deadline and a hard deadline, one week apart. You are encouraged to submit homework by the soft deadline, in which case you will receive a 5% bonus on your actual homework mark (for example, if you submit homework x before the soft deadline and your homework mark is 90 out of 100, then your final mark for this homework is 90*1.05). You may choose to submit by the hard deadline instead, with neither bonus nor penalty. Late homework will be accepted until 24 hours past the hard deadline, but it will be subject to an automatic 50% grade reduction.

Grading

Grades will be based upon the results of four homework assignments and one analytical project. The analytical project will be conducted in small groups of students. Each team will analyze specific real-world data. The project will be graded based on a biweekly progress report (TBD), a report (TBD), and a recording of an oral presentation of the results (TBD).

The final grade for this course is composed of the following:

1. Homework (4 times 15%): 60%
2. Analytical project (in teams): 40%

Academic Integrity

Students are expected to strictly follow Carnegie Mellon University rules of academic integrity in this course.
This means homework is to be the work of the individual student, using only permitted material and without any cooperation from other students or third parties. It also means that use of work by others is only permitted in the form of quotations, and any such quotation must be distinctively marked to enable identification of the student's own work and own ideas. All external sources used must be properly cited, including author name(s), publication title, year of publication, and a complete reference needed for retrieval. For the group projects, the work should be that of the group members only. In all their work, students should not in any way rely on solutions to problems distributed in prior years or on the work of prior students or other current students. Violations will be penalized to the full extent mandated by the CMU policies. There will be no exceptions.

Health and Wellness

Do your best to maintain a healthy lifestyle this semester by eating well, exercising, avoiding drugs and alcohol, getting enough sleep, and taking some time to relax. This will help you achieve your goals and cope with stress.

All of us benefit from support during times of struggle. You are not alone. There are many helpful resources available on campus, and an important part of the college experience is learning how to ask for help. Asking for support sooner rather than later is often helpful.

If you or anyone you know experiences any academic stress, difficult life events, or feelings like anxiety or depression, we strongly encourage you to seek support. Counseling and Psychological Services (CaPS) is here to help: call 412-268-2922 and visit their website at http://www.cmu.edu/counseling/. Consider reaching out to a friend, faculty member, or family member you trust for help getting connected to the support that can help.
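
Worked example of the deadline and grading arithmetic

The soft and hard deadline rules and the grade weights described under Assignments and Deadlines and under Grading can be checked numerically. The following is a minimal sketch, written in Python for illustration rather than in R, and it is not an official grading tool. The function names and the example dates are hypothetical. Two behaviors are assumptions that the syllabus does not state: a submission more than 24 hours past the hard deadline is scored as 0, and the 5% bonus is not capped at 100.

```python
from datetime import datetime, timedelta

SOFT_BONUS = 1.05                  # 5% bonus for meeting the soft deadline
LATE_FACTOR = 0.5                  # automatic 50% reduction for late homework
HARD_OFFSET = timedelta(weeks=1)   # hard deadline is one week after the soft one
LATE_WINDOW = timedelta(hours=24)  # late work accepted up to 24 h past the hard deadline


def homework_mark(raw_mark, submitted, soft_deadline):
    """Adjust a raw homework mark (out of 100) for submission time."""
    hard_deadline = soft_deadline + HARD_OFFSET
    if submitted <= soft_deadline:
        return raw_mark * SOFT_BONUS
    if submitted <= hard_deadline:
        return float(raw_mark)     # neither bonus nor penalty
    if submitted <= hard_deadline + LATE_WINDOW:
        return raw_mark * LATE_FACTOR
    return 0.0                     # assumption: not accepted, so scored as zero


def final_grade(homework_marks, project_mark):
    """Combine four homework marks (15% each) and the project mark (40%)."""
    if len(homework_marks) != 4:
        raise ValueError("the syllabus specifies exactly four homework assignments")
    return sum(0.15 * m for m in homework_marks) + 0.40 * project_mark


if __name__ == "__main__":
    soft = datetime(2018, 4, 1, 23, 59)  # hypothetical soft deadline
    early = homework_mark(90, soft - timedelta(hours=2), soft)
    late = homework_mark(90, soft + timedelta(weeks=1, hours=3), soft)
    print(early, late)
```

With a raw mark of 90, the early submission reproduces the syllabus example of 90*1.05, while the same mark submitted three hours after the hard deadline is halved.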