DS 502/MA 543 STATISTICAL METHODS FOR DATA SCIENCE

This course surveys the statistical methods most useful in data science applications. Topics covered include predictive modeling methods, including multiple linear regression and time series; data dimension reduction; discrimination and classification methods; clustering methods; and committee methods. Students will implement these methods using statistical software.

Prerequisites: Statistics at the level of MA 2611 and MA 2612 and linear algebra at the level of MA 2071.

Where and When
Tuesdays and Thursdays from 4:00pm-5:15pm, SL105

Instructor information
Prof. Randy Paffenroth
Office location: AK124
Office hours: 5:30pm-6:30pm on Tuesdays and Thursdays (right after class). Other times are available by appointment, and walk-ins are always welcome if I am around and not otherwise indisposed.
Best ways to contact me:
- WPI email: rcpaffenroth@wpi.edu
- Office phone: (508) 831-6562
I should be able to turn around email questions relatively quickly 9am-5pm, Monday-Friday. My availability at night and on weekends is more limited and I check my email far less frequently then, but feel free to try to contact me.

Teaching Assistant/Grader
TBD

High level course goals and learning objectives
By the end of the class you should be able to:
- Use tools such as Linear Regression, Logistic Regression, Trees, etc. for making predictions from data.
- Explain the pros and cons of various approaches.
- Avoid common pitfalls such as overfitting and data snooping.
- Given a prediction generated from such a method, assess the validity of the prediction.
- Diagnose what can go wrong with a prediction.

Recommended background for course
The recommended background for the course is statistics at the level of MA 2611 and MA 2612 and linear algebra at the level of MA 2071.

In particular, you will need to know some linear algebra:
- Vectors (that they can represent points in space, column vs. row, etc.)
- Matrices (transposes, that they don't commute, etc.)
- Inner products
- Least squares
- How to solve linear systems
- etc.

You will also need to know some probability and statistics:
- Random variables (what they represent, etc.)
- Descriptive statistics (mean, variance, etc.)
- Hypothesis testing
- Estimation and prediction
- etc.

You will need to be able to get your hands dirty playing with, processing, and plotting data using the R computer language! The textbook uses R, the homework uses R, and R will be the officially supported language for the course; all lecture examples will be in R. That being said, this is not intended to be a programming course (i.e., your code will not be graded), but actually working with data will be extremely important (i.e., the results of the code will be graded)!

Textbook
An Introduction to Statistical Learning, by Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani.
If you have access to the WPI library then a PDF of the book can be downloaded for free from Springer. Just search for the title at the WPI library web page and then click on the ebook version.

Recommended texts
Other texts that would be useful for the course are:
- Linear Algebra and Its Applications, by David Lay. This has been used as the textbook for MA 2071 (one of the requirements for the course).
- Applied Statistics for Engineers and Scientists, by Joseph Petruccelli, Balgobin Nandram, and Minghui Chen. This has been the textbook for MA 2611 and MA 2612 (the other requirement for the course).
- The Elements of Statistical Learning: Data Mining, Inference, and Prediction, by Trevor Hastie, Robert Tibshirani, and Jerome Friedman. This is the big brother of our textbook, and a great resource that covers a lot of interesting material.
- Learning From Data, by Yaser S. Abu-Mostafa, Malik Magdon-Ismail, and Hsuan-Tien Lin. This book is used in the Caltech Learning from Data course and does a great job covering things like cross validation and VC dimension.
- Learning R: A Step-by-Step Function Guide to Data Analysis, by Richard Cotton. O'Reilly Media, September 2013.

Evaluation/Grades
Final grades will be determined based upon the following breakdown:

Homeworks (5 assignments, 2 person teams)    20%
Midterm exam                                 20%
Final project (3-5 person teams)             30%
Final exam                                   30%

The midterm exam and final exam will be in class, cumulative, and open note, but no collaboration will be allowed. The exams will be graded based upon demonstrated understanding of key concepts. For each exam, you are allowed to bring in up to four (4) 8 ½ by 11 sheets of paper (either printed or handwritten) with whatever notes you want for the exam.

The homework problems will be performed in groups of at most two and will be graded for demonstrated understanding of key concepts and quality of presentation. You can choose your own teammate, but team changes will need to be approved by Prof. Paffenroth.

The final project will be performed in groups of 3-5 and will be graded based upon the quality and completeness of a final presentation and final report.

I reserve the right to curve the final grades (either up or down) based upon the aggregate performance of the class.

Make-up Exam Policy
Make-up exams will only be allowed in the event of a documented emergency or religious observance. The exam dates are listed on the syllabus and you are responsible for avoiding conflicts with the exams.

Late Assignment Policy
In general, late assignments will either not be accepted or, at best, be heavily penalized (50% of possible points). If an emergency arises or you know in advance about a conflict please let Prof. Paffenroth know as soon as possible.

Collaboration and Academic Honesty Policy
Collaboration is prohibited on the exams. Collaboration is encouraged on homeworks and the final project. Homeworks will be conducted in teams of one or two. You will also be allowed to select your own teams of 3-5 for the final project.

On homeworks you may discuss problems across teams, but each homework team is responsible for generating solutions and writing up results on their own from scratch. On the final project, each of the teams will be using their own data sets, but the same collaboration policy applies. All violations of the collaboration policy will be handled in accordance with the WPI Academic Honesty Policy.

As examples, each of the following would be a violation of the collaboration policy (this list is not exhaustive):
- Two different homework teams share a solution to any assigned problem.
- One homework or project team allows another homework or project team to copy any part of a solution to an assigned problem.
- Any code or plots are shared between homework or project teams.

As examples, each of the following would not be a violation of the collaboration policy:
- Students within a team sharing solutions and code for a problem.
- Students from different teams discussing an assignment at the level of goals, where ideas for solutions can be found in the book or notes, what parts are more challenging, or how one might approach the problem.
- Of course, you can ask Prof. Paffenroth any questions you like, show him code, etc.

If there is any doubt as to what is allowed and what is not allowed, please just ask!

Schedule
On this schedule the homework, exam, and final project dates are fixed. On the other hand, I reserve the right to change the order and content of lectures to improve the learning experience for the course. I will ensure that the homeworks and exams match the material actually covered.

Class 1&2, January 17 & 19
- Course introduction
- Section 2.1, Section 2.2

Class 3&4, January 24 & 26
- Linear Regression 1
- Section 3.1, Section 3.2
- HW 1 assigned

Class 4&5, January 31 & February 2
- Linear Regression 2
- Section 3.3, Section 3.4, Section 3.5
- Time series methods

Class 6&7, February 7 & 9
- HW 1 due
- Classification
- Section 4.1, Section 4.2, Section 4.4, Section 4.5
- HW 2 assigned

Class 7&8, February 14 & 16
- Resampling
- Section 5.1, Section 5.2

Class 9&10, February 21 & 23
- HW 2 due
- Model Selection and Regularization
- Section 6.1, Section 6.2
- HW 3 assigned
- Project definition assigned

Class 11&12, February 28 & March 2
- Review for the midterm
- Midterm exam

March 7 & 9
- Term break

Class 13&14, March 14 & 16
- HW 3 due
- Dimension Reduction
- Section 6.3, Section 6.4
- Johnson-Lindenstrauss/concentration of measure
- HW 4 assigned

Class 15&16, March 21 & 23
- Project proposals due
- Nonlinear methods
- Section 7.1, Section 7.4, Section 7.5, Section 7.7

Class 17&18, March 28 & 30
- HW 4 due
- Tree methods
- Section 8.1, Section 8.2
- HW 5 assigned

Class 19&20, April 4 & 6
- SVM
- Section 9.1, Section 9.2, Section 9.3

Class 21&22, April 11 & 13
- HW 5 due
- Unsupervised Learning
- Section 10.2, Section 10.3
- Non-linear dimension reduction

Class 23&24, April 18, 20, & 25
- Special topics
- Project presentations/posters
- Project report due

Class 14, April 7 & May 2
- Review for the final
- Final exam

Accommodation for Special Needs or Disabilities
If you need course adaptations or accommodations because of a disability, or if you have medical information to share with me, please make an appointment with me as soon as possible. If you have not already done so, students with disabilities who believe that they may need accommodations in this class are encouraged to contact the Office of Disability Services as soon as possible to ensure that such accommodations are implemented in a timely fashion. This office is located in the West St. House (157 West St), (508) 831-4908.

Accommodation for Religious Observance
Students requiring accommodation for religious observance must make alternate arrangements with Prof. Paffenroth at least one week before the date in question.

Personal Emergencies
In the event of a medical or family emergency, please contact Prof. Paffenroth to work out appropriate accommodations.