Carnegie Mellon University Machine Learning for Problem Solving Spring 2019

95-828 MLPS http://www.andrew.cmu.edu/user/lakoglu/courses/95828/index.htm 1 of 2 1/11/2019, 10:59 AM

Carnegie Mellon University
95-828 Machine Learning for Problem Solving
Spring 2019

CLASS MEETS:
There are two sections of the course offered in Spring 2019.
Time: Section A: Tue & Thu 9:00AM - 10:20AM
      Section B: Tue & Thu 10:30AM - 11:50AM
Place: Both sections in HBH A301

WEEKLY RECITATION:
Time: Fri 10:30AM - 11:50AM
Place: HBH A301

PEOPLE:
Instructor: Leman Akoglu
  Office hours: THU 12-1PM; also, by appointment
  Office: HBH 2118C, office ph. 412-268-30 four three
  Email: invert (cs.cmu.edu @ lakoglu)

Teaching Assistants: (Note: all office hours in HBH TA room 3034)
  Shubhranshu Shekhar
    Office hours: TBD
    Email: invert (andrew.cmu.edu @ shubhras)
  Hung Nguyen
    Office hours: TBD
    Email: invert (andrew.cmu.edu @ hungnguy)

Graders:
  Asparsh Chandera
    Email: invert (andrew.cmu.edu @ asparshc)
    Office hours: by appointment
  Jie Lou
    Email: invert (andrew.cmu.edu @ jlou1)
    Office hours: by appointment
  Kaichen Chen
    Email: invert (andrew.cmu.edu @ kaichenc)
    Office hours: by appointment
  Deepak Raj Subramanian
    Email: invert (andrew.cmu.edu @ dsubrama)
    Office hours: by appointment

COURSE DESCRIPTION:
Machine Learning (ML) is centered around automated methods that improve their own performance by learning patterns in data, and then use the uncovered patterns to predict the future and make decisions. ML is heavily used in a wide variety of domains such as business, finance, healthcare, and security, for problems including display advertising, fraud detection, disease diagnosis and treatment, face/speech recognition, and automated navigation, to name a few.

"If I had an hour to solve a problem I'd spend 55 minutes thinking about the problem and 5 minutes thinking about solutions." -- Albert Einstein

"A problem well put is half solved." -- John Dewey

This course is designed to give a graduate-level student a thorough grounding in the methodologies, technologies, and best practices used in machine learning. The main premise of the course is to equip students with an intuitive understanding of machine learning concepts grounded in real-world applications. The course aims to help students gain the practical knowledge and experience necessary for recognizing and formulating machine learning problems in the wild, as well as for applying machine learning techniques effectively in practice. The emphasis will be on learning and practicing the machine learning process, more than on learning theory.

"All models are wrong, but some models are useful." -- George Box

As there exists no universally best model, we will cover a wide range of different models and learning algorithms, which have varying speed-accuracy-interpretability tradeoffs. In particular, the topics include supervised learning (linear models, decision trees, ensemble methods, kernel methods, nonparametric learning) and unsupervised learning (density estimation, clustering, and dimensionality reduction).

The class will include biweekly homework assignments, each containing a mini-project (i.e., a problem solving assignment that involves programming) in addition to other conceptual and technical questions, a midterm, a final exam, and a case study at the end of the course. The case study will give students a chance to dig into a substantial problem using a large dataset and apply the machine learning tools they have learned throughout the course.

Prerequisites
This course does not assume any prior exposure to machine learning theory or practice. Students are expected to have the following background:
  - Basic knowledge of probability
  - Basic knowledge of linear algebra
  - Basic programming skills
  - Familiarity with Python programming and basic use of NumPy, pandas and matplotlib

Learning Objectives
By the end of this class, students will
  - learn the main concepts, methodologies, and tools for machine learning
  - be able to recognize machine learning tasks in real-world problems
  - develop the critical thinking for comparing and contrasting models for a given task
  - learn the best practices for reliably performing model selection and evaluation
  - gain experience with implementing ML solutions in Python and applying them to various real-world datasets

BULLETIN BOARD and other info
For course materials, assignments, announcements, and grades please see Canvas. For questions and discussions please use Piazza. Here is the link to sign up.
Carnegie Mellon 2018-2019 Official academic calendar

TEXTBOOK:
There is no required textbook for the course. I will post course notes and slides for each lecture as well as some code examples (Jupyter notebooks) on Canvas. See Resources for a list of recommended books that could help supplement your understanding of the course material.

MISC - FUN:
Fake (ML) protest

Last updated by Leman Akoglu, 2019

95-828 MLPS http://www.andrew.cmu.edu/user/lakoglu/courses/95828/syllabus.htm 1 of 2 1/11/2019, 11:00 AM

Course Syllabus

LECTURES:
I will provide course notes as well as slides for each lecture. Those will be uploaded to Canvas before the lecture. Feel free to print them and bring them to class with you for annotating. You may also benefit from the recommended books (listed under Resources, see left tab) to further your understanding. To stay on track, make sure to read the course notes in a timely fashion, and follow up with questions in lectures, office hours, recitations, and/or Piazza.

RECITATIONS:
There will be a recitation session held by one of the TAs on Fridays 5:30-7pm. The recitation will review the week's material and answer any questions you might have about the course material, including homework.

Week / Lectures / Notes

Week 1
  Lectures: INTRO TO MACHINE LEARNING; DATA PREPARATION
  Notes: HW 0 out (Python and Jupyter setup); Recitation 1 (Python setup, Data prep demos, Linear Algebra review)

Week 2
  Lectures: PART I: SUPERVISED LEARNING; LINEAR REGRESSION (LR)
  Notes: Recitation 2 (Linear Regression review and demos, Convex optimization basics)

Week 3
  Lectures: MODEL SELECTION; LOGISTIC REGRESSION (LogR)
  Notes: HW 1 out (EDA, LR, Model selection, LogR); Recitation 3 (Cross-validation, LogR, Gradient descent)

Weeks 4-7
  Lectures: NON-PARAMETRIC LEARNING; MODEL EVALUATION; DECISION TREES (DT); ENSEMBLE METHODS; NAIVE BAYES (NB)
  Notes: Recitation 4 (Non-parametric learning, kNN, Local regression); HW 2 out (Non-parametric learning, Model evaluation, DT); Recitation 5 (Model evaluation); Recitation 6 (DT, Random Forest); Recitation 7 (Boosting, NB)

Week 8
  Lectures: Midterm Review; Midterm Exam
  Notes: Exam will be during class on Thur. Duration: 80 minutes. Open class notes, slides, and homework solutions. No electronics.

Week 9
  NO CLASS: Spring Break

Week 10
  Lectures: SUPPORT VECTOR MACHINES (SVM)
  Notes: HW 3 out (Ensembles, NB, SVM); Recitation 8 (SVM and Kernels)

Week 11
  Lectures: NEURAL NETWORKS (NN)
  Notes: Recitation 9 (NNs, Backpropagation)

Weeks 12-13
  Lectures: DEEP LEARNING; PART II: UNSUPERVISED LEARNING; DENSITY ESTIMATION
  Notes: HW 4 out (Kernels, Neural Nets, Density estimation); Recitation 10 (Deep learning, Density estimation); Thur NO CLASS: Spring carnival; Friday NO RECITATION

Weeks 14-15
  Lectures: CLUSTERING; DIMENSIONALITY REDUCTION
  Notes: HW 5 out (Clustering, EM, Dimensionality reduction); Recitation 11 (Clustering, k-means, EM); Recitation 12 (Dim. reduction)

Week 16
  Lectures: RECOMMENDER SYSTEMS; Case Study & Final Review
  Notes: Case Study out (Dataset provided, Tasks recommended); Recitation 13 (Recommender systems, Case Study review, Final Q&A)

Last modified by Leman Akoglu, 2019

95-828 MLPS http://www.andrew.cmu.edu/user/lakoglu/courses/95828/assignments.htm 1 of 2 1/11/2019, 11:00 AM

Coursework

Coursework consists of (grading in parentheses):
  - 5 Homework (9% each)
  - 1 Midterm exam (15%)
  - 1 Final exam (25%)
  - 1 Case Study (15%)

HOMEWORK:
Homework will be posted on Canvas. Each homework will consist of two parts: (1) a set of conceptual questions, and (2) programming. For the programming part, we will provide a code template (and sometimes partial code as well) in a Jupyter notebook. You will have two weeks to complete each homework assignment.

Getting help: You can visit the instructor and the TAs during office hours as well as post questions on Piazza to get help on the assignments. Regarding help from fellow students, see the note on collaboration below.

Collaboration: All assignments are to be written individually, except the Case Study (which can be done in pairs). Collaboration and study groups are allowed and encouraged. However, each student should submit their own write-up. Please see the collaboration policy for details.

Submitting: We ask that you submit two files per homework: (1) a pdf file with your answers to the conceptual questions, and (2) the Jupyter notebook we provide as a template, filled in with all your code. Both files (.pdf and .ipynb) are to be uploaded electronically on Canvas only (no hard copy printouts). Homework assignments are due at the beginning of class on the day they are due. You can upload your files multiple times, but note that we will use the latest upload date as the submission date, which may count against your slip days. Please see the late submission policy for details.

IMPORTANT DATES:

Assignment    Note                                           Out      Due      Weight
Homework 0    Setting up Python and Jupyter                  Jan 15   n/a      0%
Homework 1    EDA, LR, Model selection, LogR                 Jan 29   Feb 12   9%
Homework 2    Model evaluation, Non-parametric learning, DT  Feb 14   Feb 28   9%
Midterm Exam  (in class)                                     Mar 7    --       15%
Homework 3    Ensemble models, NB, SVM                       Mar 19   Apr 2    9%
Homework 4    Kernels, Neural nets, Density estimation       Apr 2    Apr 16   9%
Homework 5    Clustering, EM, Dimensionality reduction       Apr 16   Apr 30   9%
Case Study    (take home)                                    Apr 30   May 7    15%
Final Exam                                                   TBD      --       25%

EXAMS:
There will be a midterm exam (in class) and a final exam (to be scheduled by the University). Note: Both the midterm and the final will be open book, notes, homework solutions, etc., but you are not allowed to use a computer or any other electronics. The tentative dates are posted above; the finalized dates will be announced during the semester.

CASE STUDY:
Starting in the second half of the course (after Spring break), we will provide you with information describing a large dataset and a list of potential problems to solve using this dataset. We will also release the dataset after Spring break. You will be given the second half of the semester to complete your analysis and modeling on the data. In particular, you will be expected to choose carefully among the techniques and tools you have learned throughout the course and apply them to the problems that interest you.

Evaluation: We will assess your case study outcomes in terms of your analytical approach to the problems, and not only based on the quality of your results. That is, the emphasis will be on evaluating how methodical you were in your analysis in terms of the tools you chose to apply, the way you drew conclusions from your own results, and the sequence of steps you took based on your analyses and intermediate results. We will also assess whether you used best practices in building your solutions, including proper model selection, model comparisons to appropriate baselines, choice of evaluation metrics, and so on.

Submitting: The Case Study can be done in groups of up to two students. You are asked to submit a single Jupyter notebook on Canvas, composed of all your code and results.

Last modified by Leman Akoglu, 2019
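As a quick sanity check on the weights listed under Coursework, the sketch below confirms that they add up to 100% and shows how a weighted total could be formed from component scores. The course pages do not spell out a formula, so the plain weighted sum, the 0-100 score scale, and the dictionary keys are assumptions made for illustration only. Homework 0 is omitted because its weight is 0%.

```python
# Weights in percent, copied from the Coursework list.
# The key names are illustrative, not official identifiers.
WEIGHTS = {
    "hw1": 9, "hw2": 9, "hw3": 9, "hw4": 9, "hw5": 9,
    "midterm": 15,
    "final": 25,
    "case_study": 15,
}


def course_score(scores):
    """Weighted sum of component scores, each assumed to be on a 0-100 scale."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Hypothetical student: 80 on everything except 90 on the final exam.
example = {name: 80 for name in WEIGHTS}
example["final"] = 90

print(sum(WEIGHTS.values()))   # 100
print(course_score(example))   # 82.5
```

Keeping the weights as integer percentages avoids floating-point surprises when checking that they total exactly 100.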

95-828 MLPS http://www.andrew.cmu.edu/user/lakoglu/courses/95828/policy.htm 1 of 2 1/11/2019, 11:00 AM

Course Policies

LECTURES
All devices such as laptops, cell phones, noisy PDAs, etc. should be turned off for the duration of the lectures and the recitations, because they may distract fellow students. Students who would like to use their laptops during the course are strongly encouraged to sit in the back row of the classroom. Please come to all lectures on time and leave on time, again so that there are no distractions to your classmates.

PRE-REQUISITES
This course does not assume any prior exposure to machine learning theory or practice. Students are expected to have the following background:
  - Basic knowledge of probability
  - Basic knowledge of linear algebra
  - Working knowledge of basic computing principles
  - Familiarity with Python programming and basic use of NumPy, pandas and matplotlib

ASSIGNMENTS
Assignments are due at the * beginning of lecture * on the due date. The due dates of assignments are posted on the assignments page. Assignments will be posted on Canvas. Students should submit their homework solutions (a pdf file with answers to conceptual questions and a Jupyter notebook with answers to programming questions) only electronically via Canvas (no printouts).

Important Note: As we reuse problem set questions, some of which are covered by papers and webpages, we expect students not to copy, refer to, or look at the solutions in preparing their answers. Since this is a graduate-level class, we expect students to want to learn and not google for answers. The purpose of problem sets in this class is to help you think about the material, not just give us the right answers. Therefore, please restrict attention to the class notes, slides, and the supplementary books mentioned on the resources page when solving problems on the problem set. If you do happen to use other material, it must be acknowledged clearly with a citation on the submitted solution.

Questions and Re-grade requests
You should use Piazza for all your questions about the assignments and the course material. The instructor and TA(s) will do their best to answer your questions in a timely manner.

Regrade requests should be made in writing/email to the graders listed on the front page (see Graders under People), within 2 days after graded assignments are distributed, specifying
  - the question under dispute (e.g., 'HW1-Q.2.b')
  - the extra points requested (e.g., '2 points out of 5')
  - the justification (e.g., 'I forgot to divide by variance, but the rest of my answer was correct')
In the unlikely case that there is no satisfactory resolution, please contact the instructor.

Homework Grading and Solutions
All homework will be graded online through Canvas. Graders will provide comments and feedback on the deductions they have made. We will post solutions to the assignments on Canvas 4 days after the due date (to account for students using slip days, see below).

Late submission policy
No delay penalties for medical/family/etc. emergencies (bring written documentation, like a doctor's note).

Each student is granted '4 slip days' total for the whole course duration, to accommodate coinciding deadlines/interviews/etc. That is, no questions asked if the total delay is 4 days or less. You can use the extension on any assignment during the course (unless otherwise stated). For instance, you can hand in one assignment 4 days late, or 4 different assignments 1 day late each.

Late days are rounded up to the nearest integer. For example, a submission that is 4 hours late will count as 1 day late.

After you have used up your slip days, any assignment handed in late will be marked off 25% per day of delay.

To use slip days: upload your homework solutions on Canvas to mark the time of submission. You can upload your modified files multiple times at different points in time. However, please note that we will use your latest upload date as the date of submission, even if you have modified only a small part of your files.

Collaboration policy
You are encouraged to discuss homework problems with your fellow students. However, the work you submit must be your own. You must acknowledge in your submission any help received on your assignments. That is, you must include a comment in your homework submission that clearly states the name of the student, book, or online reference from which you received assistance.

Submissions that fail to properly acknowledge any help from other students or non-class sources will receive NO credit. Copied work will receive NO credit. Any and all violations will be reported to the Heinz College administration and may appear in the student's transcript.

Academic integrity
All students are expected to comply with CMU's policy on academic integrity. Please read the policy and make sure you have a complete understanding of it.

EMAIL
Piazza should be used for general course and assignment related questions. For other types of questions (e.g., to report illness, request various permissions) please contact the instructor directly via email. Please make sure to include '95828' in the subject line of your email.

AUDITING
Auditing is not allowed. Only those students who are officially enrolled to take the course for credit are allowed to sit in class.

Last modified by Leman Akoglu, 2019
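The arithmetic of the late submission policy above (4 slip days in total, lateness rounded up to whole days, 25% off per day once slip days run out) can be sketched as follows. This is only an illustration under stated assumptions: the policy does not say how slip days are allocated when a submission is only partly covered, nor what the 25% is taken from, so the sketch assumes slip days are consumed in submission order, the penalty is 25% of the assignment's full credit per uncovered day, and credit never drops below zero. The function and constant names are hypothetical.

```python
import math

SLIP_DAYS_TOTAL = 4     # stated in the policy
PENALTY_PER_DAY = 0.25  # stated in the policy: 25% per day of delay


def late_days(hours_late):
    """Round lateness up to whole days (e.g. 4 hours late counts as 1 day)."""
    return max(0, math.ceil(hours_late / 24))


def apply_late_policy(hours_late_per_assignment):
    """Return (credit multipliers, slip days left) for assignments in submission order.

    Assumption: slip days are used up first, in order; each remaining late day
    removes 25% of full credit, floored at zero.
    """
    remaining = SLIP_DAYS_TOTAL
    multipliers = []
    for hours in hours_late_per_assignment:
        days = late_days(hours)
        covered = min(days, remaining)   # late days absorbed by slip days
        remaining -= covered
        uncovered = days - covered       # late days that incur the penalty
        multipliers.append(max(0.0, 1.0 - PENALTY_PER_DAY * uncovered))
    return multipliers, remaining


# Hypothetical semester: 4 hours late, on time, then 4 full days late.
print(apply_late_policy([4, 0, 96]))  # ([1.0, 1.0, 0.75], 0)
```

In this example the first submission uses one slip day, the third uses the remaining three, and its fourth late day costs 25% of that assignment's credit under the assumptions above.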