CSC 411/2515 MACHINE LEARNING and DATA MINING

Lectures: Mon 11-1pm (S1), Wed 11-1pm (S2), Thu 4-6pm (S3), Fri 11-1pm (S4)
Lecture Room: AH 400 (S1), MS 2170 (S2), KP 108 (S3), MS 2172 (S4)
Instructor: Raquel Urtasun (S1,2), Richard Zemel (S3,4) <csc411prof@cs.toronto.edu>
Office hours: Raquel Urtasun: TBA, Pratt 290E. Richard Zemel: TBA, Pratt 290D.
TA email: <csc411ta@cs.toronto.edu>
Tutorials: Mon 3-4pm (S1), Wed 3-4pm (S2), Thu 6-7pm (S3), Fri 3-4pm (S4)
Tutorial Room: AH 400 (S1), BA 1170 (S2), KP 108 (S3), MS 2172 (S4)
Class URL: www.cs.toronto.edu/~urtasun/courses/csc411/csc411_Fall16.html

Overview

Machine learning research aims to build computer systems that learn from experience. Learning systems are not directly programmed by a person to solve a problem; instead, they develop their own program based on examples of how they should behave, or from trial-and-error experience trying to solve the problem. These systems require learning algorithms that specify how the system should change its behavior as a result of experience. Researchers in machine learning develop new algorithms and try to understand which algorithms should be applied in which circumstances.

Machine learning is an exciting interdisciplinary field, with historical roots in computer science, statistics, pattern recognition, and even neuroscience and physics. In the past 10 years, many of these approaches have converged and led to rapid theoretical advances and real-world applications.

This course will focus on the machine learning methods that have proven valuable and successful in practical applications. It will contrast the various methods, with the aim of explaining the circumstances under which each is most appropriate. We will also discuss basic issues that confront any machine learning method.

Pre-requisites

You should understand basic probability and statistics (STA 107, 250), and college-level algebra and calculus. For example, it is expected that you know about standard probability distributions (Gaussians, Poisson), and also how to calculate derivatives. Knowledge of linear algebra is also expected, and knowledge of the mathematics underlying probability models (STA 255, 261) will be useful. For the programming assignments, you should have some background in programming (CSC 270), and it would be helpful if you know Python.

Readings

There is no required textbook for this course. There are several recommended books. We will recommend specific sections from Pattern Recognition and Machine Learning by Chris Bishop. We will also recommend other readings.

Course requirements and grading

The format of the class will be lecture, with some discussion. I strongly encourage interaction and questions. There are assigned readings for each lecture that are intended to prepare you to participate in the class discussion for that day.

The grading in the class will be divided up as follows:

Assignments      55%
Mid-Term Exam    20%
Final Exam       25%

There will be three assignments; the first two are worth 15% each, and the last one 25% of your grade.

Homework assignments

The best way to learn about a machine learning method is to program it yourself and experiment with it. So the assignments will generally involve implementing machine learning algorithms, and experimentation to test your algorithms on some data. You will be asked to summarize your work and analyze the results in brief (3-4 page) write-ups. The implementations may be done in any language, but Python is recommended.

Collaboration on the assignments is not allowed. Each student is responsible for his or her own work. Discussion of assignments and programs should be limited to clarification of the handout itself, and should not involve any sharing of pseudocode, code, or simulation results. Violation of this policy is grounds for a semester grade of F, in accordance with university regulations.

The schedule of assignments is included in the syllabus. Assignments are due at the beginning of class/tutorial on the due date. Because they may be discussed in class that day, it is important that you have completed them by that day. Assignments handed in late but before 5 pm of that day will be penalized by 5% (i.e., total points multiplied by 0.95); a late penalty of 10% per day will be assessed thereafter. Extensions will be granted only in special situations, and you will need a Student Medical Certificate or a written request approved by the instructor at least one week before the due date.

For the final assignment, we will have a bake-off: a competition between machine learning algorithms. We will give everyone some data for training a machine learning system, and you will try to develop the best method. We will then determine which system performs best on some unseen test data.

Exams

There will be an in-class mid-term in the sixth week of class, which will be a closed-book exam on all material covered up to that point in the lectures, tutorials, required readings, and assignments. The final will focus primarily on the material from the second half of the course. The exams will cover material presented in lectures, tutorials, and assignments. You will not be responsible for topics in the reading not covered in any of these.

Attendance

We expect students to attend all classes and all tutorials. This is especially important because we will cover material in class that is not included in the textbook. Also, the tutorials will not only be for review and answering questions; new material will also be covered.

Electronic Communication

If you have questions about the assignments, you should send email to the TA account, and cc me on it. You should include your full name in the email, and it will also be useful to include your CDF account name and/or student number. Feel free to email me with questions or comments about the material covered in the course, or other related questions. For questions about marks on the assignments, please first contact the TA. Questions about the exams should be addressed to me.
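As a rough illustration of the grading arithmetic stated under "Course requirements and grading", the sketch below combines the five components using the stated weights and applies the same-day late factor of 0.95 to one assignment. The function names, the dictionary keys, and the example scores are made up for illustration and are not part of the course materials. The 10% per day penalty is deliberately left out, since the syllabus does not say how it combines with the same-day penalty.

```python
# Weights as stated in the syllabus: three assignments (15%, 15%, 25%),
# mid-term exam 20%, final exam 25%.
WEIGHTS = {
    "asst1": 0.15,
    "asst2": 0.15,
    "asst3": 0.25,
    "midterm": 0.20,
    "final": 0.25,
}

# Factor for work handed in late but before 5 pm on the due date.
SAME_DAY_LATE_FACTOR = 0.95


def same_day_late(score):
    """Apply the same-day late penalty to a score given in percent."""
    return score * SAME_DAY_LATE_FACTOR


def course_grade(scores):
    """Weighted course grade; each score is a percentage in [0, 100]."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores, for illustration only.
scores = {
    "asst1": 80.0,
    "asst2": 90.0,
    "asst3": 70.0,
    "midterm": 60.0,
    "final": 80.0,
}

# Suppose assignment 2 was handed in late, but before 5 pm on the due date.
scores["asst2"] = same_day_late(scores["asst2"])

print(f"Course grade: {course_grade(scores):.1f}%")
```

Keeping the weights in one dictionary makes it easy to check that they sum to 100% and to see how much each component contributes.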

CLASS SCHEDULE, Part 1

Shown below are the topics for lectures (L) and tutorials (T), as are the dates that each assignment will be handed out and is due. The assignments are due at 10AM on the dates listed below. The notes from each lecture and tutorial will be available on the class web-site the day of the class meeting. The assigned readings are specific sections from the book. All of these are subject to change.

Number  Topic                                        Assignments
L1      Introduction
L2      Linear Regression
T1      Probability for ML & Linear regression
L3      Linear Classification
L4      Logistic Regression
T2      Optimization for ML
L5      Nonparametric Methods                        Asst 1 Out (9/30)
L6      Decision Trees
T3      knn & Decision Trees
L7      Multi-class Classification
L8      Probabilistic Classifiers
T4      Naive Bayes and Gaussian Bayes Classifier
L9      Probabilistic Classifiers II                 Asst 1 In (10/14)
L10     Neural Networks I
T5      Mid-term review
L11     MIDTERM
L12     Neural Networks II
T6      Neural Networks Tutorial

CLASS SCHEDULE, Part 2

Number  Topic                                        Assignments
L13     Clustering                                   Asst 2 Out (10/26)
L14     Mixture of Gaussians & EM
T7      Clustering
L15     PCA & Autoencoders
T8      PCA Tutorial
L17     Kernels and Margins                          Asst 2 In (11/9)
L18     Support Vector Machines
T9      SVM Tutorial                                 Asst 3 Out (11/11)
L19     Ensemble Methods I
L20     Ensemble Methods II
T10     Bagging & Boosting
L21     Reinforcement Learning I
L22     Reinforcement Learning II
T11     Reinforcement Learning
L23     Time-Series Models                           Asst 3 In (11/30)
L24     Final & Wrap-up