CSC 411: Introduction to Machine Learning


CSC 411: Introduction to Machine Learning
Lecture 1 - Introduction
Ethan Fetaya, James Lucas and Emad Andrews
University of Toronto

Today
Administration details
Why is machine learning so cool?

The Team I
Instructors:
Ethan Fetaya: Section 1 (AH 400) Mon. 11am-1pm (tutorials Mon. 3-4pm); Section 2 (OI 2212) Wed. 11am-1pm (tutorials Wed. 3-4pm). Office hours: Pratt 290A (for now); 11am-12pm Tuesday, 8-10am Wednesday.
Emad Andrews: Section 3 (MP 103) Thursday 4-6pm (tutorials Thu. 6-7pm). Office hours: BA 3219, 6-8pm Thursday.
James Lucas: Section 4 (RW 117) Friday 11am-1pm (tutorials Fri. 3-4pm). Office hours: TBD.
Email: csc411-20179-instrs@cs.toronto.edu. Please send emails for administrative purposes only (e.g. medical documentation). For material-related questions, use Piazza or ask your instructor/TA in person during class or office hours. You must use an academic email account when emailing us; otherwise, your message might be filtered as spam and deleted automatically.

The Team II
TAs: Eleni Triantafillou, Aryan Arbabi, Ladislav Rampasek, Jixuan Wang, Yingzhou Wu, Shengyang Sun, Tian Qi Chen, Chris Cremer, Yulia Rubanova, Bowen Xu, Seyed Kamyar Seyed Ghasemipour, Tingwu Wang, Harris Chan, Jesse Bettencourt.

Admin Details
We are liberal with respect to waiving prerequisites, but it is up to you to determine if you have the appropriate background.
Do I have the appropriate background?
Linear algebra: vector/matrix manipulations, properties.
Calculus: partial derivatives/gradient.
Probability: common distributions; Bayes' rule.
Statistics: expectation, variance, covariance, median; maximum likelihood.

Course Information
Course website: http://www.cs.toronto.edu/~jlucas/teaching/csc411. You are expected to check the course website regularly. All announcements posted there are considered to have been announced to the class; not having read or seen an announcement is not an accepted reason for not following guidelines or missing deadlines.
The class will use Piazza for announcements and discussions: https://piazza.com/class/fall2017/csc411. First time? Sign up here: https://piazza.com/utoronto.ca/csc411. Your grade does not depend on your participation on Piazza; it's just a good way to ask questions and discuss with your instructor, TAs and your peers.

More on Course Information
While cell phones and other electronics are not prohibited in lecture, talking, recording or taking pictures in class is strictly prohibited without the consent of your instructor. Please ask before doing so!
http://www.illnessverification.utoronto.ca is the only acceptable form of direct medical documentation.
Accessibility services: if you require additional academic accommodations, please contact Accessibility Services as soon as possible: studentlife.utoronto.ca/as.

Course Information
Textbooks:
Christopher Bishop, Pattern Recognition and Machine Learning, 2006 (main textbook).
Kevin Murphy, Machine Learning: A Probabilistic Perspective, 2012.
David MacKay, Information Theory, Inference, and Learning Algorithms, 2003.
Shai Shalev-Shwartz & Shai Ben-David, Understanding Machine Learning: From Theory to Algorithms, 2014.

Requirements (Undergrads)
Do the readings! Read 5 classic papers: 5 points, on the honor system.
Assignments: three assignments, worth 15% each, for a total of 45%. Programming: take Python code and extend it. Derivations: pen(cil)-and-paper.
Mid-term: a one-hour exam during the week of Oct. 12-18, worth 20% of the course mark.
Final: focused on the second half of the course, worth 30% of the course mark.

Requirements (Grads)
Do the readings! Read 5 classic papers: 5 points, on the honor system.
Assignments: three assignments, worth 15% each, for a total of 45%. Programming: take Python code and extend it. Derivations: pen(cil)-and-paper.
Mid-term: a one-hour exam during the week of Oct. 12-18, worth 20% of the course mark.
Project: worth 30% of the course mark.

More on Assignments
Collaboration on the assignments is not allowed. Each student is responsible for his/her own work. Discussion of assignments should be limited to clarification of the handout itself, and should not involve any sharing of pseudocode, code or simulation results. Violation of this policy is grounds for a semester grade of F, in accordance with university regulations.
The schedule of assignments is included in the syllabus. Assignments should be handed in by 10pm; a late penalty of 10% per day will be assessed thereafter (up to 3 days, after which submission is blocked). Extensions will be granted only in special situations, and you will need a Student Medical Certificate or a written request approved by the course coordinator at least one week before the due date.

Provisional Calendar
Sept. 7-13: Introduction; Linear regression
Sept. 14-20: Linear classification & Logistic regression
Sept. 21-27: Nearest neighbor & Decision trees; Assignment 1 released on Sept. 21
Sept. 28-Oct. 4: Multi-class classification & Probabilistic Classifiers I; Reading assignment 1 released
Oct. 5-11: Probabilistic Classifiers II & Neural Networks I; Assignment 1 due on Oct. 5; Reading assignment 2 released

Provisional Calendar II
Oct. 12-18: Neural Networks II & PCA; Midterm
Oct. 19-25: t-SNE & Clustering; Assignment 2 released on Oct. 19
Oct. 26-Nov. 1: Mixture of Gaussians & EM; Reading assignment 3 released
Nov. 2-8: Assignment 2 due on Nov. 2 (Reading week: Nov. 6-10)

Provisional Calendar III
Nov. 9-15: SVM & Kernels; Assignment 3 released on Nov. 13; Reading assignment 4 released
Nov. 16-22: Ensemble Learning
Nov. 23-29: Reinforcement learning; Assignment 3 due on Nov. 27
Nov. 30-Dec. 7: Learning theory; Reading assignment 5 released
Dec. 9-20: Final Exam Period

What is learning?
"The activity or process of gaining knowledge or skill by studying, practicing, being taught, or experiencing something." (Merriam-Webster dictionary)
ML ⊂ AI.

What is machine learning?
How can we solve a specific problem? As computer scientists, we write a program that encodes a set of rules that are useful to solve the problem. However, in many cases it is very difficult to specify those rules: some tasks (vision, speech, NLP) are too complicated to code, some systems need to adapt, handle noise, etc. Instead of explicitly writing a program to solve a specific problem, we use examples (training data) to train the computer to perform this task (to generalize).
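
To make this concrete, here is a minimal sketch (in Python, the course's language) of learning from examples rather than rules: a 1-nearest-neighbor classifier, a method covered later in the course, labels new inputs purely from training pairs. The toy data and test points below are made up for illustration.

import numpy as np

# Made-up toy training data: 2-D inputs with binary labels.
X_train = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 0.8]])
y_train = np.array([0, 0, 1, 1])

def predict_1nn(x):
    # No hand-coded rules: copy the label of the closest training example.
    distances = np.linalg.norm(X_train - x, axis=1)
    return y_train[np.argmin(distances)]

print(predict_1nn(np.array([0.1, 0.05])))  # -> 0, near the first cluster
print(predict_1nn(np.array([0.95, 0.9])))  # -> 1, near the second cluster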

What is machine learning?
Learning systems are not directly programmed to solve a problem; instead, they develop their own program based on examples of how they should behave, or on trial-and-error experience trying to solve the problem. This is different from standard CS: we want to implement an unknown function, but only have access, e.g., to sample input-output pairs (training examples). Learning simply means incorporating information from the training examples into the system.

Examples
Computer vision: object detection, semantic segmentation, pose estimation, and almost every other task is done with ML. Instance segmentation - Link

Examples
Speech: speech to text, personal assistants, speaker identification...

Examples
NLP: machine translation, sentiment analysis, topic modeling, spam filtering.

Examples
Playing games: DOTA2 - Link

Examples
E-commerce & Recommender Systems: Amazon, Netflix, ...

Formulation
ML broad categories:
Supervised learning (correct outputs known): given (x, y) pairs, learn a mapping from x to y. Example: sentiment analysis. Classification: categorical output (object recognition, medical diagnosis). Regression: real-valued output (predicting market prices, customer ratings).
Unsupervised learning: given data points, find some structure in the data. Example: dimensionality reduction.
Online learning: supervised learning where the data is given sequentially, possibly by an adversary; no separate train/test phases. Example: spam filtering.
Reinforcement learning: learn actions to maximize future rewards. Delayed payoffs; the agent controls what it sees. Example: flying drones.
Various smaller categories, e.g. active learning, semi-supervised learning.
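
As a rough sketch of the supervised/unsupervised split in terms of data alone (the arrays below are made-up placeholders), the difference is whether targets y accompany the inputs:

import numpy as np

rng = np.random.default_rng(0)

# Supervised learning: each input x comes with a target y.
X = rng.standard_normal((100, 5))   # 100 inputs with 5 features each
y_class = rng.integers(0, 3, 100)   # classification: categorical targets in {0, 1, 2}
y_reg = rng.standard_normal(100)    # regression: real-valued targets

# Unsupervised learning: only X is given; the goal is to find structure
# (clusters, low-dimensional subspaces, ...) without any targets.
print(X.shape, y_class.shape, y_reg.shape)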

Formulation
Supervised learning mathematical set-up:
An input space $X$. Examples: $\mathbb{R}^n$, images, texts, sound recordings, etc.
An output space $Y$. Examples: $\{\pm 1\}$, $\{1, \ldots, k\}$, $\mathbb{R}$.
An unknown distribution $D$ on $X \times Y$.
A loss function $\ell : Y \times Y \to \mathbb{R}$. Examples: 0-1 loss, square loss.
A set of $m$ i.i.d. samples $(x_1, y_1), \ldots, (x_m, y_m)$ drawn from the distribution $D$.
The goal: return a function (hypothesis) $h : X \to Y$ that minimizes the expected loss (risk) with respect to $D$, i.e. find $h$ that minimizes $L_D(h) = \mathbb{E}_{(x,y) \sim D}[\ell(h(x), y)]$.
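
As a small illustration (a sketch, not anything the slide prescribes), the two example losses can be written directly as functions of a prediction and a true label:

def zero_one_loss(y_pred, y_true):
    # 0-1 loss for classification: 1 for a wrong prediction, 0 for a correct one.
    return 0.0 if y_pred == y_true else 1.0

def square_loss(y_pred, y_true):
    # Square loss for regression: penalizes real-valued errors quadratically.
    return (y_pred - y_true) ** 2

print(zero_one_loss(1, -1), square_loss(2.5, 3.0))  # -> 1.0 0.25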

Formulation
We want to minimize $L_D(h) = \mathbb{E}_{(x,y) \sim D}[\ell(h(x), y)]$, but we don't know $L_D$. We can approximate it by the empirical loss $L_S(h) = \frac{1}{m} \sum_{i=1}^{m} \ell(h(x_i), y_i)$.
For a specific function $h$, $L_S(h) \approx L_D(h)$, but if we try to fit a very complex model we might find a solution that works on our training examples but doesn't generalize to other examples. That means we overfit.
The main challenge: find a model that is rich enough to find the patterns in your data, but does not fit the random noise in the data.
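
A quick illustration of this trade-off, with made-up synthetic data: fitting polynomials of increasing degree always drives the empirical loss $L_S$ down, while the loss on fresh samples (a stand-in for $L_D$) typically goes back up once the model starts fitting noise.

import numpy as np

rng = np.random.default_rng(0)

# Made-up data: a smooth target function plus noise.
x_train = rng.uniform(-1, 1, 20)
y_train = np.sin(3 * x_train) + 0.3 * rng.standard_normal(20)
x_test = rng.uniform(-1, 1, 1000)
y_test = np.sin(3 * x_test) + 0.3 * rng.standard_normal(1000)

for degree in [1, 3, 9]:
    coeffs = np.polyfit(x_train, y_train, degree)                      # fit a degree-d polynomial
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)  # empirical loss L_S(h)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)     # estimates the risk L_D(h)
    print(f"degree {degree}: train {train_mse:.3f}, test {test_mse:.3f}")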

Formulation
"If you torture the data long enough, it will confess." - Ronald Coase
[Images taken from Spurious Correlations]

Formulation
ML viewpoints:
Agnostic approach: try to minimize loss on unseen data.
Discriminative approach: fit $P(y \mid x; \theta)$ by some parametric model.
Generative approach: fit $P(x, y; \theta)$ by some parametric model, and use it to determine $P(y \mid x; \theta)$.
Bayesian approach: instead of a single model $\theta$ we have a distribution $p(\theta)$ over models, so $p(y \mid x) = \int p(y \mid x, \theta)\, p(\theta)\, d\theta$.
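
A hedged sketch of the generative approach (1-D toy data and shared-variance Gaussian class-conditionals are assumptions made for illustration): fit $P(x, y; \theta) = P(y)P(x \mid y)$, then invert it with Bayes' rule to obtain $P(y \mid x)$.

import numpy as np

rng = np.random.default_rng(1)

# Made-up 1-D training data from two classes.
x0 = rng.normal(-1.0, 1.0, 200)   # samples with label y = 0
x1 = rng.normal(1.5, 1.0, 200)    # samples with label y = 1

# Generative step: fit P(y) and a Gaussian P(x | y) per class (shared variance).
mu0, mu1 = x0.mean(), x1.mean()
var = np.concatenate([x0 - mu0, x1 - mu1]).var()
prior1 = len(x1) / (len(x0) + len(x1))

def p_y1_given_x(x):
    # Bayes' rule turns the generative fit into P(y = 1 | x).
    lik0 = np.exp(-(x - mu0) ** 2 / (2 * var))
    lik1 = np.exp(-(x - mu1) ** 2 / (2 * var))
    return prior1 * lik1 / (prior1 * lik1 + (1 - prior1) * lik0)

print(p_y1_given_x(0.0))  # posterior probability of class 1 at x = 0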

Formulation
Machine Learning vs Data Mining
Data mining: typically using very simple machine learning techniques on very large databases, because computers are too slow to do anything more interesting with ten billion examples.
Previously used in a negative sense: a misguided statistical procedure of looking for all kinds of relationships in the data until one is finally found.
Now the lines are blurred: many ML problems involve tons of data, but problems with an AI flavor (e.g., recognition, robot navigation) are still the domain of ML.

Formulation
Machine Learning vs Statistics
ML uses statistical theory to build models; a lot of ML is rediscovery of things statisticians already knew, often disguised by differences in terminology. But the emphasis is very different:
A good piece of statistics: a clever proof that a relatively simple estimation procedure is asymptotically unbiased.
A good piece of ML: a demo that a complicated algorithm produces impressive results on a specific task.
One can view ML as applying computational techniques to statistical problems, but it goes beyond typical statistics problems, with different aims (speed vs. accuracy).

Formulation
ML workflow sketch:
1. Should I use ML on this problem? Is there a pattern to detect? Can I solve it analytically? Do I have data?
2. Gather and organize data.
3. Preprocessing, cleaning, visualizing.
4. Establish a baseline.
5. Choose a model, loss, regularization, ...
6. Optimization (could be simple, could be a PhD...).
7. Hyperparameter search.
8. Analyze performance and mistakes, and iterate back to step 5 (or 3).
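
For step 4, a baseline can be as simple as predicting the most frequent training label; a minimal sketch with made-up labels:

import numpy as np

def majority_baseline_accuracy(y_train, y_test):
    # Predict the most common training label for every test point;
    # any real model should beat this accuracy.
    values, counts = np.unique(y_train, return_counts=True)
    majority = values[np.argmax(counts)]
    return np.mean(y_test == majority)

# Made-up labels for illustration:
print(majority_baseline_accuracy(np.array([0, 0, 1, 0]), np.array([0, 1, 0, 0])))  # -> 0.75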

Questions?