Machine Learning 10-401, Spring 2018

Machine Learning 10-401, Spring 2018. Introduction, Admin, Course Overview. Lecture 1, 01/17/2018. Maria-Florina (Nina) Balcan

Machine Learning applications: Image Classification, Document Categorization, Speech Recognition, Protein Classification, Spam Detection, Branch Prediction, Fraud Detection, Natural Language Processing, Playing Games, Computational Advertising.

Machine Learning is Changing the World. "Machine learning is the hot new thing" (John Hennessy, President, Stanford). "A breakthrough in machine learning would be worth ten Microsofts" (Bill Gates, Microsoft). "Web rankings today are mostly a matter of machine learning" (Prabhakar Raghavan, VP Engineering at Google).

The COOLEST TOPIC IN SCIENCE. "A breakthrough in machine learning would be worth ten Microsofts" (Bill Gates, Chairman, Microsoft). "Machine learning is the next Internet" (Tony Tether, Director, DARPA). "Machine learning is the hot new thing" (John Hennessy, President, Stanford). "Web rankings today are mostly a matter of machine learning" (Prabhakar Raghavan, Dir. Research, Yahoo). "Machine learning is going to result in a real revolution" (Greg Papadopoulos, CTO, Sun). "Machine learning is today's discontinuity" (Jerry Yang, CEO, Yahoo).

This course: introduction to machine learning. Cover (some of) the most commonly used machine learning paradigms and algorithms. Sufficient amount of details on their mechanisms: explain why they work, not only how to use them. Applications.

What is Machine Learning? Examples of important machine learning paradigms.

Supervised Classification: from data to discrete classes

Supervised Classification. Example: Spam Detection. Decide which emails are spam and which are important (not spam). Goal: use emails seen so far to produce a good prediction rule for future data.

Supervised Classification. Example: Spam Detection. Represent each message by features (e.g., keywords, spelling, etc.); each example comes with a label. Reasonable RULES:
Predict SPAM if unknown AND (money OR pills).
Predict SPAM if 2*money + 3*pills - 5*known > 0.
[Figure: positive (+) and negative (-) examples that are linearly separable.]
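The two candidate rules above can be written out directly. This is a minimal sketch, assuming each message has already been reduced to three 0/1 features (`money`, `pills`, `known`, where `known` means the sender is known) and assuming the operator lost before `5 known` in the linear rule is a minus sign. The function names and the sample messages are illustrative only.

```python
def rule_boolean(msg):
    """Predict SPAM if the sender is unknown AND the message mentions money OR pills."""
    unknown = not msg["known"]
    return bool(unknown and (msg["money"] or msg["pills"]))


def rule_linear(msg):
    """Predict SPAM if 2*money + 3*pills - 5*known > 0 (a linear separator)."""
    score = 2 * msg["money"] + 3 * msg["pills"] - 5 * msg["known"]
    return score > 0


# Hypothetical messages, already converted to 0/1 features.
messages = [
    {"money": 1, "pills": 0, "known": 0},
    {"money": 1, "pills": 1, "known": 1},
    {"money": 0, "pills": 0, "known": 0},
]
for msg in messages:
    print(msg, rule_boolean(msg), rule_linear(msg))
```

With 0/1 features the two rules agree on all eight possible inputs, which is one way to see that this particular Boolean rule can be expressed as a linear threshold.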

Supervised Classification. Example: Image classification. Handwritten digit recognition (convert hand-written digits to characters 0..9). Face detection and recognition.

Supervised Classification. Many other examples.
Weather prediction.
Medicine: diagnose a disease. Input: symptoms, lab measurements, test results, DNA tests. Output: one of a set of possible diseases, or none of the above. Examples: audiology, thyroid cancer, diabetes; or: response to chemo drug X; or: will the patient be re-admitted soon?
Computational Economics: predict if a stock will rise or fall; predict if a user will click on an ad or not, in order to decide which ad to show.

Regression. Predicting a numeric value. Examples: stock market; weather prediction (predict the temperature, e.g., 72 F, at any given location).

Other Machine Learning Paradigms.
Clustering: discovering structure in data (only unlabeled data). E.g., cluster users of social networks by interest (community detection), as in the Facebook network or the Twitter network.
Semi-Supervised Learning: learning with labeled and unlabeled data.
Active Learning: the learner picks informative examples to be labeled.
Reinforcement Learning (accommodates indirect or delayed feedback).
Dimensionality Reduction.
Collaborative Filtering (Matrix Completion).

Many communities relate to ML

Admin, Logistics, Grading

Brief Overview. Meeting time: Mon, Wed, NSH 3002, 10:30-11:50. Course staff. Instructor: Maria-Florina (Nina) Balcan (ninamf@cs.cmu.edu). TAs: Kenneth Marino (kdmarino@cs.cmu.edu), Colin White (crwhite@cs.cmu.edu), Nupur Chatterji (nchatter@andrew.cmu.edu).

Brief Overview. Course website: http://www.cs.cmu.edu/~ninamf/courses/401sp18. See the website for: syllabus details; all the lecture slides and homeworks; additional useful resources; office hours; recitation sessions; grading policy; honesty policy; late homework policy; Piazza pointers. We will use Piazza for discussions.

Prerequisites. What do you need to know now? You should know how to do math and how to program.
Math: calculus (multivariate); probability/statistics; algorithms and big-O notation; linear algebra (matrices and vectors).
Programming: you will implement some of the algorithms and apply them to datasets. Assignments will be in Octave (play with that now if you want; there is also a recitation tomorrow). Octave is an open-source clone of Matlab.
We may review these things, but we will not teach them.

Source Materials. No textbook required. Will point to slides and freely available online material. Useful textbooks: Machine Learning, Tom Mitchell, McGraw Hill, 1997. Machine Learning: a Probabilistic Perspective, K. Murphy, MIT Press, 2012. Pattern Recognition and Machine Learning, Christopher Bishop, Springer-Verlag, 2006.

Grading. 40% for homeworks (there are 5 and you can drop 1). 20% for midterm [March 7]. 20% for final [May 2nd]. 15% for project. 5% for class participation (Piazza polls in class: bring a laptop or a phone).
Homework 0: background homework, out today [get full credit if you turn it in]. Homeworks 1 to 4: theory/math handouts and programming exercises (applying/evaluating existing learners).
Late assignments: up to 50% credit if it's less than 48 hrs late. You can drop your lowest assignment grade.
Projects: conduct a small experiment, or read a couple of papers and present the main ideas, or work on a small theoretical question. Project presentations: April 23 and April 25.

Collaboration policy (see syllabus). Discussion of anything is OK, but the goal should be to understand better, not to save work. So: no notes of the discussion are allowed; the only thing you can take away is whatever's in your brain. You should acknowledge in your homework who you got help from and who you helped.

Maria-Florina Balcan (Nina). Foundations for modern machine learning, e.g., interactive, semi-supervised, distributed, and life-long learning. Connections between learning and other fields (algorithms, algorithmic game theory). Related areas: approximation algorithms, control theory, game theory, machine learning theory, mechanism design, discrete optimization, matroid theory. Program Committee Chair for ICML 2016 and COLT 2014.

Kenneth Marino (Kenny). Incorporating knowledge into computer vision: incorporating knowledge graphs; learning from Wikipedia articles. Deep learning with non-traditional training and architectures: graph networks; generative models (VAEs and GANs).

Colin White. 4th-year PhD student advised by Nina Balcan. Design and analysis of algorithms. Theoretical foundations of machine learning. Beyond worst-case analysis (a spectrum from worst-case, through average-case, to real-world, application-specific).

Nupur Chatterji. Senior in SCS (undergrad). Minor in Machine Learning (and Economics). Intends to pursue ML in grad school. Interested in the intersection between technology and healthcare.

Learning Decision Trees. Supervised Classification. Useful readings: Mitchell, Chapter 3; Bishop, Chapter 14.4. DT learning: a method for learning discrete-valued target functions in which the function to be learned is represented by a decision tree.

Supervised Classification: Decision Tree Learning. Example: learn the concept PlayTennis (i.e., decide whether our friend will play tennis or not on a given day). Simple training data set with columns Day, Outlook, Temperature, Humidity, Wind, Play Tennis; each row is an example and Play Tennis is its label.

Supervised Classification: Decision Tree Learning.
Each internal node: tests one (discrete-valued) attribute $X_i$.
Each branch from a node: corresponds to one possible value for $X_i$.
Each leaf node: predicts $Y$ (or $P(Y = 1 \mid x \in \text{leaf})$).
Example: a decision tree for $f$: <Outlook, Temperature, Humidity, Wind> $\to$ PlayTennis? E.g., x = (Outlook=Sunny, Temperature=Hot, Humidity=Normal, Wind=High), f(x) = Yes.
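The node/branch/leaf description above maps naturally onto a nested data structure. The sketch below assumes the tree in question is the standard PlayTennis tree from Mitchell, Chapter 3 (Outlook at the root, then Humidity under Sunny and Wind under Rain). The tuple-and-dict encoding and the name `predict` are illustrative choices, not part of the lecture.

```python
# Internal node: (attribute, {value: subtree}); leaf: a label string.
PLAY_TENNIS_TREE = (
    "Outlook",
    {
        "Sunny": ("Humidity", {"High": "No", "Normal": "Yes"}),
        "Overcast": "Yes",
        "Rain": ("Wind", {"Strong": "No", "Weak": "Yes"}),
    },
)


def predict(tree, x):
    """Follow the branch matching x's value at each internal node until a leaf."""
    while not isinstance(tree, str):
        attribute, branches = tree
        tree = branches[x[attribute]]
    return tree


# The example instance from the text; Wind is never consulted on this path.
x = {"Outlook": "Sunny", "Temperature": "Hot", "Humidity": "Normal", "Wind": "High"}
print(predict(PLAY_TENNIS_TREE, x))  # Sunny -> Humidity=Normal -> leaf "Yes"
```

Only the attributes on the path from the root to a leaf are read, so an instance is classified after at most as many tests as the tree is deep.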

Supervised Classification: Problem Setting.
Input: training labeled examples $\{(x^{(i)}, y^{(i)})\}$ of an unknown target function $f$. Examples are described by their values on some set of features or attributes, e.g., 4 attributes (Humidity, Wind, Outlook, Temp), as in <Humidity=High, Wind=Weak, Outlook=Rain, Temp=Mild>.
Set of possible instances $X$ (a.k.a. instance space).
Unknown target function $f : X \to Y$, where $Y$ is the label space, e.g., $Y = \{0, 1\}$ (1 if we play tennis on this day, else 0).
Set of function hypotheses $H = \{h \mid h : X \to Y\}$; each hypothesis $h$ is a decision tree.
Output: a hypothesis $h \in H$ that (best) approximates the target function $f$.

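Supervised Classification: Decision Trees. Suppose $X = \langle x_1, \ldots, x_n \rangle$ where the $x_i$ are boolean-valued variables. How would you represent the following as DTs?
$f(x) = x_2 \text{ AND } x_5$: test $x_2$ at the root. If $x_2 = 0$, then f = No. If $x_2 = 1$, test $x_5$: if $x_5 = 1$ then f = Yes, else f = No.
$f(x) = x_2 \text{ OR } x_5$: test $x_2$ at the root. If $x_2 = 1$, then f = Yes. If $x_2 = 0$, test $x_5$: if $x_5 = 1$ then f = Yes, else f = No.
Hwk: How would you represent X 2 X 5 X 3 X 4 ( X 1 )?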
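One way to check that a tree really represents a Boolean function is to compare it against the function on every input. The sketch below encodes the AND and OR trees over $x_2$ and $x_5$ as nested tuples (an encoding chosen here purely for illustration) and checks them against all $2^5$ assignments, taking $n = 5$ as an assumption.

```python
from itertools import product

# A tree is either a bool leaf or (variable_index, subtree_if_1, subtree_if_0).
AND_TREE = (2, (5, True, False), False)  # x2 AND x5
OR_TREE = (2, True, (5, True, False))    # x2 OR x5


def evaluate(tree, x):
    """Walk from the root, branching on the tested variable, until a leaf."""
    while not isinstance(tree, bool):
        var, if_one, if_zero = tree
        tree = if_one if x[var] else if_zero
    return tree


# Exhaustively compare each tree with the function it should represent.
for bits in product([0, 1], repeat=5):
    x = dict(zip(range(1, 6), bits))  # x[1] .. x[5]
    assert evaluate(AND_TREE, x) == bool(x[2] and x[5])
    assert evaluate(OR_TREE, x) == bool(x[2] or x[5])
print("both trees match their Boolean functions on all 32 inputs")
```

Both trees have two internal nodes; the only difference is which branch of the root ends immediately in a leaf.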

Core Aspects in Decision Tree & Supervised Learning.
How to automatically find a good hypothesis for training data? This is an algorithmic question, the main topic of computer science.
When do we generalize and do well on unseen data? Learning theory quantifies the ability to generalize as a function of the amount of training data and the hypothesis space.
Occam's razor: use the simplest hypothesis consistent with the data! There are fewer short hypotheses than long ones, so a short hypothesis that fits the data is less likely to be a statistical coincidence, whereas it is highly probable that some sufficiently complex hypothesis will fit the data.

Core Aspects in Decision Tree & Supervised Learning.
How to automatically find a good hypothesis for training data? This is an algorithmic question, the main topic of computer science.
When do we generalize and do well on unseen data? Occam's razor: use the simplest hypothesis consistent with the data!
Decision trees: if we were able to find a small decision tree that explains the data well, then we would have good generalization guarantees. Finding such a tree is NP-hard [Hyafil-Rivest 76], so a polynomial-time algorithm is unlikely. There are very nice practical heuristics: top-down algorithms, e.g., ID3.

Top-Down Induction of Decision Trees [ID3, C4.5, Quinlan].
ID3: natural greedy approach to growing a decision tree top-down (from the root to the leaves, by repeatedly replacing an existing leaf with an internal node).
Algorithm: Pick the best attribute to split at the root based on the training data. Recurse on children that are impure (e.g., have both Yes and No). Candidate attributes: Humidity, Outlook, Temp, Wind.

Examples reaching the Outlook=Sunny branch:
Day  Outlook  Temperature  Humidity  Wind    Play Tennis
D1   Sunny    Hot          High      Weak    No
D2   Sunny    Hot          High      Strong  No
D8   Sunny    Mild         High      Weak    No
D9   Sunny    Cool         Normal    Weak    Yes
D11  Sunny    Mild         Normal    Strong  Yes

Examples reaching the Outlook=Rain branch:
Day  Outlook  Temperature  Humidity  Wind    Play Tennis
D4   Rain     Mild         High      Weak    Yes
D5   Rain     Cool         Normal    Weak    Yes
D6   Rain     Cool         Normal    Strong  No
D10  Rain     Mild         Normal    Weak    Yes
D14  Rain     Mild         High      Strong  No

Resulting tree: Outlook at the root. Sunny: test Humidity (High: No; Normal: Yes). Overcast: Yes. Rain: test Wind (Strong: No; Weak: Yes).
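The greedy procedure described above can be sketched in a few lines. Two assumptions are made here. First, "best attribute" is taken to mean highest information gain, the criterion ID3 uses, which this part of the lecture does not define. Second, the four Overcast rows (D3, D7, D12, D13) do not appear in the text above and are filled in from the standard PlayTennis table in Mitchell, Chapter 3. The function names and the tuple-and-dict tree encoding are illustrative.

```python
import math
from collections import Counter

ATTRS = ["Outlook", "Temperature", "Humidity", "Wind"]
ROWS = [  # D1..D14; the Overcast rows are assumed from Mitchell, Chapter 3
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]
DATA = [(dict(zip(ATTRS, r[:4])), r[4]) for r in ROWS]


def entropy(labels):
    """Shannon entropy (in bits) of a list of labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def info_gain(data, attr):
    """Entropy reduction obtained by splitting `data` on `attr`."""
    remainder = 0.0
    for value in {x[attr] for x, _ in data}:
        subset = [y for x, y in data if x[attr] == value]
        remainder += len(subset) / len(data) * entropy(subset)
    return entropy([y for _, y in data]) - remainder


def id3(data, attrs):
    """Grow a tree top-down: leaf if pure, else split on the best attribute."""
    labels = [y for _, y in data]
    if len(set(labels)) == 1:  # pure node becomes a leaf
        return labels[0]
    if not attrs:  # nothing left to split on: majority label
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: info_gain(data, a))
    rest = [a for a in attrs if a != best]
    children = {}
    for value in sorted({x[best] for x, _ in data}):
        subset = [(x, y) for x, y in data if x[best] == value]
        children[value] = id3(subset, rest)  # recurse on each child
    return (best, children)


tree = id3(DATA, ATTRS)
print(tree)
```

On this table the sketch picks Outlook at the root, then Humidity under Sunny and Wind under Rain. Because each split is chosen greedily, the result is a heuristic tree, not a guaranteed smallest one.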