Introduction to Machine Learning (CSCI-UA.0480-007)

Introduction to Machine Learning (CSCI-UA.0480-007) David Sontag New York University Slides adapted from Luke Zettlemoyer, Pedro Domingos, and Carlos Guestrin

Logistics
Class webpage: http://cs.nyu.edu/~dsontag/courses/ml16/
Sign up for Piazza!
Office hours: TBD
Teaching assistant: Kevin Jiao <jjiao@stern.nyu.edu>
Graders: Yijun Xiao <ryjxiao@nyu.edu>, Alexandre Sablayrolles <alexandre.sablayrolles@gmail.com>

Evaluation
6-7 homeworks (50%), both theory and programming
Collaboration policy:
  First try to solve the problems on your own
  Then you can discuss with other classmates
  Write up solutions on your own
  List the names of anyone you talked to
Midterm exam (25%)
Project (20%)
Course participation (5%)

Projects
Be creative: think of new problems that you can tackle using machine learning
Scope: ~40 hours/person
Logistics: 2-3 students per group
Begins mid-March; project proposal due the week after the midterm exam
There will still be problem sets during this period!

Prerequisites
REQUIRED: Basic algorithms (CS 310)
  Dynamic programming, algorithmic analysis
  Can be taken concurrently
STRONGLY RECOMMENDED: Linear algebra (Math 140)
  Matrices, vectors, systems of linear equations
  Eigenvectors, matrix rank
  Singular value decomposition
Multivariable calculus (Math 123)
  Derivatives, integration, tangent planes
  Optimization, Lagrange multipliers
Good programming skills: Python highly recommended

Source Materials
No textbook required. Readings will come from freely available online material.
If you really want a book for an additional reference, these are OK options:
  C. Bishop, Pattern Recognition and Machine Learning, Springer, 2007
  K. Murphy, Machine Learning: a Probabilistic Perspective, MIT Press, 2012
I may update this list throughout the semester. I wouldn't buy anything yet.

What is Machine Learning? (by examples)

Classification: from data to discrete classes

Spam filtering: from data to a prediction (Spam vs. Not Spam)

Face recognition: example training images for each orientation (2009 Carlos Guestrin)

Weather prediction

Regression: predicting a numeric value

Stock market

Weather prediction revisited: temperature, 72 °F

Ranking: comparing items

Web search

Given an image, find similar images (http://www.tiltomo.com/)

Collaborative Filtering

Recommendation systems

Recommendation systems: machine learning competition with a $1 million prize

Clustering: discovering structure in data

Clustering data: group similar things

Clustering images: set of images [Goldberger et al.]

Clustering web search results

Embedding: visualizing data

Embedding images: images have thousands or millions of pixels. Can we give each image a coordinate, such that similar images are near each other? [Saul & Roweis 03]

Embedding words [Joseph Turian]

Embedding words (zoom in) [Joseph Turian]

Structured prediction: from data to discrete classes

Speech recognition

Natural language processing: tagging each word of "I need to hide a body" as noun, verb, preposition, ...

Growth of Machine Learning
Machine learning is the preferred approach to:
  Speech recognition, natural language processing
  Computer vision
  Medical outcomes analysis
  Robot control
  Computational biology
  Sensor networks
This trend is accelerating:
  Big data
  Improved machine learning algorithms
  Faster computers
  Good open-source software

Course roadmap
First half of course: supervised learning
  SVMs, kernel methods
  Learning theory
  Decision trees, boosting, deep learning
Second half of course: data science
  Unsupervised learning, EM algorithm
  Dimensionality reduction
  Topic models

Supervised Learning: find f
Given: training set $\{(x_i, y_i) \mid i = 1, \dots, N\}$
Find: a good approximation to $f : X \to Y$
Examples: what are X and Y?
  Spam detection: map email to {Spam, Not Spam}
  Digit recognition: map pixels to {0,1,2,3,4,5,6,7,8,9}
  Stock prediction: map new, historic prices, etc. to $\mathbb{R}$ (the real numbers)
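To make "find a good approximation to f" concrete, here is a minimal sketch, not taken from the slides. It assumes a one-dimensional input, a tiny hand-made training set, and a hypothesis space of four threshold functions; the names `training_error`, `fit`, and `threshold` are illustrative.

```python
from typing import Callable, List, Tuple

Example = Tuple[float, int]          # one (x_i, y_i) pair
Hypothesis = Callable[[float], int]  # a candidate f : X -> Y


def training_error(f: Hypothesis, data: List[Example]) -> float:
    """Fraction of training examples that f labels incorrectly."""
    return sum(f(x) != y for x, y in data) / len(data)


def fit(hypotheses: List[Hypothesis], data: List[Example]) -> Hypothesis:
    """Pick the hypothesis with the lowest training error."""
    return min(hypotheses, key=lambda f: training_error(f, data))


def threshold(t: float) -> Hypothesis:
    """Hypothesis that predicts 1 when x >= t and 0 otherwise."""
    return lambda x: int(x >= t)


# Assumed toy training set: X is the real line, Y = {0, 1}.
train = [(0.5, 0), (1.0, 0), (2.5, 1), (3.0, 1)]

# Assumed (very small) hypothesis space of threshold functions.
hypotheses = [threshold(t) for t in (0.0, 1.0, 2.0, 3.0)]

f_hat = fit(hypotheses, train)
print(training_error(f_hat, train))  # error of the chosen hypothesis
print(f_hat(0.1), f_hat(2.2))        # predictions on inputs not in the training set
```

The choice of hypothesis space does most of the work here: `fit` can only ever return one of the candidates it is given.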

A Supervised Learning Problem
Dataset: [table of labeled examples]
Our goal is to find a function $f : X \to Y$, with $X = \{0,1\}^4$ and $Y = \{0,1\}$
Question 1: How should we pick the hypothesis space, the set of possible functions f?
Question 2: How do we find the best f in the hypothesis space?

Most General Hypothesis Space
Consider all possible boolean functions over four input features!
Dataset: [table of labeled examples]
$2^{16}$ possible hypotheses
$2^9$ are consistent with our dataset
How do we choose the best one?
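The counts above can be checked by brute force. The sketch below is illustrative only: the slide's dataset table is not part of this transcript, so `DATASET` is an assumed stand-in of seven distinct labeled examples. With four boolean inputs there are 16 possible input vectors, so a boolean function is a truth table with 16 entries.

```python
from itertools import product

# Assumed stand-in dataset: seven distinct (x1, x2, x3, x4) -> y examples.
# The actual table from the slide is not available in this transcript.
DATASET = [
    ((0, 0, 1, 0), 0),
    ((0, 1, 0, 0), 0),
    ((0, 0, 1, 1), 1),
    ((1, 0, 0, 1), 1),
    ((0, 1, 1, 0), 0),
    ((1, 1, 0, 0), 0),
    ((0, 1, 0, 1), 0),
]

# All 2^4 = 16 possible input vectors.
INPUTS = list(product((0, 1), repeat=4))


def count_consistent(dataset):
    """Count boolean functions on {0,1}^4 that agree with every example."""
    count = 0
    # Each assignment of 16 output bits is one truth table, i.e. one hypothesis.
    for outputs in product((0, 1), repeat=len(INPUTS)):
        table = dict(zip(INPUTS, outputs))
        if all(table[x] == y for x, y in dataset):
            count += 1
    return count


total = 2 ** len(INPUTS)
consistent = count_consistent(DATASET)
print(total, consistent)
```

Each labeled example fixes one entry of the truth table, so seven distinct examples leave 16 - 7 = 9 entries free, which is where $2^9$ comes from.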

A Restricted Hypothesis Space
Consider all conjunctive boolean functions.
Dataset: [table of labeled examples]
16 possible hypotheses
None are consistent with our dataset
How do we choose the best one?
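The restricted space can be enumerated the same way. This sketch assumes that "conjunctive" means an AND over some subset of the four un-negated features, with the empty conjunction always predicting 1, which gives $2^4 = 16$ hypotheses. `DATASET` is again an assumed stand-in, since the slide's table is not in this transcript.

```python
from itertools import combinations

# Assumed stand-in dataset (the slide's table is not in this transcript).
DATASET = [
    ((0, 0, 1, 0), 0),
    ((0, 1, 0, 0), 0),
    ((0, 0, 1, 1), 1),
    ((1, 0, 0, 1), 1),
    ((0, 1, 1, 0), 0),
    ((1, 1, 0, 0), 0),
    ((0, 1, 0, 1), 0),
]


def conjunction(indices):
    """Hypothesis that predicts 1 only when every selected feature is 1."""
    return lambda x: int(all(x[i] == 1 for i in indices))


# Every subset of the feature indices {0, 1, 2, 3} defines one conjunction.
HYPOTHESES = [idx for r in range(5) for idx in combinations(range(4), r)]

consistent = [
    idx for idx in HYPOTHESES
    if all(conjunction(idx)(x) == y for x, y in DATASET)
]
print(len(HYPOTHESES), consistent)
```

For this assumed dataset the list of consistent conjunctions is empty: the two positive examples share only the fourth feature, and a negative example also has that feature set.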

Occam's Razor Principle
William of Occam: monk living in the 14th century
Principle of parsimony: "One should not increase, beyond what is necessary, the number of entities required to explain anything"
When many solutions are available for a given problem, we should select the simplest one
But what do we mean by simple?
We will use prior knowledge of the problem to solve to define what is a simple solution
Example of a prior: smoothness
[Samy Bengio]

Key Issues in Machine Learning
How do we choose a hypothesis space?
  Often we use prior knowledge to guide this choice
How can we gauge the accuracy of a hypothesis on unseen data?
  Occam's razor: use the simplest hypothesis consistent with data! This will help us avoid overfitting.
  Learning theory will help us quantify our ability to generalize as a function of the amount of training data and the hypothesis space
How do we find the best hypothesis?
  This is an algorithmic question, the main topic of computer science
How to model applications as machine learning problems? (engineering challenge)
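One common way to gauge accuracy on unseen data is to hold out part of the labeled examples and evaluate on them only after training. The sketch below is an illustration under assumptions that are not from the slides: the target function is simply the first input bit, the 12/4 split is arbitrary, and both hypotheses are hand-written rather than learned. It contrasts a lookup table that memorizes the training set with a much simpler rule.

```python
import random
from itertools import product


def target(x):
    """Assumed ground-truth function: the label is the first input bit."""
    return x[0]


# All 16 inputs in {0,1}^4, labeled by the assumed target.
examples = [(x, target(x)) for x in product((0, 1), repeat=4)]

# Shuffle with a fixed seed, then hold out 4 of the 16 examples.
rng = random.Random(0)
rng.shuffle(examples)
train, test = examples[:12], examples[12:]


def accuracy(h, data):
    """Fraction of examples in data that h labels correctly."""
    return sum(h(x) == y for x, y in data) / len(data)


# Complex hypothesis: memorize the training set, guess 0 on anything unseen.
lookup = dict(train)


def memorizer(x):
    return lookup.get(x, 0)


# Simple hypothesis: a single-feature rule.
def simple(x):
    return x[0]


for name, h in (("memorizer", memorizer), ("simple", simple)):
    print(name, accuracy(h, train), accuracy(h, test))
```

Both hypotheses are perfect on the training set, so training accuracy alone cannot separate them; only the held-out examples show whether the memorizer generalizes.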