CS 6140: Machine Learning, Spring 2017. Lecture 1 (1/13/17): Course Logistics, Basic Concepts, K-nearest Neighbors, Linear Regression


Time and Location
CS 6140: Machine Learning, Spring 2017
Time: Thursdays from 6:00 pm to 9:00 pm
Location: Forsyth Building 129
Instructor: Lu Wang, College of Computer and Information Science, Northeastern University
Webpage: www.ccs.neu.edu/home/luwang
Email: luwang@ccs.neu.edu

Course Webpage
http://www.ccs.neu.edu/home/luwang/courses/cs6140_sp2017.html

Prerequisites
Programming: be able to write code proficiently in some programming language (e.g. Python, Java, C/C++, Matlab).
Courses: algorithms; probability and statistics; linear algebra.
A quiz: 22 simple questions, 20 of them True or False (relevant to probability, statistics, and linear algebra). The purpose of this quiz is to indicate the expected background of students; 80% of the questions should be easy to answer. It is not counted in your final score!

Textbook and References
Main textbook:
Kevin Murphy, "Machine Learning: A Probabilistic Perspective", MIT Press, 2012.
Christopher M. Bishop, "Pattern Recognition and Machine Learning", Springer, 2006.
Other textbooks:
Tom Mitchell, "Machine Learning", McGraw Hill, 1997.

Content of the Course
Regression: linear regression, logistic regression
Dimensionality Reduction: Principal Component Analysis (PCA), Independent Component Analysis (ICA), Linear Discriminant Analysis
Probabilistic Models: Naive Bayes, maximum likelihood estimation
Statistical Learning Theory: VC dimension
Kernels: Support Vector Machines (SVMs), kernel tricks, duality
Sequential Models and Structural Models: Hidden Markov Models (HMMs), Conditional Random Fields (CRFs)
Clustering: spectral clustering, hierarchical clustering
Latent Variable Models: K-means, mixture models, expectation-maximization (EM) algorithms, Latent Dirichlet Allocation (LDA), representation learning
Deep Learning: feedforward neural networks, restricted Boltzmann machines, autoencoders, recurrent neural networks, convolutional neural networks
Reinforcement Learning: Markov decision processes, Q-learning, and others, including advanced topics for machine learning in natural language processing and text analysis

The Goal
A scientific understanding of machine learning models, and how to apply and design learning methods for novel problems. Not only what, but also why!

Grading
Assignments: 3 assignments, 10% each
Quizzes: 10 in-class tests, 1% each
Exam: 1 exam, 30% (open book; April 20, 2017)
Project: 1 project, 27% (a machine learning relevant research project; 2-3 students as a team)
Participation: 3% (classes, Piazza)

Course Project Topics
Machine learning relevant: natural language processing, computer vision, robotics, bioinformatics, health informatics

Course Project Grading
We want to see novel and interesting projects! The problem needs to be well-defined, novel, and useful; use practical machine learning techniques; present reasonable results and observations.

Projects from Last Year
Predicting Follow-back Behavior in Instagram Users
Predicting Grasp Points Using Convolutional Neural Networks
Artificial Neural Networks for Drug Response Prediction in Tailored Therapy

Projects from Last Year (continued)
Threat Detection from Twitter
Player Ranking in Popular Games

Course Project Grading
Three reports: proposal (2%); progress, with code (10%); final, with code (10%)
One presentation: in class (5%)

Submission and Late Policy
Each assignment or report, both electronic copy and hard copy, is due at the beginning of class on the corresponding due date.
Programming language: Python, Java, C/C++, Matlab
Electronic version: on Blackboard. Hard copy: in class.
An assignment or report turned in late will be charged 10 points (out of 100) for each late day (i.e. 24 hours). Each student has a budget of 5 late days throughout the semester before the penalty is applied.

How to Find Us
Course webpage: http://www.ccs.neu.edu/home/luwang/courses/cs6140_sp2017.html
Office hours:
Lu Wang: Thursdays from 4:30 pm to 5:30 pm, or by appointment, 448 WVH
Rui Dong (TA): Tuesdays from 4:00 pm to 5:00 pm, or by appointment, 466B WVH
Piazza: http://piazza.com/northeastern/spring2017/cs614002 (all course relevant questions go here)

What is Machine Learning?
A set of methods that can automatically detect patterns in data, and then use the uncovered patterns to predict future data, or to perform other kinds of decision making under uncertainty.

Relations with Other Areas
Natural language processing, computer vision, robotics, and a lot of other areas.

Today's Outline
Supervised vs. unsupervised learning
Basic concepts in machine learning
K-nearest neighbors
Linear regression
Ridge regression

Supervised vs. Unsupervised Learning
Supervised learning works from a training set of training samples with gold-standard labels:
- Classification, if the label is categorical
- Regression, if the label is numerical

Supervised Learning
Goal: generalize to new input samples (overfitting vs. underfitting). One solution: we use probabilistic models.
Typical setup:
Step 1: Features
Step 2: Training set, test set, development set
Step 3: Evaluation
Example regression tasks: predicting stock price, predicting temperature, predicting revenue.

Unsupervised Learning
More about knowledge discovery.

Unsupervised Learning
Dimension reduction: principal component analysis.
Clustering (e.g. graph mining): RolX: Role Extraction and Mining in Large Networks, by Henderson et al., 2011.
Topic modeling.

Parametric vs. Non-parametric Models
Is the number of parameters fixed? If yes, it is a parametric model. Does the number of parameters grow with the amount of training data? If yes, it is a non-parametric model. The choice affects computational tractability.

Today's Outline
Next up: K-nearest neighbors (supervised learning, a non-parametric classifier).

A Non-parametric Classifier: K-nearest Neighbors (KNN)
Basic idea: memorize all the training samples. The more you have in training data, the more the model has to remember.
Nearest neighbor (or 1-nearest neighbor): in the testing phase, find the closest training sample and return the corresponding label.
K-nearest neighbors: in the testing phase, find the K nearest neighbors and return the majority vote of their labels.
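The testing phase described above can be sketched in a few lines; the function names (`euclidean`, `knn_predict`) and the toy data set are illustrative, not from the course materials, and the sketch assumes Euclidean distance:

```python
from collections import Counter
import math

def euclidean(a, b):
    # Euclidean distance between two feature vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training samples."""
    # "Memorize all the training samples": sort them by distance to the query
    neighbors = sorted(zip(train_X, train_y), key=lambda s: euclidean(s[0], query))
    labels = [y for _, y in neighbors[:k]]
    # Majority vote over the k nearest labels (k=1 gives nearest neighbor)
    return Counter(labels).most_common(1)[0][0]

# Toy 2-D training set: two well-separated classes
train_X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
train_y = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(train_X, train_y, (0.15, 0.1), k=3))  # query near the "A" cluster
print(knn_predict(train_X, train_y, (1.0, 0.95), k=1))  # 1-NN query near "B"
```

Note that all the work happens at test time; "training" is just storing the data, which is what makes KNN non-parametric.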

About K
K=1: just a piecewise-constant labeling. K=N: a global majority vote (the most frequent class).

Problems of KNN
Can be slow when the training data is big: searching for the neighbors takes time.
Needs lots of memory to store the training data.
Needs tuning of K and of the distance function.
Does not produce a probability distribution.

Distance Functions
Euclidean distance. Mahalanobis distance: puts weights on the components.

Probabilistic KNN
We prefer a probabilistic output because sometimes we may get an uncertain result:
1 sample as yes, 199 samples as no → ?
99 samples as yes, 101 samples as no → ?
(Illustrated on 3-class synthetic training data.)
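The probabilistic variant can be sketched by returning class fractions among the neighbors instead of a hard vote; `knn_probabilities` and the toy data are illustrative names, and Euclidean distance is assumed:

```python
from collections import Counter
import math

def knn_probabilities(train_X, train_y, query, k):
    """Probabilistic KNN: estimate P(y = c | query) as the fraction of the
    k nearest neighbors that carry label c."""
    dist = lambda a, b: math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))
    nearest = sorted(zip(train_X, train_y), key=lambda s: dist(s[0], query))[:k]
    counts = Counter(label for _, label in nearest)
    # Every class gets a probability, including classes absent from the neighbors
    return {label: counts.get(label, 0) / k for label in set(train_y)}

# Toy 1-D data: "yes" points near 0, "no" points near 1
train_X = [(0.0,), (0.1,), (0.9,), (1.0,), (1.1,)]
train_y = ["yes", "yes", "no", "no", "no"]
probs = knn_probabilities(train_X, train_y, (0.5,), k=5)
print(probs)  # a borderline query: probabilities near 0.5 signal uncertainty
```

A near-50/50 output like the slide's 99-vs-101 example now shows up as two probabilities close to 0.5, rather than a confident hard label.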

Smoothing
Suppose the neighbor counts are: class 1: 3, class 2: 0, class 3: 1.
Raw probabilities: P(y=1)=3/4, P(y=2)=0/4, P(y=3)=1/4.
Add-1 smoothing adds 1 to every count: class 1: 3+1, class 2: 0+1, class 3: 1+1, giving P(y=1)=4/7, P(y=2)=1/7, P(y=3)=2/7.

Softmax
With the same counts (3, 0, 1) and raw probabilities P(y=1)=3/4, P(y=2)=0/4, P(y=3)=1/4, a softmax redistributes probability mass across the classes: define P(y=c) = exp(n_c) / Σ_c' exp(n_c'), where n_c is the count for class c.

Today's Outline
Next up: linear regression (supervised learning, a parametric classifier).

A Parametric Classifier: Linear Regression
Assumption: the response is a linear function of the inputs, i.e. the inner product between an input sample x and a weight vector w. The residual error is the difference between the prediction and the true label; we assume the residual error has a normal distribution.
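The two smoothing schemes above can be checked numerically on the slide's counts (3, 0, 1); the function names are illustrative:

```python
import math

def add_one_smoothing(counts):
    """Add-1 (Laplace) smoothing: add 1 to every class count before normalizing,
    so no class ends up with probability exactly zero."""
    smoothed = [c + 1 for c in counts]
    total = sum(smoothed)
    return [c / total for c in smoothed]

def softmax(counts):
    """Softmax: redistribute probability mass via exponentiated counts."""
    exps = [math.exp(c) for c in counts]
    total = sum(exps)
    return [e / total for e in exps]

counts = [3, 0, 1]  # neighbor votes for classes 1, 2, 3 (slide example)
print([c / sum(counts) for c in counts])  # raw: [0.75, 0.0, 0.25]
print(add_one_smoothing(counts))          # ≈ [4/7, 1/7, 2/7]
print(softmax(counts))                    # still ordered class 1 > class 3 > class 2
```

Both schemes agree with the slide: smoothing removes the zero probability of class 2 while preserving the ordering of the classes.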

A Parametric Classifier: Linear Regression
We can further assume a basis function expansion. (Illustration: the vertical axis is temperature, the horizontal axes give location within a room.)

Learning with Maximum Likelihood Estimation (MLE)
Maximizing the log-likelihood is equivalent to minimizing the negative log-likelihood (NLL). With our normal-distribution assumption, the NLL reduces to the residual sum of squares (RSS), which is what we want to minimize!
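The slide formulas did not survive the transcription; under the Gaussian residual assumption stated above, the standard form of this argument is:

```latex
% Gaussian likelihood of one response under the linear model
p(y_i \mid x_i, w) = \mathcal{N}\!\left(y_i \mid w^\top x_i, \sigma^2\right)

% Log-likelihood of N i.i.d. training samples
\ell(w) = \sum_{i=1}^{N} \log p(y_i \mid x_i, w)
        = -\frac{N}{2}\log\!\left(2\pi\sigma^2\right)
          - \frac{1}{2\sigma^2}\sum_{i=1}^{N}\left(y_i - w^\top x_i\right)^2

% Hence minimizing the NLL is the same as minimizing the RSS
\mathrm{NLL}(w) = \frac{1}{2\sigma^2}\,\mathrm{RSS}(w) + \text{const},
\qquad
\mathrm{RSS}(w) = \sum_{i=1}^{N}\left(y_i - w^\top x_i\right)^2
```

The first term of the log-likelihood does not depend on w, so only the squared-error sum matters for the optimization.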

Derivation of MLE for Linear Regression
Rewrite the objective function (the RSS) in matrix form, take the derivative (gradient) with respect to the weights, and set the derivative to 0. This yields the ordinary least squares solution.

Overfitting
Ordinary least squares can overfit, which motivates placing a prior on the weights.

A Prior on the Weights
Place a zero-mean Gaussian prior on the feature weights w.
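The ordinary least squares solution, w = (XᵀX)⁻¹Xᵀy, can be sketched without any libraries by forming the normal equations and solving them with Gaussian elimination; the helper names and toy data are illustrative:

```python
def transpose(M):
    return [list(row) for row in zip(*M)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            M[r] = [mr - f * mc for mr, mc in zip(M[r], M[col])]
    w = [0.0] * n
    for i in range(n - 1, -1, -1):
        w[i] = (M[i][n] - sum(M[i][j] * w[j] for j in range(i + 1, n))) / M[i][i]
    return w

def ols(X, y):
    """Ordinary least squares via the normal equations: w = (X^T X)^{-1} X^T y."""
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = [sum(x * yi for x, yi in zip(row, y)) for row in Xt]
    return solve(XtX, Xty)

# Fit y = 2x + 1 exactly; the first column of X is a constant bias feature
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]
print(ols(X, y))  # ≈ [1.0, 2.0]: intercept 1, slope 2
```

Setting the gradient of the RSS to zero is exactly what forming and solving the normal equations does.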

A Prior on the Weights
The zero-mean Gaussian prior leads to a new objective function.

Today's Outline
Next up: ridge regression.

Ridge Regression
We want to minimize the RSS plus an L2 regularization term, which gives a new estimate for the weights.

Ridge Regression
Minimizing the RSS plus the L2 regularization term gives a new estimate for the weights. The proof is left for Assignment 1!

What We Learned
Basic concepts in machine learning; K-nearest neighbors; linear regression; ridge regression.

Homework
Reading: Murphy ch. 1, ch. 2, and ch. 7 (only the sections covered in the lecture).
Sign up at Piazza: http://piazza.com/northeastern/spring2017/cs614002
Start thinking about the course project and find a team! The project proposal is due Jan 26.

Next Time: Logistic Regression, Decision Trees, Generative Models (Naive Bayes)
Reading: Murphy ch. 3, 8.1-8.3, 8.6, 16.2
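The ridge estimate, w = (XᵀX + λI)⁻¹Xᵀy, has a particularly simple closed form in the one-feature, no-bias case, which makes the shrinkage effect of the L2 penalty easy to see; `ridge_1d` and the toy data are illustrative, not the assignment's proof:

```python
def ridge_1d(xs, ys, lam):
    """Closed-form ridge estimate for a single feature with no bias:
    w = (x^T y) / (x^T x + lambda). lambda = 0 recovers ordinary least squares."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 2.0, 4.0, 6.0]      # exactly y = 2x
print(ridge_1d(xs, ys, 0.0))   # lambda = 0: the OLS slope, 2.0
print(ridge_1d(xs, ys, 14.0))  # a large lambda shrinks the weight toward 0: 1.0
```

The larger λ is, the larger the denominator, so the weight is pulled toward zero; this is exactly the shrinkage induced by the zero-mean Gaussian prior on w.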