CS 6140: Machine Learning, Spring 2016
Time and Location. Prerequisites. Course Webpage. Textbook and References. Content of the Course.


Time and Location
CS 6140: Machine Learning, Spring 2016
Time: Thursdays from 6:00 pm to 9:00 pm
Location: Behrakis Health Sciences Center 325
Instructor: Lu Wang, College of Computer and Information Science, Northeastern University
Webpage: www.ccs.neu.edu/home/luwang
Email: luwang@ccs.neu.edu

Course Webpage
http://www.ccs.neu.edu/home/luwang/courses/cs6140_sp2016.html

Prerequisites
- Programming: able to write code proficiently in some programming language (e.g., Python, Java, C/C++)
- Courses: algorithms, probability and statistics, linear algebra
- Some knowledge of machine learning

Background Check

Textbook and References
Main textbook:
- Kevin Murphy, "Machine Learning: A Probabilistic Perspective", MIT Press, 2012.
Other textbooks:
- Christopher M. Bishop, "Pattern Recognition and Machine Learning", Springer, 2006.
- Tom Mitchell, "Machine Learning", McGraw Hill, 1997.
- Machine learning lectures

Content of the Course
- Regression: linear regression, logistic regression
- Dimensionality Reduction: Principal Component Analysis (PCA), Independent Component Analysis (ICA), Linear Discriminant Analysis
- Probabilistic Models: Naive Bayes, maximum likelihood estimation, Bayesian inference
- Statistical Learning Theory: VC dimension
- Kernels: Support Vector Machines (SVMs), kernel tricks, duality
- Sequential Models and Structural Models: Hidden Markov Models (HMMs), Conditional Random Fields (CRFs)
- Clustering: spectral clustering, hierarchical clustering
- Latent Variable Models: K-means, mixture models, expectation-maximization (EM) algorithms, Latent Dirichlet Allocation (LDA), representation learning
- Deep Learning: feedforward neural networks, restricted Boltzmann machines, autoencoders, recurrent neural networks, convolutional neural networks, and others, including advanced topics for machine learning in natural language processing and text analysis

The Goal
Not only what, but also why!

Grading
- Assignments: 3 assignments, 10% each
- Exam: 1 exam, 30% (March 31, 2016)
- Project: 1 project, 35%
- Participation: 5% (classes and Piazza)

Exam
- Open book
- March 31, 2016

Course Project
- Option 1: a machine learning relevant research project
- Option 2: the Yelp Challenge
- Teams of 2-3 students

Research Project
- Machine learning relevant: natural language processing, computer vision, robotics, bioinformatics, health informatics
- Novelty

Yelp Challenge

Course Project Grading
- We want to see novel and interesting projects!
- The problem needs to be well-defined, novel, and useful
- Practical machine learning techniques
- Reasonable results and observations
- Three reports: proposal (5%), progress with code (10%), final with code (10%)
- One presentation: in class (10%)

Submission and Late Policy
- Each assignment or report, in both electronic copy (on Blackboard) and hard copy (in class), is due at the beginning of class on the corresponding due date.
- An assignment or report turned in late is charged 10 points (out of 100) for each late day (i.e., 24 hours). Each student has a budget of 5 late days over the semester before the late penalty applies.

How to Find Us?
- Course webpage: http://www.ccs.neu.edu/home/luwang/courses/cs6140_sp2016.html
- Office hours:
  - Lu Wang: Thursdays from 4:30 pm to 5:30 pm, or by appointment, 448 WVH
  - Gabriel Bakiewicz (TA): Mondays and Tuesdays from 4:00 pm to 5:00 pm, 362 WVH
- Piazza: http://piazza.com/northeastern/spring2016/cs6140 (all course-relevant questions go here)

What is Machine Learning?
A set of methods that can automatically detect patterns in data, and then use the uncovered patterns to predict future data, or to perform other kinds of decision making under uncertainty.

Relations with Other Areas
- Natural Language Processing
- Computer Vision
- Robotics
- A lot of other areas

Today's Outline
- Basic concepts in machine learning
- Supervised vs. unsupervised learning
- K-nearest neighbors
- Linear regression
- Ridge regression

Supervised Learning
- Training set, training sample, gold-standard label
  - Classification, if the label is categorical
  - Regression, if the label is numerical
- Goal: generalize to new input samples
- Overfitting vs. underfitting: why we use probabilistic models
- Typical setup: training set, test set, development set (a minimal split sketch follows below)
- Features
- Evaluation
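A minimal sketch (not from the slides) of the training/development/test setup mentioned above; the sample count and the 70/15/15 split are illustrative assumptions:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 1000                        # illustrative number of samples
    perm = rng.permutation(n)       # shuffle indices before splitting
    train_idx = perm[:700]          # training set: fit the model
    dev_idx = perm[700:850]         # development set: tune choices such as K
    test_idx = perm[850:]           # test set: held out for final evaluation only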

Regression
- Predicting stock price
- Predicting temperature
- Predicting revenue

Supervised vs. Unsupervised Learning

Unsupervised Learning
- Dimension reduction: principal component analysis (a small sketch follows below)
- More about knowledge discovery
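Not from the slides: a minimal PCA sketch via the singular value decomposition, assuming a data matrix X with samples in rows; the function name and k are illustrative:

    import numpy as np

    def pca_project(X, k):
        """Project X (n samples x d features) onto its top-k principal components."""
        Xc = X - X.mean(axis=0)                            # center each feature
        U, S, Vt = np.linalg.svd(Xc, full_matrices=False)  # rows of Vt are principal directions
        return Xc @ Vt[:k].T                               # coordinates in the top-k subspace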

Unsupervised Learning
- Clustering (e.g., graph mining)
- Topic modeling
(RolX: Role Extraction and Mining in Large Networks, by Henderson et al., 2011)

Parametric vs. Non-parametric Models
- Fixed number of parameters? If yes, a parametric model
- Number of parameters grows with the amount of training data? If yes, a non-parametric model

A Non-parametric Classifier: K-nearest Neighbors (KNN)
- Basic idea: memorize all the training samples; the more training data you have, the more the model has to remember
- Nearest neighbor: in the testing phase, find the closest sample and return the corresponding label
- Computational tractability
- K-nearest neighbors: in the testing phase, find the K nearest neighbors and return the majority vote of their labels (a code sketch follows below)
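A minimal KNN sketch matching the description above (memorize the training samples; at test time find the K nearest under Euclidean distance and return the majority vote of their labels). This is an illustrative implementation, not code from the course:

    import numpy as np
    from collections import Counter

    def knn_predict(X_train, y_train, x, k=3):
        """Classify x by majority vote among its k nearest training samples."""
        dists = np.linalg.norm(X_train - x, axis=1)  # Euclidean distance to each sample
        nearest = np.argsort(dists)[:k]              # indices of the k closest samples
        return Counter(y_train[nearest]).most_common(1)[0][0]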

About K
- K=1: just piecewise constant labeling
- K=N: the global majority vote (class)

Problems of KNN
- Can be slow when the training data is big: searching for the neighbors takes time
- Needs lots of memory to store the training data
- Needs to tune K and the distance function
- Not a probability distribution
- Distance function: Euclidean distance; Mahalanobis distance puts weights on the components

Probabilistic KNN
- We prefer a probabilistic output because sometimes we may get an uncertain result: 99 samples say yes, 101 samples say no → ? (a sketch follows below)
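A sketch of the probabilistic KNN idea above: instead of a single majority label, return the empirical class fractions among the K neighbors. It assumes integer class labels; the names are illustrative:

    import numpy as np

    def knn_class_probs(X_train, y_train, x, k, n_classes):
        """Empirical class probabilities among the k nearest neighbors of x."""
        nearest = np.argsort(np.linalg.norm(X_train - x, axis=1))[:k]
        counts = np.bincount(y_train[nearest], minlength=n_classes)
        return counts / k            # e.g., 101 "no" vs. 99 "yes" -> [0.505, 0.495]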

Probabilistic KNN: Smoothing
- Class 1: 3, class 2: 0, class 3: 1
- Original probability: P(y=1)=3/4, P(y=2)=0/4, P(y=3)=1/4
- Add-1 smoothing: class 1: 3+1, class 2: 0+1, class 3: 1+1, giving P(y=1)=4/7, P(y=2)=1/7, P(y=3)=2/7
(Figure: 3-class synthetic training data)

Softmax
- Class 1: 3, class 2: 0, class 3: 1; original probability: P(y=1)=3/4, P(y=2)=0/4, P(y=3)=1/4
- Redistribute probability mass into the different classes
- Define a softmax (a code sketch follows after the next slide)

A Parametric Classifier: Linear Regression
- Assumption: the response is a linear function of the inputs
- Inner product between the input sample x and the weight vector w
- Residual error: the difference between the prediction and the true label
- Assume the residual error has a normal distribution
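The smoothing and softmax slides above can be reproduced in a few lines; the numbers match the worked 3-class example (counts 3, 0, 1), and the softmax definition is the standard one, since the slide's own formula did not survive transcription:

    import numpy as np

    counts = np.array([3.0, 0.0, 1.0])            # class 1: 3, class 2: 0, class 3: 1
    probs = counts / counts.sum()                 # [3/4, 0/4, 1/4]
    smoothed = (counts + 1) / (counts + 1).sum()  # add-1 smoothing: [4/7, 1/7, 2/7]

    def softmax(z):
        z = z - z.max()                           # subtract the max for numerical stability
        e = np.exp(z)
        return e / e.sum()                        # redistributes mass; every class gets > 0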

A Parametric Classifier: Linear Regression
- We can further assume a basis function expansion

Learning with Maximum Likelihood Estimation (MLE)
- Log-likelihood
- Maximizing the log-likelihood is equivalent to minimizing the negative log-likelihood (NLL)
- With our normal distribution assumption (a reconstruction of the objective follows below)

Derivation of MLE for Linear Regression
- Rewrite our objective function as the residual sum of squares (RSS) → we want to minimize it!
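The equations did not survive transcription; the following is a standard reconstruction (in the spirit of Murphy, ch. 7) of the model and objective these bullets describe:

    y_i = w^\top x_i + \epsilon_i, \qquad \epsilon_i \sim \mathcal{N}(0, \sigma^2)

    \mathrm{NLL}(w) = -\sum_{i=1}^{N} \log \mathcal{N}(y_i \mid w^\top x_i, \sigma^2)
                    = \frac{1}{2\sigma^2} \sum_{i=1}^{N} (y_i - w^\top x_i)^2 + \frac{N}{2}\log(2\pi\sigma^2)

so minimizing the NLL over w is the same as minimizing the residual sum of squares, RSS(w) = sum_i (y_i - w^T x_i)^2.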

Derivation of MLE for Linear Regression
- Rewrite our objective function
- Get the derivative (or gradient)
- Set the derivative to 0: the ordinary least squares solution (a sketch follows below)

Geometric Interpretation
- From the ordinary least squares solution, the projected (fitted) value of y corresponds to an orthogonal projection of y onto the column space of X

Overfitting
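Not in the transcription: a numpy sketch of the ordinary least squares solution the slide derives, w_hat = (X^T X)^{-1} X^T y, on synthetic data; it also checks the geometric claim that the residual is orthogonal to the column space of X:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))                  # 100 samples, 3 features
    w_true = np.array([2.0, -1.0, 0.5])
    y = X @ w_true + 0.1 * rng.normal(size=100)    # linear response plus Gaussian noise

    w_hat = np.linalg.solve(X.T @ X, X.T @ y)      # OLS solution (X^T X)^{-1} X^T y
    y_proj = X @ w_hat                             # orthogonal projection of y onto col(X)
    print(np.allclose(X.T @ (y - y_proj), 0))      # normal equations: residual _|_ columns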

A Prior on the Weight
- Zero-mean Gaussian prior
- New objective function: we want to minimize it

Ridge Regression
- New objective function: we want to minimize it (L2 regularization)
- New estimation for the weight (a sketch follows below)
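The "new estimation for the weight" has the well-known closed form w_hat = (X^T X + lambda * I)^{-1} X^T y; a minimal sketch, with the regularization strength lam an arbitrary illustrative choice:

    import numpy as np

    def ridge_fit(X, y, lam=1.0):
        """Ridge regression: minimize ||y - Xw||^2 + lam * ||w||^2."""
        d = X.shape[1]
        return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

    # lam -> 0 recovers ordinary least squares; larger lam shrinks the weights toward 0.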

What We Learned
- Basic concepts in machine learning
- K-nearest neighbors (non-parametric)
- Linear regression (parametric)
- Ridge regression (parametric)

Homework
- Reading: Murphy ch. 1, ch. 2, and ch. 7
- Sign up at Piazza: http://piazza.com/northeastern/spring2016/cs6140
- Start thinking about the course project and find a team! The project proposal is due Jan 28th.