DS 4400 Machine Learning and Data Mining I. Alina Oprea, Associate Professor, CCIS, Northeastern University

DS 4400 Machine Learning and Data Mining I
Alina Oprea, Associate Professor, CCIS, Northeastern University
January 10, 2019

Class Outline
Introduction (1 week): probability and linear algebra review
Supervised learning (7 weeks)
  Linear regression
  Classification (logistic regression, LDA, kNN, decision trees, random forest, SVM, Naïve Bayes)
  Model selection, regularization, cross-validation
Neural networks and deep learning (2 weeks)
  Back-propagation, gradient descent
  NN architectures (feed-forward, convolutional, recurrent)
Unsupervised learning (1-2 weeks)
  Dimensionality reduction (PCA)
  Clustering (k-means, hierarchical)
Adversarial ML (1 lecture): security of ML at testing and training time

Schedule and Resources
Instructor: Alina Oprea. TA: Ewen Wang
Schedule: Tue 11:45am-1:25pm, Thu 2:50-4:30pm, Shillman Hall 210
Office hours: Alina: Thu 4:30-6:00pm (ISEC 625); Ewen: Monday 5:30-6:30pm (ISEC 605)
Online resources: slides will be posted after each lecture; use Piazza for questions, Gradescope for homework and project submission

Grading
Assignments (25%): 4-5 assignments and programming exercises based on material studied in class
Final project (35%): select your own project based on a public dataset; submit a short project proposal and milestone; presentation at end of class (10 min) and report
Exam (35%): one exam about 3/4 of the way through the class, tentatively at the end of March
Class participation (5%): participate in class discussion and on Piazza

Outline
Supervised learning: classification, regression
Unsupervised learning: clustering
Bias-variance tradeoff
Occam's razor
Probability review

Example 1: Handwritten digit recognition
MNIST dataset: predict the digit (multi-class classifier)

Supervised Learning: Classification
Training: labeled data $(x^{(i)}, y^{(i)})$ with $y^{(i)} \in \{0,1\}$; preprocessing (normalization); feature extraction (feature selection); learning a classification model $f(x)$
Testing: new unlabeled data $x$; apply the learned model $f(x)$; predictions (positive or negative)
Classification: $y = f(x) \in \{0,1\}$

Classification
Training data
  $x^{(i)} = [x_1^{(i)}, \dots, x_d^{(i)}]$: vector of image pixels, of size $d = 28 \times 28 = 784$
  $y^{(i)}$: image label (in $\{0,1\}$)
Models (hypotheses)
  Example: linear model $f(x) = wx + b$
  Classify 1 if $f(x) > T$; 0 otherwise
Classification algorithm
  Training: learn model parameters $w, b$ to minimize error (the number of training examples for which the model gives the wrong label)
  Output: optimal model
Testing
  Apply the learned model to new data and generate predictions
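The thresholded linear model and its training error can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy data uses two made-up features per example instead of 784 pixels, the weights are chosen by hand, not learned, and the names `predict` and `training_error` are hypothetical.

```python
def predict(w, b, x, threshold=0.0):
    """Return 1 if f(x) = w.x + b exceeds the threshold T, else 0."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score > threshold else 0


def training_error(w, b, X, y, threshold=0.0):
    """Count the training examples for which the model gives the wrong label."""
    return sum(predict(w, b, xi, threshold) != yi for xi, yi in zip(X, y))


# Toy training set (made up for illustration): label is 1 only when both features are 1.
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [0, 0, 0, 1]

good_errors = training_error([1.0, 1.0], -1.5, X, y)  # hand-picked parameters
bad_errors = training_error([1.0, 0.0], -0.5, X, y)   # ignores the second feature
print(good_errors, bad_errors)
```

A training algorithm would search over `w` and `b` for the pair with the lowest error; here the two candidates are simply compared by hand.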

Example Classifiers
Linear classifiers: logistic regression, SVM, LDA
Decision trees
SVM with polynomial kernel

Real-world example: Spam email
SPAM email: unsolicited; advertisement; sent to a large number of people

Classifying spam email
Content-related features: use of certain words; word frequencies; language; sentence structure
Structural features: sender IP address; IP blacklist; DNS information; email server; URL links (non-matching)

Classifying spam email
Training: labeled data (SPAM / REGULAR); feature extraction (content, structural) into numerical features; classifier (logistic regression, decision tree, SVM); model
Testing: new email; feature extraction; model; prediction SPAM (filter) or REGULAR (allow)
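As a rough sketch of the feature-extraction step, the snippet below turns an email into a numerical vector from word counts (a content feature) and the number of non-matching URL links (a structural feature). The vocabulary `VOCAB`, the function name `extract_features`, and the link representation are all assumptions made for illustration; they do not come from the slides.

```python
import re

# Hypothetical vocabulary of suspicious words; a real filter would learn or curate this.
VOCAB = ["free", "winner", "offer", "click"]


def extract_features(text, links):
    """Map an email to a numerical feature vector.

    text  -- subject and body as one string
    links -- list of (displayed_text, target_url) pairs found in the email
    """
    words = re.findall(r"[a-z']+", text.lower())
    # Content features: how often each vocabulary word occurs.
    counts = [words.count(v) for v in VOCAB]
    # Structural feature: links whose displayed URL differs from the real target.
    mismatched = sum(
        1 for shown, target in links
        if shown.startswith("http") and shown != target
    )
    return counts + [mismatched]


features = extract_features(
    "Click here: FREE offer, free prize!",
    [("http://bank.com", "http://evil.example")],
)
print(features)
```

The resulting vector is what a classifier such as logistic regression, a decision tree, or an SVM would take as input.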

Example 2: Stock market prediction

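Linear regression (1 dimension)
Training data: $x^{(1)}, \dots, x^{(N)}$ with numerical responses $y^{(1)}, \dots, y^{(N)} \in \mathbb{R}$
$x^{(i)} = (x_1^{(i)}, \dots, x_d^{(i)})$: $d$ predictors (features)
$y^{(i)}$: response variable
[Figure: response $y^{(i)}$ plotted against a single predictor (volume)]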
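A one-dimensional fit can be sketched as follows, assuming ordinary least squares as the fitting criterion (the slide itself does not spell out the method). The function name `fit_line` and the toy data are made up for illustration.

```python
def fit_line(xs, ys):
    """Fit y = w*x + b by ordinary least squares for a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    w = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    b = mean_y - w * mean_x
    return w, b


# Toy data generated from y = 2x + 1 without noise.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
w, b = fit_line(xs, ys)
print(w, b)
```

With noise-free data the fit recovers the generating line; with real data the same formulas give the line that minimizes the squared error on the training set.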

Income Prediction
[Figure panels: linear regression; non-linear regression (polynomial/spline regression)]

Supervised Learning: Regression
Training: labeled data $(x^{(i)}, y^{(i)})$ with $y^{(i)} \in \mathbb{R}$; preprocessing (normalization); feature extraction (feature selection); learning a regression model $f(x)$
Testing: new unlabeled data $x$; apply the learned model $f(x)$; predictions (response variable)
Regression: $y = f(x) \in \mathbb{R}$

Example 3: Image search
Find images similar to a target image

K-means Clustering, K=3

K-means Clustering, K=6

Hierarchical Clustering

Unsupervised Learning
Clustering: group similar data points into clusters. Example: k-means, hierarchical clustering
Dimensionality reduction: project the data to a lower-dimensional space. Example: PCA (Principal Component Analysis)
Feature learning: find feature representations. Example: autoencoders
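To make the clustering idea concrete, here is a minimal sketch of k-means using the standard Lloyd iteration (assign each point to its nearest centroid, then move each centroid to the mean of its points). The slides name k-means without giving an algorithm, so the procedure, the function name `kmeans`, the toy points, and the hand-picked starting centroids are all assumptions.

```python
def kmeans(points, centroids, iterations=10):
    """Cluster points around the given starting centroids (Lloyd's iteration)."""
    centroids = [tuple(c) for c in centroids]
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            dists = [sum((a - c) ** 2 for a, c in zip(p, cen)) for cen in centroids]
            clusters[dists.index(min(dists))].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        new_centroids = []
        for cen, members in zip(centroids, clusters):
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centroids.append(cen)  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # assignments can no longer change
        centroids = new_centroids
    return centroids, clusters


# Two well-separated groups of made-up 2-D points, K=2.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
print(centroids)
print(clusters)
```

In practice the starting centroids are usually chosen at random, and the result can depend on that choice.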

Supervised Learning Tasks
Classification
  Learn to predict a class (discrete)
  Minimize classification error $\frac{1}{N} \sum_{i=1}^{N} [y^{(i)} \neq f(x^{(i)})]$
Regression
  Learn to predict a response variable (numerical)
  Minimize MSE (Mean Square Error) $\frac{1}{N} \sum_{i=1}^{N} (y^{(i)} - f(x^{(i)}))^2$
Both classification and regression
  Have a training phase and a testing phase
  The optimal model is learned in training and applied in testing
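Both objectives are short to compute once predictions are available. The sketch below assumes the labels and predictions are given as plain Python lists; the function names are illustrative.

```python
def classification_error(y_true, y_pred):
    """Fraction of examples whose predicted class differs from the true class."""
    return sum(1 for t, p in zip(y_true, y_pred) if t != p) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average squared difference between true and predicted responses."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up labels and predictions.
err = classification_error([0, 1, 1, 0], [0, 1, 0, 0])   # one mistake out of four
mse = mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])  # one prediction off by 2
print(err, mse)
```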

Learning Challenges
Goal
  Classify new testing data well
  The model generalizes well to new testing data
Variance
  Amount by which the model would change if we estimated it using a different training data set
  More complex models result in higher variance
Bias
  Error introduced by approximating a real-life problem by a much simpler model
  E.g., assuming a linear model (linear regression) when the true relationship is not linear gives high error
  More complex models result in lower bias
Bias-variance tradeoff

Example: Regression

Bias-Variance Tradeoff
[Figure panels: model generalizes well on new data; model underfits the data; model overfits the data]
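Underfitting and overfitting can be seen on a small example. The sketch below uses k-nearest-neighbor regression as a stand-in for model complexity (small k gives a very flexible model, k equal to the training-set size gives a constant prediction). The data, the alternating noise pattern, and the function names are made up for illustration and are not taken from the slides.

```python
def knn_predict(train_x, train_y, x, k):
    """Predict the average response of the k training points nearest to x."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / k


def mse(xs, ys, train_x, train_y, k):
    """Mean squared error of the k-NN model on the points (xs, ys)."""
    return sum(
        (y - knn_predict(train_x, train_y, x, k)) ** 2 for x, y in zip(xs, ys)
    ) / len(xs)


# Training data: true relation y = x plus deterministic noise of +1 / -1.
train_x = [float(i) for i in range(10)]
train_y = [x + (1.0 if i % 2 == 0 else -1.0) for i, x in enumerate(train_x)]
# Test data: noise-free points between the training points.
test_x = [i + 0.5 for i in range(9)]
test_y = list(test_x)

train_mse = {k: mse(train_x, train_y, train_x, train_y, k) for k in (1, 2, 10)}
test_mse = {k: mse(test_x, test_y, train_x, train_y, k) for k in (1, 2, 10)}
print(train_mse)
print(test_mse)
```

On this data k=1 memorizes the training set (zero training error) but carries the noise over to test points, k=10 predicts the same mean everywhere and underfits, and the intermediate k=2 has the lowest test error of the three.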

Occam's Razor
Select the simplest machine learning model that gets reasonable accuracy for the task at hand

Recap
ML is a subset of AI concerned with designing learning algorithms
Learning tasks are supervised (e.g., classification and regression) or unsupervised (e.g., clustering)
Supervised learning uses labeled training data
Learning the best model is challenging
  Design an algorithm to minimize the error
  Bias-variance tradeoff
  Need to generalize to new, unseen test data
  Occam's razor (prefer the simplest model with good performance)

Probability review

Discrete Random Variables

Visualizing A

Axioms of Probability

Interpreting the Axioms

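The union bound
For events A and B: $P[A \cup B] \le P[A] + P[B]$
Axiom: $P[A \cup B] = P[A] + P[B] - P[A \cap B]$
If $A \cap B = \emptyset$, then $P[A \cup B] = P[A] + P[B]$
Example: $A_1 = \{\text{all } x \in \{0,1\}^n \text{ s.t. } \mathrm{lsb}_2(x) = 11\}$; $A_2 = \{\text{all } x \in \{0,1\}^n \text{ s.t. } \mathrm{msb}_2(x) = 11\}$
$P[\mathrm{lsb}_2(x) = 11 \text{ or } \mathrm{msb}_2(x) = 11] = P[A_1 \cup A_2] \le \tfrac14 + \tfrac14 = \tfrac12$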
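The bound in this example can be checked by brute-force enumeration. The sketch assumes n = 4, a uniform distribution over all n-bit strings, and that lsb_2 and msb_2 denote the two least and two most significant bits (read here as the last two and first two characters of the string); none of these choices is fixed by the slide.

```python
from fractions import Fraction
from itertools import product

n = 4  # assumed bit length for the illustration
universe = ["".join(bits) for bits in product("01", repeat=n)]

A1 = {x for x in universe if x[-2:] == "11"}  # two least significant bits are 11
A2 = {x for x in universe if x[:2] == "11"}   # two most significant bits are 11


def prob(event):
    """Probability of an event under the uniform distribution on the universe."""
    return Fraction(len(event), len(universe))


p_union = prob(A1 | A2)
bound = prob(A1) + prob(A2)
print(p_union, bound, prob(A1 & A2))
```

The exact value falls below the bound by exactly P[A1 ∩ A2], as the inclusion-exclusion identity predicts.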

Negation Theorem

Random Variables (Discrete)
Def: a random variable X is a function $X: U \to V$
Def: a discrete random variable takes a finite number of values: V is finite
Example: X models a coin toss with output 1 (heads) or 0 (tails): $\Pr[X = 1] = p$, $\Pr[X = 0] = 1 - p$ (Bernoulli random variable)
Uniform random variable (discrete) over U: for all $u \in U$, $\Pr[X = u] = 1/|U|$
Example: if $p = 1/2$, then X is a uniform coin toss
Probability Mass Function (PMF): $p(u) = \Pr[X = u]$

Example
1. X is the number of heads in a sequence of n coin tosses. What is the probability $P[X = k]$?
   $P[X = k] = \binom{n}{k} p^k (1 - p)^{n - k}$ (Binomial random variable)
2. X is the sum of two fair dice. What is the probability $P[X = k]$ for $k \in \{2, \dots, 12\}$?
   $P[X = 2] = 1/36$; $P[X = 3] = 2/36$; $P[X = 4] = 3/36$
   For what k is $P[X = k]$ highest?
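Both examples can be worked out mechanically. The sketch below evaluates the binomial formula with exact fractions and builds the PMF of the sum of two fair dice by enumerating all 36 outcomes; the function and variable names are illustrative.

```python
from fractions import Fraction
from itertools import product
from math import comb


def binomial_pmf(k, n, p):
    """P[X = k] for the number of heads in n tosses with heads probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Example 1: probability of exactly 2 heads in 4 fair tosses.
two_heads = binomial_pmf(2, 4, Fraction(1, 2))

# Example 2: PMF of the sum of two fair dice, by enumerating all outcomes.
dice_pmf = {}
for a, b in product(range(1, 7), repeat=2):
    dice_pmf[a + b] = dice_pmf.get(a + b, 0) + Fraction(1, 36)

# The value of k with the highest probability.
most_likely = max(dice_pmf, key=dice_pmf.get)
print(two_heads, most_likely, dice_pmf[most_likely])
```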

Expectation and variance
Expectation for a discrete random variable X: $E[X] = \sum_v v \Pr[X = v]$
Properties: $E[aX] = a\,E[X]$; linearity: $E[X + Y] = E[X] + E[Y]$
Variance: $\mathrm{Var}[X] = E[(X - E[X])^2]$
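A small numerical check of these properties, using a fair die as the random variable. The variance function uses the standard definition Var[X] = E[(X - E[X])^2], stated here as an assumption because the formula is not legible in the extracted slide text; the PMF is represented as a dictionary from value to probability, which is also just a choice made for this sketch.

```python
from fractions import Fraction
from itertools import product


def expectation(pmf):
    """E[X] = sum over v of v * Pr[X = v]."""
    return sum(v * p for v, p in pmf.items())


def variance(pmf):
    """Var[X] = E[(X - E[X])^2] (standard definition, assumed here)."""
    mu = expectation(pmf)
    return sum((v - mu) ** 2 * p for v, p in pmf.items())


die = {v: Fraction(1, 6) for v in range(1, 7)}

# Scaling: the PMF of 3X puts the same probabilities on the values 3v.
scaled = {3 * v: p for v, p in die.items()}

# Linearity: PMF of X + Y for two fair dice, built by enumeration.
two_dice = {}
for a, b in product(die, repeat=2):
    two_dice[a + b] = two_dice.get(a + b, 0) + die[a] * die[b]

print(expectation(die), variance(die))
print(expectation(scaled), expectation(two_dice))
```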

Conditional Probability
Def: events A and B are independent if and only if $\Pr[A \cap B] = \Pr[A] \cdot \Pr[B]$
If A and B are independent: $\Pr[A \mid B] = \dfrac{\Pr[A \cap B]}{\Pr[B]} = \dfrac{\Pr[A]\,\Pr[B]}{\Pr[B]} = \Pr[A]$
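The definition can be checked on two fair dice. In this sketch the events (first die even, second die shows 6, sum equals 12) are made up for illustration, and probabilities are computed by counting outcomes under the uniform distribution.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))  # (first die, second die)


def prob(event):
    """Probability that a uniformly random outcome satisfies the event predicate."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))


def first_even(o):
    return o[0] % 2 == 0


def second_is_six(o):
    return o[1] == 6


def sum_is_twelve(o):
    return o[0] + o[1] == 12


# Independent pair: the two events concern different dice.
p_a = prob(first_even)
p_b = prob(second_is_six)
p_ab = prob(lambda o: first_even(o) and second_is_six(o))
p_a_given_b = p_ab / p_b  # Pr[A | B] = Pr[A and B] / Pr[B]

# Dependent pair for contrast: a sum of 12 forces the first die to be 6.
p_c = prob(sum_is_twelve)
p_ac = prob(lambda o: first_even(o) and sum_is_twelve(o))
print(p_ab == p_a * p_b, p_a_given_b, p_ac == p_a * p_c)
```

For the independent pair, conditioning on B leaves the probability of A unchanged; for the dependent pair the product rule fails.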

Acknowledgements
Slides made using resources from: Andrew Ng, Eric Eaton, David Sontag
Thanks!