Lecture 1: Introduction to Machine Learning

Statistical Methods for Intelligent Information Processing (SMIIP)
Lecture 1: Introduction to Machine Learning
Shuigeng Zhou, School of Computer Science
September 13, 2017

What is machine learning?
Machine learning is an application of artificial intelligence that automates analytical model building by using algorithms that iteratively learn from data without being explicitly programmed where to look.
2017/9/25 SMIIP 2

What is machine learning?
"A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E." (Tom M. Mitchell)

What is machine learning?
"Machine learning is about predicting the future based on the past." (Hal Daumé III, University of Maryland)

What is machine learning?
Machine learning is to develop methods that can automatically detect patterns in data, and then to use the uncovered patterns to predict future data or other outcomes of interest. (Kevin P. Murphy)

The key point: more data, experience, or knowledge can make the methods, algorithms, or programs more effective or smarter.

Machine learning: types & applications
- Different types of machine learning: supervised learning, unsupervised learning, semi-supervised learning, reinforcement learning, transfer learning, deep learning, etc.
- Applications: pattern recognition, data mining, natural language processing (NLP), computer vision, bioinformatics, etc.

Machine learning: from different views
- What data is available: supervised, unsupervised, reinforcement, semi-supervised, and active learning
- How we are getting the data: online vs. offline learning
- Types of model: generative vs. discriminative, parametric vs. non-parametric

Popular machine learning tasks
- Supervised learning, discrete label: classification (k-nearest neighbors, naive Bayes, support vector machines, decision trees)
- Supervised learning, continuous label: regression (linear, locally weighted linear, ridge, lasso)
- Unsupervised learning, discrete label: clustering (k-means, DBSCAN)
- Unsupervised learning, continuous label: density estimation (expectation maximization, Parzen window)

Supervised learning
- Given labeled examples: each example is paired with a label, and the labeled examples are used to build a model/predictor.
- Learn to predict a new example: the model/predictor takes a new example and outputs a predicted label.

Supervised learning: classification
- Given labeled examples, here examples labeled "apple" or "banana"
- Classification: the labels come from a finite set
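The apple/banana setting can be sketched with a nearest-neighbor classifier, one of the classification methods named on the task overview slide. This is only an illustrative sketch: the two features (weight and length) and all numbers are assumptions made up for the example, not data from the lecture.

```python
import math

# Hypothetical labeled examples: (weight in grams, length in cm) -> label.
TRAINING_DATA = [
    ((150.0, 7.0), "apple"),
    ((170.0, 7.5), "apple"),
    ((120.0, 18.0), "banana"),
    ((130.0, 20.0), "banana"),
]


def predict_nearest_neighbor(example, training_data=TRAINING_DATA):
    """Return the label of the training example closest to `example`."""
    _, nearest_label = min(
        training_data,
        key=lambda pair: math.dist(pair[0], example),  # Euclidean distance
    )
    return nearest_label


# Predict labels for two new, unlabeled examples.
print(predict_nearest_neighbor((160.0, 7.2)))
print(predict_nearest_neighbor((125.0, 19.0)))
```

The "model" here is simply the stored training set; prediction copies the label of the closest stored example, so the output is always one of the finitely many labels seen in training.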

Classifying flowers

Handwriting recognition

Unsupervised learning (clustering)
- Learning what normally happens
- Clustering: grouping similar instances, i.e. splitting a dataset into groups so as to maximize intra-group similarity and inter-group difference
- Example applications: customer segmentation in CRM; image compression (color quantization); bioinformatics (learning motifs)
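As a rough illustration of clustering used for color quantization, the sketch below runs k-means (Lloyd's algorithm) on scalar gray levels. The pixel values, the choice of two clusters, and the initial centers are assumptions chosen for the example; a real image would use color vectors and a less naive initialization.

```python
def k_means_1d(values, centers, iterations=10):
    """Lloyd's algorithm on scalars: alternate assignment and center update."""
    centers = list(centers)
    for _ in range(iterations):
        # Assignment step: attach each value to its closest center.
        groups = [[] for _ in centers]
        for v in values:
            closest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[closest].append(v)
        # Update step: move each center to the mean of its group
        # (an empty group keeps its previous center).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers


# Hypothetical 8-bit gray levels from an image with dark and bright regions.
pixels = [10, 12, 14, 200, 205, 210, 11, 207]
palette = k_means_1d(pixels, centers=[0, 255])

# Quantize: replace every pixel by the nearest palette entry.
quantized = [min(palette, key=lambda c: abs(p - c)) for p in pixels]
print(palette)
print(quantized)
```

No labels are used anywhere: the grouping comes only from similarity between the values, which is the defining property of the unsupervised setting.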

Unsupervised learning (clustering)
Unsupervised learning: given data, i.e. examples, but no labels.

Reinforcement learning
- Example: two action sequences, each followed by a reward at the end
  left, right, straight, left, left, left, straight: GOOD (reward 18.5)
  left, straight, straight, left, right, straight, straight: BAD (reward -3)
- Given a sequence of examples/states and a reward after completing that sequence, learn to predict the action to take for an individual example/state
- Applications: game playing, robot in a maze, multiple agents, partial observability, ...

Reinforcement learning
- Backgammon: each game ends in WIN or LOSE
- Given sequences of moves and whether or not the player won at the end, learn to make good moves
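One simple way to turn "sequences of moves plus a final outcome" into move preferences is a Monte Carlo estimate: score each (state, action) pair by the average final reward of the episodes in which it occurred, then pick the best-scoring action per state. The sketch below is a deliberately simplified illustration; the state and action names, the +1/-1 rewards, and the absence of discounting or exploration are all assumptions, and the lecture does not say which algorithm it has in mind.

```python
from collections import defaultdict

# Hypothetical episodes: the (state, action) pairs visited, then the final
# reward (+1 for a win, -1 for a loss). All names are made up.
episodes = [
    ([("start", "left"), ("hall", "straight")], +1),
    ([("start", "left"), ("hall", "right")], -1),
    ([("start", "right"), ("pit", "straight")], -1),
    ([("start", "left"), ("hall", "straight")], +1),
]


def estimate_action_values(episodes):
    """Average final reward over the episodes containing each (state, action)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for steps, reward in episodes:
        for state_action in set(steps):  # count each pair once per episode
            totals[state_action] += reward
            counts[state_action] += 1
    return {sa: totals[sa] / counts[sa] for sa in totals}


def greedy_policy(action_values):
    """For every state, choose the action with the highest estimated value."""
    best = {}
    for (state, action), value in action_values.items():
        if state not in best or value > best[state][1]:
            best[state] = (action, value)
    return {state: action for state, (action, _) in best.items()}


action_values = estimate_action_values(episodes)
policy = greedy_policy(action_values)
print(policy)
```

The reward arrives only at the end of a sequence, so every move in a winning game gets some credit and every move in a losing game gets some blame; averaging over many games is what separates good moves from lucky ones.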

Machine learning methods (supervised learning)
- Start from a certain set of training data (limited and well established)
- Determine the hypothesis space (contains all possible models)
- Apply an evaluation criterion according to a certain strategy
- Implement solving/learning algorithms
- Train to select the optimal model
- Use the model to predict or analyze
ML method = model + strategy + algorithm
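The decomposition "model + strategy + algorithm" can be sketched on a toy problem. In this illustrative example the hypothesis space is a family of threshold rules, the strategy is minimizing the average 0-1 loss on the training data, and the algorithm is an exhaustive search over a grid of thresholds. All three choices, and the data, are assumptions picked for brevity rather than anything prescribed by the lecture.

```python
# Hypothetical training data: (feature value, label), with labels 0 or 1.
training_data = [(1.0, 0), (2.0, 0), (3.0, 0), (4.0, 1), (5.0, 1), (6.0, 1)]


# Model: the hypothesis space is every rule "predict 1 if x >= threshold".
def predict(threshold, x):
    return 1 if x >= threshold else 0


# Strategy: judge a hypothesis by its average 0-1 loss on the training data.
def empirical_risk(threshold, data):
    mistakes = sum(1 for x, y in data if predict(threshold, x) != y)
    return mistakes / len(data)


# Algorithm: exhaustive search over a finite grid of candidate thresholds.
def train(data, candidates):
    return min(candidates, key=lambda t: empirical_risk(t, data))


candidates = [0.5 + i for i in range(7)]  # 0.5, 1.5, ..., 6.5
best_threshold = train(training_data, candidates)
print(best_threshold, empirical_risk(best_threshold, training_data))

# Use the selected model to predict a new example.
print(predict(best_threshold, 4.7))
```

Swapping any one of the three parts (a richer hypothesis space, a different loss, a gradient-based search) gives a different learning method while the overall recipe stays the same.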

Model

Strategy

Algorithm
- Solve it as an optimization problem
- Many kinds of algorithms are used for convex or non-convex optimization
- Stochastic gradient descent (SGD), with many tricks and modifications: Adadelta, Adagrad, Adam, and other fancy SGD variants
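A minimal sketch of plain SGD, assuming a one-dimensional linear model y = w*x + b and squared loss (the lecture does not fix a particular model here). The data, learning rate, and epoch count are illustrative assumptions; adaptive variants such as Adagrad or Adam change how the step size is chosen but keep the same loop structure.

```python
import random


def sgd_linear_regression(data, learning_rate=0.05, epochs=500, seed=0):
    """Fit y = w*x + b by stochastic gradient descent on squared loss."""
    rng = random.Random(seed)
    data = list(data)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(data)  # visit the examples in a fresh random order
        for x, y in data:
            error = (w * x + b) - y
            # Gradient of 0.5 * error**2 is error*x w.r.t. w and error w.r.t. b.
            w -= learning_rate * error * x
            b -= learning_rate * error
    return w, b


# Hypothetical noise-free data generated from y = 2x + 1.
data = [(x, 2.0 * x + 1.0) for x in [0.0, 0.5, 1.0, 1.5, 2.0]]
w, b = sgd_linear_regression(data)
print(round(w, 3), round(b, 3))
```

Each update uses a single example instead of the full training set, which makes every step cheap at the cost of a noisier path toward the minimum.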

Model evaluation and selection
- Overfitting: a model describes random error or noise instead of the underlying relationship
- Overfitting occurs when a model is excessively complex, such as having too many parameters relative to the number of observations

Model evaluation and selection
- Underfitting: a machine learning algorithm cannot capture the underlying trend of the data
- Underfitting would occur, for example, when fitting a linear model to non-linear data. Such a model would have poor predictive performance.
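The linear-model-on-non-linear-data case can be checked numerically. The sketch below fits a least-squares line to points generated from y = x^2; the data set is an assumption chosen so the arithmetic is easy to follow.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_squared_error(predictions, targets):
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


# Hypothetical non-linear data: y = x^2 on a symmetric grid.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [x ** 2 for x in xs]

slope, intercept = fit_line(xs, ys)
mse_line = mean_squared_error([slope * x + intercept for x in xs], ys)
mse_quadratic = mean_squared_error([x ** 2 for x in xs], ys)
print(slope, intercept, mse_line, mse_quadratic)
```

The best possible line here is flat (slope 0), so it misses the trend even on the training data itself; a high training error that no amount of extra data fixes is the usual signature of underfitting.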

Model evaluation and selection
Avoiding overfitting:
- Regularization: L1-norm, L2-norm
- Cross validation: leave-one-out cross validation (LOOCV), k-fold
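The index bookkeeping behind k-fold cross validation can be sketched as follows; leave-one-out is the special case where k equals the number of examples. The function name is hypothetical, and the sketch assumes the data has already been shuffled, which is commonly done before splitting.

```python
def k_fold_splits(n_examples, k):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    indices = list(range(n_examples))
    # Spread the remainder so that fold sizes differ by at most one.
    fold_sizes = [n_examples // k + (1 if i < n_examples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


# 2-fold split of five examples.
for train, validation in k_fold_splits(5, 2):
    print("train:", train, "validation:", validation)

# Leave-one-out cross validation: one fold per example.
loocv = list(k_fold_splits(4, 4))
print(len(loocv))
```

Each example lands in exactly one validation fold, so averaging the k validation scores uses every example for evaluation once while never scoring a model on data it was trained on.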

Performance evaluation: classification
- Confusion matrix, also called an error matrix (a matching matrix in unsupervised learning)
- A kind of contingency table with two dimensions ("actual" and "predicted") and identical sets of "classes" in both dimensions
- The name stems from the fact that it makes it easy to see if the system is confusing two classes (i.e. commonly mislabeling one as another)

                     Actual class
Predicted class      Cat    Dog    Rabbit
Cat                    5      2       0
Dog                    3      3       2
Rabbit                 0      1      11
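A confusion matrix is just a count of (predicted, actual) pairs. The sketch below builds one with the same orientation as the slide (rows are predicted classes, columns are actual classes); the function name and the six example labels are assumptions for illustration.

```python
def confusion_matrix(actual, predicted, classes):
    """Count label pairs; rows are predicted classes, columns are actual classes."""
    index = {label: i for i, label in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        matrix[index[p]][index[a]] += 1
    return matrix


classes = ["cat", "dog", "rabbit"]
# Hypothetical ground truth and classifier output for six examples.
actual = ["cat", "cat", "dog", "rabbit", "dog", "cat"]
predicted = ["cat", "dog", "dog", "rabbit", "cat", "cat"]

matrix = confusion_matrix(actual, predicted, classes)
for label, row in zip(classes, matrix):
    print(label, row)
```

Correct predictions accumulate on the diagonal; any off-diagonal count shows which pair of classes the system is confusing.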

Performance evaluation: classification
- Table of confusion (sometimes also called a confusion matrix)
- A table with two rows and two columns that reports the number of false positives, false negatives, true positives, and true negatives

                     Actual class
Predicted class      Cat                    Non-cat
Cat                  5 true positives       2 false positives
Non-cat              3 false negatives      17 true negatives
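The two-by-two table for "cat" follows from the three-class matrix by treating cat as the positive class and everything else as negative. A short sketch of that reduction, assuming the rows-are-predicted, columns-are-actual layout used in this lecture (the function name is hypothetical):

```python
# Three-class confusion matrix from the lecture:
# rows are predicted classes, columns are actual classes (cat, dog, rabbit).
MATRIX = [
    [5, 2, 0],
    [3, 3, 2],
    [0, 1, 11],
]


def table_of_confusion(matrix, k):
    """Return (TP, FP, FN, TN) when class index k is the positive class."""
    total = sum(sum(row) for row in matrix)
    tp = matrix[k][k]
    fp = sum(matrix[k]) - tp                 # predicted k, actually another class
    fn = sum(row[k] for row in matrix) - tp  # actually k, predicted another class
    tn = total - tp - fp - fn
    return tp, fp, fn, tn


print(table_of_confusion(MATRIX, 0))  # cat vs. non-cat
```

The 17 true negatives are the four cells that involve neither a predicted cat nor an actual cat: 3 + 2 + 1 + 11.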

Performance evaluation: classification
Some terms of the confusion matrix:
- Condition positive (P): the number of real positive examples in the data
- Condition negative (N): the number of real negative examples in the data
- True positive (TP), or hit: correctly predicted positive
- True negative (TN), or correct rejection: correctly predicted negative
- False positive (FP), or false alarm, Type I error: incorrectly predicted positive
- False negative (FN), or miss, Type II error: incorrectly predicted negative

Performance evaluation: classification
- Recall, or sensitivity, hit rate, true positive rate (TPR): Recall = TP/P = TP/(TP+FN)
- Specificity, or true negative rate (TNR): Specificity = TN/N = TN/(TN+FP)
- Precision, or positive predictive value (PPV): Precision = TP/(TP+FP)
- False discovery rate (FDR): FDR = FP/(TP+FP) = 1 - PPV
- Accuracy (Acc): Acc = (TP+TN)/(P+N) = (TP+TN)/(TP+FP+TN+FN)
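As a worked check, the sketch below evaluates these measures on the cat vs. non-cat counts from the lecture (TP = 5, FP = 2, FN = 3, TN = 17). Specificity is computed as TN/N, where N = TN + FP is the number of actual negatives. The function name is an assumption.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute the standard measures from the four cells of a table of confusion."""
    positives = tp + fn   # P: actual positives
    negatives = tn + fp   # N: actual negatives
    return {
        "recall": tp / positives,
        "specificity": tn / negatives,
        "precision": tp / (tp + fp),
        "fdr": fp / (tp + fp),
        "accuracy": (tp + tn) / (positives + negatives),
    }


metrics = classification_metrics(tp=5, fp=2, fn=3, tn=17)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

For these counts recall is 5/8, specificity is 17/19, precision is 5/7, and accuracy is 22/27; precision and FDR always sum to 1 because they share the denominator TP + FP.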

Performance evaluation: multi-class classification
- Micro-averaging: sum the individual true positives, false positives, and false negatives over the different classes, then apply the formulas to the totals to get the statistics
- Macro-averaging: take the average of the per-class precision and recall
- Macro-averaging weights all classes equally, while micro-averaging weights all documents equally
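The difference between the two averages can be seen on the cat/dog/rabbit confusion matrix from earlier in the lecture. This is a sketch assuming the rows-are-predicted, columns-are-actual layout; the helper name is hypothetical.

```python
# Rows are predicted classes, columns are actual classes (cat, dog, rabbit).
MATRIX = [
    [5, 2, 0],
    [3, 3, 2],
    [0, 1, 11],
]


def per_class_counts(matrix):
    """Return a list of (TP, FP, FN) tuples, one per class."""
    counts = []
    for k in range(len(matrix)):
        tp = matrix[k][k]
        fp = sum(matrix[k]) - tp
        fn = sum(row[k] for row in matrix) - tp
        counts.append((tp, fp, fn))
    return counts


counts = per_class_counts(MATRIX)

# Macro-averaging: compute the metric per class, then average the results.
macro_precision = sum(tp / (tp + fp) for tp, fp, _ in counts) / len(counts)
macro_recall = sum(tp / (tp + fn) for tp, _, fn in counts) / len(counts)

# Micro-averaging: pool the counts over classes, then compute the metric once.
total_tp = sum(tp for tp, _, _ in counts)
total_fp = sum(fp for _, fp, _ in counts)
total_fn = sum(fn for _, _, fn in counts)
micro_precision = total_tp / (total_tp + total_fp)
micro_recall = total_tp / (total_tp + total_fn)

print(f"macro: precision={macro_precision:.3f} recall={macro_recall:.3f}")
print(f"micro: precision={micro_precision:.3f} recall={micro_recall:.3f}")
```

With one label per example, every misclassification is a false positive for one class and a false negative for another, so micro-averaged precision and recall coincide (and equal accuracy). The macro averages are pulled down by the weaker dog class, which counts as much as the large rabbit class.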

Assignment
Reading: Chapter 1, "Introduction", of Murphy's book.

Thanks! Questions?