Transcription:

Learning from Data
Russell and Norvig Chapter 18

Learning
- Essential for agents working in unknown environments
- Learning is useful as a system construction method
  - Expose the agent to reality rather than trying to write it down
- Learning modifies the agent's decision mechanisms to improve performance

Machine learning is ubiquitous
Examples of systems that employ ML?

Learning from examples
Supervised learning: Given labeled examples of each digit, learn a classification rule

Examples of learning tasks
- OCR (Optical Character Recognition)
- Loan risk diagnosis
- Medical diagnosis
- Credit card fraud detection
- Speech recognition (e.g., in automatic call handling systems)
- Spam filtering
- Collaborative filtering (recommender systems)
- Biometric identification (fingerprints, iris scan, face)
- Information retrieval (incl. web searching)
- Data mining, e.g. customer purchase behavior
- Customer retention
- Bioinformatics: prediction of properties of genes and proteins

Learning
The agent tries to learn from the data (examples) provided to it. The agent receives feedback that tells it how well it is doing.
There are several learning scenarios according to the type of feedback:
- Supervised learning: correct answers for each example
- Unsupervised learning: correct answers not given
- Reinforcement learning: occasional rewards (e.g. learning to play a game)
Each scenario has appropriate learning algorithms.

11/14/14

ML tasks
- Classification: discrete/categorical labels
- Regression: continuous labels
- Clustering: no labels

Occam's Razor
Ockham's razor: prefer the simplest hypothesis consistent with data
http://old.aitopics.org/aitoons
Learning is concerned with accurate prediction of future data, not accurate prediction of training data.

Overfitting in classification

Supervised Learning
Example: want to classify [image] versus [image]
Data: Labeled images $D = \{(x_i, y_i)\}_{i=1}^{n}$, where $x_i$ is a vector that represents the image
Task: Here is a new image: What species is it?

The Nearest Neighbor Method (your first classification algorithm!)
NN(image):
1. Find the image in the training data which is closest to the query image.
2. Return its label.
[Figure: query image and closest image]

Distance measures
How to measure closeness?
- Discrete data: Hamming distance
- Continuous data: Euclidean distance
- Sequence data: edit distance
Alternative: use a similarity measure (or dot product) rather than a distance

k-NN
Use the closest k neighbors to make a decision instead of a single nearest neighbor
Why do you expect this to work better?
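The nearest neighbor rule and the three distance measures named above can be sketched in a few lines of Python. This is only an illustrative sketch, not code from the lecture: the function names (`euclidean`, `hamming`, `edit_distance`, `knn_classify`) and the toy data are assumptions made for the example, and the vote among the k neighbors is a plain majority vote.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def hamming(a, b):
    """Hamming distance: number of positions at which two sequences differ."""
    return sum(p != q for p, q in zip(a, b))

def edit_distance(s, t):
    """Levenshtein edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        cur = [i]
        for j, ct in enumerate(t, start=1):
            cur.append(min(prev[j] + 1,               # delete cs
                           cur[j - 1] + 1,            # insert ct
                           prev[j - 1] + (cs != ct))) # substitute or match
        prev = cur
    return prev[-1]

def knn_classify(train, query, k=1, distance=euclidean):
    """Return the majority label among the k training examples closest to query.

    train is a list of (x, y) pairs; k=1 is the plain nearest neighbor rule.
    """
    neighbors = sorted(train, key=lambda pair: distance(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Toy data, made up for illustration: 2-D points with two labels.
train = [((0.0, 0.0), "a"), ((0.0, 1.0), "a"), ((1.0, 0.0), "a"),
         ((5.0, 5.0), "b"), ((5.0, 6.0), "b"), ((6.0, 5.0), "b")]

label_1nn = knn_classify(train, (0.5, 0.5), k=1)
label_3nn = knn_classify(train, (5.5, 5.5), k=3)
```

Passing `distance=hamming` or `distance=edit_distance` switches the same classifier to discrete or sequence data. Sorting the whole training set for each query mirrors the point that all the work happens at classification time.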

Remarks on NN methods
- Very easy to implement
- No training required. All the computation is performed when classifying an example (complexity: O(n))
- Need to store the whole training set (memory inefficient)
- Flexible, no prior assumptions (a type of non-parametric classifier: does not assume anything about the data)
- Curse of dimensionality: if the data has many features that are irrelevant/noisy, distances are always large

Take home question
How would you convert the k-nearest-neighbor classification method to a regression method?

Or how accurate is my classifier?
The error rate of a classifier $f$ on a set of examples $D = \{(x_i, y_i)\}_{i=1}^{n}$:

$$\mathrm{err}(f, D) = \frac{1}{n} \sum_{i=1}^{n} I\big(f(x_i) \neq y_i\big)$$

$I$ is the indicator function that returns 1 if its argument is True and zero otherwise.
What is the error rate of a nearest neighbor classifier applied to its training set?
Report error rates computed on an independent test set (classifier was trained using training set): classifier performance on the training set is not indicative of performance on unseen data.
Issue when classes are imbalanced. There are other measures of performance that address this.
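As a small illustration of the error rate and of the imbalance issue, the sketch below counts mistakes with an indicator-style comparison. The names `error_rate` and `always_zero` and the test set are hypothetical, chosen only for this example.

```python
def error_rate(classifier, data):
    """Fraction of examples (x, y) in data for which classifier(x) != y."""
    mistakes = sum(1 for x, y in data if classifier(x) != y)
    return mistakes / len(data)

# Hypothetical imbalanced test set: nine examples of class 0, one of class 1.
test_set = [((i,), 0) for i in range(9)] + [((100,), 1)]

def always_zero(x):
    """A trivial classifier that ignores its input and predicts class 0."""
    return 0

err = error_rate(always_zero, test_set)
```

Here the trivial classifier makes a single mistake out of ten, so its error rate looks low even though it never detects class 1. That is the sense in which the error rate can mislead on imbalanced classes.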

Split data into training set and test set (say 70%, 30%). Compare several classifiers trained on this split. Train final best classifier on the full dataset.
A better method: cross-validation

Cross-validation
Split data into k parts (E_1, ..., E_k)
for i = 1, ..., k:
    training set = D \ E_i
    test set = E_i
    classifier.train(training set)
    accumulate results of classifier.test(test set)
This is called k-fold cross-validation
Extreme version: Leave-One-Out
Assumptions?

Uses of CV
Cross-validation is used to choose:
- Classifier parameters
  - k for k-NN
- Normalization method
- Which classifier
- Feature selection (which features provide best performance)
This is called model selection

CV-based model selection
We're trying to determine which classifier to use
Table columns: Classifier (f_1, f_2, f_3, f_4, f_5, f_6) | Training error | CV-error | choice

CV-based model selection
Example: choosing k for the k-NN algorithm:
Table columns: Classifier (K = 1, K = 2, K = 3, K = 4, K = 5, K = 6) | Training error | CV-error | choice
Show demo
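The cross-validation loop and its use for choosing k in k-NN can be sketched as follows. This is a minimal sketch under stated assumptions, not the demo from the lecture: the helper names, the 1-D toy data, and the interleaved fold assignment (example j goes to fold j mod n_folds, which assumes the data order carries no structure) are all made up for illustration.

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    """Majority label among the k training points closest to query (Euclidean)."""
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def cross_validation_error(data, n_folds, k_neighbors):
    """Pooled error rate over all held-out examples in n_folds-fold CV."""
    mistakes = 0
    for i in range(n_folds):
        # E_i: every n_folds-th example starting at position i
        test_set = data[i::n_folds]
        # D \ E_i: everything not in fold i
        training_set = [ex for j, ex in enumerate(data) if j % n_folds != i]
        mistakes += sum(knn_predict(training_set, x, k_neighbors) != y
                        for x, y in test_set)
    return mistakes / len(data)

# Toy 1-D data, made up for illustration: two clusters plus one
# "b" example sitting inside the "a" cluster (label noise).
data = [((0.0,), "a"), ((1.0,), "a"), ((2.0,), "a"), ((3.0,), "a"),
        ((10.0,), "b"), ((11.0,), "b"), ((12.0,), "b"), ((13.0,), "b"),
        ((1.5,), "b")]

# Leave-one-out is the extreme case: as many folds as examples.
errors = {k: cross_validation_error(data, len(data), k) for k in (1, 3, 5)}
best_k = min(errors, key=errors.get)  # first k with the smallest CV error
```

On this toy set the single noisy example causes three leave-one-out mistakes for k = 1 but only one for k = 3 and k = 5, so the selection picks k = 3. Ties are broken in favor of the smaller k because of the order in which candidates are listed.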

The general workflow
- Formulate problem
- Get data
- Decide on a representation (what features to use)
- Choose a classifier
- Assess the performance of the classifier
- Depending on the results: modify the representation, classifier, or look for more data

Next
- More classifiers: Decision trees
- How to use a probabilistic model such as a Bayesian network as a classifier