Introduction to Machine Learning, Reykjavík University, Spring 2007. Instructor: Dan Lizotte


Introduction to Machine Learning
Reykjavík University, Spring 2007
Instructor: Dan Lizotte

Logistics
To contact Dan: dlizotte@cs.ualberta.ca
http://www.cs.ualberta.ca/~dlizotte/teaching/
Books:
  Introduction to Machine Learning, Alpaydin. We'll use mostly this one.
  Reinforcement Learning: An Introduction. We'll use this somewhat at the end; it's online.

Logistics
Time: MTWRF, 8:15am - 9:00am, 9:15am - 10:00am
Lectures: K21 (Kringlan 1)
Labs: Room 432 (Ofanleiti 2)

What is Machine Learning?
"Machine learning is programming computers to optimize a performance criterion using example data or past experience." (Alpaydin)
"The field of machine learning is concerned with the question of how to construct computer programs that automatically improve with experience." (Mitchell)
"...the subfield of AI concerned with programs that learn from experience." (Russell & Norvig)

What else is Machine Learning? Data Mining
"The nontrivial extraction of implicit, previously unknown, and potentially useful information from data." (W. Frawley, G. Piatetsky-Shapiro, and C. Matheus)
"...the science of extracting useful information from large data sets or databases." (D. Hand, H. Mannila, P. Smyth)
"Data-driven discovery of models and patterns from massive observational data sets." (Padhraic Smyth)

This is all pretty vague
You may find that in this course, we cover a bunch of loosely related topics. You're right. That's kind of what ML is.
Hopefully, you will learn a little bit about a lot of things:
  Some theory
  Some practice
To get the most out of this course, ASK ME QUESTIONS.

Any questions before we start?
Anybody? Anybody?
Really, people, now is the time. But you can (and should) always ask later.
Let's look at a few examples (Alpaydin, Ch. 1.2).

Learning Associations
What things go together? Chips and beer, maybe?
Suppose we want P(chips | beer): the probability that a particular customer will buy chips, given that he or she has bought beer.
We will estimate this probability from data:
  P(chips | beer) ≈ #(chips & beer) / #beer
Just count the people who bought beer and chips, and divide by the number of people who bought beer.
While not glamorous, counting is learning.
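
The counting estimate for P(chips | beer) can be sketched in a few lines of Python. The baskets below are made up for illustration, and the function name conditional_probability is a hypothetical helper, not something from the course materials.

```python
# Made-up shopping baskets; each set lists the items one customer bought.
baskets = [
    {"beer", "chips"},
    {"beer", "chips", "salsa"},
    {"beer"},
    {"chips"},
    {"milk", "bread"},
]


def conditional_probability(baskets, target, given):
    """Estimate P(target | given) as #(target & given) / #given."""
    n_given = sum(1 for b in baskets if given in b)
    if n_given == 0:
        raise ValueError(f"no basket contains {given!r}")
    n_both = sum(1 for b in baskets if given in b and target in b)
    return n_both / n_given


# Three customers bought beer; two of them also bought chips.
p_chips_given_beer = conditional_probability(baskets, "chips", "beer")
print(p_chips_given_beer)
```

With so few baskets the estimate is very noisy; the same count over more data gives a more trustworthy number.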

Classification
Input: features
Output: label
Features can be symbols, real numbers, etc.
  [ age, height, weight, gender, hair_colour, ... ]
Labels come from a (small) discrete set:
  L = {Icelander, Canadian}
We need a discriminant function that maps feature vectors to labels.
We can learn this from data, in many ways:
  ( [ 27, 172, 68, M, brown, ... ], Canadian )
  ( [ 29, 160, 54, F, brown, ... ], Icelander )
We can use it to predict the label of a new instance.
How good are our predictions?

Regression
Input: features
Output: response
Features can be symbols, real numbers, etc.
  [ age, height, weight, gender, hair_colour, ... ]
Response is real-valued:
  -∞ < life_span < ∞
We need a regression function that maps feature vectors to responses.
We can learn this from data, in many ways:
  ( [ 27, 172, 68, M, brown, ... ], 86 )
  ( [ 29, 160, 54, F, brown, ... ], 99 )
We can use it to predict the response of a new instance.
How good are our predictions?
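
As a rough illustration of "learning a function from data", here is a minimal sketch of one possible discriminant function (nearest neighbour) and one possible regression function (a least-squares line on a single feature). Both techniques are choices made for this sketch; the slides only say there are "many ways". The first two labelled rows reuse the numeric features from the slide, and every other number is made up.

```python
import math

# (age, height, weight) -> label. Rows 1-2 come from the slide; rows 3-4 are made up.
labelled = [
    ([27, 172, 68], "Canadian"),
    ([29, 160, 54], "Icelander"),
    ([35, 180, 80], "Canadian"),
    ([41, 158, 52], "Icelander"),
]


def nearest_neighbour_label(x, examples):
    """Discriminant function: return the label of the closest training example."""
    _, label = min(examples, key=lambda ex: math.dist(x, ex[0]))
    return label


def fit_line(xs, ys):
    """Regression function: least-squares fit of y = a + b * x for one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# Classification: the output is one of a small set of labels.
predicted_label = nearest_neighbour_label([30, 175, 70], labelled)

# Regression: the output is a real number. Heights and responses are made up.
heights = [150.0, 160.0, 170.0, 180.0]
responses = [60.0, 70.0, 80.0, 90.0]
a, b = fit_line(heights, responses)
predicted_response = a + b * 165.0

print(predicted_label, predicted_response)
```

The two functions have the same shape (features in, prediction out); only the type of the output differs.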

Pause: Classification vs. Regression
Both are "Learn a function from labeled examples."
The only difference is the label's domain.
Why make the distinction?
  Historically, they've been studied separately.
  The label domain can significantly impact which algorithms will or will not work.
Classification: "Separate the data."
Regression: "Fit the data."

Unsupervised Learning
Take clustering, for example.
Input: features
Output: label
Features can be symbols, real numbers, etc.
  [ age, height, weight, gender, hair_colour, ... ]
Labels are not given a priori. (Frequently |L| is given.)
Each label describes a subset of the data.
In clustering, examples that are "close" together are grouped, so we need to define "close".
Labels are represented by cluster centres.
In this case, frequently the groups really are the end result.
They are subjective: evaluation is difficult.
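
One common way to make "cluster centres" and "close" concrete is k-means with Euclidean distance. The slides do not name a specific algorithm, so treat this as one illustrative choice. The points and the starting centres are made up, and the number of clusters is given in advance.

```python
import math


def kmeans(points, centres, iterations=10):
    """Alternate between assigning points to the nearest centre and re-averaging."""
    for _ in range(iterations):
        clusters = [[] for _ in centres]
        for p in points:
            nearest = min(range(len(centres)), key=lambda i: math.dist(p, centres[i]))
            clusters[nearest].append(p)
        new_centres = []
        for centre, members in zip(centres, clusters):
            if members:
                # New centre is the coordinate-wise mean of the cluster members.
                new_centres.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centres.append(centre)  # leave an empty cluster's centre unchanged
        centres = new_centres
    labels = [min(range(len(centres)), key=lambda i: math.dist(p, centres[i])) for p in points]
    return centres, labels


# Made-up 2-D feature vectors forming two loose groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centres, labels = kmeans(points, centres=[points[0], points[3]])
print(centres, labels)
```

Changing the distance function changes which examples count as "close", and so changes the groups. That is part of why evaluation is subjective.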

Reinforcement Learning
Input: observations, rewards
Output: actions
Observations may be real or discrete.
Reward is a real number.
Actions may be real or discrete.
The situation here is one of an agent (think "robot") interacting with its environment.
The interaction is continuing: actions are chosen and performance is measured.
Performance can be improved (i.e., reward increased) over time by analyzing past experience.

Okay: Let's tie these together
Associations, Classification, Regression, Clustering, Reinforcement Learning
We're going to take features and predict something: label, response, good action.
We're going to learn this predictor from previous data.
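
The agent-environment loop from the reinforcement learning slide can be sketched in a heavily simplified setting. Assumptions of this sketch: there are no observations (a single-state "multi-armed bandit"), rewards are Gaussian around made-up means, and the agent uses an epsilon-greedy rule. None of these choices come from the slides.

```python
import random


def run_bandit(true_means, steps=2000, epsilon=0.1, seed=0):
    """Choose actions, observe real-valued rewards, and improve from past experience."""
    rng = random.Random(seed)
    estimates = [0.0] * len(true_means)  # running average reward per action
    counts = [0] * len(true_means)
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random action.
            action = rng.randrange(len(true_means))
        else:
            # Exploit: pick the action that has looked best so far.
            action = max(range(len(true_means)), key=lambda a: estimates[a])
        reward = rng.gauss(true_means[action], 1.0)  # the environment's response
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]
        total_reward += reward
    return estimates, counts, total_reward


# Three actions with made-up average rewards; the third is the best one.
estimates, counts, total_reward = run_bandit([0.0, 1.0, 3.0])
print(counts, [round(e, 2) for e in estimates])
```

The agent is never told which action is best. It works that out from the rewards it has seen, which is the "analyzing past experience" part.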

A Closer Look at Classification
We will now look at an example classification problem.
Slides courtesy of Russ Greiner, and Duda, Hart, and Stork.

Intro to Machine Learning (aka Pattern Recognition)
Chapter 1.1-1.6, Duda, Hart, Stork
  Machine Perception
  An Example
  Pattern Recognition Systems
  The Design Cycle
  Learning and Adaptation
  Conclusion

Machine Perception
Build a machine that can recognize patterns:
  Speech recognition
  Fingerprint identification
  OCR (Optical Character Recognition)
  DNA sequence identification

Example
Sort fish into species using optical sensing:
  Sea bass
  Salmon

Problem Analysis
Extract features from sample images:
  Length
  Width
  Average pixel brightness
  Number and shape of fins
  Position of mouth
The classifier makes a decision for FishX based on the values of these features.

Preprocessing
Use segmentation to isolate
  fish from background
  fish from one another
Send info about each single fish to the feature extractor, which compresses the quantity of data into a small set of features.
The classifier sees these features.

Use Length?
Problematic: many incorrect classifications.

Use Lightness?
Better: fewer incorrect classifications.
Still not perfect.

Where to place boundary?
The Salmon region intersects the SeaBass region, so no boundary is perfect.
  Smaller boundary: fewer SeaBass classified as Salmon
  Larger boundary: fewer Salmon classified as SeaBass
Which is best depends on misclassification costs.
This is the task of decision theory.
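
To see how misclassification costs move the boundary, here is a small sketch that tries every candidate threshold and keeps the cheapest. The lightness readings and the cost values are made up. The rule "lightness below the threshold means salmon" is an assumption that matches the slide's statement that a smaller boundary sends fewer sea bass to the salmon side.

```python
# Made-up lightness readings with overlapping ranges.
salmon = [1.0, 2.0, 3.0, 4.0, 5.0]
sea_bass = [3.5, 4.5, 5.5, 6.5, 7.5]


def total_cost(threshold, cost_bass_as_salmon, cost_salmon_as_bass):
    """Cost of the rule: lightness < threshold -> salmon, otherwise sea bass."""
    bass_as_salmon = sum(1 for x in sea_bass if x < threshold)
    salmon_as_bass = sum(1 for x in salmon if x >= threshold)
    return cost_bass_as_salmon * bass_as_salmon + cost_salmon_as_bass * salmon_as_bass


def best_threshold(cost_bass_as_salmon, cost_salmon_as_bass):
    """Try each observed lightness value as the boundary and keep the cheapest."""
    candidates = sorted(set(salmon + sea_bass))
    return min(
        candidates,
        key=lambda t: total_cost(t, cost_bass_as_salmon, cost_salmon_as_bass),
    )


# Calling a sea bass "salmon" is the expensive mistake: the boundary moves down.
low_boundary = best_threshold(cost_bass_as_salmon=5.0, cost_salmon_as_bass=1.0)
# Calling a salmon "sea bass" is the expensive mistake: the boundary moves up.
high_boundary = best_threshold(cost_bass_as_salmon=1.0, cost_salmon_as_bass=5.0)
print(low_boundary, high_boundary)
```

The data are identical in both calls; only the costs change, and the chosen boundary changes with them.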

Why not 2 features?
Use lightness and width of fish.
Fish: x^T = [x_1, x_2], where x_1 is lightness and x_2 is width.

Results
Much better: very few incorrect classifications!
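
A minimal sketch of classifying with the two-feature vector x = [lightness, width]. It assigns a fish to the class whose average feature vector is nearest, which gives a straight-line boundary between the two class means. This nearest-mean rule is an illustrative choice, not necessarily the classifier behind the slide's figure, and all measurements are made up.

```python
import math

# Made-up (lightness, width) measurements for each species.
salmon = [(2.0, 10.0), (3.0, 11.0), (2.5, 9.5), (4.0, 10.5)]
sea_bass = [(6.0, 14.0), (7.0, 15.0), (5.5, 13.0), (6.5, 16.0)]


def mean_vector(vectors):
    """Coordinate-wise average of a list of feature vectors."""
    n = len(vectors)
    return tuple(sum(column) / n for column in zip(*vectors))


centroids = {"salmon": mean_vector(salmon), "sea bass": mean_vector(sea_bass)}


def classify(x):
    """Return the label whose class mean is closest to x."""
    return min(centroids, key=lambda label: math.dist(x, centroids[label]))


prediction = classify((3.0, 10.0))
print(centroids, prediction)
```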

How to produce a Better Classifier?
Perhaps add other features?
  Ideally, not correlated with current features.
  Warning: noisy features will reduce performance.
Best decision boundary: the one that provides optimal performance.
  Not necessarily a LINE.
  E.g.:

Optimal Performance??

Objective: Handle Novel Data
Goal: optimal performance on NOVEL data.
Performance on TRAINING DATA != performance on NOVEL data.
Issue of generalization!

Simple (non-line) Boundary
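
The gap between training performance and performance on novel data can be shown with a tiny made-up example. A classifier that memorises the training set (1-nearest-neighbour on lightness) is compared with a hand-picked simple threshold rule. The data, the threshold of 4.0, and the function names are all assumptions made for this sketch.

```python
# Made-up (lightness, label) pairs; each class has one atypical training example.
train = [
    (1.0, "salmon"), (2.0, "salmon"), (3.0, "salmon"), (6.0, "salmon"),
    (2.5, "sea bass"), (5.0, "sea bass"), (7.0, "sea bass"), (8.0, "sea bass"),
]
# Novel fish, not seen during training.
test = [
    (1.5, "salmon"), (2.6, "salmon"), (3.5, "salmon"),
    (5.8, "sea bass"), (6.7, "sea bass"), (7.5, "sea bass"),
]


def error_rate(classifier, examples):
    """Fraction of examples the classifier labels incorrectly."""
    wrong = sum(1 for x, label in examples if classifier(x) != label)
    return wrong / len(examples)


def memoriser(x):
    """1-nearest-neighbour: copy the label of the closest training example."""
    _, label = min(train, key=lambda ex: abs(ex[0] - x))
    return label


def simple_rule(x):
    """A hand-picked single threshold on lightness."""
    return "salmon" if x < 4.0 else "sea bass"


for name, clf in [("memoriser", memoriser), ("simple rule", simple_rule)]:
    print(name, error_rate(clf, train), error_rate(clf, test))
```

On this toy data the memoriser is perfect on the training set but makes mistakes on the novel fish. The simple rule misses the two atypical training examples but handles the novel fish better.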

Pattern Recognition Systems
Sensing
  Using a transducer (camera, microphone, ...)
  The PR system depends on the bandwidth, resolution, sensitivity, and distortion of the transducer.
Segmentation and grouping
  Patterns should be well separated (should not overlap).

Machine Learning Steps
Feature extraction
  Discriminative features
  Want features INVARIANT w.r.t. translation, rotation, and scale.
Classification
  Use the feature vector (provided by the feature extractor) to assign the given object to a category.
Post-processing
  Exploit context (information not in the target pattern itself) to improve performance.

The Design Cycle
  Data collection
  Feature choice
  Model choice
  Training
  Evaluation
  Computational complexity
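
As a small illustration of the invariance requirement in the feature extraction step, the sketch below builds a feature vector from an outline's pairwise distances, divided by the largest distance. Distances do not change under translation or rotation, and the division removes scale. The outline points, the transformation parameters, and the function names are made up for this example.

```python
import math
from itertools import combinations


def invariant_signature(points):
    """Sorted pairwise distances, normalised by the largest one."""
    dists = sorted(math.dist(p, q) for p, q in combinations(points, 2))
    longest = dists[-1]
    return [d / longest for d in dists]


def transform(points, angle, scale, shift):
    """Rotate by angle (radians), scale uniformly, then translate."""
    c, s = math.cos(angle), math.sin(angle)
    return [
        (scale * (c * x - s * y) + shift[0], scale * (s * x + c * y) + shift[1])
        for x, y in points
    ]


# A made-up outline, and the same outline rotated, enlarged, and moved.
shape = [(0.0, 0.0), (4.0, 0.0), (4.0, 1.0), (0.0, 2.0)]
moved = transform(shape, angle=0.7, scale=3.0, shift=(10.0, -5.0))

original_sig = invariant_signature(shape)
moved_sig = invariant_signature(moved)
print(original_sig)
print(moved_sig)
```

The two printed signatures agree up to floating-point rounding, so a classifier fed these features would treat both outlines as the same pattern.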

Data Collection
How do we know when we have collected an adequately large and representative set of examples for training and testing the system?

Which Features?
Depends on the characteristics of the problem domain.
Ideally:
  Simple to extract
  Invariant to irrelevant transformations
  Insensitive to noise

Which Model?
Try a simple one.
If not satisfied with its performance, consider another class of model.

Training
Use data to obtain a good classifier:
  identify the best model
  determine appropriate parameters
There are many procedures for training classifiers and choosing models.

Evaluation
Measure the error rate (performance).
May suggest switching
  from one set of features to another
  from one model to another

Computational Complexity
What is the trade-off between computational ease and performance?
How does the algorithm scale as a function of the number of features, patterns, or categories?

Learning and Adaptation
Supervised learning
  A teacher provides a category label or cost for each pattern in the training set.
Unsupervised learning
  The system forms clusters or "natural groupings" of input patterns.

Conclusion
Machine Learning has many challenging sub-problems.
Many of these sub-problems can be solved!
Many fascinating unsolved problems still remain.

Pattern Classification
All materials in these slides were taken from Pattern Classification (2nd ed.) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley & Sons, 2000, with the permission of the authors and the publisher.