Overview of Introduction


Overview of Introduction
- Machine Learning
  - Problem definition
  - Example Tasks
- Dimensions of Machine Learning Problems
  - Example Representation
  - Concept Representation
  - Learning Tasks
  - Evaluation Scenarios
- Induction of Classifiers
  - Characteristics of this framework
  - A small example
- Learning and Search
  - Generalization and Bias
- Data Mining
  - Motivation
  - Relation to Machine Learning
1 J. Fürnkranz

What Is Learning?
- "Learning is the collective name for events, processes, or not directly observable changes in the organism that arise through experience and lead to changes in behavior." [Bergius, 1971]
- "Learning denotes changes in the probability with which behaviors occur in particular situations." [Hilgard, 1973]
- "Learning is a change in behavior that cannot be explained by maturation processes, injuries or illnesses, fatigue, or innate dispositions." [Joerger, 1976]

Machine Learning
- "Learning denotes changes in the system that ... enable the system to do the same task or tasks drawn from the same population more efficiently and more effectively the next time." [Simon, 1983]
- "Learning is making useful changes in our minds." [Minsky, 1985]
- "Learning is constructing or modifying representations of what is being experienced." [Michalski, 1986]

Machine Learning: Problem Definition
Definition (Mitchell, 1997): "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E."
Given:
- a task T
- a performance measure P
- some experience E with the task
Goal: generalize the experience in a way that allows you to improve your performance on the task.

Learning to Play Backgammon
- Task: play backgammon
- Performance Measure: percentage of games won
- Experience: previous games played
TD-Gammon:
- learned a neural network for evaluating backgammon boards from playing millions of games against itself
- successively improved to world-champion strength
- http://www.research.ibm.com/massive/tdl.html
GNU Backgammon: http://www.gnu.org/software/gnubg/

Recognizing Spam Mail
- Task: sort e-mails into categories (e.g., regular / spam)
- Performance Measure: weighted sum of mistakes (letting spam through is not as bad as misclassifying regular e-mail as spam)
- Experience: hand-sorted e-mail messages in your folders
In practice: many spam filters (e.g., Mozilla) use Bayesian learning for recognizing spam mails.

Market Basket Analysis
- Task: discover items that are frequently bought together
- Performance Measure: ? (revenue from making use of the discovered patterns)
- Experience: supermarket check-out data
Myth: the most frequently cited result is the rule diapers → beer.

Learning to Classify Stars
- Task: classification of celestial bodies
- Data: 3000 images (23,040 x 23,040 pixels) from the Palomar Sky Observatory, 3 terabytes of data, classified into 10^7 galaxies, 10^8 stars, 10^5 quasars; representation with 40 attributes (image processing)
- Method (SKICAT): learning of multiple decision trees, combining the best rules of each tree
- Performance: 94% accuracy, better than astronomers; discovery of 16 new quasars (12/95)

Dimensions of Learning Problems
- Example representation: attribute-value data vs. first-order logic
- Type of training information: supervised vs. unsupervised learning
- Availability of training examples: batch learning vs. on-line learning (incremental learning)
- Concept representation: IF-THEN rules, decision trees, neural networks, ...
- Learning algorithm: divide-and-conquer, back-propagation, ...
- Evaluation scenario: estimating predictive performance, cost-sensitive learning, ...

Example Representation
Attribute-value data: each example is described by values for a fixed number of attributes.
- Nominal attributes: store an unordered list of symbols (e.g., color)
- Numeric attributes: store a number (e.g., income)
- Other types: ordered values, hierarchical attributes, set-valued attributes
- the data corresponds to a single relation (spreadsheet)
Multi-relational data: the relevant information is distributed over multiple relations → Inductive Logic Programming
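The attribute-value setting above can be sketched in a few lines of Python; the attribute names and values here are invented for illustration, not taken from the lecture.

```python
# A minimal sketch of attribute-value data: each example is one row of a
# single relation with the same fixed set of attributes. "color" is a
# nominal attribute, "income" a numeric one (both invented).
EXAMPLES = [
    {"color": "red",   "income": 45000, "label": "yes"},
    {"color": "blue",  "income": 62000, "label": "no"},
    {"color": "green", "income": 38000, "label": "yes"},
]

def check_fixed_schema(examples):
    """All examples must share the same fixed set of attributes."""
    schema = set(examples[0])
    return all(set(e) == schema for e in examples)

print(check_fixed_schema(EXAMPLES))  # True
```

Multi-relational data would instead spread this information over several such tables linked by keys, which is what Inductive Logic Programming handles.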

Type of Training Information
- Supervised learning: a teacher provides the value of the target function for all training examples (labeled examples) → concept learning, classification, regression
- Semi-supervised learning: only a subset of the training examples are labeled (labeling examples is expensive!)
- Reinforcement learning: a teacher provides feedback about the values of the target function chosen by the learner
- Unsupervised learning: there is no information except the training examples → clustering, subgroup discovery, association rule discovery

Example Availability
- Batch learning: the learner is provided with a set of training examples
- Incremental learning / on-line learning: there is a constant stream of training examples
- Active learning: the learner may choose an example and ask the teacher for the relevant training information

Concept Representation
Most learners generalize the training examples into an explicit representation (called a model, function, hypothesis, concept, ...):
- mathematical functions (e.g., polynomial of 3rd degree)
- logical formulas (e.g., propositional IF-THEN rules)
- decision trees
- neural networks
- ...
Lazy learning:
- does not compute an explicit model
- generalizes on demand for a given new example
- example: nearest-neighbor classification

A Selection of Learning Techniques
- Decision and Regression Trees
- Classification Rules
- Association Rules
- Inductive Logic Programming
- Neural Networks
- Support Vector Machines
- Statistical Modeling
- Clustering Techniques
- Case-Based Reasoning
- Genetic Algorithms

Evaluation of Learned Models
Validation through experts:
- a domain expert evaluates the plausibility of a learned model
- (-) subjective, time-intensive, costly
- (+) often the only option (e.g., clustering)
Validation on data:
- evaluate the accuracy of the model on a separate dataset drawn from the same distribution as the training data
- (-) labeled data are scarce and could be better used for training
- (+) fast and simple, off-line, no domain knowledge needed; methods for re-using training data exist (e.g., cross-validation)
On-line validation:
- test the learned model in a fielded application
- (+) gives the best estimate of the overall utility
- (-) bad models may be costly
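Validation on data, the second option above, can be sketched as a simple holdout split. The function and learner names (`holdout_accuracy`, `majority_fit`) are hypothetical, not from the lecture; the majority-class learner stands in for any classifier.

```python
import random

def holdout_accuracy(examples, labels, fit, split=0.7, seed=0):
    """Hold out part of the labeled data for testing.

    `fit` trains a model on the training portion and returns a
    predict(x) function; accuracy is estimated on the held-out part.
    """
    idx = list(range(len(examples)))
    random.Random(seed).shuffle(idx)          # same distribution, random split
    cut = int(split * len(idx))
    train, test = idx[:cut], idx[cut:]
    predict = fit([examples[i] for i in train], [labels[i] for i in train])
    hits = sum(predict(examples[i]) == labels[i] for i in test)
    return hits / len(test)

# Hypothetical baseline learner: always predicts the majority class.
def majority_fit(xs, ys):
    majority = max(set(ys), key=ys.count)
    return lambda x: majority

xs = list(range(10))
ys = ["yes"] * 7 + ["no"] * 3
print(holdout_accuracy(xs, ys, majority_fit))
```

Cross-validation, mentioned on the slide, generalizes this by rotating which portion is held out so the labeled data are re-used.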

Induction of Classifiers
The most popular learning problem:
- Task: learn a model that predicts the outcome of a dependent variable for a given instance
- Experience: given in the form of a database of examples; an example describes a single previous observation
  - instance: a set of measurements that characterize a situation
  - label: the outcome that was observed in this situation
- Performance Measure: compare the predicted outcome to the observed outcome; estimate the probability of predicting the right outcome in a new situation

Induction of Classifiers: Characteristics
- attribute-value representation (single relation)
- batch learning from off-line data (data are available from external sources)
- supervised learning (examples are pre-classified)
- numerous learning algorithms for practically all concept representations (decision trees, rules, neural networks, SVMs, statistical models, ...)
- often greedy algorithms (may not find the optimal solution, but allow fast processing of large datasets)
- evaluation by estimating predictive accuracy (on a portion of the available data)

Induction of Classifiers
Inductive machine learning algorithms induce a classifier from labeled training examples. The classifier generalizes the training examples, i.e., it is able to assign labels to new cases.
Training: an inductive learning algorithm searches a given family of hypotheses (e.g., decision trees, neural networks) for a member that optimizes given quality criteria (e.g., estimated predictive accuracy or misclassification costs).
[Diagram: Example → Classifier → Classification]

A Sample Task

Day       Temperature  Outlook   Humidity  Windy  Play Golf?
07-05     hot          sunny     high      false  no
07-06     hot          sunny     high      true   no
07-07     hot          overcast  high      false  yes
07-09     cool         rain      normal    false  yes
07-10     cool         overcast  normal    true   yes
07-12     mild         sunny     high      false  no
07-14     cool         sunny     normal    false  yes
07-15     mild         rain      normal    false  yes
07-20     mild         sunny     normal    true   yes
07-21     mild         overcast  high      true   yes
07-22     hot          overcast  normal    false  yes
07-23     mild         rain      high      true   no
07-26     cool         rain      normal    true   no
12-30     mild         rain      high      false  yes
today     cool         sunny     normal    false  ?
tomorrow  mild         sunny     normal    false  ?

Rote Learning

Day       Temperature  Outlook   Humidity  Windy  Play Golf?
07-05     hot          sunny     high      false  no
07-06     hot          sunny     high      true   no
07-07     hot          overcast  high      false  yes
07-09     cool         rain      normal    false  yes
07-10     cool         overcast  normal    true   yes
07-12     mild         sunny     high      false  no
07-14     cool         sunny     normal    false  yes
07-15     mild         rain      normal    false  yes
07-20     mild         sunny     normal    true   yes
07-21     mild         overcast  high      true   yes
07-22     hot          overcast  normal    false  yes
07-23     mild         rain      high      true   no
07-26     cool         rain      normal    true   no
07-30     mild         rain      high      false  yes
today     cool         sunny     normal    false  yes
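Rote learning, as in the table above, amounts to memorizing every training case and answering queries by exact lookup, with no generalization. A minimal sketch (function names are invented):

```python
# Rote learning: store every observed case verbatim; recall succeeds
# only when the query matches a stored case exactly.
MEMORY = {}

def memorize(instance, label):
    MEMORY[instance] = label

def recall(instance):
    return MEMORY.get(instance)  # None if the exact case was never seen

memorize(("cool", "sunny", "normal", "false"), "yes")   # row 07-14
print(recall(("cool", "sunny", "normal", "false")))     # yes: 'today' matches exactly
print(recall(("mild", "sunny", "normal", "false")))     # None: 'tomorrow' was never seen
```

This is why 'today' (an exact repeat of row 07-14) gets an answer while 'tomorrow' would not: rote learning cannot handle unseen cases.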

Nearest Neighbor Classifier
k-nearest-neighbor algorithms classify a new example by comparing it to all previously seen examples. The classifications of the k most similar previous cases are used for predicting the classification of the current example.
Training: the training examples are used for
- providing a library of sample cases
- re-scaling the similarity function to maximize performance
[Diagram: New Example → ? → Classification]

Nearest Neighbor

Day       Temperature  Outlook   Humidity  Windy  Play Golf?
07-05     hot          sunny     high      false  no
07-06     hot          sunny     high      true   no
07-07     hot          overcast  high      false  yes
07-09     cool         rain      normal    false  yes
07-10     cool         overcast  normal    true   yes
07-12     mild         sunny     high      false  no
07-14     cool         sunny     normal    false  yes
07-15     mild         rain      normal    false  yes
07-20     mild         sunny     normal    true   yes
07-21     mild         overcast  high      true   yes
07-22     hot          overcast  normal    false  yes
07-23     mild         rain      high      true   no
07-26     cool         rain      normal    true   no
12-30     mild         rain      high      false  yes
tomorrow  mild         sunny     normal    false  yes
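The prediction for 'tomorrow' can be reproduced with a small k-NN sketch over the Play Golf data, using the number of matching nominal attribute values as a simplistic, assumed similarity measure (the lecture does not specify one):

```python
# k-nearest-neighbor on the Play Golf training data from the table.
from collections import Counter

TRAIN = [  # (temperature, outlook, humidity, windy) -> play golf?
    (("hot",  "sunny",    "high",   "false"), "no"),
    (("hot",  "sunny",    "high",   "true"),  "no"),
    (("hot",  "overcast", "high",   "false"), "yes"),
    (("cool", "rain",     "normal", "false"), "yes"),
    (("cool", "overcast", "normal", "true"),  "yes"),
    (("mild", "sunny",    "high",   "false"), "no"),
    (("cool", "sunny",    "normal", "false"), "yes"),
    (("mild", "rain",     "normal", "false"), "yes"),
    (("mild", "sunny",    "normal", "true"),  "yes"),
    (("mild", "overcast", "high",   "true"),  "yes"),
    (("hot",  "overcast", "normal", "false"), "yes"),
    (("mild", "rain",     "high",   "true"),  "no"),
    (("cool", "rain",     "normal", "true"),  "no"),
    (("mild", "rain",     "high",   "false"), "yes"),
]

def similarity(a, b):
    """Number of attribute values the two examples share."""
    return sum(x == y for x, y in zip(a, b))

def knn_predict(example, train=TRAIN, k=3):
    """Majority vote among the k most similar stored cases."""
    neighbors = sorted(train, key=lambda t: similarity(example, t[0]),
                       reverse=True)[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

print(knn_predict(("mild", "sunny", "normal", "false")))  # yes
```

With k=3 the most similar cases (three matching attributes) vote 2:1 for "yes", matching the table; ties among equally similar cases are broken by training order here.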

Decision Trees
A decision tree consists of:
- Nodes: test the value of a certain attribute
- Edges: correspond to the outcomes of a test and connect to the next node or leaf
- Leaves: terminal nodes that predict the outcome
An example is classified as follows:
1. start at the root
2. perform the test
3. follow the corresponding edge
4. go to 2. unless at a leaf
5. predict the outcome associated with the leaf
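The five classification steps above can be sketched directly. The tree used here is a plausible tree for the Play Golf data, not necessarily the one shown on the slide: inner nodes are (attribute, {value: subtree}) pairs, leaves are class labels.

```python
# Hypothetical decision tree for the Play Golf data.
TREE = ("outlook", {
    "sunny":    ("humidity", {"high": "no", "normal": "yes"}),
    "overcast": "yes",
    "rain":     ("windy", {"true": "no", "false": "yes"}),
})

def classify(tree, example):
    while not isinstance(tree, str):         # 4. repeat unless at a leaf
        attribute, branches = tree           # 1./2. test an attribute's value
        tree = branches[example[attribute]]  # 3. follow the matching edge
    return tree                              # 5. predict the leaf's outcome

print(classify(TREE, {"outlook": "sunny", "humidity": "normal"}))  # yes
```

Note that each example only pays for the tests on its path: the "overcast" case is decided without ever looking at humidity or windiness.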

Decision Tree Learning
In decision tree learning, a new example is classified by submitting it to a series of tests that determine its class label. These tests are organized in a hierarchical structure called a decision tree.
Training: the training examples are used for choosing appropriate tests in the decision tree. Typically, a tree is built from top to bottom, where tests that maximize the information gain about the classification are selected first.
[Diagram: New Example → Decision Tree → Classification]
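The "maximize the information gain" criterion mentioned above is, in the standard ID3-style formulation, the reduction in class entropy achieved by splitting on an attribute; a sketch on an invented toy dataset:

```python
# Information gain = entropy before the split minus the weighted
# entropy of the subsets produced by testing one attribute.
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(examples, labels, attribute):
    """Entropy reduction from splitting on `attribute` (examples are dicts)."""
    n = len(labels)
    splits = {}
    for x, y in zip(examples, labels):
        splits.setdefault(x[attribute], []).append(y)
    remainder = sum(len(part) / n * entropy(part) for part in splits.values())
    return entropy(labels) - remainder

# Invented example: "a" separates the classes perfectly, "b" is useless.
xs = [{"a": "p", "b": "u"}, {"a": "p", "b": "v"},
      {"a": "q", "b": "u"}, {"a": "q", "b": "v"}]
ys = ["yes", "yes", "no", "no"]
print(information_gain(xs, ys, "a"))  # 1.0 (perfect split)
print(information_gain(xs, ys, "b"))  # 0.0 (no information)
```

A top-down learner would pick "a" as the root test here, then recurse on each subset; greedy selection of this kind is fast but need not yield the globally optimal tree.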

Decision Tree Learning
[Tree diagram applying the learned tree to the unseen example: tomorrow, mild, sunny, normal, false → ?]

A Different Decision Tree
- also explains all of the training data
- but will it generalize well to new data?

Learning and Search
Learning may be viewed as a search problem:
- Search space: the space of all possible hypotheses of the chosen hypothesis class (e.g., all decision trees)
- Find: a hypothesis that is likely to underlie the data
Different search techniques:
- exhaustive search: enumerate all hypotheses; typically infeasible (some hypothesis spaces are even infinite!)
- greedy search: incrementally build up a solution, using heuristics to choose the next solution step
- randomized search: e.g., evolutionary algorithms

Bias and Generalization
Bias (machine learning definition): any criterion that prefers one concept over another, except for completeness/consistency on the training data.
- Language bias: the choice of the hypothesis representation language
- Selection bias: which hypotheses will be preferred during the search?
- Overfitting avoidance bias: avoid too close approximations to the training data
Bias is necessary for generalization:
- without bias, all complete and consistent hypotheses are equally likely
- for any new example, half of them will predict one class, the other half the opposite class (no-free-lunch theorems)

Data Mining: Motivation
- "Computers have promised us a fountain of wisdom but delivered a flood of data."
- "It has been estimated that the amount of information in the world doubles every 20 months."
[Frawley, Piatetsky-Shapiro, Matheus, 1992]

World-Wide Data Growth
- Science: satellite monitoring, human genome
- Business: OLTP (on-line transaction processing), data warehouses, e-commerce
- Industry: process data
- World-Wide Web

Data Mining
"Mining for nuggets of knowledge in mountains of data."

Definition
Data mining is the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data. (Fayyad et al., 1996)
It employs techniques from:
- machine learning
- statistics
- databases
Or maybe: "Data mining is torturing your database until it confesses." (attributed to Mannila)

The Data Mining Process
[Process diagram from Fayyad, Piatetsky-Shapiro, Smyth, 1996]