Learning from Examples


INF5390 Artificial Intelligence
Learning from Examples
Roar Fjellheim
INF5390-12 Learning from Examples 1

Outline
- General model
- Types of learning
- Learning decision trees
- Neural networks
- Perceptrons
- Summary
(Based on AIMA Chapter 18: Learning From Examples)

Why should agents learn?
- Agents in previous lectures have assumed built-in knowledge, provided by designers
- In order to handle incomplete knowledge and changing knowledge requirements, agents must learn
- Learning is a way of achieving agent autonomy and the ability to improve performance over time
- The field in AI that deals with learning is called machine learning, and is very active

General model of learning agents
[Figure: the agent contains a performance element (connected to sensors and actuators), a learning element, a critic (comparing feedback from the sensors against a performance standard), and a problem generator; the learning element sends changes and learning goals to the other elements and exchanges knowledge with the performance element, all while interacting with the environment]

Elements of the general model
- Performance element: carries out the task of the agent, i.e. processes percepts and decides on actions
- Learning element: proposes improvements to the performance element, based on previous knowledge and feedback
- Critic: evaluates the performance element by comparing the results of its actions with imposed performance standards
- Problem generator: proposes exploratory actions to increase knowledge

Aspects of the learning element
- Which components of the performance element are to be improved
- Which parts of the agent's knowledge base are targeted
- What feedback is available: supervised, unsupervised and reinforcement learning differ in the type of feedback the agent receives
- What representation is used for the components, e.g. logic sentences, belief networks, utility functions, etc.
- What prior information (knowledge) is available

Performance element components
Possible components that can be improved:
- Direct mapping from states to actions
- Means to infer world properties from percept sequences
- Information about how the world evolves
- Information about the results of possible actions
- Utility information about the desirability of world states
- Desirability of specific actions in specific states
- Goals describing states that maximize utility
In each case, learning can be seen as learning an unknown function y = f(x)

Hypothesis space H
- H: the set of hypothesis functions h to be considered in searching for f(x)
- Consistent hypothesis: fits all the data
- If there are several consistent hypotheses, choose the simplest one (Occam's razor)
- Realizability of the learning problem: realizable if H contains the true function, unrealizable if not; we do not normally know what the true function is
- Why not choose H as large as possible? It may be very inefficient, both in learning and in applying the hypothesis

Types of learning - knowledge
Inductive learning
- Given a collection of examples (x, f(x))
- Return a function h that approximates f
- Does not rely on prior knowledge ("just data")
Deductive (or analytical) learning
- Goes from a known general f to a new f that is logically entailed
- Based on prior knowledge ("data + knowledge")
- Resembles human learning more closely

Types of learning - feedback
Unsupervised learning
- The agent learns patterns in the data even though no feedback is given, e.g. via clustering
Reinforcement learning
- The agent gets a reward or punishment at the end, but is not told which particular action led to the result
Supervised learning
- The agent receives learning examples and is explicitly told what the correct answer is for each case
Mixed modes, e.g. semi-supervised learning
- Correct answers are given for some but not all examples

Learning decision trees
A decision situation can be described by:
- A number of attributes, each with a set of possible values
- A decision, which may be Boolean (yes/no) or multivalued
A decision tree is a tree structure where:
- Each internal node represents a test of the value of an attribute, with one branch for each possible attribute value
- Each leaf node represents the value of the decision if that node is reached
Decision tree learning is one of the simplest and most successful forms of machine learning, and an example of inductive and supervised learning.

Example: Wait for restaurant table
Goal predicate: WillWait (for a restaurant table)
Domain attributes:
- Alternate (other restaurants nearby)
- Bar (to wait in)
- Fri/Sat (day of week)
- Hungry (yes/no)
- Patrons (none, some, full)
- Price (range)
- Raining (outside)
- Reservation (made before)
- Type (French, Italian, ...)
- WaitEstimate (minutes)

One decision tree for the example

Patrons?
- None -> No
- Some -> Yes
- Full -> WaitEstimate?
    - >60 -> No
    - 30-60 -> Alternate?
        - No -> Reservation?
            - No -> Bar?
                - No -> No
                - Yes -> Yes
            - Yes -> Yes
        - Yes -> Fri/Sat?
            - No -> No
            - Yes -> Yes
    - 10-30 -> Hungry?
        - No -> Yes
        - Yes -> Alternate?
            - No -> Yes
            - Yes -> Raining?
                - No -> No
                - Yes -> Yes
    - 0-10 -> Yes

Expressiveness of decision trees
- The tree is equivalent to a conjunction of implications, one per path to a Yes leaf, e.g.:
  ∀r Patrons(r, Full) ∧ WaitEstimate(r, 10-30) ∧ Hungry(r, No) ⇒ WillWait(r)
- Cannot represent tests on two or more objects; restricted to testing attributes of one object
- Fully expressive as a propositional language, e.g. any Boolean function can be written as a decision tree
- For some functions, exponentially large decision trees are required, i.e. decision trees are good for some functions and bad for others

Inducing decision trees from examples
Terminology:
- Example: specific values for all attributes, plus the goal predicate
- Classification: the value of the goal predicate for the example
- Positive/negative example: the goal predicate is true/false
- Training set: the complete set of examples
The task of inducing a decision tree from a training set is to find the simplest tree that agrees with the examples. The resulting tree should be more compact and general than the training set itself.

A training set for the restaurant example

Example  Alt  Bar  Fri  Hun  Pat   Price  Rain  Res  Type    Est    WillWait
X1       Yes  No   No   Yes  Some  $$$    No    Yes  French  0-10   Yes
X2       Yes  No   No   Yes  Full  $      No    No   Thai    30-60  No
X3       No   Yes  No   No   Some  $      No    No   Burger  0-10   Yes
X4       Yes  No   Yes  Yes  Full  $      No    No   Thai    10-30  Yes
X5-X12   (further examples, not shown on the slide)

General idea of the induction algorithm
- Test the most important attribute first, i.e. the one that makes the most difference to the classification
- Patrons? is a good choice for the first attribute, because it allows early decisions
- Apply the same principle recursively

Splitting on Patrons?:
- Root: +: X1,X3,X4,X6,X8,X12; -: X2,X5,X7,X9,X10,X11
- None: +: (none); -: X7,X11 -> No
- Some: +: X1,X3,X6,X8; -: (none) -> Yes
- Full: +: X4,X12; -: X2,X5,X9,X10 -> split further
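How much difference an attribute makes can be measured as information gain. A minimal Python sketch (illustrative, not from the slides), computing the gain of the Patrons? split above, where the root holds 6 positive and 6 negative examples:

```python
from math import log2

def entropy(pos, neg):
    """Entropy (in bits) of a set with pos positive and neg negative examples."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * log2(p)
    return h

def information_gain(root, branches):
    """Gain = root entropy minus the weighted entropy of the branches.
    root and each branch are (pos, neg) counts."""
    total = sum(root)
    remainder = sum((p + n) / total * entropy(p, n) for p, n in branches)
    return entropy(*root) - remainder

# Counts from the Patrons? split: None (0+, 2-), Some (4+, 0-), Full (2+, 4-)
gain = information_gain((6, 6), [(0, 2), (4, 0), (2, 4)])
print(round(gain, 3))  # 0.541 bits
```

Two of the three branches become pure immediately, which is why Patrons? scores so well as the first test.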

Recursive step of the induction algorithm
The attribute test splits the tree into smaller decision trees, with fewer examples and one attribute less. Four cases to consider for the smaller trees:
- If there are some positive and some negative examples, choose the best attribute to split them
- If the examples are all positive (all negative), answer Yes (No)
- If no examples are left, return a default value (no example was observed for this case)
- If no attributes are left, but there are both positive and negative examples: problem! (same description, different classifications, i.e. noise)
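The four cases above can be sketched as a recursive procedure. A minimal Python sketch (illustrative, not the slides' code; for brevity the attribute choice here is just "first remaining attribute" rather than the best one):

```python
from collections import Counter

def plurality(examples):
    """Most common classification among the examples (used as the default)."""
    return Counter(label for _, label in examples).most_common(1)[0][0]

def dtl(examples, attributes, default):
    """examples: list of (attribute-dict, label). Returns a label or an
    (attribute, branches) pair forming a nested decision tree."""
    if not examples:                       # case: no examples left
        return default
    labels = {label for _, label in examples}
    if len(labels) == 1:                   # case: all positive or all negative
        return labels.pop()
    if not attributes:                     # case: noise, mixed labels remain
        return plurality(examples)
    attr = attributes[0]                   # stand-in for "choose the best attribute"
    branches = {}
    for value in {e[attr] for e, _ in examples}:
        subset = [(e, label) for e, label in examples if e[attr] == value]
        branches[value] = dtl(subset, attributes[1:], plurality(examples))
    return (attr, branches)

def classify(tree, example):
    while isinstance(tree, tuple):
        attr, branches = tree
        tree = branches[example[attr]]
    return tree

# Tiny hypothetical training set: wait exactly when Patrons == "Some"
data = [({"Patrons": "Some"}, "Yes"), ({"Patrons": "None"}, "No"),
        ({"Patrons": "Some"}, "Yes"), ({"Patrons": "Full"}, "No")]
tree = dtl(data, ["Patrons"], "No")
print(classify(tree, {"Patrons": "Some"}))  # Yes
```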

Induced tree for the example set
The induced tree is simpler than the original manual tree; it captures some regularities that the original creator was unaware of.

Patrons?
- None -> No
- Some -> Yes
- Full -> Hungry?
    - No -> No
    - Yes -> Type?
        - French -> Yes
        - Italian -> No
        - Thai -> Fri/Sat?
            - No -> No
            - Yes -> Yes
        - Burger -> Yes

Broadening the applicability of decision trees
- Missing data: how to handle training samples with partially missing attribute values
- Multi-/many-valued attributes: how to treat attributes with many possible values
- Continuous or integer-valued input attributes: how to branch the decision tree when an attribute has a continuous value range
- Continuous-valued output attributes: requires a regression tree rather than a decision tree, i.e. the output value is a linear function of the input variables rather than a point value
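For continuous input attributes, a common approach (not detailed on the slide) is to branch on a test of the form attribute < t, trying candidate thresholds midway between consecutive sorted values. A hedged Python sketch, scored here simply by how purely the split separates the labels:

```python
def candidate_thresholds(values):
    """Midpoints between consecutive distinct sorted values of a continuous attribute."""
    vs = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(vs, vs[1:])]

def best_threshold(pairs):
    """pairs: list of (value, label). Returns the threshold whose split
    has the highest majority-label purity on both sides."""
    best_t, best_score = None, -1.0
    for t in candidate_thresholds([v for v, _ in pairs]):
        left = [label for v, label in pairs if v < t]
        right = [label for v, label in pairs if v >= t]
        score = (max(left.count(x) for x in set(left)) +
                 max(right.count(x) for x in set(right))) / len(pairs)
        if score > best_score:
            best_t, best_score = t, score
    return best_t

# Hypothetical WaitEstimate values (minutes) with WillWait labels
pairs = [(5, "Yes"), (8, "Yes"), (20, "Yes"), (45, "No"), (70, "No")]
print(best_threshold(pairs))  # 32.5: cleanly separates the Yes's from the No's
```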

Assessing learning performance
- Collect a large set of examples
- Divide it into two disjoint sets: a training set and a test set
- Use the learning algorithm on the training set to generate a hypothesis h
- Measure the percentage of examples in the test set that are correctly classified by h
- Repeat the steps above for differently sized training sets
[Figure: learning curve, % correct on the test set plotted against training set size]
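The steps above amount to a small evaluation loop. An illustrative Python sketch (the "learner" here is a trivial majority-class stand-in, not an algorithm from the slides):

```python
def majority_learner(training_set):
    """Stand-in learner: returns a hypothesis that always predicts the
    most common label seen in the training set."""
    labels = [label for _, label in training_set]
    majority = max(set(labels), key=labels.count)
    return lambda x: majority

def accuracy(hypothesis, test_set):
    """Fraction of test examples the hypothesis classifies correctly."""
    correct = sum(1 for x, label in test_set if hypothesis(x) == label)
    return correct / len(test_set)

# Toy data, already split into disjoint training and test sets
train = [(1, "Yes"), (2, "Yes"), (3, "Yes"), (4, "No")]
test = [(5, "Yes"), (6, "No"), (7, "Yes"), (8, "Yes")]
h = majority_learner(train)
print(accuracy(h, test))  # 0.75
```

Repeating this for growing training-set sizes yields the learning curve sketched on the slide.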

Neural networks in AI
- The human brain is a huge network of neurons; a neuron is a basic processing unit that collects, processes and disseminates electrical signals
- Early AI tried to imitate the brain by building artificial neural networks (ANN), but met with theoretical limits, and the approach faded
- In the 1980s-90s, interest in ANNs resurfaced: new theoretical developments, and massive industrial interest and applications

The basic unit of neural networks
- The network consists of units (nodes, "neurons") connected by links
- A link from unit i to unit j carries an activation a_i and has a weight W_i,j
- A bias weight W_0,j connects each unit j to a fixed input a_0 = 1
- Activation of a unit j:
  - Calculate the input: in_j = Σ_{i=0..n} W_i,j a_i
  - Derive the output: a_j = g(in_j), where g is the activation function

Activation functions
The activation function should separate well:
- Active (near 1) for the desired input
- Inactive (near 0) otherwise
It should be non-linear. The most used functions are:
- The threshold (step) function
- The sigmoid function
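The unit computation and both activation functions can be written out directly. A minimal sketch, assuming the weight at index 0 is the bias weight with its input fixed at a_0 = 1:

```python
from math import exp

def threshold(x):
    """Step function: active (1) when the input reaches 0, else inactive (0)."""
    return 1 if x >= 0 else 0

def sigmoid(x):
    """Smooth, non-linear squashing function with outputs in (0, 1)."""
    return 1.0 / (1.0 + exp(-x))

def unit_output(weights, inputs, g):
    """Computes in_j = sum over i of W_i,j * a_i (with a_0 = 1 for the bias
    weight), then the output a_j = g(in_j)."""
    activations = [1.0] + list(inputs)
    in_j = sum(w * a for w, a in zip(weights, activations))
    return g(in_j)

print(unit_output([-1.5, 1.0, 1.0], [1, 1], threshold))  # 1: fires on input (1, 1)
```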

Neural networks as logical gates
With proper use of the bias weight W_0 to set thresholds, neural networks can compute standard logical gate functions.
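For instance, with threshold units the standard gates follow from the bias weight alone. A sketch using the textbook weight choices (assuming a step activation and a fixed bias input of 1):

```python
def step(x):
    return 1 if x >= 0 else 0

def gate(bias, weights):
    """Build a threshold unit: output = step(bias * 1 + sum of w * x)."""
    return lambda *xs: step(bias + sum(w * x for w, x in zip(weights, xs)))

AND = gate(-1.5, [1, 1])   # fires only when both inputs are 1
OR  = gate(-0.5, [1, 1])   # fires when at least one input is 1
NOT = gate(0.5, [-1])      # inverts its single input

for a in (0, 1):
    for b in (0, 1):
        print(a, b, AND(a, b), OR(a, b))
```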

Neural network structures
Two main structures:
Feed-forward (acyclic) networks
- Represent a function of their inputs
- No internal state
Recurrent networks
- Feed outputs back to inputs
- May be stable, oscillate or become chaotic
- Output depends on the initial state
Recurrent networks are the most interesting and brain-like, but also the most difficult to understand.

Feed-forward networks as functions
- A feed-forward network calculates a function of its inputs
- The network may contain hidden units/layers
- By changing the number of layers/units and their weights, different functions can be realized
- Feed-forward networks are often used for classification

Perceptrons
- Single-layer feed-forward neural networks are called perceptrons, and were the earliest networks to be studied
- Perceptrons can only act as linear separators, a small subset of all interesting functions
- This partly explains why neural network research was discontinued for a long time

Perceptron learning algorithm
How do we train the network to perform a certain function (e.g. classification) based on a training set of input/output pairs?
[Figure: a perceptron with inputs x_1..x_4, weights W_j and output y]
Basic idea:
- Adjust the network link weights to minimize some measure of the error on the training set
- Adjust the weights in the direction that minimizes the error

Perceptron learning algorithm (cont.)

function PERCEPTRON-LEARNING(examples, network) returns a perceptron hypothesis
  inputs: examples, a set of examples, each with inputs x_1, x_2, ... and output y
          network, a perceptron with weights W_j (j = 0..n) and activation function g
  repeat
    for each e in examples do
      in = Σ_{j=0..n} W_j x_j[e]
      Err = y[e] - g(in)
      W_j = W_j + α × Err × x_j[e]
  until some stopping criterion is satisfied
  return NEURAL-NETWORK-HYPOTHESIS(network)

where α is the learning rate.
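The pseudocode above translates almost line for line into Python. A sketch (the training data, learning rate α and epoch-count stopping criterion are illustrative choices), trained here on the linearly separable OR function:

```python
def g(x):                                       # threshold activation
    return 1 if x >= 0 else 0

def perceptron_learning(examples, n_inputs, alpha=0.1, epochs=25):
    """examples: list of (inputs, y). W[0] is the bias weight (its input fixed at 1)."""
    W = [0.0] * (n_inputs + 1)
    for _ in range(epochs):                     # stopping criterion: fixed epochs
        for xs, y in examples:
            xs1 = [1.0] + list(xs)              # prepend the bias input x_0 = 1
            in_ = sum(w * x for w, x in zip(W, xs1))
            err = y - g(in_)                    # Err = y[e] - g(in)
            W = [w + alpha * err * x for w, x in zip(W, xs1)]
    return lambda xs: g(sum(w * x for w, x in zip(W, [1.0] + list(xs))))

examples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]  # OR
h = perceptron_learning(examples, n_inputs=2)
print([h(xs) for xs, _ in examples])  # [0, 1, 1, 1]
```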

Performance of perceptrons vs. decision trees
- Perceptrons are better at learning a (linearly) separable problem
- Decision trees are better at the restaurant problem

Multi-layer feed-forward networks
- Add hidden layers; the most common choice is one extra layer
- The advantage is that more functions can be realized, in effect by combining several perceptron functions
It can be shown that:
- A feed-forward network with a single sufficiently large hidden layer can represent any continuous function
- With two hidden layers, even discontinuous functions can be represented
However:
- We cannot easily tell which functions a particular network is able to represent
- It is not well understood how to choose the structure/number of layers for a particular problem
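As a concrete illustration of combining perceptron functions: XOR is not linearly separable, so no single perceptron computes it, but a hidden layer of an OR unit and an AND unit makes it easy. A sketch with hand-set weights:

```python
def step(x):
    return 1 if x >= 0 else 0

def xor(x1, x2):
    """Two-layer feed-forward net: the hidden layer computes OR and AND of the
    inputs; the output fires when OR is on but AND is off, i.e. exactly one
    input is 1."""
    h_or  = step(-0.5 + x1 + x2)        # hidden unit 1: OR
    h_and = step(-1.5 + x1 + x2)        # hidden unit 2: AND
    return step(-0.5 + h_or - h_and)    # output: OR and not AND

print([xor(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]])  # [0, 1, 1, 0]
```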

Example network structure
[Figure: feed-forward network with 10 inputs, one output and one hidden layer, suitable for the restaurant problem]

More complex activation functions
Multi-layer networks can combine simple (linear separation) perceptron activation functions into more complex functions.

Learning in multi-layer networks
- In principle the same as for perceptrons: adjust the weights to minimize the error
- The main difference is what the error at internal nodes means: there is nothing to compare it to
- Solution: propagate the error at the output nodes back to the hidden layers, and successively propagate backwards if the network has several hidden layers
- The resulting back-propagation algorithm is the standard learning method for neural networks
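One back-propagation step for a single hidden layer can be written out explicitly. A minimal sketch with sigmoid units and illustrative hand-picked weights (bias weights omitted for brevity); it performs one gradient update on one example and checks that the squared error shrinks:

```python
from math import exp

def sig(x):
    return 1.0 / (1.0 + exp(-x))

def forward(x, W1, W2):
    """W1: hidden-layer weights (one row per hidden unit); W2: output weights."""
    h = [sig(sum(w * xi for w, xi in zip(row, x))) for row in W1]
    y = sig(sum(w * hi for w, hi in zip(W2, h)))
    return h, y

def backprop_step(x, target, W1, W2, alpha=0.1):
    h, y = forward(x, W1, W2)
    delta_out = (target - y) * y * (1 - y)          # error term at the output node
    # propagate the output error back to each hidden node
    delta_hidden = [W2[j] * delta_out * h[j] * (1 - h[j]) for j in range(len(h))]
    W2 = [W2[j] + alpha * delta_out * h[j] for j in range(len(W2))]
    W1 = [[W1[j][i] + alpha * delta_hidden[j] * x[i] for i in range(len(x))]
          for j in range(len(W1))]
    return W1, W2

x, target = [1.0, 0.0], 1.0
W1, W2 = [[0.5, -0.5], [0.3, 0.2]], [0.4, -0.6]     # illustrative initial weights
_, y_before = forward(x, W1, W2)
W1, W2 = backprop_step(x, target, W1, W2)
_, y_after = forward(x, W1, W2)
print((target - y_after) ** 2 < (target - y_before) ** 2)  # one step reduced the error
```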

Learning neural network structure
- The learning algorithms so far have assumed a fixed network structure
- However, we do not know in advance what structure will be necessary and sufficient, so the network structure must also be learned
Solution approaches:
- Try different configurations and keep the best; but the search space is very large (number of layers and number of nodes)
- "Optimal brain damage": start with a full network and remove nodes selectively (optimally)
- "Tiling": start with a minimal network that covers a subset of the training set, and expand it incrementally

Summary
- Learning agents have a performance element and a learning element
- The learning element tries to improve various parts of the performance element, generally seen as functions y = f(x)
- Learning can be inductive (from examples) or deductive (based on knowledge)
- Learning types differ in the feedback given to the agent: unsupervised, reinforcement or supervised learning
- Learning a function from examples of inputs and outputs is inductive/supervised learning
- Learning decision trees is an important variant

Summary (cont.)
- Neural networks (NN) are inspired by human brains, and are complex nonlinear functions with many parameters learned from noisy data
- A perceptron is a feed-forward network with no hidden layers and can only represent linearly separable functions
- Multi-layer feed-forward NNs can represent arbitrary functions, and can be trained efficiently using the back-propagation algorithm