Machine Learning 2D1431
Lecture 1: Introduction to Machine Learning

Question of the Day: What is the next symbol in this series?

Lecturer: Frank Hoffmann, hoffmann@nada.kth.se
Lab assistants: Mikael Huss, hussm@nada.kth.se
                Martin Rehn, rehn@nada.kth.se

Course Requirements

Four mandatory labs
- Location: Spelhallen, Sporthalle
- Lab 1: Thursday 14/11/02, 13-17
- Lab 2: Thursday 21/11/02, 13-17
- Lab 3: Thursday 28/11/02, 13-17
- Lab 4: Thursday 5/12/02, 13-17

Written exam
- Location: L21-22
- Date: 14/12/02, 8-13
Grading

Exam grade:
- U: 0-22 p
- 3: 23-28 p
- 4: 29-34 p
- 5: 35-40 p

Final grade:
- To pass the course you need at least grade 3 in the exam.
- Each lab presented on time earns 1.5 bonus points.
- Example: exam 25 points, 3 labs on time = 4.5 bonus points; total 29.5 points, final grade 4.

Labs

Preparation:
- Learn or refresh your knowledge of Matlab.
- Start at least 2 weeks before the lab.
- Read the lab instructions and the reference material.
- Complete the assignments, write the Matlab code, answer the questions.

Presentation:
- No more than two students per group.
- Both students need to understand the entire assignment and code.
- Book a time for the presentation.
- Present results and code to the teaching assistant.

Exam

- Theoretical questions and small practical exercises.
- Scope: it is not sufficient to just study the course book!
- Attend the lectures (lecture slides are available).
- Study the course book and read additional literature.
- Participate in the labs and complete the assignments.

Course Information

- Course webpage: http://www.nada.kth.se/kurser/kth/2d1431/02/index.html
- Course newsgroup: news:nada.kurser.mi
- Course directory: /info/mi02
- Course module: course join mi02
- Course registration in RES: res checkin mi02
- NADA UNIX account: http://www.sgr.nada.kth.se/
Course Literature

Textbook (required):
- Machine Learning, Tom M. Mitchell, McGraw-Hill, 1997. ISBN 0-07-115467-1 (paperback)

Additional literature:
- Reinforcement Learning: An Introduction, Richard S. Sutton and Andrew G. Barto, MIT Press, 1998. http://www-anw.cs.umass.edu/~rich/book/the-book.html
- Pattern Classification, 2nd edition, Richard O. Duda, Peter E. Hart, David G. Stork
- Neural Networks: A Comprehensive Foundation, 2nd edition, Simon Haykin, Prentice-Hall, 1999

Matlab

The labs in this course are based on Matlab, so learn or refresh your knowledge of it.
- Matlab Primer, Kermit Sigmon
- A Practical Introduction to Matlab, Mark S. Gockenbach
- Matlab at Google: http://directory.google.com/top/science/math/software/matlab

Course Overview

- Introduction to machine learning
- Concept learning
- Decision trees
- Artificial neural networks
- Evolutionary algorithms
- Instance-based learning
- Reinforcement learning
- Bayesian learning
- Computational learning theory
- Fuzzy logic
- Machine learning in robotics

Software Packages & Datasets

- Machine Learning at Google: http://directory.google.com/top/computers/artificial_Intelligence/Machine_Learning
- Matlab Toolbox for Pattern Recognition: http://www.ph.tn.tudelft.nl/~bob/prtools.html
- MIT GALIB in C++: http://lancet.mit.edu/ga
- Machine Learning Data Repository, UC Irvine: http://www.ics.uci.edu/~mlearn/ml/repository.html
Learning & Adaptation

- "Modification of a behavioral tendency by expertise." (Webster 1984)
- "A learning machine, broadly defined, is any device whose actions are influenced by past experiences." (Nilsson 1965)
- "Any change in a system that allows it to perform better the second time on repetition of the same task or on another task drawn from the same population." (Simon 1983)
- "An improvement in information processing ability that results from information processing activity." (Tanimoto 1990)

Definition: A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.

Disciplines Relevant to ML

- Artificial intelligence
- Bayesian methods
- Control theory
- Information theory
- Computational complexity theory
- Philosophy
- Psychology and neurobiology
- Statistics

Applications of ML

- Learning to recognize spoken words: SPHINX (Lee 1989)
- Learning to drive an autonomous vehicle: ALVINN (Pomerleau 1989)
- Learning to classify celestial objects (Fayyad et al. 1995)
- Learning to play world-class backgammon: TD-GAMMON (Tesauro 1992)
- Designing the morphology and control structure of electro-mechanical artefacts: GOLEM (Lipson, Pollack 2000)
ALVINN

- Automated driving at 70 mph on a public highway
- Input: 30x32 pixels from a camera image
- 4 hidden units, with 30x32 weights into each hidden unit
- 30 output units for steering

Artificial Life: GOLEM Project (Nature: Lipson, Pollack 2000)
http://demo.cs.brandeis.edu/golem

- Evolve simple electromechanical locomotion machines from basic building blocks (bars, actuators, artificial neurons) in a simulation of the physical world (gravity, friction).
- The individuals that demonstrate the best locomotion ability are fabricated through rapid prototyping technology.

Movies: Evolvable Robot, Golem
Evolved Creatures

- Evolved creatures: Sims (1994), http://genarts.com/karl/evolved-virtual-creatures.html
- Darwinian evolution of virtual block creatures for swimming, jumping, following, and competing for a block

Learning

Learning problems:
- Learning with a teacher
- Learning with a critic
- Unsupervised learning

Learning tasks:
- Pattern association
- Pattern recognition (classification)
- Function approximation
- Control
- Filtering

Credit Assignment Problem

The problem of assigning credit or blame for the overall outcomes to each of the internal decisions made by the learning machine which contributed to these outcomes.
- Temporal credit assignment problem: involves the instants of time when the actions that deserve credit were taken.
- Structural credit assignment problem: involves assigning credit to the internal structures of actions generated by the system.

Learning with a Teacher

- Supervised learning
- Knowledge represented by a set of input-output examples (x_i, y_i)
- Minimize the error between the actual response of the learner and the desired response

[Diagram: the environment presents a state x to both the teacher and the learning system; the error signal is the difference (Σ) between the teacher's desired response (+) and the learner's actual response (-).]
Learning with a Critic

- Learning through interaction with the environment
- Exploration of states and actions
- Feedback through a delayed primary reinforcement signal (temporal credit assignment problem)
- Goal: maximize accumulated future reinforcements

[Diagram: the environment sends states to the critic and the learning system; the critic converts the primary reinforcement signal into a heuristic reinforcement signal for the learning system, whose actions feed back into the environment.]

Unsupervised Learning

- Self-organized learning
- No teacher or critic
- Task-independent quality measure
- Identify regularities in the data and discover classes automatically
- Competitive learning

[Diagram: the environment presents states directly to the learning system.]

Pattern Recognition

A pattern/signal is assigned to one of a prescribed number of classes/categories.
Example classes: rice, raisins, soup, sugar, fanta, teabox.

Object Recognition

- Goal: recognize objects in the image
- Input: cropped raw RGB image
- Decision: contains object - yes/no
- Training examples: images of the object in different poses and different backgrounds
- Possible features: raw image data, color histograms, spatial filters, edge and corner detection
Function Approximation

The goal is to approximate an unknown function d = f(x) such that the mapping F(x) realized by the learning system is close enough to f(x):

    |F(x) - f(x)| < ε  for all x

System identification and modeling: describe the input-output relationship of an unknown time-invariant multiple-input multiple-output system.

Pose Estimation from Images

- Goal: estimate the pose (orientation, position) of an object from its appearance
- Input: image data
- Output: 3-D pose (x, y, z, θ, ϕ, ψ)
- Training examples: pairs of images with known object pose

Control Learning

Adjust the parameters of a controller such that the closed-loop control system demonstrates a desired behaviour.

[Diagram: unity-feedback loop; the reference signal minus the plant output (Σ) gives the error signal, which the controller turns into the plant input.]

Control Learning

Learning to choose actions:
- A robot learns navigation and obstacle avoidance
- Learning to choose actions to optimize a factory output
- Learning to play backgammon

Problem characteristics:
- Delayed reward instead of immediate reward for good or bad actions (temporal credit assignment problem)
- No supervised learning (no training examples in the form of correct state-action pairs)
- Learning with a critic
- Need for active exploration
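The approximation criterion |F(x) - f(x)| < ε can be illustrated with a minimal Python sketch; the target function (sin) and the piecewise-linear learner below are illustrative choices, not from the course:

```python
import math

def f(x):
    # The "unknown" target function (known here only for illustration).
    return math.sin(x)

# Learned mapping F: linear interpolation between sampled points of f.
xs = [i * 0.1 for i in range(64)]        # sample grid on [0, 6.3]
ys = [f(x) for x in xs]

def F(x):
    i = min(int(x / 0.1), len(xs) - 2)   # interval containing x
    t = (x - xs[i]) / 0.1
    return (1 - t) * ys[i] + t * ys[i + 1]

# Check |F(x) - f(x)| < eps on a finer grid.
eps = 0.01
worst = max(abs(F(x) - f(x)) for x in [i * 0.01 for i in range(600)])
print(worst < eps)  # True
```

With a sample spacing of 0.1, the interpolation error of sin is bounded by roughly h²/8 ≈ 0.00125, comfortably inside the chosen ε.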
Learning to Play Backgammon

- State: board state
- Actions: possible moves
- Reward function: +100 win, -100 lose, 0 for all other actions/states
- Trained by playing 1.5 million games against itself
- Now approximately equal to the best human player
- Link: http://www.research.ibm.com/massive/tdl.html
- Reading assignment: Tesauro (1995)

Reinforcement Learning

[Diagram: the agent receives state s_t and reward r_t from the environment and responds with action a_t, producing r_{t+1} and s_{t+1}; the trajectory unfolds as s_0 a_0 r_1 s_1 a_1 r_2 s_2 a_2 r_3 s_3 ...]

Goal: learn a policy a = π(s) which maximizes the future accumulated reward

    R = r_t + γ r_{t+1} + γ^2 r_{t+2} + ... = Σ_{i=0..∞} γ^i r_{t+i}

Upswing of an Inverted Pendulum

- State s: angle ϕ, angular velocity ω
- Control actions a: left, right, brake
- Reward r: +1000; penalty r: -1000
- Movie: upswing_1.mov
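The accumulated reward R defined above can be computed directly; a small sketch (the reward sequence and γ are made-up numbers):

```python
# Discounted return R = sum_i gamma^i * r_{t+i} for a finite episode,
# accumulated backwards: R = r_t + gamma * (r_{t+1} + gamma * (...)).
def discounted_return(rewards, gamma):
    R = 0.0
    for r in reversed(rewards):
        R = r + gamma * R
    return R

print(discounted_return([0, 0, 100], 0.9))  # 0 + 0.9*0 + 0.81*100 = 81.0
```

The backward accumulation avoids computing powers of γ explicitly and is how returns are typically evaluated over episodes.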
Learning Problem: Learning to Play Checkers

Learning = improving with experience at some task:
- improve over task T
- with respect to performance measure P
- based on experience E

Example: learn to play checkers:
- T: play checkers
- P: percentage of games won in a tournament
- E: opportunity to play against itself

Open questions:
- What experience?
- What exactly should be learned?
- How shall it be represented?
- What specific algorithm should be used to learn it?

Type of Training Experience

- Direct or indirect?
  - Direct: board state -> correct move
  - Indirect: outcome of a complete game (credit assignment problem)
- Teacher or not?
  - Teacher selects board states
  - Learner can select board states
- Is the training experience representative of the performance goal?
  - Training: playing against itself
  - Performance evaluated: playing against the world champion

Choose Target Function

- ChooseMove: B -> M (board state -> move)
  - Maps a legal board state to a legal move.
- Evaluate: B -> V (board state -> board value)
  - Assigns a numerical score to any given board state, such that better board states obtain a higher score.
  - Select the best move by evaluating all successor states of legal moves and picking the one with the maximal score.
Definition of the Target Function

- If b is a final board state that is won, then V(b) = 100
- If b is a final board state that is lost, then V(b) = -100
- If b is a final board state that is drawn, then V(b) = 0
- If b is not a final board state, then V(b) = V(b'), where b' is the best final board state that can be achieved starting from b and playing optimally until the end of the game

This gives correct values but is not operational.

State Space Search

- Own move: V(b) = max_i V(b_i) over the successors b_1, b_2, b_3 of the legal moves m_1: b -> b_1, m_2: b -> b_2, m_3: b -> b_3
- Opponent's move: V(b_1) = min_i V(b_i) over the successors b_4, b_5, b_6 of the moves m_4: b_1 -> b_4, m_5: b_1 -> b_5, m_6: b_1 -> b_6

Final Board States

- Black wins: V(b) = -100
- Red wins: V(b) = 100
- Draw: V(b) = 0
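The alternating max/min evaluation can be sketched as a small recursive search; the toy game tree below is hypothetical, with leaves holding final-board values +100/-100/0:

```python
# Minimax evaluation of the recursive definition above: V(b) is the max
# over our moves and the min over the opponent's replies, bottoming out
# at final board states (represented here as plain numbers).
def V(board, our_move=True):
    if isinstance(board, (int, float)):   # final board state
        return board
    values = [V(child, not our_move) for child in board]
    return max(values) if our_move else min(values)

# Two of our moves; the opponent then picks the reply worst for us.
tree = [[100, -100], [0, 0]]
print(V(tree))  # max(min(100, -100), min(0, 0)) = 0
```

This is exactly why the recursive definition is "not operational" for checkers: the search would have to expand the game tree all the way to final states.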
Number of Board States

- Tic-tac-toe: #board states < 9!/(5! 4!) + 9!/(1! 4! 4!) + ... + 9!/(2! 4! 3!) + 9 = 6045
- 4x4 checkers (no queens, 2 pieces each): #board states < 8*7*6*5 * 2^2/(2! * 2!) = 1680
- Regular checkers (8x8 board, 8 pieces each): #board states < 32! * 2^16/(8! * 8! * 16!) = 5.07*10^17

Representation of the Target Function

Options:
- table look-up
- collection of rules
- neural networks
- polynomial function of board features

Trade-off in choosing an expressive representation:
- approximation accuracy
- number of training examples required to learn the target function

Linear evaluation function:

    V(b) = ω0 + ω1*bp(b) + ω2*rp(b) + ω3*bk(b) + ω4*rk(b) + ω5*bt(b) + ω6*rt(b)

- bp(b): number of black pieces
- rp(b): number of red pieces
- bk(b): number of black kings
- rk(b): number of red kings
- bt(b): number of red pieces threatened by black
- rt(b): number of black pieces threatened by red

Obtaining Training Examples

- V(b): true target function
- V̂(b): learned target function
- V_train(b): training value

Rule for estimating training values:

    V_train(b) <- V̂(Successor(b))
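The checkers board-state bounds above are easy to verify numerically; a quick check (not part of the course material):

```python
from math import comb, factorial

# Bound from the slide: place 8 black and 8 red pieces on the 32 dark
# squares (32!/(8! 8! 16!)) and let each of the 16 pieces independently
# be a king or not (factor 2^16).
placements = factorial(32) // (factorial(8) * factorial(8) * factorial(16))
states = placements * 2**16

# Same placement count via binomial coefficients:
# choose the black squares first, then the red ones.
assert placements == comb(32, 8) * comb(24, 8)
print(f"{states:.3g}")  # about 5.07e+17

# The 4x4-checkers bound likewise evaluates to 1680:
print(8 * 7 * 6 * 5 * 2**2 // (2 * 2))  # 1680
```

These are upper bounds: they count piece placements, not positions actually reachable by legal play.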
Choose Weight Training Rule

LMS weight update rule:
1. Select a training example b at random.
2. Compute error(b) = V_train(b) - V̂(b).
3. For each board feature f_i, update the weight: ω_i <- ω_i + η * f_i * error(b), where η ≈ 0.1 is the learning rate.

Example: 4x4 checkers

    V(b) = ω0 + ω1*rp(b) + ω2*bp(b)

Initial weights: ω0 = -10, ω1 = 75, ω2 = -60.

Moves m_1: b_0 -> b_1, m_2: b_0 -> b_2, m_3: b_0 -> b_3:

    V̂(b_0) = ω0 + ω1*2 + ω2*2 = 20
    V̂(b_1) = V̂(b_2) = V̂(b_3) = 20

1. Compute error(b_0) = V_train(b_0) - V̂(b_0) = V̂(b_1) - V̂(b_0) = 0.
2. Since the error is zero, the weights are unchanged:
   ω0 <- ω0 + 0.1 * 1 * 0
   ω1 <- ω1 + 0.1 * 2 * 0
   ω2 <- ω2 + 0.1 * 2 * 0
Example: 4x4 checkers (continued)

The opponent moves from b_3: the possible successors have values V̂(b_4a) = 20 and V̂(b_4b) = -55, and the opponent chooses the capture, so V̂(b_4) = -55, while V̂(b_3) = 20.

1. Compute error(b_3) = V_train(b_3) - V̂(b_3) = V̂(b_4) - V̂(b_3) = -75.
2. With ω0 = -10, ω1 = 75, ω2 = -60 and features f = (1, 2, 2), update ω_i <- ω_i + η f_i error(b):
   ω0 <- ω0 + 0.1 * 1 * (-75) = -17.5
   ω1 <- ω1 + 0.1 * 2 * (-75) = 60
   ω2 <- ω2 + 0.1 * 2 * (-75) = -75

With the new weights ω0 = -17.5, ω1 = 60, ω2 = -75: V̂(b_4) = V̂(b_5) = -107.5 and V̂(b_6) = -167.5.

1. Compute error(b_5) = V_train(b_5) - V̂(b_5) = V̂(b_6) - V̂(b_5) = -60.
2. With features f = (1, 1, 2), update ω_i <- ω_i + η f_i error(b):
   ω0 <- ω0 + 0.1 * 1 * (-60) = -23.5
   ω1 <- ω1 + 0.1 * 1 * (-60) = 54
   ω2 <- ω2 + 0.1 * 2 * (-60) = -87
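The two weight updates of this worked example can be reproduced with a short script (a sketch assuming the feature vector f = (1, rp, bp), as in the 4x4 example):

```python
# LMS rule from the slides: w_i <- w_i + eta * f_i * error(b), with the
# linear evaluation V(b) = w0 + w1*rp(b) + w2*bp(b).
def evaluate(w, rp, bp):
    return w[0] + w[1] * rp + w[2] * bp

def lms_update(w, rp, bp, v_train, eta=0.1):
    error = v_train - evaluate(w, rp, bp)
    return [wi + eta * fi * error for wi, fi in zip(w, (1, rp, bp))]

w = [-10, 75, -60]                            # initial weights
w = lms_update(w, rp=2, bp=2, v_train=-55)    # error(b3) = -55 - 20 = -75
print(w)  # [-17.5, 60.0, -75.0]
w = lms_update(w, rp=1, bp=2, v_train=-167.5) # error(b5) = -60
print(w)  # [-23.5, 54.0, -87.0]
```

Each update moves the weights a fraction η of the error in the direction of the features, so repeated small corrections gradually align V̂ with the training values.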
Example: 4x4 checkers (final step)

Final board state b_6: black won, so V_f(b_6) = -100. With ω0 = -23.5, ω1 = 54, ω2 = -87: V̂(b_6) = -197.5.

1. Compute error(b_6) = V_train(b_6) - V̂(b_6) = V_f(b_6) - V̂(b_6) = 97.5.
2. With features f = (1, 0, 2), update ω_i <- ω_i + η f_i error(b):
   ω0 <- ω0 + 0.1 * 1 * 97.5 = -13.75
   ω1 <- ω1 + 0.1 * 0 * 97.5 = 54
   ω2 <- ω2 + 0.1 * 2 * 97.5 = -67.5

Evolution of the Value Function

[Figure: the learned value function on the training data, before and after training.]

Design Choices

- Determine the type of training experience: games against experts, games against self, or a table of correct moves
- Determine the target function: Board -> Move or Board -> Value
- Determine the representation of the learned function: polynomial, linear function of six features, or artificial neural network
- Determine the learning algorithm: gradient descent or linear programming

Learning Problem Examples: Credit Card Applications

- Task T: distinguish good applicants from risky applicants.
- Performance measure P: ?
- Experience E: ? (direct/indirect)
- Target function: ?
Performance Measure P

Error-based: minimize the percentage of incorrectly classified customers:

    P = (N_fp + N_fn) / N

- N_fp: number of false positives (rejected good customers)
- N_fn: number of false negatives (accepted bad customers)

Utility-based: maximize the expected profit of the credit card business:

    P = N_cp * U_cp + N_fn * U_fn

- U_cp: expected utility of an accepted good customer
- U_fn: expected utility/loss of an accepted bad customer

Experience E

- Direct: decisions on credit card applications made by a human financial expert.
  Training data: <customer information, reject/accept>
- Direct: actual customer behavior of previously accepted customers.
  Training data: <customer information, good/bad>
  Problem: the distribution of applicants P_applicant is not identical to the training-data distribution P_train.
- Indirect: evaluate a decision policy based on the profit made over the past N years.

[Figure: distributions of good and bad customers, for all applicants (Cw=38) and for accepted customers only (Cw=43). Assuming we want to minimize the classification error: what is the optimal decision boundary in each case?]
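The two performance measures above can be evaluated on a hypothetical confusion count; all numbers below are made up for illustration:

```python
# Error-based vs utility-based performance measures from the slide.
N = 1000        # applicants (hypothetical)
N_fp = 30       # false positives: rejected good customers
N_fn = 20       # false negatives: accepted bad customers

P_error = (N_fp + N_fn) / N
print(P_error)  # 0.05

# Utility-based: expected profit, with assumed utilities per customer.
N_cp, U_cp = 700, 100   # accepted good customers, profit each
U_fn = -500             # loss per accepted bad customer
P_utility = N_cp * U_cp + N_fn * U_fn
print(P_utility)  # 60000
```

Note how the two measures can disagree: a classifier with a slightly higher error rate can still yield a higher expected profit if it avoids the costly false negatives.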
Target Function

Customer records:

    income   owns house   credit history   age   employed     accept
    $40000   yes          good             38    full-time    yes
    $25000   no           excellent        25    part-time    no
    $50000   no           poor             55    unemployed   no

Candidate target functions:
- T: customer data -> accept/reject
- T: customer data -> probability of being a good customer
- T: customer data -> expected utility/profit

Learning Methods

- Decision rules: if income < $30000 then reject
- Bayesian network: P(good | income, credit history, ...)
- Neural network
- Nearest neighbor: take the same decision as for the customer in the database that is most similar to the applicant

Learning Problem Examples: Obstacle Avoidance Behavior of a Mobile Robot

- Task T: navigate the robot safely through an environment.
- Performance measure P: ?
- Experience E: ?
- Target function: ?

Performance Measure P

- Maximize the time until collision with an obstacle
- Maximize the distance travelled until collision with an obstacle
- Minimize rotational velocity, maximize translational velocity
- Minimize the error between the control action of a human operator and of the robot controller in the same situation
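Returning to the credit-card example, the nearest-neighbor rule described above ("take the same decision as for the most similar customer") can be sketched in a few lines; the numeric encoding of the records and the distance weights are assumptions for illustration, not from the course:

```python
# 1-nearest-neighbor credit decision over the three example records.
customers = [
    # (income, owns_house, credit_history, age, employed) -> accept?
    ((40000, 1, 2, 38, 2), True),    # good history, full-time
    ((25000, 0, 3, 25, 1), False),   # excellent history, part-time
    ((50000, 0, 0, 55, 0), False),   # poor history, unemployed
]

def distance(a, b):
    # Scale income and age down so they do not dominate the other features
    # (the scaling factors are an ad-hoc choice).
    scales = (1e-4, 1, 1, 0.1, 1)
    return sum((s * (x - y)) ** 2 for s, x, y in zip(scales, a, b)) ** 0.5

def predict(applicant):
    _, decision = min(customers, key=lambda c: distance(c[0], applicant))
    return decision

print(predict((42000, 1, 2, 40, 2)))  # closest to the first record: True
```

The feature scaling matters: without it, the income difference alone would decide every comparison, which is why instance-based methods normally normalize their features.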
Training Experience E

- Direct: monitor a human operator and use her control actions as training data: E = {<perception_i, action_i>}
- Indirect: operate the robot in the real world or in a simulation; reward desirable states, penalize undesirable states:
  - V(s) = +1 if v > 0.5 m/s
  - V(s) = +2 if ω < 10 deg/s
  - V(s) = -100 if bumper state = 1
- Question: internal or external reward?

Target Function

- Choose an action: A: perception -> action; sonar readings s1(t) ... sn(t) -> <v, ω>
- Evaluate a perception/state: V: s1(t) ... sn(t) -> V(s1(t) ... sn(t))
  - Problem: the states are only partially observable, therefore the world seems non-deterministic.
  - Markov decision process: the successor state s(t+1) is a probabilistic function of the current state s(t) and action a(t).
- Evaluate state/action pairs: V: s1(t) ... sn(t), a(t) -> V(s1(t) ... sn(t), a(t))

Learning Methods

- Neural networks: require direct training experience
- Reinforcement learning: indirect training experience
- Evolutionary algorithms: indirect training experience

Issues in Machine Learning

- What algorithms can approximate functions well, and when?
- How does the number of training examples influence accuracy?
- How does the complexity of the hypothesis representation influence accuracy?
- How does noisy data influence accuracy?
- What are the theoretical limits of learnability?