Artificial Intelligence
COMP-424
Lecture notes by Alexandre Tomberg
Prof. Joelle Pineau
McGill University, Winter 2009
Table of Contents (December-03-08 12:16 PM)
I. History of AI
II. Search
   1. Uninformed Search Methods
   2. Informed Search
   3. Search for Optimization Problems
   4. Game Playing
   5. Constraint Satisfaction
III. Logic
   1. Knowledge Representation: Logic
   2. First Order Logic
   3. Planning
   4. Spatial Planning
IV. Probability
   1. Reasoning under Uncertainty
   2. Bayesian Networks
V. Machine Learning
   1. Machine Learning: Parameter Estimation
   2. Learning with Missing Values
   3. Supervised Learning
   4. Neural Nets
   5. Decision Trees
VI. Decision Theory
   1. Utility Theory
   2. Markov Decision Processes (MDPs)
   3. Reinforcement Learning
History of AI (January-06-09 10:03 AM)
Uninformed Search Methods (January-08-09 10:06 AM)
Generic Search Algorithm:
Algorithm 1: BFS
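The notes name BFS here without code; as a reference, here is a minimal Python sketch of breadth-first search. The explicit `neighbors` successor function and the path-queue representation are illustrative assumptions, not from the lecture.

```python
from collections import deque

def bfs(start, goal, neighbors):
    """Breadth-first search: expand nodes in order of increasing depth.

    `neighbors(n)` returns the successor states of n.
    Returns a shallowest path from start to goal, or None.
    """
    frontier = deque([[start]])          # FIFO queue of partial paths
    visited = {start}
    while frontier:
        path = frontier.popleft()        # shallowest unexpanded path first
        node = path[-1]
        if node == goal:
            return path
        for nxt in neighbors(node):
            if nxt not in visited:
                visited.add(nxt)
                frontier.append(path + [nxt])
    return None
```

Because the frontier is a FIFO queue, BFS finds a path with the fewest edges (optimal when all step costs are equal).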
Algorithm 2: DFS
Algorithm 3: Depth limited search
Algorithm 4: Iterative Deepening
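The depth-limited and iterative-deepening variants named above can be sketched together: iterative deepening simply re-runs depth-limited DFS with limits 0, 1, 2, ... The graph representation below is an illustrative assumption.

```python
def depth_limited_dfs(node, goal, neighbors, limit, path=None):
    """DFS that refuses to expand below depth `limit`."""
    path = (path or []) + [node]
    if node == goal:
        return path
    if limit == 0:
        return None                      # cut off: depth budget exhausted
    for nxt in neighbors(node):
        if nxt not in path:              # avoid cycles along the current path
            found = depth_limited_dfs(nxt, goal, neighbors, limit - 1, path)
            if found:
                return found
    return None

def iterative_deepening(start, goal, neighbors, max_depth=20):
    """Run depth-limited DFS with limits 0, 1, 2, ... up to max_depth."""
    for limit in range(max_depth + 1):
        found = depth_limited_dfs(start, goal, neighbors, limit)
        if found:
            return found
    return None
```

Iterative deepening keeps DFS's linear memory use while recovering BFS's guarantee of finding a shallowest solution.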
Informed Search (January-13-09 10:02 AM)
Algorithms (January-13-09 10:34 AM)
Algorithm #1: Best-First Search
Algorithm #2: Heuristic Search
Algorithm #3: A* search
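A minimal Python sketch of A*, which expands the node minimizing f(n) = g(n) + h(n); the `(successor, cost)` neighbor format is an illustrative assumption.

```python
import heapq

def a_star(start, goal, neighbors, h):
    """A* search.

    `neighbors(n)` yields (successor, step_cost) pairs; `h` is an
    admissible heuristic (it never overestimates the cost-to-goal).
    Returns (path, cost) or (None, inf).
    """
    frontier = [(h(start), 0, start, [start])]   # (f, g, node, path)
    best_g = {start: 0}
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path, g
        for nxt, cost in neighbors(node):
            g2 = g + cost
            if g2 < best_g.get(nxt, float('inf')):   # found a cheaper route
                best_g[nxt] = g2
                heapq.heappush(frontier, (g2 + h(nxt), g2, nxt, path + [nxt]))
    return None, float('inf')
```

With h ≡ 0 this degenerates to uniform-cost search; a better admissible h prunes more of the frontier while preserving optimality.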
Search for Optimization Problems (January-15-09 10:05 AM)
Iterative Improvement Algorithms (January-15-09 10:05 AM)
Algorithm #1: Hill Climbing
Algorithm #2: Simulated Annealing
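The two algorithms named above differ only in whether worse moves are ever accepted: hill climbing never accepts them, while simulated annealing accepts them with probability exp(-ΔE/T), cooling T over time. A sketch of the annealing variant (the objective, neighbor function, and cooling schedule are illustrative assumptions):

```python
import math
import random

def simulated_annealing(f, x0, neighbor, T0=10.0, cooling=0.95, steps=500, seed=0):
    """Minimize f by random local moves, occasionally accepting worse ones."""
    rng = random.Random(seed)
    x, fx = x0, f(x0)
    best, fbest = x, fx
    T = T0
    for _ in range(steps):
        x2 = neighbor(x, rng)
        fx2 = f(x2)
        delta = fx2 - fx
        # always accept improvements; accept worsenings with prob exp(-delta/T)
        if delta < 0 or rng.random() < math.exp(-delta / T):
            x, fx = x2, fx2
            if fx < fbest:
                best, fbest = x, fx
        T *= cooling                     # gradually lower the temperature
    return best, fbest
```

Setting T0 = 0 (so no uphill move is ever accepted) recovers stochastic hill climbing, which can get stuck in local optima.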
Genetic Algorithms (January-15-09 11:06 AM)
Game Playing (January-20-09 10:03 AM)
Minimax Search (January-20-09 10:07 AM)
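A minimal Python sketch of minimax on an explicit game tree: MAX picks the child of highest value, MIN the lowest. The dictionary-based tree and `evaluate` function are illustrative assumptions.

```python
def minimax(state, depth, maximizing, moves, evaluate):
    """Return the minimax value of `state`.

    `moves(state)` lists successor states; `evaluate` scores leaves.
    """
    succ = moves(state)
    if depth == 0 or not succ:
        return evaluate(state)           # leaf or depth cut-off
    if maximizing:
        return max(minimax(s, depth - 1, False, moves, evaluate) for s in succ)
    else:
        return min(minimax(s, depth - 1, True, moves, evaluate) for s in succ)
```

On the toy tree in the test below, MAX chooses the left branch: min(3, 5) = 3 beats min(2, 9) = 2.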
α-β Pruning (January-20-09 10:44 AM)
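α-β pruning computes the same value as plain minimax while skipping branches that cannot influence the root decision. A sketch (the same illustrative tree representation as above is assumed):

```python
def alphabeta(state, depth, alpha, beta, maximizing, moves, evaluate):
    """Minimax value of `state` with α-β pruning.

    α = best value MAX can guarantee so far; β = best for MIN.
    """
    succ = moves(state)
    if depth == 0 or not succ:
        return evaluate(state)
    if maximizing:
        value = float('-inf')
        for s in succ:
            value = max(value, alphabeta(s, depth - 1, alpha, beta, False,
                                         moves, evaluate))
            alpha = max(alpha, value)
            if alpha >= beta:
                break                    # β cut-off: MIN will never allow this line
        return value
    else:
        value = float('inf')
        for s in succ:
            value = min(value, alphabeta(s, depth - 1, alpha, beta, True,
                                         moves, evaluate))
            beta = min(beta, value)
            if alpha >= beta:
                break                    # α cut-off: MAX already has better
        return value
```

With perfect move ordering, pruning lets the search reach roughly twice the depth of plain minimax in the same time.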
Constraint Satisfaction (January-22-09 10:10 AM)
Knowledge Representation: Logic (January-27-09 10:10 AM)
First Order Logic (February-18-09 7:50 PM)
Planning (February-03-09 10:11 AM)
Partial Order Planning Algorithm (February-18-09 8:55 PM)
Least Commitment Analysis
Spatial Planning (February-03-09 10:32 AM)
Reasoning under Uncertainty (February-18-09 9:13 PM)
If we know the probabilities, what actions should we choose?
Bayesian Networks (March-19-09 3:26 PM)
Machine Learning: Parameter Estimation (March-03-09 10:09 AM)
Statistical Parameter Fitting (March-03-09 10:34 AM)
Maximum Likelihood Estimate (MLE) (March-03-09 10:53 AM)
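A worked MLE example (not from the notes): for n coin flips with h heads, the likelihood L(p) = p^h (1-p)^(n-h) is maximized at p̂ = h/n, which the second function below checks numerically via the log-likelihood.

```python
import math

def bernoulli_mle(flips):
    """MLE for a coin: L(p) = p^h * (1-p)^(n-h) is maximized at p = h/n."""
    return sum(flips) / len(flips)

def log_likelihood(p, flips):
    """log L(p) = sum over flips of log p (heads) or log(1-p) (tails)."""
    return sum(math.log(p if x else 1 - p) for x in flips)
```

For the data [1, 1, 1, 0], p̂ = 3/4 and its log-likelihood beats any other candidate p, e.g. 0.5 or 0.9.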
Learning with Missing Values (March-10-09 10:14 AM)
Basic EM algorithm:
Start with an initial parameter setting.
Repeat:
  Expectation Step: Complete the data by assigning values to missing items.
  Maximization Step: Compute the maximum log-likelihood and new parameters on the complete data.
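The two steps above can be sketched on a toy problem (not from the notes): coin flips where some outcomes are missing. The E-step fills each missing flip with its expected value under the current parameter, and the M-step re-estimates the parameter as the MLE on the completed data.

```python
def hard_em_coin(observed, n_missing, p0=0.5, iters=50):
    """EM for a coin with missing flips (toy illustration).

    E-step: complete each missing flip with its expected value p.
    M-step: re-estimate p as the mean of the completed data (the MLE).
    """
    p = p0
    n = len(observed) + n_missing
    for _ in range(iters):
        filled = sum(observed) + n_missing * p    # E-step: expected completion
        p = filled / n                            # M-step: MLE on completed data
    return p
```

In this toy case each iteration is a contraction toward the fixed point p* = mean of the observed flips, so EM converges quickly; in richer models (mixtures, Bayes nets) the same alternation climbs the log-likelihood but may stop at a local optimum.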
Soft EM for a general Bayes net:
Machine Learning: Clustering (March-19-09 4:21 PM)
Supervised Learning (March-10-09 10:55 AM)
Overfitting (April-14-09 8:35 PM)
Finding Parameters in General (April-14-09 9:05 PM)
Gradient Descent: Given w_0, for i = 0, 1, 2, ... do: w_{i+1} = w_i - α_i ∇E(w_i). Repeat until the parameters stop changing (or the gradient is sufficiently small).
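The update rule can be sketched directly; the quadratic objective in the usage below is an illustrative assumption, chosen so the minimum is known.

```python
def gradient_descent(grad, w0, alpha=0.1, iters=100):
    """Apply w_{i+1} = w_i - alpha * grad(w_i) for a fixed number of steps."""
    w = w0
    for _ in range(iters):
        w = w - alpha * grad(w)          # step downhill along the gradient
    return w
```

For E(w) = (w - 2)^2, the gradient is 2(w - 2), and each step multiplies the distance to the minimum w* = 2 by (1 - 2α), so with α = 0.1 the iterates converge geometrically to 2.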
Batch vs. Online Optimization (April-14-09 9:38 PM)
What we should know:
Neural Nets (March-19-09 4:48 PM)
Feed Forward Neural Networks (April-15-09 10:48 AM)
Forward pass: for layer k = 1 ... K do:
  Compute the output of all units in layer k.
  Copy this output as the input to the next layer.
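The loop above can be sketched with plain lists (no NumPy); the sigmoid activation and the weights-as-nested-lists representation are illustrative assumptions.

```python
import math

def forward_pass(x, weights, biases):
    """Feed-forward pass: for each layer, output = sigmoid(W . input + b).

    `weights[k]` is a list of rows (one per unit in layer k);
    `biases[k]` is the list of that layer's biases.
    """
    sigmoid = lambda z: 1.0 / (1.0 + math.exp(-z))
    activation = x
    for W, b in zip(weights, biases):            # layer k = 1 ... K
        # compute the output of all units in layer k ...
        activation = [sigmoid(sum(w * a for w, a in zip(row, activation)) + bk)
                      for row, bk in zip(W, b)]
        # ... which becomes the input to the next layer
    return activation
```

With all weights and biases zero, every unit outputs sigmoid(0) = 0.5, which gives a quick sanity check.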
Backpropagation algorithm:
1. Forward pass: compute the output of the network, going from the input layer to the output layer.
2. Backward pass: compute the gradient of the error for every weight in the network, going from the output layer towards the input layer.
3. Update: update the weights using the standard rule: w ← w - α ∂E/∂w.
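Steps 1 and 2 can be sketched on the smallest possible network, a single chain yhat = sigmoid(w2 · sigmoid(w1 · x)) with squared error E = ½(yhat - y)². This toy architecture is an illustrative assumption; the test checks the backward pass against finite differences.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def backprop_step(x, y, w1, w2):
    """One backprop pass for yhat = sigmoid(w2 * sigmoid(w1 * x)).

    Error E = 0.5 * (yhat - y)^2.  Returns (dE/dw1, dE/dw2).
    """
    # forward pass: input -> hidden -> output
    h = sigmoid(w1 * x)
    yhat = sigmoid(w2 * h)
    # backward pass: propagate the error gradient from output to input
    d_out = (yhat - y) * yhat * (1 - yhat)   # dE/d(output pre-activation)
    dw2 = d_out * h
    d_hid = d_out * w2 * h * (1 - h)         # dE/d(hidden pre-activation)
    dw1 = d_hid * x
    return dw1, dw2
```

Each backward step just applies the chain rule once more, reusing the quantities already computed for the layer above; that reuse is what makes backprop efficient.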
Overfitting in Neural Nets (April-15-09 12:56 PM)
Decision Trees (April-15-09 1:04 PM)
Utility Theory (April-15-09 1:54 PM)
Utility Models:
Maximizing Expected Utility (MEU) Principle (April-15-09 2:21 PM)
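The MEU principle says: choose the action a maximizing E[U | a] = Σ_o P(o | a) U(o). A minimal sketch; the umbrella scenario, outcome names, and utility numbers below are illustrative assumptions.

```python
def best_action(actions, prob, utility):
    """MEU: pick the action maximizing E[U] = sum_o P(o|a) * U(o).

    `prob[a]` is a distribution over outcomes given action a;
    `utility[o]` is the utility of outcome o.
    """
    def expected_utility(a):
        return sum(p * utility[o] for o, p in prob[a].items())
    return max(actions, key=expected_utility)
```

In the test, taking the umbrella guarantees utility 70, while leaving it yields 0.4·0 + 0.6·100 = 60, so MEU picks 'take'.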
What we should know:
Markov Decision Processes (MDPs) (April-15-09 2:50 PM)
Policies (April-15-09 2:50 PM)
Iterative Policy Evaluation Algorithm:
1. Start with some initial guess V_0.
2. During iteration k, update the value function for all states as follows:
   V_{k+1}(s) = Σ_a π(s,a) [ R(s,a) + γ Σ_{s'} T(s,a,s') V_k(s') ]
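The update above can be sketched with dictionaries; the tiny one-state MDP in the test (self-loop, reward 1, γ = 0.9, so V^π = 1/(1-γ) = 10) is an illustrative assumption.

```python
def policy_evaluation(states, actions, T, R, policy, gamma=0.9, iters=100):
    """Iterative policy evaluation.

    Repeats V(s) <- sum_a pi(s,a) [ R(s,a) + gamma * sum_s' T(s,a,s') V(s') ].
    T[s][a][s2] is the transition probability, R[s][a] the reward,
    policy[s][a] the probability of taking a in s.
    """
    V = {s: 0.0 for s in states}
    for _ in range(iters):
        V = {s: sum(policy[s][a] * (R[s][a] +
                                    gamma * sum(T[s][a][s2] * V[s2] for s2 in states))
                    for a in actions)
             for s in states}
    return V
```

Each sweep is a γ-contraction, so the error shrinks by a factor of γ per iteration and V converges to V^π.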
Searching for a Good Policy (April-15-09 4:47 PM)
Policy Iteration Algorithm:
Start with an initial policy π_0.
Repeat until the policy stops changing (π_{k+1} = π_k):
  Compute V^{π_k} using the policy evaluation algorithm.
  Compute π_{k+1} using the greedy policy update rule on V^{π_k}.
Value Iteration Algorithm:
Start with an initial value function V_0.
Repeat until convergence (max_s |V_{k+1}(s) - V_k(s)| < ε):
  Update the value function estimate using:
  V_{k+1}(s) = max_a [ R(s,a) + γ Σ_{s'} T(s,a,s') V_k(s') ]
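A sketch of the value iteration loop; the one-state, two-action MDP in the test (rewards 1.0 and 0.5, both self-loops, γ = 0.9, so V* = 1/(1-γ) = 10 by always taking the better action) is an illustrative assumption.

```python
def value_iteration(states, actions, T, R, gamma=0.9, eps=1e-6):
    """Value iteration: V(s) <- max_a [ R(s,a) + gamma * sum_s' T(s,a,s') V(s') ].

    Stops once the largest per-state change falls below eps.
    """
    V = {s: 0.0 for s in states}
    while True:
        V2 = {s: max(R[s][a] + gamma * sum(T[s][a][s2] * V[s2] for s2 in states)
                     for a in actions)
              for s in states}
        if max(abs(V2[s] - V[s]) for s in states) < eps:
            return V2
        V = V2
```

Unlike policy iteration, no explicit policy is maintained; a greedy policy can be read off the converged V at the end.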
Reinforcement Learning (April-15-09 5:38 PM)
TD(0) Learning Algorithm:
1. Initialize the value function (e.g., V(s) = 0 for all s).
2. Repeat until convergence (or until feeling sick of it):
   a. Pick a start state s.
   b. Repeat for every time step t:
      i. Choose an action a based on the current policy π and current state s.
      ii. Take action a; observe reward r and new state s'.
      iii. Compute the TD error: δ = r + γ V(s') - V(s).
      iv. Update the value function: V(s) = V(s) + α δ.
      v. Update the current state: s = s'.
      vi. If s' is a terminal state, go back to step 2.
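The inner loop above can be sketched over pre-recorded trajectories; the `(s, r, s')` transition format, with `None` marking a terminal successor, is an illustrative assumption.

```python
def td0(episodes, gamma=0.9, alpha=0.1):
    """TD(0) value learning over recorded trajectories.

    `episodes` is a list of trajectories, each a list of (s, r, s')
    transitions; s' = None marks a terminal step (value 0).
    """
    V = {}
    for episode in episodes:
        for s, r, s2 in episode:
            v_next = 0.0 if s2 is None else V.get(s2, 0.0)
            delta = r + gamma * v_next - V.get(s, 0.0)   # TD error
            V[s] = V.get(s, 0.0) + alpha * delta         # move V(s) toward target
    return V
```

Each update nudges V(s) a fraction α toward the bootstrapped target r + γV(s'), so repeated visits pull the estimate toward the true expected return.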
Reinforcement Learning for Control (April-15-09 6:35 PM)