Artificial Intelligence COMP-424

Lecture notes Page 1 Artificial Intelligence COMP-424 Lecture notes by Alexandre Tomberg Prof. Joelle Pineau McGill University Winter 2009

Lecture notes Page 2 Table of Contents December-03-08 12:16 PM
I. History of AI
II. Search
   1. Uninformed Search Methods
   2. Informed Search
   3. Search for Optimization Problems
   4. Game Playing
   5. Constraint Satisfaction
III. Logic
   1. Knowledge Representation: Logic
   2. First Order Logic
   3. Planning
   4. Spatial Planning
IV. Probability
   1. Reasoning under Uncertainty
   2. Bayesian Networks
V. Machine Learning
   1. Machine Learning: Parameter Estimation
   2. Learning with Missing Values
   3. Supervised Learning
   4. Neural Nets
   5. Decision Trees
VI. Decision Theory
   1. Utility Theory
   2. Markov Decision Processes (MDPs)
   3. Reinforcement Learning

Lecture notes Page 3 History of AI January-06-09 10:03 AM

Lecture notes Page 4 Uninformed Search Methods January-08-09 10:06 AM

Lecture notes Page 6
Generic Search Algorithm:
Algorithm 1: BFS

Lecture notes Page 7
Algorithm 2: DFS
Algorithm 3: Depth limited search
Algorithm 4: Iterative Deepening

Lecture notes Page 8 Informed Search January-13-09 10:02 AM

Lecture notes Page 9 Algorithms January-13-09 10:34 AM
Algorithm 1: Best-First Search
Algorithm 2: Heuristic Search

Lecture notes Page 10
Algorithm 3: A* search

Lecture notes Page 12 Search for Optimization Problems January-15-09 10:05 AM

Lecture notes Page 13 Iterative Improvement Algorithms January-15-09 10:05 AM
Algorithm 1: Hill Climbing
Algorithm 2: Simulated Annealing

Lecture notes Page 15 Genetic Algorithms January-15-09 11:06 AM

Lecture notes Page 16 Game Playing January-20-09 10:03 AM

Lecture notes Page 17 Minimax Search January-20-09 10:07 AM

Lecture notes Page 18 α-β Pruning January-20-09 10:44 AM

Lecture notes Page 19 Constraint Satisfaction January-22-09 10:10 AM

Lecture notes Page 22 Knowledge Representation: Logic January-27-09 10:10 AM

Lecture notes Page 28 First Order Logic February-18-09 7:50 PM

Lecture notes Page 35 Planning February-03-09 10:11 AM

Lecture notes Page 39 Partial Order Planning Algorithm February-18-09 8:55 PM

Lecture notes Page 40 Least Commitment Analysis

Lecture notes Page 41 Spatial Planning February-03-09 10:32 AM

Lecture notes Page 45 Reasoning under Uncertainty February-18-09 9:13 PM If we know probabilities, what actions should we choose?

Lecture notes Page 51 Bayesian Networks March-19-09 3:26 PM

Lecture notes Page 53 Machine Learning: Parameter Estimation March-03-09 10:09 AM

Lecture notes Page 54 Statistical Parameter Fitting March-03-09 10:34 AM

Lecture notes Page 55 Maximum Likelihood Estimate (MLE) March-03-09 10:53 AM

Lecture notes Page 57 Learning with Missing Values March-10-09 10:14 AM

Lecture notes Page 58
Basic EM algorithm:
1. Start with an initial parameter setting.
2. Repeat:
   a. Expectation Step: Complete the data by assigning values to the missing items.
   b. Maximization Step: Compute the new parameters that maximize the log-likelihood of the completed data.
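
The EM loop above can be illustrated on a toy problem. The following is only a sketch under stated assumptions: the model (a single coin with unknown probability of heads), the function name `em_bernoulli`, and the data are hypothetical and do not come from the lecture. Missing flips are completed with their expected value, so this is a soft completion of the data.

```python
def em_bernoulli(observations, theta=0.5, iterations=50):
    """Estimate P(heads) from coin flips; None marks a missing flip.

    Hypothetical example: 1 = heads, 0 = tails, None = missing.
    """
    n = len(observations)
    observed_heads = sum(1 for x in observations if x == 1)
    n_missing = sum(1 for x in observations if x is None)
    for _ in range(iterations):
        # Expectation step: complete the data. Each missing flip is
        # counted as `theta` heads under the current parameter.
        expected_heads = observed_heads + n_missing * theta
        # Maximization step: maximum-likelihood estimate on the
        # completed data (expected fraction of heads).
        theta = expected_heads / n
    return theta


flips = [1, 0, 1, 1, None, 0, None, 1]
estimate = em_bernoulli(flips)
```

In this simple case the estimate settles on the fraction of heads among the observed flips, which is what one would expect when values are missing at random.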

Lecture notes Page 60
Soft EM for a general Bayes net:

Lecture notes Page 61 Machine Learning: Clustering March-19-09 4:21 PM

Lecture notes Page 63 Supervised Learning March-10-09 10:55 AM

Lecture notes Page 66 Overfitting April-14-09 8:35 PM

Lecture notes Page 68 Finding Parameters in General April-14-09 9:05 PM
Gradient Descent: Given w_0, for i = 0, 1, 2, ... do:
Repeat for as long as necessary.
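
The update formula did not survive in these notes, so the sketch below assumes the usual gradient descent step, w_{i+1} = w_i - α ∇f(w_i), with a constant step size α. The function name `gradient_descent`, the fixed number of steps, and the example objective f(w) = (w - 3)^2 are all hypothetical choices made for illustration.

```python
def gradient_descent(grad, w0, step_size=0.1, num_steps=200):
    """Minimize a one-dimensional function given its gradient `grad`.

    Assumed update rule: w <- w - step_size * grad(w).
    """
    w = w0
    for _ in range(num_steps):
        w = w - step_size * grad(w)
    return w


# Example objective f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w_min = gradient_descent(lambda w: 2.0 * (w - 3.0), w0=0.0)
```

A fixed number of steps stands in for "as long as necessary"; in practice one might instead stop when the gradient or the change in w becomes small.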

Lecture notes Page 69 Batch vs. Online Optimization April-14-09 9:38 PM

Lecture notes Page 70
What we should know:

Lecture notes Page 71 Neural Nets March-19-09 4:48 PM

Lecture notes Page 75 Feed Forward Neural Networks April-15-09 10:48 AM
Forward pass: for layer k = 1 ... K do:
1. Compute the output of all units in layer k.
2. Copy this output as the input to the next layer.
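
A minimal sketch of the forward pass, assuming sigmoid units and a network stored as a list of layers, where each unit is a hypothetical `(weights, bias)` pair. The representation and the names `forward_pass` and `network` are illustrative assumptions, not taken from the lecture.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward_pass(layers, inputs):
    """layers[k] is a list of units; each unit is (weights, bias)."""
    activations = list(inputs)
    for layer in layers:  # for layer k = 1 ... K
        # Compute the output of all units in layer k.
        outputs = [
            sigmoid(sum(w * a for w, a in zip(weights, activations)) + bias)
            for weights, bias in layer
        ]
        # Copy this output as the input to the next layer.
        activations = outputs
    return activations


# Two inputs, two hidden units, one output unit (arbitrary weights).
network = [
    [([0.5, -0.5], 0.0), ([1.0, 1.0], -1.0)],
    [([1.0, -1.0], 0.0)],
]
output = forward_pass(network, [1.0, 2.0])
```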

Lecture notes Page 77
Backpropagation algorithm:
1. Forward pass: compute the output of the network going from input layer to output layer.
2. Backward pass: compute the gradient of the error for every weight inside the network going from output layer towards the input layer.
3. Update: update the weights using the standard rule:
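
The three steps can be sketched for a small network. Everything specific here is an assumption made for illustration: one hidden layer of sigmoid units, a single sigmoid output, squared error E = 0.5 (y - o)^2, and the gradient descent update w <- w - α ∂E/∂w standing in for the update rule that is missing from these notes. The names `backprop_step` and `params` are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def backprop_step(params, x, y, alpha=0.5):
    """One backpropagation step on a single example (x, y)."""
    W1, b1, W2, b2 = params
    # 1. Forward pass: input layer -> hidden layer -> output.
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    o = sigmoid(sum(w * hj for w, hj in zip(W2, h)) + b2)
    # 2. Backward pass: error gradients, output first, then hidden.
    delta_o = (o - y) * o * (1.0 - o)
    delta_h = [delta_o * w * hj * (1.0 - hj) for w, hj in zip(W2, h)]
    # 3. Update: gradient descent on every weight and bias.
    new_W2 = [w - alpha * delta_o * hj for w, hj in zip(W2, h)]
    new_b2 = b2 - alpha * delta_o
    new_W1 = [[w - alpha * dj * xi for w, xi in zip(row, x)]
              for row, dj in zip(W1, delta_h)]
    new_b1 = [b - alpha * dj for b, dj in zip(b1, delta_h)]
    error = 0.5 * (y - o) ** 2
    return (new_W1, new_b1, new_W2, new_b2), error


# Arbitrary starting weights; train repeatedly on one example.
params = ([[0.1, -0.2], [0.3, 0.4]], [0.0, 0.0], [0.5, -0.5], 0.0)
errors = []
for _ in range(200):
    params, err = backprop_step(params, x=[1.0, 0.0], y=1.0)
    errors.append(err)
```

Note that the hidden-layer gradients are computed from the old output weights before any weight is changed, which is why the update is done last.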

Lecture notes Page 79 Overfitting in Neural Net April-15-09 12:56 PM

Lecture notes Page 80 Decision Trees April-15-09 1:04 PM

Lecture notes Page 85 Utility Theory April-15-09 1:54 PM

Lecture notes Page 86
Utility Models:

Lecture notes Page 87 Maximizing Expected Utility (MEU) Principle April-15-09 2:21 PM

Lecture notes Page 89
What we should know:

Lecture notes Page 90 Markov Decision Processes (MDPs) April-15-09 2:50 PM

Lecture notes Page 92 Policies April-15-09 2:50 PM

Lecture notes Page 94
Iterative Policy Evaluation Algorithm:
1. Start with some initial guess
2. During iteration k, update the value function for all states as follows:
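
The update equation is missing from these notes, so the sketch below assumes the standard backup for a deterministic policy π: V_{k+1}(s) = Σ_{s'} P(s' | s, π(s)) [r + γ V_k(s')]. The two-state MDP, its dictionary layout, and the name `evaluate_policy` are hypothetical and chosen only to keep the example small.

```python
# Hypothetical MDP: MDP[state][action] = [(probability, next_state, reward), ...]
MDP = {
    "A": {"stay": [(1.0, "A", 0.0)], "go": [(1.0, "B", 1.0)]},
    "B": {"stay": [(1.0, "B", 2.0)], "go": [(1.0, "A", 0.0)]},
}


def evaluate_policy(mdp, policy, gamma=0.5, iterations=100):
    # Step 1: start with some initial guess (all zeros here).
    V = {s: 0.0 for s in mdp}
    for _ in range(iterations):
        # Step 2: update every state from the previous iteration's values.
        V = {
            s: sum(p * (r + gamma * V[s2]) for p, s2, r in mdp[s][policy[s]])
            for s in mdp
        }
    return V


values = evaluate_policy(MDP, {"A": "go", "B": "stay"})
```

For this policy the values can be checked by hand: V(B) = 2 / (1 - 0.5) = 4 and V(A) = 1 + 0.5 * 4 = 3.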

Lecture notes Page 95 Searching for a Good Policy April-15-09 4:47 PM

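Lecture notes Page 96
Policy Iteration Algorithm:
1. Start with an initial policy.
2. Repeat until the policy no longer changes:
   a. Compute the value function of the current policy using the policy evaluation algorithm.
   b. Compute a new policy using the greedy policy update rule on that value function.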
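
A sketch of policy iteration on a small example. The symbols and stopping condition were lost from these notes, so this assumes the common formulation: evaluate the current policy, make it greedy with respect to the resulting value function, and stop when the policy is unchanged. The two-state MDP and all function names are hypothetical.

```python
# Hypothetical MDP: MDP[state][action] = [(probability, next_state, reward), ...]
MDP = {
    "A": {"stay": [(1.0, "A", 0.0)], "go": [(1.0, "B", 1.0)]},
    "B": {"stay": [(1.0, "B", 2.0)], "go": [(1.0, "A", 0.0)]},
}


def q_value(mdp, V, s, a, gamma):
    return sum(p * (r + gamma * V[s2]) for p, s2, r in mdp[s][a])


def evaluate_policy(mdp, policy, gamma, sweeps=200):
    # Iterative policy evaluation with a fixed number of sweeps.
    V = {s: 0.0 for s in mdp}
    for _ in range(sweeps):
        V = {s: q_value(mdp, V, s, policy[s], gamma) for s in mdp}
    return V


def greedy_policy(mdp, V, gamma):
    # Greedy update: in each state pick the action with the best backup.
    return {s: max(mdp[s], key=lambda a: q_value(mdp, V, s, a, gamma))
            for s in mdp}


def policy_iteration(mdp, policy, gamma=0.5):
    while True:
        V = evaluate_policy(mdp, policy, gamma)
        new_policy = greedy_policy(mdp, V, gamma)
        if new_policy == policy:  # policy is stable: stop
            return policy, V
        policy = new_policy


best_policy, best_values = policy_iteration(MDP, {"A": "stay", "B": "go"})
```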

Lecture notes Page 97
Value Iteration Algorithm:
1. Start with an initial value function estimate.
2. Repeat until convergence:
   a. Update the value function estimate using:
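
The update equation is not preserved in these notes. The sketch below assumes the standard Bellman optimality backup, V(s) <- max_a Σ_{s'} P(s' | s, a) [r + γ V(s')], and stops when the largest change in one sweep falls below a tolerance. The two-state MDP, the tolerance, and the name `value_iteration` are hypothetical.

```python
# Hypothetical MDP: MDP[state][action] = [(probability, next_state, reward), ...]
MDP = {
    "A": {"stay": [(1.0, "A", 0.0)], "go": [(1.0, "B", 1.0)]},
    "B": {"stay": [(1.0, "B", 2.0)], "go": [(1.0, "A", 0.0)]},
}


def value_iteration(mdp, gamma=0.5, tolerance=1e-9):
    V = {s: 0.0 for s in mdp}  # initial value function estimate
    while True:
        # Back up every state using the best action under the old estimate.
        new_V = {
            s: max(
                sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)
                for outcomes in mdp[s].values()
            )
            for s in mdp
        }
        change = max(abs(new_V[s] - V[s]) for s in mdp)
        V = new_V
        if change < tolerance:  # estimate has stopped changing
            return V


optimal_values = value_iteration(MDP)
```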

Lecture notes Page 100 Reinforcement Learning April-15-09 5:38 PM

Lecture notes Page 102
TD(0) Learning Algorithm:
1. Initialize the value function:
2. Repeat until feeling sick of it:
   a. Pick a start state
   b. Repeat for every time step t:
      i. Choose an action a based on current policy π and current state s
      ii. Take action a, observe reward r and new state s'
      iii. Compute TD error: δ = r + γ V(s') - V(s)
      iv. Update the value function: V(s) = V(s) + α_s δ
      v. Update current state: s = s'
      vi. If s' is a terminal state, GoTo 2.
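
The steps above can be sketched as follows. Several things are assumptions made for illustration: the value function is initialized to zero, a single constant learning rate `alpha` replaces the per-state rate, a fixed number of episodes replaces the open-ended outer loop, and the environment is a hypothetical deterministic chain 0 -> 1 -> 2 -> 3 with reward 1 per step and terminal state 3.

```python
def td0(step, policy, start_state, terminal_states,
        gamma=0.9, alpha=0.5, episodes=200):
    V = {}  # 1. value function; unseen states are treated as 0
    for _ in range(episodes):                 # 2. repeat
        s = start_state                       # a. pick a start state
        while s not in terminal_states:       # b. for every time step
            a = policy(s)                     # i. action from policy
            r, s_next = step(s, a)            # ii. observe r and s'
            # iii. TD error: delta = r + gamma * V(s') - V(s)
            delta = r + gamma * V.get(s_next, 0.0) - V.get(s, 0.0)
            # iv. update the value function
            V[s] = V.get(s, 0.0) + alpha * delta
            s = s_next                        # v. s = s'
            # vi. the loop condition ends the episode at a terminal state
    return V


def chain_step(state, action):
    # Hypothetical environment: always move right, reward 1 per step.
    return 1.0, state + 1


td_values = td0(chain_step, policy=lambda s: "right",
                start_state=0, terminal_states={3})
```

Because this environment is deterministic, the estimates can be checked by hand: V(2) = 1, V(1) = 1 + 0.9 * 1 = 1.9, and V(0) = 1 + 0.9 * 1.9 = 2.71.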

Lecture notes Page 103 Reinforcement Learning for Control April-15-09 6:35 PM
