Learning and adaptive behavior in autonomous robots and Multi-robot applications


Learning and adaptive behavior in autonomous robots and Multi-robot applications 2008-03-07 Lecture 14

Literature for this lecture: Wahde, M., An introduction to adaptive algorithms and intelligent machines, pp. 89-94 (distributed in the lecture). Additional reading: Scherffig, L. (2002): Reinforcement learning in motor control. http://www-lehre.inf.uos.de/~lscherff/bachelor/rlimc.pdf ; Labella, T.H., Dorigo, M., Deneubourg, J.-L. (2006): Division of Labour in a Group of Robots Inspired by Ants' Foraging Behaviour. http://www.swarm-bots.org/index.php?main=2

Part I: Learning and adaptive behavior in autonomous robots Characteristic of autonomous robots: self-development and learning through interaction with the environment. Algorithms for a robot's "mental development": reinforcement learning, Q-learning.

Learning Supervised learning: teaching through examples. States of the environment: s. Available actions: a. Set of training examples: {s, a} pairs. Unsupervised learning: biological organisms learn by trial and error. Unknown situation: try some action and observe the resulting state of the environment.

RL motivation Thorndike, 1911: Law of effect: Behaviors in animals which lead to reward are strengthened Behaviors that result in punishment or discomfort are weakened The amount of strengthening or weakening is proportional to the amount of reward or punishment

Reinforcement learning Reinforcement learning is an intermediate method, between unsupervised and supervised learning: the agent's action a in a given state s gives rise to a reinforcement signal r. Thus, during reinforcement learning the information given by the triplet {s, a, r} must be available to the agent.

Reinforcement learning The agent seeks to learn an association between situations (states) and the actions to be taken in those situations: the agent's goal is to maximize the cumulative reward.

Reinforcement learning Example: A rat moving around in a maze If it finds food, it receives a positive reinforcement If it takes a wrong turn, a punishment is received:

Q-learning Basic version of reinforcement learning: the set of states {s_i} and the set of actions (for each state) {a_i} are finite. Consider an agent (robot) which is embedded in an environment: the agent determines the current state by taking measurements of the environment; by taking actions, it can modify the state. States: {s_i}. Actions: {a_i}.

Q-learning The agent receives a reward r for each action taken. Objective: to find a method (policy), P, that maximizes the total cumulative reward. Rewards obtained in the future are considered less important than immediate rewards: thus, a discount factor δ < 1 is introduced.
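The equation on this slide is not reproduced in the transcript; a standard reconstruction of the discounted cumulative reward described above (the exact notation on the slide may differ) is

R^P(s(t)) = \sum_{k=0}^{\infty} \delta^k \, r(t+k),

where the actions a(t), a(t+1), ... are chosen according to the policy P.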

Q-learning An optimal policy P_opt(s): a policy which maximizes R^P(s(t)) for all states s. A quality function Q(s,a) is introduced. Q(s,a): the sum of the immediate reward obtained when performing action a(t) and the value R^{P_opt} obtained by acting according to the optimal policy thereafter.
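Again, the equation itself is missing from the transcript; written out, the definition described above corresponds to (a reconstruction, in the same notation as above)

Q(s(t), a(t)) = r(s(t), a(t)) + \delta \, R^{P_{opt}}(s(t+1)),

where s(t+1) is the state reached after performing a(t) in s(t).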

Q-learning The task of maximizing the cumulative reward can now be reduced to the task of maximizing Q. However, only the immediate reward r(t) can be computed directly; computation of the second term would require knowledge of the optimal policy...

Q-learning A recursive equation for Q can now be obtained, and from it an iterative learning method for Q which uses the present estimate Q̃ of Q.
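The two equations on this slide are missing from the transcript. Under the standard Q-learning formulation that the surrounding text describes, they would read (a reconstruction, not a verbatim copy of the slide):

Q(s(t), a(t)) = r(t) + \delta \max_{a} Q(s(t+1), a)   (recursive equation for Q)

\tilde{Q}(s(t), a(t)) \leftarrow r(t) + \delta \max_{a} \tilde{Q}(s(t+1), a)   (iterative update using the current estimate)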

Obtaining Q̃: 1. The elements of the matrix Q̃(s,a) are set to zero. 2. The state s(t) is sensed, and an action a(t) is taken: with probability p, the action that maximizes Q̃(s(t),a) is taken (exploitation); with probability 1-p, a random action is taken (exploration). 3. When the new state has been reached, the estimate of Q is updated according to the iterative update above.
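A minimal code sketch of these three steps is given below. It is not part of the original lecture; the function names and parameters (step, reward, episodes, p_exploit) are placeholders chosen for illustration, and the update is the basic version without a learning rate.

import random

# Tabular Q-learning sketch following steps 1-3 above.
# step(s, a) -> next state and reward(s, a, s_next) -> immediate reward
# are placeholders for the environment (e.g. the grid-world example below).
def q_learning(n_states, n_actions, step, reward,
               episodes=1000, p_exploit=0.8, delta=0.9):
    # Step 1: all elements of the Q-matrix are set to zero.
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = random.randrange(n_states)            # arbitrary start state
        for _ in range(100):                      # cap on episode length
            # Step 2: exploit with probability p, explore with probability 1-p.
            if random.random() < p_exploit:
                a = max(range(n_actions), key=lambda x: Q[s][x])
            else:
                a = random.randrange(n_actions)
            s_next = step(s, a)                   # sense the new state
            r = reward(s, a, s_next)              # immediate reward
            # Step 3: update the estimate of Q (basic version, no learning rate).
            Q[s][a] = r + delta * max(Q[s_next])
            s = s_next
    return Q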

Convergence It can be shown that the iteration defined above causes the estimate Q̃ to converge to Q. When the learning process has been completed, Q(s,a) gives the optimal action a to be taken in any state s (namely the action associated with the highest Q-value).

Q-learning Learning is a trade-off between exploitation and exploration: if the action that is perceived as being optimal is always chosen (greedy policy), other actions cannot be discovered. If an extreme exploration policy is used, not much reward will be obtained...

Modified Q-learning A modified version of the learning algorithm introduces a learning rate parameter η (0 < η < 1): the smaller the value of η, the smaller the incremental modification of Q̃.
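The modified update rule itself is missing from the transcript; the standard learning-rate form consistent with the description above is (a reconstruction)

\tilde{Q}(s(t), a(t)) \leftarrow (1 - \eta)\,\tilde{Q}(s(t), a(t)) + \eta \left[ r(t) + \delta \max_{a} \tilde{Q}(s(t+1), a) \right]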

Q-learning (example) Consider a robot moving on the discrete grid shown in the figure: Immediate rewards: +10 if the goal is reached, -10 if an attempt is made to enter the blocked square.

Q-learning (example) Initially, all Q̃-values are zero. The robot moves at random until the target T is reached or the robot tries to enter the blocked square. The robot started at state s=3, and the training episode was completed when state s=13 was reached, by moving to the right from state 12. The Q̃-value of the previous state (state 12, action right) is then updated accordingly. No other modifications of Q̃ occur during this episode.
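The update shown on the slide is missing here; with the reconstructed update rule above, the reward of +10 for reaching the goal, and all Q̃-values still zero, it would evaluate to (assuming the basic version without a learning rate)

\tilde{Q}(12, \mathrm{right}) \leftarrow r + \delta \max_{a} \tilde{Q}(13, a) = 10 + 0.9 \cdot 0 = 10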

Q-learning (example) Consider Q(1,up): the immediate reward is -10. The optimal path is then (in 5 steps): 1 -> 1 -> 2 -> 5 -> 9 -> 13. Therefore: Q(1,up) = -10 + 0.9^4 · 10 = -3.4390 (in the example, δ = 0.9 was used).

Q-learning (example) This simple kind of reinforcement learning can be generalized to more realistic (continuous) cases. In such cases, the states and actions cannot normally be enumerated. Thus, instead of a matrix, Q can then be estimated using e.g. a neural network. Examples of applications: system identification, mechanics (balancing an inverted pendulum), game playing (backgammon) etc.

Part II: Multi-robot applications Example: Division of Labour in a Group of Robots Inspired by Ants' Foraging Behaviour. Biologically inspired approach to robot control: insects (termites, bees, and ants) can co-operate efficiently. Model based on ants' foraging behavior.

Collective insect behavior Insects have limited knowledge: no direct communication, only locally available information, no internal map of the environment, no sense of any "global plan". Still, insects' behavior is amazingly robust in their natural environment!

Collective insect behavior The result of collective insect behavior goes beyond that of individual insects. Key mechanism: self-organization! Why look at insects? Inspiration for robotics researchers; multi-robot systems are an experimental tool for biologists.

Collective robot behavior An object search-and-retrieval task with a control algorithm inspired by a model of ants' foraging behavior. Division of labour: robots co-operate in order to increase the efficiency of the group. Selection mechanism: robots more suited to a task are more likely to carry it out than less capable robots.

Test application Prey retrieval task: look for objects (prey) and retrieve them to the nest. Similar to behavior observed in real ants. Used as a model for real-world applications: search and rescue missions, demining, collection of terrain samples.

Performance Since the task can be accomplished by a single robot, is there an actual performance gain in using more than one robot? Are more robots more efficient than a single one? Efficiency = performance of the group.

Efficiency Income: prey retrieved to the nest. Cost: interference among robots, dangers in the environment, energy. Income and cost depend on the number of robots in the environment. What is the optimal number of robots?

Ants' foraging behavior model Ants randomly explore the environment until one of them finds a prey: pull it to the nest, cut it, or recruitment of other ants. The prey is pulled straight to the nest, and the ant returns directly to the prey location after retrieval. Learning and adaptation might play a key role: the probability P_1 of leaving the nest for a new search changes by a constant Δ, according to previous successes or failures.
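The slides do not spell out the adaptation rule; a minimal sketch of the kind of update described (P_1 raised by Δ after a success, lowered by Δ after a failure, kept within bounds) is shown below. The function name, the bound values and the step size are assumptions for illustration, not taken from the paper.

def update_leaving_probability(p1, success, delta=0.005,
                               p_min=0.001, p_max=0.05):
    # After a successful retrieval the robot becomes more likely to leave
    # the nest for a new search; after a failure it becomes less likely.
    # delta, p_min and p_max are illustrative values, not from the paper.
    if success:
        return min(p_max, p1 + delta)
    return max(p_min, p1 - delta)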

Methods Real robots: validate a theoretical model. Simulated robots: more data can be produced in a shorter time, which speeds up the analysis and leads to more general conclusions!

Robots: MindS-bot, s-bot

Control: finite state machine. Conditional state transitions: when the condition ("label") is TRUE; with probability P_1, evaluated once every second (and adjusted by Δ).

Experimental set-up Prey appear randomly in the environment. Single experimental parameter: adaptation.

Efficiency index Costs cannot easily be quantified. Performance = number of retrieved prey. Duty time = time spent in "search" or "retrieve".
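The definition of the index itself is not reproduced in the transcript; a plausible reconstruction consistent with the two quantities above (and with the observation on the next slide that a lower group duty time improves efficiency) is

efficiency index = (number of retrieved prey) / (group duty time)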

Experiments and results Efficiency (real and simulated robots): increased significantly when using adaptation. No difference in performance was obtained => the improvement is due to a decrease in group duty time.

Experiments and results Division of labour occurred: two peaks in P_1 indicate two distinct groups of robots: active foragers have a high P_1, and the others have a low P_1 value.

Conclusion Individual adaptation, which uses only locally available information, can improve the efficiency of a group of robots by means of division of labour.

About the exam Friday, 2008-03-14, 08.30-12.30, V-building. Allowed: a calculator, provided that it cannot store any text (can be bought at Cremona, the Chalmers bookstore), and mathematical tables (such as e.g. Beta), as long as no text has been added. NOT allowed: any course material, e.g. lecture notes, or other tools such as computers, cell phones etc. Make sure to bring a VALID ID!!

About the exam The maximum score on the exam will be 25 points. The exam will contain both mathematical problems and questions concerning the various topics covered in the lectures. You may be asked to derive (and use!) equations etc. No programming-related questions in the exam, i.e. you will not be asked to write program code. The problems can be based on all the material rated as important in the Reading guidance files.

Next quarter... The robot construction part starts (finally :-) ) on April 1st in the ET-lab (Fundamental physics building).