ICROS-SICE International Joint Conference 2009, August 18-21, 2009, Fukuoka International Congress Center, Japan

Intelligent Navigation and Control of an Autonomous Underwater Vehicle based on Q-Learning and Self-organizing Control

Namhoon Kim 1, Gyeong-Hwan Yoon 2, and Doheon Lee 3
1 Robotics Program, KAIST (Tel: +82-42-350-4353; nhkim@biosoft.kaist.ac.kr)
2 Robotics Program, KAIST, DAEYANG Electric Co., Ltd. (Tel: +82-42-350-4356; smartauv@gmail.com)
3 Robotics Program, KAIST, Dept. of Bio and Brain Engineering, KAIST (Tel: +82-42-350-4316; dhleebit@gmail.com)

Abstract: An autonomous underwater vehicle (AUV) is developed to explore and patrol underwater environments. To accomplish these objectives, an autonomous navigation and control system is essential for an AUV. An intelligent navigation system produces safe paths from the start point to the target point by itself, and the control system makes the vehicle follow the planned path. In this paper, we propose an autonomous navigation and control system for AUVs based on a reinforcement learning scheme.

Keywords: reinforcement learning, autonomous underwater vehicle, intelligent system

1. Introduction

An intelligent system is defined as a system that perceives its environment and takes actions that maximize its chance of success [1]. For an autonomous system, intelligence is an essential element. Perception stands for the capability of acquiring and using knowledge about the environment and about the system itself, and actions should be taken without involving human beings. Intelligence means that the robot is able to make decisions with learning and inference capabilities. The objective of the intelligent system for an autonomous underwater vehicle (AUV) is to minimize human intervention while executing its desired missions. In particular, an underwater system requires a self-organizing and self-learning scheme. The underwater environment is one of the most hostile environments on Earth, and controlling an AUV involves many difficulties such as time-delay effects, external disturbances, drag forces, and a poorly modeled plant. In a reinforcement learning scheme, perception can be accomplished by observing states and receiving rewards, and the robot can take actions according to its own policy, which is generated from repetitive tasks. Section 2 gives an overview of the autonomous underwater vehicle system. Section 3 presents the navigation and control system based on the Q-Learning algorithm and a self-organizing control scheme. Section 4 presents simulations of the proposed system, and Section 5 concludes the paper.

2. The intelligent system overview

In our design, the AUV system consists of two subsystems: an autonomous navigation system and a vehicle control system. To simplify the problem, we define the objective of the system as traveling from the start point to the target point without hitting any obstacles. The start point can be selected arbitrarily, and the target point is decided before the mission begins. In the autonomous navigation system, we use a Q-learning scheme to generate a global path to the target point. Figure 1 shows the block diagram of the designed system.

Figure 1. Block diagram

The system has two phases. In the first phase, the autonomous navigation system learns the optimal policy to find the path to the target point; in this phase, the system uses the Q-Learning algorithm to build the policy. After the first phase, the fuzzy controller starts controlling the vehicle. The vehicle reports its location to the autonomous navigation system, and the navigation system then provides references to the fuzzy controller. This procedure is repeated until the vehicle arrives at the target point, as sketched below.
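To make the two-phase structure concrete, the following is a minimal sketch of the mission loop. It only illustrates the flow described above, not the authors' implementation; the callables learn_policy, next_reference, and fuzzy_control_step are hypothetical placeholders standing in for the navigation system and the fuzzy controller.

```python
from typing import Callable, Tuple

Vec3 = Tuple[float, float, float]   # an (x, y, z) position

def run_mission(start: Vec3,
                target: Vec3,
                learn_policy: Callable[[Vec3], None],
                next_reference: Callable[[Vec3], Vec3],
                fuzzy_control_step: Callable[[Vec3, Vec3], Vec3]) -> Vec3:
    """Two-phase operation: learn a global path, then track it with the fuzzy controller."""
    # Phase 1: the navigation system learns the policy for the given target (Q-learning).
    learn_policy(target)

    # Phase 2: the vehicle reports its location, the navigation system returns a
    # reference on the planned path, and the fuzzy controller drives toward it.
    position = start
    while position != target:
        reference = next_reference(position)
        position = fuzzy_control_step(position, reference)
    return position
```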

3. The navigation and control of the vehicle

3.1. Autonomous Navigation system

Reinforcement learning is learning what to do so as to maximize a numerical reward signal. The learning agent is not told which actions to take, but must discover which actions yield the most reward by trying them. In this scheme, the agent interacts with the environment: it observes its state, takes an action derived from its own policy, its state changes, and it receives a reward. The agent always takes actions so as to maximize its reward at the terminal state. Beyond the agent and the environment, there are four elements of a reinforcement learning system: a policy, a reward function, a value function, and, optionally, a model of the environment. A policy defines the learning agent's way of behaving. A reward function defines the goal in a reinforcement learning problem and tells the agent what the good and bad events are. A value function specifies what is good in the long run: whereas rewards are immediate, values indicate long-term desirability. A model of the environment mimics the behavior of the environment [2].

First, we need to define the elements of the reinforcement learning system. We assume that the environment is stationary and define it as a three-dimensional Cartesian coordinate system. The learning agent is our autonomous underwater vehicle (AUV). The actions are defined as six motions in three-dimensional space, shown in Figure 2, and the states are the (x, y, z) locations in the environment.

Figure 2. Defined actions

Our tasks are defined as episodic tasks; an episode is traveling from the start point to the target point without hitting any obstacles. The reward function is

$r = \begin{cases} +10 & \text{if the robot arrives at the target point} \\ -3 & \text{if the robot hits an obstacle} \\ -1 & \text{otherwise} \end{cases}$

where $r$ represents the reward in the system. As the elementary solution method we use temporal-difference learning. Because our objective is to improve the value function directly, we do not need to define a model of the environment; this is called direct reinforcement learning. The Q-Learning algorithm that we used to train our vehicle is the following:

1. Initialize all Q(s, a).
2. Initialize s with an arbitrary position.
3. Choose a using the policy derived from Q.
4. Take action a, observe r and the next state s'.
5. Update Q(s, a).
6. s ← s'.
7. Repeat steps 3-6 until s is a terminal state.

The update in step 5 is

$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a_{t+1}} Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t) \right]$

where $\alpha$ is the learning rate and $\gamma$ is the discount factor. In the above algorithm, Q(s, a) represents the learned action-value function, s represents the state observed from the environment, and a represents an action as defined in our design procedure. The policy derived from Q balances exploration and exploitation. The agent prefers actions that it has tried in the past and found to be effective in producing reward, but to discover such actions it must also try actions it has not selected before. Trying new actions in search of better ones for the future is exploration, and taking actions that are already known to maximize the reward is exploitation.
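A minimal, self-contained sketch of this tabular Q-learning procedure on the paper's 3D grid is shown below. It is illustrative rather than the authors' implementation: the learning rate, discount factor, ε-greedy exploration, and the behavior after hitting an obstacle (the agent stays in place) are assumptions introduced for the example; the paper does not report these details.

```python
import random
from collections import defaultdict

# 15 x 15 x 5 grid of 2 m cubic cells, as in the simulation section.
GRID = (15, 15, 5)
# Six actions: one step along +/- x, +/- y, +/- z (Figure 2).
ACTIONS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

# Assumed hyper-parameters; the paper does not report alpha, gamma, or epsilon.
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1

def step(state, action, target, obstacles):
    """Apply one of the six motions and return (next_state, reward, done)."""
    nxt = tuple(min(max(s + d, 1), limit) for s, d, limit in zip(state, action, GRID))
    if nxt == target:
        return nxt, 10.0, True      # arrived at the target point
    if nxt in obstacles:
        return state, -3.0, False   # hit an obstacle; staying in place is an assumption
    return nxt, -1.0, False         # ordinary move

def q_learning(start_points, target, obstacles, episodes):
    Q = defaultdict(float)          # action-value table Q(s, a), initialised to zero
    for _ in range(episodes):
        state, done = random.choice(start_points), False
        while not done:
            # Policy derived from Q: exploit the best known action, explore with probability epsilon.
            if random.random() < EPSILON:
                action = random.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: Q[(state, a)])
            next_state, reward, done = step(state, action, target, obstacles)
            # Temporal-difference update of Q-learning.
            best_next = max(Q[(next_state, a)] for a in ACTIONS)
            Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
            state = next_state
    return Q
```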
3.2. Self-organizing control

In our system, the most important objective is minimizing human intervention. A conventional controller based on a mathematical model of the plant requires the dynamics of the model and fine tuning of the control parameters. In the underwater environment the vehicle is a highly nonlinear system, and disturbances such as ocean currents and convection exist. In addition, because there is no tether cable, the autonomous underwater vehicle (AUV) should adapt to the environment during the mission. To satisfy these requirements, the controller needs the ability to modify itself according to changing circumstances. The self-organizing controller is a table-based controller with a performance measure unit and a modifying unit. In our design, the controller has two loops: the inner loop is a fuzzy incremental controller and the outer loop is the modifying mechanism. Figure 3 shows the block diagram of the designed control system.

Figure 3. The self-organizing controller

The performance measure unit observes the current performance of the controller; if the performance is undesirable, the control table F is modified. The performance is calculated from the error e and the change of the error ce. A zero performance specification implies that the state is satisfactory, while a non-zero performance indicates that the state is unsatisfactory. Because we use a fuzzy incremental controller, the table F contains the change of the control input cu. The error e and the change of the error ce are multiplied by the gains GE and GCE, respectively, before entering the rule base block F. The table lookup value cu is multiplied by the output gain GCU and integrated to become the control signal U. The outer loop monitors the states e and ce and modifies the table F through a modifier M. This procedure is repeated until the vehicle arrives at the target point.
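The following sketch illustrates one cycle of such a table-based self-organizing controller: a scaled table lookup for the inner fuzzy incremental loop and a simple correction of the visited table cell for the outer loop. The table size, the gain values (GE, GCE, GCU), and the proportional modification rule used for M are assumptions made for the example, and the fuzzy inference is reduced to a plain table lookup.

```python
import numpy as np

# Assumed design parameters: an 11 x 11 rule table over the quantised inputs
# (GE*e, GCE*ce) and illustrative gain values; the paper does not report these.
N = 11
F = np.zeros((N, N))            # control table F: entries are changes of control input cu
GE, GCE, GCU = 1.0, 0.5, 0.2    # input gains and output gain (assumed values)
LEARN_RATE = 0.1                # step size used by the modifier M (assumed)

def quantise(x):
    """Clip a scaled input to [-1, 1] and map it to a table index 0..N-1."""
    return int(round((min(max(x, -1.0), 1.0) + 1.0) / 2.0 * (N - 1)))

def control_step(e, ce, u_prev):
    """Inner loop: fuzzy incremental control reduced to table lookup plus integration."""
    i, j = quantise(GE * e), quantise(GCE * ce)
    cu = F[i, j]                 # table lookup value (change of the control input)
    u = u_prev + GCU * cu        # integrate cu to obtain the control signal U
    return u, (i, j)

def modify_table(cell, performance):
    """Outer loop (modifier M): correct the visited cell when the performance is non-zero."""
    i, j = cell
    F[i, j] -= LEARN_RATE * performance
```

In each control cycle the inner loop would call control_step with the current error and change of error; whenever the measured performance is non-zero, modify_table is applied to the responsible cell, which is how the table adapts without human intervention.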

4. Simulations

For the simulation, we assumed that the environment is a 15x15x5 three-dimensional Cartesian coordinate space, with each cell a 2 m x 2 m x 2 m cube. We initialized ten start points arbitrarily and ran a million learning iterations for each start point. The target point, (15, 15, 5), remained the same for all simulations. The path in Figure 4 shows the simulation result for the start point (1, 1, 1) with 50 random obstacles.

Figure 4. Path generated by the autonomous navigation system from (1, 1, 1) to (15, 15, 5) with 50 random obstacles.

The vehicle used in the simulation is ODIN, developed by the University of Hawaii. It has six degrees of freedom, a weight of 125 kg, and a radius of 0.3 m. We set the ocean current to 0.3 m/s along the y axis. After the learning phase, the vehicle generated its policy for finding a safe path to the target point on the given map. The policy maps during the learning phase are shown in Figure 5.

Figure 5. The policy generated by the autonomous navigation system during a million iterations of learning: (a) at iteration 500, (b) at iteration 200,000, (c) at iteration 500,000, (d) at iteration 1,000,000.
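In terms of the illustrative q_learning function sketched in Section 3.1, the simulation configuration described above would look roughly as follows; the obstacle and start-point positions are random placeholders rather than the ones used in the paper.

```python
import random

# Simulation configuration from Section 4: a 15 x 15 x 5 grid of 2 m cells,
# target point (15, 15, 5), ten arbitrary start points, 50 random obstacles.
random.seed(0)
target = (15, 15, 5)
cells = [(x, y, z) for x in range(1, 16) for y in range(1, 16) for z in range(1, 6)]
free_cells = [c for c in cells if c != target]
obstacles = set(random.sample(free_cells, 50))
start_points = random.sample([c for c in free_cells if c not in obstacles], 10)

# The paper reports one million learning iterations per start point; a full run at
# that scale is slow in pure Python, so reduce the count for a quick test.
EPISODES_PER_START = 1_000_000
# q_learning is the illustrative function defined in Section 3.1, not the authors' simulator.
Q = q_learning(start_points, target, obstacles,
               episodes=EPISODES_PER_START * len(start_points))
```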

After the learning phase, the self-organizing controller starts to drive the vehicle to the target point. Figure 6 shows the actual and the desired trajectories of the vehicle.

Figure 6. The desired trajectory and the actual trajectory.

The vehicle positions and attitudes are shown in Figure 7.

Figure 7. The positions and the attitudes of the vehicle: (a) the positions, (b) the attitudes, (c) the position errors, (d) the attitude errors.

The self-organizing controller modifies its lookup table F to adapt to the given environment. The convergence of the table F can be seen in Figure 8.

Figure 8. The convergence of the table F: (a) x axis, (b) y axis, (c) z axis.

5. Conclusion

The designed system for the AUV produces a safe path through the Q-learning scheme with repeated iterations. After the learning phase, the fuzzy controller sends the current location to the navigation system, and the navigation system then gives reference signals to the controller. The planned path leads the vehicle to the target point without hitting obstacles, and the self-organizing controller adjusts its own table without human intervention to adapt to the given environment. In summary, we designed two subsystems for an intelligent autonomous underwater vehicle and verified their performance by simulations: the vehicle could find a safe path and arrive at the target point with its own intelligence. Future work could apply this system to dynamic environments.

ACKNOWLEDGEMENT

This research was supported by the MKE (Ministry of Knowledge Economy), Korea, under the ITRC (Information Technology Research Center) support program supervised by the IITA (Institute for Information Technology Advancement) (IITA-2009-C1090-0902-0001).

REFERENCES

[1] Stuart J. Russell and Peter Norvig, Artificial Intelligence: A Modern Approach (2nd ed.), Upper Saddle River, NJ: Prentice Hall, 2003.
[2] Richard S. Sutton and Andrew G. Barto, Reinforcement Learning: An Introduction, Cambridge, MA: The MIT Press, 1998.
[3] Ethem Alpaydin, Introduction to Machine Learning, Cambridge, MA: The MIT Press, 2004.
[4] Jan Jantzen, Foundations of Fuzzy Control, Wiley, 2007.
[5] Gianluca Antonelli, Underwater Robots, Springer, 2003.
[6] Geoff Roberts and Robert Sutton, Advances in Unmanned Marine Vehicles, The Institution of Electrical Engineers, 2006.
[7] W. D. Smart and L. Pack Kaelbling, "Effective Reinforcement Learning for Mobile Robots," Proceedings of the IEEE International Conference on Robotics & Automation, 2002.
[8] Chun-Fei Hsu, "Self-Organizing Adaptive Fuzzy Neural Control for a Class of Nonlinear Systems," IEEE Transactions on Neural Networks, July 2007.