Brief Overview of Adaptive and Learning Control

1.10.2007

Outline Introduction

Definition of Adaptive Control

Zames (as reported by Dumont & Huzmezan): "A non-adaptive controller is based solely on a priori information, whereas an adaptive controller is based also on a posteriori information."

But isn't feedback control a posteriori?
- Unified view: system state and parameters are all just state anyway
- Stochastic control solves everything
- Not possible in practice: we need approximations and optimizations
- The terminology and conceptual organization of the field reflects a long history (analog components, analytic proofs, ...)

Approximations and optimizations, for example:
- A fixed-structure controller with parameters, plus simple laws to alter those parameters
- Assuming the system is periodic and works the same every time

Definition of Adaptive Control (2)
- Zames (reported by Dumont & Huzmezan): a non-adaptive controller is based solely on a priori information, whereas an adaptive controller is based also on a posteriori information
- Sastry & Bodson: the direct aggregation of a (non-adaptive) control methodology with some form of recursive system identification

Definition of Learning Control

On an abstract level the distinction is arbitrary: a combination of timescales and a conceptual difference.
- An adaptive controller depends on very recent history: no memory, it reacts to the current state only
- A learning controller depends on long-term history: memory, it remembers previous states and the appropriate responses
- Again, from the grand unified stochastic-control perspective, these are the same

Direct vs. Indirect
- Direct: adapt or identify the controller parameters directly
- Indirect: adapt a model of the system, then calculate the controller parameters from the model
- Even here, the only difference is conceptual

Outline Introduction

Motivating example: the two-armed bandit
- The system has two actions
- Each action gives a reward drawn from an unknown but constant distribution
- How to maximize the winnings?
- We must take into account the information from the system when making each decision, but also the uncertainty of that information, and optimize both
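The trade-off above can be illustrated with a minimal ε-greedy sketch; the arm probabilities (0.4 and 0.7) and all parameter values are invented for illustration:

```python
import random

def run_bandit(p=(0.4, 0.7), eps=0.1, steps=5000, seed=0):
    """Play a two-armed Bernoulli bandit with an epsilon-greedy policy:
    usually exploit the arm with the best estimated value, but explore
    a random arm with probability eps to keep the estimates honest."""
    rng = random.Random(seed)
    counts = [0, 0]
    values = [0.0, 0.0]              # running mean reward of each arm
    for _ in range(steps):
        if rng.random() < eps:
            a = rng.randrange(2)                     # explore
        else:
            a = 0 if values[0] >= values[1] else 1   # exploit
        r = 1.0 if rng.random() < p[a] else 0.0      # unknown, constant distribution
        counts[a] += 1
        values[a] += (r - values[a]) / counts[a]     # incremental mean update
    return values, counts
```

With eps = 0 the player can lock onto the worse arm forever; the exploration rate is exactly the "uncertainty of the information" point in the slide.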

Mathematical Methods
- Laplace transform: used all through control theory
- Lyapunov functions: can show convergence of certain controllers, provided the assumptions hold

History
- 1950s: initial algorithms
- 1960s: Dynamic Programming and Dual Control (intractable)
- 1970s and 1980s: convergence proofs
- 1980s-1990s: reinforcement learning and neural methods

Outline Introduction

Algorithms
- Adaptive:
  - Gain Scheduling
  - MRAC (Model Reference Adaptive Control)
  - Self-tuning regulator
  - SOAS (Self-Oscillating Adaptive Systems)
- Adaptive / Learning:
  - ILC (Iterative Learning Control)
  - RC (Repetitive Control)
  - Reinforcement Learning

Outline Introduction

Gain Scheduling
- Determine the controller parameters directly from auxiliary measurements, not from the process error itself
- Example: use measured air pressure and velocity to set the feedback gain in an aeroplane pitch controller (otherwise, trouble)
- Usually linear interpolation between controllers designed for particular parameter values
- Works well in certain problems
- Difficulties: need to find good variables to measure; may need many controllers to interpolate between
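As a sketch of the interpolation idea, here is a hypothetical gain schedule over airspeed; the breakpoints and gain values are made up for illustration, not taken from any real design:

```python
import bisect

# Hypothetical design points: feedback gains tuned at particular airspeeds (m/s).
SPEEDS = [50.0, 100.0, 200.0, 300.0]
GAINS = [2.0, 1.2, 0.6, 0.4]      # lower gain at higher dynamic pressure

def scheduled_gain(speed):
    """Linearly interpolate between controllers designed at fixed speeds,
    clamping at the ends of the schedule."""
    if speed <= SPEEDS[0]:
        return GAINS[0]
    if speed >= SPEEDS[-1]:
        return GAINS[-1]
    i = bisect.bisect_right(SPEEDS, speed)
    t = (speed - SPEEDS[i - 1]) / (SPEEDS[i] - SPEEDS[i - 1])
    return (1.0 - t) * GAINS[i - 1] + t * GAINS[i]
```

Each table entry is a controller designed for one operating point; the "many controllers" difficulty is simply that this table can grow large in several scheduling variables.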

MRAC (Model Reference Adaptive Control)
- Drive the difference between the plant and a reference model to zero by adapting the controller parameters directly
- Ad hoc rules: e.g., the high-gain servo
- MIT rule: gradient descent, assuming the unknown parameters equal their current estimates when calculating the gradient
- Can be unstable
- Many variants
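A minimal MIT-rule sketch, assuming an invented first-order plant dy/dt = -y + 2θu and reference model dym/dt = -ym + u; perfect model-following occurs at θ = 0.5:

```python
import math

def simulate_mit_rule(gamma=1.0, dt=0.01, T=200.0):
    """Adapt a feedforward gain theta by the MIT rule so that the plant
    dy/dt = -y + 2*theta*u follows the model dym/dt = -ym + u.
    The rule is gradient descent on e^2/2, with the sensitivity de/dtheta
    approximated (up to a constant absorbed into gamma) by ym."""
    y = ym = theta = t = 0.0
    for _ in range(int(T / dt)):
        u = math.sin(t)                    # persistently exciting input
        e = y - ym                         # model-following error
        theta += dt * (-gamma * e * ym)    # MIT rule update
        y += dt * (-y + 2.0 * theta * u)   # plant (Euler step)
        ym += dt * (-ym + u)               # reference model
        t += dt
    return theta
```

For a larger gamma or larger input amplitude the same loop can diverge, which is the "can be unstable" point above.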

Self-Tuning Regulators
- Use parameterized control design equations for the plant
- Identify the parameters on-line
- Apply the controller designed for those parameter values ("certainty equivalence")
- Harder to analyze: the design equations are usually nonlinear
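A toy certainty-equivalence loop, assuming an invented first-order plant y[t+1] = a·y[t] + b·u[t] with a = 0.8, b = 0.5: recursive least squares identifies (a, b) on-line, and the control law simply makes the one-step-ahead prediction hit the reference:

```python
import numpy as np

def self_tuning_run(a=0.8, b=0.5, steps=200):
    """Identify (a, b) on-line with recursive least squares while
    controlling with the current estimates as if they were true."""
    theta = np.array([0.0, 1.0])    # initial guesses (a_hat, b_hat)
    P = np.eye(2) * 100.0           # RLS covariance
    y = 0.0
    for t in range(steps):
        r = 1.0 if (t // 20) % 2 == 0 else -1.0    # square-wave reference (excitation)
        a_hat, b_hat = theta
        u = (r - a_hat * y) / b_hat                # certainty-equivalence control law
        phi = np.array([y, u])                     # regressor
        y_next = a * y + b * u                     # true plant
        K = P @ phi / (1.0 + phi @ P @ phi)        # RLS gain
        theta = theta + K * (y_next - phi @ theta) # move estimates toward the data
        P = P - np.outer(K, phi @ P)
        y = y_next
    return theta
```

The switching reference provides the excitation needed for identifiability; with a constant reference the regressors would collapse onto one line and (a, b) could not be separated.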

SOAS (Self-Oscillating Adaptive Systems)
- Use a relay to discretize the control signal
- Use the dithered error to control the relay
- Adapt the gain of the relay based on the limit-cycle amplitude
- An instance of MRAS, with constant excitation designed into the system

Outline Introduction

Repetitive systems
- Premise: a fixed operation cycle
  - A robot arm doing a repetitive operation
  - Adjusting the voltage to a particle-accelerator magnet
- Premise: the error has a periodic part

ILC and RC
- ILC = Iterative Learning Control, RC = Repetitive Control
- Goal: eliminate the periodic error
- Record the error as a function of time, and use it on subsequent cycles to improve the control
- Heuristically: "at 2.54 seconds the robot arm usually goes too far left, so apply force to the right at that point"
- Works well with nonlinear and difficult-to-model systems
- Stability is sometimes difficult to obtain; various filters are needed
- Difference between the schemes: ILC assumes a known initial state for each period, while RC lets the end of the previous period affect the start of the next (transients)
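A minimal P-type ILC sketch on an invented first-order plant y[t+1] = 0.3·y[t] + 0.5·u[t], reset to the same initial state every cycle (the ILC premise); the learning gain L = 1 satisfies the contraction condition |1 - L·b| < 1:

```python
import math

def run_ilc(trials=20, N=50, a=0.3, b=0.5, L=1.0):
    """Iterative Learning Control: after each trial, correct the stored
    input with the recorded error, u[t] <- u[t] + L * e[t], where e[t]
    is the tracking error one step after u[t] acts."""
    ref = [math.sin(2.0 * math.pi * (t + 1) / N) for t in range(N)]
    u = [0.0] * N                        # stored input, refined trial by trial
    max_errs = []
    for _ in range(trials):
        y = 0.0                          # known initial state each cycle (ILC premise)
        e = [0.0] * N
        for t in range(N):
            y = a * y + b * u[t]         # plant step driven by the stored input
            e[t] = ref[t] - y            # the error that input u[t] is responsible for
        max_errs.append(max(abs(v) for v in e))
        u = [u[t] + L * e[t] for t in range(N)]   # learning update for the next cycle
    return max_errs
```

Each cycle replays the corrected input, so the peak error shrinks geometrically from trial to trial; this is the "record the error, use it on the next cycle" heuristic made concrete.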

Outline Introduction

Reinforcement Learning
- System = states and actions
- Each step: act, get a reward
- Goal: maximize the reward over time
- Decide what to do (policy)
- Update the policy over time to reflect the reward obtained (learning rule), directly or indirectly
- Main sources of variability:
  - Policy (e.g., ε-greedy)
  - Learning method (e.g., Q-learning: Q : states × actions → R)
  - Function approximation and generalization
- Convergence is guaranteed only when there is no extrapolation (details in the books)
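A tabular Q-learning sketch on an invented five-state corridor (move left or right; reward 1 for reaching the rightmost state), with an ε-greedy policy:

```python
import random

def q_learning_chain(n=5, episodes=500, alpha=0.5, gamma=0.9, eps=0.1, seed=0):
    """Tabular Q-learning, Q : states x actions -> R.
    Actions: 0 = left, 1 = right; state n-1 is terminal with reward 1."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(n)]
    for _ in range(episodes):
        s = 0
        while s < n - 1:
            # epsilon-greedy policy: mostly exploit, sometimes explore
            if rng.random() < eps:
                a = rng.randrange(2)
            else:
                a = 0 if Q[s][0] > Q[s][1] else 1
            s2 = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s2 == n - 1 else 0.0
            # learning rule: move Q toward the bootstrapped target
            target = r + (0.0 if s2 == n - 1 else gamma * max(Q[s2]))
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
    return Q
```

The learned greedy policy moves right in every state, and Q[0][1] approaches γ³ = 0.729, the discounted value of the reward four steps away; the table is exact here, with no function approximation and hence no extrapolation issue.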

Outline Introduction

Control theory: roots in the era of real, analog components
- On an abstract enough level, it is all just approximations to stochastic control
- The big change from neural computation: nonlinear function approximation
- Methods are difficult to compare, since the range of systems to be controlled is huge
- In industry, the simplest system that works is good

Issues with Adaptive Control
- Unmodeled dynamics cause bad behaviour
- If a controller regulates well, knowledge of the plant's behaviour decays
- Constant or intermittent excitation is needed to keep knowing the system's behaviour
- Stochastic control actually does generate such excitations

Methods not discussed here
- Neuro-fuzzy control: adapt rules of thumb with data
- Neural augmentation of classical control methods: adapting feedforward control; treating nonlinearities in some part of the control problem with backpropagation neural networks
- Countless others: the literature is huge

Hidden Bonus: Model-Predictive Control (MPC)
- A popular practical control method
- Original motivation: constraints
- Uses an explicit model
- Explicitly optimizes the control several time steps forward (sliding horizon)
- Computationally intensive; enabled by digital computers
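A small sliding-horizon sketch for a scalar integrator x⁺ = x + u with the input constrained to |u| ≤ 1 (the constraints being the original motivation); the finite-horizon quadratic cost is minimized here by projected gradient descent, and all parameter values are invented for illustration:

```python
import numpy as np

def mpc_step(x0, a=1.0, b=1.0, N=8, r=0.01, iters=400, lr=0.02):
    """Minimize sum(x_k^2) + r*sum(u_k^2) over the next N steps subject to
    |u_k| <= 1, by projected gradient descent, and return only the first
    input (the sliding-horizon recipe)."""
    u = np.zeros(N)
    for _ in range(iters):
        # roll the explicit model forward to get the predicted states
        x = np.empty(N)
        xk = x0
        for k in range(N):
            xk = a * xk + b * u[k]
            x[k] = xk
        # analytic gradient of the cost with respect to each input
        g = 2.0 * r * u
        for j in range(N):
            for k in range(j, N):
                g[j] += 2.0 * x[k] * b * a ** (k - j)
        u = np.clip(u - lr * g, -1.0, 1.0)   # project onto the input box
    return u[0]

def run_mpc(x0=5.0, steps=30):
    """Closed loop: re-solve at every step, apply only the first input."""
    x = x0
    for _ in range(steps):
        x = x + mpc_step(x)
    return x
```

Re-solving an optimization at every sampling instant is where the computational cost comes from; the saturated first input (u ≈ -1 when the state is far from the origin) is exactly the constrained behaviour a simple linear law cannot express.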