Planning in POMDPs using MDP heuristics
Kyriakos Polymenakos
University of Oxford
Supervised by Shimon Whiteson

Abstract

Partially observable Markov decision processes (POMDPs) provide a powerful framework for tackling discrete, partial-information, stochastic problems. Exact solution of large POMDPs is often computationally intractable; online algorithms have had significant success on these larger problems. In this work we survey possible extensions of one such algorithm (POMCP), replacing or combining the rollouts the algorithm performs in order to evaluate positions with other heuristic methods, based on solving the underlying MDP. That way, POMDP solvers can benefit from the great advances in the MDP state of the art. According to the experiments performed, the MDP heuristic has a positive effect on the algorithm's performance, ranging from alleviating the need for hand-crafted rollout policies to significantly outperforming the original algorithm using less computational time, depending on the problem.

1 Introduction

POMDP solvers use both online and offline approaches. Depending on the demands of the application, especially the available time for action selection, one of the two is deemed more suitable than the other. In general, full-width offline planners, such as (1), (2), (3), exploit factored representations and/or smart heuristics in order to effectively explore a search space that is typically very large. Online planners benefit from the fact that they do not have to compute policies for the entirety of the search space, allowing them to focus on the generally much smaller subset of relevant states. We expand on one such algorithm, the Partially Observable Monte Carlo Planner (POMCP) (4), by combining it with heuristics. Various heuristics have been proposed in the POMDP literature. Here we use variations of the MDP heuristic (5), where the main idea is to approximate the value (expected return) of a state in the POMDP problem by estimating the return of one or more states of the underlying MDP. In the original algorithm this estimation is performed using rollouts (simulations of possible outcomes of the problem under a simple, sometimes even random, policy). Depending on the problem and other parameters, rollouts can be more or less computationally expensive than other methods of estimation, as well as more or less accurate. We explore the effect of replacing rollouts with the MDP heuristic on three benchmark problems, as presented in (4).

2 Background

2.1 The POMDP framework

Markov decision processes (MDPs) provide a powerful framework for tackling discrete, fully observable, stochastic problems. For every state s ∈ S and every action a ∈ A there are transition probabilities determining the distribution over the next state s′. A reward function determines the agent's reward for each transition from s to s′ under a. The accumulated rewards make up the agent's return over an episode, or a part of an episode. In partially observable domains, the agent does not have access to the state, but only to certain observations o ∈ O. The observations are received according to observation probabilities, determined by s and a. The sequence of past action-observation pairs forms the history. The agent's task is to maximise its expected return; to do so, it follows a policy π(h, a) that maps histories to probability distributions over actions. An optimal policy is a policy that maximises the agent's expected return. The belief state is the probability distribution over states s, given the history h, and it represents the agent's belief about the state of the environment.
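For reference, the framework just described can be summarised compactly in the standard notation of the POMDP literature (a sketch; the tuple form below is the conventional one rather than taken verbatim from this paper):

```latex
% POMDP tuple: states, actions, observations, transition model, reward,
% observation model, discount factor
\langle S, A, O, T, R, Z, \gamma \rangle, \qquad
T(s, a, s') = \Pr(s' \mid s, a), \qquad
Z(s', a, o) = \Pr(o \mid s', a)

% Belief over states given the history, and the planning objective
b(s) = \Pr(s_t = s \mid h_t), \qquad
\pi^* \in \arg\max_{\pi} \; \mathbb{E}\!\left[ \sum_{t=0}^{\infty} \gamma^t r_t \,\middle|\, \pi \right]
```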
2.2 The POMCP algorithm

A brief description of POMCP is presented in this section; more details can be found in (4). POMCP combines an Upper Confidence Tree (UCT) search, which selects actions, with a particle filter, which updates the belief state. The UCT tree is a tree of histories, with each node representing a unique sequence of action-observation pairs. In the original algorithm, for each action to be selected, a set number of simulations is run. Each simulation has two stages: in the first, actions are selected to maximise an augmented value that favors actions with high expected return and high uncertainty in that estimate; in the second, action selection follows a rollout policy. The algorithm switches from stage one to stage two when a new node is encountered, and uses the rollout to evaluate it. As a result, every simulation adds a single node to the search tree. After the action is selected, the agent performs it and receives an observation, and all irrelevant branches of the search tree are pruned.

The particle filter approximates the belief state with a set of states. Every time the agent performs an action and receives an observation, the belief state update is performed by sampling from this set of states, simulating the action performed by the agent, and then comparing the observation received during simulation with the actual observation received from the environment; if they match, the particle "passes" through the filter and becomes a member of the new set of states. This is repeated until the number of particles in the updated belief state hits a set target.

It is worth looking more closely at the part of the POMCP algorithm where we intervene. When UCT creates a new node in the search tree and its value needs to be estimated, a rollout is performed: the evolution of the episode is simulated, with actions selected by some simple (history-based) policy, until the episode terminates or some limit on the number of time steps is reached. The policy used depends on the level of domain knowledge available to the algorithm. Without domain knowledge, a random rollout is performed, where at each time step an action is selected uniformly at random. With an intermediate level of domain knowledge, only legal actions are sampled; finally, with preferred actions, a simple but predetermined, hand-crafted, domain-specific policy is used for action selection. It should be noted that each new node is estimated by a single rollout, which is associated with a single particle. The estimate obtained has high variance, and its computational cost is at most that of simulating the maximum number of steps, a fixed limit in the default algorithm setting.
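To make the rollout step concrete, the following is a minimal Python sketch of leaf evaluation by a single random rollout from one sampled particle. The generative-model interface (a `step` function returning next state, observation, reward, and a terminal flag, plus a `legal_actions` helper) is an illustrative assumption, not POMCP's actual code:

```python
import random

def rollout_value(state, step, legal_actions, gamma=0.95, max_depth=100):
    """Estimate a leaf node's value with one random rollout from one particle.

    `step(state, action)` is an assumed generative model returning
    (next_state, observation, reward, done); `legal_actions(state)` returns
    the actions available in `state`. A single rollout gives an unbiased but
    high-variance estimate of the discounted return; `max_depth` plays the
    role of the step limit mentioned above.
    """
    ret, discount = 0.0, 1.0
    for _ in range(max_depth):
        action = random.choice(legal_actions(state))  # uniform rollout policy
        state, _obs, reward, done = step(state, action)
        ret += discount * reward
        discount *= gamma
        if done:
            break
    return ret
```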
2.3 The MDP heuristic

The MDP heuristic provides a relatively fast way of estimating the expected return of the POMDP, given the belief state and the expected returns of states in the underlying MDP (6). To calculate the expected return of the POMDP, the heuristic averages the returns of the MDP states that compose the belief state. Given the value of any state s ∈ S of the MDP, we need to estimate V(b) of the POMDP, which is approximated by V̂(b), given by:

$$\hat{V}(b) = \sum_{s \in S} V_{\mathrm{MDP}}(s)\, b(s)$$

Depending on domain parameters, namely the size of the state space, the expected returns of the MDP can be supplied to the POMDP solver by different means. In the simplest case, the MDP is solved exactly, an optimal policy is determined, and the expected return of each state under that policy is stored in a table. When the POMDP solver needs an estimate for a state, it looks up the appropriate value in the table, computed offline. When the number of states does not allow exact solution of the MDP and/or storing the expected return of each state, other methods have to be employed. For example, a function approximator, such as a neural network, can be used to estimate the expected return given the state. In other cases, if the MDP solution is easily derived from the state structure, the expected return can be computed explicitly. It should be noted that a trivial MDP, whose expected return under optimal play is easy to compute, does not necessarily give rise to a trivial POMDP (this is the case in battleship, as we will see shortly).
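In particle-filter form, where the belief is represented by a multiset of sampled states, the estimate above reduces to an average of MDP values over particles. A minimal sketch, assuming a lookup or function `v_mdp` from state to precomputed MDP value:

```python
def mdp_heuristic_value(particles, v_mdp):
    """Approximate V(b) = sum_s b(s) * V_MDP(s) when b is a set of particles.

    `particles` is a list of sampled states approximating the belief, and
    `v_mdp` maps a state to its offline-computed MDP value, e.g. a table
    produced by value iteration or the output of a function approximator.
    Repeated states in `particles` naturally receive proportionally more
    weight, matching the empirical belief b(s).
    """
    return sum(v_mdp(s) for s in particles) / len(particles)
```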
3 Problems

We address three known problems, following (4). They have very different characteristics, which call for different approaches in applying the proposed methods. The problems are ordered by number of states, with rocksample having the fewest.

3.1 Rocksample

Rocksample is a well-known problem inspired by planetary exploration. The agent, or robot, moves in a gridworld of size k×k. There are also n rocks at predetermined positions. The rocks can be good or bad, with good rocks being valuable enough to sample while bad rocks are not. The robot's task is to navigate to the positions of the good rocks and perform a sampling action. To differentiate between good and bad rocks from a distance, the agent has a noisy long-distance sensor, which it can use in a check action to receive a (noisy) measurement of the quality of one of the rocks. The measurement's accuracy deteriorates exponentially with the distance between robot and rock. The robot gets no reward for move or check actions, a positive reward of 10 for sampling a good rock, a negative reward of -10 for sampling a bad rock, and a positive reward of 10 for exiting the grid to the right, which terminates the episode. We experiment with rocksample[7,8], where the grid size is k=7 and the number of rocks is n=8. In this case there are 13 actions: 4 directions of movement, sampling, and checking any of the 8 rocks. The total number of states is 12,544 (49 robot positions × 2^8 rock configurations).
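To make the reward structure concrete, here is a minimal sketch; the action names, the state encoding, and the handling of sampling on an empty cell are illustrative assumptions rather than the benchmark's actual implementation:

```python
def rocksample_reward(action, robot_pos, rock_quality, k=7):
    """Reward structure of rocksample as described above.

    `robot_pos` is an (x, y) grid cell and `rock_quality` maps rock positions
    to True (good) / False (bad). Moves and checks are free; in this sketch,
    sampling a cell with no rock is treated like sampling a bad rock.
    """
    if action == "sample":
        return 10 if rock_quality.get(robot_pos, False) else -10
    if action == "move_east" and robot_pos[0] == k - 1:
        return 10  # stepping off the east edge terminates the episode
    return 0  # all other moves, and all check actions, carry no reward
```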
3.2 Battleship

Battleship is based on a popular board game in which each player secretly places a number of ships on a grid; players then take turns shooting at one or more positions on the opponent's grid and receive hit-or-miss feedback. The aim of the game is to find and sink the opponent's ships first. In our case there is one player. There is a negative reward of -1 for each time step, and a positive reward equal to the total number of positions on the grid, obtained when all the ships are sunk. This way, if the player has to fire on every position of the grid to win, the final return is 0. This challenging POMDP has a very large state space, and the number of possible actions is equal to the number of grid positions minus the ones already shot (shooting twice at the same place is not allowed, and would not make sense either).

3.3 Pocman

Pocman is a partially observable version of the popular arcade game Pacman, introduced by (4). In the original game the agent, pacman, moves in a gridworld maze collecting food while being chased by ghosts. The game terminates when pacman collects all the food or is caught by a ghost (multiple lives are usually available, but this is out of our scope). In the partially observable version we are, in a sense, playing pacman from the point of view of pacman itself: we cannot observe the whole maze, with the ghost and food positions. Instead, Pocman receives 10 observation bits corresponding to its senses: four observation bits for sight, indicating whether it can see a ghost in each of the four cardinal directions; one observation bit indicating whether it can hear a ghost, which is possible when the Manhattan distance between Pocman and the ghost is 2 or less; four observation bits indicating whether it can feel a wall in each direction; and finally one observation bit for smelling food, indicating the existence of food in the adjacent (diagonally or cardinally) grid positions. The number of states is enormous; there are 4 actions and 1024 observations.
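The 10 observation bits can be packed into a single integer, which also explains the figure of 1024 observations (2^10 = 1024). A sketch, with the individual sense values assumed to be supplied by the simulator:

```python
def encode_pocman_observation(see_ghost, hear_ghost, feel_wall, smell_food):
    """Pack Pocman's senses into one integer in [0, 1023].

    `see_ghost` and `feel_wall` are 4-element booleans, one per cardinal
    direction; `hear_ghost` and `smell_food` are single booleans. Ten bits
    in total, hence 2**10 = 1024 distinct observations. The bit ordering
    here is an illustrative assumption.
    """
    bits = list(see_ghost) + [hear_ghost] + list(feel_wall) + [smell_food]
    obs = 0
    for i, b in enumerate(bits):
        obs |= int(bool(b)) << i
    return obs
```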
4 Methods

The common core idea is to provide the POMCP algorithm with q-values obtained offline, by solving the underlying MDP of each POMDP. These q-values are used by the POMDP solver to estimate the expected return of states, an estimation the original algorithm performs with rollouts. The q-values of the MDP provide an alternative estimator, which differs in statistical properties and computational demands. Less computational time spent estimating leaf states of the search tree means more time to expand the tree. To capture that, we slightly modified the original algorithm, from performing a set number of simulations (and thus creating search trees with a set number of nodes) per turn to expanding the search tree for a set amount of time. The differences between the three problems pose different constraints on the implementation of this idea, from the solution of the MDP to the way the supplied q-values are post-processed. In this section we delve into these differences and present and justify the design decisions made.

4.1 Rocksample

Rocksample, with its 12,544 states, allows for an exact solution of its MDP. Let us define the MDP first. Given perfect information, the agent knows beforehand not only where the rocks are, but also which rocks are good and which are not. The planning task is thus to get to the good rocks as fast as possible, take samples, and exit the grid to the east. The check actions are not useful in this domain, since their sole purpose is to acquire observations about rock quality, which is already known. We solve this particularly simple MDP exactly using value iteration (a sketch is given at the end of this subsection).

The q-values obtained from the MDP, however, clearly tend to overestimate the return obtainable in the POMDP. This holds for the MDP heuristic in general, but specific properties of the rocksample task exacerbate the issue and should be mentioned. Firstly, there are actions (namely the check actions) that do not alter the environment but only reduce the uncertainty about the state. Reasonably, check actions are not chosen by the MDP solver, but they are chosen by the POMDP solver; even without an explicit cost, they reduce the total return by increasing the number of moves and, as a result, the exponent of the discount factor. Furthermore, the MDP solver assumes devotion to a single plan, spanning to the terminal state, that takes the robot to all the good rocks; possible changes of plan along the way add to the number of moves and result in lower return. Finally, in the POMDP setting there is of course the possibility of sampling a bad rock, and getting a significant negative reward as a result, a possibility that does not concern an agent following an optimal policy in the MDP setting. This systematic overestimation of the q-values can be addressed in different ways, and dealing with it effectively can improve performance. Several approaches were tried. We present results obtained with the unprocessed MDP values; with the MDP values divided by an arbitrary factor of 5, to showcase the idea of consistent overestimation; and finally with a heuristic where the MDP values are obtained by solving the MDP with the POMDP's discount (0.95) applied twice per step. This is equivalent to assuming an action is taken every two time steps, with the discount shrinking the accumulated rewards accordingly.
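The following is a minimal sketch of the tabular q-value iteration used to solve such a fully observable MDP exactly; the transition-model interface is an illustrative assumption:

```python
def value_iteration(states, actions, transitions, gamma=0.95, tol=1e-6):
    """Compute tabular q-values for a small MDP such as rocksample's.

    `transitions(s, a)` is assumed to return a list of
    (probability, next_state, reward, done) tuples. Returns a dict of
    q-values keyed by (state, action); the state value is
    V(s) = max_a Q(s, a), which is what the MDP heuristic looks up.
    """
    q = {(s, a): 0.0 for s in states for a in actions}
    while True:
        delta = 0.0
        for s in states:
            for a in actions:
                new_q = sum(p * (r + (0.0 if done else
                                      gamma * max(q[(s2, a2)] for a2 in actions)))
                            for p, s2, r, done in transitions(s, a))
                delta = max(delta, abs(new_q - q[(s, a)]))
                q[(s, a)] = new_q
        if delta < tol:  # stop once the largest update is negligible
            return q
```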
4.2 Battleship

Battleship has more states than rocksample, and we could not conventionally solve the MDP and store all the q-values in a data structure. A closer look at the underlying MDP, though, makes it obvious that we do not need to store MDP values at all. If we assume perfect information for this task, we end up with a problem where we know where the opponent's ships are located, and we just have to shoot at every grid position they occupy, except the ones we have already shot. The optimal return is then equal to the size of the grid minus the number of ship positions not yet shot. These values are readily available to the algorithm and are used directly.
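This makes the battleship heuristic a one-line computation at query time; a sketch under the reward convention above:

```python
def battleship_mdp_value(grid_cells, remaining_ship_cells):
    """Optimal MDP return from the current state, as derived above.

    With perfect information the optimal player fires exactly once at each
    un-hit ship cell, paying -1 per shot and collecting `grid_cells` at the
    end, so the return is the grid size minus the remaining ship cells.
    """
    return grid_cells - remaining_ship_cells
```

For example, on a fresh 10×10 board whose ships occupy 17 cells (the ship sizes here are an illustrative assumption), the heuristic value would be 100 - 17 = 83.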
4.3 Pocman

The MDP underlying the Pocman task is the well-known Pacman video game. Perfect information means we get to see the agent's position in the maze, the whole extent of the maze, the ghost positions, and the food pellets. Pocman has far too many states for solving the MDP exactly to be feasible. A reasonable approach is to solve the MDP approximately, using q-learning with a neural network, and then at execution time (online) to perform a forward pass through the network for each state whose q-value needs to be estimated. Deciding on a neural network architecture powerful enough to provide good estimates without demanding significant online computation is a challenge, and the implementation challenge is also not trivial. As a first take on this problem, we settled on a simple neural network, accepting as input a preprocessed version of the task state based on a set of features. After training the network, its outputs for all possible (encoded) inputs are calculated and stored in a table, replicating the tabular approach used in rocksample, with the difference that the values are approximations and not the expected returns under optimal play. This part of the work is still in progress, with different methods for approximately solving the MDP, and for integrating the result with the POMDP solver, still to be examined.
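A sketch of the precompute-and-tabulate step described above: after training, the network is evaluated once for every encoded feature vector and the outputs are cached, so the online solver only performs a dictionary lookup. The feature encoding and the network interface are illustrative assumptions:

```python
import itertools
import numpy as np

def tabulate_network(net, feature_space):
    """Cache a trained approximator's value estimates for all encoded inputs.

    `net(features)` is an assumed trained q-network returning a scalar value
    estimate, and `feature_space` is an iterable over all possible encoded
    feature vectors. The result replicates rocksample's tabular lookup,
    except that the entries are approximations rather than exact optimal
    returns.
    """
    return {tuple(f): float(net(np.asarray(f))) for f in feature_space}

# Hypothetical usage, assuming an 8-bit binary feature encoding:
# table = tabulate_network(net, itertools.product([0, 1], repeat=8))
```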
5 Experiments and Results

5.1 Experimental Setup

The different methods are tested with a set time limit for computation per move. Because of that, and because of hardware differences, for the comparison with (4) we repeat their experiments and report the values we obtained. We run the POMCP algorithm without domain-specific knowledge other than the set of legal moves, except where stated otherwise. The performance of each method is in most cases evaluated by the average discounted return, obtained over a number of episodes, run at a variety of time limits.

5.2 Rocksample

For rocksample we want to compare the use of rollouts with the q-values of the MDP heuristic. Our comparison is based on the average discounted return. A total of 250 episodes is run to evaluate each method, with five time-limit settings, doubling the allowed time at each step. The discounted return is averaged and presented in Figure 1.

Figure 1: Rocksample, average discounted return.

In Figure 1 we can see that the MDP heuristic outperforms simple rollouts, and that refining it further increases performance even more. The fact that naively dividing the MDP values by 5 performs better than increasing the discounting indicates that there might be even better ways of preprocessing the q-values, but optimising this choice is not the aim of this work.

5.3 Battleship

For battleship, no variants of the MDP heuristic are used, since the unprocessed values already outperformed POMCP with rollouts significantly. 500 episodes are run for every method in total, with 10 different time limits used to showcase the expected increase in performance and to allow comparison between the performance of different methods at different time limits. Even allowing domain knowledge to the POMCP solver (the preferred-actions knowledge setting), the MDP q-values resulted in significantly higher returns. To push that result further, we scaled the problem up by increasing the size of the grid and the number of ships. The largest case tackles a problem with a 20×20 grid and 7 ships; to our knowledge, the battleship problem has not been addressed at that scale before. Since battleship does not use discounting, the results represent undiscounted return. The time limits for computation per move span a wide range of values. The observed increase in performance obtained by using the MDP heuristic is remarkable, and probably has something to do with the problem itself; we attempt an explanation in the next section. In any case, the beneficial effects of the heuristic in this domain are beyond doubt.

Figure 2: Battleship (10,5), average discounted return vs. search time allowed.

Figure 3: Battleship (20,7), average discounted return vs. search time allowed.
5.4 Pocman

For Pocman, the two methods, rollouts and the MDP heuristic, are compared in terms of both undiscounted and discounted return. 250 episodes are run to evaluate each method, at time limits varied as in rocksample, doubling at each step.

Figure 4: Pocman, average return.

From Figure 4 we can conclude that the MDP heuristic offers an advantage over the default algorithm, despite the very simplistic approximation of the MDP solution. The small difference in discounted return can be explained by the relatively long episodes and the significant discount factor (0.95). We assume that for similar reasons the authors in (4) report the undiscounted return in their comparisons.

6 Conclusions and future work

Examining the results presented above, we can draw some important conclusions. Firstly, the MDP heuristic can be used in conjunction with the POMCP algorithm, replacing the need for other types of domain knowledge. Solving the underlying MDP is a well-posed problem, with a plethora of tools available to address it, in contrast with the domain knowledge used in POMCP's preferred-actions setting, which cannot be formalised in a consistent way across domains. In battleship, where POMCP is greatly outperformed, the MDP heuristic shows a huge margin of improvement over the original findings: in the larger domain, even with domain knowledge and 512 times more time to build the search tree, POMCP still accumulates less reward. It should be noted that in (4) the authors reported only a slight increase in performance in battleship over the baseline method they used. This is an indication that the problem poses some special challenge to the original POMCP algorithm, a challenge that is overcome by introducing the MDP heuristic. Our estimate is that the difficulty arises from the structure of the reward function, which does not guide the UCT search effectively, combined with the high variance of the rollouts; together, these leave the POMCP algorithm without a robust way of evaluating states. The q-values seem to cover this gap effectively, even though they grossly overestimate the expected return. More work is needed to back this up as a conclusion. For Pocman, as already stated, the values passed to the POMCP solver are rough approximations, and a more careful implementation might prove beneficial. Exploring this prospect, pushing the MDP heuristic to its limits, and experimentally characterising the tradeoff between estimation precision and computational demand seem natural extensions of the work presented here.
References

[1] M. T. J. Spaan and N. Vlassis, "Perseus: Randomized point-based value iteration for POMDPs," Journal of Artificial Intelligence Research, vol. 24, pp. 195-220, 2005.

[2] H. Kurniawati, D. Hsu, and W. S. Lee, "SARSOP: Efficient point-based POMDP planning by approximating optimally reachable belief spaces," in Robotics: Science and Systems, 2008.

[3] J. Pineau, G. Gordon, and S. Thrun, "Point-based value iteration: An anytime algorithm for POMDPs," in International Joint Conference on Artificial Intelligence (IJCAI), August 2003.

[4] D. Silver and J. Veness, "Monte-Carlo planning in large POMDPs," in Advances in Neural Information Processing Systems, 2010.

[5] S. Ross, J. Pineau, S. Paquet, and B. Chaib-draa, "Online planning algorithms for POMDPs," Journal of Artificial Intelligence Research, vol. 32, pp. 663-704, 2008.

[6] M. L. Littman, A. R. Cassandra, and L. P. Kaelbling, "Learning policies for partially observable environments: Scaling up," in Proceedings of the Twelfth International Conference on Machine Learning (ICML), 1995.