Reinforcement Learning for Mobile Robots with Continuous States


Yizheng Cai
Department of Computer Science, University of British Columbia
Vancouver, BC V6T 1Z4

Abstract

Programming a mobile robot by hand is very tedious, and a hand-programmed robot adapts poorly to a new environment. Reinforcement learning gives a mobile robot a mechanism for learning how to accomplish a task in a given environment with very little programming work. However, traditional reinforcement learning methods assume discrete state and action spaces, which does not hold for mobile robots in the real world. This project simulates a mobile robot with continuous states and discrete actions, using a safe approximation of the value function to learn the optimal policy. The experimental results show that learning can be very successful and efficient when it is bootstrapped with information provided by human control.

1 Introduction

Programming the control of a mobile robot is tedious and time consuming because it is difficult to translate high-level knowledge of how to accomplish a task into low-level controls the robot can execute. The task becomes harder still when the environment is complex, and hand-coded control does not adapt to changes in the environment. A better approach is to give the robot a mechanism for learning how to accomplish the task itself, and reinforcement learning is exactly such a mechanism. However, traditional reinforcement learning methods assume discrete state and action spaces, which is not applicable to mobile robots with real-valued states. In 1994, Boyan and Moore [1] introduced a method to safely approximate the value function for reinforcement learning, which makes it possible to apply reinforcement learning in continuous state spaces. Later, Smart and Kaelbling [2, 3] adopted this safe value-function approximation in their work and used Q-learning to solve the control problem for mobile robots in continuous spaces. Another important contribution of their work is an efficient and practical learning-system construction methodology that augments the reinforcement learning process with information that is easy for a human expert to provide. Their experiments demonstrate that, with bootstrapped information provided by human control, the learning process can be very efficient and the learning results can be extremely good.

The motivation of this project is to take a closer look at reinforcement learning applied to the mobile-robot control task and to run experiments that verify its efficiency in such tasks. The main work of the project is to simulate a mobile robot with continuous states and discrete actions using the same approach as Smart and Kaelbling, so that the hypotheses in their work can be tested. One important hypothesis is the effectiveness of safe value-function approximation, which is the basis for applying reinforcement learning in continuous state spaces. Another is that learning bootstrapped with information provided by human experts can significantly shorten the learning time and still produce very good results for the robot. The first part of this report describes the basic idea of reinforcement learning and Q-learning. The second part describes how Q-learning is implemented for mobile-robot control. The third part presents experimental results with discussion, and the last part covers discussion and future work.

2 Reinforcement Learning

Reinforcement learning has attracted much attention in the machine learning and artificial intelligence communities over the past several decades. It provides a way to teach a robot through rewards and punishments, so that the robot can learn how to accomplish a task without hard-coding low-level control strategies. Traditional reinforcement learning assumes that the world can be represented by a set of discrete states, S, and that the agent has a finite set of actions, A, to take. The interaction between the agent and the world is represented by the reward, R, and time is discretized into steps. The state at time t is s_t and the action taken is a_t. After taking a_t, the agent moves to a new state s_{t+1} and receives an immediate reward r_{t+1}, which evaluates how good the action was. The agent therefore accumulates experience as a sequence of tuples (s_t, a_t, r_{t+1}, s_{t+1}). Each action affects both the immediate reward and the next state, which in turn affects the delayed reward. The ultimate goal of reinforcement learning is to find an optimal policy of behavior that performs best in the environment, or equivalently an optimal value function that maps states (or state-action pairs) to a measure of long-term value. The method used in this project, Q-learning, finds an optimal function that maps a state-action pair to the long-term value of taking that action in that state.

2.1 Q-learning

Q-learning, introduced by Watkins and Dayan [4], is a method typically used to solve RL problems. One of its big advantages is that it is model-free: it does not require any prior knowledge of the MDP (Markov Decision Process) model. The optimal value function for Q-learning is defined as

    Q*(s, a) = E[ R(s, a) + γ max_{a'} Q*(s', a') ]

That is, the optimal value of taking action a in state s is the expected value, over the next state s', of the immediate reward plus the discounted value of acting optimally from then on. γ is the discount factor, which measures how much weight is given to future reward.
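As a minimal, self-contained sketch of the tabular formulation just described (discrete states and actions, experience tuples, and the standard update rule restated in the paragraphs that follow), a Q-learning loop might look as follows in Python. The environment interface env.reset()/env.step() and the action list are assumptions made for illustration, not part of the original report.

    import random
    from collections import defaultdict

    def tabular_q_learning(env, actions, episodes=500, alpha=0.2, gamma=0.99, epsilon=0.1):
        """Plain tabular Q-learning; env.reset()/env.step() are assumed interfaces."""
        Q = defaultdict(float)                      # Q[(state, action)] -> value

        def greedy(state):
            return max(actions, key=lambda a: Q[(state, a)])

        for _ in range(episodes):
            s = env.reset()
            done = False
            while not done:
                # epsilon-greedy exploration over the discrete action set
                a = random.choice(actions) if random.random() < epsilon else greedy(s)
                s_next, r, done = env.step(a)       # experience tuple (s, a, r, s_next)
                # standard Q-learning update
                target = r + gamma * max(Q[(s_next, b)] for b in actions)
                Q[(s, a)] += alpha * (target - Q[(s, a)])
                s = s_next
        return Q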

With the Q-function defined, the optimal policy follows directly:

    π*(s) = argmax_a Q*(s, a)

During Q-learning, the agent performs trial actions in the environment to collect a sequence of experience tuples and stores the mapping from state-action pairs to values in a table. As learning goes on, each state-action pair is visited multiple times and the corresponding table entry is updated as

    Q(s_t, a_t) ← Q(s_t, a_t) + α ( r_{t+1} + γ max_{a'} Q(s_{t+1}, a') − Q(s_t, a_t) )

where α is the learning rate. Watkins and Dayan [4] proved that the Q-function converges once the state-action pairs have been visited infinitely often.

2.2 Safe Value Function Approximation

The table-based approach described in the previous section is the traditional form of Q-learning. In the real world, however, most mobile robots have real-valued states and actions that cannot be discretized properly. An alternative is to approximate the Q-function with a function approximator. A major problem with this approach is prediction error: Q-learning requires predicting the Q-value of a given state-action pair, and according to the update rule above, errors in predicting both the current Q-value and the maximum Q-value of the next state accumulate as learning proceeds and can eventually dominate the approximation.

2.2.1 Reducing Approximation Error

To reduce the approximation error, Boyan and Moore [1] introduced a method to safely approximate the value function. They observed that most approximation methods suffer from hidden extrapolation: the approximator is queried on data that lie outside the space covered by the training data. To approximate the Q-function safely, they suggested using function approximators only to interpolate the training data rather than extrapolate from it, by constructing a convex hull around the training data and only allowing prediction queries inside it. As a compromise between computational cost and safety, the structure adopted is the independent variable hull (IVH). For a training data matrix X, whose rows are the training data points, the hat matrix is

    V = X (X'X)⁻¹ X'

A query vector x lies within the hull only if it satisfies

    x' (X'X)⁻¹ x ≤ max_i υ_ii

where the υ_ii are the diagonal elements of V. The next subsection describes how the IVH is used for prediction.
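As a rough illustration of the hull test above, the following numpy sketch computes the hat-matrix diagonal and applies the criterion. The row layout of X (one state-action vector per row) and the small ridge term guarding a near-singular X'X are assumptions of this sketch, not details taken from the report.

    import numpy as np

    def inside_ivh(X, x_query, ridge=1e-8):
        """Independent variable hull (IVH) test: accept a query only if
        x' (X'X)^-1 x <= max_i v_ii, where v_ii are the diagonal elements
        of the hat matrix V = X (X'X)^-1 X'."""
        XtX = X.T @ X + ridge * np.eye(X.shape[1])          # ridge is an added safeguard (assumption)
        XtX_inv = np.linalg.inv(XtX)
        hat_diag = np.einsum('ij,jk,ik->i', X, XtX_inv, X)  # diagonal of X (X'X)^-1 X'
        query_leverage = x_query @ XtX_inv @ x_query
        return query_leverage <= hat_diag.max()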

Another important technique for reducing the prediction error is Locally Weighted Regression (LWR) [5]. Instead of using all the training data to approximate the function, LWR uses only the neighbors that are close to the query point. Each neighbor is assigned a weight according to its distance from the query point, and any kernel function can be used to compute the weight. The one used in this project is a Gaussian:

    ω = exp( −(x − q)² / (2h²) )

where q is the query point, x is a neighbor of the query, and h is the bandwidth.

2.2.2 The HEDGER Algorithm

HEDGER [2, 3] is a function-approximation algorithm based on linear regression that combines safe value-function approximation [1] with Locally Weighted Regression [5]. The following pseudocode for predicting Q-values is the one used in this project's simulation; it is based on the version given in [6], with minor modifications for implementation purposes.

Algorithm 1: HEDGER prediction
  Input:  set of training examples S, with tuples of the form (s, a, q, r);
          query state s; query action a; LWR minimum set size k;
          LWR distance threshold D; LWR bandwidth h
  Output: predicted Q-value q_{s,a}

  x ← (s, a)
  K ← training points with distance to x smaller than D
  if the number of training points in K is less than k then
      q_{s,a} ← q_default
  else
      construct an IVH, H, from the training points in K
      if the regression matrix for K is singular then
          q_{s,a} ← q_default
      else if x is outside H then
          q_{s,a} ← q_default
      else
          q_{s,a} ← LWR prediction using x, K, and h
          if q_{s,a} > q_max or q_{s,a} < q_min then
              q_{s,a} ← q_default
  return q_{s,a}
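The "LWR prediction using x, K, and h" step of Algorithm 1 can be sketched as a kernel-weighted least-squares fit around the query. The Gaussian kernel and the bandwidth h follow the definitions above; the bias column and the normal-equation formulation are assumptions of this sketch rather than details taken from [6].

    import numpy as np

    def lwr_predict(X, y, x_query, h):
        """Locally weighted linear regression at x_query.
        X: (n, d) matrix of nearby training inputs, y: (n,) stored Q-values."""
        d2 = np.sum((X - x_query) ** 2, axis=1)          # squared distances to the query
        w = np.exp(-d2 / (2.0 * h ** 2))                 # Gaussian kernel weights
        Xa = np.hstack([X, np.ones((X.shape[0], 1))])    # append a bias term
        xa = np.append(x_query, 1.0)
        W = np.diag(w)
        # weighted normal equations: beta = (Xa' W Xa)^-1 Xa' W y
        # (a singular system here corresponds to the q_default branch of Algorithm 1)
        beta = np.linalg.solve(Xa.T @ W @ Xa, Xa.T @ W @ y)
        return float(xa @ beta)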

The minimum size of the LWR set is easy to decide: it is simply the number of parameters to be estimated in the linear regression. The distance threshold for LWR should ideally be large enough to include as many training points as possible, but as the size of K grows the computation becomes prohibitively expensive. Moreover, points far from the query receive such small weights, given the bandwidth, that their contribution can safely be ignored, and the computation spent on them is wasted. It is therefore better to define a threshold, κ, as the minimum weight a point in K must have in order not to be ignored. With a Gaussian kernel, the distance threshold can then be computed as

    D = h √( −2 ln κ )

The value of the bandwidth, h, is determined empirically in this project, because Smart observed in his thesis [6] that tuning h does not bring significant benefit to overall performance.

In addition, the predicted Q-value should stay within the possible minimum and maximum Q-values. Because the problem in this project is set up as infinite-horizon discounted Q-learning, the maximum possible Q-value is

    Q_max = Σ_{t=0}^{∞} γ^t r_max = r_max / (1 − γ)

and the minimum value is computed in the same way from the minimum reward. The reward bounds are updated with each new reward received, so the Q-value always stays safely within the actual bounds, and the predefined default Q-value should also lie within them.

The HEDGER training algorithm is a modification of traditional Q-learning. The pseudocode is presented in Algorithm 2, with some changes from the version in [6].

Algorithm 2: HEDGER training
  Input:  set of training examples S, with tuples of the form (s, a, q, r);
          initial state s_t; action a_t; next state s_{t+1}; reward r_{t+1};
          learning rate α; discount factor γ; LWR minimum set size k;
          LWR distance threshold D
  Output: new set of training examples S'

  update q_max and q_min based on r_{t+1} and γ
  x ← (s_t, a_t)
  q_{t+1} ← maximum predicted Q-value at state s_{t+1}, based on S and k
  q_t ← predicted Q-value for the query x, based on S and k
  K ← set of training points used for the prediction of q_t
  κ ← set of weights of the corresponding training points
  q_new ← q_t + α ( r_{t+1} + γ q_{t+1} − q_t )
  S' ← S ∪ { (x, q_new) }
  for each point i in K:
      q_i ← q_i + α κ_i ( q_new − q_i )
  return S'

In this project, due to the limited time, the robot is assumed to have discrete actions, so the maximum predicted Q-value at state s_{t+1} can be obtained by comparing the Q-values of all possible actions.
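A simplified Python sketch of the Algorithm 2 update might look as follows. The predict() interface (an Algorithm 1 query that can optionally return the neighbour indices and kernel weights it used) and the list-of-pairs representation of S are hypothetical, and the q_max/q_min bookkeeping is omitted for brevity.

    def hedger_train(S, s_t, a_t, s_next, r_next, actions,
                     alpha=0.2, gamma=0.99, predict=None):
        """One HEDGER training step.  S is a list of ((state, action), q) pairs;
        predict(S, state, action) implements the Algorithm 1 query (assumed interface)."""
        x = (s_t, a_t)
        # maximum predicted Q-value at the next state (actions are discrete here)
        q_next = max(predict(S, s_next, a) for a in actions)
        # hypothetical extended call returning the neighbour indices and their weights
        q_t, K, weights = predict(S, s_t, a_t, return_neighbours=True)
        # standard Q-learning target applied to the local prediction
        q_new = q_t + alpha * (r_next + gamma * q_next - q_t)
        S.append((x, q_new))
        # pull the neighbouring stored values toward the new estimate,
        # scaled by their kernel weights (the kappa_i of Algorithm 2)
        for i, w in zip(K, weights):
            xi, qi = S[i]
            S[i] = (xi, qi + alpha * w * (q_new - qi))
        return S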

3 Reinforcement Learning in Mobile Robot Control

As the previous sections show, Q-learning requires the agent to act more or less randomly at the beginning of learning in order to gather sufficient experience. In an environment with very sparse rewards, the learning process can spend a long time exploring the world without obtaining any useful information. One reasonable way to address this is to use supplied control to lead the robot to the interesting parts of the state space as soon as possible, so that this information can bootstrap the learning process.

3.1 Two-Phase Learning Process

Smart and Kaelbling introduced a two-phase learning process [2, 3]. In the first phase, a predefined policy or direct control by a human expert drives the robot and collects sufficient experience, while the RL system learns passively so that the information can be used for value-function approximation. In the second phase, the reinforcement learning system takes control of the robot and keeps learning from its own experience, while the supplied control no longer has any effect. The RL system is not trying to learn the demonstrated trajectory; it only uses the experience for value-function approximation. The two phases are illustrated in Figure 1 [2].

Figure 1. The two learning phases: (a) Phase 1, in which the supplied control selects the actions while the learning system passively observes; (b) Phase 2, in which the learned policy controls the robot.
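The control flow of the two-phase procedure can be sketched as a single loop that switches who chooses the action. Here supplied_policy(), rl_policy(), and train() are hypothetical interfaces standing in for the hard-coded controller, the RL action module, and the HEDGER training step.

    def run_two_phase(env, supplied_policy, rl_policy, train,
                      phase_one_runs=30, phase_two_runs=50):
        """Phase 1: the supplied controller drives the robot and the learner
        only observes.  Phase 2: the learned policy takes over."""
        S = []                                   # growing set of training examples
        for run in range(phase_one_runs + phase_two_runs):
            s = env.reset()
            done = False
            while not done:
                if run < phase_one_runs:
                    a = supplied_policy(s)       # phase 1: human / hard-coded control
                else:
                    a = rl_policy(S, s)          # phase 2: RL control (e.g. epsilon-greedy)
                s_next, r, done = env.step(a)
                S = train(S, s, a, s_next, r)    # passive learning in both phases
                s = s_next
        return S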

3.2 Corridor-Following Task

The task simulated in this project is a corridor-following task, similar to the experiment performed with a real robot by Smart and Kaelbling [2, 3]; it is depicted in Figure 2.

Figure 2. The corridor-following task: the robot moves along the corridor toward a reward area at the end.

In the corridor-following task, the state space has three dimensions: the distance to the end of the corridor, the distance to the left wall as a fraction of the total corridor width, and the angle to the target, as shown in Figure 3.

Figure 3. The relation between the three dimensions of the state space (the distance d, the width-fraction w, and the angles φ, ψ, θ).

According to Figure 3, the three dimensions are related by

    θ = tan⁻¹(d / w) − π/2 + φ

All three state dimensions are real-valued. The simulated scenario uses sparse reward: the robot receives a reward of 10 when it reaches the reward area, and 0 everywhere else. The intuition behind using sparse reward is that if learning works in the sparse-reward setting, it will probably also work with dense rewards, where everything else is the same but the robot reaches the interesting states and updates the approximation more quickly, so learning is faster. The sparse reward is also a reason to adopt the two-phase learning procedure: with random actions alone, it takes the robot a long time to reach the rewarding states. One of the main purposes of this project is precisely to see how the learning strategy behaves in the sparse-reward scenario.

3.3 Implementation Issues

For simulation purposes, the corridor is modeled as a 300-by-800-pixel region, and the robot is drawn as a circle with a line from its center indicating the direction it is facing. The goal of the task is to reach the end of the corridor. Due to the time limit of this project, the task is simplified to some extent. The translation speed of the robot, υ_t, is constant everywhere in the corridor. The action is a counterclockwise rotation from the robot's current heading, taking values from 30 to 360 degrees in 30-degree steps. Using 360 degrees instead of 0 avoids producing a singular regression matrix in Algorithm 1; for the same reason, the angle to the target is used as one of the state dimensions, since its value is rarely exactly zero. As the formula in Section 3.2 shows, even with discrete actions and constant speed, the position and heading of the robot are real-valued. To simulate continuous states, all elements of the state vector are stored as real values; only when the robot is plotted on the screen is its position rounded to the nearest integer pixel. Thus, even though the simulation runs on a display with discrete pixels, all state entries are treated as real values, keeping the simulation very close to the real situation.
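A sketch of one simulated motion step under the simplifications above (constant translation speed, discrete counterclockwise rotation actions, real-valued pose) is given below. The exact geometry, units, and the way the angle to the target is derived are assumptions of this sketch, not details taken from the report.

    import math

    def step(state, action_deg, speed=5.0, corridor_length=800.0, corridor_width=300.0):
        """state = (x, y, heading) with real-valued coordinates; action_deg is the
        counterclockwise rotation (30..360 degrees) applied before moving."""
        x, y, heading = state
        heading = (heading + math.radians(action_deg)) % (2.0 * math.pi)
        x = min(max(x + speed * math.cos(heading), 0.0), corridor_length)
        y = min(max(y + speed * math.sin(heading), 0.0), corridor_width)
        # derived state dimensions used by the learner (geometry is an assumption)
        dist_to_end = corridor_length - x
        frac_from_left_wall = y / corridor_width
        angle_to_target = (math.atan2(0.5 * corridor_width - y,
                                      corridor_length - x) - heading) % (2.0 * math.pi)
        return (x, y, heading), (dist_to_end, frac_from_left_wall, angle_to_target)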

For the simulation parameters, the learning rate α is set to 0.2 and the discount factor γ to 0.99, both adopted directly from Smart's work. The LWR bandwidth is set empirically. The minimum LWR set size, k, must be at least the size of the query vector (s, a), which has four entries, so k is set to 5. The supplied control in the first phase is hard-coded to keep the robot moving forward, although the exact action is chosen randomly. In addition, since a real robot should avoid hitting obstacles that could break its sensors or other components, the simulated robot estimates, from its sensed state and the action proposed by the action module, whether the action would drive it into the wall; if so, the action module generates a random action instead. In the second phase, the action module controlled by the RL system does not always follow the greedy policy that takes the best action: for exploration, it takes random actions with some probability. During evaluation, the action module also produces noisy actions so that the simulation is closer to reality.

4 Results

To evaluate the learning results, the training process is evaluated after every 5 training runs. In all training runs, the robot starts at the same position but with a different random heading. Figure 4 plots the number of steps to the goal during evaluation.

Figure 4. Steps to the goal during evaluation of the two-phase training (phase-one training runs followed by phase-two training runs).

As Figure 4 shows, after several training runs in the first phase the number of steps to the goal decreases rapidly. Although there are some peaks, the overall trend is downward. After 30 runs of first-phase training, the system switches control to the RL system. The phase change produces a peak in the plot, because the system has to adapt to the new data brought in by RL control and exploration. Finally, with sufficient exploration of the world, the number of steps to the goal converges close to the optimal value. Unlike the result in Smart's work, the peak appears with some latency after the phase change. There are several reasons that might cause this behavior.

At the early stage of second-phase learning, there are not yet enough new data points from the RL module to affect the Q-value prediction, so the prediction still depends heavily on the earlier data points from the supplied control. Also, because of the exploration strategy, the action module controlled by the RL system takes random actions with some probability, which can lead the robot into regions it is not familiar with; the robot then has to act randomly and update the Q-values in those regions until it returns to a region it knows better. Those newly created data points, which are not yet stable, can therefore make the robot perform poorly. The unexpected peaks in both training phases are probably caused by the noise added to the action module at evaluation time: before the dataset covers most regions of the corridor, a random action is likely to send the robot into an unfamiliar region, from which it needs many random actions to find its way back. Once there is enough training data covering most regions between the starting position and the goal, one random action is no longer enough to push the robot out of the safe region, and the total number of steps to the goal finally converges.

In this simulation, a k-d tree was not implemented to speed up the nearest-neighbor search over the stored experience, so one complete two-phase simulation with more than 80 training runs takes more than a day. Figure 4 therefore reflects a single complete simulation; no further simulations were run due to the time limit.

Figure 5 shows two images of the robot's path in the corridor-following task. The first is taken early in first-phase training and clearly reflects the pattern of Q-value propagation: because the reward is sparse, Q-values propagate backwards from the reward area slowly, so the early steps look almost random, but by the end of the run the robot moves quickly toward the end of the corridor. The second image is taken late in second-phase training; by then the Q-values are well distributed, and almost all actions move the robot toward the target rather than at random.

Figure 5. The path of the robot early in first-phase training and late in second-phase training.

In conclusion, the simulation shows that safe value-function approximation works well for RL in continuous state spaces: the robot converges very close to an optimal policy after a reasonable number of training runs. The two-phase training procedure also makes the number of steps to the goal drop quickly to a reasonable value and speeds up the whole training procedure significantly, especially in environments with very sparse rewards. The simulation thus confirms the effectiveness of human-aided control as a practical method for mobile-robot control tasks.

5 Discussion and Future Work

The results of this simulation support the hypotheses of Smart and Kaelbling, especially the idea of two-phase learning. It is a promising approach that uses simple, high-level methods to handle the difficult problems of low-level control and information mapping. Nevertheless, two major problems remain. First, the task in this simulation is very simple; simulations with more complicated tasks are needed to further establish the effectiveness of the approach. Second, the action space in this simulation is discrete, which simplifies the problem significantly. In the real world most actions are real-valued and difficult to discretize, so more sophisticated methods are needed to find the maximum Q-value at the next state, that is, to find the extremum of the approximated Q-function over a specified region; this will probably make it harder for the policy to converge to optimal. Brent's method is the one used by Smart and Kaelbling and has proved effective; other root-finding methods should be tried as well.

Acknowledgments

Thanks to David Poole for useful suggestions and pointers to the literature.

References

[1] Boyan, J. A. and Moore, A. W. Generalization in reinforcement learning: Safely approximating the value function. In Advances in Neural Information Processing Systems: Proceedings of the 1994 Conference. Cambridge, MA: MIT Press, 1995.
[2] Smart, W. D. and Kaelbling, L. P. Effective reinforcement learning for mobile robots. In International Conference on Robotics and Automation, 2002.
[3] Smart, W. D. and Kaelbling, L. P. Practical reinforcement learning in continuous spaces. In Proceedings of the Seventeenth International Conference on Machine Learning.
[4] Watkins, C. J. C. H. and Dayan, P. Q-learning. Machine Learning, vol. 8, 1992.
[5] Atkeson, C. G., Moore, A. W. and Schaal, S. Locally weighted learning. AI Review, vol. 11.
[6] Smart, W. D. Making Reinforcement Learning Work on Real Robots. PhD thesis, Department of Computer Science, Brown University, 2002.
