Reinforcement Learning of Artificial Intelligence B659. Class meets Tu & Thur 2:30pm - 3:45pm in BH 330
1 Reinforcement Learning of Artificial Intelligence B659. Class meets Tu & Thur 2:30pm - 3:45pm in BH 330. Course webpage on Canvas: schedule, slides, assignment submission, info about projects (later). Instructor: Adam White; you can find me in Lindley 201I; website: adamwhite.ca; contact me via email, not Canvas. Your AIs are Matt and Su: contact info on Canvas.
2 Text & other useful resources. Text: Reinforcement Learning: An Introduction (1998). The second edition of Sutton and Barto: minor rearrangement of topics, changes of notation, new topics; free online: we will use this exclusively! Csaba Szepesvári's book: Algorithms for Reinforcement Learning (2010): more theory, a few additional topics covered; free online.
3 Grading. 50% from 5 assignments: mostly questions from Sutton & Barto, and one or more programming questions (a code framework will be provided). 5% short midterm quiz: gives you an idea of how well you are tracking the course contents, and shows what type of questions I will use on the final. 35% from final project report or final exam: PhD students and <MS students with permission> can do a project; the rest must take a final.
4 Thought questions: 10% of your mark. The idea is to show you have read and thought about the reading material. Must ask a question! Must provide at least one possible answer! Answers cannot be found in the textbook or lecture slides. You are showing me that you have read the text!!
5 Thought questions/statements. Good example: Setting parameters (e.g., learning rate) in ML often involves cross-validation and a testing/training split; however, in RL data is produced interactively. This seems much more challenging in RL. What is a fair way to compare learning algorithms in RL? <QUESTION> One idea is to discretize the parameter set for each algorithm, and report the performance with the best parameter setting. <ANSWER> Another idea would be to use some meta-learning algorithm to automatically tune the parameters of each method. <ANSWER>
6 Thought questions/statements. If you submit thought questions in the correct form, i.e., a question with an answer, you get half the marks. To get more than half marks you have to ask a good question. Bad questions: "I don't understand Sarsa, can you explain it again?"; "There is a typo on page 7"; "Chapter 3 unrealistically assumes access to the model of the MDP. I think we should skip this chapter"; "How does reinforcement learning avoid overfitting?" <No answer>
7 Assignments. Assignments are one of the best ways to learn about RL. All assignments will be individual work. You can talk to your friends, but only at the ideas level: no details, no writing on the whiteboard, etc. You cannot share written answers or code. You cannot submit code you found online and modified; you must write your own from scratch. You cannot use ML packages like RLtoolkit, Python ML packages, TensorFlow, etc. All programming will be done in C.
8 Academic integrity. If you are caught cheating, copying, working together, or plagiarizing: you will be reported to the university; you may get a zero on the assignment/exam/project; you may fail the course; you may be expelled. We have problems with this every single year. Last year people plagiarized and copied assignments. Don't let it happen to you!!
9 What is artificial intelligence? Get out some paper and write down a definition. What is machine learning? How do they differ?
10 What is artificial intelligence? "Intelligence is the most powerful phenomenon in the universe" (Ray Kurzweil). The phenomenon is that there are systems in the universe that are well thought of as goal-seeking systems. "A science of mind ... when people finally come to understand the principles of intelligence: what it is and how it works, well enough to design and create beings as intelligent as ourselves" (Sutton). "It is the science and engineering of making intelligent machines, especially intelligent computer programs. ... Intelligence is the computational part of the ability to achieve goals in the world." (John McCarthy)
11 What is machine learning? A branch of computational statistics, with a specific focus on efficiency and scalability (me). Machine learning is the subfield of computer science that "gives computers the ability to learn without being explicitly programmed" (Arthur Samuel). Often AI and ML are interchanged. One way to keep it simple: AI defines a problem, a research goal, and ML defines a set of tools and a computational perspective; these tools can be applied to a variety of applications, and can be used to help understand and replicate the principles of human intelligence.
12 What is reinforcement learning?
13 Goals and ambitions for the course. Learn the methods and foundational ideas of RL. Outcome: prepared to apply RL to a novel application. Outcome: prepared to do research in RL. Learn some new ways of thinking about AI research: the agent perspective; the interaction between learning and decision making; experience as an unending stream; temporally correlated actions; an agent lives a life.
14 Things we will not get into: deep reinforcement learning, neural networks, nonlinear representations of state, large applications of RL. When we are done, you will be able to learn these topics on your own. This course will help demystify modern RL.
15 What is Reinforcement Learning? An approach to Artificial Intelligence: it is a problem specification, a class of methods, and a field of study. Learning from interaction. Goal-oriented learning. Learning about, from, and while interacting with an external & unknown environment. Learning what to do: how to map situations to actions so as to maximize a numerical reward signal.
16 Key Features of RL. The learner is not told which actions to take: no teacher, no labels (not supervised learning). Trial-and-error search (learn by doing). Possibility of delayed reward: sacrifice short-term gains for greater long-term gains, e.g., games like backgammon, most interesting problems. The need to trade off exploration and exploitation. Considers the whole problem of a goal-directed agent interacting with an uncertain environment.
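The explore/exploit trade-off named above shows up already in the simplest settings. Below is a minimal ε-greedy sketch (illustrative Python, not course code; the course assignments use C, and the function name is an assumption):

```python
import random

def epsilon_greedy(action_values, epsilon=0.1, rng=random):
    """With probability epsilon pick a random action (explore);
    otherwise pick the action with the largest estimated value (exploit)."""
    if rng.random() < epsilon:
        return rng.randrange(len(action_values))
    return max(range(len(action_values)), key=lambda a: action_values[a])
```

With epsilon = 0 this is purely greedy; a small positive epsilon keeps trying other actions in case the current value estimates are wrong.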
17 Supervised, Unsupervised, & RL. In SL the system is told what the correct response should have been: given a set of labelled training examples (the teacher provides labels), the objective is to do well on unlabelled examples (generalize well to new examples). Unsupervised learning is about learning/uncovering the structure of unlabelled examples: e.g., finding a lower-dimensional representation of the data; this certainly could be useful to an RL agent, but does not address the key problem of maximizing reward.
18 RL is influenced by and is influencing many fields (diagram from David Silver): Computer Science (machine learning), Engineering (optimal control), Mathematics (operations research), Neuroscience (reward system), Psychology (classical/operant conditioning), and Economics (bounded rationality), with reinforcement learning at their intersection.
19 Agent-environment interaction stream. Interaction produces a temporal stream of data. Continual learning, acting, and planning. The objective is to affect the environment. The environment is stochastic and uncertain. Diagram: the agent receives state (stimulus, situation) and reward (gain, payoff, cost) from the environment (world), and sends back an action (response, control).
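The interaction stream can be sketched as a simple loop; a hypothetical sketch (the `reset`/`step`/`act`/`learn` interfaces here are assumptions, not the course framework):

```python
def run_episode(agent, env, max_steps=1000):
    """Generate the temporal stream of (state, action, reward) tuples
    produced by one episode of agent-environment interaction."""
    state = env.reset()
    stream = []
    for _ in range(max_steps):
        action = agent.act(state)
        next_state, reward, done = env.step(action)
        stream.append((state, action, reward))
        agent.learn(state, action, reward, next_state)  # learn while acting
        state = next_state
        if done:
            break
    return stream
```

Note that learning happens inside the loop: the agent acts, learns, and acts again on the same unending stream.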
20 Example: Hajime Kimura's RL Robots (slide from Sutton). Video stills: the robot before and after learning to move backward, and a new robot running the same algorithm.
21 RL + Deep Learning: performance on Atari games (Space Invaders, Breakout, Enduro).
22 RL + Deep Learning, applied to classic Atari games (Google DeepMind 2015, Mnih et al., Nature). Learned to play 49 games for the Atari 2600 game console, without labels or human input, from self-play and the score alone. No hand-engineered input: maps raw screen pixels to predictions of final score for each of 18 joystick actions. Figure 1 (from the paper): schematic illustration of the convolutional neural network; the preprocessed screen image is followed by three convolutional layers and two fully connected layers, each hidden layer followed by a rectifier nonlinearity (that is, max(0, x)), with a single output for each valid action. Learned to play better than all previous algorithms, and at human level for more than half the games. Same learning algorithm applied to all 49 games, without human tuning!
23 Classic Examples of Reinforcement Learning. Elevator control (Crites & Barto): (probably) the world's best down-peak elevator controller. Helicopter control (Ng & Abbeel, watch?v=vcdxqn0fcne#t=22): can perform maneuvers that no human operator can; model-based RL. TD-Gammon and Jellyfish (Tesauro, Dahl): the world's best backgammon player.
24 More Examples of RL. Robot learning to walk (Schuitema, v=sbf5efeiw, 00:45): uses methods you will learn about in this class. Octopus arm simulator (Engel, icml07_engel_demo/, 9:00, 10:55): Bayesian temporal-difference learning. Keepaway soccer (Sutton & Stone): higher-dimensional control using RL and tile coding. Go, Hearts, other games.
25 Elements of RL. Policy: what to do; a mapping from situations to actions. Reward: what is good; immediate. Value: what is good because it predicts reward; longer term. Model: what follows what.
26 Rewards. A single scalar number. Provided by you, the designer; not really that hard. The agent seeks to maximize total future reward: next-step reward optimization is usually suboptimal. The agent cannot change how rewards are generated; they are outside the agent's control. In general rewards are stochastic functions of the state of the environment. A good example of reward is physical pleasure and pain (positive and negative reward).
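"Total future reward" is usually formalized as the return, optionally discounted by a factor gamma per step; a minimal sketch (the discount factor gamma is an assumption, not yet introduced on this slide):

```python
def discounted_return(rewards, gamma=1.0):
    """G = r0 + gamma*r1 + gamma^2*r2 + ...; computed backwards
    so each step costs one multiply and one add."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```

Maximizing only r0 (the next-step reward) ignores every later term of this sum, which is why greedy next-step optimization is usually suboptimal.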
27 Value functions. The value of a state specifies how good the state is in terms of future total reward: the value function predicts long-term reward. Reward tells us the immediate goodness of moving from one state to another; values specify the long-term desirability of states, taking into account the states that usually follow. We might incur short-term negative reward to achieve higher value in the long term. The environment produces rewards in response to the agent's actions; the agent constructs an estimate of the value function to help select actions that yield high long-term reward.
28 Policies. A mapping from perceived states to actions, encoding how the agent behaves over time. Policies may be stochastic. They may be represented as a table, or involve some complex optimization.
29 Models. A model mimics the environment. A model might predict the next state and reward, given the current state and action: a prediction of what would happen in the environment if the agent took an action in some state; a simulation. Models are often used for planning: deciding a course of action by, for example, simulating different courses of action and choosing among them. In this course we will consider an approach to planning that is strongly connected with value-function learning: very different from the classic approaches considered in GOFAI.
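Planning with such a one-step model can be sketched as simulating each candidate action and choosing the best-looking outcome; the `model` and `value` interfaces below are illustrative assumptions:

```python
def plan_one_step(model, value, state, actions):
    """Choose the action whose simulated outcome looks best:
    immediate reward plus the estimated value of the predicted next state."""
    def backed_up(action):
        next_state, reward = model(state, action)  # simulate, don't act
        return reward + value[next_state]
    return max(actions, key=backed_up)
```

The model is queried in imagination only; no real action is taken until the comparison is done.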
30 RLtoolkit demo
31 Limitations of course contents. The methods we will study assume there exists some state, with certain properties: almost all RL theory assumes this; in practice the algorithms work very well even when we only have limited or incomplete access to state. Most methods covered will estimate value functions: a large chunk of modern RL. Other, non-value-function methods, like evolutionary methods, are possible but outside the scope. We will cover policy gradient methods that learn a parameterized policy and a value function.
32 An Extended Example: Tic-Tac-Toe. A two-player, turn-based game; win if three in a row. Assume draw and loss are equally bad. We assume we do not know anything about the opponent's strategy, and that the opponent is not perfect. Instead, learn from experience generated from playing many games. Can we build an agent to exploit our opponent? Maximize the chance of winning?
33 Possible solutions. Specify or learn a model of our opponent: is their strategy stationary? Evolutionary search: search the space of policies, maintain a population, fitness measured via probability of winning. Reinforcement learning: we are player X.
34 One way to approach this as an RL task. Create a value function: a table of numbers, one for each state of the game, giving the probability of winning from each state (taking into account what the policy does in the future). All states with three X's in a row have value 1.0; all states with three O's in a row have value 0.0; all draw states have value 0.0; initially set the rest to 0.5. The policy: to select a move, examine the possible next states from the current one (look ahead); most of the time pick the one with the largest value (greedy, or exploitive); occasionally pick randomly (exploratory).
35 An RL Approach to Tic-Tac-Toe (example boards omitted). 1. Make a table with one entry per state: V(s) = estimated probability of winning; 1 for a win, 0 for a loss or a draw, and 0.5 (unknown) for everything else. 2. Now play lots of games. To pick our moves, look ahead one step from the current state to the various possible next states, and just pick the next state with the highest estimated probability of winning, the largest V(s): a greedy move. But 10% of the time pick a move at random: an exploratory move.
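The move-selection rule on this slide (greedy on next-state values, random 10% of the time) might look like the sketch below; the default value of 0.5 for unseen states follows the slide, while the function name and dictionary representation are assumptions:

```python
import random

def select_move(values, next_states, epsilon=0.1, rng=random):
    """Pick the next state with the largest V(s) (greedy move),
    or a uniformly random one with probability epsilon (exploratory move).
    `values` maps states to estimated win probability; unseen states get 0.5."""
    if rng.random() < epsilon:
        return rng.choice(next_states)
    return max(next_states, key=lambda s: values.get(s, 0.5))
```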
36 RL Learning Rule for Tic-Tac-Toe. (Diagram: a tree of game states alternating our moves and the opponent's moves, with starred states marking greedy moves and one branch marking an exploratory move.) Let s be the state before our greedy move and s' the state after our greedy move. We increment each V(s) toward V(s'), a "backup": V(s) <- V(s) + α[V(s') - V(s)], where α is a small positive fraction, e.g., α = 0.1, the step-size parameter.
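The backup rule above translates directly into code; a minimal sketch with V stored as a dictionary (the representation is an assumption):

```python
def td_update(V, s, s_next, alpha=0.1):
    """V(s) <- V(s) + alpha * (V(s') - V(s)): move V(s) a
    fraction alpha of the way toward V(s')."""
    V[s] += alpha * (V[s_next] - V[s])
    return V[s]
```

After a game whose final state has V = 1 (a win), the states along the greedy path are nudged upward; after a loss or draw (V = 0) they are nudged down.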
37 Learning rule. If we reduce the step-size over time, this approach converges, giving the optimal moves against a fixed opponent. If we keep the step-size small but constant, this approach tracks, and plays well against opponents that change their strategy over time. Where is the reward?
38 Attributes of this simple task. Learn while interacting: learning affects how we play, which requires learning the values of the new policy, which changes how we play, which... A clear goal. Delayed consequences of action. Sophisticated behavior without a model of the opponent or search over action sequences: just a value function + a one-step model. RL methods can be applied when no model is available. In the beginning the agent didn't know anything about its action consequences: how could we inject prior knowledge?
39 How can we improve this T.T.T. player? Do we need random moves? Why? Do we always need a full 10%? Can we learn from random moves? Can we learn offline? Pre-training from self-play? Using learned models of the opponent?...
40 What would happen if we learned from exploratory moves? Not doing so: we learn the probability of winning under optimal play, i.e., the probability of winning from the current state if we choose some action and then played optimally from then on. Learning from exploratory moves: we learn the probability of winning under the policy that includes exploration; the estimates take into account that we sometimes explore. This will likely result in different moves (sometimes safer moves). If we continue to explore forever, this second approach may end up being better: winning more games.
41 How is Tic-Tac-Toe too easy? Finite, small number of states. Backgammon, for example, has vastly more; Go has more unique configurations than atoms in the universe! One-step look-ahead is always possible in TTT. State completely observable...
42 The Course. Part I: The Problem: Introduction; Evaluative Feedback; The Reinforcement Learning Problem. Part II: Elementary Solution Methods: Dynamic Programming; Monte Carlo Methods; Temporal Difference Learning. Part III: A Unified View: Eligibility Traces; Generalization and Function Approximation; Planning and Learning. Advanced topics (see Canvas): RL in psychology and animal learning; Case Studies.
43 Next Class. Thursday: read Chapter 2 of Sutton & Barto (2016); you can skip 2.7. I find the history section at the end particularly interesting! 2 thought questions about chapters 1 & 2 are due Monday the 16th (the day before class), so we can discuss them in class. Assignment #1 will be released today; we will talk about it next time.
More informationSurprise-Based Learning for Autonomous Systems
Surprise-Based Learning for Autonomous Systems Nadeesha Ranasinghe and Wei-Min Shen ABSTRACT Dealing with unexpected situations is a key challenge faced by autonomous robots. This paper describes a promising
More informationDIGITAL GAMING & INTERACTIVE MEDIA BACHELOR S DEGREE. Junior Year. Summer (Bridge Quarter) Fall Winter Spring GAME Credits.
DIGITAL GAMING & INTERACTIVE MEDIA BACHELOR S DEGREE Sample 2-Year Academic Plan DRAFT Junior Year Summer (Bridge Quarter) Fall Winter Spring MMDP/GAME 124 GAME 310 GAME 318 GAME 330 Introduction to Maya
More informationSemi-Supervised GMM and DNN Acoustic Model Training with Multi-system Combination and Confidence Re-calibration
INTERSPEECH 2013 Semi-Supervised GMM and DNN Acoustic Model Training with Multi-system Combination and Confidence Re-calibration Yan Huang, Dong Yu, Yifan Gong, and Chaojun Liu Microsoft Corporation, One
More informationDOCTOR OF PHILOSOPHY HANDBOOK
University of Virginia Department of Systems and Information Engineering DOCTOR OF PHILOSOPHY HANDBOOK 1. Program Description 2. Degree Requirements 3. Advisory Committee 4. Plan of Study 5. Comprehensive
More informationEECS 700: Computer Modeling, Simulation, and Visualization Fall 2014
EECS 700: Computer Modeling, Simulation, and Visualization Fall 2014 Course Description The goals of this course are to: (1) formulate a mathematical model describing a physical phenomenon; (2) to discretize
More informationMajor Milestones, Team Activities, and Individual Deliverables
Major Milestones, Team Activities, and Individual Deliverables Milestone #1: Team Semester Proposal Your team should write a proposal that describes project objectives, existing relevant technology, engineering
More informationThe Good Judgment Project: A large scale test of different methods of combining expert predictions
The Good Judgment Project: A large scale test of different methods of combining expert predictions Lyle Ungar, Barb Mellors, Jon Baron, Phil Tetlock, Jaime Ramos, Sam Swift The University of Pennsylvania
More informationOn the Combined Behavior of Autonomous Resource Management Agents
On the Combined Behavior of Autonomous Resource Management Agents Siri Fagernes 1 and Alva L. Couch 2 1 Faculty of Engineering Oslo University College Oslo, Norway siri.fagernes@iu.hio.no 2 Computer Science
More informationExperiments with SMS Translation and Stochastic Gradient Descent in Spanish Text Author Profiling
Experiments with SMS Translation and Stochastic Gradient Descent in Spanish Text Author Profiling Notebook for PAN at CLEF 2013 Andrés Alfonso Caurcel Díaz 1 and José María Gómez Hidalgo 2 1 Universidad
More informationA Case Study: News Classification Based on Term Frequency
A Case Study: News Classification Based on Term Frequency Petr Kroha Faculty of Computer Science University of Technology 09107 Chemnitz Germany kroha@informatik.tu-chemnitz.de Ricardo Baeza-Yates Center
More informationMachine Learning from Garden Path Sentences: The Application of Computational Linguistics
Machine Learning from Garden Path Sentences: The Application of Computational Linguistics http://dx.doi.org/10.3991/ijet.v9i6.4109 J.L. Du 1, P.F. Yu 1 and M.L. Li 2 1 Guangdong University of Foreign Studies,
More informationLearning to Schedule Straight-Line Code
Learning to Schedule Straight-Line Code Eliot Moss, Paul Utgoff, John Cavazos Doina Precup, Darko Stefanović Dept. of Comp. Sci., Univ. of Mass. Amherst, MA 01003 Carla Brodley, David Scheeff Sch. of Elec.
More informationSpeaker Identification by Comparison of Smart Methods. Abstract
Journal of mathematics and computer science 10 (2014), 61-71 Speaker Identification by Comparison of Smart Methods Ali Mahdavi Meimand Amin Asadi Majid Mohamadi Department of Electrical Department of Computer
More informationProbability and Game Theory Course Syllabus
Probability and Game Theory Course Syllabus DATE ACTIVITY CONCEPT Sunday Learn names; introduction to course, introduce the Battle of the Bismarck Sea as a 2-person zero-sum game. Monday Day 1 Pre-test
More informationLearning and Transferring Relational Instance-Based Policies
Learning and Transferring Relational Instance-Based Policies Rocío García-Durán, Fernando Fernández y Daniel Borrajo Universidad Carlos III de Madrid Avda de la Universidad 30, 28911-Leganés (Madrid),
More informationLahore University of Management Sciences. FINN 321 Econometrics Fall Semester 2017
Instructor Syed Zahid Ali Room No. 247 Economics Wing First Floor Office Hours Email szahid@lums.edu.pk Telephone Ext. 8074 Secretary/TA TA Office Hours Course URL (if any) Suraj.lums.edu.pk FINN 321 Econometrics
More informationCal s Dinner Card Deals
Cal s Dinner Card Deals Overview: In this lesson students compare three linear functions in the context of Dinner Card Deals. Students are required to interpret a graph for each Dinner Card Deal to help
More informationUsing Deep Convolutional Neural Networks in Monte Carlo Tree Search
Using Deep Convolutional Neural Networks in Monte Carlo Tree Search Tobias Graf (B) and Marco Platzner University of Paderborn, Paderborn, Germany tobiasg@mail.upb.de, platzner@upb.de Abstract. Deep Convolutional
More informationAn investigation of imitation learning algorithms for structured prediction
JMLR: Workshop and Conference Proceedings 24:143 153, 2012 10th European Workshop on Reinforcement Learning An investigation of imitation learning algorithms for structured prediction Andreas Vlachos Computer
More informationGame-based formative assessment: Newton s Playground. Valerie Shute, Matthew Ventura, & Yoon Jeon Kim (Florida State University), NCME, April 30, 2013
Game-based formative assessment: Newton s Playground Valerie Shute, Matthew Ventura, & Yoon Jeon Kim (Florida State University), NCME, April 30, 2013 Fun & Games Assessment Needs Game-based stealth assessment
More informationCS177 Python Programming
CS177 Python Programming Recitation 1 Introduction Adapted from John Zelle s Book Slides 1 Course Instructors Dr. Elisha Sacks E-mail: eps@purdue.edu Ruby Tahboub (Course Coordinator) E-mail: rtahboub@purdue.edu
More informationMAT 122 Intermediate Algebra Syllabus Summer 2016
Instructor: Gary Adams Office: None (I am adjunct faculty) Phone: None Email: gary.adams@scottsdalecc.edu Office Hours: None CLASS TIME and LOCATION: Title Section Days Time Location Campus MAT122 12562
More informationA Review: Speech Recognition with Deep Learning Methods
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 5, May 2015, pg.1017
More informationA Case-Based Approach To Imitation Learning in Robotic Agents
A Case-Based Approach To Imitation Learning in Robotic Agents Tesca Fitzgerald, Ashok Goel School of Interactive Computing Georgia Institute of Technology, Atlanta, GA 30332, USA {tesca.fitzgerald,goel}@cc.gatech.edu
More informationEVOLVING POLICIES TO SOLVE THE RUBIK S CUBE: EXPERIMENTS WITH IDEAL AND APPROXIMATE PERFORMANCE FUNCTIONS
EVOLVING POLICIES TO SOLVE THE RUBIK S CUBE: EXPERIMENTS WITH IDEAL AND APPROXIMATE PERFORMANCE FUNCTIONS by Robert Smith Submitted in partial fulfillment of the requirements for the degree of Master of
More informationCS 1103 Computer Science I Honors. Fall Instructor Muller. Syllabus
CS 1103 Computer Science I Honors Fall 2016 Instructor Muller Syllabus Welcome to CS1103. This course is an introduction to the art and science of computer programming and to some of the fundamental concepts
More informationLearning goal-oriented strategies in problem solving
Learning goal-oriented strategies in problem solving Martin Možina, Timotej Lazar, Ivan Bratko Faculty of Computer and Information Science University of Ljubljana, Ljubljana, Slovenia Abstract The need
More informationSoftprop: Softmax Neural Network Backpropagation Learning
Softprop: Softmax Neural Networ Bacpropagation Learning Michael Rimer Computer Science Department Brigham Young University Provo, UT 84602, USA E-mail: mrimer@axon.cs.byu.edu Tony Martinez Computer Science
More informationWelcome to. ECML/PKDD 2004 Community meeting
Welcome to ECML/PKDD 2004 Community meeting A brief report from the program chairs Jean-Francois Boulicaut, INSA-Lyon, France Floriana Esposito, University of Bari, Italy Fosca Giannotti, ISTI-CNR, Pisa,
More informationSpring 2015 Natural Science I: Quarks to Cosmos CORE-UA 209. SYLLABUS and COURSE INFORMATION.
Spring 2015 Natural Science I: Quarks to Cosmos CORE-UA 209 Professor Peter Nemethy SYLLABUS and COURSE INFORMATION. Office: 707 Meyer Telephone: 8-7747 ( external 212 998 7747 ) e-mail: peter.nemethy@nyu.edu
More informationPhotography: Photojournalism and Digital Media Jim Lang/B , extension 3069 Course Descriptions
Course Descriptions Photography: Photojournalism and Digital Media Jim Lang/B105-107 812-542-8504, extension 3069 jlang@nafcs.k12.in.us http://fcmediamatters.wordpress.com Journalism I: Journalism I is
More informationModel Ensemble for Click Prediction in Bing Search Ads
Model Ensemble for Click Prediction in Bing Search Ads Xiaoliang Ling Microsoft Bing xiaoling@microsoft.com Hucheng Zhou Microsoft Research huzho@microsoft.com Weiwei Deng Microsoft Bing dedeng@microsoft.com
More informationFINANCE 3320 Financial Management Syllabus May-Term 2016 *
FINANCE 3320 Financial Management Syllabus May-Term 2016 * Instructor details: Professor Mukunthan Santhanakrishnan Office: Fincher 335 Office phone: 214-768-2260 Email: muku@smu.edu Class details: Days:
More informationOn the Formation of Phoneme Categories in DNN Acoustic Models
On the Formation of Phoneme Categories in DNN Acoustic Models Tasha Nagamine Department of Electrical Engineering, Columbia University T. Nagamine Motivation Large performance gap between humans and state-
More informationMaster s Programme in Computer, Communication and Information Sciences, Study guide , ELEC Majors
Master s Programme in Computer, Communication and Information Sciences, Study guide 2015-2016, ELEC Majors Sisällysluettelo PS=pääsivu, AS=alasivu PS: 1 Acoustics and Audio Technology... 4 Objectives...
More information