CS 188: Artificial Intelligence -- Reinforcement Learning
CS 188: Artificial Intelligence
Reinforcement Learning
Dan Klein, Pieter Abbeel
University of California, Berkeley

Example: Learning to Walk

Agent/environment loop: the agent observes a state s and a reward r, and chooses an action a; the environment responds with the next state and reward.

Basic idea:
- Receive feedback in the form of rewards
- Agent's utility is defined by the reward function
- Must (learn to) act so as to maximize expected rewards
- All learning is based on observed samples of outcomes!

Before Learning / A Learning Trial / After Learning [1K Trials] [Kohl and Stone, ICRA 2004]

The Crawler!

Still assume a Markov decision process (MDP):
- A set of states s ∈ S
- A set of actions (per state) A
- A model T(s,a,s')
- A reward function R(s,a,s')
Still looking for a policy π(s)

New twist: don't know T or R
- I.e., we don't know which states are good or what the actions do
- Must actually try actions and states out to learn

[You, in Project 3]
Offline (MDPs) vs. Online (RL)

- Offline Solution: solve a known MDP
- Online Learning: learn by taking actions in an unknown MDP

Passive Reinforcement Learning

Simplified task: policy evaluation
- Input: a fixed policy π(s)
- You don't know the transitions T(s,a,s')
- You don't know the rewards R(s,a,s')
- Goal: learn the state values

In this case:
- Learner is "along for the ride"
- No choice about what actions to take
- Just execute the policy and learn from experience
- This is NOT offline planning! You actually take actions in the world.

Direct Evaluation

- Goal: Compute values for each state under π
- Idea: Average together observed sample values
  - Act according to π
  - Every time you visit a state, write down what the sum of discounted rewards turned out to be
  - Average those samples
- This is called direct evaluation

Example: Direct Evaluation

Input Policy π. Assume: γ = 1. Observed Episodes (Training):
- Episode 1: B, east, C, -1; C, east, D, -1; D, exit, x, +10
- Episode 2: B, east, C, -1; C, east, D, -1; D, exit, x, +10
- Episode 3: E, north, C, -1; C, east, D, -1; D, exit, x, +10
- Episode 4: E, north, C, -1; C, east, A, -1; A, exit, x, -10

Output Values: V(A) = -10, V(B) = +8, V(C) = +4, V(D) = +10, V(E) = -2

What's good about direct evaluation?
- It's easy to understand
- It doesn't require any knowledge of T, R
- It eventually computes the correct average values, using just sample transitions

What's bad about it?
- It wastes information about state connections
- Each state must be learned separately
- So, it takes a long time to learn

If B and E both go to C under this policy, how can their values be different?
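In code, direct evaluation is just averaging observed discounted returns per state. A minimal sketch, using the four training episodes from the example above (the function name and episode encoding are ours, not from the project code):

```python
# Direct evaluation: average observed discounted returns per visited state.
from collections import defaultdict

def direct_evaluation(episodes, gamma=1.0):
    """episodes: list of [(state, action, next_state, reward), ...]."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for episode in episodes:
        # Return from each visited state = sum of discounted rewards from there on.
        ret = 0.0
        for (s, a, s2, r) in reversed(episode):
            ret = r + gamma * ret
            totals[s] += ret
            counts[s] += 1
    return {s: totals[s] / counts[s] for s in totals}

episodes = [
    [("B", "east", "C", -1), ("C", "east", "D", -1), ("D", "exit", "x", +10)],
    [("B", "east", "C", -1), ("C", "east", "D", -1), ("D", "exit", "x", +10)],
    [("E", "north", "C", -1), ("C", "east", "D", -1), ("D", "exit", "x", +10)],
    [("E", "north", "C", -1), ("C", "east", "A", -1), ("A", "exit", "x", -10)],
]
values = direct_evaluation(episodes, gamma=1.0)
# values: A = -10, B = +8, C = +4, D = +10, E = -2
```

Note that C's average (+4) mixes two very different futures (+9 three times, -11 once): each state is learned separately, with no use of the connections between states.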
Why Not Use Policy Evaluation?

Simplified Bellman updates calculate V for a fixed policy:
- Each round, replace V with a one-step-look-ahead layer over V:
  V_{k+1}^π(s) ← Σ_s' T(s,π(s),s') [R(s,π(s),s') + γ V_k^π(s')]
- This approach fully exploited the connections between the states
- Unfortunately, we need T and R to do it!

Key question: how can we do this update to V without knowing T and R?
- In other words, how do we take a weighted average without knowing the weights?

Example: Expected Age

Goal: Compute expected age of cs188 students
- Known P(A): E[A] = Σ_a P(a) · a
- Without P(A), instead collect samples [a_1, a_2, ..., a_N]
- Unknown P(A): "Model Based": estimate P̂(a) from the samples, then E[A] ≈ Σ_a P̂(a) · a. Why does this work? Because eventually you learn the right model.
- Unknown P(A): "Model Free": E[A] ≈ (1/N) Σ_i a_i. Why does this work? Because samples appear with the right frequencies.

Model-Based Learning

Model-Based Idea:
- Learn an approximate model based on experiences
- Solve for values as if the learned model were correct

Step 1: Learn empirical MDP model
- Count outcomes s' for each s, a
- Normalize to give an estimate of T̂(s,a,s')
- Discover each R̂(s,a,s') when we experience (s, a, s')

Step 2: Solve the learned MDP
- For example, use policy evaluation

Example: Model-Based Learning

Input Policy π. Assume: γ = 1. Observed Episodes: the same four episodes as in the direct evaluation example.

Learned Model:
- T(s,a,s'): T(B, east, C) = 1.00; T(C, east, D) = 0.75; T(C, east, A) = 0.25; ...
- R(s,a,s'): R(B, east, C) = -1; R(C, east, D) = -1; R(D, exit, x) = +10; ...
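Step 1 of model-based learning is just counting and normalizing. A minimal sketch over the same experienced transitions (the helper name and encoding are ours):

```python
# Learn an empirical MDP model: count outcomes s' for each (s, a),
# normalize to estimate T-hat, record each R-hat when experienced.
from collections import defaultdict

def learn_model(transitions):
    counts = defaultdict(lambda: defaultdict(int))
    rewards = {}
    for (s, a, s2, r) in transitions:
        counts[(s, a)][s2] += 1
        rewards[(s, a, s2)] = r          # discovered when we experience (s, a, s')
    T = {}
    for (s, a), outcomes in counts.items():
        n = sum(outcomes.values())
        for s2, c in outcomes.items():
            T[(s, a, s2)] = c / n        # normalize counts into probabilities
    return T, rewards

transitions = [
    ("B", "east", "C", -1), ("C", "east", "D", -1), ("D", "exit", "x", +10),
    ("B", "east", "C", -1), ("C", "east", "D", -1), ("D", "exit", "x", +10),
    ("E", "north", "C", -1), ("C", "east", "D", -1), ("D", "exit", "x", +10),
    ("E", "north", "C", -1), ("C", "east", "A", -1), ("A", "exit", "x", -10),
]
T, R = learn_model(transitions)
# T[("C", "east", "D")] == 0.75, T[("C", "east", "A")] == 0.25
```

Step 2 would then run ordinary policy evaluation (or value iteration) on this learned T and R as if they were correct.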
Sample-Based Policy Evaluation?

We want to improve our estimate of V by computing these averages:
  V_{k+1}^π(s) ← Σ_s' T(s,π(s),s') [R(s,π(s),s') + γ V_k^π(s')]
Idea: Take samples of outcomes s' (by doing the action!) and average:
  sample_i = R(s,π(s),s_i') + γ V_k^π(s_i');  V_{k+1}^π(s) ← (1/k) Σ_i sample_i
Almost! But we can't rewind time to get sample after sample from state s.

Temporal Difference Learning

Big idea: learn from every experience!
- Update V(s) each time we experience a transition (s, a, s', r)
- Likely outcomes s' will contribute updates more often

Temporal difference learning of values
- Policy still fixed, still doing evaluation!
- Move values toward value of whatever successor occurs: running average

Sample of V(s): sample = R(s,π(s),s') + γ V^π(s')
Update to V(s): V^π(s) ← (1-α) V^π(s) + α · sample
Same update:    V^π(s) ← V^π(s) + α (sample - V^π(s))

Exponential Moving Average

The running interpolation update: x̄_n = (1-α) · x̄_{n-1} + α · x_n
Makes recent samples more important:
  x̄_n = [x_n + (1-α) x_{n-1} + (1-α)² x_{n-2} + ...] / [1 + (1-α) + (1-α)² + ...]
- Forgets about the past (distant past values were wrong anyway)
- Decreasing learning rate (alpha) can give converging averages

Example: Temporal Difference Learning

Assume: γ = 1, α = 1/2. Observed Transitions: B, east, C, -2; C, east, D, -2

Problems with TD Value Learning

- TD value learning is a model-free way to do policy evaluation, mimicking Bellman updates with running sample averages
- However, if we want to turn values into a (new) policy, we're sunk:
  π(s) = argmax_a Q(s,a),  where Q(s,a) = Σ_s' T(s,a,s') [R(s,a,s') + γ V(s')]
- Idea: learn Q-values, not values
- Makes action selection model-free too!
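The TD update above is a one-liner. A minimal sketch on the example's numbers (γ = 1, α = 1/2, all values start at 0; the function name is ours):

```python
# TD value learning: move V(s) toward each observed sample,
# an exponential moving average with learning rate alpha.
def td_update(V, s, s2, r, alpha=0.5, gamma=1.0):
    sample = r + gamma * V.get(s2, 0.0)                 # R(s,pi(s),s') + gamma*V(s')
    V[s] = (1 - alpha) * V.get(s, 0.0) + alpha * sample # running average
    return V

V = {}
td_update(V, "B", "C", -2)   # V(B) = 0.5*0 + 0.5*(-2 + V(C)) = -1.0
td_update(V, "C", "D", -2)   # V(C) = 0.5*0 + 0.5*(-2 + V(D)) = -1.0
```

Note the policy is still fixed: only the states the policy actually visits get updated, and likely successors s' contribute updates more often.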
Active Reinforcement Learning

Full reinforcement learning: optimal policies (like value iteration)
- You don't know the transitions T(s,a,s')
- You don't know the rewards R(s,a,s')
- You choose the actions now
- Goal: learn the optimal policy / values

In this case:
- Learner makes choices!
- Fundamental tradeoff: exploration vs. exploitation
- This is NOT offline planning! You actually take actions in the world and find out what happens

Detour: Q-Value Iteration

Value iteration: find successive (depth-limited) values
- Start with V_0(s) = 0, which we know is right
- Given V_k, calculate the depth k+1 values for all states:
  V_{k+1}(s) ← max_a Σ_s' T(s,a,s') [R(s,a,s') + γ V_k(s')]

But Q-values are more useful, so compute them instead
- Start with Q_0(s,a) = 0, which we know is right
- Given Q_k, calculate the depth k+1 q-values for all q-states:
  Q_{k+1}(s,a) ← Σ_s' T(s,a,s') [R(s,a,s') + γ max_a' Q_k(s',a')]

Q-Learning

Q-Learning: sample-based Q-value iteration
Learn Q(s,a) values as you go:
- Receive a sample (s,a,s',r)
- Consider your old estimate: Q(s,a)
- Consider your new sample estimate: sample = r + γ max_a' Q(s',a')
- Incorporate the new estimate into a running average:
  Q(s,a) ← (1-α) Q(s,a) + α · sample

Q-Learning Properties

Amazing result: Q-learning converges to optimal policy -- even if you're acting suboptimally!
- This is called off-policy learning
Caveats:
- You have to explore enough
- You have to eventually make the learning rate small enough
- ... but not decrease it too quickly
- Basically, in the limit, it doesn't matter how you select actions (!)
[demo: gridworld Q, crawler Q]

CS 188: Artificial Intelligence
Reinforcement Learning II
Dan Klein, Pieter Abbeel
University of California, Berkeley

We still assume an MDP:
- A set of states s ∈ S
- A set of actions (per state) A
- A model T(s,a,s')
- A reward function R(s,a,s')
Still looking for a policy π(s)

New twist: don't know T or R
- I.e., don't know which states are good or what the actions do
- Must actually try actions and states out to learn
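The tabular Q-learning update can be sketched directly from the recipe above (the tiny deterministic transition used to exercise it is made up):

```python
# Q-learning: sample-based Q-value iteration. Each sample (s, a, s', r)
# is blended into a running average of r + gamma * max_a' Q(s', a').
from collections import defaultdict

def q_update(Q, s, a, r, s2, actions, alpha=0.5, gamma=1.0):
    sample = r + gamma * max(Q[(s2, a2)] for a2 in actions)
    Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * sample
    return Q

Q = defaultdict(float)            # Q_0(s,a) = 0, which we know is right
actions = ["left", "right"]
# Experience the same rewarding transition twice:
q_update(Q, "s1", "right", 10, "s2", actions)   # Q(s1,right) = 0.5*0 + 0.5*10 = 5.0
q_update(Q, "s1", "right", 10, "s2", actions)   # Q(s1,right) = 0.5*5 + 0.5*10 = 7.5
```

The update never consults T or R directly, which is why it works off-policy: any way of generating enough samples will do.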
The Story So Far: MDPs and RL

Known MDP: Offline Solution
- Compute V*, Q*, π*: Value / policy iteration
- Evaluate a fixed policy π: Policy evaluation

Unknown MDP: Model-Based
- Compute V*, Q*, π*: VI/PI on approx. MDP
- Evaluate a fixed policy π: PE on approx. MDP

Unknown MDP: Model-Free
- Compute V*, Q*, π*: Q-learning
- Evaluate a fixed policy π: Value learning

Model-Free Learning

Model-free (temporal difference) learning
- Experience world through episodes
- Update estimates each transition
- Over time, updates will mimic Bellman updates

Q-Learning

We'd like to do Q-value updates to each Q-state:
  Q_{k+1}(s,a) ← Σ_s' T(s,a,s') [R(s,a,s') + γ max_a' Q_k(s',a')]
- But can't compute this update without knowing T, R

Instead, compute average as we go:
- Receive a sample transition (s,a,r,s')
- This sample suggests Q(s,a) ≈ r + γ max_a' Q(s',a')
- But we want to average over results from (s,a) (Why?)
- So keep a running average:
  Q(s,a) ← (1-α) Q(s,a) + α [r + γ max_a' Q(s',a')]

Q-Learning Properties

Amazing result: Q-learning converges to optimal policy -- even if you're acting suboptimally!
- This is called off-policy learning
Caveats:
- You have to explore enough
- You have to eventually make the learning rate small enough
- ... but not decrease it too quickly
- Basically, in the limit, it doesn't matter how you select actions (!)
[demo: off-policy]

Exploration vs. Exploitation

How to Explore?

Several schemes for forcing exploration
- Simplest: random actions (ε-greedy)
  - Every time step, flip a coin
  - With (small) probability ε, act randomly
  - With (large) probability 1-ε, act on current policy

Problems with random actions?
- You do eventually explore the space, but keep thrashing around once learning is done
- One solution: lower ε over time
- Another solution: exploration functions
[demo: crawler]
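The ε-greedy coin flip is easy to write down. A minimal sketch (the Q-table, state, and action names are illustrative):

```python
# epsilon-greedy: with small probability epsilon act randomly,
# otherwise act greedily on the current Q-values.
import random

def epsilon_greedy(Q, s, actions, epsilon):
    if random.random() < epsilon:
        return random.choice(actions)                         # explore
    return max(actions, key=lambda a: Q.get((s, a), 0.0))     # exploit

Q = {("s0", "left"): 1.0, ("s0", "right"): 3.0}
random.seed(0)  # for reproducibility of the demo below
picks = [epsilon_greedy(Q, "s0", ["left", "right"], epsilon=0.1)
         for _ in range(100)]
# With epsilon = 0.1, the vast majority of picks are the greedy "right"
```

The "thrashing" problem is visible here too: even once Q is perfect, roughly ε of the steps are still wasted on random actions, which is why lowering ε over time (or switching to an exploration function) helps.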
Exploration Functions

When to explore?
- Random actions: explore a fixed amount
- Better idea: explore areas whose badness is not (yet) established, eventually stop exploring

Exploration function
- Takes a value estimate u and a visit count n, and returns an optimistic utility, e.g. f(u,n) = u + k/n
- Regular Q-update:  Q(s,a) ←_α r + γ max_a' Q(s',a')
- Modified Q-update: Q(s,a) ←_α r + γ max_a' f(Q(s',a'), N(s',a'))
- Note: this propagates the "bonus" back to states that lead to unknown states as well!
[demo: crawler]

Regret

Even if you learn the optimal policy, you still make mistakes along the way!
- Regret is a measure of your total mistake cost: the difference between your (expected) rewards, including youthful suboptimality, and optimal (expected) rewards
- Minimizing regret goes beyond learning to be optimal -- it requires optimally learning to be optimal
- Example: random exploration and exploration functions both end up optimal, but random exploration has higher regret

Approximate Q-Learning

Generalizing Across States

Basic Q-Learning keeps a table of all q-values
- In realistic situations, we cannot possibly learn about every single state!
  - Too many states to visit them all in training
  - Too many states to hold the q-tables in memory
- Instead, we want to generalize:
  - Learn about some small number of training states from experience
  - Generalize that experience to new, similar situations
  - This is a fundamental idea in machine learning, and we'll see it over and over again

Example: Pacman

Let's say we discover through experience that this state is bad. In naïve q-learning, we know nothing about this state -- or even this one!

Feature-Based Representations

Solution: describe a state using a vector of features (properties)
- Features are functions from states to real numbers (often 0/1) that capture important properties of the state
- Example features:
  - Distance to closest ghost
  - Distance to closest dot
  - Number of ghosts
  - 1 / (dist to dot)²
  - Is Pacman in a tunnel? (0/1)
  - etc.
  - Is it the exact state on this slide?
- Can also describe a q-state (s, a) with features (e.g. action moves closer to food)
[demo: RL pacman]
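The modified Q-update is the regular one with f wrapped around the successor values. A minimal sketch; note we use f(u,n) = u + k/(n+1) rather than the slide's k/n so that never-visited q-states get a large but finite bonus (that variant, and all the names below, are our assumptions):

```python
# Exploration function: make rarely tried successor actions look
# optimistic inside the Q-learning target, so the bonus propagates
# back to states that lead to unknown states.
from collections import defaultdict

def f(u, n, k=1.0):
    # Optimistic utility: the bonus shrinks as visit count n grows.
    return u + k / (n + 1)

def explored_q_update(Q, N, s, a, r, s2, actions, alpha=0.5, gamma=1.0):
    N[(s, a)] += 1
    sample = r + gamma * max(f(Q[(s2, a2)], N[(s2, a2)]) for a2 in actions)
    Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * sample
    return Q

Q, N = defaultdict(float), defaultdict(int)
# A zero-reward transition still raises Q, because the unvisited
# successor actions carry the full exploration bonus:
explored_q_update(Q, N, "s1", "go", 0.0, "s2", ["go", "stay"])
```

Here Q("s1","go") becomes 0.5 · (0 + max(0 + 1/1, 0 + 1/1)) = 0.5 despite zero reward: exactly the bonus the slide describes propagating backwards.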
Linear Value Functions

Using a feature representation, we can write a q function (or value function) for any state using a few weights:
  V(s) = w_1 f_1(s) + w_2 f_2(s) + ... + w_n f_n(s)
  Q(s,a) = w_1 f_1(s,a) + w_2 f_2(s,a) + ... + w_n f_n(s,a)
- Advantage: our experience is summed up in a few powerful numbers
- Disadvantage: states may share features but actually be very different in value!

Approximate Q-Learning

Q-learning with linear Q-functions:
- transition = (s, a, r, s')
- difference = [r + γ max_a' Q(s',a')] - Q(s,a)
- Exact Q's:       Q(s,a) ← Q(s,a) + α · difference
- Approximate Q's: w_i ← w_i + α · difference · f_i(s,a)

Intuitive interpretation:
- Adjust weights of active features
- E.g., if something unexpectedly bad happens, blame the features that were on: disprefer all states with that state's features

Formal justification: online least squares

Example: Q-Pacman
[demo: RL pacman]

Q-Learning and Least Squares

Linear Approximation: Regression*
[figure: data points with a fitted line; the gap between each observation and its prediction is the error, or "residual"]
Prediction: ŷ = w_0 + w_1 f_1(x)

Optimization: Least Squares*
  total error = Σ_i (y_i - ŷ_i)² = Σ_i (y_i - Σ_k w_k f_k(x_i))²
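The weight update above fits in a few lines. A minimal sketch; the two feature names and their values are made up for illustration:

```python
# Approximate Q-learning with a linear Q-function:
#   Q(s,a) = sum_i w_i * f_i(s,a)
# The TD "difference" adjusts the weights of the active features.
def q_value(w, feats):
    return sum(w[i] * feats[i] for i in feats)

def approx_q_update(w, feats_sa, r, next_q, alpha=0.5, gamma=1.0):
    difference = (r + gamma * next_q) - q_value(w, feats_sa)
    for i in feats_sa:
        w[i] += alpha * difference * feats_sa[i]   # blame the features that were on
    return w

w = {"dist_to_dot": 0.0, "ghost_near": 0.0}
feats = {"dist_to_dot": 0.5, "ghost_near": 1.0}
# Something unexpectedly bad happens (r = -10; episode ends, so next_q = 0):
approx_q_update(w, feats, r=-10.0, next_q=0.0)
# Both active features are now dispreferred, ghost_near most strongly
```

The difference is -10, so ghost_near (feature value 1.0) drops by 5.0 and dist_to_dot (feature value 0.5) by 2.5: every future state sharing these features is immediately dispreferred, which is exactly the generalization across states we wanted.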
Minimizing Error*

Imagine we had only one point x, with features f(x), target value y, and weights w:
  error(w) = ½ (y - Σ_k w_k f_k(x))²
  ∂error(w)/∂w_m = -(y - Σ_k w_k f_k(x)) f_m(x)
  w_m ← w_m + α (y - Σ_k w_k f_k(x)) f_m(x)

Approximate q update explained:
  w_m ← w_m + α [r + γ max_a Q(s',a) - Q(s,a)] f_m(s,a)
  (i.e., "target" minus "prediction")

Overfitting: Why Limiting Capacity Can Help*
[figure: a degree-15 polynomial fit to a handful of points, matching them exactly but swinging wildly in between]

Policy Search

Problem: often the feature-based policies that work well (win games, maximize utilities) aren't the ones that approximate V / Q best
- E.g. your value functions from project 2 were probably horrible estimates of future rewards, but they still produced good decisions
- Q-learning's priority: get Q-values close (modeling)
- Action selection priority: get ordering of Q-values right (prediction)
- We'll see this distinction between modeling and prediction again later in the course

Solution: learn policies that maximize rewards, not the values that predict them
- Policy search: start with an ok solution (e.g. Q-learning) then fine-tune by hill climbing on feature weights

Simplest policy search:
- Start with an initial linear value function or Q-function
- Nudge each feature weight up and down and see if your policy is better than before

Problems:
- How do we tell the policy got better? Need to run many sample episodes!
- If there are a lot of features, this can be impractical
- Better methods exploit lookahead structure, sample wisely, change multiple parameters

Conclusion

We're done with Part I: Search and Planning!
We've seen how AI methods can solve problems in:
- Search
- Constraint Satisfaction Problems
- Games
- Markov Decision Problems

Next up: Part II: Uncertainty and Learning!
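The "nudge each weight up and down" loop can be sketched directly. A minimal sketch: the evaluate function here is a made-up stand-in for "run many sample episodes under the greedy policy and average the returns", which is the expensive part in practice:

```python
# Simplest policy search: coordinate-wise hill climbing on feature
# weights, keeping a nudge only if the evaluated policy improves.
def hill_climb(weights, evaluate, step=0.1, iters=20):
    best = evaluate(weights)
    for _ in range(iters):
        for i in range(len(weights)):
            for delta in (+step, -step):        # nudge up, then down
                trial = list(weights)
                trial[i] += delta
                score = evaluate(trial)
                if score > best:                # keep only improvements
                    weights, best = trial, score
    return weights, best

# Stand-in evaluator: pretend episodes score best when w = [1.0, -2.0].
target = [1.0, -2.0]
evaluate = lambda w: -sum((wi - ti) ** 2 for wi, ti in zip(w, target))
w, score = hill_climb([0.0, 0.0], evaluate, step=0.5, iters=50)
# w converges to [1.0, -2.0]
```

With two weights this already needs four evaluations per iteration; with many features, each a batch of sample episodes, the cost explodes, which is the impracticality the slide warns about.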
More informationRenaissance Learning 32 Harbour Exchange Square London, E14 9GE +44 (0)
Maths Pretest Instructions It is extremely important that you follow standard testing procedures when you administer the STAR Maths test to your students. Before you begin testing, please check the following:
More informationMulti-genre Writing Assignment
Multi-genre Writing Assignment for Peter and the Starcatchers Context: The following is an outline for the culminating project for the unit on Peter and the Starcatchers. This is a multi-genre project.
More informationLearning Structural Correspondences Across Different Linguistic Domains with Synchronous Neural Language Models
Learning Structural Correspondences Across Different Linguistic Domains with Synchronous Neural Language Models Stephan Gouws and GJ van Rooyen MIH Medialab, Stellenbosch University SOUTH AFRICA {stephan,gvrooyen}@ml.sun.ac.za
More informationProbability and Statistics Curriculum Pacing Guide
Unit 1 Terms PS.SPMJ.3 PS.SPMJ.5 Plan and conduct a survey to answer a statistical question. Recognize how the plan addresses sampling technique, randomization, measurement of experimental error and methods
More informationa) analyse sentences, so you know what s going on and how to use that information to help you find the answer.
Tip Sheet I m going to show you how to deal with ten of the most typical aspects of English grammar that are tested on the CAE Use of English paper, part 4. Of course, there are many other grammar points
More informationA Case Study: News Classification Based on Term Frequency
A Case Study: News Classification Based on Term Frequency Petr Kroha Faculty of Computer Science University of Technology 09107 Chemnitz Germany kroha@informatik.tu-chemnitz.de Ricardo Baeza-Yates Center
More informationTruth Inference in Crowdsourcing: Is the Problem Solved?
Truth Inference in Crowdsourcing: Is the Problem Solved? Yudian Zheng, Guoliang Li #, Yuanbing Li #, Caihua Shan, Reynold Cheng # Department of Computer Science, Tsinghua University Department of Computer
More informationPurdue Data Summit Communication of Big Data Analytics. New SAT Predictive Validity Case Study
Purdue Data Summit 2017 Communication of Big Data Analytics New SAT Predictive Validity Case Study Paul M. Johnson, Ed.D. Associate Vice President for Enrollment Management, Research & Enrollment Information
More informationA Neural Network GUI Tested on Text-To-Phoneme Mapping
A Neural Network GUI Tested on Text-To-Phoneme Mapping MAARTEN TROMPPER Universiteit Utrecht m.f.a.trompper@students.uu.nl Abstract Text-to-phoneme (T2P) mapping is a necessary step in any speech synthesis
More informationYou re not a princess... But you can still rule the world.
5801 Fegenbuh Lane Louiville, Kentucky 40228 Non-Profit Org. U.S. Potage PAID Lebanon Jct., Ky. Permit No. 738 Spring 2014 Return Service Requeted You re not a prince... But you can till rule the world.
More informationIntroduction to Ensemble Learning Featuring Successes in the Netflix Prize Competition
Introduction to Ensemble Learning Featuring Successes in the Netflix Prize Competition Todd Holloway Two Lecture Series for B551 November 20 & 27, 2007 Indiana University Outline Introduction Bias and
More informationCS 446: Machine Learning
CS 446: Machine Learning Introduction to LBJava: a Learning Based Programming Language Writing classifiers Christos Christodoulopoulos Parisa Kordjamshidi Motivation 2 Motivation You still have not learnt
More informationDIGITAL GAMING & INTERACTIVE MEDIA BACHELOR S DEGREE. Junior Year. Summer (Bridge Quarter) Fall Winter Spring GAME Credits.
DIGITAL GAMING & INTERACTIVE MEDIA BACHELOR S DEGREE Sample 2-Year Academic Plan DRAFT Junior Year Summer (Bridge Quarter) Fall Winter Spring MMDP/GAME 124 GAME 310 GAME 318 GAME 330 Introduction to Maya
More informationPREPARATION GUIDE FOR LEVEL 1 REGULATORY EXAMINATIONS Applicants and/or Key Individuals in Category III (RE4)
PREPARATION GUIDE FOR LEVEL 1 REGULATORY EXAMINATION Applicants and/or ey Individuals in Category III (RE4) Table of Contents 1. DICLAIMER... 2 2. BACGROUND TO THE REGULATORY EXAM... 2 3. FORMAT OF THE
More informationarxiv: v2 [cs.ro] 3 Mar 2017
Learning Feedback Terms for Reactive Planning and Control Akshara Rai 2,3,, Giovanni Sutanto 1,2,, Stefan Schaal 1,2 and Franziska Meier 1,2 arxiv:1610.03557v2 [cs.ro] 3 Mar 2017 Abstract With the advancement
More informationCS 100: Principles of Computing
CS 100: Principles of Computing Kevin Molloy August 29, 2017 1 Basic Course Information 1.1 Prerequisites: None 1.2 General Education Fulfills Mason Core requirement in Information Technology (ALL). 1.3
More informationICTCM 28th International Conference on Technology in Collegiate Mathematics
DEVELOPING DIGITAL LITERACY IN THE CALCULUS SEQUENCE Dr. Jeremy Brazas Georgia State University Department of Mathematics and Statistics 30 Pryor Street Atlanta, GA 30303 jbrazas@gsu.edu Dr. Todd Abel
More informationOutline for Session III
Outline for Session III Before you begin be sure to have the following materials Extra JM cards Extra blank break-down sheets Extra proposal sheets Proposal reports Attendance record Be at the meeting
More informationRunning Head: STUDENT CENTRIC INTEGRATED TECHNOLOGY
SCIT Model 1 Running Head: STUDENT CENTRIC INTEGRATED TECHNOLOGY Instructional Design Based on Student Centric Integrated Technology Model Robert Newbury, MS December, 2008 SCIT Model 2 Abstract The ADDIE
More informationPlanning for Preassessment. Kathy Paul Johnston CSD Johnston, Iowa
Planning for Preassessment Kathy Paul Johnston CSD Johnston, Iowa Why Plan? Establishes the starting point for learning Students can t learn what they already know Match instructional strategies to individual
More informationLaboratorio di Intelligenza Artificiale e Robotica
Laboratorio di Intelligenza Artificiale e Robotica A.A. 2008-2009 Outline 2 Machine Learning Unsupervised Learning Supervised Learning Reinforcement Learning Genetic Algorithms Genetics-Based Machine Learning
More informationThis map-tastic middle-grade story from Andrew Clements gives the phrase uncharted territory a whole new meaning!
A Curriculum Guide to The Map Trap By Andrew Clements About the Book This map-tastic middle-grade story from Andrew Clements gives the phrase uncharted territory a whole new meaning! Alton Barnes loves
More information