Imitation Learning Using Graphical Models
Deepak Verma and Rajesh P.N. Rao
Dept. of Computer Science & Engineering, University of Washington, Seattle, WA, USA
http://neural.cs.washington.edu/

Abstract. Imitation-based learning is a general mechanism for rapid acquisition of new behaviors in autonomous agents and robots. In this paper, we propose a new approach to learning by imitation based on parameter learning in probabilistic graphical models. Graphical models are used not only to model an agent's own dynamics but also the dynamics of an observed teacher. Parameter tying between the agent-teacher models ensures consistency and facilitates learning. Given only observations of the teacher's states, we use the expectation-maximization (EM) algorithm to learn both dynamics and policies within graphical models. We present results demonstrating that EM-based imitation learning outperforms pure exploration-based learning on a benchmark problem (the FlagWorld domain). We additionally show that the graphical model representation can be leveraged to incorporate domain knowledge (e.g., state space factoring) to achieve significant speed-up in learning.

1 Introduction

Learning by imitation is a general mechanism for rapidly acquiring new skills or behaviors in humans and robots. Several approaches to imitation have previously been proposed (e.g., [1,2]). Many of these treat the problem of imitation as trajectory-following, where the goal is to follow the teacher's trajectory as closely as possible. However, imitation often involves the need to infer intentions and goals, which introduces considerable uncertainty into the problem, besides the uncertainty already existing in the observation process and in the environment. Previous models of imitation have typically not been probabilistic and are therefore not geared towards handling uncertainty. There have been some recent efforts in modeling goal-based imitation [3], but these either assume that the dynamics of the environment are given or need to learn the dynamics using a time-consuming exploration stage.
A different approach to imitation is based on ideas from the field of Reinforcement Learning (RL) [4]. In reinforcement learning, the agent is assumed to receive rewards in certain states, and the agent's goal is to learn a state-to-action mapping ("policy") that maximizes the total future expected reward. Solving an RL problem is computationally hard for a variety of reasons: (1) the state space is often exponential in the number of attributes, and (2) for

J.N. Kok et al. (Eds.): ECML 2007, LNAI 4701, pp. 757-764, 2007. © Springer-Verlag Berlin Heidelberg 2007
uncertain environments with large state spaces, the agent needs to perform a large amount of exploration to learn a model of the environment before learning a good policy. These problems can be ameliorated by using imitation [5] (or apprenticeship [6]), where a teacher exhibits the optimal behavior that is observed by the student, or the teacher guides the student to the most important states for exploration. Price and Boutilier formulate this in the RL framework as Implicit Imitation [7], in which the student learns the dynamics of the environment by passively observing the teacher, without any explicit communication regarding what actions to take. This speeds up the learning of policies. However, these approaches rely on knowing or inferring an explicit reward function in the environment, which may not always be available or easy to infer.

In this paper, we propose a new approach to imitation that is based on probabilistic Graphical Models (GMs). We pose the problem of imitation learning as learning the parameters of the underlying GM for the mentor's and observer's behavior (we use the terms mentor/teacher and observer/student interchangeably in this paper). To facilitate the transfer of knowledge from mentor to observer, we tie the parameters of the dynamics for the mentor with those of the observer, and update the observer's policy using the learned mentor policy. Parameters are learned using the expectation-maximization (EM) algorithm for learning in GMs from partial data. Our approach provides a principled approach to imitation based completely on an internal GM representation, allowing us to leverage the growing number of efficient inference and learning techniques for GMs.

2 Graphical Models for Imitation

Notation: We use capital letters for variables and lower case letters to denote specific instances. We assume there are two agents, the observer A^o and the mentor A^m, operating in the environment.¹ Let Ω_S be the set of states in the environment and Ω_A the set of all possible actions available to the agent (both finite). At time t, the agent is in state S_t and executes action A_t.
The agent's state changes in a stochastic manner given by the transition probability P(S_{t+1} | S_t, A_t), which is assumed to be independent of t, i.e., P(S_{t+1} = s' | S_t = s, A_t = a) = τ_{s'sa}. When obvious from context, we use s for S_t = s and a for A_t = a, etc. For each state s and action a, there is a real-valued reward R^m(s, a) for the mentor (R^o(s, a) for the observer) associated with being in state s and executing the action a (with negative values denoting undesirable states or the cost of the action). The parameters described above define a Markov Decision Process (MDP) [9]. Solving an MDP typically involves computing an optimal policy a = π(s) that maximizes total expected future reward (either a finite-

¹ We use the superscript to distinguish the two agents and omit it for common variables (e.g., dynamics of the environment). For simplicity of exposition, we assume that agents operate (non-interactively) in the same environment. However, as discussed in [8], this assumption is not essential and one can apply the techniques discussed here to the more general setting where observer and mentor(s) have different action and state spaces.
horizon cumulative reward or a discounted infinite-horizon cumulative reward) when action a is executed in state s. In a typical Reinforcement Learning problem, the dynamics and the reward function are not known, and one cannot therefore compute an optimal policy directly. One can learn both these functions by exploration, but this requires the agent to execute a large number of exploration steps before an optimal policy can be computed. Learning can be greatly sped up via implicit imitation [7], which involves an agent (the observer) observing another agent (the mentor) who has similar goals. The main idea is to allow the agent to quickly learn the parameters in the relevant portion of the state space, thereby cutting down on the exploration required to compute a near-optimal policy. We assume that the mentor follows a stationary policy π^m(s) which defines its behavior completely. The observer is only able to observe the sequence of states that the mentor has been in (S^m_{1:t}) and not the actions: this is important because some of the most useful forms of imitation learning are those in which the teacher's actions are not available, e.g., when a robot must learn by watching a human; in such a scenario, the robot can observe body poses but has no access to the human's actions (muscle or motor commands). The task of the observer is then to compute the best estimate of the dynamics τ̂ and mentor policy π̂^m, given its own history S^o_{1:t}, A^o_{1:t} and the mentor's state history S^m_{1:t}. Note that π^m can be completely independent of the observer's reward function R^o: in fact, the problem as formulated above does not require the introduction of a reward function at all. The goal is simply to imitate the mentor by estimating and executing the mentor's policy. In the special case where the mentor is optimizing the same reward function as the observer, π^m becomes the optimal MDP policy.
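The components just defined, the time-independent dynamics τ_{s'sa} and a stationary policy π^m(a|s), can be made concrete with a small sketch. The sizes and the randomly drawn parameters below are illustrative toys, not the paper's domain; both are stored as normalized tables and a state trajectory is generated by ancestral sampling, recording only what the observer can see.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 5, 4          # toy sizes, not the FlagWorld's

# tau[s2, s, a] = P(S_{t+1} = s2 | S_t = s, A_t = a), independent of t.
tau = np.moveaxis(
    rng.dirichlet(np.ones(n_states), size=(n_states, n_actions)), -1, 0)
# pi_m[a, s] = P(A_t = a | S_t = s): a stationary, mentor-style policy.
pi_m = np.moveaxis(rng.dirichlet(np.ones(n_actions), size=n_states), -1, 0)

def sample_states(start, T):
    """Ancestral sampling, one slice per time step; only the states
    are recorded, mirroring what the observer sees of the mentor."""
    states, s = [start], start
    for _ in range(T - 1):
        a = rng.choice(n_actions, p=pi_m[:, s])   # hidden action
        s = rng.choice(n_states, p=tau[:, s, a])  # stochastic transition
        states.append(s)
    return states

traj = sample_states(start=0, T=10)
assert len(traj) == 10 and all(0 <= s < n_states for s in traj)
```

Each column of `tau` over the next-state axis is a proper distribution, so `tau.sum(axis=0)` is everywhere 1; the same holds for `pi_m` over actions.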
Note that since the observer cannot see the actions that the mentor took and the transition parameters are not given, the problem is different from other approaches which speed up RL via imitation [8,10].

2.1 Generative Graphical Model

Both the mentor and the observer are solving an MDP. One key observation we make is that, given the mentor policy, the action choice and dynamics can be modeled easily using a generative model based on the well-known graphical model for an MDP, shown in Fig. 1(a). One does not need to know the mentor's reward model, as π^m completely explains the observed mentor state sequence. The figure shows the 2-slice representation of the Dynamic Bayesian Network (DBN) used to model the imitation problem. Since we are assuming that the two agents are operating in the same environment, they have the same transition parameters (τ^m = τ^o = τ). Note that the two graphical models (for the mentor and observer, respectively) are disconnected, as the two agents are non-interacting. The mentor's actions are guided by the optimal mentor policy P(A^m_t = a | S^m_t = s) = π^m(a|s) and the observer's actions by the policy P(A^o_t = a | S^o_t = s) = π^o_t(a|s). Unlike the mentor, the observer updates its policy over time (hence the subscript t on π^o_t). We require only the mentor to have a stationary policy. The mentor observations s^m_{1:T} are generated by sampling the DBN. In our
Fig. 1. Model and Domain for Imitation. (a) Graphical Model Representation for Imitation (mentor and observer DBNs with tied parameters). (b) FlagWorld Domain.

experiments, when a goal state is reached, we jump to the start state in the next step. T thus represents the total number of steps taken by the agent, which could span multiple episodes of reaching a goal state.

3 Imitation Via Parameter Learning

Our approach to imitation is based on estimating the unknown parameters θ = (τ, π^m) of the graphical model in Fig. 1(a) given the observed data as evidence, i.e.,

θ̂ = (τ̂, π̂^m) = argmax_θ P(θ | s^m_{1:T}, s^o_{1:T}, a^o_{1:T}).

Note that the evidence does not include the mentor actions A^m_{1:T}. This means that the data is incomplete, as not all nodes of the graphical model are observed. A well-known approach to learning the parameters of a GM from incomplete data [11] is to use the expectation-maximization (EM) algorithm [12]. Although any parameter learning method could be used, we use EM in the present study since it is a general-purpose, well-understood algorithm widely used in machine learning. The EM algorithm involves starting with an initial estimate θ (chosen randomly or incorporating any prior knowledge), which is then iteratively improved by performing the following two steps:

Expectation: The current set of parameters θ^i is used to compute a distribution (expectation) over the hidden nodes: h(a^m_{1:T}) = P(a^m_{1:T} | θ^i, s^m_{1:T}, s^o_{1:T}, a^o_{1:T}). This allows the expected sufficient statistics to be computed for the complete data set.

Maximization: The distribution h is then used to compute the new parameters θ^{i+1}, which maximize the (expected) log-likelihood of the evidence:

θ^{i+1} = argmax_θ Σ_{a^m_{1:T}} h(a^m_{1:T}) log P(s^m_{1:T}, a^m_{1:T}, s^o_{1:T}, a^o_{1:T} | θ)

When states and actions are discrete, the new estimate can be computed by simply using the expected counts. The two steps above are performed alternately
until convergence. The method is guaranteed to improve performance in each iteration, in that the incomplete log-likelihood of the data (log P(s^m_{1:T}, s^o_{1:T}, a^o_{1:T} | θ^i)) is guaranteed to increase in every iteration and converge to a local maximum [12]. We then use the estimate θ̂ to control the observer. In particular, the observer combines the learned mentor policy π̂^m with an exploration strategy to arrive at the policy π^o_t.

3.1 Parameter Learning Results

Domain: We tested our approach on a benchmark problem known as the FlagWorld domain [13], shown in Fig. 1(b). The agent's objective is to reach the goal state G starting from the state S and pick up a subset of the three flags located at states F1, F2 and F3. It receives a reward of 1 point for each flag picked up, but rewards are discounted by a factor of γ = 0.99 at each time step until the goal is reached; the latter constraint favors shortest paths to the goal. The environment is a standard maze environment used in RL [4], in that each action (N, E, S, W) takes the agent to the intended state with a high probability (0.9) and to a state perpendicular to the intended state with a small probability (0.1). Any probability mass going into a wall or outside the maze is assigned to the state in which the action was taken. This domain is interesting in that there are 264 states (33 locations, augmented with a boolean attribute for each flag picked up), resulting in a large number of parameters that need to be learned (the entries τ(s, a, :) and π^m(a|s) for every state-action pair). However, the optimal policy path is sparse and hence only a small subset of the parameters needs to be learned to compute a near-optimal policy, thereby making the domain ideal for demonstrating the utility of imitation as a means to speed up RL.

Exploration versus Exploitation: We used the ε-greedy method to trade off exploration of the domain with exploitation of the current learned policy: a random action is chosen with probability ε, with ε gradually decreased over time to favor exploration initially and exploitation of the learned policy in later time steps.
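In the discrete case, the E- and M-steps above reduce to expected-count updates: because the mentor's states are observed, the hidden mentor actions decouple across time, so the E-step is a per-step posterior over each action, and the M-step renormalizes the accumulated counts. The sketch below is a minimal illustration under that reading, with toy sizes, hypothetical function names, and illustrative ε-decay constants (none are the paper's); it also shows the observer combining the learned mentor policy with ε-greedy exploration.

```python
import numpy as np

rng = np.random.default_rng(1)
nS, nA = 5, 4                          # toy sizes, not the FlagWorld's

def em_step(pi_m, tau, s_m, s_o, a_o):
    """One EM iteration for the discrete imitation model.

    E-step: with states observed, P(a | s_t, s_{t+1}) is proportional to
    pi_m(a|s_t) * tau(s_{t+1}|s_t, a), independently at each step.
    M-step: normalize the expected counts. The observer's own history is
    complete data and also updates the tied dynamics tau."""
    c_pi = np.full((nA, nS), 1e-6)        # expected counts for pi_m(a|s)
    c_tau = np.full((nS, nS, nA), 1e-6)   # expected counts for tau(s2|s,a)
    for s, s2 in zip(s_m[:-1], s_m[1:]):          # mentor: soft counts
        w = pi_m[:, s] * tau[s2, s, :]
        w = w / w.sum()
        c_pi[:, s] += w
        c_tau[s2, s, :] += w
    for s, a, s2 in zip(s_o[:-1], a_o, s_o[1:]):  # observer: hard counts
        c_tau[s2, s, a] += 1.0
    return c_pi / c_pi.sum(0), c_tau / c_tau.sum(0)

def eps_greedy(pi_m_hat, s, t, eps0=1.0, decay=0.999):
    """Observer control: explore uniformly with probability eps (decaying
    in t), otherwise exploit the learned mentor policy."""
    if rng.random() < eps0 * decay ** t:
        return int(rng.integers(nA))
    return int(np.argmax(pi_m_hat[:, s]))

# Usage on random toy histories, starting from uniform parameters.
pi0 = np.full((nA, nS), 1 / nA)
tau0 = np.full((nS, nS, nA), 1 / nS)
s_m = rng.integers(nS, size=50)
s_o, a_o = rng.integers(nS, size=50), rng.integers(nA, size=50)
pi1, tau1 = em_step(pi0, tau0, s_m, s_o, a_o)
act = eps_greedy(pi1, s=0, t=10_000)
assert np.allclose(pi1.sum(0), 1.0) and np.allclose(tau1.sum(0), 1.0)
```

The small pseudo-counts (1e-6) keep unvisited state-action cells normalizable; iterating `em_step` to convergence corresponds to the batch EM procedure described in the text.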
Results: The results of EM-based learning are shown in Fig. 2(a) (averaged over 5 runs). The parameters were learned in batch mode with T increased in fixed increments, and the reward received over the last steps of each batch was reported. The average reward received is shown in the top right corner. Also shown are the error in the parameters (mean absolute difference w.r.t. the true parameters³), the log-likelihood of the learned parameters, and the value function of the start state under the current estimate of the observer policy, V_π̂o(S), w.r.t. the true transition parameters. The results show that the observer is able to learn the mentor policy to a high degree of accuracy, though not perfectly. The uncertain dynamics of the environment lead it to collect less reward than the mentor, as the optimal policy is not learned everywhere. An important point to note is that the error in

³ The error between uniformly random parameters and true parameters is 1.5 for π^m and 1.75 for τ.
Fig. 2. Imitation Learning Results for FlagWorld Domain. (a) (Clockwise) Error in parameters (mean absolute difference w.r.t. true parameters), average reward received, the log-likelihood of the learned parameters, and value function of the start state V_π̂o(S) w.r.t. the true transition parameters. (b) Comparison of the learned policy (ParamImi) with some popular exploration techniques (measured in terms of average discounted reward obtained). ParamImi outperforms all the pure exploration-based methods.

parameters is still quite high even when the observer policy is quite good, thereby confirming the intuition that only a small (relevant) subset of the parameters needs to be learned well before the agent can start exploiting a learned policy. Figure 2(b) compares the relative quality of the learned policy with a number of pure exploration-based techniques used in [13]. The bars represent the average discounted reward obtained in the 2nd stage, i.e., in the steps following an initial 1st stage of exploration. For ParamImi (our algorithm), the average is taken after far fewer steps of exploration. The rightmost bar is the Mentor value. As can be seen, ParamImi outperforms all the exploration strategies with far less experience.

3.2 Factored Graphical Model

A major advantage of using a graphical models-based approach to imitation is the ability to leverage domain knowledge to speed up learning. For example, the number of true parameters in the FlagWorld is actually much smaller than the number that was learned in the previous section, since there are only 33 locations for which the transition parameters need to be learned: the dynamics are the same irrespective of which flags have been picked up.
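The size of the saving is easy to count from the domain description (33 locations, 4 actions, 3 boolean flag bits, so 33 × 2³ = 264 flat states); the arithmetic below is a quick check, not a measurement from the paper:

```python
# Parameter counting for the FlagWorld dynamics table.
n_loc, n_act, n_flags = 33, 4, 3
n_states = n_loc * 2 ** n_flags         # 33 * 8 = 264 flat states

flat = n_states * n_act * n_states      # entries in tau(s, a, s')
factored = n_loc * n_act * n_loc        # entries in tau(l, a, l')

print(n_states, flat, factored, flat // factored)
# -> 264 278784 4356 64
```

Factoring shrinks the transition table 64-fold, i.e. by (2³)², since both the current and next state drop their flag bits.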
To reflect this fact, we can factor the mentor state S^m into a location L^m and a flag-status variable ("Picked Flag") PF^m, as shown in Fig. 3(a) (and similarly for the observer). This reduces the number of transition parameters significantly (from τ_{s'sa} to τ_{l'la}).
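Under this factoring, the flag-status variable PF changes deterministically with location: entering a flag's cell sets the corresponding bit, and any other cell copies PF forward. A minimal sketch (the flag-cell indices are hypothetical placeholders, not the actual maze positions):

```python
# Deterministic update of PF, a 3-bit mask over picked flags.
FLAG_CELLS = {7: 0, 18: 1, 29: 2}    # hypothetical location -> flag index

def next_pf(pf: int, loc_next: int) -> int:
    """The single value that P(PF_{t+1} | L_{t+1}, PF_t) assigns
    probability 1: set bit i on flag i's cell, else copy pf."""
    i = FLAG_CELLS.get(loc_next)
    return pf if i is None else pf | (1 << i)

assert next_pf(0b000, 18) == 0b010   # picking up flag F2
assert next_pf(0b010, 18) == 0b010   # an already-picked flag stays picked
assert next_pf(0b010, 3) == 0b010    # non-flag cell: PF copied forward
```

Because the update is a delta function, it contributes no free parameters to learn; only the location dynamics remain.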
We can incorporate domain knowledge about the flags by defining the CPT P(PF_{t+1} | L_{t+1}, PF_t) as:⁴

P(PF_{t+1} | L_{t+1}, PF_t) = δ(PF_{t+1}, pf(PF_t, i))   if L_{t+1} = F_i
                             = δ(PF_{t+1}, PF_t)          otherwise

where pf(PF_t, i) is the deterministic function which maps the old value of PF to one in which the i-th flag is picked up.

Fig. 3. Fast Learning using Factored Graphical Models. (a) Factored model for FlagWorld (only the mentor model is shown). (b) Results using the factored model. Note the speed-up in learning w.r.t. the unfactored case (Fig. 2(a)).

The results of EM-based parameter learning for the factored graphical model are shown in Fig. 3(b). As expected, the error in the transition parameters goes down much more rapidly than in the unfactored case (compare with Fig. 2(a)).

4 Conclusion

This paper introduces a new framework for learning by imitation based on modeling the imitation process in terms of probabilistic graphical models. Imitative policies are learned in a principled manner using the expectation-maximization (EM) algorithm. The model achieves transfer of knowledge by tying the parameters for the mentor's dynamics with those of the observer. Our results⁵ demonstrate that the mentor's policy can be estimated directly from observations of

⁴ This is a common trick used in GMs to encode deterministic domain knowledge.
⁵ Additional results are presented in the extended version of the paper available at http://neural.cs.washington.edu/. In particular, we show how learning can be further sped up by incorporating reward information collected on the way. Also, we demonstrate the generality of parameter learning by extending the graphical model to learn task-oriented policies.
the mentor's state sequences, and that significant speed-up in learning can be achieved by exploiting the graphical models framework to factor the state space in accordance with domain knowledge. Our current work is focused on testing the approach more exhaustively, especially in the context of robotic imitation. Not only do graphical models provide a computationally efficient framework for general imitation, they are also being used for modeling behavior [14]. An exciting prospect of using graphical models for imitation is the ease of extension to models with more abstraction, including partially observable, hierarchical, and relational models.

Acknowledgments

This material is based upon work supported by ONR, the Packard Foundation, and NSF grants.

References

1. Schaal, S.: Is imitation learning the route to humanoid robots? Trends in Cognitive Sciences 3, 233-242 (1999)
2. Dautenhahn, K., Nehaniv, C.: Imitation in Animals and Artifacts. MIT Press, Cambridge, MA (2002)
3. Verma, D., Rao, R.P.N.: Goal-based imitation as probabilistic inference over graphical models. In: NIPS 18 (2006)
4. Sutton, R.S., Barto, A.: Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA (1998)
5. Atkeson, C.G., Schaal, S.: Robot learning from demonstration. In: Proc. 14th ICML, pp. 12-20 (1997)
6. Abbeel, P., Ng, A.Y.: Apprenticeship learning via inverse reinforcement learning. In: ICML (2004)
7. Price, B., Boutilier, C.: Accelerating reinforcement learning through implicit imitation. JAIR 19, 569-629 (2003)
8. Price, B., Boutilier, C.: A Bayesian approach to imitation in reinforcement learning. In: IJCAI (2003)
9. Boutilier, C., Dean, T., Hanks, S.: Decision-theoretic planning: Structural assumptions and computational leverage. JAIR 11, 1-94 (1999)
10. Ratliff, N.D., Bagnell, J.A., Zinkevich, M.A.: Maximum margin planning. In: ICML (2006)
11. Heckerman, D.: A tutorial on learning with Bayesian networks. Technical report, Microsoft Research, Redmond, Washington (1995)
12. Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm.
Journal of the Royal Statistical Society, Series B 39, 1-38 (1977)
13. Dearden, R., Friedman, N., Andre, D.: Model-based Bayesian Exploration. In: UAI-99, San Francisco, CA (1999)
14. Griffiths, T.L., Tenenbaum, J.B.: Structure and strength in causal induction. Cognitive Psychology 51(4), 334-384 (2005)
Constructing Parallel Corpus from Movie Subtitles Han Xiao 1 and Xiaojie Wang 2 1 School of Information Engineering, Beijing University of Post and Telecommunications artex.xh@gmail.com 2 CISTR, Beijing
More informationLearning Rules from Incomplete Examples via Implicit Mention Models
JMLR: Workshop and Conference Proceedings 20 (2011) 197 212 Asian Conference on Machine Learning Learning Rules from Incomplete Examples via Implicit Mention Models Janardhan Rao Doppa Mohammad Shahed
More informationWhat s in a Step? Toward General, Abstract Representations of Tutoring System Log Data
What s in a Step? Toward General, Abstract Representations of Tutoring System Log Data Kurt VanLehn 1, Kenneth R. Koedinger 2, Alida Skogsholm 2, Adaeze Nwaigwe 2, Robert G.M. Hausmann 1, Anders Weinstein
More informationLearning Methods for Fuzzy Systems
Learning Methods for Fuzzy Systems Rudolf Kruse and Andreas Nürnberger Department of Computer Science, University of Magdeburg Universitätsplatz, D-396 Magdeburg, Germany Phone : +49.39.67.876, Fax : +49.39.67.8
More informationCHANCERY SMS 5.0 STUDENT SCHEDULING
CHANCERY SMS 5.0 STUDENT SCHEDULING PARTICIPANT WORKBOOK VERSION: 06/04 CSL - 12148 Student Scheduling Chancery SMS 5.0 : Student Scheduling... 1 Course Objectives... 1 Course Agenda... 1 Topic 1: Overview
More informationA Game-based Assessment of Children s Choices to Seek Feedback and to Revise
A Game-based Assessment of Children s Choices to Seek Feedback and to Revise Maria Cutumisu, Kristen P. Blair, Daniel L. Schwartz, Doris B. Chin Stanford Graduate School of Education Please address all
More informationAn Estimating Method for IT Project Expected Duration Oriented to GERT
An Estimating Method for IT Project Expected Duration Oriented to GERT Li Yu and Meiyun Zuo School of Information, Renmin University of China, Beijing 100872, P.R. China buaayuli@mc.e(iuxn zuomeiyun@263.nct
More informationAn investigation of imitation learning algorithms for structured prediction
JMLR: Workshop and Conference Proceedings 24:143 153, 2012 10th European Workshop on Reinforcement Learning An investigation of imitation learning algorithms for structured prediction Andreas Vlachos Computer
More informationA Bayesian Model of Imitation in Infants and Robots
To appear in: Imitation and Social Learning in Robots, Humans, and Animals: Behavioural, Social and Communicative Dimensions, K. Dautenhahn and C. Nehaniv (eds.), Cambridge University Press, 2004. A Bayesian
More informationAUTOMATIC DETECTION OF PROLONGED FRICATIVE PHONEMES WITH THE HIDDEN MARKOV MODELS APPROACH 1. INTRODUCTION
JOURNAL OF MEDICAL INFORMATICS & TECHNOLOGIES Vol. 11/2007, ISSN 1642-6037 Marek WIŚNIEWSKI *, Wiesława KUNISZYK-JÓŹKOWIAK *, Elżbieta SMOŁKA *, Waldemar SUSZYŃSKI * HMM, recognition, speech, disorders
More informationLearning Structural Correspondences Across Different Linguistic Domains with Synchronous Neural Language Models
Learning Structural Correspondences Across Different Linguistic Domains with Synchronous Neural Language Models Stephan Gouws and GJ van Rooyen MIH Medialab, Stellenbosch University SOUTH AFRICA {stephan,gvrooyen}@ml.sun.ac.za
More informationGraphical Data Displays and Database Queries: Helping Users Select the Right Display for the Task
Graphical Data Displays and Database Queries: Helping Users Select the Right Display for the Task Beate Grawemeyer and Richard Cox Representation & Cognition Group, Department of Informatics, University
More informationA Reinforcement Learning Variant for Control Scheduling
A Reinforcement Learning Variant for Control Scheduling Aloke Guha Honeywell Sensor and System Development Center 3660 Technology Drive Minneapolis MN 55417 Abstract We present an algorithm based on reinforcement
More informationSemi-Supervised GMM and DNN Acoustic Model Training with Multi-system Combination and Confidence Re-calibration
INTERSPEECH 2013 Semi-Supervised GMM and DNN Acoustic Model Training with Multi-system Combination and Confidence Re-calibration Yan Huang, Dong Yu, Yifan Gong, and Chaojun Liu Microsoft Corporation, One
More informationAn Evaluation of E-Resources in Academic Libraries in Tamil Nadu
An Evaluation of E-Resources in Academic Libraries in Tamil Nadu 1 S. Dhanavandan, 2 M. Tamizhchelvan 1 Assistant Librarian, 2 Deputy Librarian Gandhigram Rural Institute - Deemed University, Gandhigram-624
More informationBAUM-WELCH TRAINING FOR SEGMENT-BASED SPEECH RECOGNITION. Han Shu, I. Lee Hetherington, and James Glass
BAUM-WELCH TRAINING FOR SEGMENT-BASED SPEECH RECOGNITION Han Shu, I. Lee Hetherington, and James Glass Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology Cambridge,
More informationRule discovery in Web-based educational systems using Grammar-Based Genetic Programming
Data Mining VI 205 Rule discovery in Web-based educational systems using Grammar-Based Genetic Programming C. Romero, S. Ventura, C. Hervás & P. González Universidad de Córdoba, Campus Universitario de
More informationLearning Methods in Multilingual Speech Recognition
Learning Methods in Multilingual Speech Recognition Hui Lin Department of Electrical Engineering University of Washington Seattle, WA 98125 linhui@u.washington.edu Li Deng, Jasha Droppo, Dong Yu, and Alex
More informationTask Completion Transfer Learning for Reward Inference
Task Completion Transfer Learning for Reward Inference Layla El Asri 1,2, Romain Laroche 1, Olivier Pietquin 3 1 Orange Labs, Issy-les-Moulineaux, France 2 UMI 2958 (CNRS - GeorgiaTech), France 3 University
More informationSession Six: Software Evaluation Rubric Collaborators: Susan Ferdon and Steve Poast
EDTECH 554 (FA10) Susan Ferdon Session Six: Software Evaluation Rubric Collaborators: Susan Ferdon and Steve Poast Task The principal at your building is aware you are in Boise State's Ed Tech Master's
More informationSeminar - Organic Computing
Seminar - Organic Computing Self-Organisation of OC-Systems Markus Franke 25.01.2006 Typeset by FoilTEX Timetable 1. Overview 2. Characteristics of SO-Systems 3. Concern with Nature 4. Design-Concepts
More informationDiscriminative Learning of Beam-Search Heuristics for Planning
Discriminative Learning of Beam-Search Heuristics for Planning Yuehua Xu School of EECS Oregon State University Corvallis,OR 97331 xuyu@eecs.oregonstate.edu Alan Fern School of EECS Oregon State University
More informationSan Francisco County Weekly Wages
San Francisco County Weekly Wages Focus on Post-Recession Recovery Q 3 205 Update Produced by: Marin Economic Consulting March 6, 206 Jon Haveman, Principal 45-336-5705 or Jon@MarinEconomicConsulting.com
More informationRedirected Inbound Call Sampling An Example of Fit for Purpose Non-probability Sample Design
Redirected Inbound Call Sampling An Example of Fit for Purpose Non-probability Sample Design Burton Levine Karol Krotki NISS/WSS Workshop on Inference from Nonprobability Samples September 25, 2017 RTI
More informationSemi-supervised methods of text processing, and an application to medical concept extraction. Yacine Jernite Text-as-Data series September 17.
Semi-supervised methods of text processing, and an application to medical concept extraction Yacine Jernite Text-as-Data series September 17. 2015 What do we want from text? 1. Extract information 2. Link
More informationCollege Pricing and Income Inequality
College Pricing and Income Inequality Zhifeng Cai U of Minnesota, Rutgers University, and FRB Minneapolis Jonathan Heathcote FRB Minneapolis NBER Income Distribution, July 20, 2017 The views expressed
More informationApproaches to control phenomena handout Obligatory control and morphological case: Icelandic and Basque
Approaches to control phenomena handout 6 5.4 Obligatory control and morphological case: Icelandic and Basque Icelandinc quirky case (displaying properties of both structural and inherent case: lexically
More informationBootstrapping Personal Gesture Shortcuts with the Wisdom of the Crowd and Handwriting Recognition
Bootstrapping Personal Gesture Shortcuts with the Wisdom of the Crowd and Handwriting Recognition Tom Y. Ouyang * MIT CSAIL ouyang@csail.mit.edu Yang Li Google Research yangli@acm.org ABSTRACT Personal
More informationAMULTIAGENT system [1] can be defined as a group of
156 IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS PART C: APPLICATIONS AND REVIEWS, VOL. 38, NO. 2, MARCH 2008 A Comprehensive Survey of Multiagent Reinforcement Learning Lucian Buşoniu, Robert Babuška,
More informationLEt s GO! Workshop Creativity with Mockups of Locations
LEt s GO! Workshop Creativity with Mockups of Locations Tobias Buschmann Iversen 1,2, Andreas Dypvik Landmark 1,3 1 Norwegian University of Science and Technology, Department of Computer and Information
More informationCued Recall From Image and Sentence Memory: A Shift From Episodic to Identical Elements Representation
Journal of Experimental Psychology: Learning, Memory, and Cognition 2006, Vol. 32, No. 4, 734 748 Copyright 2006 by the American Psychological Association 0278-7393/06/$12.00 DOI: 10.1037/0278-7393.32.4.734
More informationClass-Discriminative Weighted Distortion Measure for VQ-Based Speaker Identification
Class-Discriminative Weighted Distortion Measure for VQ-Based Speaker Identification Tomi Kinnunen and Ismo Kärkkäinen University of Joensuu, Department of Computer Science, P.O. Box 111, 80101 JOENSUU,
More informationDifferent Requirements Gathering Techniques and Issues. Javaria Mushtaq
835 Different Requirements Gathering Techniques and Issues Javaria Mushtaq Abstract- Project management is now becoming a very important part of our software industries. To handle projects with success
More informationLaboratorio di Intelligenza Artificiale e Robotica
Laboratorio di Intelligenza Artificiale e Robotica A.A. 2008-2009 Outline 2 Machine Learning Unsupervised Learning Supervised Learning Reinforcement Learning Genetic Algorithms Genetics-Based Machine Learning
More informationOn the Combined Behavior of Autonomous Resource Management Agents
On the Combined Behavior of Autonomous Resource Management Agents Siri Fagernes 1 and Alva L. Couch 2 1 Faculty of Engineering Oslo University College Oslo, Norway siri.fagernes@iu.hio.no 2 Computer Science
More informationCOMPUTATIONAL COMPLEXITY OF LEFT-ASSOCIATIVE GRAMMAR
COMPUTATIONAL COMPLEXITY OF LEFT-ASSOCIATIVE GRAMMAR ROLAND HAUSSER Institut für Deutsche Philologie Ludwig-Maximilians Universität München München, West Germany 1. CHOICE OF A PRIMITIVE OPERATION The
More informationReinForest: Multi-Domain Dialogue Management Using Hierarchical Policies and Knowledge Ontology
ReinForest: Multi-Domain Dialogue Management Using Hierarchical Policies and Knowledge Ontology Tiancheng Zhao CMU-LTI-16-006 Language Technologies Institute School of Computer Science Carnegie Mellon
More informationCSC200: Lecture 4. Allan Borodin
CSC200: Lecture 4 Allan Borodin 1 / 22 Announcements My apologies for the tutorial room mixup on Wednesday. The room SS 1088 is only reserved for Fridays and I forgot that. My office hours: Tuesdays 2-4
More informationA Model of Knower-Level Behavior in Number Concept Development
Cognitive Science 34 (2010) 51 67 Copyright Ó 2009 Cognitive Science Society, Inc. All rights reserved. ISSN: 0364-0213 print / 1551-6709 online DOI: 10.1111/j.1551-6709.2009.01063.x A Model of Knower-Level
More informationProbabilistic Latent Semantic Analysis
Probabilistic Latent Semantic Analysis Thomas Hofmann Presentation by Ioannis Pavlopoulos & Andreas Damianou for the course of Data Mining & Exploration 1 Outline Latent Semantic Analysis o Need o Overview
More informationDistant Supervised Relation Extraction with Wikipedia and Freebase
Distant Supervised Relation Extraction with Wikipedia and Freebase Marcel Ackermann TU Darmstadt ackermann@tk.informatik.tu-darmstadt.de Abstract In this paper we discuss a new approach to extract relational
More informationACTIVITY: Comparing Combination Locks
5.4 Compound Events outcomes of one or more events? ow can you find the number of possible ACIVIY: Comparing Combination Locks Work with a partner. You are buying a combination lock. You have three choices.
More informationGuru: A Computer Tutor that Models Expert Human Tutors
Guru: A Computer Tutor that Models Expert Human Tutors Andrew Olney 1, Sidney D'Mello 2, Natalie Person 3, Whitney Cade 1, Patrick Hays 1, Claire Williams 1, Blair Lehman 1, and Art Graesser 1 1 University
More informationTaking Kids into Programming (Contests) with Scratch
Olympiads in Informatics, 2009, Vol. 3, 17 25 17 2009 Institute of Mathematics and Informatics, Vilnius Taking Kids into Programming (Contests) with Scratch Abdulrahman IDLBI Syrian Olympiad in Informatics,
More informationAn Introduction to Simio for Beginners
An Introduction to Simio for Beginners C. Dennis Pegden, Ph.D. This white paper is intended to introduce Simio to a user new to simulation. It is intended for the manufacturing engineer, hospital quality
More informationAcquiring Competence from Performance Data
Acquiring Competence from Performance Data Online learnability of OT and HG with simulated annealing Tamás Biró ACLC, University of Amsterdam (UvA) Computational Linguistics in the Netherlands, February
More informationUnsupervised Learning of Word Semantic Embedding using the Deep Structured Semantic Model
Unsupervised Learning of Word Semantic Embedding using the Deep Structured Semantic Model Xinying Song, Xiaodong He, Jianfeng Gao, Li Deng Microsoft Research, One Microsoft Way, Redmond, WA 98052, U.S.A.
More informationCorrective Feedback and Persistent Learning for Information Extraction
Corrective Feedback and Persistent Learning for Information Extraction Aron Culotta a, Trausti Kristjansson b, Andrew McCallum a, Paul Viola c a Dept. of Computer Science, University of Massachusetts,
More informationMath Hunt th November, Sodalitas de Mathematica St. Xavier s College, Maitighar Kathmandu, Nepal
Math Hunt-2017 11 th November, 2017 Sodalitas de Mathematica St. Xavier s College, Maitighar Kathmandu, Nepal SODALITAS DE MATHEMATICA To, Subject: Regarding Participation in Math Hunt-2017 Respected Sir/Madam,
More informationM55205-Mastering Microsoft Project 2016
M55205-Mastering Microsoft Project 2016 Course Number: M55205 Category: Desktop Applications Duration: 3 days Certification: Exam 70-343 Overview This three-day, instructor-led course is intended for individuals
More information