Imitation Learning Using Graphical Models


Deepak Verma and Rajesh P.N. Rao
Dept. of Computer Science & Engineering, University of Washington, Seattle, WA, USA
http://neural.cs.washington.edu/

J.N. Kok et al. (Eds.): ECML 2007, LNAI 4701, pp. 757-764, 2007. © Springer-Verlag Berlin Heidelberg 2007

Abstract. Imitation-based learning is a general mechanism for rapid acquisition of new behaviors in autonomous agents and robots. In this paper, we propose a new approach to learning by imitation based on parameter learning in probabilistic graphical models. Graphical models are used not only to model an agent's own dynamics but also the dynamics of an observed teacher. Parameter tying between the agent-teacher models ensures consistency and facilitates learning. Given only observations of the teacher's states, we use the expectation-maximization (EM) algorithm to learn both dynamics and policies within graphical models. We present results demonstrating that EM-based imitation learning outperforms pure exploration-based learning on a benchmark problem (the FlagWorld domain). We additionally show that the graphical model representation can be leveraged to incorporate domain knowledge (e.g., state space factoring) to achieve significant speed-up in learning.

1 Introduction

Learning by imitation is a general mechanism for rapidly acquiring new skills or behaviors in humans and robots. Several approaches to imitation have previously been proposed (e.g., [1,2]). Many of these treat the problem of imitation as trajectory-following, where the goal is to follow the teacher's trajectory as closely as possible. However, imitation often involves the need to infer intentions and goals, which introduces considerable uncertainty into the problem, beyond the uncertainty already present in the observation process and in the environment. Previous models of imitation have typically not been probabilistic and are therefore not geared towards handling uncertainty. There have been some recent efforts in modeling goal-based imitation [3], but these either assume that the dynamics of the environment are given or need to learn the dynamics through a time-consuming exploration stage.

A different approach to imitation is based on ideas from the field of Reinforcement Learning (RL) [4]. In reinforcement learning, the agent is assumed to receive rewards in certain states, and the agent's goal is to learn a state-to-action mapping ("policy") that maximizes the total future expected reward. Solving the RL problem is computationally hard for a variety of reasons: (1) the state space is often exponential in the number of attributes, and (2) for uncertain environments with large state spaces, the agent needs to perform a large amount of exploration to learn a model of the environment before learning a good policy.

These problems can be ameliorated by using imitation [5] (or apprenticeship [6]), where a teacher exhibits the optimal behavior that is observed by the student, or the teacher guides the student to the most important states for exploration. Price and Boutilier formulate this in the RL framework as Implicit Imitation [7], in which the student learns the dynamics of the environment by passively observing the teacher, without any explicit communication regarding what actions to take. This speeds up the learning of policies. However, these approaches rely on knowing or inferring an explicit reward function in the environment, which may not always be available or easy to infer.

In this paper, we propose a new approach to imitation that is based on probabilistic Graphical Models (GMs). We pose the problem of imitation learning as learning the parameters of the underlying GM for the mentor's and observer's behavior (we use the terms mentor/teacher and observer/student interchangeably in this paper). To facilitate the transfer of knowledge from mentor to observer, we tie the parameters of the mentor's dynamics with those of the observer, and update the observer's policy using the learned mentor policy. Parameters are learned using the expectation-maximization (EM) algorithm for learning in GMs from partial data. Our approach provides a principled approach to imitation based entirely on an internal GM representation, allowing us to leverage the growing number of efficient inference and learning techniques for GMs.

2 Graphical Models for Imitation

Notation: We use capital letters for variables and lower case letters to denote specific instances. We assume there are two agents, the observer A^o and the mentor A^m, operating in the environment.¹ Let Ω_S be the set of states in the environment and Ω_A the set of all possible actions available to the agent (both finite). At time t, the agent is in state S_t and executes action A_t. The agent's state changes in a stochastic manner given by the transition probability P(S_{t+1} | S_t, A_t), which is assumed to be independent of t, i.e., P(S_{t+1} = s' | S_t = s, A_t = a) = τ_{s'sa}. When obvious from context, we use s for S_t = s and a for A_t = a, etc. For each state s and action a, there is a real-valued reward R^m(s, a) for the mentor (R^o(s, a) for the observer) associated with being in state s and executing action a (with negative values denoting undesirable states or the cost of the action). The parameters described above define a Markov Decision Process (MDP) [9]. Solving an MDP typically involves computing an optimal policy a = π(s) that maximizes the total expected future reward (either a finite-horizon cumulative reward or a discounted infinite-horizon cumulative reward) when action a is executed in state s.

¹ We use the superscript to distinguish the two agents and omit it for common variables (e.g., dynamics of the environment). For simplicity of exposition, we assume that the agents operate (non-interactively) in the same environment. However, as discussed in [8], this assumption is not essential and one can apply the techniques discussed here to the more general setting where observer and mentor(s) have different action and state spaces.
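To make the tabular notation concrete, here is a minimal Python sketch (our illustration, not the authors' code; the array names tau and pi_m are hypothetical, and the sizes are chosen to match the FlagWorld domain described below):

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 264, 4  # FlagWorld sizes; placeholders for any finite MDP

# Transition model tau[s_next, s, a] = P(S_{t+1}=s_next | S_t=s, A_t=a);
# for each (s, a), the distribution over s_next sums to one.
tau = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # (s, a, s_next)
tau = np.moveaxis(tau, -1, 0)                                       # reorder to (s_next, s, a)

# Mentor policy pi_m[a, s] = P(A_t=a | S_t=s).
pi_m = rng.dirichlet(np.ones(n_actions), size=n_states).T           # (a, s)

def step(s, a):
    """Sample S_{t+1} ~ P(. | S_t=s, A_t=a) from the transition model."""
    return rng.choice(n_states, p=tau[:, s, a])
```

Here the random Dirichlet draws merely stand in for an initial (uninformed) estimate of τ and π^m; the point of the paper is how these tables are subsequently learned.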

In a typical Reinforcement Learning problem, the dynamics and the reward function are not known, and one therefore cannot compute an optimal policy directly. One can learn both of these functions by exploration, but this requires the agent to execute a large number of exploration steps before an optimal policy can be computed. Learning can be greatly sped up via implicit imitation [7], which involves an agent (the observer) observing another agent (the mentor) who has similar goals. The main idea is to allow the agent to quickly learn the parameters in the relevant portion of the state space, thereby cutting down on the exploration required to compute a near-optimal policy.

We assume that the mentor follows a stationary policy π^m(s) which defines its behavior completely. The observer is only able to observe the sequence of states the mentor has been in (S^m_{1:t}) and not the actions: this is important because some of the most useful forms of imitation learning are those in which the teacher's actions are not available, e.g., when a robot must learn by watching a human; in such a scenario, the robot can observe body poses but has no access to the human's actions (muscle or motor commands). The task of the observer is then to compute the best estimate of the dynamics τ̂ and the mentor policy π̂^m, given its own history S^o_{1:t}, A^o_{1:t} and the mentor's state history S^m_{1:t}. Note that π^m can be completely independent of the observer's reward function R^o: in fact, the problem as formulated above does not require the introduction of a reward function at all. The goal is simply to imitate the mentor by estimating and executing the mentor's policy. In the special case where the mentor is optimizing the same reward function as the observer, π^m becomes the optimal MDP policy. Note that since the observer cannot see the actions that the mentor took and the transition parameters are not given, the problem is different from other approaches which speed up RL via imitation [8,10].

2.1 Generative Graphical Model

Both the mentor and the observer are solving an MDP. One key observation we make is that, given the mentor policy, the action choice and dynamics can be modeled easily using a generative model based on the well-known graphical model for an MDP shown in Fig. 1(a). One does not need to know the mentor's reward model, as π^m completely explains the observed mentor state sequence. The figure shows the 2-slice representation of the Dynamic Bayesian Network (DBN) used to model the imitation problem. Since we are assuming that the two agents are operating in the same environment, they have the same transition parameters (τ^m = τ^o = τ). Note that the two graphical models (for the mentor and the observer, respectively) are disconnected, as the two agents are non-interacting. The mentor's actions are guided by the optimal mentor policy P(A^m_t = a | S^m_t = s) = π^m(a|s) and the observer's actions by the policy P(A^o_t = a | S^o_t = s) = π^o_t(a|s). Unlike the mentor, the observer updates its policy over time (hence the subscript t on π^o_t). We require only the mentor to have a stationary policy. The mentor observations s^m_{1:T} are generated by sampling the DBN. In our experiments, when a goal state is reached, we jump to the start state in the next step; T thus represents the total number of steps taken by the agent, which can span multiple episodes of reaching a goal state.
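As an illustration of this generative process, the following sketch performs ancestral sampling of the mentor chain (ours, not the authors' code; it reuses the hypothetical tau, pi_m, and rng from the previous sketch, and start/goals are placeholder state indices):

```python
def sample_mentor_states(T, start, goals, tau, pi_m, rng):
    """Generate a mentor state sequence s^m_{1:T} by ancestral sampling of the DBN.

    On reaching a goal state the agent jumps back to the start state at the
    next step, so a single sequence can span several episodes.
    """
    states = [start]
    for _ in range(T - 1):
        s = states[-1]
        if s in goals:                                     # episode boundary: restart
            states.append(start)
            continue
        a = rng.choice(len(pi_m), p=pi_m[:, s])            # A^m_t ~ pi^m(. | s)
        s_next = rng.choice(tau.shape[0], p=tau[:, s, a])  # S^m_{t+1} ~ tau
        states.append(s_next)
    return states  # the sampled actions are discarded: the observer never sees them
```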

Fig. 1. Model and Domain for Imitation. (a) Graphical model representation for imitation, with the transition parameters τ_{s'sa} tied between the mentor and observer chains. (b) The FlagWorld domain, with start state S, goal G, and flags F1, F2, F3.

3 Imitation Via Parameter Learning

Our approach to imitation is based on estimating the unknown parameters θ = (τ, π^m) of the graphical model in Fig. 1(a) given the observed data as evidence, i.e., θ̂ = (τ̂, π̂^m) = argmax_θ P(θ | s^m_{1:T}, s^o_{1:T}, a^o_{1:T}). Note that the evidence does not include the mentor actions A^m_{1:T}. This means that the data is incomplete, as not all nodes of the graphical model are observed. A well-known approach to learning the parameters of a GM from incomplete data [11] is to use the expectation-maximization (EM) algorithm [12]. Although any parameter learning method could be used, we use EM in the present study since it is a general-purpose, well-understood algorithm widely used in machine learning. The EM algorithm starts with an initial estimate θ⁰ (chosen randomly or incorporating any prior knowledge), which is then iteratively improved by performing the following two steps:

Expectation: The current set of parameters θ^i is used to compute a distribution (expectation) over the hidden nodes: h(a^m_{1:T}) = P(a^m_{1:T} | θ^i, s^m_{1:T}, s^o_{1:T}, a^o_{1:T}). This allows the expected sufficient statistics to be computed for the complete data set.

Maximization: The distribution h is then used to compute the new parameters θ^{i+1}, which maximize the (expected) log-likelihood of the evidence:

    θ^{i+1} = argmax_θ Σ_{a^m_{1:T}} h(a^m_{1:T}) log P(s^m_{1:T}, a^m_{1:T}, s^o_{1:T}, a^o_{1:T} | θ)

When states and actions are discrete, the new estimate can be computed by simply using the expected counts. The two steps above are performed alternately until convergence.
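In this particular DBN the E-step is especially simple: once the entire state sequence is observed, each hidden mentor action A^m_t is conditionally independent of the others, so its posterior is P(A^m_t = a | s_t, s_{t+1}) ∝ π^m(a|s_t) τ_{s_{t+1} s_t a}. The sketch below shows one EM iteration built on that factorization (our illustration, not the authors' code; it covers only the mentor chain, ignores episode-restart transitions for brevity, and assumes the tabular tau and pi_m arrays from the earlier sketches):

```python
import numpy as np

def em_step(states, tau, pi_m, eps=1e-12):
    """One EM iteration for the mentor chain with hidden actions.

    E-step: given the observed states, each hidden action decouples:
        P(A_t=a | s_t, s_{t+1}) is proportional to pi_m[a, s_t] * tau[s_{t+1}, s_t, a].
    M-step: re-estimate tau and pi_m from the expected counts.
    """
    S, A = tau.shape[0], pi_m.shape[0]
    c_tau = np.full((S, S, A), eps)          # expected transition counts (plus pseudo-counts)
    c_pi = np.full((A, S), eps)              # expected action counts
    for s, s_next in zip(states[:-1], states[1:]):
        w = pi_m[:, s] * tau[s_next, s, :]   # unnormalized posterior over A_t
        w /= w.sum()
        c_tau[s_next, s, :] += w
        c_pi[:, s] += w
    # Normalize the counts into conditional distributions.
    return (c_tau / c_tau.sum(axis=0, keepdims=True),
            c_pi / c_pi.sum(axis=0, keepdims=True))
```

The full model in the paper additionally includes the observer's own fully observed (s, a, s') triples, which contribute ordinary (non-expected) counts to the shared τ estimate via the parameter tying.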

The method is guaranteed to improve performance in each iteration, in that the incomplete log-likelihood of the data (log P(s^m_{1:T}, s^o_{1:T}, a^o_{1:T} | θ^i)) is guaranteed to increase in every iteration and converge to a local maximum [12]. We then use the estimate θ̂ to control the observer. In particular, the observer combines the learned mentor policy π̂^m with an exploration strategy to arrive at the policy π^o_t.

3.1 Parameter Learning Results

Domain: We tested our approach on a benchmark problem known as the FlagWorld domain [13], shown in Fig. 1(b). The agent's objective is to reach the goal state G, starting from the state S, and pick up a subset of the three flags located at states F1, F2, and F3. It receives a reward of 1 point for each flag picked up, but rewards are discounted by a factor of γ = 0.99 at each time step until the goal is reached; the latter constraint favors shortest paths to the goal. The environment is a standard maze environment used in RL [4], in that each action (N, E, S, W) takes the agent to the intended state with high probability (0.9) and to a state perpendicular to the intended one with small probability (0.1). The probability mass going into a wall or outside the maze is assigned to the state in which the action was taken. This domain is interesting in that there are 264 states (33 locations, augmented with a boolean attribute for each flag picked up), resulting in a large number of parameters to be learned (1056 state-action pairs for which τ(s, a, :) and π^m(a|s) need to be learned). However, the optimal policy path is sparse, and hence only a small subset of the parameters needs to be learned to compute a near-optimal policy, making the domain ideal for demonstrating the utility of imitation as a means of speeding up RL.

Exploration versus Exploitation: We used the ε-greedy method to trade off exploration of the domain against exploitation of the current learned policy: a random action is chosen with probability ε, with ε gradually decreased over time to favor exploration initially and exploitation of the learned policy in later time steps, as sketched below.
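A minimal sketch of this action-selection rule (ours, not the authors' code; the eps0/decay schedule is a hypothetical choice, as the paper does not specify its schedule, and numpy/rng are assumed from the earlier sketches):

```python
def epsilon_greedy_action(s, pi_hat, t, rng, eps0=1.0, decay=0.999):
    """Pick an action epsilon-greedily w.r.t. the learned policy estimate.

    With probability eps (decayed over time) explore uniformly at random;
    otherwise exploit the current policy estimate pi_hat[a, s].
    """
    n_actions = pi_hat.shape[0]
    eps = eps0 * decay ** t                  # gradually shift toward exploitation
    if rng.random() < eps:
        return int(rng.integers(n_actions))  # explore
    return int(np.argmax(pi_hat[:, s]))      # exploit
```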

Results: The results of EM-based learning are shown in Fig. 2(a) (averaged over 5 runs). The parameters were learned in a batch mode where T was increased in steps, and the reward received over the final steps was reported. The average reward received is shown in the top right corner. Also shown are the error in parameters (mean absolute difference w.r.t. the true parameters³), the log-likelihood of the learned parameters, and the value function of the start state under the current estimate of the observer policy, V_{π̂^o}(S), w.r.t. the true transition parameters. The results show that the observer is able to learn the mentor policy to a high degree of accuracy, though not perfectly. The uncertain dynamics of the environment lead it to collect less reward than the mentor, as the optimal policy is not learned everywhere. An important point to note is that the error in parameters is still quite high even when the observer policy is quite good, confirming the intuition that only a small (relevant) subset of parameters needs to be learned well before the agent can start exploiting a learned policy.

³ The error between uniformly random parameters and the true parameters is 1.5 for π^m and 1.75 for τ.

Fig. 2. Imitation Learning Results for the FlagWorld Domain. (a) (Clockwise) Error in parameters (mean absolute difference w.r.t. the true parameters), average reward received, log-likelihood of the learned parameters, and value function of the start state V_{π̂^o}(S) w.r.t. the true transition parameters. (b) Comparison of the learned policy (ParamImi) with some popular exploration techniques (measured in terms of average discounted reward obtained per step). ParamImi outperforms all the pure exploration-based methods.

Figure 2(b) compares the relative quality of the learned policy with a number of pure exploration-based techniques used in [13]. The bars represent the average discounted reward obtained per step in the second stage, i.e., in the steps following an initial first stage of exploration; for ParamImi (our algorithm), the average is taken after only a short initial period of exploration. The rightmost bar is the mentor value. As can be seen, ParamImi outperforms all the exploration strategies with far less experience.

3.2 Factored Graphical Model

A major advantage of using a graphical-models-based approach to imitation is the ability to leverage domain knowledge to speed up learning. For example, the number of true parameters in FlagWorld is actually much smaller than the number learned in the previous section, since there are only 33 locations for which transition parameters need to be learned: the dynamics are the same irrespective of which flags have been picked up. To reflect this fact, we can factor the mentor state S^m_t into a location L^m_t and a flag status variable Picked Flag PF^m_t, as shown in Fig. 3(a) (and similarly for the observer). This reduces the number of transition parameters significantly (from τ_{s'sa} to τ_{l'la}).

We can incorporate domain knowledge about the flags by defining the CPT P(PF_{t+1} | L_{t+1}, PF_t) as

    P(PF_{t+1} | L_{t+1}, PF_t) = δ(PF_{t+1}, pf(PF_t, i))   if L_{t+1} = F_i
                                = δ(PF_{t+1}, PF_t)          otherwise

where pf(PF_t, i) is the deterministic function which maps the old value of PF to one in which the i-th flag is picked up.⁴

⁴ This is a common trick used in GMs to encode deterministic domain knowledge.
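One way to realize this deterministic CPT in code (our sketch, under the assumption that PF is encoded as a 3-bit mask and flag_locations is a hypothetical list of the three flag cell indices):

```python
def flag_cpt(pf, l_next, flag_locations):
    """Deterministic CPT P(PF_{t+1} | L_{t+1}, PF_t) for the factored model.

    PF is a bitmask over the three flags; entering flag location F_i sets
    bit i (the pf(PF_t, i) function from the text), otherwise PF is copied.
    Returns the single successor value that receives probability 1.
    """
    if l_next in flag_locations:
        i = flag_locations.index(l_next)
        return pf | (1 << i)                 # pf(PF_t, i): pick up flag i
    return pf                                # delta(PF_{t+1}, PF_t): no change
```

For example, with flag_locations = [f1, f2, f3] (placeholder indices), flag_cpt(0b001, f2, flag_locations) returns 0b011: the second flag is picked up while the first remains held. Because this CPT is fixed by domain knowledge, it contributes no free parameters to EM.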

Fig. 3. Fast Learning using Factored Graphical Models. (a) Factored model for FlagWorld (only the mentor model is shown). (b) Results using the factored model. Note the speed-up in learning w.r.t. the unfactored case (Fig. 2(a)).

The results of EM-based parameter learning for the factored graphical model are shown in Fig. 3(b). As expected, the error in the transition parameters goes down much more rapidly than in the unfactored case (compare with Fig. 2(a)).

4 Conclusion

This paper introduces a new framework for learning by imitation based on modeling the imitation process in terms of probabilistic graphical models. Imitative policies are learned in a principled manner using the expectation-maximization (EM) algorithm. The model achieves transfer of knowledge by tying the parameters for the mentor's dynamics with those of the observer. Our results⁵ demonstrate that the mentor's policy can be estimated directly from observations of the mentor's state sequences, and that significant speed-up in learning can be achieved by exploiting the graphical models framework to factor the state space in accordance with domain knowledge. Our current work is focused on testing the approach more exhaustively, especially in the context of robotic imitation. Not only do graphical models provide a computationally efficient framework for general imitation, they are also being used for modeling behavior [14]. An exciting prospect of using graphical models for imitation is the ease of extension to models with more abstraction, including partially observable, hierarchical, and relational models.

⁵ Additional results are presented in the extended version of the paper, available at http://neural.cs.washington.edu/. In particular, we show how learning can be further sped up by incorporating reward information collected along the way. We also demonstrate the generality of parameter learning by extending the graphical model to learn task-oriented policies.

Acknowledgments

This material is based upon work supported by ONR, the Packard Foundation, and NSF grants.

References

1. Schaal, S.: Is imitation learning the route to humanoid robots? Trends in Cognitive Sciences 3, 233-242 (1999)
2. Dautenhahn, K., Nehaniv, C.: Imitation in Animals and Artifacts. MIT Press, Cambridge, MA (2002)
3. Verma, D., Rao, R.P.N.: Goal-based imitation as probabilistic inference over graphical models. In: NIPS 18 (2006)
4. Sutton, R.S., Barto, A.: Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA (1998)
5. Atkeson, C.G., Schaal, S.: Robot learning from demonstration. In: Proc. 14th ICML, pp. 12-20 (1997)
6. Abbeel, P., Ng, A.Y.: Apprenticeship learning via inverse reinforcement learning. In: ICML (2004)
7. Price, B., Boutilier, C.: Accelerating reinforcement learning through implicit imitation. JAIR 19, 569-629 (2003)
8. Price, B., Boutilier, C.: A Bayesian approach to imitation in reinforcement learning. In: IJCAI (2003)
9. Boutilier, C., Dean, T., Hanks, S.: Decision-theoretic planning: Structural assumptions and computational leverage. JAIR 11, 1-94 (1999)
10. Ratliff, N.D., Bagnell, J.A., Zinkevich, M.A.: Maximum margin planning. In: ICML (2006)
11. Heckerman, D.: A tutorial on learning with Bayesian networks. Technical report, Microsoft Research, Redmond, Washington (1995)
12. Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B 39, 1-38 (1977)
13. Dearden, R., Friedman, N., Andre, D.: Model-based Bayesian exploration. In: UAI-99, San Francisco, CA (1999)
14. Griffiths, T.L., Tenenbaum, J.B.: Structure and strength in causal induction. Cognitive Psychology 51(4), 334-384 (2005)
