Teaching a Machine to Read Maps with Deep Reinforcement Learning
Gino Brunner, Oliver Richter, Yuyi Wang and Roger Wattenhofer
ETH Zurich
{brunnegi, richtero, yuwang, wattenhofer}@ethz.ch

Abstract

The ability to use a 2D map to navigate a complex 3D environment is quite remarkable, and even difficult for many humans. Localization and navigation is also an important problem in domains such as robotics, and has recently become a focus of the deep reinforcement learning community. In this paper we teach a reinforcement learning agent to read a map in order to find the shortest way out of a random maze it has never seen before. Our system combines several state-of-the-art methods such as A3C and incorporates novel elements such as a recurrent localization cell. Our agent learns to localize itself based on 3D first person images and an approximate orientation angle. The agent generalizes well to bigger mazes, showing that it learned useful localization and navigation capabilities.

1 Introduction

One of the main success factors of human evolution is our ability to craft and use complex tools. Not only did this ability give us a motivation for social interaction by teaching others how to use different tools, it also enhanced our thinking capabilities, since we had to understand ever more complex tools. Take a map as an example; a map helps us navigate places we have never seen before. However, we first need to learn how to read it, i.e., we need to associate the content of a two-dimensional map with our three-dimensional surroundings. With algorithms becoming increasingly capable of learning complex relations, a way to make machines intelligent is to teach them how to use already existing tools. In this paper, we teach a machine how to read a map with deep reinforcement learning. The agent wakes up in a maze. The agent's view is an image: the maze rendered from the agent's perspective, like a dungeon in a first person video game. This rendered image is provided by the DeepMind Lab environment (Beattie et al. 2016).
The agent can be controlled by a human, or as in our case, by a complex deep reinforcement learning architecture.¹ The agent can move (forward, backward, left, right) and rotate (left, right), and its view image will change accordingly. In addition, the agent gets to see a map of the maze, also an image, as can be seen in Figure 1. One location on the map is marked with an X - the agent's target. The crux is that the agent does not know where on the map it currently is. Several locations on the map might correspond well with the current view. Thus the agent needs to move around to learn its position and then move to the target, as illustrated in Figures 6 and 8. We do equip the agent with an approximate orientation angle, i.e., the agent roughly knows the direction it is moving or looking. In the map, up is always north. During training the agent learns which approximate orientation corresponds to north. A complex multi-stage task, such as navigating a maze with the help of a map, can be naturally decomposed into several subtasks: (i) The agent needs to observe its 3D environment and compare it to the map to determine its most likely position. (ii) The agent needs to understand the map, or in our case associate symbols on the map with rewards and thereby gain an understanding of what a wall is, what navigable space is, and what the target is. (iii) Finally the agent needs to learn how to follow a plan in order to reach the target. Our contribution is as follows: We present a novel modular reinforcement learning architecture that consists of a reactive agent and several intermediate subtask modules. Each of these modules is designed to solve a specific subtask. The modules themselves can contain neural networks or alternatively implement exact algorithms or heuristics. Our presented agent is capable of finding the target in random mazes roughly three times the size of the largest mazes it has seen during training.

Names in alphabetical order. Copyright © 2018, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
¹ Our code can be found here: https://github.com/OliverRichter/map-reader.git
Further contributions include:
- The Recurrent Localization Cell that outputs a location probability distribution based on an estimated stream of visible local maps.
- A simple mapping module that creates a visible local 2D map from 3D RGB input. The mapping module is robust, even if the agent's compass is inaccurate.

2 Related Work

Reinforcement learning in relation to AI has been studied since the 1950s (Minsky 1954). Important early work on reinforcement learning includes the temporal difference
learning method by Sutton (1984; 1988), which is the basis for actor-critic algorithms (Barto, Sutton, and Anderson 1983) and Q-learning techniques (Watkins 1989; Watkins and Dayan 1992). First works using artificial neural networks for reinforcement learning include (Williams 1992) and (Gullapalli 1990). For an in-depth overview of reinforcement learning we refer the interested readers to (Kaelbling, Littman, and Moore 1996), (Sutton and Barto 1998) and (Szepesvári 2010). The current deep learning boom was started by, among other contributions, the backpropagation algorithm (Rumelhart et al. 1988) and advances in computing power and GPU frameworks. However, deep learning could not be applied effectively to reinforcement learning until recently. Mnih et al. (2015) introduced the Deep-Q-Network (DQN) that uses experience replay and target networks to stabilize the learning process. Since then, several extensions to the DQN architecture have been proposed, such as the Double Deep-Q-Network (DDQN) (van Hasselt, Guez, and Silver 2016) and the dueling network architecture (Wang et al. 2016). These networks are based on using replay buffers to stabilize learning, such as prioritized experience replay (Schaul et al. 2015). The state-of-the-art A3C (Mnih et al. 2016) relies on asynchronous actor-learners to stabilize learning. In our system, we use A3C learning on a modified network architecture to train our reactive agent and the localization module in an on-policy manner. We also make use of (prioritized) replay buffers to train our agent off policy. A major challenge in reinforcement learning are environments with delayed or sparse rewards. An agent that never gets a reward can never learn good behavior. Thus Jaderberg et al. (2016) and Mirowski et al. (2016) introduced auxiliary tasks that let the agent learn based on intermediate intrinsic pseudo-rewards, such as predicting the depth from a 3D RGB image, while simultaneously trying to solve the main task, e.g., finding the exit in a 3D maze.
The policies learned by the auxiliary tasks are not directly used by the agent, but solely serve the purpose of helping the agent learn better representations, which improves its performance on the main task. The idea of auxiliary tasks is inspired by prior work on temporal abstractions, such as options (Sutton, Precup, and Singh 1999), whose focus was on learning temporal abstractions to improve high-level learning and planning. In our work we introduce a modularized architecture that incorporates intermediate subtasks, such as localization, local map estimation and global map interpretation. In contrast to (Jaderberg et al. 2016), our reactive agent directly uses the outputs of these modules to solve the main task. Note that we use an auxiliary task inside our localization module to improve the local map estimation. Kulkarni et al. (2016) introduced a hierarchical version of the DQN to tackle the challenge of delayed and sparse rewards. Their system operates at different temporal scales and allows the definition of goals using entity relations. The policy is learned in such a way as to reach these goals. We use a similar approach to make our agent follow a plan, such as "go north". Mapping and localization has been extensively studied in the domain of robotics (Thrun, Burgard, and Fox 2005). A robot creates a map of the environment from sensory input (e.g., sonar or LIDAR) and then uses this map to plan a path through the environment. Subsequent works have combined these approaches with computer vision techniques (Fuentes-Pacheco, Ascencio, and Rendón-Mancha 2015) that use RGB(-D) images as input. Machine learning techniques have been used to solve mapping and planning separately, and later also tackled the joint mapping and planning problem (Elfes 1989). Instead of separating mapping and planning phases, reinforcement learning methods aimed at directly learning good policies for robotic tasks, e.g., for learning human-like motor skills (Peters and Schaal 2008). Recent advances in deep reinforcement learning have spawned impressive work in the area of mapping and localization.
The UNREAL agent (Jaderberg et al. 2016) uses auxiliary tasks and a replay buffer to learn how to navigate a 3D maze. Mirowski et al. (2016) came up with an agent that uses different auxiliary tasks in an online manner to understand if navigation capabilities manifest as a byproduct of solving a reinforcement learning problem. Zhu et al. (2017) tackled the problems of generalization across tasks and data inefficiency. They use a realistic 3D environment with physics engine to gather training data efficiently. Their model is capable of navigating to a visually specified target. In contrast to other approaches, they use a memoryless feed-forward model instead of recurrent models. Gupta et al. (2017) simulated a robot that navigates through a real 3D environment. They focus on the architectural problem of learning mapping and planning in a joint manner, such that the two phases can profit from knowing each other's needs. Their agent is capable of creating an internal 2D representation of the local 3D environment, similar to our local visible map. In our work a global map is given, and the agent learns to interpret and read that map to reach a certain target location. Thus, our agent is capable of following complicated long range trajectories in an approximately shortest path manner. Furthermore, their system is trained in a fully supervised manner, whereas our agent is trained with reinforcement learning. Bhatti et al. (2016) augment the standard DQN with semantic maps in the VizDoom (Kempka et al. 2016) environment. These semantic maps are constructed from 3D RGB-D input, and they employ techniques such as standard computer vision based object recognition and SLAM. They showed that this results in better learned policies. The task of their agent is to eliminate as many opponents as possible before dying. In contrast, our agent needs to escape from a complex maze. Furthermore, our environments are designed to provide as little semantic information as possible to make the task more difficult for the agent; our agent needs to construct its local visible map based purely on the shape of its surroundings.
3 Architecture

Many complex tasks can be divided into easier intermediate tasks which, when all solved individually, solve the complex task. We use this principle and apply it to neural network architecture design. In this section we first introduce our concept of modular intermediate tasks, and then discuss how we implement the modular tasks in our map reading architecture.
Figure 1: Architecture overview and interplay between the four modules. α̂_t is the discretized angle, a_{t-1} is the last action taken, r_{t-1} is the last reward received, {p_i^loc}_{i=1}^N is the estimated location probability distribution over the N possible discrete locations, H^loc is the entropy of the estimated location probability distribution, STTD is the short term target direction suggested by the map interpretation network, V is the estimated state value and π is the policy output from which the next action a_t is sampled.

Figure 2: The visible local map network: The RGB pixel input is passed through two convolutional neural network (CNN) layers and a fully connected (FC) layer before being concatenated to the discretized angle α̂ and further processed by fully connected layers and a gating operation.

3.1 Modular Intermediate Tasks

An intermediate task module can be any information processing unit that takes as input either sensory input and/or the output of other modules. A module is defined and designed after the intermediate task it solves and can consist of trainable and hard coded parts. Since we are dealing with neural networks, the output and therefore the input of a module can be erroneous. Each module adjusts its trainable parameters to reduce its error independent of other modules. We achieve this by stopping error back-propagation on module boundaries. Note that this separation has some advantages and drawbacks:
- Each module's performance can be evaluated and debugged individually.
- Small intermediate subtask modules have short credit assignment paths, which reduces the problem of exploding and vanishing gradients during back-propagation.
- Modules cannot adjust their output to fit the input needs of the next module. This has to be achieved through interface design, i.e., intermediate task specification.
Our neural network architecture consists of four modules, each dedicated to a specific subtask.
We first give an overview of the interplay between the modules before describing them in detail in the following sections. The architecture overview is sketched in Figure 1. The first module is the visible local map network; it takes the raw visual input from the 3D environment and creates for each frame a two dimensional map excerpt of the currently visible surroundings. The second module, the recurrent localization cell, takes the stream of visible local map excerpts and integrates it into a local map estimation. This local map estimation is compared to the global map to get a probability distribution over the discretized possible locations. The third module is called map interpretation network; it learns to interpret the global map and outputs a short term target direction for the estimated position. The last module is a reactive agent that learns to follow the estimated short term target direction to ultimately find the exit of the maze. We allow our agent to have access to a discretized angle α̂ describing the direction it is facing, comparable to a robot having access to a compass. Furthermore, we do not limit ourselves to completely unsupervised learning and allow the agent to use a discretized version of its actual position during training. This could be implemented as a robot training on the network with the help of a GPS signal. The robot could train as long as the accuracy of the GPS signal is below a certain threshold and act on the trained network as soon as the GPS signal gets inaccurate or totally lost. We leave such a practical implementation of our algorithm to future work and focus here on the algorithmic structure itself. We now describe each module architecture individually before we discuss their joint training in Section 3.6. If not specified otherwise, we use rectified linear unit activations after each layer.

3.2 Visible Local Map Network

The visible local map network preprocesses the raw visual RGB input from the environment through two convolutional neural network layers followed by a fully connected layer. We adapted this preprocessing architecture from (Jaderberg et al. 2016).
The thereby generated features are concatenated to a 3-hot discretized encoding α̂ of the orientation angle α, i.e., we input the angle as an n-dimensional vector where each dimension represents a discrete state of the angle, with n = 30. We set the three vector components that represent the discrete angle values closest to the actual angle to one while the remaining components are set to zero, e.g. α̂ = [1, 1, 1, 0, ..., 0]. We used a 3-hot instead of a 1-hot encoding to smooth the input. Note that this encoding has an average quantization error of 6 degrees. The discretized angle and preprocessed visual features are passed through a fully connected layer to get an intermediate representation from which two things are estimated:
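As an illustration, a minimal sketch of such a 3-hot encoding. The function name and the exact choice of bin centers and wrap-around are our own assumptions; the text specifies only n = 30 bins with the three components closest to the actual angle set to one:

```python
import numpy as np

def three_hot_angle(angle_deg, n=30):
    """Encode an orientation angle as a 3-hot vector of length n.

    The bin closest to the angle and its two neighbours (with
    wrap-around at 360 degrees) are set to 1; all others are 0.
    NOTE: bin centers at multiples of 360/n are an assumption.
    """
    bin_width = 360.0 / n                 # 12 degrees for n = 30
    center = int(round(angle_deg / bin_width)) % n
    encoding = np.zeros(n)
    for offset in (-1, 0, 1):             # the three closest bins
        encoding[(center + offset) % n] = 1.0
    return encoding
```

With n = 30 each bin spans 12 degrees, which is consistent with the coarse quantization described above.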
1. A reconstruction of the map excerpt that corresponds to the current visual input.
2. The current field of view, which is used to gate the estimated map excerpt such that only estimates which lie in the line of sight make it into the visible local map. This gating is crucial to reduce noise in the visible local map output.
See Figure 2 for a sketch of the visible local map network architecture.

Figure 3: Sketch of the information flow in the recurrent localization cell. The last egomotion estimation s_{t-1}, the discretized angle α̂_t, the last action a_{t-1} and reward r_{t-1} are passed through two fully connected (FC) layers and combined with a two dimensional convolution between the former local map estimation LM^est_{t-1} and the current visible local map input Ṽ_t to get the new egomotion estimation s_t. This egomotion estimation is used to shift the previously estimated local map LM^est_{t-1} and the previous map feedback local map LM^mfb_{t-1}. A weighted and clipped combination of these local map estimations, LM^{est+mfb}_t, is convolved with the full map to get the estimated location probability distribution {p_i^loc}_{i=1}^N. Recurrent connections are marked by empty arrows.

3.3 Recurrent Localization Cell

Moving around in the environment, the agent generates a stream of visible local map excerpts like the output in Figure 2 or the visible local map input Ṽ in Figure 3. The recurrent localization cell then builds an egocentric local map out of this stream and compares it to the actual map to estimate the current position. The agent has to predict its egomotion to shift the egocentric estimated local map accordingly. We refer to Figure 3 for a sketch of the architecture described hereafter.
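The core comparison, matching the egocentric local map estimate against excerpts of the global map to obtain a position distribution, can be sketched in isolation. This is a toy illustration only: the function names and the 0/1 map encoding are our own, and the naive loops stand in for the stride-one two dimensional convolution used in the cell:

```python
import numpy as np

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

def localize(global_map, local_map_est):
    """Match an estimated egocentric local map against the global map.

    Returns a probability distribution over all candidate centre
    locations and a probability-weighted 'map feedback' local map
    (the expected map excerpt under that distribution).
    """
    k = local_map_est.shape[0]                    # local map is k x k
    H, W = global_map.shape
    scores = np.zeros((H - k + 1, W - k + 1))
    for r in range(H - k + 1):
        for c in range(W - k + 1):
            excerpt = global_map[r:r + k, c:c + k]  # g(m, i)
            scores[r, c] = np.sum(excerpt * local_map_est)
    p = softmax(scores.ravel()).reshape(scores.shape)
    # map feedback: expected local map under the location distribution
    lm_mfb = np.zeros((k, k))
    for r in range(p.shape[0]):
        for c in range(p.shape[1]):
            lm_mfb += p[r, c] * global_map[r:r + k, c:c + k]
    return p, lm_mfb
```

A distinctive wall pattern in the local map estimate then produces a peaked distribution at the matching global location, while ambiguous excerpts yield a spread-out distribution.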
Let m be the current map, Ṽ_t the output of the visible local map network, α̂_t the discretized 3-hot encoded orientation angle, a_{t-1} the 1-hot encoded last action taken, r_{t-1} the extrinsic reward received by taking action a_{t-1}, LM^est_t the estimated local map at time step t, LM^mfb_t the map feedback local map at time step t, LM^{est+mfb}_t the estimated local map with map feedback at time step t, s_t the estimated necessary shifting (or estimated egomotion) at time step t and {p_i^loc}_{i=1}^N the discrete estimated location probability distribution. Then we can describe the functionality of the recurrent localization cell by the following equations:

s_t = softmax( f(s_{t-1}, α̂_t, a_{t-1}, r_{t-1}) + LM^est_{t-1} ⊛ Ṽ_t )

LM^est_t = [ LM^est_{t-1} ⊛ s_t + Ṽ_t ]_{-0.5}^{+0.5}

LM^{est+mfb}_t = [ LM^est_t + λ · (LM^mfb_{t-1} ⊛ s_t) ]_{-0.5}^{+0.5}

{p_i^loc}_{i=1}^N = softmax( m ⊛ LM^{est+mfb}_t )

LM^mfb_t = Σ_{i=1}^N p_i^loc · g(m, i)

Here, f(·) is a two layer feed forward neural network, ⊛ denotes a two dimensional discrete convolution with stride one in both dimensions, [·]_{-0.5}^{+0.5} denotes a clipping to [-0.5, +0.5], λ is a trainable map feedback parameter and g(m, i) extracts from the map m the local map around location i.

3.4 Map Interpretation Network

The goal of the map interpretation network is to find rewarding locations on the map and construct a plan to get to these locations. We achieve this in three stages: First, the network passes the map through two convolutional layers followed by a rectified linear unit activation to create a 3-channel reward map. The channels are trained (as discussed in Section 3.6) to represent wall locations, navigable locations and target locations respectively. This reward map is then area averaged, rectified and passed to a parameter free 2D shortest path planning module which outputs for each of the discrete locations on the map a distribution over {North, East, South, West}, i.e., a short term target direction (STTD), as well as a measure of distance to the nearest target location. This plan is then multiplied with the estimated location probability distribution to get the smooth STTD and target distance of the currently estimated location.
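A parameter free 2D shortest path planner of this kind could, for instance, be realized as a breadth-first search from all target cells. The sketch below is our own simplification, assuming a simple grid encoding ('#' wall, '.' navigable, 'X' target); unlike the module described above, which outputs a distribution over {North, East, South, West}, this toy returns one shortest-path direction and the target distance per cell:

```python
from collections import deque

def plan_sttd(grid):
    """Breadth-first search from all target cells over navigable space.

    grid: list of strings with '#' wall, '.' navigable, 'X' target.
    Returns per cell a short term target direction in {'N','E','S','W'}
    (the first step towards the nearest target) and the distance to it;
    walls and unreachable cells keep None.
    """
    H, W = len(grid), len(grid[0])
    dist = [[None] * W for _ in range(H)]
    sttd = [[None] * W for _ in range(H)]
    q = deque()
    for r in range(H):
        for c in range(W):
            if grid[r][c] == 'X':
                dist[r][c] = 0
                q.append((r, c))
    # Expanding from a cell to its northern neighbour means that, from
    # that neighbour, the way back towards the target leads south, etc.
    moves = [(-1, 0, 'S'), (1, 0, 'N'), (0, -1, 'E'), (0, 1, 'W')]
    while q:
        r, c = q.popleft()
        for dr, dc, back in moves:
            nr, nc = r + dr, c + dc
            if (0 <= nr < H and 0 <= nc < W
                    and grid[nr][nc] == '.' and dist[nr][nc] is None):
                dist[nr][nc] = dist[r][c] + 1
                sttd[nr][nc] = back
                q.append((nr, nc))
    return sttd, dist
```

Row 0 is taken to be the northern edge of the map, matching the convention that "up is always north".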
Note that planning for each possible location and querying the plan with the full location probability distribution helps to resolve the exploitation-exploration dilemma of the reactive agent: An uncertain location probability distribution close to the uniform distribution will result in an uncertain STTD distribution over {North, East, South, West}, thereby encouraging exploration.
A location probability distribution over locations with similar STTD will accumulate these similarities and result in a clear STTD for the agent, even though the location might still be unclear (exploitation).

3.5 Reactive Agent and Intrinsic Reward

As mentioned, the reactive agent faces two partially contradicting goals: following the STTD (exploitation) and improving the localization by generating information rich visual input (exploration), e.g., no excessive staring at walls. The agent learns this trade off through reinforcement learning, i.e., by maximizing the expected sum of rewards. The rewards we provide here are extrinsic rewards from the environment (negative reward for running into walls, positive reward for finding the target) as well as intrinsic rewards linked to the short term goal inputs of the reactive agent. These short term goal inputs are the STTD distribution over {North, East, South, West} and the measure of distance to the nearest target location from the map interpretation network, as well as the normalized entropy H^loc of the discrete location probability distribution {p_i^loc}_{i=1}^N. H^loc represents a measure of location uncertainty which is linked to the need for exploration. The intrinsic reward consists of two parts to encourage both exploration and exploitation. The exploration intrinsic reward I^explore_t in each timestep t is the difference in location probability distribution entropy to the previous timestep:

I^explore_t = H^loc_{t-1} - H^loc_t

Note that this reward is positive if and only if the location probability distribution entropy decreases, i.e., when the agent gets more certain about its position. The exploitation intrinsic reward should be a measure of how well the egomotion of the agent aligns with the STTD. For this we calculate an approximate two dimensional egomotion vector e_t from the egomotion probability distribution estimation s_t. Similarly we calculate a STTD vector d_{t-1} from the STTD distribution over {North, East, South, West} of the previous timestep.
We calculate the exploitation intrinsic reward I^exploit_t as the dot product between the two vectors:

I^exploit_t = e_t^T d_{t-1}

Note that this reward is positive if and only if the angle difference between the two vectors is not bigger than 90 degrees, i.e., if the estimated egomotion was in the same direction as suggested by the STTD in the timestep before. As input to the reactive agent we concatenate the discretized 3-hot angle α̂, the last extrinsic reward and the location probability distribution entropy H^loc to the STTD distribution and the estimated target distance. The agent itself is a simple feed-forward network consisting of two fully connected layers with rectified linear unit activation, followed by a fully connected layer for the policy and a fully connected layer for the estimated state value respectively. The agent's next action is sampled from the softmax-distribution over the policy outputs.

3.6 Training Losses

To train our agent, we use a combination of on-policy losses, where the data is generated from rollouts in the environment, and off-policy losses, where we sample the data from a replay memory. More specifically, the total loss is the sum of the four module specific losses:
1. L_vlm, the off-policy visible local map loss
2. L_loc, the on-policy localization loss
3. L_rm, the off-policy reward map loss and
4. L_a, the on-policy reactive agent's acting loss
We train our agent as asynchronous advantage actor critic, or A3C, with additional losses, similar to DeepMind's UNREAL agent (Jaderberg et al. 2016): In each training iteration, every thread rolls out up to 20 steps in the environment and accumulates the localization loss L_loc and acting loss L_a. For each step, an experience frame is pushed to an experience history buffer of fixed length. Each experience frame contains all inputs the network requires as well as the current discretized true position. From this experience history, frames are sampled and inputs replayed through the network to calculate the visible local map loss L_vlm and the reward map loss L_rm. We now describe each loss in more detail.
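The two intrinsic reward terms of Section 3.5 are straightforward to compute once the relevant distributions and vectors are available; a minimal sketch (the function names are ours, and the normalization of the entropy by log N is an assumption consistent with "normalized entropy"):

```python
import numpy as np

def normalized_entropy(p):
    """Entropy of a discrete distribution, normalized to [0, 1] by log(N)."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]                          # convention: 0 * log 0 = 0
    return float(-(nz * np.log(nz)).sum() / np.log(p.size))

def exploration_reward(p_loc_prev, p_loc_curr):
    """I^explore_t = H^loc_{t-1} - H^loc_t: positive iff the location
    distribution became more certain."""
    return normalized_entropy(p_loc_prev) - normalized_entropy(p_loc_curr)

def exploitation_reward(egomotion_vec, sttd_vec_prev):
    """I^exploit_t = e_t^T d_{t-1}: positive iff the estimated egomotion
    is within 90 degrees of the previous short term target direction."""
    return float(np.dot(egomotion_vec, sttd_vec_prev))
```

For example, collapsing from a uniform location distribution to a 1-hot one yields the maximal exploration reward of 1, while moving exactly against the suggested STTD yields a negative exploitation reward.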
The output Ṽ of the visible local map network is trained to match the visible excerpt of the map V, constructed from the discretized location and angle. In each training iteration 20 experience frames are uniformly sampled from the experience history and the visible local map loss is calculated as the sum of L2 distances between visible local map outputs Ṽ_k and targets V_k:

L_vlm = Σ_{k∈S} ||Ṽ_k - V_k||_2

Here, S denotes the set of sampled frame indices. Our localization loss L_loc is trained on the policy rollouts in the environment. For each step, we compare the estimated position to the actual position in two ways, which results in a cross entropy location loss L_loc,xent and a distance location loss L_loc,d. The cross entropy location loss is the cross entropy between the location probability distribution {p_i^loc}_{i=1}^N and a 1-hot encoding of the actual position. The distance loss L_loc,d is calculated at each step as the L2 distance between the actual two dimensional cell position coordinates c^pos and the estimated centroid of all possible cells i weighted by their corresponding probability p_i^loc:

L_loc,d = || c^pos - Σ_{i=1}^N p_i^loc c_i ||_2

In addition to training the location estimation directly we also assign an auxiliary local map loss L_loc,lm to help with the local map construction. We calculate the local map loss only once per training iteration as the L2 distance between the last estimated local map LM^est_t and the actual local map at that point in time. The goal of the reward map loss L_rm is to have the three channels of the reward map represent wall locations, free
space locations and target locations respectively. To do this, we leverage the setting that running into a wall gives a negative extrinsic reward, moving in open space gives no extrinsic reward and finding the target gives a positive extrinsic reward. Therefore the problem can be transformed into estimating an extrinsic reward. Each training iteration we sample 20 frames from the experience history. This sampling is independent from the visible local map loss sampling and skewed to have in expectation equally many frames with positive, negative and zero extrinsic reward. For each frame, the frame's map is passed through the convolution layers of the map interpretation network to create the corresponding reward map, while the visual input and localization state saved in the frame are fed through the network to get the estimated location probability distribution. The reward map loss is the cross entropy prediction error of the reward at the estimated position. Our reactive agent's acting loss is equivalent to the A3C learning described by Mnih et al. (2016). We also adapted an action repeat of 4 and a frame rate of 15 fps. The whole network is trained by RMSprop gradient descent with gradient back propagation stopped at module boundaries, i.e., each module is only trained on its module specific loss.

Figure 4: Training performance of 8 actor threads that start training on 5x5 mazes. The vertical black lines mark jumps to larger mazes of the thread in blue.

4 Environment and Results

To evaluate our architecture we created a training and test set of mazes with the corresponding black and white maps in the DeepMind Lab environment. The mazes are quadratic grid mazes with each maze cell being either a wall, an open space, the target or the spawn position. The training set consists of 100 mazes of different sizes; 20 mazes each in the sizes 5x5, 7x7, 9x9, 11x11 and 13x13 maze cells. The test set consists of 900 mazes; 100 in each of the sizes 5x5, 7x7, 9x9, 11x11, 13x13, 15x15, 17x17, 19x19 and 21x21.
Note that the outermost cells in the mazes are always walls, therefore the maximal navigable space of a 5x5 maze is 3x3 maze cells. Thus the navigable space for the biggest test mazes is roughly 3 times larger than for the biggest training mazes. For the localization, we used a location cell granularity 3 times finer than the maze cells, which results in a total of N = 63x63 = 3969 discrete location states on the biggest 21x21 mazes.

Figure 5: All the results of the (at most 100) successful tests for each maze size. Every single test is represented by an x. The line connects the arithmetic averages of each maze size. The distance between origin and target grows linearly with maze size, as does the number of steps.

We train our agent starting on small mazes and increase the maze sizes as the agent gets better. More specifically, we use 16 asynchronous agent training threads, of which we start 8 on the smallest (5x5) training mazes while the other training threads are started 2 each on the other sizes (7x7, 9x9, 11x11 and 13x13). This prevents the visible local map network from overfitting on the small 5x5 mazes. The thread agents are placed into a randomly sampled maze of their currently associated maze size and try to find the exit, while counting their steps. A step is one interaction with the environment, i.e., sampling an action from the agent's policy π and receiving the corresponding next visual input, discretized angle and extrinsic reward from the environment. A step is not the same as a location or maze grid cell; as agents accelerate, there is no direct correlation between steps and actual walked distance. We consider each sampled maze an episode start. The episode ends successfully if the agent manages to find the target, and the steps needed are stored. If the agent does not find the exit in 4500 steps, the episode ends as not successful. After an episode ends, a new episode is started, i.e., a new maze is sampled. Note that in this setting the agent is always placed in a newly sampled maze and not in the same maze as in (Jaderberg et al. 2016) and (Mirowski et al. 2016).
For each thread we calculate a moving average of steps needed to end the episodes. Once this moving average falls below a maze size specific threshold, the thread is transferred to train on mazes of the next bigger size. Once a thread's moving average of steps needed in the biggest training mazes (13x13) falls below the threshold, the thread is stopped and its training is considered successful. Once all threads reach this stage, the overall training is considered successful and the agent is fully trained. We calculate the moving average over the last 50 episodes and use 60, 100, 140, 180 and 220 steps as thresholds for the maze sizes 5x5, 7x7, 9x9, 11x11 and 13x13, respectively. Figure 4 shows the training performance of 8 actor threads. One can see that the agents sometimes overfit their policies, which results in temporarily decreased performance even though the maze size did not increase. In the end however, all threads reach good performance. The trained agent is tested on the 900 test set mazes; the
number of required steps per maze size is plotted in Figure 5. We stop a test after 4,500 steps, but even for the biggest test mazes (21x21) the agent found more than 90% of the targets within these 4,500 steps. See Table 1 for the percentage of exits found in all maze sizes. If the agent finds the exit it does so in an almost shortest path manner, as can be seen in Figure 6.

Table 1: Percentage of targets found in the test mazes. Up to size 9x9 the agent always finds the target. More interestingly, the agent is able to find more than 90% of the targets in mazes that are bigger than any maze it has seen during training.

Maze size   Targets found
5x5         100%
7x7         100%
9x9         100%
11x11       99%
13x13       99%
15x15       98%
17x17       93%
19x19       93%
21x21       91%

Figure 6: Example trajectories walked by the agent. Note that the agent walks close to the shortest path, and its continuous localization and planning lets the agent find the path to the target even after it took a wrong turn.

Figure 7: Comparison of our agent (blue lines) to an agent that has perfect position information and an optimal short term target direction input (red lines). The solid lines count all steps (turns and moves). The solid blue line is the same as the average line of Figure 5. The dashed lines do not count the steps in which the agent turns. The figure shows that the overhead is mostly because of turning, as our agent needs to look around to localize itself.

Figure 8: Four example frames to illustrate the typical behavior of the agent: The red line is the trace of its actual position, while the shades of blue represent its position estimate. The darker the blue, the more confident the agent is to be in this location. Frame 1 shows the agent's true starting position as a red dot, frame 2 shows several similar locations identified after a bit of turning, in frame 3 the agent starts to understand the true location, and in frame 4 it has moved.
However, the agent needs a considerable number of steps to localize itself. To evaluate this localization overhead, we trained an agent consisting solely of the reactive agent module with access to the perfect location and optimal short term target direction, and plotted its average performance on the test set in Figure 7. The figure shows a large gap between the full agent and the agent with access to the perfect position. This is due to turning actions, which the full agent performs to localize itself, i.e., it continuously needs to look around to know where it is. For the localization in the beginning of an episode, the agent also mainly relies on turning, as can be seen in the four example frames in Figure 8.

5 Conclusion

We have presented a deep reinforcement learning agent that can localize itself on a 2D map based on observations of its 3D surroundings. The agent manages to find the exit in mazes with a high success rate, even in mazes substantially larger than it has ever seen during training. The agent often finds the shortest path, showing that the agent can continuously retain a good localization. The architecture of our system is built in a modular fashion. Each module deals with a subtask of the maze problem and is trained in isolation. This modularity allows for a structured architecture design, where a complex task is broken down into subtasks, and each subtask is then solved by a module. Modules consist of general architectures, e.g., MLPs, or more task-specific networks such as our recurrent localization cell. It is also possible to use deterministic algorithm modules, such as in our shortest path planning module. Architecture design is aided by the possibility to easily replace each module by ground truth values, if available, to find sources of bad performance. Our agent is designed for a specific task. We plan to make our modular architecture more general and apply it to other tasks, such as playing 3D games. Since modules can be swapped out and arranged differently, it would be interesting to equip an agent with many modules and let it learn which module to use in which situation.

Acknowledgments

We would like to thank the anonymous reviewers for their helpful comments.
References

Barto, A. G.; Sutton, R. S.; and Anderson, C. W. 1983. Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Trans. Systems, Man, and Cybernetics 13(5):834-846.
Beattie, C.; Leibo, J. Z.; Teplyashin, D.; Ward, T.; Wainwright, M.; Küttler, H.; Lefrancq, A.; Green, S.; Valdés, V.; Sadik, A.; Schrittwieser, J.; Anderson, K.; York, S.; Cant, M.; Cain, A.; Bolton, A.; Gaffney, S.; King, H.; Hassabis, D.; Legg, S.; and Petersen, S. 2016. DeepMind Lab. CoRR abs/1612.03801.
Bhatti, S.; Desmaison, A.; Miksik, O.; Nardelli, N.; Siddharth, N.; and Torr, P. H. S. 2016. Playing Doom with SLAM-augmented deep reinforcement learning. CoRR.
Elfes, A. 1989. Using occupancy grids for mobile robot perception and navigation. IEEE Computer 22(6):46-57.
Fuentes-Pacheco, J.; Ruiz-Ascencio, J.; and Rendón-Mancha, J. M. 2015. Visual simultaneous localization and mapping: a survey. Artif. Intell. Rev. 43(1):55-81.
Gullapalli, V. 1990. A stochastic reinforcement learning algorithm for learning real-valued functions. Neural Networks 3(6):671-692.
Gupta, S.; Davidson, J.; Levine, S.; Sukthankar, R.; and Malik, J. 2017. Cognitive mapping and planning for visual navigation. CoRR abs/1702.03920.
Jaderberg, M.; Mnih, V.; Czarnecki, W. M.; Schaul, T.; Leibo, J. Z.; Silver, D.; and Kavukcuoglu, K. 2016. Reinforcement learning with unsupervised auxiliary tasks. CoRR abs/1611.05397.
Kaelbling, L. P.; Littman, M. L.; and Moore, A. W. 1996. Reinforcement learning: A survey. J. Artif. Intell. Res. 4:237-285.
Kempka, M.; Wydmuch, M.; Runc, G.; Toczek, J.; and Jaskowski, W. 2016. ViZDoom: A Doom-based AI research platform for visual reinforcement learning. In IEEE Conference on Computational Intelligence and Games, CIG 2016, Santorini, Greece, September 20-23, 2016, 1-8.
Kulkarni, T. D.; Narasimhan, K.; Saeedi, A.; and Tenenbaum, J. 2016. Hierarchical deep reinforcement learning: Integrating temporal abstraction and intrinsic motivation. In Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, December 5-10, 2016, Barcelona, Spain.
Minsky, M. L. 1954. Theory of neural-analog reinforcement systems and its application to the brain model problem. Ph.D. Dissertation, Princeton University.
Mirowski, P.; Pascanu, R.; Viola, F.; Soyer, H.; Ballard, A. J.; Banino, A.; Denil, M.; Goroshin, R.; Sifre, L.; Kavukcuoglu, K.; Kumaran, D.; and Hadsell, R. 2016. Learning to navigate in complex environments. CoRR abs/1611.03673.
Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A. A.; Veness, J.; Bellemare, M. G.; Graves, A.; Riedmiller, M. A.; Fidjeland, A.; Ostrovski, G.; Petersen, S.; Beattie, C.; Sadik, A.; Antonoglou, I.; King, H.; Kumaran, D.; Wierstra, D.; Legg, S.; and Hassabis, D. 2015. Human-level control through deep reinforcement learning. Nature 518(7540):529-533.
Mnih, V.; Badia, A. P.; Mirza, M.; Graves, A.; Lillicrap, T. P.; Harley, T.; Silver, D.; and Kavukcuoglu, K. 2016. Asynchronous methods for deep reinforcement learning. CoRR abs/1602.01783.
Peters, J., and Schaal, S. 2008. Reinforcement learning of motor skills with policy gradients. Neural Networks 21(4):682-697.
Rumelhart, D. E.; Hinton, G. E.; Williams, R. J.; et al. 1988. Learning representations by back-propagating errors. Cognitive Modeling 5(3):1.
Schaul, T.; Quan, J.; Antonoglou, I.; and Silver, D. 2015. Prioritized experience replay. CoRR abs/1511.05952.
Sutton, R. S., and Barto, A. G. 1998. Reinforcement Learning: An Introduction. Adaptive Computation and Machine Learning. MIT Press.
Sutton, R. S.; Precup, D.; and Singh, S. P. 1999. Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artif. Intell. 112(1-2):181-211.
Sutton, R. S. 1984. Temporal credit assignment in reinforcement learning. Ph.D. Dissertation, University of Massachusetts Amherst.
Sutton, R. S. 1988. Learning to predict by the methods of temporal differences. Machine Learning 3:9-44.
Szepesvári, C. 2010. Algorithms for Reinforcement Learning. Synthesis Lectures on Artificial Intelligence and Machine Learning. Morgan & Claypool Publishers.
Thrun, S.; Burgard, W.; and Fox, D. 2005. Probabilistic Robotics. MIT Press.
van Hasselt, H.; Guez, A.; and Silver, D. 2016. Deep reinforcement learning with double Q-learning. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, February 12-17, 2016, Phoenix, Arizona, USA.
Wang, Z.; Schaul, T.; Hessel, M.; van Hasselt, H.; Lanctot, M.; and de Freitas, N. 2016. Dueling network architectures for deep reinforcement learning. In Proceedings of the 33rd International Conference on Machine Learning, ICML 2016, New York City, NY, USA, June 19-24, 2016.
Watkins, C. J. C. H., and Dayan, P. 1992. Q-learning. Machine Learning 8(3-4):279-292.
Watkins, C. J. C. H. 1989. Learning from delayed rewards. Ph.D. Dissertation, King's College, Cambridge.
Williams, R. J. 1992. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning 8:229-256.
Zhu, Y.; Mottaghi, R.; Kolve, E.; Lim, J. J.; Gupta, A.; Fei-Fei, L.; and Farhadi, A. 2017. Target-driven visual navigation in indoor scenes using deep reinforcement learning. In 2017 IEEE International Conference on Robotics and Automation, ICRA 2017, Singapore, May 29 - June 3, 2017.