Reducing state space exploration in reinforcement learning problems by rapid identification of initial solutions and progressive improvement of them


Kary FRÄMLING
Department of Computer Science
Helsinki University of Technology
P.O. Box 5400, FIN HUT, FINLAND

Abstract: Most existing reinforcement learning methods require exhaustive state space exploration before converging towards a problem solution. Various generalization techniques have been used to reduce the need for exhaustive exploration, but for problems like maze route finding these techniques are not easily applicable. This paper presents an approach that makes it possible to reduce the need for state space exploration by rapidly identifying a "usable" solution. Concepts of short- and long term working memory then make it possible to continue exploring and find better or optimal solutions.

Key-Words: Reinforcement learning, Trajectory sampling, Temporal difference, Working memory, Maze route finding

1 Introduction

The work presented in this paper started from the idea to develop an artificial neural net (ANN) model that would do problem solving and learning in similar ways as humans and animals do. The model would also correspond to some very rough-level ideas and knowledge about how the brain operates, i.e. activations and connections between different areas of the brain and notions of short- and long term working memory. Animal problem solving mainly seems to be based on trial and learning. The success or failure of a trial modifies behavior in the "right" direction after some number of trials, where "some number" ranges from one (e.g. learning how to turn on the radio from the "power" button) to infinity (e.g. learning how to grab things, which is a life-long adaptation procedure). Such behavior is currently studied mainly in the scientific research area called reinforcement learning (RL). RL methods have been successfully applied to many problems where more "conventional" methods are difficult to use due to factors like lacking data about the environment, which forces the neural net to explore its environment and learn interactively. Exploring is a procedure where the agent (see for instance [2] for a discussion on the meaning of the term agent) has to take actions without a priori knowledge about how good or bad each action is, which may be known only much later when the goal is reached or when the task has failed.

The RL problem used in this paper, maze route finding, is commonly used in psychological studies of animal learning and behavior [3]. Animals have to explore the maze and construct an internal model of it in order to reach the goal. The more maze runs the animal performs, the quicker it goes to the goal since solutions get better memorized. Maze route finding is not a very complicated problem to solve with many existing methods, as pointed out in section 2 of this paper, where the problem setup is explained. Therefore, the ANN solution presented in section 3 is not unique in being able to solve the problem. It does, however, solve the problem in a new way, which needs significantly less initial exploration than existing methods. Initial exploration runs are mainly shortened by the SLAP (Set Lowest Action Priority) reinforcement presented here. Identified solutions may be further reinforced by temporal difference (TD) methods [5]. Even though TD learning can be used as an element of the methods described in this paper, the methods presented here do not use classical notions of value functions for indicating how good a state or an action is for reaching the goal. Instead, notions of short term working memory and long term working memory are used for selecting appropriate actions at each state.
Short term working memory exists only during the problem solving, while previous problem-solving instances are stored in long term working memory. This memory organization gives new possibilities to balance between exploration of a new environment and/or new solutions on one hand, and exploitation of existing knowledge on the other hand.

2 Problem Formulation

Sutton and Barto use a maze like the one in Fig 1 in chapter 9 of their 1998 book [7]. The discussion that follows on the advantages and disadvantages of existing RL methods is principally based on this book concerning symbols, equations and method descriptions. An agent is positioned at the starting point inside the maze and has to find a route to the goal point.

Each maze position corresponds to a state, where the agent selects one action of four, i.e. going north, south, east or west, unless some of these are not possible. Each state is uniquely identified in a table-lookup manner. Initially, the agent has no prior knowledge about the maze, so it has no idea of what action should be taken at different states. Therefore it chooses an action randomly the first time it comes to a previously unvisited state, without knowing if it is a good one or not. If it was a bad one, the agent ends up in a dead end and has to walk back and try another direction. Coming back to a state already visited is also a bad sign, since the agent is then walking around in circles.

Fig 1. Grid problem. a) Agent is shown in start position and goal position in upper right corner. b) One of the optimal solutions.

2.1 Symbolic methods

This maze route finding problem is easy to solve with a classical depth-first search through an inference tree representing all possible solutions [1]. The root of the tree is the starting state. The root has links to all next states that can be reached from it, which again have links to all their next states. The inference tree can be constructed recursively, where leaves of the tree are indicated by one of three cases:
1. Goal reached.
2. Dead end, i.e. a state with no next states.
3. Circuit detected, i.e. coming back to a previously encountered state.
Depth-first search can explore the tree until a solution is found, which can be memorized. If the goal is to find the optimal path, breadth-first search [1] or complete exploration of the whole tree can be used. Depth-first and breadth-first search become unfeasible when the search tree grows bigger due to a great number of states or a great number of links (actions). Heuristic approaches are often used to overcome these problems. They make it possible to concentrate only on "interesting" parts of the search tree by associating numerical values with each node or each link in the search tree, which indicate the "goodness" of that node or link. Heuristic values can be given directly, calculated or obtained by learning. Reinforcement learning is one way of learning these values.

2.2 Reinforcement learning principles

In RL, heuristic estimates correspond to the notion of value functions, which are either state-values (i.e. the value of a state in the search tree) or action-values (i.e. the value of a link/action in the search tree). In the maze problem, value functions should be adjusted so that "good" actions, i.e. those leading to the goal as quickly as possible, are selected. One possible RL approach to the maze problem using state values would be to randomly select actions until the goal is reached, which forms one episode. During the episode, a reward of -1 is given for all state transitions except the one leading to the goal state. Then the value of a state s ∈ S (the set of possible states) for a given episode can be defined formally as

V^π(s) = E_π{ Σ_{k=0}^{∞} γ^k r_{t+k+1} | s_t = s },    (1)

where V^π(s) is the state value that corresponds to the expected return when starting in s and following policy π thereafter [7]. A policy is the "rule" being used for selecting actions, which can be random selection as assumed here or some other rule. So, for Markov Decision Processes (MDP), E_π{} denotes the expected value given that the agent follows policy π. The value of the terminal state, if any, is always zero. γ is a discounting factor that is less than or equal to one and determines to what degree future rewards affect the value of state s. When the number of episodes using the random policy approaches infinity, the average state value over all episodes converges to the actual state-value for policy π.
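To make equation (1) and this averaging argument concrete, the following sketch estimates state values by averaging discounted returns over episodes generated under a random policy. The episode representation (parallel lists of states and per-step rewards) is an illustrative assumption rather than anything prescribed in the text.

```python
from collections import defaultdict

def discounted_returns(rewards, gamma=0.95):
    """Return G_t = sum_k gamma^k * r_{t+k+1} for every step t of one episode.

    rewards[t] is the reward received on the transition out of the t-th state
    (-1 per move and 0 on the move into the goal, following the text)."""
    g, out = 0.0, [0.0] * len(rewards)
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        out[t] = g
    return out

def estimate_state_values(episodes, gamma=0.95):
    """Average the return observed from each state over many episodes
    (first-visit variant), approximating V^pi(s) of equation (1).

    Each episode is a pair (states, rewards) of equal-length lists."""
    totals, counts = defaultdict(float), defaultdict(int)
    for states, rewards in episodes:
        returns = discounted_returns(rewards, gamma)
        first_seen = set()
        for s, g in zip(states, returns):
            if s in first_seen:
                continue
            first_seen.add(s)
            totals[s] += g
            counts[s] += 1
    return {s: totals[s] / counts[s] for s in totals}
```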
Once state values have converged to correct values, states which are "closer" to the goal will have higher state values than states that are further away. If the policy is then changed to greedy exploration, i.e. always taking the action that leads to the next state with the highest state value, then the agent will automatically follow the optimal path. Unfortunately, random initial exploration is too time consuming to be useful in practical problems. The usual way to treat this case is to use ε-greedy exploration, where actions are selected greedily with probability (1 - ε), while random action selection is used with probability ε. Another version of ε-greedy exploration, called softmax, is sometimes used. Instead of using purely random action selection, softmax selects actions leading to high state values with a higher probability than actions leading to low state values. When ε-greedy exploration and a -1 reward on every state transition are used for the grid world of Fig 1, all state values can be initialized to 0 or to small random values. During exploration, states that have not been visited, or that have been visited less than others, will have higher state values than more frequently visited ones. Therefore ε-greedy exploration will by definition tend to exhaustively explore the whole state space, so initial episodes are very long. Convergence towards correct state values also requires a great number of episodes, so this approach is not usable for bigger problems.
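The ε-greedy and softmax selection rules described above can be sketched as follows; the successor-state lookup (`next_state`) and the value table `V` are illustrative assumptions about how the maze and state values are represented.

```python
import math
import random

def epsilon_greedy(state, actions, next_state, V, epsilon=0.1):
    """With probability epsilon take a random action, otherwise take the
    action leading to the successor state with the highest estimated value."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: V.get(next_state(state, a), 0.0))

def softmax_select(state, actions, next_state, V, temperature=1.0):
    """Select actions with probability proportional to exp(V(s')/T): actions
    leading to high-valued states are preferred but never ruled out."""
    prefs = [math.exp(V.get(next_state(state, a), 0.0) / temperature)
             for a in actions]
    threshold = random.uniform(0.0, sum(prefs))
    cumulative = 0.0
    for action, p in zip(actions, prefs):
        cumulative += p
        if threshold <= cumulative:
            return action
    return actions[-1]
```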

2.3 Monte-Carlo methods

Another possibility is to only give a positive reward at the end of an episode and zero reward for all intermediate transitions. Monte-Carlo Policy Evaluation [7] is one possibility for propagating the reward backwards through the state history of one episode. If a reward of +1 is given for reaching the goal, +1 is added to the "return values" of all states appearing in the episode. The state-value of a state is then the average return value over all episodes. Using ε-greedy exploration, state values eventually converge to the optimal policy, even though guaranteed convergence has not yet been formally proved according to [7]. For the maze problem used in this paper, generating episodes using a random policy requires an average of 1700 steps. Even with the TD methods studied in the next section, nearly 30 episodes are required before convergence towards a solution occurs, so for Monte-Carlo simulation the number of episodes needed is probably over 100. This would mean over 170 000 steps, which is very slow compared to all other methods treated later in this paper.

2.4 Temporal difference learning and TD(λ)

Monte-Carlo policy evaluation requires successfully completed episodes in order to learn. Therefore it quickly becomes too slow to be usable for most applications, since it might require a very big number of episodes before starting to select better actions than a random policy. Solving this problem is one of the main issues in so-called bootstrapping methods, like those based on temporal-difference (TD) learning [5]. Bootstrapping signifies that state- or action-value updates occur at every state transition based not only on the actual reward, but also on the difference between the current state value and the state value of the next state, according to:

V(s_t) ← V(s_t) + α [ r_{t+1} + γ V(s_{t+1}) - V(s_t) ],    (2)

which is known as the TD(0) method. The more advanced TD(λ) algorithm, of which TD(0) is an instance, is currently the most used bootstrapping method. TD(λ) uses a notion of eligibility trace, which λ refers to. An eligibility trace signifies using the state/action history of each episode for propagating rewards backwards, just like in Monte-Carlo methods. Associating an eligibility trace value with each state, which is usually increased by one (accumulating eligibility trace) every time the state is encountered during an episode, creates the trace. λ is a trace decay parameter, which together with γ determines how fast the eligibility trace disappears for each state. For an accumulating eligibility trace, a state's eligibility trace value e_t(s) at time t is calculated by:

e_t(s) = γλ e_{t-1}(s)        if s ≠ s_t
e_t(s) = γλ e_{t-1}(s) + 1    if s = s_t    (3)

Experience has shown that TD methods generally converge much faster to the optimal solution than do Monte-Carlo methods [7]. Using a model of the environment that is constructed during exploration can further accelerate convergence, as in Dyna agents [6]. In Dyna agents, the model memorizes which states are reached by what actions for each state/action pair encountered, so TD learning can be used for updating value functions both during interaction with the environment and without interaction with the environment. For the maze in Fig 1, Sutton and Barto have compared convergence times between direct reinforcement learning and Dyna agent learning [7]. Since both of these use random exploration on the first run, the first episode lasted for about 1700 steps. Direct RL needed about 30 episodes before converging to the optimal path of 14 steps, while the best Dyna agent found it after about five episodes.
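As a concrete reading of equations (2) and (3), the sketch below applies one tabular TD(λ) update with accumulating eligibility traces after a single state transition; the dictionary-based value table and parameter values are illustrative assumptions, not the exact setup used in [7].

```python
from collections import defaultdict

def td_lambda_step(V, e, s, r, s_next, alpha=0.1, gamma=0.95, lam=0.9):
    """Apply one transition (s, r, s_next) to the value table V and the
    eligibility traces e (both dicts keyed by state, defaulting to 0)."""
    delta = r + gamma * V[s_next] - V[s]   # TD error, as in equation (2)
    e[s] += 1.0                            # accumulating trace, equation (3)
    for state in list(e):
        V[state] += alpha * delta * e[state]   # update every traced state
        e[state] *= gamma * lam                # and decay its trace
    return V, e

# Usage sketch: V, e = defaultdict(float), defaultdict(float); call
# td_lambda_step after every transition and reset e at the start of an episode.
# With lam = 0 this reduces to the TD(0) update of equation (2).
```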
However, both methods stay in eternal oscillation between 14 steps and 16 steps due to ε-greedy exploration, which regularly puts the agent off the optimal path. The main shortcoming of these techniques is that they have a very long first exploration run, during which they go through most states numerous times (54 states and 1700 steps => ~32 visits per state). For a simple maze like the one in Fig 1 this is not a big problem, but the length of the initial exploration run can be expected to grow exponentially as the number of states increases. These long exploration runs are due to the need of current methods to first explore the whole state space in order to converge towards an optimal solution. Exploration of the entire state space is impossible in most practical applications reported, like backgammon [8], which has an enormous number of possible states. However, many states correspond to similar game situations, for which similar moves are appropriate. Therefore learning results for one state can be applied to numerous other states too, if there is a way to identify similar states based on state descriptions instead of treating each state as a separate case. Many artificial neural networks are capable of such generalization, where actions learned for one state description are automatically applied to similar states even though those states have never been encountered before. Also, in a game like backgammon, most states have a very small or zero probability of occurring in a real game, so they do not need to be learned. However, in a maze problem this approach does not seem to be applicable since there are no general rules that could be learned based on a general description of possible states. There are 16 different state types depending on possible directions, but there is no generally applicable rule for what action is appropriate for each type of state, so the problem of excessively long initial exploration remains. The solution proposed in this paper rapidly finds and memorizes at least one usable solution using minimal exploration effort and then explores towards the optimal solution.

3 Problem Solution

One of the initial ideas of the work presented here was to maintain a link with animal and human problem solving and the brain. This is why the reinforcement learning methods presented here use an artificial neural net (ANN) model, even though they could probably also be implemented in other ways. In this "brain inspired" ANN, neurons are either stimulus or action neurons, which seems more appropriate than speaking about inputs and outputs of the neural net. In the maze solving problem, each state corresponds to one stimulus neuron and each possible action to one action neuron. When the ANN agent enters a completely unknown maze, it only has four action neurons, which correspond to the four possible actions, but it has no stimulus neurons. Stimulus neurons are created and connected to action neurons for every state encountered for the first time during an episode. When a new stimulus neuron is created, the weights of its connections to action neurons are initialized to small random values, for instance in the interval ]0,1]. Since these stimuli and their connection weights are created during one episode and exist only until the episode is finished, they are here called short term working memory. Once an episode is finished, both short term stimuli and connection weights can be copied as instances in long term working memory.

3.1 ANN architecture

The purpose of long term working memory is to be able to solve the same problem more efficiently in the future. When a stimulus is activated in short term working memory, we can suppose that the corresponding stimulus instances in long term working memory are also activated to a certain degree. Since long term working memory instances are connected to action neurons, they affect what action is selected. Actions are selected according to the winner-takes-all principle, where the action neuron with the biggest activation value wins. Activation values of action neurons are calculated according to:

a_n = Σ_{i=1}^{stim} stw_{i,n} · s_i + α · Σ_{j=1}^{ltm} Σ_{i=1}^{stim} s_i · ltw_{j,i,n},    (4)

where a_n is the activation value of action neuron n, stw_{i,n} is the connection weight from stimulus neuron i to action neuron n, s_i is the current activation value of stimulus neuron i, ltw_{j,i,n} is the connection weight for long term working memory instance j and stimulus i to action neuron n, stim is the number of stimulus neurons and ltm is the number of instances in long term memory. α is a weighting parameter that adjusts to what degree stimulus activations in short term working memory cause activation of corresponding stimuli in long term working memory. α can also be considered a parameter that adjusts the influence of past experiences on action selection. Since short term working memory connection weights are always initialized to random values when a state is encountered for the first time during an episode, adjusting the α parameter offers an alternative to ε-greedy exploration and softmax for balancing between exploration and exploitation. Equation (4) can be rewritten in the form

a_n = Σ_{i=1}^{stim} s_i · stw_{i,n} + α · Σ_{i=1}^{stim} s_i · Σ_{j=1}^{ltm} ltw_{j,i,n},    (5)

which shows that long term working memory can be implemented as a vector of sums of stored connection weights, making it possible to implement the proposed model in a computation- and memory-efficient way. Only two connection weight matrices are needed, one for short term working memory weights and the other for long term working memory weights. The short term working memory matrix is of size (number of actions) × (number of states encountered during the current episode). The long term working memory matrix is of size (number of actions) × (number of states ever encountered). Straight matrix multiplication and addition is enough to perform the needed calculations.
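A minimal sketch of the action-selection step of equations (4) and (5): the long-term weights are kept pre-summed over stored instances (the "vector of sums" mentioned above), and the winner takes all among possible actions. The numpy representation and array shapes are assumptions made for illustration.

```python
import numpy as np

def action_activations(s, stw, ltw_sum, alpha=1.0):
    """s        : stimulus activation vector, shape (n_stim,)
       stw      : short-term weights, shape (n_actions, n_stim)
       ltw_sum  : long-term weights summed over stored instances,
                  shape (n_actions, n_stim), as allowed by equation (5)
       Returns the activation value a_n of every action neuron."""
    return stw @ s + alpha * (ltw_sum @ s)

def select_action(s, stw, ltw_sum, possible_actions, alpha=1.0):
    """Winner-takes-all among the actions possible in the current state."""
    a = action_activations(s, stw, ltw_sum, alpha)
    return max(possible_actions, key=lambda n: a[n])
```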
3.2 Search for a "usable" initial solution

Exploration and exploitation happen simultaneously; which one is predominant depends on the value of α and on the number of instances in long term working memory. Long term working memory is initially empty for a completely unexplored maze. Therefore action selection according to equation (4) is random the first time a new state is encountered, because the new stimulus neuron created in short term working memory has random initial connection weights. If a state already encountered during the same episode is visited again, it is either due to coming back from a dead end or to going around in circles. In both cases it would be unwise to take the same action as the previous time in the same state. In order to know what action was taken the previous time, it is sufficient to evaluate equation (4) for the current state and see which action wins. The winning action is punished according to the new Set Lowest Action Priority principle, shortly SLAP. The "slapped" action is punished by decreasing its weights enough to make it become the least activated among possible actions the next time we are in the same state. This is done according to the formula:

stw_{i,n} ← stw_{i,n} - (a_n - a_min) · stw_{i,n} · s_i / a_n,    (6)

where a_n is the activation value of the slapped neuron and a_min is the new activation desired, obtained by taking the lowest action activation among possible actions (possible directions) and subtracting a small ratio of it. Slapping is not only used for punishing actions that lead to dead ends and circuits; slapping is also applied to the direction the agent comes from directly after entering a state. Otherwise the probability that the agent would go back in the same direction it came from would be as high as that of taking a new direction.
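Below is a minimal sketch of the SLAP update of equation (6). For simplicity it only considers the short-term part of the activation in equation (4), and the size of the small margin subtracted from the lowest activation is an assumption, since the paper does not state its exact value.

```python
import numpy as np

def slap(stw, s, slapped, possible_actions, ratio=0.1):
    """Punish the winning action in the current state, as in equation (6).

    stw              : short-term weight matrix, shape (n_actions, n_stim),
                       modified in place
    s                : stimulus activation vector of the current state
    slapped          : index of the action neuron to punish
    possible_actions : indices of the actions possible in this state"""
    activations = stw @ s                  # short-term part of equation (4)
    a_n = activations[slapped]             # assumed non-zero for this sketch
    # target activation: lowest activation among possible actions, minus a
    # small ratio of it, so the slapped action ends up strictly lowest
    lowest = min(activations[i] for i in possible_actions)
    a_min = lowest - ratio * abs(lowest)
    # equation (6): scale down the weights feeding the slapped action so that
    # with binary stimuli its new activation becomes a_min
    stw[slapped] -= (a_n - a_min) * stw[slapped] * s / a_n
```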

The goal of SLAP is therefore mainly to make the exploration go to the goal as quickly as possible with minimal exploration effort. Sutton and Barto [7] call this principle trajectory sampling and show for a simple problem that this technique greatly reduces computation time compared to exhaustive search, especially for problems with a great number of states. For the sample run in Fig 2, the first episode took only 104 steps and still directly gives the rather good solution of 16 steps on the second episode (14 is the optimal solution). This result can be considered excellent compared to the 1700 steps reported for TD and Dyna-Q in [7] for the first episode, not to mention that they need up to 30 episodes before reaching a 16-step solution. The last actions used for each state are implicitly stored in short term working memory weights by SLAP reinforcements. Therefore an exploitation run that uses these weights will directly follow the shortest path discovered, as in Fig 2b. This is also true if the agent starts from some other state encountered during exploration than the initial starting state, as in Fig 2c.

Fig 2. a) First episode, 104 steps, b) second episode, 16 steps, c) different starting point than the initial one, 13 steps.

Once an episode is finished, short term working memory weights can be copied as an instance in long term working memory. This can either be done directly or after an additional reinforcement has been applied. The latter is implemented by doing a "replay" of all stimuli activations (states) and rewarding the winning actions by increasing the value of the connection weight between the stimulus and the winning action by a final reward value. Dispatching the final reward in this way is actually very similar to TD(λ) with γ = 1. The main difference is that the eligibility trace is not stored anywhere; it is reconstructed instead. Even though no formal proof is given here for the similarity with TD(λ), it can still be assumed that TD(λ) methods could be used for propagating the final reward backwards as well. Therefore most existing experience and knowledge about TD methods could be applicable concerning convergence, calculation complexity etc.

3.3 Search for optimal solution

Since only a part of the state space is usually visited during the initial exploration, and only a part of the possible actions in different states are used, the initially identified solution has a high probability of being sub-optimal. This is also the case in Fig 2, where the optimal solution would be 14 steps. However, the optimal solution is very difficult to find since it has a much smaller probability of occurring during random exploration than other solutions. At least two possibilities exist for finding the optimal solution:
1. Letting several neural net agents search for a solution and seeing which agent found the best one.
2. Using a low α and/or a low final reward at the end of episodes and letting the same agent do a great number of exploration runs. This could also be combined with ε-greedy and softmax exploration.
The first possibility might seem rather wasteful, but since initial exploration only requires an average of 115 steps, there can still be 15 agents exploring before reaching the 1700 steps used by the initial run with TD(λ) in [7]. Classical TD(λ) (without a model, unlike Dyna-Q) apparently needs far more exploration steps before finding a path requiring only 16 steps, but then quite rapidly finds the optimal path of 14 steps.
The experimental probability of an agent using a random policy finding the optimal path is 0.007, which means that it takes about 143 agents on average before the optimal path is found. Therefore an average of 16 445 (143 × 115) steps is needed for SLAP agents to find the optimal solution. This number seems to be approximately the same as for TD(λ), but the huge advantage of SLAP agents over TD(λ) is that they find a rather good solution already after one episode and about 115 exploration steps. Such a solution can be directly usable, so further exploration can be deferred to when there is spare time to do it. This also corresponds rather well to human behavior: first find a "usable" solution and be curious about other solutions when there is time for it. Table 1 shows the number of steps needed for the first and the second episodes of ten sample SLAP agents. After each episode, the agents received a final reward of one at the end of the episode before storing the solution as an instance in long term working memory. All agents had α = 1. For 30 sample agents, the longest initial episode took 336 steps and the shortest took 26 steps. The total number of initial episode steps for the 30 agents was 3460, and the optimal solution of 14 steps was found by one of them. Even the worst second-episode solution, requiring 26 steps, could be usable in many applications.

Table 1. Exploration steps for the first episode versus the second episode for ten different agents. (Columns: Run #, Episode 1, Episode 2.)
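The first possibility above can be sketched as a simple outer loop; `run_slap_episode` is a hypothetical helper standing for one full exploration episode of a fresh SLAP agent (it is not defined in the paper).

```python
def best_of_n_agents(maze, run_slap_episode, n_agents=143):
    """Run n_agents independent SLAP agents on the same maze and keep the
    shortest exploitation path found.  run_slap_episode(maze) is assumed to
    run one exploration episode with a new agent and return its path."""
    best_path = None
    for _ in range(n_agents):
        path = run_slap_episode(maze)
        if best_path is None or len(path) < len(best_path):
            best_path = path
    return best_path
```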

The second possibility to find the optimal path is to use the same agent all the time and let it gradually improve. This possibility has so far only been studied for the case of using low α values and a final reward of one. However, only episodes that are shorter than any previous episode are stored as instances in long term working memory, which means that episodes tend to get shorter as the number of episodes increases. When using α = 0.01, the path followed became stable after an average of about 500 episodes. All agents that discovered the 14-step solution at least once (about one agent out of five) eventually converged to that solution, while the others converged to a solution of 16 steps. Convergence could certainly be made much quicker in several ways. One way would be to use adaptive final reward values, where a reward counter would count the total amount of final rewards given and then give a bigger final reward than this amount for better solutions, thus slightly overriding all previous solutions. Unfortunately, despite its simplicity, this method has not been tested yet. Adjusting the values of α, the final episode reward and the interval for the random initial weights of new stimuli in short term working memory determines the balance between random exploration and greedy exploration. But if the solutions found during the first episodes are too far from the optimal solution, these parameters are not sufficient for converging to the optimal solution. Using ε-greedy exploration should solve this problem since it would introduce stochastic behavior. Testing this is one of the first issues of future research. Future research will also focus on comparing existing RL methods and those proposed in this paper on other mazes and on other kinds of problems. It would be especially interesting to extend the approach to problems requiring generalization over different states based on state descriptions. One such problem is the minefield navigation problem treated in [4], which is more general than well-known cases like backgammon [8] that require a great amount of domain knowledge. In the minefield navigation problem there are no discrete states, only continuous-valued state descriptions, where the number of stimuli is constant while the degree of activation of the stimuli changes. All calculations used in this paper are applicable to this kind of stimuli, but they will certainly need to be further developed in order to solve this kind of problem.

4 Conclusion

This paper presents how initial exploration runs in reinforcement learning can be significantly shortened. This is achieved by the SLAP reinforcement learning principle, which makes the agent avoid coming back to states already visited. SLAP also has the side effect of memorizing the shortest path found during an episode in the weights of the neural net model presented here, thus finding "usable" solutions with minimal exploration. Since "usable" solutions are found very quickly, it becomes feasible to let multiple agents do simultaneous exploration and retain the best ones. Letting these agents communicate and exchange their information would be an interesting topic for future research, since that could further reduce exploration time. The notions of short- and long term memory presented offer agents a possibility to maintain a balance between previously found solutions and searching for even better solutions. This gives agents a much more "human like" behavior than existing RL methods, i.e. first finding a usable solution and then being curious enough to improve the solution when there is time for it. Most current RL methods first exhaustively explore the whole state space several times and then converge towards an optimal solution, which is definitely not how a human individual finds a new way to navigate through a town, for instance.
Methods presented here are still at an early stage of research, so a lot of work remains before their position in the research area of reinforcement learning can be established. The results presented in this paper should still give a clear indication that the methods developed offer several big advantages compared to existing methods. If similar results are obtained for other problems and problem domains, reinforcement learning could probably be used in many new application areas where it is not yet feasible due to excessive exploration times.

References:
[1] Genesereth, M.R., Nilsson, N.J., Logical Foundations of Artificial Intelligence, Morgan Kaufmann Publishers, 1987.
[2] Jennings, N.R., Sycara, K., Wooldridge, M., A Roadmap of Agent Research and Development, Autonomous Agents and Multi-Agent Systems, Vol. 1, No. 1, 1998, pp. 7-38.
[3] Louie, K., Wilson, M.A., Temporally Structured Replay of Awake Hippocampal Ensemble Activity during Rapid Eye Movement Sleep, Neuron, Vol. 29, No. 1, 2001, pp. 145-156.
[4] Sun, R., Merrill, E., Peterson, T., From Implicit Skills to Explicit Knowledge: A Bottom-Up Model of Skill Learning, Cognitive Science, Vol. 25, No. 2, 2001.
[5] Sutton, R.S., Learning to predict by the methods of temporal differences, Machine Learning, Vol. 3, 1988, pp. 9-44.
[6] Sutton, R.S., Integrated architectures for learning, planning, and reacting based on approximating dynamic programming, in Proceedings of the Seventh International Conference on Machine Learning, Morgan Kaufmann Publishers, 1990.
[7] Sutton, R.S., Barto, A.G., Reinforcement Learning: An Introduction, A Bradford Book, MIT Press, Cambridge, MA, 1998.
[8] Tesauro, G.J., Temporal difference learning and TD-Gammon, Communications of the ACM, Vol. 38, 1995, pp. 58-68.


More information

A student diagnosing and evaluation system for laboratory-based academic exercises

A student diagnosing and evaluation system for laboratory-based academic exercises A student diagnosing and evaluation system for laboratory-based academic exercises Maria Samarakou, Emmanouil Fylladitakis and Pantelis Prentakis Technological Educational Institute (T.E.I.) of Athens

More information

Major Milestones, Team Activities, and Individual Deliverables

Major Milestones, Team Activities, and Individual Deliverables Major Milestones, Team Activities, and Individual Deliverables Milestone #1: Team Semester Proposal Your team should write a proposal that describes project objectives, existing relevant technology, engineering

More information

Grade 6: Correlated to AGS Basic Math Skills

Grade 6: Correlated to AGS Basic Math Skills Grade 6: Correlated to AGS Basic Math Skills Grade 6: Standard 1 Number Sense Students compare and order positive and negative integers, decimals, fractions, and mixed numbers. They find multiples and

More information

On the Combined Behavior of Autonomous Resource Management Agents

On the Combined Behavior of Autonomous Resource Management Agents On the Combined Behavior of Autonomous Resource Management Agents Siri Fagernes 1 and Alva L. Couch 2 1 Faculty of Engineering Oslo University College Oslo, Norway siri.fagernes@iu.hio.no 2 Computer Science

More information

Knowledge Transfer in Deep Convolutional Neural Nets

Knowledge Transfer in Deep Convolutional Neural Nets Knowledge Transfer in Deep Convolutional Neural Nets Steven Gutstein, Olac Fuentes and Eric Freudenthal Computer Science Department University of Texas at El Paso El Paso, Texas, 79968, U.S.A. Abstract

More information

GCSE Mathematics B (Linear) Mark Scheme for November Component J567/04: Mathematics Paper 4 (Higher) General Certificate of Secondary Education

GCSE Mathematics B (Linear) Mark Scheme for November Component J567/04: Mathematics Paper 4 (Higher) General Certificate of Secondary Education GCSE Mathematics B (Linear) Component J567/04: Mathematics Paper 4 (Higher) General Certificate of Secondary Education Mark Scheme for November 2014 Oxford Cambridge and RSA Examinations OCR (Oxford Cambridge

More information

Pre-AP Geometry Course Syllabus Page 1

Pre-AP Geometry Course Syllabus Page 1 Pre-AP Geometry Course Syllabus 2015-2016 Welcome to my Pre-AP Geometry class. I hope you find this course to be a positive experience and I am certain that you will learn a great deal during the next

More information

6 Financial Aid Information

6 Financial Aid Information 6 This chapter includes information regarding the Financial Aid area of the CA program, including: Accessing Student-Athlete Information regarding the Financial Aid screen (e.g., adding financial aid information,

More information

A New Perspective on Combining GMM and DNN Frameworks for Speaker Adaptation

A New Perspective on Combining GMM and DNN Frameworks for Speaker Adaptation A New Perspective on Combining GMM and DNN Frameworks for Speaker Adaptation SLSP-2016 October 11-12 Natalia Tomashenko 1,2,3 natalia.tomashenko@univ-lemans.fr Yuri Khokhlov 3 khokhlov@speechpro.com Yannick

More information

Parsing of part-of-speech tagged Assamese Texts

Parsing of part-of-speech tagged Assamese Texts IJCSI International Journal of Computer Science Issues, Vol. 6, No. 1, 2009 ISSN (Online): 1694-0784 ISSN (Print): 1694-0814 28 Parsing of part-of-speech tagged Assamese Texts Mirzanur Rahman 1, Sufal

More information

Learning and Transferring Relational Instance-Based Policies

Learning and Transferring Relational Instance-Based Policies Learning and Transferring Relational Instance-Based Policies Rocío García-Durán, Fernando Fernández y Daniel Borrajo Universidad Carlos III de Madrid Avda de la Universidad 30, 28911-Leganés (Madrid),

More information

The open source development model has unique characteristics that make it in some

The open source development model has unique characteristics that make it in some Is the Development Model Right for Your Organization? A roadmap to open source adoption by Ibrahim Haddad The open source development model has unique characteristics that make it in some instances a superior

More information

A Neural Network GUI Tested on Text-To-Phoneme Mapping

A Neural Network GUI Tested on Text-To-Phoneme Mapping A Neural Network GUI Tested on Text-To-Phoneme Mapping MAARTEN TROMPPER Universiteit Utrecht m.f.a.trompper@students.uu.nl Abstract Text-to-phoneme (T2P) mapping is a necessary step in any speech synthesis

More information