International Journal of Electrical and Computer Engineering (IJECE)
Vol. 6, No. 5, October 2016, pp. 2415~2424
ISSN: 2088-8708, DOI: 10.11591/ijece.v6i5.10639

An Efficient Approach for Resource Auto-Scaling in Cloud Environments

Bahar Asgari 1, Mostafa Ghobaei Arani 1, Sam Jabbehdari 2
1 Department of Computer Engineering, Mahallat Branch, Islamic Azad University, Mahallat, Iran
2 Department of Computer Engineering, North Tehran Branch, Islamic Azad University, Tehran, Iran

Article history: Received Mar 27, 2016; Revised Jul 6, 2016; Accepted Jul 28, 2016

Keywords: Auto-scaling, Cloud computing, Markov decision process, Reinforcement learning, Scalability

ABSTRACT: Cloud services have become increasingly popular among users. Automatic resource provisioning for cloud services is one of the important challenges in cloud environments. In a cloud computing environment, resource providers should offer the required resources to users automatically and without restriction: whenever a user needs more resources, those resources should be allocated without delay, and whenever allocated resources exceed the user's needs, the surplus should be released temporarily and reacquired when needed again. In this paper, we propose an automatic resource provisioning approach that applies reinforcement learning to auto-scale resources, formulated as a Markov Decision Process (MDP). Simulation results show that, in terms of the rate of Service Level Agreement (SLA) violation and stability, the proposed approach performs better than similar approaches. Copyright 2016 Institute of Advanced Engineering and Science. All rights reserved.

Corresponding Author: Bahar Asgari, Department of Computer Engineering, Mahallat Branch, Islamic Azad University, Mahallat, Iran. Email: Bahar_asgari88@yahoo.com

1. INTRODUCTION
Cloud computing is a collection of virtualized, connected computers offered dynamically as a single computational resource for carrying out complex computations [1]-[3].
In other words, cloud computing refers both to the application programs offered as services over the Internet and to the hardware and software systems in data centers; by convention, the data-center hardware and software are together called the "cloud". Scalability is one of the basic concepts in cloud computing and is central to using it efficiently [4]. Scalability refers to increasing a system's functional capacity, by adding software and hardware resources, so that it responds adequately to an increased workload [5]. Because applications, especially web applications, do not have regular workload patterns, scaling operations (increasing or decreasing scale) should be performed immediately and with minimal human intervention, so that resources reach applications as soon as possible. Resource scaling with minimal human intervention is called auto-scaling [6]-[8]. Varying workloads are among the biggest challenges: if a provider wants to meet all requirements at all times, it must reserve in advance the maximum resources needed at peak workload. In this situation the provider is over-provisioned, which is very costly (buying resources sized for peak demand) and reduces profit. Operational expenses can be reduced by turning off idle nodes during idle periods, but this does not reduce the capital expenses of purchasing and hosting IT equipment, nor its depreciation. If a provider owns only enough resources (average capacity) to support the average number of requests, the resources may be well utilized, but the provider may lack the local resources to meet client requests, leading to under-provisioning in some situations, so that the provider has to reject new customers or cancel
Journal homepage: http://iaesjournal.com/online/index.php/ijece
previously accepted services running on the system. We should design a system that can manage this uncertainty and remove such problems in the cloud environment, while also accounting for parameters such as cost, efficiency, and SLA violation. We propose auto-scaling based on reinforcement learning. Reinforcement learning (RL) is a kind of decision making that pursues a goal by learning a model of action outcomes and applying a policy without prior information. RL has been applied successfully in extensive fields to support automatic control and resource allocation [9]-[12]; it works on the basic assumption of penalty and reward, so the agent moves toward the actions that lead to the highest profit. A major part of RL is based on determining optimal policies in Markov decision processes [13],[14]. In this paper we propose an auto-scaling approach that uses an MDP to manage SLA violation and scaling cost while preserving system stability. RL has the capacity to respond suitably using experience of the environment; it leads to better management of the trade-off between SLA violation and the number of scaling operations, but it incurs higher costs. The rest of this paper is organized as follows: Section 2 reviews work related to RL; Section 3 presents the proposed approach in detail; Section 4 evaluates the performance of the proposed approach; finally, conclusions and suggestions are presented in Section 5.

2. RELATED WORKS
Various studies have addressed auto-scaling and its implementation, and current approaches have both advantages and disadvantages. As the approach proposed in this paper is based on RL, we review research related to this technique in this section. Enda Barrett et al. [14] considered parallel Q-learning to reduce the time needed to determine optimal policies and to support online learning; their approach uses an MDP along with RL. Fouad Bahrpeyma et al. [15] suggest the RL-DRP approach, which uses neural networks in its provisioning mechanism.
Their approach enables cloud service providers to meet high volumes of requests without wasting time or valuable work, while at the same time controlling resources optimally. Xavier Dutreilh et al. [16] proposed using proper initialization in the primary stages and increasing the convergence rate of the learning process, and they reported experimental results; they also introduced an efficient model to detect changes and then completed learning-process management based on it. Bahati and Bauer [17] proposed using RL to manage threshold-based rules: a first controller applies these rules to the target program to reinforce its quality attributes, while a second controller supervises the rules, adapts thresholds and conditions, and deactivates unrelated rules. Jia Rao et al. [18] present an RL-aware virtualized machine configuration framework (VCONF); the central design of VCONF is based on an RL model for scaling and adaptation. Amoui et al. used RL successfully to manage the quality of web programs by optimizing the program's output [19]; one of its interesting aspects is the use of simulation for initializing the learning functions. Table 1 compares the above techniques.
Table 1. Comparison of Techniques
Reference | Auto-scaling technique | Advantages and disadvantages | Contribution
Enda Barrett [14] | Parallel Q-learning | Decreases the time of optimal-policy determination; supports online learning. Disadvantage: determining initial policies is challenging | Uses the inherent parallelism of distributed computing platforms such as the cloud
Fouad Bahrpeyma [15] | RL | Fast convergence process; higher utilization | Introduces a new decision-making process based on predictive demand analysis that considers supply and demand parameters
Xavier Dutreilh [16] | RL, horizontal scaling | Increased convergence rate in the learning stages | Integration in a real cloud controller and automated programming
Bahati [17] | RL | Limits the state space to operation-condition pairs and allows re-using learned models in a rule set for the next stage; load reinforcement based on effective thresholds | Uses RL to manage threshold rules: a first controller applies the rules to the target program to improve its quality attributes
Jia Rao [18] | VCONF | Adapts well to online auto-configuration policies with heterogeneous VMs; can guide initial settings without degrading VM performance | Central design of VCONF uses an RL-aware model for scaling and adaptation
Amoui [19] | RL | Quality management of web applications to optimize program output | Uses simulation for initializing the learning functions
3. PROPOSED APPROACH
The final goal is to build an auto-scaling system able to decrease costs and increase system stability while meeting SLA requirements and maintaining system efficiency; that is, to use an online policy that allocates resources by scaling automatically. The proposed approach is built on RL and an MDP. The offered MDP consists of four components - states, actions, transition probabilities, and rewards - and decisions about scaling up or down are made on this basis.

3.1. Reinforcement Learning (RL)
RL [7],[14],[15] is a computational approach to automatic, experience-based learning of the best decisions. It relies on learning through the direct interaction of an agent with its environment. The decision maker is the agent, which learns from experience and whose best action is the one that performs maximally in any environment. An auto-scaler is responsible for scaling decisions without human involvement; its objective is to adapt resources dynamically to the applications according to the input workload, deciding to allocate or deallocate resources accordingly. At each discrete time step t = 0, 1, 2, ..., the agent observes the state of the environment s_t in S, where S is the set of all possible states, and selects an action a_t in A(s_t), where A(s_t) is the set of actions available in state s_t. One time step later, as a consequence of its action, the agent receives a reward r_{t+1} and finds itself in a new state s_{t+1}. The agent's behavior is described by its policy pi, where pi(s, a) is the probability of selecting action a in state s. The MDP is thus given by four components:
S: the environment state space.
A: the total action space.
p(. | s, a): the distribution governing transitions to the next state, s_{t+1} ~ p(. | s_t, a_t).
q(. | s, a): the distribution governing the received reward, r_{t+1} ~ q(. | s_t, a_t).
The objective of the Q-learning process is to reach the optimal policy, reflected by the Q-value that maximizes the expected discounted reward obtained by continuing to act from the current situation. The Q-value is updated by Equation (1), which incorporates the discounted future reward and defines the RL process policy:

Q(s_t, a_t) <- Q(s_t, a_t) + alpha ( r_{t+1} + gamma max_a Q(s_{t+1}, a) - Q(s_t, a_t) )   (1)

where r_{t+1} is the reward received after selecting a_t in state s_t, alpha is the learning rate, and gamma is the discount coefficient. The overall RL process is shown in Algorithm 1:

Algorithm 1: Reinforcement Learning Algorithm (Q-learning)
1. Initialize Q(s, a) arbitrarily
2. Repeat (for each episode)
3.   Initialize s
4.   Repeat
5.     Choose a from s using a policy derived from Q (epsilon-greedy)
6.     Take action a and observe r, s'
7.     Q(s, a) <- Q(s, a) + alpha ( r + gamma max_{a'} Q(s', a') - Q(s, a) )
8.     s <- s'
9.   Until s is terminal
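Algorithm 1 can be sketched in Python. The environment below is our own toy illustration (a 4-state chain with a terminal reward), not the paper's cloud environment; the function names are ours:

```python
import random

def q_learning(n_states, n_actions, step, episodes=300,
               alpha=0.2, gamma=0.9, epsilon=0.1, max_steps=100):
    """Tabular Q-learning in the spirit of Algorithm 1.
    `step(s, a)` returns (reward, next_state), with next_state None
    when the transition is terminal."""
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            # epsilon-greedy action selection; ties broken at random
            if random.random() < epsilon:
                a = random.randrange(n_actions)
            else:
                best = max(Q[s])
                a = random.choice([i for i, v in enumerate(Q[s]) if v == best])
            r, s_next = step(s, a)
            # Equation (1): move Q(s, a) toward the bootstrapped target
            target = r if s_next is None else r + gamma * max(Q[s_next])
            Q[s][a] += alpha * (target - Q[s][a])
            if s_next is None:
                break
            s = s_next
    return Q

# Toy chain: action 1 moves right toward a terminal reward, action 0 moves left.
def chain_step(s, a):
    if a == 1:
        return (1.0, None) if s == 3 else (0.0, s + 1)
    return (0.0, max(s - 1, 0))

random.seed(0)
Q = q_learning(n_states=4, n_actions=2, step=chain_step)
```

After training, the learned Q-values prefer the rewarding action in the state adjacent to the goal, which is the behavior Equation (1) is designed to produce.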
3.2. Proposed Algorithm
The proposed algorithm is based on RL and is defined as an MDP, following the Markov process, for auto-scaling. Upper and lower thresholds are also defined, and the cloud service operation is monitored after the proposed MDP is introduced. The configuration of the proposed MDP is as follows:
S: the state space: Full utilization, Under utilization, Normal utilization.
A: the action space: Scale up, Scale down, No-op.
p(. | s, a): the distribution governing transitions to the next state.
q(. | s, a): the distribution governing the received reward.
MIPS denotes the number of (millions of) instructions per second. Two input variables are introduced for the proposed approach, Available MIPS and Requested MIPS, both inputs of the service. Q-updating is done using local regression over the instruction history. The ratio of the two quantities gives the utilization (Equation (2)), and comparison against the upper and lower thresholds determines the state. The full-utilization and under-utilization conditions are given in Equations (3) and (4), respectively:

Utilization = Requested MIPS / Available MIPS   (2)

Requested MIPS / Available MIPS > High-Threshold => Full-Utilization (Under-Provisioning)   (3)

Requested MIPS / Available MIPS < Low-Threshold => Under-Utilization (Over-Provisioning)   (4)

After defining the full-utilization, under-utilization, and normal states, and applying Equation (1) for Q(s, a), the amount of SLA violation is obtained from the Requested MIPS and Available MIPS, and a decision is then made to perform the current action: increase or decrease the number of virtual machines, or take no action. Table 2 presents the decision-making process, and Figure 1 shows a diagram of provider state changes with respect to the utilization parameter.
Table 2. Decision Making by the MDP
Utilization > High-Threshold | State(t): Full-Utilization (Under-Provisioning) | Next-Action(t+1): Scale_up
Low-Threshold < Utilization < High-Threshold | State(t): Normal-Utilization (Normal-Provisioning) | Next-Action(t+1): No-op
Utilization < Low-Threshold | State(t): Under-Utilization (Over-Provisioning) | Next-Action(t+1): Scale_down

Figure 1. Provider state changes with respect to the utilization parameter

The algorithm proposed in this paper, based on the Markov model and the decision making in Table 2, is given as pseudocode in Algorithm 2.

Algorithm 2: Reinforcement Learning (Q-Learning)
1. Initialize Q(s, a) = 0, s = 0, a = 0, highRangeQ = 0.8, lowRangeQ = 0.2.
2. Observe the Available MIPS and Requested MIPS.
3. Observe the current state s.
4. If (Requested MIPS / Available MIPS) > highRangeQ, state[0] = 0; /* Full-Utilization state */
5. Else if (Requested MIPS / Available MIPS) < lowRangeQ, state[1] = 1; /* Under-Utilization state */
6. Else state[2] = 2; /* Normal-Utilization state */
7. Loop
8.   Select an action for the state based on one of the action-selection policies
9.   Take the action; observe r as well as the new state s'
10.  Update the Q-value for the state using the regression, the observed r, and the maximum reward possible for the next state:
11.  Q(s, a) <- Q(s, a) + alpha ( r + gamma max_{a'} Q(s', a') - Q(s, a) )
12.  Set the state s to the new state s': s <- s'
13. Until s is terminal

4. PERFORMANCE EVALUATION
The CloudSim [20] simulator was used for the simulation. Four kinds of virtual machine corresponding to Amazon EC2 [21] instances were used; their specifications are given in Table 3. Four kinds of services were used, reflecting the variety of services available in the cloud; we did not focus on a particular service type or program, so the services used are program-independent, combining heterogeneous workloads such as HPC, web, and so on. The workload was modeled with a normal distribution to be closer to the real world. Scaling is done over a 24-hour period in 5-minute intervals (288 intervals); the Low-Threshold is 0.2 and the High-Threshold is 0.8; the standard deviation is 3000 MIPS and the Diff Range is 0.4. A function is included for the initialization cost. Since the cost function is computed per hour and we have 5-minute intervals, the overall cost is multiplied by 300/3600.

Table 3. Specification of Virtual Machines
Type of Virtual Machine | MIPS (CPU) | Core | RAM (MB) | Price (Cent)
Micro | 500 | 1 | 633 | 0.026
Small | 1,000 | 1 | 1,700 | 0.070
Extra Large | 2,000 | 1 | 3,750 | 0.280
High-CPU Medium | 2,500 | 1 | 850 | 0.560

The algorithm works by updating Q; we performed the Q-updating and obtained the utilization using local regression.
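The threshold rules of Equations (2)-(4) and the decision mapping of Table 2 reduce to a few lines of code. The sketch below uses the thresholds from the simulation setup (0.8 and 0.2); the function names are illustrative, not the simulator's:

```python
HIGH_THRESHOLD = 0.8   # above this: full utilization (under-provisioning)
LOW_THRESHOLD = 0.2    # below this: under-utilization (over-provisioning)

def utilization(requested_mips, available_mips):
    # Equation (2): utilization is the requested-to-available MIPS ratio
    return requested_mips / available_mips

def classify(u):
    # Equations (3)-(4): map utilization to an MDP state
    if u > HIGH_THRESHOLD:
        return "full-utilization"
    if u < LOW_THRESHOLD:
        return "under-utilization"
    return "normal-utilization"

def next_action(state):
    # Table 2: state -> scaling action
    return {"full-utilization": "scale_up",
            "under-utilization": "scale_down",
            "normal-utilization": "no-op"}[state]

# 4500 requested / 5000 available -> utilization 0.9 -> scale_up
action = next_action(classify(utilization(4500, 5000)))
```

In the full algorithm this classification feeds the Q-update of Equation (1); here it only demonstrates the state/action mapping.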
Updating Q is accomplished according to the instruction history; that is, the next amount is determined from a prediction based on the previous amounts. The predicted amount is multiplied by 0.7, because the error probability is taken to be 30 percent. The regression function helps to scale the VMs in a way that decreases failure cases with minimum cost. The Available MIPS and Requested MIPS are calculated in the main function of the proposed approach, and the amount of SLA violation is calculated from their difference. Requested MIPS is divided by Available MIPS to populate the two-dimensional arrays r[rstate][0], r[rstate][1], and r[rstate][2], which correspond to the under-utilization, full-utilization, and normal states. The amount of r is then updated and Q(s, a) of Equation (1) is calculated. The overall MIPS amount is obtained by dividing the violation rate by the total MIPS; the optimal current action is then selected via CurrentAction() and SelectAction() according to the utilization, and a decision is made to scale down, scale up, or do nothing. The proposed approach is compared with a learning-automata-aware approach [22] and with a simple cost-aware auto-scaling approach on parameters such as cost, SLA violation, initialization cost, and number of scaling operations. Three scenarios are defined for evaluating the proposed approach, listed in Table 4.

Table 4. Evaluation Scenarios
Scenario | Goal
First scenario | Minimization of SLA violation
Second scenario | Minimization of total cost
Third scenario | Minimization of the number of scaling operations
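The SLA-violation accounting described above (derived from the gap between requested and available MIPS) might be sketched as follows. The paper does not give the exact normalization, so treating the per-interval violation as the unmet fraction of the request is our assumption:

```python
def sla_violation(requested_mips, available_mips):
    """Per-interval SLA violation as the unmet fraction of requested MIPS.
    Zero when capacity covers the request. (Assumed normalization: the
    paper only states that violation comes from the requested/available gap.)"""
    shortfall = max(0.0, requested_mips - available_mips)
    return shortfall / requested_mips if requested_mips else 0.0

def overall_violation(trace):
    """Mean violation over a trace of (requested, available) MIPS pairs,
    e.g. the 288 five-minute intervals of the 24-hour simulation window."""
    return sum(sla_violation(req, avail) for req, avail in trace) / len(trace)

trace = [(5000, 4000), (3000, 4000), (6000, 4000)]
mean_violation = overall_violation(trace)
```

A per-service average of this kind is what Figures 2 and 3 compare across the three approaches.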
4.1. First Scenario
The first scenario evaluates SLA violation in comparison with the two other approaches. An SLA violation happens when the provider cannot deliver the measures predefined in the SLA for its users. Examples include the number of missed deadlines, failure to guarantee the agreed MIPS, failure to guarantee the agreed bandwidth, and the number of requests rejected for lack of resources at peak times. An increasing SLA-violation rate lowers the quality of the service provided to users. If the Requested MIPS cannot be matched by the Available MIPS, an SLA violation occurs. Figure 2 presents the comparison of SLA violation across the three approaches for the four services. As can be seen, the SLA-violation rate of the proposed approach is lower than the others.

Figure 2. Comparison of SLA violation across services

Figure 3 shows the comparison of overall SLA violation for the services under the cost-aware, learning-automata, and proposed approaches. The simulation results show that the proposed approach has a lower SLA-violation rate during the simulation than the learning-automata and cost-aware approaches, so using the Q-learning technique in auto-scaling reduces SLA violation. Thus, whenever the SLA is the priority in auto-scaling, the proposed approach can be used.

Figure 3. Comparison of overall SLA violation across the three approaches

4.2. Second Scenario
The second scenario evaluates the cost measure in comparison with the other approaches. Service cost is calculated according to hours of use: the user pays according to the speed, power, and capacity of the requested resources (CPU, memory, disk, etc.) and the time the resources are used. Naturally, cost is low when resources of lower speed and capacity are used for shorter intervals; this decreases cost but affects other quality factors, so a high-quality service entails higher cost. Cost is one of the most important factors for users, who always try to accomplish their requests at minimum cost.
The cost parameter is introduced in three categories: initialization cost, runtime cost, and total cost. The initialization cost is the initial cost of setting up VMs. The runtime cost is the per-hour usage cost paid for VM operation. The total cost is calculated by Equation (5):
Total Cost = Initialization Cost + Runtime Cost   (5)

The runtime cost of a VM increases by nearly 20 percent in the simulation of vertical scale-up. The cost measure in the simulation is calculated as the sum of the VM initialization cost and the VM runtime cost. Figure 4 shows the VM initialization cost; it indicates that the Q-aware approach has a high initialization cost, while the automata-aware approach saves initialization cost substantially.

Figure 4. Comparison of initialization cost across the three approaches

Figure 5 presents the VM runtime cost for three services over 24 hours. The simulation results show that the proposed approach has a lower runtime cost.

Figure 5. Comparison of VM runtime cost across the three approaches

Figure 6 presents the simulation results for the total scaling cost of the three compared approaches over a 24-hour period.

Figure 6. Comparison of total cost across the three approaches
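Equation (5), combined with the per-hour prices of Table 3 scaled to 5-minute billing intervals (the 300/3600 factor mentioned in Section 4), can be sketched as below. The schedule representation is our own simplification of the simulator's accounting:

```python
# Hourly prices in cents, taken from Table 3
PRICE = {"micro": 0.026, "small": 0.070,
         "extra_large": 0.280, "high_cpu_medium": 0.560}

INTERVAL_FRACTION = 300 / 3600  # a 5-minute interval as a fraction of an hour

def runtime_cost(schedule):
    """schedule: one list of running VM types per 5-minute interval.
    Each VM accrues its hourly price scaled to the interval length."""
    return sum(PRICE[vm] * INTERVAL_FRACTION
               for interval in schedule for vm in interval)

def total_cost(init_cost, schedule):
    # Equation (5): total cost = initialization cost + runtime cost
    return init_cost + runtime_cost(schedule)

# One interval running a Micro and a Small instance, plus setup cost
cost = total_cost(1.0, [["micro", "small"]])
```

This is the quantity the second scenario minimizes; Figures 4-7 break it into its initialization and runtime components.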
The simulation results for the total cost over 24 hours for the four services are presented in Figure 7.

Figure 7. Comparison of overall total cost across the three approaches

As is evident in Figures 6 and 7, learning-automata-aware scaling has a lower cost than the proposed approach and the cost-aware approach; the approach proposed in this paper has the highest total cost in the comparison. The Q-aware approach has high initialization and runtime costs, and the total cost, which is the sum of the two, shows that this approach is not the appropriate choice compared with the learning-automata and cost-aware approaches whenever the cost measure is the priority.

4.3. Third Scenario
The third scenario compares the number of scaling operations with the other two approaches. The number of VMs removed or added is an important factor in dynamic scaling: it affects the response speed of the computing environment, and it can cause operational overhead and impose costs on the system. Proper management of this measure helps to achieve minimum cost and a higher response rate, and consequently a lower SLA-violation rate. The overall scaling function counts the total number of scaling operations. The number of scaling operations for the four services is shown in Figure 8. As can be seen, the number of scaling operations in the proposed approach does not change dramatically, so the system has good stability.

Figure 8. Comparison of the number of scaling operations across the three approaches

As shown in Figure 9, the number of scaling operations decreased in the proposed approach compared with the two other approaches according to the simulation results. This reduction helps to optimize the SLA-violation rate, lower the cost, and increase system stability.
Figure 9. Comparison of the overall number of scaling operations across the three approaches

5. CONCLUSION
Cloud services are distributed infrastructures that extend the space of communication and service. Resource provisioning has become very important because of the daily growth of cloud services, and scaling has been welcomed as one of the most important features of cloud computing. In this paper we presented an approach based on reinforcement learning and addressed the Markov model. Three important factors appear in the proposed approach: the SLA-violation rate, the scaling cost, and the number of scaling operations. Regarding the cost measure, the Q-aware approach is not an appropriate choice compared with the automata-aware and cost-aware approaches, but the proposed approach reduces the number of scaling operations, which helps optimize the SLA-violation rate and the system stability. The proposed approach also decreases SLA violation, and optimizing the SLA increases cost; as a result, achieving minimum cost becomes difficult, while focusing on minimum cost leads to SLA violation. In sum, using the Q-learning technique in auto-scaling yields a substantial reduction in SLA violation and higher system stability. Future studies on auto-scaling could consider other effective factors and other approaches: for example, the state space could be redefined according to utilization, or a novel auto-scaling approach could be built on parallel Q-learning, combining the parallelism factor with new states. RL could also be applied to predict load in web-aware software, and RL could be merged with other machine-learning methods. The overhead of the proposed approach should also be considered carefully.

REFERENCES
[1] R. Buyya, et al., "Cloud computing: principles and paradigms," John Wiley & Sons, vol. 87, 2010.
[2] A. Vijaya and V. Neelanarayanan, "A Model Driven Framework for Portable Cloud Services," International Journal of Electrical and Computer Engineering (IJECE), vol/issue: 6(2), pp. 708-716, 2016.
[3] M. G. Arani and M.
Shamsi, "An Extended Approach for Efficient Data Storage in Cloud Computing Environment," International Journal of Computer Network and Information Security, vol/issue: 7(8), pp. 30, 2015.
[4] M. G. Arani, et al., "An autonomic approach for resource provisioning of cloud services," Cluster Computing, pp. 1-20, 2016.
[5] N. Roy, et al., "Efficient autoscaling in the cloud using predictive models for workload forecasting," in Cloud Computing (CLOUD), 2011 IEEE International Conference on, pp. 500-507, 2011.
[6] K. Mogouie, et al., "A Novel Approach for Optimization Auto-Scaling in Cloud Computing Environment," International Journal of Modern Education and Computer Science, vol/issue: 7(8), pp. 9, 2015.
[7] A. Liu, "Theoretical Analysis for Scale-down-Aware Service Allocation in Cloud Storage Systems," International Journal of Electrical and Computer Engineering, vol/issue: 3(1), pp. 21, 2013.
[8] H. Ghiasi and M. G. Arani, "Smart Virtual Machine Placement Using Learning Automata to Reduce Power Consumption in Cloud Data Centers."
[9] R. Hu, et al., "Efficient Resources Provisioning Based on Load Forecasting in Cloud," The Scientific World Journal, 2014.
[10] Y. Chevaleyre, et al., "Issues in multiagent resource allocation," Informatica, vol/issue: 30(1), 2006.
[11] M. Jacyno, et al., "Understanding decentralised control of resource allocation in a minimal multi-agent system," in Proceedings of the 6th international joint conference on Autonomous agents and multiagent systems, pp. 208, 2007.
[12] E. Scalas, et al., "Growth and allocation of resources in economics: The agent-based approach," Physica A: Statistical Mechanics and its Applications, vol/issue: 370(1), pp. 86-90, 2006.
[13] E. Barrett, et al., "Applying reinforcement learning towards automating resource allocation and application scalability in the cloud," Concurrency and Computation: Practice and Experience, vol/issue: 25(12), pp. 1656-1674, 2013.
[14] B. B. G. Abadi and M. G. Arani, "Resource Management of IaaS Providers in Cloud Federation," International Journal of Grid and Distributed Computing, vol/issue: 8(5), pp. 327-336, 2015.
[15] F. Bahrpeyma, et al., "An adaptive RL based approach for dynamic resource provisioning in Cloud virtualized data centers," Computing, pp. 1-26, 2015.
[16] X. Dutreilh, et al., "Using reinforcement learning for autonomic resource allocation in clouds: Towards a fully automated workflow," in ICAS 2011, The Seventh International Conference on Autonomic and Autonomous Systems, pp. 67-74, 2011.
[17] R. M. Bahati and M. Bauer, "Towards adaptive policy-based management," in Network Operations and Management Symposium (NOMS), 2010 IEEE, pp. 511-518, 2010.
[18] J. Rao, et al., "VCONF: a reinforcement learning approach to virtual machines auto-configuration," in Proceedings of the 6th international conference on Autonomic computing, pp. 137-146, 2009.
[19] M. Amoui, et al., "Adaptive action selection in autonomic software using reinforcement learning," in Autonomic and Autonomous Systems, 2008. ICAS 2008. Fourth International Conference on, pp. 175-181, 2008.
[20] R. N. Calheiros, et al., "CloudSim: a toolkit for modeling and simulation of cloud computing environments and evaluation of resource provisioning algorithms," Software: Practice and Experience, vol/issue: 41(1), pp. 23-50, 2011.
[21] Amazon EC2 instance types, http://aws.amazon.com/ec2/.
[22] K. Mogouie, et al., "A Novel Approach for Optimization Auto-Scaling in Cloud Computing Environment," International Journal of Modern Education & Computer Science, vol/issue: 7(8), pp. 9-16, 2015.

BIOGRAPHIES OF AUTHORS
Bahar Asgari received the B.Sc. degree in Information Technology from PNU University, Iran, in 2012, and the M.Sc. degree from the Islamic Azad University of Mahallat, Iran, in 2015. Her research interests include cloud computing, distributed systems, big data, and software engineering.
Mostafa Ghobaei Arani received the B.Sc. degree in Software Engineering from the University of Kashan, Iran, in 2009, and the M.Sc. degree from the Islamic Azad University of Tehran, Iran, in 2011. He is a PhD candidate at the Islamic Azad University, Science and Research Branch, Tehran, Iran. His research interests include grid computing, cloud computing, pervasive computing, distributed systems, and software development.

Sam Jabbehdari has been working as an assistant professor in the Department of Computer Engineering at IAU (Islamic Azad University), North Tehran Branch, in Tehran, since 1993. He received his B.Sc. and M.S. degrees in Electrical Engineering (Telecommunication) from Khajeh Nasir Toosi University of Technology and from IAU, South Tehran Branch, in Tehran, Iran, respectively. He was awarded the Ph.D. degree in Computer Engineering from IAU, Science and Research Branch, Tehran, Iran, in 2005. His current research interests are scheduling, QoS, MANETs, wireless sensor networks, and cloud computing.