Neural Network Model of the Backpropagation Algorithm

Rudolf Jakša
Department of Cybernetics and Artificial Intelligence
Technical University of Košice
Letná 9, 042 00 Košice, Slovakia
jaksa@neuron.tuke.sk

Miroslav Karák
Department of Cybernetics and Artificial Intelligence
Technical University of Košice
Letná 9, 042 00 Košice, Slovakia
bracek@mizu.sk

Abstract

We apply a neural network to model the neural network learning algorithm itself. The process of weight updating in a neural network is observed and stored into a file. Later, this data is used to train another network, which will then be able to train neural networks by imitating the trained algorithm. We use the backpropagation algorithm both for training and for sampling the training process. We imitate the training of the network as a whole: all the weights and weight changes of the multilayer neural network are processed in parallel in order to model the mutual dependencies between weights. Experimental results are provided.

Keywords: metalearning, learning to learn, error backpropagation.

1 Introduction

Adaptive or optimizing learning algorithms might be used in the neural network learning or machine learning domains. Instead of fixed learning algorithms, these algorithms improve their own learning performance over time, or they develop particular learning methods from scratch. This type of learning algorithm is known as the metalearning or learning to learn approach. Works by Jürgen Schmidhuber and Sepp Hochreiter [2] [3] are representative of recent research in this area, and a more comprehensive overview is given by Sebastian Thrun in [4]. Thrun defines learning to learn as the ability of an algorithm to improve its performance at each next task with experience from previous tasks [4]. Schmidhuber emphasizes the ability of the learner to evaluate and compare learning methods and the course of learning, and to use this evaluation to select a proper learning strategy [2].

We can recognize the following paradigms among metalearning approaches: similarity exploitation, learning parameters adaptation, and discovery of a learning algorithm. Particular methods might be focused on any of these paradigms, or on all of them. Similarity exploitation is the idea that a group of tasks shares some similarity which, once learned, might speed up the learning of other tasks. We can simply learn the sequence of tasks to exploit their similarity, but some mechanism for distinguishing task-specific knowledge from common cross-task knowledge should improve the performance. Adaptation of learning parameters might be done by some meta-learning algorithm above the learning algorithm. This paradigm might be based more on the learning about learning than on the learning to learn idea. However, knowledge about learning is a step toward learning to learn. Discovery of a learning algorithm is the design of a learning algorithm from scratch. This is more learning the learning than learning to learn; here we shift the focus from adaptation to learning.

Metalearning algorithms might be based on reinforcement learning or on supervised learning. In the reinforcement learning case, the learner improves through trial-and-error experience not only its performance on some particular task, but also its ability to learn. This can be achieved by treating the learning algorithm as a part of the solved task, that is: learning is one of the actions of the learner. In the supervised learning scenario, learning and metalearning are usually treated as independent processes.
2 Backpropagation Imitation

In this section we describe the modelling of the error backpropagation algorithm. We want to train a neural network to train another neural network. A neural network model of the error backpropagation algorithm should be able to train neural networks in a similar manner as the original backpropagation algorithm did. To obtain such a model we will sample the training process of backpropagation learning and then try to imitate it. This is a simple, while also general, approach to metalearning. It consists of the following sequence:

1. train an arbitrary neural network with the error backpropagation algorithm and sample the learning process,
2. train the learning network to imitate the original learning algorithm,
3. train an arbitrary neural network using the learning network.

Consider a multilayer neural network with neuron activations x_i, link weights w_{ij}, biases \theta_i, and neuron activation functions f_i(in_i):

    x_i = f_i(in_i), \qquad in_i = \sum_{j=1}^{M} w_{ij} x_j + \theta_i    (1)

The in_i is the input into the i-th neuron and M is the number of links connecting into the i-th neuron. The error J in the supervised learning mode is defined:

    J^p = \frac{1}{2} \sum_{i=1}^{N} (ev^p_i - x^p_i)^2    (2)

The p is the index of the data pattern, N is the number of output neurons of the neural network, and ev^p_i is the expected output of the i-th neuron on the p-th pattern. For simplicity, we will omit the pattern index p later. The gradient-based error-minimizing adaptation of weights follows:

    \Delta w_{ij} = -\gamma \frac{\partial J}{\partial w_{ij}} = -\gamma \frac{\partial J}{\partial in_i} \frac{\partial in_i}{\partial w_{ij}} = \gamma \delta_i x_j    (3)

The weight w_{ij} links the j-th neuron into the i-th neuron, \gamma is the learning rate constant, and \delta_i is defined as:

    \delta_i = -\frac{\partial J}{\partial in_i} = -\frac{\partial J}{\partial x_i} \frac{\partial x_i}{\partial in_i} = -\frac{\partial J}{\partial x_i} f'(in_i)    (4)

The f'(in_i) is the derivative of the activation function f(in_i). For output neurons we get:

    \delta_i = -\frac{\partial J}{\partial x_i} f'(in_i) = (ev_i - x_i) f'(in_i)    (5)

For neurons in hidden layers we get:

    \delta_i = -f'(in_i) \sum_{h=1}^{N_h} \frac{\partial J}{\partial in_h} \frac{\partial in_h}{\partial x_i}
             = -f'(in_i) \sum_{h=1}^{N_h} \frac{\partial J}{\partial in_h} \frac{\partial}{\partial x_i} \sum_{l=1}^{N_l} w_{hl} x_l
             = -f'(in_i) \sum_{h=1}^{N_h} \frac{\partial J}{\partial in_h} w_{hi}
             = f'(in_i) \sum_{h=1}^{N_h} \delta_h w_{hi}    (6)

The N_h is the number of links coming from the i-th neuron and h is the index of these links and of the corresponding neurons. The N_l is the number of neurons which have connections into the h neurons (see Fig. 1).

Figure 1: Neuron indices for rule (6).

The rule (6) is the error backpropagation rule, defining the backward propagation of the error through the network. Rule (3) defines the weight changes minimizing this error, and rule (5) sets the base for the error minimization.

The error backpropagation algorithm is defined by rules (1), (2), (3), (5), and (6). To model this algorithm we may sample the variables: w, \Delta w, \theta, \delta, x, ev, J, in, and f'(in). Some of these variables can be derived from others, so not the full set of them is necessary. We can model either the rule (6) alone, or the full set of rules. When modelling the full set of rules, interactions in the whole network may be processed in the model. When modelling rule (6) only, only the neighborhood of a particular link is considered.
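As an illustration of step 1, the following minimal sketch (our own reconstruction in Python, not code from the original experiments) trains the small 2-1-1 network used in the experiments below and records every weight update. The sigmoid activation, the weight initialization range, and the cycle count are assumptions; the learning rate 0.3 and the 9-input/5-output sampling format follow the text.

    import math, random

    def f(a):  return 1.0 / (1.0 + math.exp(-a))   # sigmoid activation (assumption)
    def df(x): return x * (1.0 - x)                # f'(in) expressed via x = f(in)

    gamma = 0.3                                    # learning rate of the basic network
    w1, w2, v, hb, yb = [random.uniform(-0.5, 0.5) for _ in range(5)]
    AND = [(0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 1)]   # x1, x2, ev (Tab. 1)
    samples = []                                   # the sampled learning process

    for cycle in range(500):                       # cycle count is arbitrary here
        for x1, x2, ev in AND:
            h = f(w1*x1 + w2*x2 + hb)              # forward pass, rule (1)
            y = f(v*h + yb)
            dy = (ev - y) * df(y)                  # output delta, rule (5)
            dh = df(h) * dy * v                    # hidden delta, rule (6)
            change = [gamma*dh*x1, gamma*dh*x2,    # weight changes, rule (3)
                      gamma*dy*h, gamma*dh, gamma*dy]
            samples.append(([w1, w2, v, hb, yb, x1, x2, y, ev], change))
            w1 += change[0]; w2 += change[1]; v += change[2]
            hb += change[3]; yb += change[4]

Each sample pairs the state of the trained network before an update with the weight changes that rules (3), (5), and (6) produced for it.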
The number of inputs and outputs of the learning network is equal to the number of sampled variables. The outputs are the \Delta w changes, and possibly \Delta\theta. To use the learning network, rules (6), (5), and (3) of the original backpropagation algorithm have to be replaced with the outputs of this learning network.

3 Experiments

Consider the neural network in Fig. 2 with two inputs, one hidden neuron, and one output. It has three weights and two biases, whose changes we will try to approximate with the learning network. Thus, we will sample these changes while learning with the backpropagation algorithm. Then, we will train the learning network to approximate them. Besides these changes, we will sample all three weights and two biases, the two inputs, the one output, and the one expected value on the output. This is: 9 inputs and 5 outputs for the learning network. Such a learning network with two hidden neurons is shown in Fig. 3. The number of hidden neurons is arbitrary; it might depend on the tasks learned and on the complexity of the original training algorithm, in our case error backpropagation.

Figure 2: Simple network to be trained. The x1 and x2 are inputs; w1, w2, and v are weights; hb and yb are biases; h is the hidden neuron activation, and y is the output.

Figure 3: Learning network with two hidden neurons for training of the network from Fig. 2. Inputs are the variables describing the state of the neural network in Fig. 2 and outputs are the changes of them provided by the learning algorithm.

In the 1st experiment we will train the network from Fig. 2 to approximate the boolean function AND (Tab. 1). This is a simple task and networks learn it quickly. The parameters of the training of the basic network are: γ = 0.3, the number of training cycles is 5. The parameters of the training of the learning network are: γ = 0.98, the number of training cycles is …, the number of hidden neurons is 2. The network topology is the same as in Fig. 3.

         AND                  OR
    x1  x2 | y           x1  x2 | y
     0   0 | 0            0   0 | 0
     0   1 | 0            0   1 | 1
     1   0 | 0            1   0 | 1
     1   1 | 1            1   1 | 1

Table 1: Training data for the boolean AND and OR functions for the network in Fig. 2.
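Continuing the sketch above, step 2 fits the learning network of Fig. 3 (9 inputs, two hidden neurons, 5 outputs) to the sampled pairs, again by ordinary gradient descent. The linear output units are our assumption, since the targets are small signed weight changes outside the sigmoid range; the learning rate 0.98 is the value reported above, and the cycle count is again arbitrary.

    n_in, n_hid, n_out = 9, 2, 5                   # topology of Fig. 3
    W1 = [[random.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hid)]
    W2 = [[random.uniform(-0.5, 0.5) for _ in range(n_hid + 1)] for _ in range(n_out)]

    def predict(state):                            # forward pass of the learning network
        hid = [f(sum(w*s for w, s in zip(row, state + [1.0]))) for row in W1]
        out = [sum(w*s for w, s in zip(row, hid + [1.0])) for row in W2]
        return out, hid

    g = 0.98                                       # learning rate reported above
    for cycle in range(1000):                      # cycle count is arbitrary here
        for state, change in samples:
            out, hid = predict(state)
            d_out = [c - o for c, o in zip(change, out)]    # linear output units
            d_hid = [df(h) * sum(d * W2[k][j] for k, d in enumerate(d_out))
                     for j, h in enumerate(hid)]
            for k in range(n_out):                 # update output layer, rule (3)
                for j in range(n_hid):
                    W2[k][j] += g * d_out[k] * hid[j]
                W2[k][n_hid] += g * d_out[k]
            for j in range(n_hid):                 # update hidden layer, rule (3)
                for i in range(n_in):
                    W1[j][i] += g * d_hid[j] * state[i]
                W1[j][n_in] += g * d_hid[j]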
The training history of the learning network is shown in Fig. 4. A comparison of the training of the basic network using the backpropagation algorithm and using the learning network is shown in Fig. 5. The learning network achieved better convergence than the original backpropagation algorithm. However, the performance of the learning network also depends on its own training: it is prone to overfitting. Also note that implementations of training using the learning network and the error backpropagation algorithm may differ in speed. In our case, in this experiment, backpropagation training took 0.48 seconds, while learning network training took 3.78 seconds.

In the 2nd experiment we will use a basic network without a hidden layer. This is sufficient for the AND-function approximation. We will have two inputs, one output, and no hidden neurons in the basic network; and 7 inputs, 3 outputs, and … hidden neurons in the learning network. The training history of this learning network is shown in Fig. 6. A comparison of the training of the basic network without hidden neurons using the backpropagation algorithm and using the learning network is shown in Fig. 7. The performance of the learning network in this setup is comparable to the performance of the backpropagation algorithm, although it is slightly worse than in the 1st experiment with hidden neurons.

Figure 4: Error of the training of the learning network for the basic network from Fig. 2.

Figure 5: Error of the training of the basic network from Fig. 2 using the error backpropagation algorithm and using the learning network from Fig. 3.

Figure 6: Error of the training of the learning network for the basic network without hidden neurons.

Figure 7: Error of the training of the basic network without hidden neurons using the error backpropagation algorithm and using the learning network.
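Step 3 then replaces rules (3), (5), and (6) with the outputs of the trained learning network, keeping only the forward pass of the basic network; a sketch under the same assumptions as above follows. The error (2) is accumulated purely for monitoring, so that the resulting history can be compared with curves such as those in Fig. 5.

    def train_with_learning_network(data, cycles=500):   # cycle count is arbitrary
        w1, w2, v, hb, yb = [random.uniform(-0.5, 0.5) for _ in range(5)]
        history = []
        for _ in range(cycles):
            J = 0.0
            for x1, x2, ev in data:
                h = f(w1*x1 + w2*x2 + hb)          # forward pass kept, rule (1)
                y = f(v*h + yb)
                J += 0.5 * (ev - y)**2             # error (2), monitoring only
                change, _ = predict([w1, w2, v, hb, yb, x1, x2, y, ev])
                w1 += change[0]; w2 += change[1]; v += change[2]
                hb += change[3]; yb += change[4]
            history.append(J)
        return history

    history = train_with_learning_network(AND)     # curve comparable to Fig. 5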
In the 3rd experiment we will increase the number of hidden neurons. We will have two inputs, one output, and 4 hidden neurons in the basic network; and 21 inputs, 17 outputs, and 3 hidden neurons in the learning network. The task is the OR-function approximation. The training history of this learning network is shown in Fig. 8. A comparison of the training of the basic network with 4 hidden neurons using the backpropagation algorithm and using the learning network is shown in Fig. 9. The performance of the learning network in this setup is again better than that of the original error backpropagation algorithm.

Figure 8: Error of the training of the learning network for the basic network with 4 hidden neurons.

Figure 9: Error of the training of the basic network with 4 hidden neurons using the error backpropagation algorithm and using the learning network.

In the 4th experiment we will try a more difficult classification task. Fig. 10 depicts the training and testing data sets. The network topologies are: inputs x and y, outputs inner-square and outer-square, and 5 hidden neurons for the basic network; and 38 inputs, 27 outputs, and 3 hidden neurons for the learning network. The training history of this learning network is shown in Fig. 11. A comparison of the training of the basic network using the backpropagation algorithm and using the learning network is shown in Fig. 12. Training with the learning network diverges in this task, while training with the backpropagation algorithm stopped at some level but did not diverge.

Figure 11: Error of the training of the learning network for the basic network for the square classification task.

Figure 12: Error of the training of the basic network for the square classification task using the error backpropagation algorithm and using the learning network.
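The learning network dimensions quoted in these experiments follow mechanically from the topology of the basic network: one output per weight or bias, plus the basic network's inputs, outputs, and expected outputs as additional inputs. The helper below is our own construction and reproduces the counts; including the hidden activations among the inputs of the 4th experiment is our assumption, introduced to account for the 38 inputs.

    def learning_net_io(layers, with_hidden_activations=False):
        # layers, e.g. (2, 4, 1): sizes of the basic network's layers
        weights = sum(layers[i] * layers[i+1] + layers[i+1]   # weights + biases
                      for i in range(len(layers) - 1))
        inputs = weights + layers[0] + 2 * layers[-1]         # + inputs, outputs, ev
        if with_hidden_activations:                           # assumed, 4th experiment
            inputs += sum(layers[1:-1])
        return inputs, weights                     # learning network inputs, outputs

    print(learning_net_io((2, 1, 1)))              # -> (9, 5),   1st experiment
    print(learning_net_io((2, 4, 1)))              # -> (21, 17), 3rd experiment
    print(learning_net_io((2, 5, 2), True))        # -> (38, 27), 4th experiment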
Figure 10: Training and testing sets for the classification task. The task is to classify points in space by their x and y coordinates, according to whether they fit into the inner square or not.

4 Analysis

The AND/OR-function approximation tasks with hidden units show good results when trained with the learning network. The ability of the learning network to outperform the backpropagation algorithm seems promising. In the future, knowledge from several training algorithms might be used to train the learning network in order to get even better performance by exploiting the best of all of these algorithms. A trial-and-error mode might further be used when training the learning network to improve its performance further, possibly beyond the reach of conventional learning algorithms.

The slightly worse performance in the experiments without hidden units points to the nonlinear character of either the learning rules of the backpropagation algorithm, or of the neural network which we train.

The problem with the application of our approach to the more complex classification task might be of a similar character as the instability of an overtrained learning network. A chance of instability of learning with the learning network is an inherent property of this approach.

The better performance with simpler networks favors modelling only the rule (6) of the backpropagation algorithm, instead of modelling all the rules, which we did in the experiments. To further investigate the neural network modelling of the backpropagation algorithm, rule (6) modelling and a deeper analysis of the variable set for algorithm sampling might help.

5 Conclusion

Using a neural network model of the backpropagation algorithm to train neural networks is a viable approach. Novel methods of performance tuning of the learning algorithm are possible when using this model. There is, however, a risk of learning instability with this approach, and the actual modelling of backpropagation can be done in several different modes.

References

[1] M. Karák, Metalearning methods for neural networks, (in Slovak), MS Thesis, Technical University of Košice, (2005). neuron.tuke.sk/~jaksa/theses

[2] J. Schmidhuber, J. Zhao, and M. Wiering, Simple principles of metalearning, Technical Report IDSIA-69-96, IDSIA, (1996). citeseer.ist.psu.edu/schmidhuber96simple.html

[3] S. Hochreiter, A. S. Younger, and P. R. Conwell, Learning to Learn Using Gradient Descent, Lecture Notes in Computer Science, vol. 2130, (2001). citeseer.ist.psu.edu/hochreiter01learning.html

[4] S. Thrun, Learning To Learn: Introduction. citeseer.ist.psu.edu/article/thrun96learning.html

[5] J. Schmidhuber, Evolutionary Principles in Self-Referential Learning, Diploma Thesis, Technische Universität München, (1987). www.idsia.ch/~juergen/diploma.html

[6] J. Schmidhuber, On Learning How to Learn Learning Strategies, Technical Report FKI-198-94, Fakultät für Informatik, Technische Universität München, (1994). citeseer.ist.psu.edu/schmidhuber95learning.html

[7] J. Schmidhuber, A General Method for Incremental Self-Improvement and Multi-agent Learning in Unrestricted Environments, in X. Yao (Ed.), Evolutionary Computation: Theory and Applications, Scientific Publ. Co., Singapore, (1996). citeseer.ist.psu.edu/article/schmidhuber96general.html

[8] J. Schmidhuber, A Neural Network That Embeds Its Own Meta-Levels, in Proc. of the International Conference on Neural Networks '93, San Francisco, IEEE, (1993). citeseer.ist.psu.edu/schmidhuber93neural.html