1 694 IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 8, NO. 3, MAY 1997 A New Evolutionary System for Evolving Artificial Neural Networks Xin Yao, Senior Member, IEEE, and Yong Liu Abstract This paper presents a new evolutionary system, i.e., EPNet, for evolving artificial neural networks (ANN s). The evolutionary algorithm used in EPNet is based on Fogel s evolutionary programming (EP). Unlike most previous studies on evolving ANN s, this paper puts its emphasis on evolving ANN s behaviors. This is one of the primary reasons why EP is adopted. Five mutation operators proposed in EPNet reflect such an emphasis on evolving behaviors. Close behavioral links between parents and their offspring are maintained by various mutations, such as partial training and node splitting. EPNet evolves ANN s architectures and connection weights (including biases) simultaneously in order to reduce the noise in fitness evaluation. The parsimony of evolved ANN s is encouraged by preferring node/connection deletion to addition. EPNet has been tested on a number of benchmark problems in machine learning and ANN s, such as the parity problem, the medical diagnosis problems (breast cancer, diabetes, heart disease, and thyroid), the Australian credit card assessment problem, and the Mackey Glass time series prediction problem. The experimental results show that EPNet can produce very compact ANN s with good generalization ability in comparison with other algorithms. Index Terms Evolution, evolutionary programming, evolution of behaviors, generalization, learning, neural-network design, parsimony. I. INTRODUCTION ARTIFICIAL neural networks (ANN s) have been used widely in many application areas in recent years. Most applications use feedforward ANN s and the backpropagation (BP) training algorithm. There are numerous variants of the classical BP algorithm and other training algorithms. All these training algorithms assume a fixed ANN architecture. They only train weights in the fixed architecture that includes both connectivity and node transfer functions. 1 The problem of designing a near optimal ANN architecture for an application remains unsolved. However, this is an important issue because there are strong biological and engineering evidences to support that the function, i.e., the information processing capability of an ANN is determined by its architecture. There have been many attempts in designing ANN architectures (especially connectivity 2 ) automatically, such as various Manuscript received January 6, 1996; revised August 12, 1996 and November 12, This work was supported by the Australian Research Council through its small grant scheme. The authors are with the Computational Intelligence Group, School of Computer Science, University College, The University of New South Wales, Australian Defence Force Academy, Canberra, ACT, Australia Publisher Item Identifier S (97) Weights in this paper indicate both connection weights and biases. 2 This paper is only concerned with connectivity and will use architecture and connectivity interchangeably. The work on evolving both connectivity and node transfer functions was reported elsewhere [4]. constructive and pruning algorithms [5] [9]. 
Roughly speaking, a constructive algorithm starts with a minimal network (i.e., a network with a minimal number of hidden layers, nodes, and connections) and adds new layers, nodes, and connections if necessary during training, while a pruning algorithm does the opposite, i.e., deletes unnecessary layers, nodes, and connections during training. However, as indicated by Angeline et al. [10], Such structural hill climbing methods are susceptible to becoming trapped at structural local optima. In addition, they only investigate restricted topological subsets rather than the complete class of network architectures. Design of a near optimal ANN architecture can be formulated as a search problem in the architecture space where each point represents an architecture. Given some performance (optimality) criteria, e.g., minimum error, fastest learning, lowest complexity, etc., about architectures, the performance level of all architectures forms a surface in the space. The optimal architecture design is equivalent to finding the highest point on this surface. There are several characteristics with such a surface, as indicated by Miller et al. [11], which make evolutionary algorithms better candidates for searching the surface than those constructive and pruning algorithms mentioned above. This paper describes a new evolutionary system, i.e., EPNet, for evolving feedforward ANN s. It combines the architectural evolution with the weight learning. The evolutionary algorithm used to evolve ANN s is based on Fogel s evolutionary programming (EP) [1] [3]. It is argued in this paper that EP is a better candidate than genetic algorithms (GA s) for evolving ANN s. EP s emphasis on the behavioral link between parents and offspring can increase the efficiency of ANN s evolution. EPNet is different from previous work on evolving ANN s on a number of aspects. First, EPNet emphasises the evolution of ANN behaviors by EP and uses a number of techniques, such as partial training after each architectural mutation and node splitting, to maintain the behavioral link between a parent and its offspring effectively. While some of previous EP systems [3], [10], [12] [15], acknowledged the importance of evolving behaviors, few techniques have been developed to maintain the behavioral link between parents and their offspring. The common practice in architectural mutations was to add or delete hidden nodes or connections uniformly at random. In particular, a hidden node was usually added to a hidden layer with full connections. Random initial weights were attached to these connections. Such an approach tends to destroy the behavior already learned by the parent and create poor behavioral link between the parent and its offspring /97$ IEEE

2 YAO AND LIU: NEW EVOLUTIONARY SYSTEM 695 Second, EPNet encourages parsimony of evolved ANN s by attempting different mutations sequentially. That is, node or connection deletion is always attempted before addition. If a deletion is successful, no other mutations will be made. Hence, a parsimonious ANN is always preferred. This approach is quite different from existing ones which add a network complexity (regularization) term in the fitness function to penalize large ANN s (i.e., the fitness function would look like ). The difficulty in using such a function in practice lies in the selection of suitable coefficient, which often involves tedious trial-and-error experiments. Evolving parsimonious ANN s by sequentially applying different mutations provides a novel and simple alternative which avoids the problem. The effectiveness of the approach has been demonstrated by the experimental results presented in this paper. Third, EPNet has been tested on a number of benchmark problems, including the parity problem of various sizes, the Australian credit card accessment problem, four medical diagnosis problems (breast cancer, diabetes, heart disease, and thyroid), and the Mackey Glass time series prediction problem. It was also tested on the two-spiral problem [16]. Few evolutionary systems have been tested on a similar range of benchmark problems. The experimental results obtained by EPNet are better than those obtained by other systems in terms of generalization and the size of ANN s. The rest of this paper is organized as follows. Section II discusses different approaches to evolving ANN architectures and indicates potential problems with the existing approaches, Section III describes EPNet in detail and gives motivations and ideas behind various design choices, Section IV presents experimental results on EPNet and some discussions, and finally Section V concludes with a summary of the paper and a few remarks. II. EVOLVING ANN ARCHITECTURES There are two major approaches to evolving ANN architectures. One is the evolution of pure architectures (i.e., architectures without weights). Connection weights will be trained after a near optimal architecture has been found. The other is the simultaneous evolution of both architectures and weights. Schaffer et al. [17] and Yao [18] [21] have provided a comprehensive review on various aspects of evolutionary artificial neural networks (EANN s). A. The Evolution of Pure Architectures One major issue in evolving pure architectures is to decide how much information about an architecture should be encoded into a chromosome (genotype). At one extreme, all the detail, i.e., every connection and node of an architecture can be specified by the genotype, e.g., by some binary bits. This kind of representation schemes is called the direct encoding scheme or the strong specification scheme. At the other extreme, only the most important parameters of an architecture, such as the number of hidden layers and hidden nodes in each layer are encoded. Other detail about the architecture is either predefined or left to the training process to decide. This kind of representation schemes is called the indirect encoding scheme or the weak specification scheme. Fig. 1 [20], [21] shows the evolution of pure architectures under either a direct or an indirect encoding scheme. It is worth pointing out that genotypes in Fig. 1 do not contain any weight information. 
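As a concrete, purely hypothetical illustration of the two representation schemes, the sketch below encodes the same small feedforward architecture both directly (one bit per potential connection) and indirectly (a few coarse parameters); the variable names and the particular encodings are assumptions for illustration only, not the genotypes used by any system cited above.

```python
import numpy as np

# Hypothetical example: the same architecture (2 inputs, 3 hidden nodes, 1 output)
# under the two encoding schemes.  Neither genotype contains weight information.
n_in, n_hidden, n_out = 2, 3, 1
n_nodes = n_in + n_hidden + n_out

# Direct (strong) encoding: one bit per potential feedforward connection.
# Entry [i, j] = 1 means "node i feeds node j"; only the upper triangle is used,
# so the network stays feedforward.
direct_genotype = np.triu(np.ones((n_nodes, n_nodes), dtype=int), k=1)
direct_genotype[:n_in, :n_in] = 0     # no connections among input nodes

# Indirect (weak) encoding: only the most important parameters are stored;
# the remaining details are left to a developmental rule or to training.
indirect_genotype = {"hidden_layers": 1, "nodes_per_layer": [3]}
```

Note that neither genotype carries weights; a phenotype must still be trained before its fitness can be measured.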
In order to evaluate such genotypes, they have to be trained from a random set of initial weights using a training algorithm like BP. Unfortunately, such fitness evaluation of the genotypes is very noisy because a phenotype's fitness is used to represent the genotype's fitness. There are two major sources of noise.
1) The first source is the random initialization of the weights. Different random initial weights may produce different training results. Hence, the same genotype may have quite different fitness due to different random initial weights used by the phenotypes.
2) The second source is the training algorithm. Different training algorithms may produce different training results even from the same set of initial weights. This is especially true for multimodal error functions. For example, BP may reduce an ANN's error to 0.05 through training, but an EP could reduce the error further due to its global search capability.
Such noise can mislead the evolution: the fact that the phenotype generated from one genotype has higher fitness than the phenotype generated from another does not mean that the first genotype really has higher fitness than the second. In order to reduce such noise, an architecture usually has to be trained many times from different random initial weights. The average results are then used to estimate the genotype's fitness. This method increases the computation time for fitness evaluation dramatically. It is one of the major reasons why only small ANN's were evolved in previous studies [22]–[24]. In essence, the noise identified in this paper is caused by the one-to-many mapping from genotypes to phenotypes. Angeline et al. [10] and Fogel [3], [25] have provided a more general discussion of the mapping between genotypes and phenotypes. It is clear that the evolution of pure architectures has difficulties in evaluating fitness accurately. As a result, the evolution would be very inefficient.
B. The Simultaneous Evolution of Both Architectures and Weights
One way to alleviate the noisy fitness evaluation problem is to have a one-to-one mapping between genotypes and phenotypes. That is, both architecture and weight information are encoded in individuals and are evolved simultaneously. Although the idea of evolving both architectures and weights is not new [3], [10], [13], [26], few have explained why it is important in terms of accurate fitness evaluation. The simultaneous evolution of both architectures and weights can be summarized by Fig. 2.
The evolution of ANN architectures in general suffers from the permutation problem [27], [28], also called the competing conventions problem [17]. It is caused by the many-to-one mapping from genotypes to phenotypes, since two ANN's which order their hidden nodes differently may have different

3 696 IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 8, NO. 3, MAY 1997 Fig. 1. A typical cycle of the evolution of architectures. Fig. 2. A typical cycle of the evolution of both architectures and weights. The word genetic used above is rather loose and should not be interpreted in the strict biological sense. Genetic operators are just search operators. genotypes but are behaviorally (i.e., phenotypically) equivalent. This problem not only makes the evolution inefficient, but also makes crossover operators more difficult to produce highly fit offspring. It is unclear what building blocks actually are in this situation. For example, ANN s shown in Figs. 3(a) and 4(a) are equivalent, but they have different genotypic representations as shown by Figs. 3(b) and 4(b) using a direct encoding scheme. In general, any permutation of the hidden nodes will produce behaviorally equivalent ANN s but with different genotypic representations. This is also true for indirect encoding schemes. C. Some Related Work There is some related work to evolving ANN architectures. For example, Smalz and Conrad [29] proposed a novel approach to assigning credits and fitness to neurons (i.e., (a) (b) Fig. 3. (a) An ANN and (b) its genotypic representation, assuming that each weight is represented by four binary bits. Zero weight implies no connection. nodes) in an ANN, rather than the ANN itself. This is quite different from all other methods which only evaluate a complete ANN without going inside it. The idea is to identify those neurons which are most compatible with all of the network contexts associated with the best performance

4 YAO AND LIU: NEW EVOLUTIONARY SYSTEM 697 (a) (b) Fig. 4. (a) An ANN which is equivalent to that given in Fig. 3(a) and (b) its genotypic representation. on any of the inputs [29]. Starting from a population of redundant, identically structured networks that vary only with respect to individual neuron parameters, their evolutionary method first evaluates neurons and then copies with mutation the parameters of those neurons that have high fitness values to other neurons in the same class. In other words, it tries to put all fit neurons together to generate a hopefully fit network. However, Smalz and Conrad s evolutionary method does not change the network architecture, which is fixed [29]. The appropriateness of assigning credit/fitness to individual neurons also needs further investigation. It is well known that ANN s use distributed representation. It is difficult to identify a single neuron for the good or poor performance of a network. Putting a group of good neurons from different ANN s together may not produce a better ANN unless a local representation is used. It appears that Smalz and Conrad s method [29] is best suited to ANN s such as radial basis function (RBF) networks. Odri et al. [30] proposed a nonpopulation-based learning algorithm which could change ANN architectures. It uses the idea of evolutional development. The algorithm is based on BP. During training, a new neuron may be added to the existing ANN through cell division if an existing neuron generates a nonzero error [30]. A connection may be deleted if it does not change very much in previous training steps. A neuron is deleted only when all of its incoming or all of its outgoing connections have been deleted. There is no obvious way to add a single connection [30]. The algorithm was only tested on the XOR problem to illustrate its ideas [30]. One major disadvantage of this algorithm is its tendency to generate larger-than-necessary ANN and overfit training data. It can only deal with strictly layered ANN s. III. EPNET In order to reduce the detrimental effect of the permutation problem, an EP algorithm, which does not use crossover, is adopted in EPNet. EP s emphasis on the behavioral link between parents and their offspring also matches well with the emphasis on evolving ANN behaviors, not just circuitry. In its current implementation, EPNet is used to evolve feedforward ANN s with sigmoid transfer functions. However, this is not an inherent constraint. In fact, EPNet has minimal constraint on the type of ANN s which may be evolved. The feedforward ANN s do not have to be strictly layered or fully connected between adjacent layers. They may also contain hidden nodes with different transfer functions [4]. The major steps of EPNet can be described by Fig. 5, which are explained further as follows [16], [31] [34]. 1) Generate an initial population of networks at random. The number of hidden nodes and the initial connection density for each network are uniformly generated at random within certain ranges. The random initial weights are uniformly distributed inside a small range. 2) Partially train each network in the population on the training set for a certain number of epochs using a modified BP (MBP) with adaptive learning rates. The number of epochs,, is specified by the user. The error value of each network on the validation set is checked after partial training. If has not been significantly reduced, then the assumption is that the network is trapped in a local minimum and the network is marked with failure. 
Otherwise the network is marked with success. 3) Rank the networks in the population according to their error values, from the best to the worst. 4) If the best network found is acceptable or the maximum number of generations has been reached, stop the evolutionary process and go to Step 11). Otherwise continue. 5) Use the rank-based selection to choose one parent network from the population. If its mark is success, go to Step 6), or else go to Step 7). 6) Partially train the parent network for epochs using the MBP to obtain an offspring network and mark it in the same way as in Step 2), where is a user specified parameter. Replace the parent network with the offspring in the current population and go to Step 3). 7) Train the parent network with a simulated annealing (SA) algorithm to obtain an offspring network. If the SA algorithm reduces the error of the parent network significantly, mark the offspring with success, replace its parent by it in the current population, and then go to Step 3). Otherwise discard this offspring and go to Step 8). 8) First decide the number of hidden nodes to be deleted by generating a uniformly distributed random number between one and a user-specified maximum number. is normally very small in the experiments, no more than three in most cases. Then delete hidden nodes from the parent network uniformly at random. Partially train the pruned network by the MBP to obtain an offspring network. If the offspring network is better than the worst network in the current population, replace the worst by the offspring and go to Step 3). Otherwise discard this offspring and go to Step 9). 9) Calculate the approximate importance of each connection in the parent network using the nonconvergent method. Decide the number of connections to be deleted in the same way as that described in Step 8). Randomly delete the connections from the parent network according to the calculated importance. Partially train the pruned network by the MBP to obtain an offspring network. If the offspring network is better than the worst network in the current population, replace the worst by the offspring and go to Step 3). Otherwise discard this offspring and go to Step 10).

10) Decide the number of connections and nodes to be added in the same way as that described in Step 8). Calculate the approximate importance of each virtual connection with zero weight. Randomly add connections to the parent network, according to their importance, to obtain Offspring 1. Addition of each node is implemented by splitting a randomly selected hidden node in the parent network. The new network grown after adding all nodes is Offspring 2. Partially train Offspring 1 and Offspring 2 by the MBP to obtain a surviving offspring. Replace the worst network in the current population by this offspring and go to Step 3).
11) After the evolutionary process, train the best network further on the combined training and validation set until it converges.
Fig. 5. Major steps of EPNet.
The above evolutionary process appears to be rather complex, but its essence is an EP algorithm with five mutations: hybrid training, node deletion, connection deletion, connection addition, and node addition. Details about each component of EPNet are given in the following sections.
A. Encoding Scheme for Feedforward ANN's
The feedforward ANN's considered by EPNet are generalized multilayer perceptrons [35]. The architecture of such networks is shown in Fig. 6, where X and Y are the inputs and outputs, respectively:

x_i = X_i,   1 \le i \le m
net_i = \sum_{j=1}^{i-1} w_{ij} x_j,   m < i \le m + N + n
x_i = f(net_i),   m < i \le m + N + n
Y_i = x_{i+m+N},   1 \le i \le n

where f is the following sigmoid function:

f(z) = \frac{1}{1 + e^{-z}}

m and n are the number of inputs and outputs, respectively, and N is the number of hidden nodes.
Fig. 6. A fully connected feedforward ANN [35, p. 273].
In Fig. 6, there are m + N + n circles, representing all of the nodes in the network, including the input nodes. The first m circles are really just copies of the inputs X_1, ..., X_m. Every other node in the network, such as node number i, which calculates net_i and x_i, takes inputs from every node

that precedes it in the network. Even the last output node (the (m + N + n)th), which generates Y_n, takes input from other output nodes, such as the one which outputs Y_1.
The direct encoding scheme is used in EPNet to represent ANN architectures and connection weights (including biases). This is necessary because EPNet evolves ANN architectures and weights simultaneously and needs information about every connection in an ANN. Two equal-size matrices and one vector are used to specify an ANN in EPNet. The dimension of the vector is determined by a user-specified upper limit, which is the maximum number of hidden nodes allowable in the ANN. The size of the two matrices is (m + N + n) × (m + N + n), where m and n are the number of input and output nodes, respectively, and N is the maximum number of hidden nodes. One matrix is the connectivity matrix of the ANN, whose entries can only be zero or one. The other is the corresponding weight matrix, whose entries are real numbers. Using two matrices rather than one is purely implementation-driven. The entries in the hidden node vector can be either one, i.e., the node exists, or zero, i.e., the node does not exist. Since this paper is only concerned with feedforward ANN's, only the upper triangle will be considered in the two matrices. There will be no connections among input nodes.
Architectural mutations can be implemented easily under such a representation scheme. Node deletion and addition involve flipping a bit in the hidden node vector. A zero bit disables all the connections to and from the node in the connectivity matrix. Connection deletion and addition involve flipping a bit in the connectivity matrix. A zero bit automatically disables the corresponding weight entry in the weight matrix. The weights are updated by a hybrid algorithm described later.
B. Fitness Evaluation and Selection Mechanism
The fitness of each individual in EPNet is solely determined by the inverse of an error value E, defined by (1) [36] over a validation set containing T patterns:

E = \frac{100 (o_{max} - o_{min})}{T \cdot n} \sum_{t=1}^{T} \sum_{i=1}^{n} \left( Y_i(t) - Z_i(t) \right)^2   (1)

where o_{max} and o_{min} are the maximum and minimum values of output coefficients in the problem representation, n is the number of output nodes, and Y_i(t) and Z_i(t) are the actual and desired outputs of node i for pattern t. Equation (1) was suggested by Prechelt [36] to make the error measure less dependent on the size of the validation set and the number of output nodes; hence a mean squared error percentage was adopted, with o_{max} and o_{min} the maximum and minimum values of outputs [36].
The fitness evaluation in EPNet is different from previous work in EANN's since it is determined through a validation set which does not overlap with the training set. Such use of a validation set in an evolutionary learning system improves the generalization ability of evolved ANN's and introduces little overhead in computation time.
The selection mechanism used in EPNet is rank based. Let the M sorted individuals be numbered 0, 1, ..., M − 1, with the zeroth being the fittest. Then the (M − j)th individual is selected with probability [37]

p(M − j) = \frac{j}{\sum_{k=1}^{M} k}

The selected individual is then modified by the five mutations. In EPNet, the error E is used to sort individuals directly rather than to compute the fitness 1/E and use it for sorting.
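A minimal sketch of the representation, the error measure (1), and the rank-based selection rule described above is given below, assuming NumPy. The class and function names (Individual, error_percentage, rank_selection_index) are invented for illustration and are not taken from the actual EPNet implementation.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class Individual:
    """Illustrative EPNet-style individual: architecture and weights evolve together."""
    connectivity: np.ndarray    # (m+N+n) x (m+N+n) binary matrix; upper triangle used
    weights: np.ndarray         # same shape, real-valued; ignored where the bit is zero
    hidden_present: np.ndarray  # length-N binary vector: 1 = hidden node exists

def error_percentage(Y, Z, o_max, o_min):
    """Mean squared error percentage of (1).  Y, Z: (T, n) actual and desired outputs."""
    T, n = Z.shape
    return 100.0 * (o_max - o_min) / (T * n) * np.sum((Y - Z) ** 2)

def rank_selection_index(M, rng):
    """Return an index in 0..M-1 (0 = fittest).  The (M - j)th individual is chosen
    with probability j / sum_{k=1}^{M} k, matching the rank-based rule above."""
    j = np.arange(M, 0, -1)        # j = M for the fittest, ..., 1 for the worst
    return rng.choice(M, p=j / j.sum())

# Example: a validation set with two patterns and one output node.
Y = np.array([[0.9], [0.2]]); Z = np.array([[1.0], [0.0]])
print(error_percentage(Y, Z, o_max=1.0, o_min=0.0))              # E of (1): 2.5
print(rank_selection_index(M=20, rng=np.random.default_rng(0)))  # a selected rank
```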
C. Replacement Strategy and Generation Gap
The replacement strategy used in EPNet reflects the emphasis on evolving ANN behaviors and maintaining behavioral links between parents and their offspring. It also reflects the fact that EPNet actually emulates a kind of Lamarckian rather than Darwinian evolution. There is an ongoing debate on whether Lamarckian evolution or the Baldwin effect is more efficient in simulated evolution [38], [39]. Ackley and Littman [38] have presented a case for Lamarckian evolution. The experimental results of EPNet seem to support their view.
In EPNet, if an offspring is obtained through further BP partial training, it always replaces its parent. If an offspring is obtained through SA training, it replaces its parent only when it reduces its error significantly. If an offspring is obtained through deleting nodes/connections, it replaces the worst individual in the population only when it is better than the worst. If an offspring is obtained through adding nodes/connections, it always replaces the worst individual in the population, since an ANN with more nodes/connections is more powerful, although its current performance may not be very good due to incomplete training.
The generation gap in EPNet is minimal. That is, a new generation starts immediately after the above replacement. This is very similar to the steady-state GA [40], [41] and continuous EP [42], although the replacement strategy used in EPNet is different. It has been shown that the steady-state GA and continuous EP outperform their classical counterparts in terms of speed and the quality of solutions [40]–[42]. The replacement strategy and generation gap used in EPNet also facilitate population-based incremental learning. Vavak and Fogarty [43] have recently shown that the steady-state GA outperformed the generational GA in tracking environmental changes which are relatively small and occur with low frequency.
D. Hybrid Training
The only mutation for modifying ANN's weights in EPNet is implemented by a hybrid training algorithm consisting of an MBP and an SA algorithm. It could be regarded as two mutations driven by the BP and SA algorithms separately; they are treated as one in this paper for convenience's sake. The classical BP algorithm [44] is notorious for its slow convergence and convergence to local minima. Hence it is modified in order to alleviate these two problems. A simple heuristic is used to adjust the learning rate for each ANN in the population; different ANN's may have different learning rates. During BP training, the error E is checked after every

k epochs, where k is a parameter determined by the user. If E decreases, the learning rate is increased by a predefined amount. Otherwise, the learning rate is reduced; in the latter case the new weights and error are discarded.
In order to deal with the local optimum problem suffered by the classical BP algorithm, an extra training stage is introduced when BP training cannot improve an ANN any more. The extra training is performed by an SA algorithm. When the SA algorithm also fails to improve the ANN, the four architectural mutations will be used to change the ANN architecture. It is important in EPNet to train an ANN first without modifying its architecture. This reflects the emphasis on a close behavioral link between the parent and its offspring.
The hybrid training algorithm used in EPNet is not a critical choice in the whole system. Its main purpose is to discourage architectural mutations if training, which often introduces smaller behavioral changes than architectural mutations, can produce a satisfactory ANN. Other training algorithms which are faster and can avoid poor local minima can also be used in EPNet. For example, recently proposed algorithms such as guided evolutionary simulated annealing [45], NOVEL [46], and fast evolutionary programming [47] can all be used in EPNet. The investigation of the best training algorithm is outside the scope of this paper and would be the topic of a separate paper.
E. Architecture Mutations
In EPNet, architectural mutations take place only when the hybrid training fails to reduce the error of an ANN. For architectural mutations, node or connection deletions are always attempted before connection or node additions in order to encourage the evolution of small ANN's. Connection or node additions will be tried only after node or connection deletions fail to produce a good offspring. Using the order of mutations to encourage parsimony of evolved ANN's represents a dramatically different approach from using a complexity (regularization) term in the fitness function. It avoids the time-consuming trial-and-error process of selecting a suitable coefficient for the regularization term.
Hidden Node Deletion: Certain hidden nodes are first deleted uniformly at random from a parent ANN. The maximum number of hidden nodes that can be deleted is set by a user-specified parameter. Then the mutated ANN is partially trained by the MBP. This extra training process can reduce the sudden behavioral change caused by the node deletion. If this trained ANN is better than the worst ANN in the population, the worst ANN will be replaced by the trained one and no further mutation will take place. Otherwise connection deletion will be attempted.
Connection Deletion: Certain connections are selected probabilistically for deletion according to their importance. The maximum number of connections that can be deleted is set by a user-specified parameter. The importance is defined by a significance test for the weight's deviation from zero in the weight update process [48]. Denote by \xi_{ij}^{t} the local gradient of the linear error function with respect to example t and weight w_{ij}. The significance of the deviation of w_{ij} from zero is then defined by the test variable [48]

test(w_{ij}) = \frac{w_{ij} + \bar{\xi}_{ij}}{\sqrt{\frac{1}{T-1} \sum_{t=1}^{T} \left( \xi_{ij}^{t} - \bar{\xi}_{ij} \right)^2}}   (2)

where \bar{\xi}_{ij} = \frac{1}{T} \sum_{t=1}^{T} \xi_{ij}^{t} denotes the average over the set of T training examples. A large value of the test variable indicates higher importance of the connection with weight w_{ij}.
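The sketch below shows one way such a test variable might be computed and used to bias connection deletion toward unimportant connections. The array layout, helper names, and exact normalization are assumptions for illustration; they stand in for the statistics of the nonconvergent method [48] rather than reproducing it exactly.

```python
import numpy as np

def connection_test_values(w, xi):
    """Approximate importance of each connection, in the spirit of the test variable (2).

    w  : (d, d) weight matrix of the parent network.
    xi : (T, d, d) local gradients of the linear error function, one slice per
         training example t (assumed to be collected during partial training).
    Returns a (d, d) array; large values mean the weight deviates significantly
    from zero, i.e., the connection is important.
    """
    xi_mean = xi.mean(axis=0)
    xi_std = xi.std(axis=0, ddof=1) + 1e-12      # guard against zero variance
    return np.abs(w + xi_mean) / xi_std

def connections_to_delete(test_vals, connectivity, n_delete, rng):
    """Probabilistically pick existing connections, favoring LOW importance."""
    idx = np.argwhere(connectivity == 1)
    p = 1.0 / (np.array([test_vals[i, j] for i, j in idx]) + 1e-12)
    p /= p.sum()
    chosen = rng.choice(len(idx), size=min(n_delete, len(idx)), replace=False, p=p)
    return [tuple(idx[c]) for c in chosen]
```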
The advantage of the above nonconvergent method [48] over others is that it does not require the training process to converge in order to test connections, and it does not require any extra parameters either. For example, Odri et al.'s method needs to guess values for four additional parameters. The idea behind the test variable (2) is to test the significance of the deviation of w_{ij} from zero [48]. Equation (2) can also be applied to connections whose weights are zero, and thus can be used to determine which connections should be added in the addition phase.
Similar to the case of node deletion, the ANN will be partially trained by the MBP after certain connections have been deleted from it. If the trained ANN is better than the worst ANN in the population, the worst ANN will be replaced by the trained one and no further mutation will take place. Otherwise node/connection addition will be attempted.
Connection and Node Addition: As mentioned before, certain connections are added to a parent network probabilistically according to (2). They are selected from those connections with zero weights. The added connections are initialized with small random weights. The new ANN will be partially trained by the MBP and denoted as Offspring 1.
Node addition is implemented through splitting an existing hidden node, a process called cell division by Odri et al. [30]. In addition to the reasons given by Odri et al. [30], growing an ANN by splitting existing nodes can preserve the behavioral link between the parent and its offspring better than adding random nodes. The nodes for splitting are selected uniformly at random among all hidden nodes. The two nodes obtained by splitting an existing node have the same connections as the existing node. The weights of these new nodes have the following values [30]:

w^{1}_{ij} = w^{2}_{ij} = w_{ij}   (incoming connections of the split node)
w^{1}_{ki} = (1 + \alpha) w_{ki}
w^{2}_{ki} = -\alpha w_{ki}   (outgoing connections of the split node)

where w is the weight vector of the existing node i, w^{1} and w^{2} are the weight vectors of the new nodes, and \alpha is a mutation parameter which may take either a fixed or random value. The split weights imply that the offspring maintains a strong behavioral link with the parent. For training examples which were learned correctly by the parent, the offspring needs little adjustment of its inherited weights during partial training.
The new ANN produced by node splitting is denoted as Offspring 2. After it is generated, it will also be partially trained by the MBP. Then it has to compete with Offspring 1 for survival. The surviving one will replace the worst ANN in the population.
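The node-splitting rule above can be sketched as follows. The function name and the way weights are stored are illustrative assumptions, while the (1 + alpha)/(-alpha) split follows the equations just given.

```python
import numpy as np

def split_hidden_node(w_in, w_out, alpha=0.4):
    """Split one hidden node into two behaviorally similar nodes ("cell division").

    w_in  : weights on connections coming INTO the node (copied to both children).
    w_out : weights on connections going OUT of the node, split so that the
            children's combined contribution initially equals the parent's:
            (1 + alpha) * w_out + (-alpha) * w_out == w_out.
    alpha : the mutation parameter; may be fixed or drawn at random.
    """
    child1 = (w_in.copy(), (1.0 + alpha) * w_out)
    child2 = (w_in.copy(), -alpha * w_out)
    return child1, child2

# Example: a hidden node with two incoming and one outgoing connection.
(c1_in, c1_out), (c2_in, c2_out) = split_hidden_node(np.array([0.3, -1.2]),
                                                     np.array([0.8]))
print(c1_out + c2_out)    # [0.8] -- the parent's outgoing weight is preserved
```

Because both children inherit identical incoming weights, their activations match the parent's, and their outgoing contributions sum to the parent's, so the split initially leaves the network's input-output behavior unchanged.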

TABLE I. THE PARAMETERS USED IN THE EXPERIMENTS WITH THE N-PARITY PROBLEM
Fig. 7. The best network evolved by EPNet for the seven-parity problem.
Fig. 8. The best network evolved by EPNet for the eight-parity problem.
F. Further Training After Evolution
One of the most important goals for ANN's is to have good generalization ability. In EPNet, a training set is used for the MBP and a validation set for fitness evaluation in the evolutionary process. After the simulated evolution, the best evolved ANN is further trained using the MBP on the combined training and validation set. Then this further-trained ANN is tested on an unseen testing set to evaluate its performance.
Alternatively, all the ANN's in the final population can be trained using the MBP, and the one which has the best performance on a second validation set is selected as EPNet's final output. This method is more time-consuming, but it considers all the information in the final population rather than just the best individual. The importance of making use of the information in a population has recently been demonstrated by evolving both ANN's [49], [50] and rule-based systems [50], [51]. The use of a second validation set also helps to prevent ANN's from overfitting the combined training and first validation set. Experiments using either one or two validation sets will be described in the following section.
IV. EXPERIMENTAL STUDIES
A. The Parity Problems
EPNet was first tested on the N-parity problem, with N ranging from four to eight [34]. All patterns were used in training. No validation sets were used. The parameters used in the experiments are given in Table I. Ten runs were conducted for each value of N. The results are summarized in Table II, where the number of epochs indicates the total number of epochs taken by EPNet when the best network is obtained.
The results obtained by EPNet are quite competitive in comparison with those obtained by other algorithms. Table III compares EPNet's best results with those of the cascade-correlation algorithm (CCA) [5], the perceptron cascade algorithm (PCA) [7], the tower algorithm (TA) [6], and the FNNCA [8]. All these algorithms except the FNNCA can produce networks with shortcut connections. Two observations can be made from this table. First, EPNet can evolve very compact networks; in fact, it generated the smallest ANN among the five algorithms compared here. Second, the size of the network evolved by EPNet seems to grow more slowly than that produced by the other algorithms when the size of the problem (i.e., N) increases. That is, EPNet seems to perform even better for large problems in terms of the number of hidden nodes. Since CCA, PCA, and TA all produce fully connected networks, the number of connections in EPNet-evolved ANN's is smaller as well.
Figs. 7 and 8 show the best networks evolved by EPNet for the seven- and eight-parity problems, respectively. Tables IV and V give their weights. It is rather surprising that a three-hidden-node network can be found by EPNet for the eight-

9 702 IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 8, NO. 3, MAY 1997 TABLE II SUMMARY OF THE RESULTS PRODUCED BY EPNet ON THE N PARITY PROBLEM. ALL RESULTS WERE AVERAGED OVER TEN RUNS TABLE III COMPARISON BETWEEN EPNet AND OTHER ALGORITHMS IN TERMS OF THE MINIMAL NUMBER OF HIDDEN NODES IN THE BEST NETWORK GENERATED. THE FIVE-TUPLES IN THE TABLE REPRESENT THE NUMBER OF HIDDEN NODES FOR THE FOUR-, FIVE-, SIX-, SEVEN-, AND EIGHT-PARITY PROBLEM, RESPECTIVELY. - MEANS NO RESULT IS AVAILABLE parity problem. This demonstrates an important point made by many evolutionary algorithm researchers an evolutionary algorithm can often discover novel solutions which are very difficult to find by human beings. However, EPNet might take a long time to find a solution to a large parity problem. Some of the runs did not finish within the user-specified maximum number of generations. Although there is a report on a two-hidden-node ANN which can solve the parity problem [52], their network was handcrafted and used a very special node transfer function, rather than the usual sigmoid one. B. The Medical Diagnosis Problems Since the training set was the same as the testing set in the experiments with the parity problem, EPNet was only tested for its ability to evolve ANN s that learn well but not necessarily generalize well. In order to evaluate EPNet s ability in evolving ANN s that generalize well, EPNet was applied to four real-world problems in the medical domain, i.e., the breast cancer problem, the diabetes problem, the heart disease problem, and the thyroid problem. All date sets were obtained from the UCI machine learning benchmark repository. These medical diagnosis problems have the following common characteristics [36]. The input attributes used are similar to those a human expert would use in order to solve the same problem. The outputs represent either the classification of a number of understandable classes or the prediction of a set of understandable quantities. In practice, all these problems are solved by human experts. Examples are expensive to get. This has the consequence that the training sets are not very large. There are missing attribute values in the data sets. These data sets represent some of the most challenging problems in the ANN and machine learning field. They have a small sample size of noisy data. The Breast Cancer Data Set: The breast cancer data set was originally obtained from W. H. Wolberg at the University of Wisconsin Hospitals, Madison. The purpose of the data set is to classify a tumour as either benign or malignant based on cell descriptions gathered by microscopic examination. The data set contains nine attributes and 699 examples of which 458 are benign examples and 241 are malignant examples.

10 YAO AND LIU: NEW EVOLUTIONARY SYSTEM 703 TABLE IV CONNECTION WEIGHTS AND BIASES (REPRESENTED BY T ) FOR THE NETWORK IN FIG. 7 TABLE V CONNECTION WEIGHTS AND BIASES (REPRESENTED BY T ) FOR THE NETWORK IN FIG. 8 The Diabetes Data Set: This data set was originally donated by Vincent Sigillito from Johns Hopkins University and was constructed by constrained selection from a larger database held by the National Institute of Diabetes and Digestive and Kidney Diseases. All patients represented in this data set are females of at least 21 years old and of Pima Indian heritage living near Phoenix, AZ. The problem posed here is to predict whether a patient would test positive for diabetes according to World Health Organization criteria given a number of physiological measurements and medical test results. This is a two class problem with class value one interpreted as tested positive for diabetes. There are 500 examples of class 1 and 268 of class 2. There are eight attributes for each example. The data set is rather difficult to classify. The so-called class value is really a binarised form of another attribute which is itself highly indicative of certain types of diabetes but does not have a one to one correspondence with the medical condition of being diabetic. The Heart Disease Data Set: This data set comes from the Cleveland Clinic Foundation and was supplied by Robert Detrano of the V.A. Medical Center, Long Beach, CA. The purpose of the data set is to predict the presence or absence of heart disease given the results of various medical tests carried out on a patient. This database contains 13 attributes, which have been extracted from a larger set of 75. The database originally contained 303 examples but six of these contained missing class values and so were discarded leaving 297. Twenty seven of these were retained in case of dispute, leaving a final total of 270. There are two classes: presence and absence (of heart disease). This is a reduction of the number of classes in the original data set in which there were four different degrees of heart disease. The Thyroid Data Set: This data set comes from the ann version of the thyroid disease data set from the UCI machine learning repository. Two files were provided. anntrain.data contains 3772 learning examples. ann-test.data contains 3428 testing examples. There are 21 attributes for each example. The purpose of the data set is to determine whether a patient referred to the clinic is hypothyroid. Therefore three classes are built: normal (not hypothyroid), hyperfunction and subnormal functioning. Because 92 percent of the patients are not hyperthyroid, a good classifier must be significantly better than 92%. Experimental Setup: All the data sets used by EPNet were partitioned into three sets: a training set, a validation set, and a testing set. The training set was used to train ANN s by MBP, the validation set was used to evaluate the fitness of the ANN s. The best ANN evolved by EPNet was further trained on the combined training and validation set before it was applied to the testing set. As indicated by Prechelt [36], [53], it is insufficient to indicate only the number of examples for each set in the above partition, because the experimental results may vary significantly for different partitions even when the numbers in each set are the same. An imprecise specification of the partition of a known data set into the three sets is one of the most frequent obstacles to reproduce and compare published neural-network learning results. 
In the following experiments, each data set was partitioned as follows. For the breast cancer data set, the first 349 examples were used for the training set, the following 175 examples for the validation set, and the final 175 examples for the testing set. For the diabetes data set, the first 384 examples were used for the training set, the following 192 examples for the validation set, the final 192 examples for the testing set. For the heart disease data set, the first 134 examples were used for the training set, the following 68 examples for the validation set, and the final 68 examples for the testing set. For the thyroid data set, the first 2514 examples in anntrain.data were used for the training set, the rest in ann-train.data for the validation set, and the whole ann-test.data for the testing set. The input attributes of the diabetes data set and heart disease data set were rescaled to between 0.0 and 1.0 by

11 704 IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 8, NO. 3, MAY 1997 a linear function. The output attributes of all the problems were encoded using a 1-of- output representation for classes. The winner-takes-all method was used in EPNet, i.e., the output with the highest activation designates the class. There are some control parameters in EPNet which need to be specified by the user. It is, however, unnecessary to tune all these parameters for each problem because EPNet is not very sensitive to them. Most parameters used in the experiments were set to be the same: the population size (20), the initial connection density (1.0), the initial learning rate (0.25), the range of learning rate (0.1 to 0.75), the number of epochs for the learning rate adaptation (5), the number of mutated hidden nodes (1), the number of mutated connections (one to three), the number of temperatures in SA (5), and the number of iterations at each temperature (100). The different parameters were the number of hidden nodes of each individual in the initial population and the number of epochs for MBP s partial training. The number of hidden nodes for each individual in the initial population was chosen from a uniform distribution within certain ranges: one to three hidden nodes for the breast cancer problem; two to eight for the diabetes problem; three to five for the heart disease problem; and six to 15 for the thyroid problem. The number of epochs for training each individual in the initial population is determined by two user-specified parameters: the stage size and the number of stages. A stage includes a certain number of epochs for MBP s training. The two parameters mean that an ANN is first trained for one stage. If the error of the network reduces, then another stage is executed, or else the training finishes. This step can repeat up to the-number-of-stages times. This simple method balances fairly well between the training time and the accuracy. For the breast cancer problem and the diabetes problem, the two parameters were 400 and two. For the heart disease problem, they were 500 and two. For the thyroid problem, they were 350 and three. The number of epochs for each partial training during evolution (i.e., ) was determined in the same way as the above. The two parameters were 50 and three for the thyroid problem, 100 and two for the other problems. The number of epochs for training the best individual on the combined training and testing data set was set to be the same (1000) for all four problems. A run of EPNet was terminated if the average error of the population had not decreased by more than a threshold value after consecutive generations or a maximum number of generations was reached. The same maximum number of generations (500) and the same (10) were used for all four problems. The threshold value was set to 0.1 for the thyroid problem, and 0.01 for the other three. These parameters were chosen after some limited preliminary experiments. They were not meant to be optimal. Experimental Results: Tables VI and VII show EPNet s results over 30 runs. The error in the tables refers to the error defined by (1). The error rate refers to the percentage of wrong classifications produced by the evolved ANN s. It is clear from the two tables that the evolved ANN s have very small sizes, i.e., a small number of hidden nodes and connections, as well as low error rates. 
For example, TABLE VI ARCHITECTURES OF EVOLVED ARTIFICIAL NEURAL NETWORKS an evolved ANN with just one hidden node can achieve an error rate of % on the testing set for the diabetes problem. Another evolved ANN with just three hidden nodes can achieve an error rate of 1.925% on the testing set for the thyroid problem. In order to observe the evolutionary process in EPNet, Figs show the evolution of the mean of average numbers of connections and the mean of average classification accuracy of ANN s over 30 runs for the four medical diagnosis problems. The evolutionary processes are quite interesting. The number of connections in ANN s decreases in the beginning of the evolution. After certain number of generations, the number starts increasing in some cases, e.g., Fig. 9. This phenomenon illustrates the effectiveness of the ordering of different mutations in EPNet. There is an obvious bias toward parsimonious ANN s. In the beginning stage of the evolution, very few ANN s will be fully trained and thus most of them will have high errors. Deleting a few connections from an ANN will not affect its high error very much. After each deletion, further training is always performed, which is likely to reduce the high error. Hence deletion will be successful and the number of connections will be reduced. After certain number of generations, ANN s in the population will have fewer connections and lower errors than before. They have reached such a level that further deletion of connections will increase their errors in spite of further training due to the insufficient capacity of the ANN. Hence deletion is likely to fail and addition is likely to be attempted. Since further training after adding

12 YAO AND LIU: NEW EVOLUTIONARY SYSTEM 705 TABLE VII ACCURACIES OF EVOLVED ARTIFICIAL NEURAL NETWORKS Fig. 9. Evolution of ANN s connections and accuracy for the breast cancer problem. extra connections to an ANN often reduces its error because of a more powerful ANN, addition is likely to succeed. Hence the number of connections increases gradually while the error keeps reducing. Such trend is not very clear in Figs. 11 and 12, but it is expected to appear if more generations were allowed for the experiments. The heart disease and thyroid problems are larger than the breast cancer and diabetes problems. They would need more time to reach the lowest point for the number of connections. Comparisons with Other Work: Direct comparison with other evolutionary approaches to designing ANN s is very difficult due to the lack of such results. Instead, the best and latest results available in the literature, regardless of whether the algorithm used was an evolutionary, a BP or a statistical one, were used in the comparison. It is possible that some papers which should have been compared with were overlooked. However, the aim of this paper is not to compare EPNet exhaustively with all other algorithms.

13 706 IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 8, NO. 3, MAY 1997 Fig. 10. Evolution of ANN s connections and accuracy for the diabetes problem. Fig. 11. Evolution of ANN s connections and accuracy for the heart disease problem. Fig. 12. Evolution of ANN s connections and accuracy for the thyroid problem. All 30 runs took less than 100 generations to finish. Some of them took less than 50 generations to finish. In those cases, the average number of connections and accuracy between the last generation and the 50th one were set to be the same as those at the last generation in order to draw this figure.


More information

Seminar - Organic Computing

Seminar - Organic Computing Seminar - Organic Computing Self-Organisation of OC-Systems Markus Franke 25.01.2006 Typeset by FoilTEX Timetable 1. Overview 2. Characteristics of SO-Systems 3. Concern with Nature 4. Design-Concepts

More information

A study of speaker adaptation for DNN-based speech synthesis

A study of speaker adaptation for DNN-based speech synthesis A study of speaker adaptation for DNN-based speech synthesis Zhizheng Wu, Pawel Swietojanski, Christophe Veaux, Steve Renals, Simon King The Centre for Speech Technology Research (CSTR) University of Edinburgh,

More information

Learning Structural Correspondences Across Different Linguistic Domains with Synchronous Neural Language Models

Learning Structural Correspondences Across Different Linguistic Domains with Synchronous Neural Language Models Learning Structural Correspondences Across Different Linguistic Domains with Synchronous Neural Language Models Stephan Gouws and GJ van Rooyen MIH Medialab, Stellenbosch University SOUTH AFRICA {stephan,gvrooyen}@ml.sun.ac.za

More information

Using focal point learning to improve human machine tacit coordination

Using focal point learning to improve human machine tacit coordination DOI 10.1007/s10458-010-9126-5 Using focal point learning to improve human machine tacit coordination InonZuckerman SaritKraus Jeffrey S. Rosenschein The Author(s) 2010 Abstract We consider an automated

More information

NCEO Technical Report 27

NCEO Technical Report 27 Home About Publications Special Topics Presentations State Policies Accommodations Bibliography Teleconferences Tools Related Sites Interpreting Trends in the Performance of Special Education Students

More information

How to Judge the Quality of an Objective Classroom Test

How to Judge the Quality of an Objective Classroom Test How to Judge the Quality of an Objective Classroom Test Technical Bulletin #6 Evaluation and Examination Service The University of Iowa (319) 335-0356 HOW TO JUDGE THE QUALITY OF AN OBJECTIVE CLASSROOM

More information

Software Maintenance

Software Maintenance 1 What is Software Maintenance? Software Maintenance is a very broad activity that includes error corrections, enhancements of capabilities, deletion of obsolete capabilities, and optimization. 2 Categories

More information

Test Effort Estimation Using Neural Network

Test Effort Estimation Using Neural Network J. Software Engineering & Applications, 2010, 3: 331-340 doi:10.4236/jsea.2010.34038 Published Online April 2010 (http://www.scirp.org/journal/jsea) 331 Chintala Abhishek*, Veginati Pavan Kumar, Harish

More information

Reducing Features to Improve Bug Prediction

Reducing Features to Improve Bug Prediction Reducing Features to Improve Bug Prediction Shivkumar Shivaji, E. James Whitehead, Jr., Ram Akella University of California Santa Cruz {shiv,ejw,ram}@soe.ucsc.edu Sunghun Kim Hong Kong University of Science

More information

WHEN THERE IS A mismatch between the acoustic

WHEN THERE IS A mismatch between the acoustic 808 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 14, NO. 3, MAY 2006 Optimization of Temporal Filters for Constructing Robust Features in Speech Recognition Jeih-Weih Hung, Member,

More information

THE PENNSYLVANIA STATE UNIVERSITY SCHREYER HONORS COLLEGE DEPARTMENT OF MATHEMATICS ASSESSING THE EFFECTIVENESS OF MULTIPLE CHOICE MATH TESTS

THE PENNSYLVANIA STATE UNIVERSITY SCHREYER HONORS COLLEGE DEPARTMENT OF MATHEMATICS ASSESSING THE EFFECTIVENESS OF MULTIPLE CHOICE MATH TESTS THE PENNSYLVANIA STATE UNIVERSITY SCHREYER HONORS COLLEGE DEPARTMENT OF MATHEMATICS ASSESSING THE EFFECTIVENESS OF MULTIPLE CHOICE MATH TESTS ELIZABETH ANNE SOMERS Spring 2011 A thesis submitted in partial

More information

Classification Using ANN: A Review

Classification Using ANN: A Review International Journal of Computational Intelligence Research ISSN 0973-1873 Volume 13, Number 7 (2017), pp. 1811-1820 Research India Publications http://www.ripublication.com Classification Using ANN:

More information

Axiom 2013 Team Description Paper

Axiom 2013 Team Description Paper Axiom 2013 Team Description Paper Mohammad Ghazanfari, S Omid Shirkhorshidi, Farbod Samsamipour, Hossein Rahmatizadeh Zagheli, Mohammad Mahdavi, Payam Mohajeri, S Abbas Alamolhoda Robotics Scientific Association

More information

Course Outline. Course Grading. Where to go for help. Academic Integrity. EE-589 Introduction to Neural Networks NN 1 EE

Course Outline. Course Grading. Where to go for help. Academic Integrity. EE-589 Introduction to Neural Networks NN 1 EE EE-589 Introduction to Neural Assistant Prof. Dr. Turgay IBRIKCI Room # 305 (322) 338 6868 / 139 Wensdays 9:00-12:00 Course Outline The course is divided in two parts: theory and practice. 1. Theory covers

More information

Lecture 1: Basic Concepts of Machine Learning

Lecture 1: Basic Concepts of Machine Learning Lecture 1: Basic Concepts of Machine Learning Cognitive Systems - Machine Learning Ute Schmid (lecture) Johannes Rabold (practice) Based on slides prepared March 2005 by Maximilian Röglinger, updated 2010

More information

Malicious User Suppression for Cooperative Spectrum Sensing in Cognitive Radio Networks using Dixon s Outlier Detection Method

Malicious User Suppression for Cooperative Spectrum Sensing in Cognitive Radio Networks using Dixon s Outlier Detection Method Malicious User Suppression for Cooperative Spectrum Sensing in Cognitive Radio Networks using Dixon s Outlier Detection Method Sanket S. Kalamkar and Adrish Banerjee Department of Electrical Engineering

More information

Machine Learning and Data Mining. Ensembles of Learners. Prof. Alexander Ihler

Machine Learning and Data Mining. Ensembles of Learners. Prof. Alexander Ihler Machine Learning and Data Mining Ensembles of Learners Prof. Alexander Ihler Ensemble methods Why learn one classifier when you can learn many? Ensemble: combine many predictors (Weighted) combina

More information

Learning Methods in Multilingual Speech Recognition

Learning Methods in Multilingual Speech Recognition Learning Methods in Multilingual Speech Recognition Hui Lin Department of Electrical Engineering University of Washington Seattle, WA 98125 linhui@u.washington.edu Li Deng, Jasha Droppo, Dong Yu, and Alex

More information

Applications of data mining algorithms to analysis of medical data

Applications of data mining algorithms to analysis of medical data Master Thesis Software Engineering Thesis no: MSE-2007:20 August 2007 Applications of data mining algorithms to analysis of medical data Dariusz Matyja School of Engineering Blekinge Institute of Technology

More information

Word Segmentation of Off-line Handwritten Documents

Word Segmentation of Off-line Handwritten Documents Word Segmentation of Off-line Handwritten Documents Chen Huang and Sargur N. Srihari {chuang5, srihari}@cedar.buffalo.edu Center of Excellence for Document Analysis and Recognition (CEDAR), Department

More information

Calibration of Confidence Measures in Speech Recognition

Calibration of Confidence Measures in Speech Recognition Submitted to IEEE Trans on Audio, Speech, and Language, July 2010 1 Calibration of Confidence Measures in Speech Recognition Dong Yu, Senior Member, IEEE, Jinyu Li, Member, IEEE, Li Deng, Fellow, IEEE

More information

University of Groningen. Systemen, planning, netwerken Bosman, Aart

University of Groningen. Systemen, planning, netwerken Bosman, Aart University of Groningen Systemen, planning, netwerken Bosman, Aart IMPORTANT NOTE: You are advised to consult the publisher's version (publisher's PDF) if you wish to cite from it. Please check the document

More information

OCR for Arabic using SIFT Descriptors With Online Failure Prediction

OCR for Arabic using SIFT Descriptors With Online Failure Prediction OCR for Arabic using SIFT Descriptors With Online Failure Prediction Andrey Stolyarenko, Nachum Dershowitz The Blavatnik School of Computer Science Tel Aviv University Tel Aviv, Israel Email: stloyare@tau.ac.il,

More information

Designing a Rubric to Assess the Modelling Phase of Student Design Projects in Upper Year Engineering Courses

Designing a Rubric to Assess the Modelling Phase of Student Design Projects in Upper Year Engineering Courses Designing a Rubric to Assess the Modelling Phase of Student Design Projects in Upper Year Engineering Courses Thomas F.C. Woodhall Masters Candidate in Civil Engineering Queen s University at Kingston,

More information

What s in a Step? Toward General, Abstract Representations of Tutoring System Log Data

What s in a Step? Toward General, Abstract Representations of Tutoring System Log Data What s in a Step? Toward General, Abstract Representations of Tutoring System Log Data Kurt VanLehn 1, Kenneth R. Koedinger 2, Alida Skogsholm 2, Adaeze Nwaigwe 2, Robert G.M. Hausmann 1, Anders Weinstein

More information

arxiv: v1 [cs.lg] 15 Jun 2015

arxiv: v1 [cs.lg] 15 Jun 2015 Dual Memory Architectures for Fast Deep Learning of Stream Data via an Online-Incremental-Transfer Strategy arxiv:1506.04477v1 [cs.lg] 15 Jun 2015 Sang-Woo Lee Min-Oh Heo School of Computer Science and

More information

System Implementation for SemEval-2017 Task 4 Subtask A Based on Interpolated Deep Neural Networks

System Implementation for SemEval-2017 Task 4 Subtask A Based on Interpolated Deep Neural Networks System Implementation for SemEval-2017 Task 4 Subtask A Based on Interpolated Deep Neural Networks 1 Tzu-Hsuan Yang, 2 Tzu-Hsuan Tseng, and 3 Chia-Ping Chen Department of Computer Science and Engineering

More information

(Sub)Gradient Descent

(Sub)Gradient Descent (Sub)Gradient Descent CMSC 422 MARINE CARPUAT marine@cs.umd.edu Figures credit: Piyush Rai Logistics Midterm is on Thursday 3/24 during class time closed book/internet/etc, one page of notes. will include

More information

AUTOMATIC DETECTION OF PROLONGED FRICATIVE PHONEMES WITH THE HIDDEN MARKOV MODELS APPROACH 1. INTRODUCTION

AUTOMATIC DETECTION OF PROLONGED FRICATIVE PHONEMES WITH THE HIDDEN MARKOV MODELS APPROACH 1. INTRODUCTION JOURNAL OF MEDICAL INFORMATICS & TECHNOLOGIES Vol. 11/2007, ISSN 1642-6037 Marek WIŚNIEWSKI *, Wiesława KUNISZYK-JÓŹKOWIAK *, Elżbieta SMOŁKA *, Waldemar SUSZYŃSKI * HMM, recognition, speech, disorders

More information

A Comparison of Annealing Techniques for Academic Course Scheduling

A Comparison of Annealing Techniques for Academic Course Scheduling A Comparison of Annealing Techniques for Academic Course Scheduling M. A. Saleh Elmohamed 1, Paul Coddington 2, and Geoffrey Fox 1 1 Northeast Parallel Architectures Center Syracuse University, Syracuse,

More information

Australia s tertiary education sector

Australia s tertiary education sector Australia s tertiary education sector TOM KARMEL NHI NGUYEN NATIONAL CENTRE FOR VOCATIONAL EDUCATION RESEARCH Paper presented to the Centre for the Economics of Education and Training 7 th National Conference

More information

ISFA2008U_120 A SCHEDULING REINFORCEMENT LEARNING ALGORITHM

ISFA2008U_120 A SCHEDULING REINFORCEMENT LEARNING ALGORITHM Proceedings of 28 ISFA 28 International Symposium on Flexible Automation Atlanta, GA, USA June 23-26, 28 ISFA28U_12 A SCHEDULING REINFORCEMENT LEARNING ALGORITHM Amit Gil, Helman Stern, Yael Edan, and

More information

An Introduction to Simio for Beginners

An Introduction to Simio for Beginners An Introduction to Simio for Beginners C. Dennis Pegden, Ph.D. This white paper is intended to introduce Simio to a user new to simulation. It is intended for the manufacturing engineer, hospital quality

More information

Utilizing Soft System Methodology to Increase Productivity of Shell Fabrication Sushant Sudheer Takekar 1 Dr. D.N. Raut 2

Utilizing Soft System Methodology to Increase Productivity of Shell Fabrication Sushant Sudheer Takekar 1 Dr. D.N. Raut 2 IJSRD - International Journal for Scientific Research & Development Vol. 2, Issue 04, 2014 ISSN (online): 2321-0613 Utilizing Soft System Methodology to Increase Productivity of Shell Fabrication Sushant

More information

Modeling function word errors in DNN-HMM based LVCSR systems

Modeling function word errors in DNN-HMM based LVCSR systems Modeling function word errors in DNN-HMM based LVCSR systems Melvin Jose Johnson Premkumar, Ankur Bapna and Sree Avinash Parchuri Department of Computer Science Department of Electrical Engineering Stanford

More information

Issues in the Mining of Heart Failure Datasets

Issues in the Mining of Heart Failure Datasets International Journal of Automation and Computing 11(2), April 2014, 162-179 DOI: 10.1007/s11633-014-0778-5 Issues in the Mining of Heart Failure Datasets Nongnuch Poolsawad 1 Lisa Moore 1 Chandrasekhar

More information

Probability estimates in a scenario tree

Probability estimates in a scenario tree 101 Chapter 11 Probability estimates in a scenario tree An expert is a person who has made all the mistakes that can be made in a very narrow field. Niels Bohr (1885 1962) Scenario trees require many numbers.

More information

I-COMPETERE: Using Applied Intelligence in search of competency gaps in software project managers.

I-COMPETERE: Using Applied Intelligence in search of competency gaps in software project managers. Information Systems Frontiers manuscript No. (will be inserted by the editor) I-COMPETERE: Using Applied Intelligence in search of competency gaps in software project managers. Ricardo Colomo-Palacios

More information

Switchboard Language Model Improvement with Conversational Data from Gigaword

Switchboard Language Model Improvement with Conversational Data from Gigaword Katholieke Universiteit Leuven Faculty of Engineering Master in Artificial Intelligence (MAI) Speech and Language Technology (SLT) Switchboard Language Model Improvement with Conversational Data from Gigaword

More information

On-Line Data Analytics

On-Line Data Analytics International Journal of Computer Applications in Engineering Sciences [VOL I, ISSUE III, SEPTEMBER 2011] [ISSN: 2231-4946] On-Line Data Analytics Yugandhar Vemulapalli #, Devarapalli Raghu *, Raja Jacob

More information

Assignment 1: Predicting Amazon Review Ratings

Assignment 1: Predicting Amazon Review Ratings Assignment 1: Predicting Amazon Review Ratings 1 Dataset Analysis Richard Park r2park@acsmail.ucsd.edu February 23, 2015 The dataset selected for this assignment comes from the set of Amazon reviews for

More information

Circuit Simulators: A Revolutionary E-Learning Platform

Circuit Simulators: A Revolutionary E-Learning Platform Circuit Simulators: A Revolutionary E-Learning Platform Mahi Itagi Padre Conceicao College of Engineering, Verna, Goa, India. itagimahi@gmail.com Akhil Deshpande Gogte Institute of Technology, Udyambag,

More information

COMPUTER-ASSISTED INDEPENDENT STUDY IN MULTIVARIATE CALCULUS

COMPUTER-ASSISTED INDEPENDENT STUDY IN MULTIVARIATE CALCULUS COMPUTER-ASSISTED INDEPENDENT STUDY IN MULTIVARIATE CALCULUS L. Descalço 1, Paula Carvalho 1, J.P. Cruz 1, Paula Oliveira 1, Dina Seabra 2 1 Departamento de Matemática, Universidade de Aveiro (PORTUGAL)

More information

TD(λ) and Q-Learning Based Ludo Players

TD(λ) and Q-Learning Based Ludo Players TD(λ) and Q-Learning Based Ludo Players Majed Alhajry, Faisal Alvi, Member, IEEE and Moataz Ahmed Abstract Reinforcement learning is a popular machine learning technique whose inherent self-learning ability

More information

A Case Study: News Classification Based on Term Frequency

A Case Study: News Classification Based on Term Frequency A Case Study: News Classification Based on Term Frequency Petr Kroha Faculty of Computer Science University of Technology 09107 Chemnitz Germany kroha@informatik.tu-chemnitz.de Ricardo Baeza-Yates Center

More information

ReFresh: Retaining First Year Engineering Students and Retraining for Success

ReFresh: Retaining First Year Engineering Students and Retraining for Success ReFresh: Retaining First Year Engineering Students and Retraining for Success Neil Shyminsky and Lesley Mak University of Toronto lmak@ecf.utoronto.ca Abstract Student retention and support are key priorities

More information

Proposal of Pattern Recognition as a necessary and sufficient principle to Cognitive Science

Proposal of Pattern Recognition as a necessary and sufficient principle to Cognitive Science Proposal of Pattern Recognition as a necessary and sufficient principle to Cognitive Science Gilberto de Paiva Sao Paulo Brazil (May 2011) gilbertodpaiva@gmail.com Abstract. Despite the prevalence of the

More information

Semi-Supervised GMM and DNN Acoustic Model Training with Multi-system Combination and Confidence Re-calibration

Semi-Supervised GMM and DNN Acoustic Model Training with Multi-system Combination and Confidence Re-calibration INTERSPEECH 2013 Semi-Supervised GMM and DNN Acoustic Model Training with Multi-system Combination and Confidence Re-calibration Yan Huang, Dong Yu, Yifan Gong, and Chaojun Liu Microsoft Corporation, One

More information

Mining Association Rules in Student s Assessment Data

Mining Association Rules in Student s Assessment Data www.ijcsi.org 211 Mining Association Rules in Student s Assessment Data Dr. Varun Kumar 1, Anupama Chadha 2 1 Department of Computer Science and Engineering, MVN University Palwal, Haryana, India 2 Anupama

More information

A SURVEY OF FUZZY COGNITIVE MAP LEARNING METHODS

A SURVEY OF FUZZY COGNITIVE MAP LEARNING METHODS A SURVEY OF FUZZY COGNITIVE MAP LEARNING METHODS Wociech Stach, Lukasz Kurgan, and Witold Pedrycz Department of Electrical and Computer Engineering University of Alberta Edmonton, Alberta T6G 2V4, Canada

More information

Laboratorio di Intelligenza Artificiale e Robotica

Laboratorio di Intelligenza Artificiale e Robotica Laboratorio di Intelligenza Artificiale e Robotica A.A. 2008-2009 Outline 2 Machine Learning Unsupervised Learning Supervised Learning Reinforcement Learning Genetic Algorithms Genetics-Based Machine Learning

More information

The dilemma of Saussurean communication

The dilemma of Saussurean communication ELSEVIER BioSystems 37 (1996) 31-38 The dilemma of Saussurean communication Michael Oliphant Deparlment of Cognitive Science, University of California, San Diego, CA, USA Abstract A Saussurean communication

More information

Given a directed graph G =(N A), where N is a set of m nodes and A. destination node, implying a direction for ow to follow. Arcs have limitations

Given a directed graph G =(N A), where N is a set of m nodes and A. destination node, implying a direction for ow to follow. Arcs have limitations 4 Interior point algorithms for network ow problems Mauricio G.C. Resende AT&T Bell Laboratories, Murray Hill, NJ 07974-2070 USA Panos M. Pardalos The University of Florida, Gainesville, FL 32611-6595

More information

Learning to Schedule Straight-Line Code

Learning to Schedule Straight-Line Code Learning to Schedule Straight-Line Code Eliot Moss, Paul Utgoff, John Cavazos Doina Precup, Darko Stefanović Dept. of Comp. Sci., Univ. of Mass. Amherst, MA 01003 Carla Brodley, David Scheeff Sch. of Elec.

More information

Probability and Game Theory Course Syllabus

Probability and Game Theory Course Syllabus Probability and Game Theory Course Syllabus DATE ACTIVITY CONCEPT Sunday Learn names; introduction to course, introduce the Battle of the Bismarck Sea as a 2-person zero-sum game. Monday Day 1 Pre-test

More information

POLA: a student modeling framework for Probabilistic On-Line Assessment of problem solving performance

POLA: a student modeling framework for Probabilistic On-Line Assessment of problem solving performance POLA: a student modeling framework for Probabilistic On-Line Assessment of problem solving performance Cristina Conati, Kurt VanLehn Intelligent Systems Program University of Pittsburgh Pittsburgh, PA,

More information

Generative models and adversarial training

Generative models and adversarial training Day 4 Lecture 1 Generative models and adversarial training Kevin McGuinness kevin.mcguinness@dcu.ie Research Fellow Insight Centre for Data Analytics Dublin City University What is a generative model?

More information

IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 17, NO. 3, MARCH

IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 17, NO. 3, MARCH IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 17, NO. 3, MARCH 2009 423 Adaptive Multimodal Fusion by Uncertainty Compensation With Application to Audiovisual Speech Recognition George

More information

Major Milestones, Team Activities, and Individual Deliverables

Major Milestones, Team Activities, and Individual Deliverables Major Milestones, Team Activities, and Individual Deliverables Milestone #1: Team Semester Proposal Your team should write a proposal that describes project objectives, existing relevant technology, engineering

More information

ABSTRACT. A major goal of human genetics is the discovery and validation of genetic polymorphisms

ABSTRACT. A major goal of human genetics is the discovery and validation of genetic polymorphisms ABSTRACT DEODHAR, SUSHAMNA DEODHAR. Using Grammatical Evolution Decision Trees for Detecting Gene-Gene Interactions in Genetic Epidemiology. (Under the direction of Dr. Alison Motsinger-Reif.) A major

More information

School Competition and Efficiency with Publicly Funded Catholic Schools David Card, Martin D. Dooley, and A. Abigail Payne

School Competition and Efficiency with Publicly Funded Catholic Schools David Card, Martin D. Dooley, and A. Abigail Payne School Competition and Efficiency with Publicly Funded Catholic Schools David Card, Martin D. Dooley, and A. Abigail Payne Web Appendix See paper for references to Appendix Appendix 1: Multiple Schools

More information

Modeling function word errors in DNN-HMM based LVCSR systems

Modeling function word errors in DNN-HMM based LVCSR systems Modeling function word errors in DNN-HMM based LVCSR systems Melvin Jose Johnson Premkumar, Ankur Bapna and Sree Avinash Parchuri Department of Computer Science Department of Electrical Engineering Stanford

More information

Degeneracy results in canalisation of language structure: A computational model of word learning

Degeneracy results in canalisation of language structure: A computational model of word learning Degeneracy results in canalisation of language structure: A computational model of word learning Padraic Monaghan (p.monaghan@lancaster.ac.uk) Department of Psychology, Lancaster University Lancaster LA1

More information

AGS THE GREAT REVIEW GAME FOR PRE-ALGEBRA (CD) CORRELATED TO CALIFORNIA CONTENT STANDARDS

AGS THE GREAT REVIEW GAME FOR PRE-ALGEBRA (CD) CORRELATED TO CALIFORNIA CONTENT STANDARDS AGS THE GREAT REVIEW GAME FOR PRE-ALGEBRA (CD) CORRELATED TO CALIFORNIA CONTENT STANDARDS 1 CALIFORNIA CONTENT STANDARDS: Chapter 1 ALGEBRA AND WHOLE NUMBERS Algebra and Functions 1.4 Students use algebraic

More information

Research Article Hybrid Multistarting GA-Tabu Search Method for the Placement of BtB Converters for Korean Metropolitan Ring Grid

Research Article Hybrid Multistarting GA-Tabu Search Method for the Placement of BtB Converters for Korean Metropolitan Ring Grid Mathematical Problems in Engineering Volume 2016, Article ID 1546753, 9 pages http://dx.doi.org/10.1155/2016/1546753 Research Article Hybrid Multistarting GA-Tabu Search Method for the Placement of BtB

More information

Thesis-Proposal Outline/Template

Thesis-Proposal Outline/Template Thesis-Proposal Outline/Template Kevin McGee 1 Overview This document provides a description of the parts of a thesis outline and an example of such an outline. It also indicates which parts should be

More information

A GENERIC SPLIT PROCESS MODEL FOR ASSET MANAGEMENT DECISION-MAKING

A GENERIC SPLIT PROCESS MODEL FOR ASSET MANAGEMENT DECISION-MAKING A GENERIC SPLIT PROCESS MODEL FOR ASSET MANAGEMENT DECISION-MAKING Yong Sun, a * Colin Fidge b and Lin Ma a a CRC for Integrated Engineering Asset Management, School of Engineering Systems, Queensland

More information

Chapter 2 Rule Learning in a Nutshell

Chapter 2 Rule Learning in a Nutshell Chapter 2 Rule Learning in a Nutshell This chapter gives a brief overview of inductive rule learning and may therefore serve as a guide through the rest of the book. Later chapters will expand upon the

More information