
IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 18, NO. 3, MAY 2007

A Weighted Voting Model of Associative Memory

Xiaoyan Mu, Paul Watta, and Mohamad H. Hassoun

Abstract—This paper presents an analysis of a random access memory (RAM)-based associative memory which uses a weighted voting scheme for information retrieval. This weighted voting memory can operate in heteroassociative or autoassociative mode, can store both real-valued and binary-valued patterns, and, unlike most memory models, is equipped with a rejection mechanism. A theoretical analysis of the performance of the weighted voting memory is given for the case of binary and random memory sets. Performance measures are derived as a function of the model parameters: pattern size, window size, and number of patterns in the memory set. It is shown that the weighted voting model has large capacity and error correction. The results show that the weighted voting model can successfully achieve high detection and identification rates and, simultaneously, low false-acceptance rates.

Index Terms—Associative memory, capacity, neural network, retrieval, voting, weighted voting.

I. INTRODUCTION

THE associative memory problem is stated as follows. We are given a fundamental memory set of desired associations

$\{\mathbf{x}^k \to \mathbf{y}^k\}, \qquad k = 1, 2, \dots, m$

where $\mathbf{x}^k$ is the $N$-dimensional input pattern (memory key) and $\mathbf{y}^k$ is the associated output pattern. The task is to design a system which robustly stores the fundamental associations [13], such that the following holds:
1) when presented with $\mathbf{x}^k$ as input, the system should produce $\mathbf{y}^k$ at the output;
2) when presented with a noisy (corrupted, distorted, or incomplete) version of $\mathbf{x}^k$ at the input, the system should also produce $\mathbf{y}^k$ at the output;
3) when presented with an input that is not sufficiently similar to any of the inputs in the memory set, the system should reject the input.
The meanings of the words "noisy" and "not sufficiently similar" are application dependent. For example, in an image processing application, translation, rotation, and scale variations are common types of image distortion. Early work on associative neural networks [9], [12], [15], [19], [20] focused on designing systems to meet the first two requirements. In fact, this focus on just the first two requirements is characteristic of more recent research as well [10], [30], [31], [38]. For many practical applications, though, the third requirement is as important or even more important than the first two. Many neural-net-based associative memories have no rejection mechanism and, hence, cannot even distinguish between meaningful patterns and pure noise. This glaring deficiency may explain why few, if any, of the associative neural memory designs have found their way into practical applications. This is unfortunate since many important and practical applications require associative memory as a main component.

Manuscript received May 23, 2006; revised October 26, 2006; accepted November 14, 2006. This work was supported in part by the University of Michigan under an Office of the Vice President for Research (OVPR) grant. X. Mu is with the Department of Electrical and Computer Engineering, Rose-Hulman Institute of Technology, Terre Haute, IN USA (e-mail: mu@rose-hulman.edu). P. Watta is with the Department of Electrical and Computer Engineering, University of Michigan-Dearborn, Dearborn, MI USA (e-mail: watta@umich.edu). M. H. Hassoun is with the Department of Electrical and Computer Engineering, Wayne State University, Detroit, MI USA (e-mail: hassoun@eng.wayne.edu).
Automated human face recognition is one such application. In [16], a voting-based model of associative memory was proposed and analyzed. This associative memory model was called a two-level decoupled Hamming memory in order to emphasize the two types of processing that it employs. The low-level processing consists of a network of random access memory (RAM) processors. Each RAM unit computes local distance measures between the given input pattern and each of the patterns in the memory set. After cycling through the entire memory set, each RAM processor casts a vote for the pattern in the memory set to which it is (locally) closest. The higher level processing of this memory consists of a voting network which tallies the votes cast for each memory set pattern and then outputs the pattern with the most votes. The two-level decoupled Hamming memory was shown to have a number of advantages over other neural-net-based associative memories [16], [37]. First, it was shown both theoretically and experimentally that the two-level decoupled Hamming memory has high capacity and offers a large amount of error correction. Second, this model never produces spurious memories and cannot get stuck in oscillations. Third, this memory model is easy to maintain: associations can easily be added to or deleted from the memory set online. Some neural net approaches require complete retraining when adding or deleting a single association. Fourth, given a single memory key, this model is able to retrieve more than just one pattern: by ordering the number of votes, the memory can retrieve an ordered list of best-matching patterns. Finally, the system can be implemented in hardware with present digital technology. In order to be consistent with the present discussion, we will refer to the two-level decoupled Hamming memory as the voting-based associative memory model, or simply the voting model. In this paper, we extend the analysis of the voting model in several ways. First, the voting memory is enhanced by including a rejection mechanism. The rejection threshold is expressed in terms of a number of votes, which is typically easier to adjust and more robust than a threshold set up to discriminate among distances. Second, we generalize the voting scheme by allowing each local processor to cast not just a single vote, but a set of weighted votes.

The resulting memory model will be referred to as the weighted voting memory. Third, a theoretical analysis of the capacity of the weighted voting memory is given, as well as an analysis of how to compute the weights from training data. The innovations of the voting memory developed here were inspired by the application of human face recognition [7], [42]. The face-recognition research community has developed protocols for assessing system performance [32], [33]. There are two main tests commonly used: the identification test and the watchlist test. In both cases, we are given a database of patterns (face images) to store. In the identification test, the system is tested with images of known people (of course, the test images are different from the database images). For a given input image, the task is to identify which database person it is. In this case, no rejection state is needed. The measure of merit here is the identification rate (IR), which is the probability that a given input image will be matched with the correct database person. In the associative memory literature, the term retrieval rate is commonly used and typically means the same thing as identification rate. We will use both terms interchangeably. In the watchlist test, the system must have a rejection mechanism. Here, two test sets are required: one which contains images of the known people, and one which contains images of strangers (people not in the database). For the known-people test set, there are the following two measures of merit: the detection and identification rate (DIR), which is the percentage of images that are correctly matched with the known individuals, and the false rejection rate (FRR), which is the percentage of images that are rejected by the system [32]. For the stranger test set, there is only one measure of interest: the false acceptance rate (FAR), which gives the percentage of imposter images that are incorrectly matched with someone in the database. Of course, there is a tradeoff between the detection and identification rate and the false acceptance rate. Typically, face-recognition systems are designed with a tunable parameter or threshold $T$ which allows one to control the tradeoff between DIR and FAR. A receiver operating characteristic (ROC) curve can be constructed which shows how DIR and FAR vary as a function of $T$. Note that the identification test can be seen as a special case of the watchlist test on the known-people test set, where the threshold is set to 0 (so the system does not reject any images). We propose that researchers in neural associative memories adopt both the identification and watchlist testing methodologies. In this paper, for memory sets consisting of random binary patterns, we are able to derive theoretical expressions for the retrieval rate, detection and identification rate, and false acceptance rate for both the voting model and the weighted voting model. This paper is organized as follows. Section II reviews the operation of the voting model and extends the analysis to consider rejection and the watchlist problem. In Section III, a weighted voting model is proposed and its operation is described. In Section IV, a theoretical analysis of the performance of the weighted voting memory is given. Section V gives experimental results on random memory patterns. These experimental results are compared to the theoretical predictions. Finally, Section VI summarizes the results and discusses future extensions of this work.
Fig. 1. Structure of the voting associative memory.

II. VOTING ASSOCIATIVE MEMORY

In this section, we first review the operation of the voting associative memory and some of the results in [16]. The previous analysis relied mainly on a continuous approximation of the necessary probability distributions. In this paper, we derive the corresponding discrete probability distributions. As will be shown in Section V, the discrete distributions give a much more accurate model of the behavior of the voting memory. In addition, we extend the analysis by considering rejection, and we derive measures of performance on the watchlist test.

A. Operation of the Voting Associative Memory

In the voting associative memory, the $N$-dimensional input $\mathbf{x}$ and each memory pattern $\mathbf{x}^k$ are partitioned into a collection of nonoverlapping windows of size $n$. For notational simplicity, we will assume that $n$ divides $N$; hence, the number of windows $N/n$ is an integer. For the input (memory key) $\mathbf{x}$, let $\mathbf{x}_i$ denote the data in the $i$th window; that is, $\mathbf{x}_1$ is the portion of $\mathbf{x}$ contained in the first window, $\mathbf{x}_2$ the portion in the second window, etc. The database patterns are partitioned in the same way: $\mathbf{x}^k_i$, $i = 1, \dots, N/n$, $k = 1, \dots, m$. The partitioned database patterns can be stored in a RAM-type network, where the $i$th RAM holds all the database patterns associated with the $i$th window: $\mathbf{x}^1_i, \mathbf{x}^2_i, \dots, \mathbf{x}^m_i$. Fig. 1 shows the architecture of a RAM network with nine windows arranged in a 3 $\times$ 3 structure. The voting network requires a distance measure to be computed locally at each window. Let $d(\mathbf{u}, \mathbf{v})$ be a distance measure between two $n$-dimensional vectors. Any suitable distance function can be used. In this paper, we use the Hamming distance for binary memory patterns and the city-block distance for real-valued patterns. In either case, the (local) distance between $\mathbf{x}_i$ and $\mathbf{x}^k_i$ is given by

$d(\mathbf{x}_i, \mathbf{x}^k_i) = \sum_{j=1}^{n} \left| x_{ij} - x^k_{ij} \right|$

Fig. 2. Distance calculation for the voting memory. The distance between the highlighted window of the input and the corresponding window of each of the memory set patterns is computed.

where $x_{ij}$ and $x^k_{ij}$ denote the $j$th components of $\mathbf{x}_i$ and $\mathbf{x}^k_i$, respectively. The local distance calculation of the voting network is shown in Fig. 2. At each window, we compute a local distance between the input key and all the memory patterns. The smallest distance is found, say the distance to pattern $k^*$, and then the local window casts a vote for memory pattern $k^*$. The decision network examines the votes of all the windows, chooses the memory pattern that received the most votes, say $k^{**}$, and then outputs $\mathbf{y}^{k^{**}}$. Combination schemes more sophisticated than plurality voting could also be used [14], [21]–[24], [40]. Note that the network structure of the voting memory is similar to the WISARD system of Aleksander [1], [25]. However, the present system is formulated to work with general types of data, not just binary data. In addition, the present system uses regular connections between the input and the RAM units and not random connections. Finally, the voting-based method of information retrieval is not present in the WISARD system. It is easy to introduce a rejection mechanism in the voting model. We simply use a threshold $T$ to indicate whether the number of votes received by the best-matching pattern is sufficiently large. In the case that the number of votes received is less than $T$, the input is rejected. It is interesting to note the two-level structure of this associative memory network. The RAMs are low-level processors which operate on just a portion of the image. The decision network is a higher level computation which integrates and makes sense of the low-level information. Of course, the problem of understanding the connection between low-level processing and high-level decision making has long been an area of interest in both neurobiology and artificial intelligence [3], [6], [35], [43].

B. Assumptions for Theoretical Analysis

The voting network is suitable for storing real-valued and heteroassociative memory sets. For the theoretical analysis that follows, though, we assume that the memory set is binary-valued ($x^k_{ij} \in \{0, 1\}$) and random (each memory pattern is generated randomly and independently). We will assume that each component of the fundamental memory patterns has a 50% chance of being 1 and a 50% chance of being 0. We start by considering the identification task. In this case, we test the system with noisy versions of the memory set patterns and see how well the system can retrieve the correct pattern. To create a suitable memory key, we proceed as follows. We first select one of the memory set patterns, called the target memory pattern, or simply target. The memory key is formed by corrupting the target memory pattern with an amount of uniform random noise; that is, with probability $\varepsilon$, each component of the target pattern is flipped from its original value. Each of the remaining fundamental memories will be called nontarget memory patterns. The goal of the theoretical analysis is to derive expressions for the retrieval rate, detection and identification rate, and false acceptance rate as a function of the following model parameters: dimension of memory patterns $N$; window size $n$; number of patterns in the memory set $m$; and amount of noise $\varepsilon$.

C. Probability of Voting for the Target and Nontarget Patterns

Recall that our memory key and memory set patterns are partitioned into windows. Let $D_t$ be a random variable which gives the local Hamming distance between a single window of the input and the corresponding window of the target pattern. From our assumptions, $D_t$ follows a binomial distribution:

$P(D_t = j) = \binom{n}{j}\, \varepsilon^j (1 - \varepsilon)^{n-j}, \qquad j = 0, 1, \dots, n$ (1)

where $\binom{n}{j} = \frac{n!}{j!\,(n-j)!}$ is the number of combinations of $n$ things taken $j$ at a time. Similarly, let $D_{nt}$ be a random variable which gives the local distance between the input and one of the nontarget patterns. The distribution for $D_{nt}$ is given by [17], [37]

$P(D_{nt} = j) = \binom{n}{j} \left(\frac{1}{2}\right)^{n}, \qquad j = 0, 1, \dots, n$ (2)

For each $j$, the local window will vote for the target pattern if $D_t = j$ and $D^{(k)}_{nt} > j$, where $k$ ranges over all the indices except for the index of the target image. Hence, the probability that a local window will vote for the target pattern is given by

$P_t = \sum_{j=0}^{n} P(D_t = j)\, \left[P(D_{nt} > j)\right]^{m-1}$ (3)
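Stated in code, the retrieval procedure described above is quite short. The following Python sketch is our own minimal rendering (all function and variable names are ours, not the paper's); because both the Hamming and city-block distances reduce to a sum of absolute component differences, the same routine serves binary and real-valued patterns:

```python
import numpy as np

def voting_retrieve(key, memory, n_win, T=0):
    """Plurality-voting retrieval with rejection (a sketch, not the authors'
    implementation). key: (N,) pattern; memory: (m, N) stored patterns;
    n_win: window size n (must divide N); T: rejection threshold on the
    winner's vote count. Returns a stored-pattern index or None (reject)."""
    m, N = memory.shape
    w = N // n_win                        # number of windows N/n
    key_w = key.reshape(w, n_win)         # partition the key into windows
    mem_w = memory.reshape(m, w, n_win)   # partition each stored pattern
    # local distance between each window of the key and of every pattern;
    # |.| summed over a window is Hamming (binary) or city-block (real)
    dist = np.abs(mem_w - key_w[None, :, :]).sum(axis=2)    # shape (m, w)
    # each window votes for its locally closest pattern (ties -> lowest index)
    votes = np.bincount(dist.argmin(axis=0), minlength=m)
    winner = int(votes.argmax())
    return winner if votes[winner] >= T else None
```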

Similarly, a local window will vote for the $k$th nontarget pattern if $D^{(k)}_{nt} = j$, $D_t > j$, and $D^{(l)}_{nt} > j$, where $l$ ranges over all the indices except for $k$ and the index of the target image. Hence, the probability that a local window will vote for the nontarget pattern is given by

$P_{nt} = \sum_{j=0}^{n} P(D_{nt} = j)\, P(D_t > j)\, \left[P(D_{nt} > j)\right]^{m-2}$ (4)

D. Number of Votes for the Target and Nontarget Patterns

$P_t$ and $P_{nt}$ give the probability that a single window will vote for the target or one of the nontarget patterns, respectively. The decision network counts up the votes for each of the windows and then chooses the fundamental memory with the most votes. Here, we ask: What is the total number of votes received by the target and the nontarget memories? Let $V_t$ be a random variable that gives the total number of votes received by the target pattern. Each window of the RAM network performs a Bernoulli experiment with probability of success $P_t$ (i.e., getting a vote) and probability of failure $1 - P_t$ (not getting a vote). If the experiment is repeated over the $N/n$ windows, the probability that the target receives $v$ votes follows a binomial distribution:

$P(V_t = v) = \binom{N/n}{v}\, P_t^{\,v} (1 - P_t)^{N/n - v}$ (5)

where $v = 0, 1, \dots, N/n$. Similarly, let $V_k$ be the total number of votes received by the $k$th nontarget pattern. $V_k$ also follows a binomial distribution of the form

$P(V_k = v) = \binom{N/n}{v}\, P_{nt}^{\,v} (1 - P_{nt})^{N/n - v}$ (6)

By the central limit theorem, and assuming a large number of windows, the binomial distribution for $V_t$ approaches a normal distribution given by

$f_t(v) = \frac{1}{\sqrt{2\pi}\,\sigma_t} \exp\!\left[-\frac{(v - \mu_t)^2}{2\sigma_t^2}\right]$ (7)

where the mean and variance are given by

$\mu_t = \frac{N}{n}\, P_t$ (8)

$\sigma_t^2 = \frac{N}{n}\, P_t (1 - P_t)$ (9)

Similarly, the distribution for $V_k$ can be approximated as

$f_{nt}(v) = \frac{1}{\sqrt{2\pi}\,\sigma_{nt}} \exp\!\left[-\frac{(v - \mu_{nt})^2}{2\sigma_{nt}^2}\right]$ (10)

with mean and variance given by

$\mu_{nt} = \frac{N}{n}\, P_{nt}$ (11)

$\sigma_{nt}^2 = \frac{N}{n}\, P_{nt} (1 - P_{nt})$ (12)

E. Estimation of the Correct Retrieval Rate—The Continuous Case

We now have probability density functions for $V_t$, the number of votes received by the target pattern, and $V_k$, the number of votes received by the $k$th nontarget pattern. These density functions can be expressed in discrete form [(5) and (6)] or in an approximate continuous form [(7) and (10)]. The system will retrieve the correct pattern when $V_t$ is larger than $V_k$ for each and every one of the $m - 1$ nontarget memory patterns. Of these nontarget patterns, we need only concern ourselves with the one that received the maximum number of votes. Let $V_{\max}$ be a random variable that gives the maximum number of votes among all the nontarget patterns. In the previous analysis [16], the authors showed that the density function for $V_{\max}$ can be expressed in terms of the continuous densities as

$f_{\max}(v) = (m-1)\, f_{nt}(v)\, \left[F_{nt}(v)\right]^{m-2}$ (13)

where $F_{nt}$ is the cumulative distribution function associated with $f_{nt}$; that is, for the $k$th nontarget pattern

$F_{nt}(v) = \int_{-\infty}^{v} f_{nt}(u)\, du$ (14)

Then, the average value of $V_{\max}$ is given by

$\bar{V}_{\max} = \int_{-\infty}^{\infty} v\, f_{\max}(v)\, dv$ (15)

In the original analysis [16], the probability of correct retrieval was defined as $P(V_t > \bar{V}_{\max})$. However, this definition is not useful, since it does not properly model how the voting memory works. Rather, for the identification problem, the probability of correct retrieval is simply the probability that the number of votes received by the target pattern exceeds that received by the maximum nontarget pattern

$P_c = P(V_t > V_{\max})$ (16)
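The probabilities (1)–(6) are straightforward to evaluate numerically. The sketch below implements our reconstruction of the formulas above (the strict inequalities in (3) and (4) mean that a window with a tied minimum distance votes for neither contender); the helper names are ours:

```python
from math import comb

def vote_probabilities(n, m, eps):
    """(1)-(4), as reconstructed here: the probability that a single window
    votes for the target (P_t) or for a fixed nontarget (P_nt)."""
    p_t  = [comb(n, j) * eps**j * (1 - eps)**(n - j) for j in range(n + 1)]  # (1)
    p_nt = [comb(n, j) * 0.5**n for j in range(n + 1)]                       # (2)
    gt_t  = [sum(p_t[j + 1:]) for j in range(n + 1)]    # P(D_t  > j)
    gt_nt = [sum(p_nt[j + 1:]) for j in range(n + 1)]   # P(D_nt > j)
    P_t  = sum(p_t[j] * gt_nt[j]**(m - 1) for j in range(n + 1))             # (3)
    P_nt = sum(p_nt[j] * gt_t[j] * gt_nt[j]**(m - 2) for j in range(n + 1))  # (4)
    return P_t, P_nt

def binom_pmf(w, p):
    """(5)/(6): pmf of the number of votes collected over w = N/n windows."""
    return [comb(w, v) * p**v * (1 - p)**(w - v) for v in range(w + 1)]
```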

Assuming that $V_t$ and $V_{\max}$ are independent random variables (and since both are Gaussian), $P_c$ can be written as

$P_c = \int_{-\infty}^{\infty} f_{\max}(v) \left[\int_{v}^{\infty} f_t(u)\, du\right] dv$ (17)

The proof of (17) is shown in Appendix A. Upon substituting (13) and (14) into (17) and rearranging the inequality, we get an expression for the probability of correct retrieval in terms of the two Gaussian density functions $f_t$ and $f_{nt}$

$P_c = \int_{-\infty}^{\infty} (m-1)\, f_{nt}(v)\, \left[F_{nt}(v)\right]^{m-2} \left[1 - F_t(v)\right] dv$ (18)

F. Estimation of the Correct Retrieval Rate—The Discrete Case

The probability of correct retrieval in (18) is an approximation. It is possible, though, to derive an exact expression for the probability of correct retrieval using the discrete density functions. Note that this analysis was not done in [16]. We have the discrete distributions for the number of votes received by the target pattern (5) and the number of votes received by a nontarget pattern (6). To compute the probability of correct retrieval, we first need to compute the (discrete) distribution of the number of votes received by the maximum nontarget pattern. Unlike the continuous case, here we have to consider the possibility of ties among the nontarget patterns. For example, suppose the maximum number of votes among the nontarget patterns is $v$. The probability that a single nontarget memory set pattern received $v$ votes (and all the other nontarget patterns received less) is $(m-1)\, P(V_k = v)\, [P(V_k < v)]^{m-2}$. The probability that precisely $j$ of the nontarget patterns achieve $v$ votes (a $j$-way tie at the top) is given by $\binom{m-1}{j} [P(V_k = v)]^{j} [P(V_k < v)]^{m-1-j}$. To account for all possible ties, we sum over all possible values of $j$

$P(V_{\max} = v) = \sum_{j=1}^{m-1} \binom{m-1}{j}\, \left[P(V_k = v)\right]^{j} \left[P(V_k < v)\right]^{m-1-j}$ (19)

Substituting (6) into (19), we obtain

$P(V_{\max} = v) = \sum_{j=1}^{m-1} \binom{m-1}{j} \left[\binom{N/n}{v} P_{nt}^{\,v} (1 - P_{nt})^{N/n - v}\right]^{j} \left[\sum_{u=0}^{v-1} \binom{N/n}{u} P_{nt}^{\,u} (1 - P_{nt})^{N/n - u}\right]^{m-1-j}$ (20)

As with the continuous case, the probability of correct retrieval is the probability that $V_t > V_{\max}$. In the discrete case, though, $V_{\max}$ is an integer that varies from 0 to $N/n$. Hence, assuming $V_t$ is independent of $V_{\max}$

$P_c = \sum_{v=0}^{N/n} P(V_{\max} = v)\, P(V_t > v)$ (21)

G. Watchlist Test

Here, the voting network employs a rejection mechanism. A threshold $T$ is set, and if the winning memory set pattern does not receive at least $T$ votes, the input is rejected. The probability of correct retrieval here is the DIR, which is the probability that $V_t > V_{\max}$ and $V_t \ge T$

$P_{DIR} = P(V_t > V_{\max} \text{ and } V_t \ge T)$ (22)

To compute $P_{DIR}$ in discrete form, we start with (21), but now the smallest allowable value for $V_t$ is $T$

$P_{DIR} = \sum_{v=0}^{N/n} P(V_{\max} = v) \sum_{u = \max(v+1,\, T)}^{N/n} P(V_t = u)$ (23)

Similarly, in continuous form, $P_{DIR}$ is given by

$P_{DIR} = \int_{-\infty}^{\infty} f_{\max}(v) \left[\int_{\max(v,\, T)}^{\infty} f_t(u)\, du\right] dv$ (24)
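Under the same reconstruction, the discrete quantities (19)–(23) take only a few lines of code; with $T = 0$, the second routine reduces to the identification-rate formula (21):

```python
from math import comb

def max_nontarget_pmf(pmf_nt, m):
    """(19)/(20): pmf of the maximum vote count over the m-1 nontargets,
    accounting for ties at the top (our reconstruction)."""
    w = len(pmf_nt) - 1
    below = [sum(pmf_nt[:v]) for v in range(w + 1)]   # P(V_k < v)
    return [sum(comb(m - 1, j) * pmf_nt[v]**j * below[v]**(m - 1 - j)
                for j in range(1, m))
            for v in range(w + 1)]

def retrieval_rate(pmf_t, pmf_max, T=0):
    """(21) when T = 0, (23) otherwise: P(V_t > V_max and V_t >= T)."""
    w = len(pmf_t) - 1
    return sum(pmf_max[v] * sum(pmf_t[max(v + 1, T):])
               for v in range(w + 1))
```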

To see how well the system rejects unknown patterns (the false acceptance rate), we probe the system not with a corrupted version of one of the memory patterns, but with a completely new (and random) input pattern. Since there is no target pattern in the memory set, all memory patterns are now considered nontarget patterns. For a single local window, the probability that the $k$th memory pattern receives a vote is a bit simpler than (4) and is given by

$P_0 = \sum_{j=0}^{n} P(D_{nt} = j)\, \left[P(D_{nt} > j)\right]^{m-1}$ (25)

Let $V^0_k$ be the total number of votes received by the $k$th memory pattern. View $P_0$ as the probability of a single success (getting a vote). $V^0_k$ is the total number of successes if the experiment is repeated over the $N/n$ windows; hence, $V^0_k$ follows a binomial distribution:

$P(V^0_k = v) = \binom{N/n}{v}\, P_0^{\,v} (1 - P_0)^{N/n - v}$ (26)

Let $V^0_{\max}$ denote the maximum number of votes received by one of the (nontarget) memory patterns. The probability density function of $V^0_{\max}$ can be obtained by accounting for all possible ties [similar to (19)]. The only difference here is that there are $m$ nontarget patterns instead of $m - 1$

$P(V^0_{\max} = v) = \sum_{j=1}^{m} \binom{m}{j}\, \left[P(V^0_k = v)\right]^{j} \left[P(V^0_k < v)\right]^{m-j}$ (27)

Again, assuming a large number of windows, the discrete distribution for $V^0_k$ in (26) can be approximated by a Gaussian distribution of the form

$f_0(v) = \frac{1}{\sqrt{2\pi}\,\sigma_0} \exp\!\left[-\frac{(v - \mu_0)^2}{2\sigma_0^2}\right]$ (28)

where the mean and variance are given by

$\mu_0 = \frac{N}{n}\, P_0$ (29)

$\sigma_0^2 = \frac{N}{n}\, P_0 (1 - P_0)$ (30)

In the continuous case, the random variable $V^0_{\max}$ is a maximum over $m$ Gaussian random variables and hence, similar to (13), has density function given by

$f^0_{\max}(v) = m\, f_0(v)\, \left[F_0(v)\right]^{m-1}$ (31)

The probability of false acceptance is the probability that one of the memory patterns receives at least $T$ votes. In discrete form, the probability of false acceptance is computed as

$P_{FAR} = \sum_{v=T}^{N/n} P(V^0_{\max} = v)$ (32)

and in continuous form

$P_{FAR} = \int_{T}^{\infty} f^0_{\max}(v)\, dv$ (33)

III. WEIGHTED VOTING MODEL

At the local level, the voting memory works in an all-or-nothing fashion. That is, the memory pattern that has the smallest (local) distance gets a vote and all the other memory patterns get nothing. In noisy environments, though, it is quite possible that the target pattern may not appear first on the list of best-matching patterns. In this case, the worst happens: the window casts a vote for a nontarget pattern and the target pattern gets nothing. In this section, we extend the voting model by having each window cast a set of weighted votes, one for each memory set pattern. This more general model is called the weighted voting memory model.

A. Operation of the Weighted Voting Memory

The weighted voting model operates as follows. As before, we start with a fundamental memory set and partition it into windows. In general, the memory set can be heteroassociative and the patterns real-valued. Now, suppose we are given a memory key and we want to determine which (if any) memory pattern it should be associated with. As before, we compute local distance measures at each window. However, instead of just choosing the smallest distance and assigning a vote to the corresponding memory pattern, we sort all the distances and assign a rank to each. Let the memory set pattern that has the smallest (local) distance be assigned rank 1. The pattern with the next smallest distance will have rank 2, etc. The distance computations and ranking of memory set patterns are done independently by each local window. Hence, the weighted voting model has the same parallel structure as the voting model shown in Fig. 1. The only difference is that each window now requires a sorting operation and not a simple min-select. After all the rankings have been computed, they are routed to the decision network. The decision network examines the rankings for each memory set pattern and then computes an appropriate output.
The design of the decision network will be given in Section IV. A simple example will help clarify the concepts. Suppose we have a memory set consisting of $m = 3$ patterns and suppose the patterns are partitioned into nine windows (in a 3 $\times$ 3 arrangement). Suppose that for a given memory key, the local distances are computed and sorted, and the resulting rankings are as shown in Fig. 3(a). For example, for the first window [highlighted in Fig. 3(a)], the local distances $d_1$, $d_2$, and $d_3$ were found to satisfy $d_1 < d_2 < d_3$. Hence, for this window, memory set pattern #1 is assigned rank 1, pattern #2 is assigned rank 2, and pattern #3 is assigned rank 3.

Fig. 3. Example of a memory set with three patterns and a set of local classifiers. (a) Rankings assigned to each memory set pattern by the weighted voting model. (b) Votes cast by the simple voting model.

The rankings from all the other windows are shown as well. The voting network can be seen as a special case of weighted voting, where we only consider the rank-1 information. Fig. 3(b) shows the corresponding pattern of votes registered in the simple voting network. In this example, though, each memory set pattern receives three votes, and hence, the voting network cannot adequately discriminate among the memory set patterns. Clearly, in this example, the second- and third-place rankings give additional information that would lead us to prefer memory set pattern #1 over the other two.

B. Statistical Interpretation of the Weights

Now, how does the decision network operate? Given an input pattern $\mathbf{x}$, what we really want to compute for each memory set pattern $k$ is the probability that $k$ is the target. Let this probability be denoted $P(k = t \mid \mathbf{x})$, where $t$ denotes the target pattern. Each window of the weighted voting network can be seen as a memory or classifier in its own right. We expect that each local memory $i$ will give a crude estimate $P_i(k = t \mid \mathbf{x}_i)$ that $k$ is the target (really, window $i$ only sees $\mathbf{x}_i$ and not all of $\mathbf{x}$). Now, how do we combine the local probabilities to estimate $P(k = t \mid \mathbf{x})$? We will adopt the commonly used combination scheme of simply summing (or averaging) the local estimates [2], [18], [40]

$P(k = t \mid \mathbf{x}) = \frac{1}{N/n} \sum_{i=1}^{N/n} P_i(k = t \mid \mathbf{x}_i)$ (34)

Now, for the weighted voting model, how do we compute the local probabilities? For each memory set pattern $k$, what we have available is the set of rankings that $k$ received at each local window (see Fig. 3). Let us tally the rankings: let $n_1$ be the number of windows where $k$ was found to have rank 1, $n_2$ the number of windows where $k$ has rank 2, etc. In general, let $n_r$ denote the number of windows where $k$ has rank $r$, $r = 1, \dots, m$. The number of windows is $N/n$, so

$\sum_{r=1}^{m} n_r = \frac{N}{n}$ (35)

Typically, $m$ is much larger than the number of windows, so many of the $n_r$ terms will be 0. We propose that, for the weighted voting memory, the total number of votes $V_k$ assigned to pattern $k$ be a weighted sum of the number of windows at each ranking that $k$ received

$V_k = \sum_{r=1}^{m} w_r\, n_r$ (36)

where the weights $w_r$ are used to adjust the relative importance of each ranking. Note that the simple voting memory can be seen as a special case of weighted voting with $w_1 = 1$ and all other weights set to zero: $w_2 = w_3 = \cdots = w_m = 0$. Although there are many possible ways of choosing proper weights, we propose that the weights be set as follows:

$w_r = P(k = t \mid \operatorname{rank}(k) = r)$ (37)

That is, given the fact that we know that a memory pattern is (locally) ranked 1, $w_1$ is the probability that it is, in fact, the target pattern. In noisy environments, the target pattern does not always locally get ranked 1, though; and, given the fact that a memory pattern locally receives a rank of $r$, $w_r$ is the probability that said memory pattern is the target. For notational convenience, let $P_r = P(k = t \mid \operatorname{rank}(k) = r)$. Hence, the total number of votes received by pattern $k$ is given by

$V_k = \sum_{r=1}^{m} P_r\, n_r$ (38)

The probabilities $P_r$, $r = 1, \dots, m$, can be computed from training data by running trial simulations. If no training data is available, a heuristic setting given in [27] has been found useful.
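As a sketch of how the decision network of Figs. 3 and 4 can be realized (our own rendering, with names of our choosing; rank ties are broken arbitrarily by the sort), the weighted vote totals (36)–(38) can be computed directly from the matrix of local distances:

```python
import numpy as np

def weighted_votes(local_dist, weights):
    """(36)-(38): convert per-window rankings into weighted vote totals.
    local_dist: (m, w) array of local distances d(x_i, x^k_i)
    weights:    length-m sequence, weights[r-1] = w_r for rank r
    Returns a length-m array of weighted votes, one per stored pattern."""
    m, w = local_dist.shape
    # rank of each pattern within each window (0-based; 0 = smallest distance)
    ranks = local_dist.argsort(axis=0).argsort(axis=0)
    votes = np.zeros(m)
    for k in range(m):
        tallies = np.bincount(ranks[k], minlength=m)   # n_r for pattern k
        votes[k] = np.dot(weights, tallies)            # (38)
    return votes
```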

Fig. 4. Using the rankings in Fig. 3(a), the weight assigned to each local window is shown. The weighted vote assigned to each pattern is the sum of these local probabilities [see (34)].

Note that, in general, a different set of probabilities can be computed at each window: $P_{r,i}$, $r = 1, \dots, m$, $i = 1, \dots, N/n$. For example, in face recognition, the weights used for windows near the eye region might be different from the weights for windows near the mouth region. However, experimental evidence shows that, at least for face recognition, the weights are very similar from window to window, and hence, a single set of weights can be used for all windows. For the special case of binary and random memory patterns, it is possible to derive theoretical expressions for $P_r$ as a function of pattern dimension $N$, window size $n$, and noise level $\varepsilon$. In fact, the derivation will be given in Section IV. Note that (38) shows how the local probabilities are combined to produce a global measure that pattern $k$ is the target pattern. Again, the combination scheme is the simple summing given in (34) (the only difference is that the scaling factor $n/N$ has been dropped here). To see this, let us continue the weighted voting example given in Fig. 3(a). Based on the given rankings, Fig. 4 shows the weight assigned to each window of the memory set patterns. Memory pattern #1 has rank tallies $n_1$, $n_2$, and $n_3$, as read off from Fig. 3(a). Using (38), the weighted vote assigned to memory pattern #1 is given by $V_1 = P_1 n_1 + P_2 n_2 + P_3 n_3$, which is the same as adding all the local probabilities for memory pattern #1 given in Fig. 4.

IV. DERIVATION OF WEIGHTED VOTING PARAMETERS FOR BINARY MEMORY SETS

For the theoretical analysis that follows, we again assume that the memory set is binary-valued and random. In addition, we start with the identification problem and assume that the memory key is constructed by choosing one of the fundamental memories and corrupting it with an amount of uniform random noise. We want to compute how many votes are received by each memory pattern. As before, there are really only two cases to consider: 1) the number of votes received by the target memory pattern (i.e., the memory pattern used to construct the memory key) and 2) the number of votes received by the $k$th nontarget memory pattern. In the weighted voting scheme, (38) is used to compute $V_t$ and $V_k$. This equation, though, requires that we first compute the weights and the expected number of windows at each rank for the target pattern, as well as for the $k$th nontarget pattern. Note that, in this section, we are using the same notation $V_t$, $V_k$, $V_{\max}$, etc., that was used when discussing the voting model. However, it should be clear from the context that all quantities in this section pertain to the weighted voting model.

A. Weights

To compute the weights, we need to compute $P(k = t \mid \operatorname{rank} = r)$. That is, at the local level, after all distances are computed and sorted, if we know that a particular memory pattern appears $r$th on the list, what is the probability that it is actually the target pattern? It will be more convenient to compute $P(\operatorname{rank} = r \mid k = t)$, so we will employ Bayes' theorem

$P(k = t \mid \operatorname{rank} = r) = \frac{P(\operatorname{rank} = r \mid k = t)\, P(k = t)}{P(\operatorname{rank} = r)}$ (39)

$P(k = t) = 1/m$ is the probability of randomly choosing the target pattern from the memory set, since there are $m$ possible memory patterns and only one target pattern. In addition, $P(\operatorname{rank} = r)$ is the probability of finding rank $r$ in the sorted list. Since rank $r$ is one of $m$ possible rankings, $P(\operatorname{rank} = r) = 1/m$. Hence, (39) reduces to

$P(k = t \mid \operatorname{rank} = r) = P(\operatorname{rank} = r \mid k = t)$ (40)

For convenience, let $P_r = P(\operatorname{rank} = r \mid k = t)$. To compute the weights, we have to compute $P_r$ for each rank $r$. This will be done in Section IV-B.

B. Number of Windows at Each Rank for the Target Pattern

Let $N_r$ denote the total number of windows of the target memory that have rank $r$. In this section, we derive the distribution for $N_r$. Of course, the total number of windows of the target memory that will end up with rank $r$ can be computed if we know the probability that a single window of the target memory will have rank $r$. As mentioned previously, this probability is denoted $P_r$. The quantity $P_1$ is the probability that the target memory is locally ranked 1, as shown schematically in Fig. 5(a). Note that this probability is the same as (3) (recall that the voting model only considers the first-rank case, so in Section II-C, the expression "probability of voting for the target" means "probability that it is rank 1"). The probability $P_2$ that at a local window the target memory has rank 2 (i.e., appears second in the ordered list of best-matching patterns) is shown schematically in Fig. 5(b). In this case, we require precisely one of the nontarget memories to be ranked higher than the target. The derivation of $P_2$ is given in Appendix B. Similarly, the conditions for deriving the general probability $P_r$ are shown schematically in Fig. 5(c).

Fig. 5. Schematic diagram of the conditions for computing (a) $P_1$, (b) $P_2$, and (c) $P_r$.

Fig. 6. Schematic diagram of the conditions for computing $Q_r$.

As shown in Appendix B, $P_r$ is given by

$P_r = \sum_{j=0}^{n} P(D_t = j)\, \binom{m-1}{r-1}\, \left[P(D_{nt} < j)\right]^{r-1} \left[P(D_{nt} > j)\right]^{m-r}$ (41)

The set of probabilities $P_1, \dots, P_m$ will be used to compute $N_r$. However, note that from (40), $w_r = P_r$; hence, these probabilities are also the weights in (38) used to compute the number of votes assigned to each memory pattern. Now, we are in a position to compute $N_r$. At a single window, $P_r$ gives the probability that the target memory pattern will be ranked $r$. Of course, with probability $1 - P_r$, the target memory will not be ranked $r$ at that window. If the experiment is repeated over all $N/n$ windows, the probability that there will be $v$ successes follows a binomial distribution:

$P(N_r = v) = \binom{N/n}{v}\, P_r^{\,v} (1 - P_r)^{N/n - v}$ (42)

C. Number of Windows at Each Rank for Nontarget Patterns

Let $Q_r = P(\operatorname{rank} = r \mid k \ne t)$ be the probability that the $k$th nontarget memory set pattern has rank $r$. The probability that the $k$th nontarget memory will have rank 1 is the same as in (4). To compute the probability that the $k$th nontarget memory pattern has rank 2, there are the following two cases to consider: 1) the target pattern is in front of the $k$th nontarget pattern (that is, the target has rank 1) and 2) the target pattern is behind the $k$th nontarget pattern. These conditions are illustrated in Fig. 6. Similarly, the probability that the $k$th nontarget pattern has rank $r$ is the sum of the probabilities of the following two cases: 1) the target pattern is in front of the $k$th nontarget pattern (among the top $r - 1$ patterns) and 2) the target pattern is behind the $k$th nontarget pattern. As shown in Appendix B, $Q_r$ is given by

$Q_r = \sum_{j=0}^{n} P(D_{nt} = j) \left\{ P(D_t < j)\, \binom{m-2}{r-2} \left[P(D_{nt} < j)\right]^{r-2} \left[P(D_{nt} > j)\right]^{m-r} + P(D_t > j)\, \binom{m-2}{r-1} \left[P(D_{nt} < j)\right]^{r-1} \left[P(D_{nt} > j)\right]^{m-r-1} \right\}$ (43)

At a single window, $Q_r$ gives the probability that the $k$th nontarget memory pattern will be ranked $r$, and $1 - Q_r$ gives the probability that it will not be ranked $r$. If the experiment is repeated over all $N/n$ windows, the probability that there will be $v$ successes follows a binomial distribution:

$P(N^k_r = v) = \binom{N/n}{v}\, Q_r^{\,v} (1 - Q_r)^{N/n - v}$ (44)

D. Continuous Approximation of $N_r$ and $N^k_r$

By the central limit theorem, and assuming a large number of windows, each of the discrete densities in (42) (there are $m$ of them) can be approximated with a normal distribution. In this case, the continuous density function for $N_r$ is given by

$f_{N_r}(v) = \frac{1}{\sqrt{2\pi}\,\sigma_r} \exp\!\left[-\frac{(v - \mu_r)^2}{2\sigma_r^2}\right]$ (45)

where the expected value and variance of $N_r$ are given by

$\mu_r = \frac{N}{n}\, P_r$ (46)

$\sigma_r^2 = \frac{N}{n}\, P_r (1 - P_r)$ (47)
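The rank probabilities (41) and (43), as reconstructed here, can be evaluated with nested loops over the window distance $j$ and the rank $r$. In the sketch below (names ours), ties between local distances are neglected, which matches the strict inequalities used in (3) and (4) and is a good approximation for large $n$:

```python
from math import comb

def rank_probabilities(n, m, eps):
    """(41)/(43), as reconstructed here: the probability that the target
    (P_r) or a fixed nontarget (Q_r) occupies local rank r at one window.
    'Ahead' means a strictly smaller local distance; ties are neglected."""
    p_t  = [comb(n, j) * eps**j * (1 - eps)**(n - j) for j in range(n + 1)]
    p_nt = [comb(n, j) * 0.5**n for j in range(n + 1)]
    lt_nt = [sum(p_nt[:j]) for j in range(n + 1)]      # P(D_nt < j)
    gt_nt = [sum(p_nt[j + 1:]) for j in range(n + 1)]  # P(D_nt > j)
    lt_t  = [sum(p_t[:j]) for j in range(n + 1)]       # P(D_t  < j)
    gt_t  = [sum(p_t[j + 1:]) for j in range(n + 1)]   # P(D_t  > j)
    P, Q = [], []
    for r in range(1, m + 1):
        # (41): exactly r-1 of the m-1 nontargets beat the target
        P.append(sum(p_t[j] * comb(m - 1, r - 1)
                     * lt_nt[j]**(r - 1) * gt_nt[j]**(m - r)
                     for j in range(n + 1)))
        q = 0.0
        for j in range(n + 1):
            if r >= 2:      # case 1: the target is among the r-1 ahead
                q += (p_nt[j] * lt_t[j] * comb(m - 2, r - 2)
                      * lt_nt[j]**(r - 2) * gt_nt[j]**(m - r))
            if r <= m - 1:  # case 2: the target trails this nontarget
                q += (p_nt[j] * gt_t[j] * comb(m - 2, r - 1)
                      * lt_nt[j]**(r - 1) * gt_nt[j]**(m - r - 1))
        Q.append(q)
    return P, Q
```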

TABLE I. Enumeration of all possible values for $N_1$, $N_2$, and $N_3$ for the simple example shown in Fig. 7. The probability of each case is shown, along with the resulting value of $V_t$.

Fig. 7. Simple example with $m = 3$ memory patterns and $N/n = 3$ windows.

Similarly, the discrete probability density functions in (44) can be approximated by a continuous normal density

$f_{N^k_r}(v) = \frac{1}{\sqrt{2\pi}\,\tilde{\sigma}_r} \exp\!\left[-\frac{(v - \tilde{\mu}_r)^2}{2\tilde{\sigma}_r^2}\right]$ (48)

where the expected value and variance of $N^k_r$ are given by

$\tilde{\mu}_r = \frac{N}{n}\, Q_r$ (49)

$\tilde{\sigma}_r^2 = \frac{N}{n}\, Q_r (1 - Q_r)$ (50)

E. Weighted Votes Received by Both Target and Nontarget Patterns—Discrete Case

Now, we have computed the weights (41) and the number of windows for each rank and for each memory pattern. Actually, we computed $N_r$ for the target pattern (42) and $N^k_r$ for the $k$th nontarget pattern (44). We can now compute the total number of votes received by each memory set pattern. The total number of votes received by the target pattern is given by

$V_t = \sum_{r=1}^{m} P_r\, N_r$ (51)

The probability distribution for $V_t$ can be computed by exhaustively enumerating all possible rankings of the votes. To see this, let us construct an even simpler example than the one given previously. Suppose we construct a weighted voting network with $m = 3$ memory patterns and $N/n = 3$ windows. A schematic diagram of such a memory set is shown in Fig. 7. In this case, we can enumerate all possible values for $(N_1, N_2, N_3)$, as shown in Table I. For each set of values, Table I shows the probability that it occurs and the resulting number of votes $V_t$. The distribution for $V_t$ can be obtained by tabulating and organizing the rightmost column. For example, to find the probability $P(V_t = v)$, we sum the probabilities for all rows of Table I that have $V_t = v$. Note that the sum of all the $P(V_t = v)$ values must be 1. Hence, an analytic expression for the discrete probability density function can be written as

$P(V_t = v) = \sum_{n_1 + \cdots + n_m = N/n} \frac{(N/n)!}{n_1!\, n_2! \cdots n_m!}\, P_1^{\,n_1} P_2^{\,n_2} \cdots P_m^{\,n_m}\; \delta\!\left(v - \sum_{r=1}^{m} P_r\, n_r\right)$ (52)

where $\delta$ is the delta function

$\delta(x) = \begin{cases} 1, & \text{if } x = 0 \\ 0, & \text{otherwise.} \end{cases}$

In general, for the weighted voting memory, the weights are nonnegative real numbers. Hence, the number of votes $V_t$ is not an integer [nor is $v$ in (52)]. However, as was seen in Table I, there are only a finite number of possibilities for $V_t$, and hence, $V_t$ can be described by the discrete distribution given in (52).
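The Table I construction generalizes to any (tiny) $m$ and number of windows: enumerate every composition of the windows among the ranks, weight each row by its multinomial probability, and accumulate rows with equal vote totals. A sketch, assuming as in (37)–(40) that the weights equal the rank probabilities:

```python
from math import factorial
from itertools import product

def weighted_vote_pmf(rank_probs, n_windows):
    """(52): exact discrete pmf of V = sum_r w_r N_r, built by enumerating
    every composition (n_1, ..., n_m) of the windows among the ranks, as in
    Table I. Assumes w_r = P_r per (37)-(40). Cost grows explosively with
    m, so this is usable for toy sizes only -- the paper's point exactly."""
    m, w = len(rank_probs), n_windows
    pmf = {}
    for counts in product(range(w + 1), repeat=m):
        if sum(counts) != w:
            continue
        coef, prob = factorial(w), 1.0
        for n_r, p_r in zip(counts, rank_probs):
            coef //= factorial(n_r)        # multinomial coefficient
            prob *= p_r ** n_r
        v = round(sum(p * c for p, c in zip(rank_probs, counts)), 12)
        pmf[v] = pmf.get(v, 0.0) + coef * prob
    return pmf

# Usage: the Fig. 7 example with m = 3 patterns and N/n = 3 windows
# (the P_r values below are illustrative, not taken from the paper).
# print(weighted_vote_pmf([0.5, 0.3, 0.2], 3))
```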

Similarly, the discrete probability density function of $V_k$, the number of weighted votes received by the $k$th nontarget pattern, can be written as

$P(V_k = v) = \sum_{n_1 + \cdots + n_m = N/n} \frac{(N/n)!}{n_1! \cdots n_m!}\, Q_1^{\,n_1} \cdots Q_m^{\,n_m}\; \delta\!\left(v - \sum_{r=1}^{m} P_r\, n_r\right)$ (53)

Clearly, obtaining $P(V_t = v)$ and $P(V_k = v)$ by (52) and (53) is a computationally intensive endeavor. In general, the number of nontrivial terms in the sum (or number of rows in the table) is

$\binom{N/n + m - 1}{m - 1}$

which increases very fast for large $m$. For example, the table required for a memory set with even a modest number of patterns and windows has an astronomically large number of rows. Hence, the continuous approximation of the discrete densities (developed in Section IV-F) will be very useful for the weighted voting model. Let $V_{\max}$ be the maximum of the $m - 1$ random variables $V_k$. Similar to (19), the distribution for $V_{\max}$ can be derived by accounting for all possible ties among the weighted votes received by the nontarget patterns

$P(V_{\max} = v) = \sum_{j=1}^{m-1} \binom{m-1}{j}\, \left[P(V_k = v)\right]^{j} \left[P(V_k < v)\right]^{m-1-j}$ (54)

F. Weighted Votes Received by Both Target and Nontarget Patterns—Continuous Case

Here, we again compute the total number of weighted votes received by the target pattern. However, this time, we use the normal approximations (45)–(47). A linear combination of independent normally distributed random variables also follows a normal distribution, with mean and variance given by the corresponding linear combinations [34]. Hence, $V_t$ follows a normal distribution of the form

$f_{V_t}(v) = \frac{1}{\sqrt{2\pi}\,\sigma_{V_t}} \exp\!\left[-\frac{(v - \mu_{V_t})^2}{2\sigma_{V_t}^2}\right]$ (55)

Using the fact that $V_t = \sum_{r} P_r N_r$, the mean and variance are given by

$\mu_{V_t} = \frac{N}{n} \sum_{r=1}^{m} P_r^2$ (56)

$\sigma_{V_t}^2 = \frac{N}{n} \sum_{r=1}^{m} P_r^3 (1 - P_r)$ (57)

Similarly, for the $k$th nontarget pattern, the total number of weighted votes is given by

$V_k = \sum_{r=1}^{m} P_r\, N^k_r$ (58)

Again, since $V_k$ is a linear combination of normally distributed random variables [see (48)–(50)], $V_k$ is also normally distributed with density function

$f_{V_k}(v) = \frac{1}{\sqrt{2\pi}\,\sigma_{V_k}} \exp\!\left[-\frac{(v - \mu_{V_k})^2}{2\sigma_{V_k}^2}\right]$ (59)

where the mean and the variance are given by

$\mu_{V_k} = \frac{N}{n} \sum_{r=1}^{m} P_r\, Q_r$ (60)

$\sigma_{V_k}^2 = \frac{N}{n} \sum_{r=1}^{m} P_r^2\, Q_r (1 - Q_r)$ (61)

G. Estimation of Correct Retrieval Rate

As before, in the continuous case, the density for $V_{\max}$ can be obtained from the individual densities by using

$f_{\max}(v) = (m - 1)\, f_{V_k}(v)\, \left[F_{V_k}(v)\right]^{m-2}$ (62)

where $F_{V_k}$ is the distribution function corresponding to $f_{V_k}$. The probability of correct retrieval is the probability that $V_t > V_{\max}$, which can be computed as follows:

$P_c = \int_{-\infty}^{\infty} f_{\max}(v) \left[\int_{v}^{\infty} f_{V_t}(u)\, du\right] dv$ (63)

H. Watchlist Test

Here, we use a threshold $T$. Again, DIR is the probability that $V_t > V_{\max}$ and $V_t \ge T$, and is computed as follows:

$P_{DIR} = \int_{-\infty}^{\infty} f_{\max}(v) \left[\int_{\max(v,\, T)}^{\infty} f_{V_t}(u)\, du\right] dv$ (64)

For the false positive test, the memory key is created by generating a completely random input. Here, there is no target pattern in the memory set, and all memories are nontarget patterns.
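Numerically, the continuous model reduces to evaluating (56)–(61) and integrating (62)–(64) on a grid. The following sketch of our reconstruction treats the rank counts as independent binomials, exactly as the analysis above does (SciPy's norm supplies the Gaussian pdf/cdf; names are ours):

```python
import numpy as np
from scipy.stats import norm

def weighted_retrieval_rate_continuous(P, Q, n_windows, T=0.0):
    """(55)-(64), as reconstructed here. P[r-1], Q[r-1]: rank-r probabilities
    for the target / a nontarget; the weights are w_r = P_r per (40)."""
    P, Q = np.asarray(P), np.asarray(Q)
    m, w = len(P), n_windows                 # m ranks, w = N/n windows
    mu_t,  var_t  = w * (P**2).sum(),  w * (P**3 * (1 - P)).sum()      # (56),(57)
    mu_nt, var_nt = w * (P * Q).sum(), w * (P**2 * Q * (1 - Q)).sum()  # (60),(61)
    s_t, s_nt = np.sqrt(var_t), np.sqrt(var_nt)
    v = np.linspace(min(mu_nt - 8 * s_nt, T), mu_t + 8 * s_t, 20001)
    # (62): density of the max over the m-1 nontarget vote totals
    f_max = (m - 1) * norm.pdf(v, mu_nt, s_nt) * norm.cdf(v, mu_nt, s_nt)**(m - 2)
    tail = norm.sf(np.maximum(v, T), mu_t, s_t)   # P(V_t > max(v, T))
    return float(np.trapz(f_max * tail, v))       # (63); equals (64) when T > 0
```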

At a single local window, the probability that the $k$th memory pattern has rank 1 is the same as in the simple voting case (25) and is given by

$Q^0_1 = \sum_{j=0}^{n} P(D_{nt} = j)\, \left[P(D_{nt} > j)\right]^{m-1}$ (65)

The probability $Q^0_r$ that the $k$th memory pattern has rank $r$ at a single window is simpler than (43) because here we do not have to worry about the position of the target pattern (since there is none). In this case, we have $r - 1$ nontarget patterns ranked ahead of pattern $k$ [there are $\binom{m-1}{r-1}$ ways of choosing these patterns] and all the other patterns ranked behind it. Hence, $Q^0_r$ is given by

$Q^0_r = \sum_{j=0}^{n} P(D_{nt} = j)\, \binom{m-1}{r-1}\, \left[P(D_{nt} < j)\right]^{r-1} \left[P(D_{nt} > j)\right]^{m-r}$ (66)

At a single window, $Q^0_r$ gives the probability that the $k$th memory pattern will be ranked $r$, and $1 - Q^0_r$ gives the probability that it will not be ranked $r$. If the experiment is repeated over all $N/n$ windows, the probability that there will be $v$ successes follows a binomial distribution:

$P(N^{0,k}_r = v) = \binom{N/n}{v}\, (Q^0_r)^{v} (1 - Q^0_r)^{N/n - v}$ (67)

The total number of votes received by the $k$th memory pattern is

$V^0_k = \sum_{r=1}^{m} P_r\, N^{0,k}_r$ (68)

Similar to (53) and (54), the discrete forms of the probability density functions can be computed by going through all possible combinations of $(n_1, \dots, n_m)$

$P(V^0_k = v) = \sum_{n_1 + \cdots + n_m = N/n} \frac{(N/n)!}{n_1! \cdots n_m!}\, (Q^0_1)^{n_1} \cdots (Q^0_m)^{n_m}\; \delta\!\left(v - \sum_{r=1}^{m} P_r\, n_r\right)$ (69)

Let $V^0_{\max}$ denote the maximum among $V^0_1, \dots, V^0_m$. The distribution for $V^0_{\max}$ is given by

$P(V^0_{\max} = v) = \sum_{j=1}^{m} \binom{m}{j}\, \left[P(V^0_k = v)\right]^{j} \left[P(V^0_k < v)\right]^{m-j}$ (70)

Each of the discrete probability density functions in (67) can be approximated by a normal distribution

$f_{N^0_r}(v) = \frac{1}{\sqrt{2\pi}\,\sigma^0_r} \exp\!\left[-\frac{(v - \mu^0_r)^2}{2(\sigma^0_r)^2}\right]$ (71)

where the mean and variance are given by

$\mu^0_r = \frac{N}{n}\, Q^0_r$ (72)

$(\sigma^0_r)^2 = \frac{N}{n}\, Q^0_r (1 - Q^0_r)$ (73)

In the continuous case, since the total number of votes received by the $k$th memory pattern (68) is a linear combination of Gaussians, it has a normal distribution of the form

$f_{V^0_k}(v) = \frac{1}{\sqrt{2\pi}\,\sigma_{V^0}} \exp\!\left[-\frac{(v - \mu_{V^0})^2}{2\sigma_{V^0}^2}\right]$ (74)

where the mean and variance are given by

$\mu_{V^0} = \frac{N}{n} \sum_{r=1}^{m} P_r\, Q^0_r$ (75)

$\sigma_{V^0}^2 = \frac{N}{n} \sum_{r=1}^{m} P_r^2\, Q^0_r (1 - Q^0_r)$ (76)

In the continuous case, the distribution for $V^0_{\max}$ can be obtained from

$f^0_{\max}(v) = m\, f_{V^0_k}(v)\, \left[F_{V^0_k}(v)\right]^{m-1}$ (77)

Finally, the probability of false acceptance is the probability that $V^0_{\max} \ge T$, which is given by

$P_{FAR} = \int_{T}^{\infty} f^0_{\max}(v)\, dv$ (78)

By varying the threshold $T$, we can achieve different values of $P_{DIR}$ and $P_{FAR}$. An ROC curve can be plotted to show the tradeoff between the two as a function of the threshold.

I. Discussion

Note that for the weighted voting model, we used all $m$ possible rankings $r = 1, \dots, m$ in constructing the weighted sum of votes in (38). Of course, one may limit the number of terms in the sum to the top, say, $R$ rankings

$V_k = \sum_{r=1}^{R} P_r\, n_r$ (79)

This will be useful in Section V because, for large values of $m$ and $N/n$, the discrete distributions characterizing the weighted voting model [(52), (53), (69), and (70)] cannot easily be obtained by the exhaustive procedure outlined in Section IV-E (too computationally intensive). However, the discrete distributions can be computed if a suitably small value of $R$ is chosen.
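The continuous false-acceptance computation (74)–(78) has the same shape as the retrieval-rate sketch above; sweeping the threshold $T$ through this routine and through (64) traces out the ROC curve just discussed. Again, this is a sketch of our reconstruction, with names of our own choosing:

```python
import numpy as np
from scipy.stats import norm

def weighted_far_continuous(weights, Q0, n_windows, T):
    """(74)-(78), as reconstructed here: continuous false acceptance rate of
    the weighted voting memory under a purely random key. weights[r-1] = w_r;
    Q0[r-1] is the no-target rank probability (66)."""
    wts, Q0 = np.asarray(weights), np.asarray(Q0)
    m, w = len(Q0), n_windows
    mu = w * (wts * Q0).sum()                          # (75)
    sd = np.sqrt(w * (wts**2 * Q0 * (1 - Q0)).sum())   # (76)
    lo, hi = max(T, mu - 8 * sd), mu + 8 * sd
    if lo >= hi:
        return 0.0                                     # threshold beyond support
    v = np.linspace(lo, hi, 20001)
    f_max = m * norm.pdf(v, mu, sd) * norm.cdf(v, mu, sd)**(m - 1)  # (77)
    return float(np.trapz(f_max, v))                   # (78): P(V_max >= T)
```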

Fig. 8. Binary image (left) with various amounts of random noise.

V. EXPERIMENTAL RESULTS

In this section, we simulate the voting and weighted voting models on random memory sets. The experimental methodology is as follows. We first choose the model parameters: memory set size $m$, pattern dimension $N$, window size $n$, and input noise $\varepsilon$. Next, a memory set with $m$ random binary patterns is generated. An appropriate memory key is then generated. For experiments involving the detection and identification rate, one of the memory patterns is randomly chosen as the target, and the memory key is created by corrupting the target with an amount $\varepsilon$ of noise. For experiments involving the false acceptance rate, a completely new and random memory key is created. Next, the memory key is sent through the system, and all the local distance calculations and rankings are made. For the voting model, the values of $\tilde{V}_t$, $\tilde{V}_{nt}$, and $\tilde{V}_{\max}$ are tabulated (the tilde indicates quantities determined from simulation). For the weighted voting memory, the values of $\tilde{N}_r$ are tabulated for the target pattern, and $\tilde{N}^k_r$ for one of the nontarget patterns. From a single run, the simulated number of votes will not provide a good estimate of the true value. Hence, we repeat the aforementioned procedure over several runs (each time a new memory set is generated) and average the results. An estimate of the probability $P_r$ is obtained by dividing the average value of $\tilde{N}_r$ by the total number of windows

$\tilde{P}_r = \frac{\bar{\tilde{N}}_r}{N/n}$

Similarly, for the $k$th nontarget pattern, $\tilde{Q}_r = \bar{\tilde{N}}^k_r / (N/n)$. In Section V-A, we will see that the weighted voting model and, to a lesser extent, the voting model can tolerate a large quantity of uniform random noise. In order to get a feel for the noise levels that are being discussed, Fig. 8 shows the effect of various amounts of noise. Here, the original image (the letter A) is corrupted with increasing amounts of noise $\varepsilon$.

A. Experiment 1

For the first experiment, the following parameters were chosen: memory set size $m = 150$, pattern dimension $N$, window size $n = 10 \times 10$, and input noise $\varepsilon = 0.48$. The left-hand side of Fig. 9 shows the results for the voting network. Fig. 9(a) shows a histogram of the number of votes (experimentally observed) received by the target pattern, a typical nontarget pattern, and the nontarget pattern with the most votes. Fig. 9(b) shows the corresponding theoretical results computed using the discrete distributions, and Fig. 9(c) shows the continuous approximations. To obtain these theoretical graphs, the probabilities $P(D_t = j)$ in (1) and $P(D_{nt} = j)$ in (2) are computed for each $j$. Next, the values of $P_t$ and $P_{nt}$ can be computed using (3) and (4), respectively. It is possible to code these calculations directly using nested iterative loops. Once these values are known, the discrete distributions for $V_t$, $V_k$, and $V_{\max}$ can be computed using (5), (6), and (20), respectively. In the continuous case, $P_t$ and $P_{nt}$ are first used to compute the means and variances given in (8), (9), (11), and (12). Finally, the continuous distributions $f_t$, $f_{nt}$, and $f_{\max}$ can be computed using (7), (10), and (13). As expected, the discrete distribution provides a very accurate estimation of the experimental histograms. The continuous approximation for $f_{\max}$ is not as accurate. The reason for this is that in the continuous case, we model each $V_k$ as a Gaussian. It is clear from Fig. 9, though, that the vote distribution is not a perfect Gaussian. The error is magnified in computing $f_{\max}$ because of the $F_{nt}$ term raised to the power $m - 2$ [see (13)]. Fig. 10 shows a superposition of the continuous and discrete densities of Fig. 9.
Here, $f_{\max}$ underestimates the number of votes received by the maximum nontarget pattern. Hence, the retrieval rate will be overestimated. To use the weighted voting model, we first need to compute the weights. For each rank $r$, Fig. 11 shows a plot of the theoretical values of $P_r$ and $Q_r$ obtained from (41) and (43), respectively (solid lines). The values $\tilde{P}_r$ and $\tilde{Q}_r$ obtained from the simulation are also shown (as circles). Since there is such close agreement between the theoretical and experimental values, $\tilde{P}_r$ and $\tilde{Q}_r$ are subsampled, and every fifth sample is shown (otherwise, the graphs would completely overlap). For small values of $r$, $P_r$ exceeds $Q_r$, but for larger values of $r$, the situation is reversed, and $Q_r$ exceeds $P_r$. In computing the output of the weighted voting model, the $\tilde{P}_r$ values are used as the weights for the experimental results, and the $P_r$ values are the weights for the theoretical results. The right-hand side of Fig. 9 shows the resulting histograms for the weighted voting model. In order to present theoretical results using the discrete model, we limited the weighted sum to include only the top $R$ terms [see (79)]. In Fig. 9(d), the results of the simulation are given. Fig. 9(e) shows the corresponding theoretical results using the discrete model (52)–(54), and Fig. 9(f) shows the continuous theoretical results (55)–(62). Comparing with the voting results [Fig. 9(a)–(c)], we see that for the same parameters, the weighted voting model offers better performance because the distributions for $V_t$ and $V_{\max}$ are more separated. For the voting model, the correct retrieval rate $\tilde{P}_c$ was found experimentally and compared with the theoretical prediction $P_c$ of (21); the values are listed in Table II. Note that, for convenience, we express these probabilities as percentages; that is, 100 times the actual probability. For the weighted voting model, the experimental result [again obtained from (21), but this time using the distributions of the weighted voting model] and the corresponding theoretical result are also given in Table II.
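The Monte Carlo methodology described at the start of this section is easy to reproduce. Below is a minimal sketch of our own (the parameters shown follow the Section V-D settings $m = 200$, $N = 60 \times 60$, $n = 5 \times 5$, $\varepsilon = 0.4$, not Experiment 1):

```python
import numpy as np

def simulate_trial(rng, m, N, n, eps):
    """One run of the Section V methodology (our sketch): draw a random
    binary memory set, corrupt a randomly chosen target into a memory key,
    and return the per-rank window tallies for the target and for one
    (arbitrary) nontarget pattern."""
    w = N // n
    memory = rng.integers(0, 2, size=(m, N))
    t = int(rng.integers(m))
    key = memory[t] ^ (rng.random(N) < eps)          # flip each bit w.p. eps
    dist = np.abs(memory.reshape(m, w, n) - key.reshape(w, n)).sum(axis=2)
    ranks = dist.argsort(axis=0).argsort(axis=0)     # 0-based local ranks
    return (np.bincount(ranks[t], minlength=m),
            np.bincount(ranks[(t + 1) % m], minlength=m))

# Averaging the tallies over runs and dividing by N/n gives the empirical
# estimates P~_r and Q~_r.
rng = np.random.default_rng(0)
m, N, n = 200, 60 * 60, 5 * 5
runs = [simulate_trial(rng, m, N, n, eps=0.4) for _ in range(100)]
P_emp = np.mean([a for a, _ in runs], axis=0) / (N // n)
Q_emp = np.mean([b for _, b in runs], axis=0) / (N // n)
```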

Fig. 9. Results of (a)–(c) the voting and (d)–(f) weighted voting models with parameters $n = 10 \times 10$, $\varepsilon = 0.48$, and $m = 150$. (a) and (d) Simulation results. (b) and (e) Theoretical results (discrete distribution). (c) and (f) Theoretical results (continuous approximation).

B. Experiment 2

For the second experiment, we use the same system parameters as in Experiment 1, except that the pattern dimension $N$ is reduced; the memory set size, window size, and input noise are unchanged. The resulting histograms for $V_t$, $V_{nt}$, and $V_{\max}$ for both the voting and weighted voting models are shown in Fig. 12. For the voting model, Fig. 12(a) gives the simulation result and Fig. 12(b) gives the theoretical result (using the discrete model). For the weighted voting model, Fig. 12(c) gives the simulation result and Fig. 12(d) gives the theoretical result (again using the discrete model). As expected, reducing the pattern dimension reduces system performance. For the voting model, there is much more overlap between $V_t$ and $V_{\max}$ [compare Fig. 12(a) with Fig. 9(a)]. In fact, the means of the two distributions are nearly the same. Again, the weighted voting model is able to provide more separation between the two distributions. The overall performance figures for both models are summarized in Table II.

C. Experiment 3

For the third experiment, we keep the same parameters as Experiment 1, but this time double the number of memory patterns to $m = 300$; the pattern dimension, window size, and input noise are unchanged. The results for the voting model are shown in Fig. 13(a) and (b), and for the weighted model in Fig. 13(c) and (d). Here, the voting model shows a large overlap between the distributions of $V_t$ and $V_{\max}$. As with the previous example, this is clearly a case where the capacity of the voting model has been exceeded (too many patterns stored for the given dimension and noise level).

Fig. 10. Comparison between the discrete density and the continuous approximation for the voting model.

Fig. 11. Comparison between simulation and theory for the case of window size $n = 10 \times 10$, noise $\varepsilon = 0.48$, and memory set size $m = 150$.

Again, the weighted voting memory gives a much better separation between the distributions. A summary of the results of the previous three experiments is shown in Table II. In all three of these experiments, the performance of the weighted voting model is substantially better than that of the voting model.

D. Effect of Database Size

For the experiments in this section and in subsequent sections (unless otherwise noted), the parameters were set as follows: pattern dimension $N = 60 \times 60$, window size $n = 5 \times 5$, and input noise $\varepsilon = 0.4$. For the weighted voting model, the number of terms $R$ in the sum (79) is held fixed. Fig. 14 shows how the retrieval rate for both the voting and weighted voting memory varies as the memory set size $m$ is increased up to 1000 patterns. Clearly, the performance of the voting model degrades sharply as the number of memory patterns increases. The weighted voting model, though, is much more robust and is able to achieve a retrieval rate of about 95% when storing 1000 memory patterns.

E. Effect of Noise Level

Fixing the memory set size at $m = 200$ patterns, Fig. 15 shows the retrieval rate as the level of input noise $\varepsilon$ is varied. Fig. 15 shows the results for both the voting model and the weighted voting model (the weighted voting results are the two rightmost curves). The experimental results are shown as circles, and the theoretical results as a dashed line (discrete model) and a solid line (continuous model).

F. Effect of Window Size

The selection of the window size $n$ is an important consideration for both the voting and weighted voting models. Fig. 16 shows how the retrieval rate varies as a function of window size for the voting model [Fig. 16(a)] and the weighted voting model [Fig. 16(b)]. As noted in [16], on binary and random memory sets, the performance of the voting models is best when using very small window sizes (for example, 2 $\times$ 2) or large window sizes (for example, larger than 10 $\times$ 10). Performance suffers when intermediate window sizes are used (such as 5 $\times$ 5 or 6 $\times$ 6). A natural question arises: If the retrieval rate can be maximized by using just a single window of size $n = N$, why should we consider what happens with the intermediate window sizes? One reason is that correct retrieval performance is not the only design consideration to be taken into account. Retrieval speed is also an important factor in practical systems. If parallel hardware is available which allows for the fast and efficient computation of local distances (in a windowed approach), it is conceivable that such a system could operate many times faster than a conventional serial system which uses just a single window. In this case, it is important to know how the system performs with intermediate window sizes, and important to know that, for a fixed window size, the proposed weighted voting scheme yields better results than standard voting.
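To replicate the Fig. 16 sweep empirically, one can reuse the voting_retrieve sketch from Section II. Since the analysis assumes i.i.d. random patterns, the spatial layout of a window is immaterial: a flat partition into groups of $n$ components is statistically equivalent to square blocks, so only the product $n_{side} \times n_{side}$ matters (the parameters follow Fig. 16):

```python
import numpy as np

# Usage sketch: estimate the retrieval rate at several window sizes with
# m = 200, N = 60 x 60, eps = 0.4 (voting_retrieve is defined earlier).
rng = np.random.default_rng(1)
m, side, eps, runs = 200, 60, 0.4, 200
for n_side in (2, 3, 5, 6, 10, 15):          # window sizes whose n divides N
    n = n_side * n_side
    hits = 0
    for _ in range(runs):
        memory = rng.integers(0, 2, size=(m, side * side))
        t = int(rng.integers(m))
        key = memory[t] ^ (rng.random(side * side) < eps)
        hits += int(voting_retrieve(key, memory, n) == t)
    print(f"{n_side}x{n_side}: empirical retrieval rate {hits / runs:.3f}")
```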
In addition, previous studies [16], [27] have indicated that for correlated patterns such as face images, the detection and identification rate is best for intermediate window sizes. From a designer's point of view, this is an interesting and welcome result because it allows one to find an optimal intermediate window size that works best for the problem at hand. In this case, we get the best of both worlds: fast parallel computation and high classification performance.

G. Effect of Number of Rankings to Include

In this experiment, we study the effect of the number of rankings $R$ that are included in the weighted sum of votes [see (79)]. Fig. 17(a) shows how the retrieval rate varies when $R$ is varied from 0 to 100. Here, the experimental results are shown as circles and the continuous theoretical results are shown as the solid line. Notice that for large values of $R$, the continuous model underestimates $P_c$. For small values of $R$, $P_c$ is overestimated, as can be seen in Fig. 17(b), where we zoom in on the range $0 \le R \le 20$. Clearly, for the weighted voting model, it is best to use a large value for $R$.

Fig. 12. Results for [(a) and (b)] the voting and [(c) and (d)] weighted voting models with $n = 10 \times 10$, $\varepsilon = 0.48$, and $m = 150$ (reduced pattern dimension relative to Fig. 9). (a) and (c) Simulation results. (b) and (d) Theoretical results using the discrete distributions.

Fig. 13. Results for [(a) and (b)] the voting and [(c) and (d)] weighted voting models with parameters $n = 10 \times 10$, $\varepsilon = 0.48$, and $m = 300$. (a) and (c) Simulation results. (b) and (d) Theoretical results.

H. ROC Curves

An ROC curve showing how both $P_{DIR}$ and $P_{FAR}$ vary for various memory set sizes is shown in Fig. 18. In Fig. 18(a), the results for the voting model are given. Here, the solid line is the experimental result and the dashed line is the theoretical result (discrete model). The results for the weighted voting model are shown in Fig. 18(b). The performance of the weighted voting model is very good, and hence, most of the results are clumped at the top of the graph.

TABLE II. Summary of the three experiments. The theoretical $P_c$ and experimental $\tilde{P}_c$ values for the probability of correct retrieval are expressed in percent.

Fig. 14. Effect of database size on the retrieval rate for the voting model (lower curve) and the weighted voting model (upper curve). Here, $N = 60 \times 60$, $n = 5 \times 5$, and $\varepsilon = 0.4$. Both experimental results (circles) and discrete theoretical results (dashed line) are shown.

Fig. 15. Effect of noise on the retrieval rate for the voting model (lower two curves) and the weighted voting model (upper two curves). Here, $N = 60 \times 60$, $n = 5 \times 5$, and $m = 200$.

Recall that ROC curves are constructed by using various values of the system threshold $T$. For the leftmost part of each curve, a large value of the threshold is used. In this case, to meet the threshold, the best-matching candidate must receive a large number of votes. As the threshold is reduced, the detection and identification rate increases, but at the expense of a higher false acceptance rate. Of course, to implement the proposed voting-based system, a single threshold must be chosen. The selected value depends on the nature of the application. For example, for high-security systems, a premium is placed on low false acceptance; hence, a suitably high threshold is chosen.

VI. SUMMARY

The goal of associative memory research is to design systems that can reliably store and retrieve information. Early research into associative neural memories demonstrated that a parallel and distributed processing paradigm can be used to design content-addressable types of memory. Although these early designs were quite interesting and mathematically elegant, they were very poor at doing what they were designed to do: reliably store information. The voting-based systems discussed in this paper still use the parallel and distributed processing approach, but the systems we propose are much more practical. In fact, the characteristics of a practical memory system are explicitly outlined in this paper: the detection and identification rate should be maximized and, simultaneously, the false acceptance rate and false rejection rate should be minimized. In this paper, we have extended the analysis of the voting associative memory proposed in [16]. In addition, we proposed a generalization of the voting memory where, rather than casting a single vote for the best-matching memory set pattern, each window casts a set of weighted votes. For the case of random and binary memory set patterns, we were able to derive expressions for the retrieval rate, detection and identification rate, and false acceptance rate for both the voting memory and the weighted voting memory. For random/binary memory sets, the simulations reported here show that the weighted voting memory consistently outperforms the voting memory. The price to pay for this increased performance is in computation time (the weighted voting memory requires a local sorting operation, while the voting memory only requires a min-select operation) and in training (the weighted voting memory requires training data and a training phase in order to compute the weights, while the voting memory does not). It is important to note that the weighted voting memory is not limited to binary patterns. The memory has been successfully applied to the difficult problem of human face recognition (using grayscale patterns); details can be found in [27]–[29].

Fig. 16. Effect of window size on the retrieval rate for (a) the voting model and (b) the weighted voting model. Both experimental results (circles) and discrete theoretical results (solid line) are shown. Here, m = 200, N = 60 × 60, and the noise level is 0.4.

Fig. 17. P versus the number of terms R in the sum [see (79)]. Experimental results (circles) and theoretical results (solid line) using the continuous model are shown. (a) Range 0 ≤ R ≤ 100. (b) Range 0 ≤ R ≤ 20. Here, N = 60 × 60, n = 5 × 5, m = 200, and the noise level is 0.4.

The weighted voting associative memory model proposed in this paper has the following desirable properties.
1) The proposed model can operate with binary patterns or, with a simple change in local distance measure, with grayscale or real-valued patterns (a sketch of the two local distance measures appears at the end of this section). Models like the Hopfield memory and the bidirectional associative memory (BAM) do not easily generalize to grayscale patterns; for example, one can use the binary expansion representation for each pixel, but the resulting network will be eight times as big and require 64 times more weights. For image processing problems with a large number of pixels, the resulting network is simply too large to be practical.
2) The proposed system has a rejection mechanism and a tunable threshold which allows the user to trade off the detection-and-identification rate against the false-acceptance rate.
3) Our theoretical derivation completely characterizes the performance of the voting memory and weighted voting memory for binary and random memory sets. In fact, the theoretical analysis gives a framework for how capacity can be analyzed for a whole class of voting-based systems.
4) We have shown by simulation on binary and random memory sets that the proposed weighted voting memory outperforms the voting memory. In addition, we have shown that the proposed systems can reliably reject patterns with low signal-to-noise ratio. Hence, the proposed memory exhibits very high performance and can be used for practical associative memory problems.
5) The proposed memory model never produces a spurious memory.
6) The memory can operate in autoassociative or heteroassociative mode.
7) If desired, the memory can retrieve an ordered list of best-matching patterns and not just a single pattern.
8) The voting-based retrieval mechanism of the proposed model is excellent at handling localized noise [8]. This interesting property will be explored in a future paper.
9) In terms of hardware realization, the proposed model is practical in the sense that it does not require full interconnectivity and can be realized using present-day digital technology.
10) The system is very easy to maintain; memory patterns can be easily added to or deleted from the memory set without extensive retraining.

Note that the weighted voting strategy outlined here is quite general and can be used with other types of features [5]; for example, eigenfaces [4], [26], [36], wavelets [11], [39], [41], etc.
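As a concrete illustration of property 1, the two local distance measures below show the single change needed to move from binary to grayscale windows in the retrieve() sketch above. The function names and the sum-of-absolute-differences choice are assumptions for illustration, not the paper's specification.

```python
# Illustrative local distance measures (assumed names, not the paper's code).
import numpy as np

def hamming(u, v):
    """Local distance for binary {0, 1} windows."""
    return int(np.sum(u != v))

def sad(u, v):
    """Sum of absolute differences for grayscale (e.g., 0-255) windows."""
    return float(np.sum(np.abs(u.astype(float) - v.astype(float))))
```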

Fig. 18. ROC curves showing how the detection-and-identification and false-acceptance rates vary as a function of the threshold (implicit in the graph) and memory set size for (a) the voting model and (b) the weighted voting model. Here, N = 60 × 60, n = 5 × 5, and the noise level is 0.4.

In future work, we will study the performance of the weighted voting memory when using other types of features. In addition, we will look into ways of making the computation more efficient.

APPENDIX A

Here, we want to show the result stated in (A-1). The desired probability can be written in terms of the difference of two random variables: if the probability density of the difference is known, then the probability is easily computed via (A-2). For two independent random variables $X$ and $Y$, the distribution of their sum is given by the convolution formula
$$f_{X+Y}(z) = \int_{-\infty}^{\infty} f_X(x)\, f_Y(z - x)\, dx. \qquad \text{(A-3)}$$
Writing the difference $D = X - Y$ as the sum $X + (-Y)$, independence gives the density of the difference, (A-4). Plugging (A-4) into (A-2), we get (A-5). Interchanging the order of integration, we get (A-6) and, using a variable substitution, (A-6) can be written as (A-7). Since the underlying density is Gaussian, it has a symmetry property; plugging this in yields (A-8), which is the desired result.
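The appendix equations themselves did not survive transcription. As a hedged reconstruction of the general argument (the symbols $X$, $Y$, and $D$ and the integration limits are assumptions; the paper's own notation is lost), the computation proceeds along the following lines.

```latex
% Hedged reconstruction (not the paper's notation): X, Y independent,
% D = X - Y, and the desired quantity is P(D < 0).
\[ P(D < 0) = \int_{-\infty}^{0} f_D(u)\, du \]                    % cf. (A-2)
\[ f_D(u) = \int_{-\infty}^{\infty} f_X(x)\, f_Y(x - u)\, dx \]    % cf. (A-4)
% Plugging the second display into the first, interchanging the order of
% integration, and substituting y = x - u (cf. (A-5)-(A-7)):
\[ P(D < 0) = \int_{-\infty}^{\infty} f_X(x)\,\bigl[1 - F_Y(x)\bigr]\, dx \]
% For Gaussian densities, the symmetry f(\mu + t) = f(\mu - t) then
% reduces this integral to the closed form reported as (A-8).
```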

APPENDIX B

Here, we will derive expressions for the rank probabilities needed to compute the distribution of the number of votes received by the target pattern at each rank [see (42)] and the number of votes received by one of the nontarget patterns [see (44)].

A. Computation of the Target Rank Probabilities

The first quantity of interest is the probability that, at a single local window, the target pattern appears at a given rank in the sorted list of best-matching patterns. The probability that the target memory has rank 1 is the same as (3) and, for convenience, is rewritten here as (B-1).

What is the probability that the target memory has rank 2, i.e., appears second in the ordered list of best-matching patterns? The target memory is the second winner if its local distance is beaten by precisely one of the m - 1 nontarget memories and is smaller than the local distances of all the other memories. A diagram illustrating these conditions is shown in Fig. 5(a). The probability of meeting all three of these conditions is given by (B-2) and, using (1) and (2), (B-2) can be written as (B-3).

Similarly, there are three conditions to be met for the target pattern to be the rth winner, as illustrated in Fig. 5(b). Here, there are r - 1 nontarget patterns with a higher rank than the target pattern, and there are (m - 1 choose r - 1) ways of selecting the nontarget memories ranked higher than the target. Hence, the probability is given by (B-4) and, using (1) and (2), (B-4) can be written as (B-5).

B. Computation of the Nontarget Rank Probabilities

The second quantity of interest is the probability that a given nontarget memory set pattern has a given rank. The probability that this nontarget memory will have rank 1 is the same as in (4) and is rewritten as (B-6).

To compute the probability that the given nontarget memory pattern has rank 2, there are two cases to consider: 1) the target pattern is in front of this nontarget pattern (that is, the target has rank 1) and 2) the target pattern is behind this nontarget pattern. These conditions are illustrated in Fig. 6. In the first case, three conditions (on the local distances of the target, the given nontarget, and the remaining nontarget patterns) must be met. In the second case, four conditions must be met: one of the remaining nontarget patterns must be closer than the given nontarget (there are several ways to choose this pattern), the target pattern must be farther, and all other nontarget patterns must be farther. The resulting probability is given by (B-7) and, using (1) and (2), (B-7) can be written as (B-8).
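Before turning to the general rank case below, these closed-form rank probabilities can be sanity-checked by simulation. The sketch that follows estimates the target's rank distribution at a single window under assumed binomial local distances (bit-error rate p for the noisy target window, 1/2 per bit for random binary nontargets); this matches the random/binary setting in spirit only, and the names and tie-breaking rule are assumptions rather than the paper's exact setup.

```python
# Monte Carlo sketch of the single-window rank probabilities derived in
# closed form above (an illustration, not the paper's code).
import numpy as np

def rank_histogram(n=25, p=0.1, m=200, trials=100_000, rng=None):
    """Estimate P(target has rank r), r = 1..m, at one local window.

    Assumed model: the target window's Hamming distance is Binomial(n, p);
    each of the m - 1 nontargets' distances is Binomial(n, 0.5)."""
    rng = rng or np.random.default_rng(0)
    target = rng.binomial(n, p, size=trials)
    others = rng.binomial(n, 0.5, size=(trials, m - 1))
    # rank = 1 + number of nontargets strictly closer than the target
    # (ties are broken in the target's favor in this sketch)
    ranks = 1 + (others < target[:, None]).sum(axis=1)
    return np.bincount(ranks, minlength=m + 1)[1:] / trials

# e.g., rank_histogram()[0] estimates the probability the target wins outright.
```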

Similarly, the probability that the given nontarget pattern has rank r is the sum of the probabilities of the following two cases: the target pattern is in front of this nontarget pattern (among the top r - 1 patterns) and the target pattern is behind this nontarget pattern. This time, there is a combinatorial number of ways to choose the nontarget patterns in front of the rank-r nontarget pattern and, likewise, of ways to choose the preceding nontarget patterns when the target is in front of the rank-r nontarget pattern. The resulting probability is given by (B-9) and, using (1) and (2), (B-9) can be written as (B-10).

REFERENCES

[1] I. Aleksander, W. Thomas, and P. Bowden, "WISARD: a radical step forward in image recognition," Sens. Rev., 1984.
[2] F. M. Alkoot and J. Kittler, "Experimental evaluation of expert fusion strategies," Pattern Recognit. Lett., vol. 20, 1999.
[3] C. Altmann, H. Bülthoff, and Z. Kourtzi, "Perceptual organization of local elements into global shapes in the human visual cortex," Current Biol., vol. 13, 2003.
[4] P. Belhumeur, J. Hespanha, and D. Kriegman, "Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection," IEEE Trans. Pattern Anal. Mach. Intell., vol. 19, no. 7, Jul. 1997.
[5] C. Bishop, Neural Networks for Pattern Recognition. New York: Oxford Univ. Press, 1995.
[6] A. Burton, V. Bruce, and P. Hancock, "From pixels to people: A model of familiar face recognition," Cogn. Sci., vol. 23, no. 1, pp. 1-31, 1999.
[7] R. Chellappa, C. Wilson, and S. Sirohey, "Human and machine recognition of faces: A survey," Proc. IEEE, vol. 83, no. 5, May 1995.
[8] L. Chen and N. Tokuda, "Robustness of regional matching scheme over global matching scheme," Artif. Intell., vol. 144, 2003.
[9] P. Chou, "The capacity of the Kanerva associative memory," IEEE Trans. Inf. Theory, vol. 35, no. 2, Mar. 1989.
[10] G. Costantini, D. Casali, and R. Perfetti, "Neural associative memory storing gray-coded gray-scale images," IEEE Trans. Neural Netw., vol. 14, no. 3, May 2003.
[11] B. Duc and S. Fischer, "Face authentication with Gabor information on deformable graphs," IEEE Trans. Image Process., vol. 8, no. 4, Apr. 1999.
[12] M. H. Hassoun, Ed., Associative Neural Memories: Theory and Implementation. New York: Oxford Univ. Press, 1993.
[13] M. H. Hassoun, Fundamentals of Artificial Neural Networks. Cambridge, MA: MIT Press, 1995.
[14] T. Ho, J. Hull, and S. Srihari, "Decision combination in multiple classifier systems," IEEE Trans. Pattern Anal. Mach. Intell., vol. 16, no. 1, Jan. 1994.
[15] J. Hopfield, "Neurons with graded response have collective computational properties like those of two-state neurons," Proc. Nat. Acad. Sci., vol. 81, 1984.
[16] N. Ikeda, P. Watta, M. Artiklar, and M. Hassoun, "Generalizations of the Hamming net for high performance associative memory," Neural Netw., vol. 14, no. 9, 2001.
[17] N. Ikeda, P. Watta, and M. Hassoun, "Capacity analysis of the two-level decoupled Hamming associative memory," in Proc. Int. Joint Conf. Neural Netw., Anchorage, AK, May 4-9, 1998.
[18] J. Kittler, M. Hatef, R. P. W. Duin, and J. Matas, "On combining classifiers," IEEE Trans. Pattern Anal. Mach. Intell., vol. 20, no. 3, Mar. 1998.
[19] T. Kohonen, Self-Organization and Associative Memory. Berlin, Germany: Springer-Verlag, 1984.
[20] B. Kosko, "Bidirectional associative memories," IEEE Trans. Syst., Man, Cybern., vol. SMC-18, 1988.
[21] L. I. Kuncheva, "A theoretical study on six classifier fusion strategies," IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, no. 2, Feb. 2002.
[22] L. Lam and C. Y. Suen, "Application of majority voting to pattern recognition: An analysis of its behavior and performance," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 27, no. 5, Sep. 1997.
[23] S. Lin, S. Kung, and L. Lin, "Face recognition/detection by probabilistic decision-based neural network," IEEE Trans. Neural Netw., vol. 8, no. 1, Jan. 1997.
[24] X. Lin, S. Yacoub, J. Burns, and S. Simske, "Performance analysis of pattern classifier combination by plurality voting," Pattern Recognit. Lett., vol. 24, no. 12, 2003.
[25] G. Lockwood and I. Aleksander, "Predicting the behaviour of G-RAM networks," Neural Netw., vol. 16, 2003.
[26] H. Moon and P. J. Phillips, "Analysis of PCA-based face recognition algorithms," in Empirical Evaluation Techniques in Computer Vision, K. W. Bowyer and P. J. Phillips, Eds. Los Alamitos, CA: IEEE Comput. Soc. Press, 1998.
[27] X. Mu, "Automated face recognition: A weighted voting method," Ph.D. dissertation, Dept. Electr. Comput. Eng., Wayne State Univ., Detroit, MI, 2004.
[28] X. Mu, M. Hassoun, and P. Watta, "Combining local distance measures: Summing, voting, and weighted voting," in Proc. IEEE Int. Conf. Syst., Man, Cybern., Waikoloa, HI, Oct. 2005.
[29] X. Mu, P. Watta, and M. Hassoun, "A weighted voting model of associative memory: Experimental analysis," in Proc. IEEE Int. Conf. Syst., Man, Cybern., Big Island, HI, Oct. 2005, vol. 2.
[30] M. Muezzinoglu and C. Guzelis, "A Boolean Hebb rule for binary associative memory design," IEEE Trans. Neural Netw., vol. 15, no. 1, Jan. 2004.
[31] M. Muezzinoglu, C. Guzelis, and J. Zurada, "A new design method for the complex-valued multistate Hopfield associative memory," IEEE Trans. Neural Netw., vol. 14, no. 4, Jul. 2003.
[32] P. Phillips, H. Moon, S. Rizvi, and P. Rauss, "The FERET evaluation methodology for face-recognition algorithms," IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, no. 10, Oct. 2000.
[33] P. Phillips, H. Wechsler, J. Huang, and P. Rauss, "The FERET database and evaluation procedure for face recognition algorithms," Image Vis. Comput., vol. 16, no. 5, 1998.
[34] J. Rice, Mathematical Statistics and Data Analysis, 2nd ed. Belmont, CA: Duxbury Press, 1995.
[35] J. Saarinen and D. Levi, "Integration of local features into a global shape," Vis. Res., vol. 41, no. 14, 2001.
[36] M. Turk and A. Pentland, "Eigenfaces for recognition," J. Cogn. Neurosci., vol. 3, no. 1, 1991.
[37] P. Watta, N. Ikeda, M. Artiklar, A. Subramanian, and M. Hassoun, "Comparison between theory and simulation for the 2-level decoupled Hamming associative memory," presented at the IEEE Int. Conf. Neural Netw., Washington, DC, Jul. 1999, Paper JCNN-0337, unpublished.
[38] R. Wilson and E. Hancock, "A study of pattern recovery in recurrent correlation associative memories," IEEE Trans. Neural Netw., vol. 14, no. 3, May 2003.

[39] L. Wiskott, J. Fellous, N. Kruger, and C. von der Malsburg, "Face recognition by elastic graph matching," IEEE Trans. Pattern Anal. Mach. Intell., vol. 17, no. 7.
[40] L. Xu, A. Krzyzak, and C. Y. Suen, "Methods of combining multiple classifiers and their applications to handwriting recognition," IEEE Trans. Syst., Man, Cybern., vol. 22, no. 3, May/Jun. 1992.
[41] B. Zhang, H. Zhang, and S. Ge, "Face recognition by applying wavelet subband representation and kernel associative memory," IEEE Trans. Neural Netw., vol. 15, no. 1, Jan. 2004.
[42] W. Zhao, R. Chellappa, P. Phillips, and A. Rosenfeld, "Face recognition: A literature survey," ACM Comput. Surv., vol. 35, no. 4, 2003.
[43] S. Zhu and Y. Wu, "From local features to global perception: A perspective of Gestalt psychology from Markov random field theory," Neurocomputing.

Xiaoyan Mu received the Ph.D. degree from Wayne State University, Detroit, MI, in July 2004. Currently, she is an Assistant Professor in the Department of Electrical and Computer Engineering, Rose-Hulman Institute of Technology, Terre Haute, IN. Her research interests are in the areas of artificial intelligence, pattern recognition, neural networks, image processing, and computer vision.

Paul Watta received the B.S., M.S., and Ph.D. degrees in electrical engineering from Wayne State University, Detroit, MI, in 1987, 1988, and 1994, respectively. Currently, he is an Associate Professor at the University of Michigan-Dearborn in the Department of Electrical and Computer Engineering. His research interests include associative memory, image processing, face recognition, pattern recognition, and computer music.

Mohamad H. Hassoun received the B.S., M.S., and Ph.D. degrees in electrical engineering from Wayne State University, Detroit, MI, in 1981, 1982, and 1986, respectively. He is a Professor in the Department of Electrical and Computer Engineering at Wayne State University, where he has also served as Interim Chair. He founded the Computation and Neural Networks Laboratory, which performs research in the fields of artificial neural networks, machine learning, and pattern recognition. He has numerous papers and book chapters on artificial neural network subjects. He is the editor of Associative Neural Memories: Theory and Implementation (Oxford Univ. Press, 1993). He is also the author of the graduate textbook Fundamentals of Artificial Neural Networks (MIT Press, 1995). Dr. Hassoun has served as Associate Editor and Reviewer for a number of technical journals. Since January 1998, he has been the Co-Editor-in-Chief of Neural Processing Letters. He served on the program committees of several international conferences on neural networks. He received a National Science Foundation Presidential Young Investigator Award in 1990 and a number of teaching awards at Wayne State University, including the President's Award for Excellence in Teaching.
