Situation Assessment using Graphical Models

Peter Bladon, Richard J. Hall and W. Andy Wright
BAE SYSTEMS Advanced Technology Centre, Sowerby Building FPC 267, PO Box 5, Filton, Bristol, BS34 7QW, UK
{peter.bladon, richard.j.hall, andy-src.wright}@baesystems.com

Abstract

This paper presents a Bayesian network framework for situation assessment. The framework is generated from a set of technical requirements that would be a prerequisite for any situation assessment system. It is shown that Bayesian networks readily satisfy these requirements and produce a system that fits naturally into the Endsley description of situation assessment. The framework can also be seen as part of the Observe and Orientate components of the OODA-loop paradigm.

Keywords: Situation assessment, probability theory, Bayesian networks.

1 Introduction

Situation assessment (or situation awareness) is a key component of any decision-making process. In a military context, it is especially important to build awareness of complex, evolving situations in a timely and accurate manner. It is, perhaps, somewhat surprising that there is no unique or agreed definition of situation assessment. In pilot-in-the-loop scenarios there are at least three definitions currently in use. These include Endsley [1], Bedney and Meister [2] and Smith and Hancock [3]. The definition of Endsley is perhaps the most enlightening:

"Situation awareness is the perception of the elements in the environment within a volume of time and space, the comprehension of their meaning and a projection of their status in the near future."

which supports a three-level model of situation awareness:

Level 1: Perception of elements of the current situation
Level 2: Comprehension of the current situation
Level 3: Projection of future status

An alternative definition is that given by the Joint Directors of Laboratories (JDL) in relation to data fusion [4]. Here data fusion is divided into a number of levels:

Level 0: Pre-processing and pre-detection
Level 1: Single object refinement
Level 2: Situation refinement
Level 3: Implication refinement
Level 4: Process refinement

In this context, situation assessment is most closely associated with JDL level 2 (situation refinement). Finally, there is Boyd's OODA (Observe, Orient, Decide and Act) paradigm [5]; see figure 1. Often the Orient element, the process of placing observations into context, is associated with situation assessment.

[Figure 1: Boyd's OODA-loop: Observe, Orient, Decide, Act.]

The common theme running through these different descriptions is the fusing of raw data, from different sources, with different attributes, in order to draw inferences about the context and behaviour of objects within the environment. Note that situation assessment is a necessary precursor to the decision-making process, not a replacement for it.

Previous work has advocated the use of a Bayesian framework for situation assessment, e.g. [7, 8]. In this paper we show that this approach is consistent with a set of technical requirements that any situation assessment system must meet. This Bayesian network approach fits readily with the Endsley model of situation assessment and contains elements of the OODA-loop and JDL models.

2 Requirements for a Situation Assessment System

Building a system to do situation assessment is an information modelling task. It is necessary to build a world model that can associate and process real observations in order to derive meaningful conclusions that reflect what is happening in a given situation. From a technical and system viewpoint, such a system should have a number of important attributes. Specifically:

Robust: able to handle inconsistent, uncertain and incomplete data. The same object may be captured by many sensors which may report different positions; some sensors may fail to capture any signal at all.

Interrogate and summarise: provide answers to specific queries that summarise the internal state of the system. We should be able to ask "tell me if aircraft 23 is hostile" without getting reports on the state of the rest of the world.

Reason consistently: the answers to any two questions asked of the system (for a given system state) must not be inconsistent with one another.

Encode prior relationships: facilitate the inclusion of known relationships between variables.

Traceable and auditable: the reasoning process must be understandable and verifiable.

Fuse different types of data: for example, radar sensor data and linguistic data from intelligence reports.

In addition, it would be advantageous if the system were:

Extendable: to allow the easy inclusion of new variables and dependencies.

Tuneable: to be able to improve the accuracy of the system by learning from available data. If this were the case, then our model would not be limited to a closed world where new variables and new situations do not arise.

3 The Approach

Any practical situation assessment system would need to satisfy these requirements. We will show that these requirements are naturally satisfied by adopting a probabilistic world model, which is simply the joint distribution over all the variables of interest:

    p(x_1, x_2, ..., x_N)    (1)

For discrete variables, this joint distribution is a table, with one row for each distinct state of the variables. Using the rules of probability theory, it is straightforward (in theory) to determine quantities such as

    p(hostile | sensor 1 = s_1, sensor 2 = s_2)    (2)

This should be read as "the probability that the aircraft is hostile, given specific readings for sensor 1 and sensor 2". It can be calculated for a particular model by marginalising out other variables, conditional on the readings of the sensors. In practice, this approach suffers from two disadvantages. Firstly, marginalisation is difficult, as a substantial fraction of the state space (of size S^N for a discrete model with N variables and S states per variable) must be summed over. For large models this rapidly becomes computationally infeasible. Secondly, model building is difficult. This is in part due to the size of the state space that must be specified. More significantly, a large joint distribution does not capture the dependency structure of the domain, since many variables may be independent, or conditionally independent, of one another.
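To make equations (1) and (2) concrete, the sketch below (pure Python, no external libraries) builds a small joint distribution as an explicit table and answers a conditional query by brute-force summation. The variable names (hostile, sensor_a, sensor_b) and the numbers are illustrative only, not taken from the paper; the point is that the table needs S^N entries, which is what makes direct marginalisation expensive.

import itertools

variables = ["hostile", "sensor_a", "sensor_b"]   # each binary: 0 or 1
states = (0, 1)

# One entry per joint state: S**N values (here 2**3 = 8). For realistic N this
# table, and any sum over it, grows exponentially.
joint = {}
for h, a, b in itertools.product(states, repeat=3):
    p_h = 0.3 if h == 1 else 0.7          # illustrative prior on hostility
    p_a = 0.8 if a == h else 0.2          # sensor_a tends to track hostility
    p_b = 0.7 if b == h else 0.3          # sensor_b is noisier
    joint[(h, a, b)] = p_h * p_a * p_b

def conditional(target_value, evidence):
    """p(hostile = target_value | evidence), by summing over the joint table.

    Variables not named in evidence are marginalised out by the summation."""
    numerator = 0.0
    denominator = 0.0
    for assignment, p in joint.items():
        row = dict(zip(variables, assignment))
        if all(row[name] == value for name, value in evidence.items()):
            denominator += p
            if row["hostile"] == target_value:
                numerator += p
    return numerator / denominator

# "Probability that the aircraft is hostile, given specific sensor readings":
print(conditional(1, {"sensor_a": 1, "sensor_b": 1}))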
A more intuitive interpretation of a probability distribution is given by a graphical model. Such a model is a factorisation of the joint distribution represented by a graph. Each node in the graph corresponds to one or more variables in the joint distribution and arcs represent dependencies between the nodes. There are two main types of graphical model: Markov random fields and Bayesian networks [9]. Markov random fields are undirected models; their utility is limited because only a very restricted set of graphs (triangulated graphs) conveniently map onto any joint probability distribution. Bayesian networks are directed graphs where each node represents a conditional distribution, and arrows represent conditioning of child variables on their parents. In general, a Bayesian network will factor the joint distribution as:

    p(x_1, ..., x_N) = ∏_i p(x_i | π_i)    (3)

where π_i is the parent set of node i. This decomposition allows reasoning about small pieces of the problem in isolation.
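The sketch below makes the factorisation of equation (3) concrete: each node stores only its local conditional probability table, and the joint is recovered as a product of those local factors. The six-node structure, the noisy_copy helper and all of the numbers are assumptions for illustration; the exact edges of the paper's figure 2 are not reproduced here.

import itertools

# parents[i] lists the parents of node i; cpt[i] maps (x_i, parent values) -> probability.
parents = {1: (), 2: (1,), 3: (1,), 4: (2,), 5: (3,), 6: (3,)}

def noisy_copy(eps):
    """CPT of a binary child that copies its single parent with error probability eps."""
    return {(child, (parent,)): (1 - eps) if child == parent else eps
            for child in (0, 1) for parent in (0, 1)}

cpt = {1: {(0, ()): 0.6, (1, ()): 0.4},
       2: noisy_copy(0.1), 3: noisy_copy(0.2),
       4: noisy_copy(0.3), 5: noisy_copy(0.25), 6: noisy_copy(0.15)}

def joint(assignment):
    """p(x_1, ..., x_6) as the product of local factors, as in equation (3)."""
    p = 1.0
    for i in range(1, 7):
        parent_values = tuple(assignment[j] for j in parents[i])
        p *= cpt[i][(assignment[i], parent_values)]
    return p

# The factored model is specified by 2 + 5*4 = 22 table entries rather than the
# 2**6 = 64 entries of the full joint, yet it still defines a proper distribution:
total = sum(joint(dict(zip(range(1, 7), values)))
            for values in itertools.product((0, 1), repeat=6))
print(total)   # 1.0, up to floating-point error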

Consider figure 2, which shows a simple graph with six variables.

[Figure 2: A simple six-variable graph, with nodes X_1 to X_6.]

The factorisation of the joint distribution for this graph has the form

    p(x_1, ..., x_6) = ∏_{i=1}^{6} p(x_i | π_i)    (4)

with the parent sets π_i read directly off the arcs in figure 2. The factorisation gives an immediate picture of the dependencies between variables, which might be very difficult to establish from a direct inspection of the joint distribution alone.

3.1 Satisfying the requirements

Now we can see how using graphical models provides a framework that meets the requirements for a practical situation assessment system. Specifically:

Robust: probability provides a mathematically consistent framework for dealing with uncertain and incomplete information [6]. Incomplete information (i.e. unknown variables) can be allowed for by marginalisation.

Interrogate and summarise: through the process of marginalisation, the dependence of one set of unknown variables on a set of known variables can be determined. This process allows for the removal of other variables that are not regarded as important, i.e. it allows specific questions to be asked and summarised answers to be given.

Reason consistently: since all reasoning is done using one probabilistic model, all answers are derived from a single multi-dimensional distribution and, therefore, are consistent.

Encode prior relationships: the Bayesian probability framework facilitates the inclusion of background (prior) knowledge and the updating of this knowledge when new information is available. This is facilitated within probabilistic graphs: prior knowledge is encoded in the structure and parameters of the graph.

Traceable and auditable: using a graphical model (a Bayesian network) for the probability distribution allows the individual relationships between variables to be expressed as a function of local dependencies.

Fuse different types of data: probability theory deals with relationships between uncertain variables; the nature of the variables themselves is unimportant.

To make our system extendable and tuneable, it is necessary for it to adapt its structure and parameterisation to allow for the inclusion of new variables and information.

3.2 Learning in Graphical Models

In many systems it is possible to construct a graphical model directly from prior knowledge about the problem. However, in more general situations, where prior knowledge is not as readily available, this is not the case. This limitation can be overcome through the use of machine learning methods. Learning in graphical models may be divided into two related components:

Parameter learning: where the parameters that govern the function of a given graphical structure are learnt.

Structural learning: where the structure of the graph is determined from the data.

There has been substantial progress on the first of these components. The simpler methods use the expectation-maximisation (EM) algorithm [10] to compute the most likely parameter values. This algorithm has the advantage that it is possible to learn the parameters from incomplete data (i.e. data where there are missing elements at random positions in the training data). More sophisticated methods [11] offer greater accuracy, but they do so at the cost of increased computation. This is an important issue, since situation assessment is a key component of the OODA loop. Boyd showed that a key prerequisite to overcoming an opponent is the ability to cycle round your own OODA loop at a rate faster than your opponent does, effectively getting ahead of the opponent's OODA loop.
The trade-off between accuracy and speed, therefore, is a crucial element in any situation assessment scenario. Learning the structure of the graphical model presents a more difficult problem. Even for very simple structures, it has been shown that the search over all possible edge and node combinations grows exponentially in the number of variables and so is NP-hard [12]. Consequently, most applications of these methods rely on using prior knowledge about the problem to define the structure of the graph, and only use the data that are available to obtain appropriate parameters given this fixed structure.
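The sketch below shows the simplest form of parameter learning for a fixed structure: with complete data, the maximum-likelihood conditional probability tables are just normalised counts of child values for each parent configuration. The EM algorithm cited above generalises exactly this step, replacing the counts with expected counts computed from the current model when some entries of the training data are missing. The structure and the training records here are illustrative, not taken from the paper.

from collections import defaultdict

parents = {"identity": (), "signature": ("identity",), "sensor": ("signature",)}

# Complete training records (one dict per observation); values are illustrative.
data = [
    {"identity": "friend", "signature": "f", "sensor": "f1"},
    {"identity": "friend", "signature": "f", "sensor": "f1"},
    {"identity": "friend", "signature": "f", "sensor": "e1"},
    {"identity": "friend", "signature": "e", "sensor": "e1"},
    {"identity": "foe",    "signature": "e", "sensor": "e1"},
    {"identity": "foe",    "signature": "e", "sensor": "f1"},
    {"identity": "foe",    "signature": "f", "sensor": "f1"},
]

def fit_cpts(parents, data):
    """Return {node: {(child value, parent values): maximum-likelihood probability}}."""
    counts = {node: defaultdict(float) for node in parents}
    totals = {node: defaultdict(float) for node in parents}
    for record in data:
        for node, node_parents in parents.items():
            parent_values = tuple(record[p] for p in node_parents)
            counts[node][(record[node], parent_values)] += 1.0
            totals[node][parent_values] += 1.0
    return {node: {key: count / totals[node][key[1]] for key, count in table.items()}
            for node, table in counts.items()}

for node, table in fit_cpts(parents, data).items():
    print(node, table)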

For an involved inference problem with a complex graph, this process of fitting parameters requires substantial quantities of data, something that is often not readily available. What is required is a learning process that is able to make use of the prior information that is available, and to balance this against the available data. One way to develop this idea is to divide the data into context-dependent sets and in turn use these sets to develop separate context-specific independent models [13]. Specifically, the inference problem is divided into separate context-specific problems, which can then be represented as separate graphical models. This approach has the advantage that it allows a graph representing a larger problem to be broken down into smaller sub-graphs. Since the learning process over any graph scales exponentially, this significantly eases the computational complexity and so reduces the data requirements. The models found by this multi-graph learning procedure can then be used to form a mixture of graphs. Under an appropriate Bayesian framework, such graphs can then be computed over in a similar fashion to other mixture-model frameworks [14].

4 A Simple Example

To illustrate the utility of this framework, consider a simple example where a Bayesian network is used for situation assessment. In this example, a suite of two sensors (nodes Sensor 1 and Sensor 2) is used to determine whether an aircraft is friendly or an enemy. The two sensors are conditioned on the signature of the aircraft (the Signature node), which in turn is conditioned on the true identity of the aircraft (the Identity node), as shown in figure 3.

[Figure 3: A simple graph showing the relationship between the Identity, Signature and sensor nodes.]

The sensor nodes have binary states {f_i, e_i}, where i = 1, 2 is the identity of the sensor; the sensor nodes give an indication of the state of the Signature node, which has states {f, e}. The states of the Identity node are denoted by {friend, foe}. The relationships between the sensor readings, signature and true identity of the aircraft are uncertain: each node has an associated conditional probability table whose size depends on the number of parent nodes. The probability tables for Sensor 1 and Sensor 2 are shown in tables 1 and 2. These reflect the probabilities of detection of a given signature. Note there is greater differentiation with Sensor 2, since it is slightly more accurate than Sensor 1.

Table 1: Conditional probability table for Sensor 1, given Signature.
           f1     e1
    f     0.66   0.34
    e     0.34   0.66

Table 2: Conditional probability table for Sensor 2, given Signature.
           f2     e2
    f     0.76   0.24
    e     0.24   0.76

In this artificial example, the Signature acts as a noisy version of the Identity variable. In practice, this might be a transponder signal that on rare occasions might lie as a result of malfunctions or spoofing. This relationship is captured in table 3.

Table 3: Conditional probability table for Signature, given Identity.
              f      e
    friend   0.9    0.1
    foe      0.1    0.9

In the absence of sensor information, the probability that the aircraft is friendly is given by the probabilities shown in table 4. These represent our prior state of knowledge about observing friendly or enemy aircraft.

Table 4: Prior probabilities for Identity.
    friend   0.6
    foe      0.4
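The sketch below encodes the four-node model of figure 3 with the values of tables 1 to 4 as reconstructed above (the node and state names are indicative labels rather than necessarily the authors' exact ones). The posterior over the aircraft identity given both sensor readings is obtained by summing out the unobserved Signature node, which is precisely the marginalisation described in section 3.

p_identity = {"friend": 0.6, "foe": 0.4}                        # table 4
p_signature = {("friend", "f"): 0.9, ("friend", "e"): 0.1,      # table 3
               ("foe", "f"): 0.1, ("foe", "e"): 0.9}
p_sensor1 = {("f", "f1"): 0.66, ("f", "e1"): 0.34,              # table 1
             ("e", "f1"): 0.34, ("e", "e1"): 0.66}
p_sensor2 = {("f", "f2"): 0.76, ("f", "e2"): 0.24,              # table 2
             ("e", "f2"): 0.24, ("e", "e2"): 0.76}

def posterior_identity(reading1, reading2):
    """p(Identity | Sensor 1 = reading1, Sensor 2 = reading2)."""
    unnormalised = {}
    for identity in p_identity:
        total = 0.0
        for sig in ("f", "e"):                    # marginalise the Signature node
            total += (p_identity[identity] * p_signature[(identity, sig)]
                      * p_sensor1[(sig, reading1)] * p_sensor2[(sig, reading2)])
        unnormalised[identity] = total
    z = sum(unnormalised.values())
    return {identity: value / z for identity, value in unnormalised.items()}

# Both sensors report a friendly signature:
print(posterior_identity("f1", "f2"))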

The scenario can be extended to allow for the variability of the sensors with the state of the weather. Accordingly, a new Weather node is added, with states {rain, no rain}, that conditions both sensor nodes, as in figure 4.

[Figure 4: The graph extended to include the effect of the Weather node.]

The conditional probability tables, updated to include the weather, are shown in tables 5 and 6. Note how, in this scenario, the performance of Sensor 1 decreases in rain, while that of Sensor 2 increases. The prior probabilities for the Weather node are given in table 7.

Table 5: Conditional probability table for Sensor 1, given Signature and Weather.
                      f1     e1
    (f, no rain)     0.9    0.1
    (e, no rain)     0.1    0.9
    (f, rain)        0.6    0.4
    (e, rain)        0.4    0.6

Table 6: Conditional probability table for Sensor 2, given Signature and Weather.
                      f2     e2
    (f, no rain)     0.6    0.4
    (e, no rain)     0.4    0.6
    (f, rain)        0.8    0.2
    (e, rain)        0.2    0.8

Table 7: Prior probabilities for Weather.
    rain      0.8
    no rain   0.2

The final extension of this scenario is to include some other type of information about the aircraft that is independent of the sensors, e.g. some intelligence information. The Intelligence node, with states {friend, foe}, linked to the Identity node, is shown in figure 5. However, this information is also uncertain and, to allow for this, a new probability table (table 8) is introduced.

[Figure 5: The complete graph showing all variables and their relationships. The mapping of the graph to the Orientate and Observe elements of the OODA-loop is also shown.]

Table 8: Conditional probability table for Intelligence, given Identity.
             friend   foe
    friend    0.8     0.2
    foe       0.1     0.9

It is interesting to note that this simple model, as shown in figure 5, associates data from different sources, in context, and leads to a structure that naturally fits with the Endsley view of situation assessment [1]. The model also encompasses the Observe and Orientate elements of the OODA-loop, and levels 1 and 2 from the JDL model of data fusion.

4.1 Inference with the model

There are a number of interesting questions that can be answered with this model regarding the true state of the sensors, weather and intelligence. This is done by entering all known data into the model and using the junction tree algorithm [9] to determine the probability of the unknown states. With no evidence entered into the model (i.e. no observations have been made), the marginal probabilities of each state are as shown in figure 6. We now enter evidence into the graph (i.e. observations have been made): Sensor 2 is set to f2 (indicating detection of a friend), while Sensor 1 has presumably detected nothing. This is a case where there is an incomplete set of observations, but the network can still reason consistently and effectively. The node states for this case are shown in figure 7.

[Figure 6: State probabilities with no observations.]

[Figure 7: Probabilities obtained when only Sensor 2 is set with a known value.]

[Figure 8: Both sensors observed.]
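The sketch below extends the previous one with the Weather node of figure 4, again using the reconstructed values of tables 5 to 7, and answers the figure-7 style query in which only Sensor 2 has reported. Unobserved nodes are summed out by enumerating the joint; the Intelligence leaf is omitted because, being unobserved, it simply sums out to one. Since the table values are reconstructions, the trends rather than the exact posteriors are what should be compared with the figures.

import itertools

p_identity = {"friend": 0.6, "foe": 0.4}                                  # table 4
p_weather = {"rain": 0.8, "no rain": 0.2}                                 # table 7
p_signature = {("friend", "f"): 0.9, ("friend", "e"): 0.1,                # table 3
               ("foe", "f"): 0.1, ("foe", "e"): 0.9}
p_sensor1 = {("f", "no rain", "f1"): 0.9, ("f", "no rain", "e1"): 0.1,    # table 5
             ("e", "no rain", "f1"): 0.1, ("e", "no rain", "e1"): 0.9,
             ("f", "rain", "f1"): 0.6, ("f", "rain", "e1"): 0.4,
             ("e", "rain", "f1"): 0.4, ("e", "rain", "e1"): 0.6}
p_sensor2 = {("f", "no rain", "f2"): 0.6, ("f", "no rain", "e2"): 0.4,    # table 6
             ("e", "no rain", "f2"): 0.4, ("e", "no rain", "e2"): 0.6,
             ("f", "rain", "f2"): 0.8, ("f", "rain", "e2"): 0.2,
             ("e", "rain", "f2"): 0.2, ("e", "rain", "e2"): 0.8}

domains = {"Identity": ("friend", "foe"), "Weather": ("rain", "no rain"),
           "Signature": ("f", "e"), "Sensor1": ("f1", "e1"), "Sensor2": ("f2", "e2")}

def joint(a):
    """Joint probability of one full assignment, as a product of the local tables."""
    return (p_identity[a["Identity"]] * p_weather[a["Weather"]]
            * p_signature[(a["Identity"], a["Signature"])]
            * p_sensor1[(a["Signature"], a["Weather"], a["Sensor1"])]
            * p_sensor2[(a["Signature"], a["Weather"], a["Sensor2"])])

def posterior(query, evidence):
    """p(query | evidence), summing the joint over every unobserved variable."""
    names = list(domains)
    scores = {value: 0.0 for value in domains[query]}
    for values in itertools.product(*(domains[name] for name in names)):
        a = dict(zip(names, values))
        if all(a[name] == value for name, value in evidence.items()):
            scores[a[query]] += joint(a)
    z = sum(scores.values())
    return {value: score / z for value, score in scores.items()}

# Figure-7 style query: only Sensor 2 has reported, and it reports a friend.
print(posterior("Identity", {"Sensor2": "f2"}))
# The same evidence also shifts the Weather posterior slightly further towards rain.
print(posterior("Weather", {"Sensor2": "f2"}))

In a deployed system, the junction tree algorithm referred to in the text would replace this exhaustive enumeration, exploiting the graph structure so that the full state space never has to be summed over explicitly.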

The Identity node now indicates that the probability of the aircraft being a friend is 0.8. The probability of the signature coming from a friend has increased. The probability that the intelligence will report a friend has increased to 0.65 (i.e. it is more likely that any intelligence reports that we now get will indicate the aircraft is a friend). Note also that the network is indicating that the measurement was more likely to have been made in poor weather.

Now consider the case where the sensor readings conflict: Sensor 1 detects an unfriendly IFF signature (e1) and Sensor 2 believes the aircraft is friendly (f2). This is shown in figure 8. The network predicts that the aircraft is a friend with probability 0.7. This is because Sensor 2 has better prediction capability than Sensor 1 when the weather is bad (i.e. raining, which has a high prior probability); see table 7. We see how understanding the conditional relationships between the nodes allows us to audit (interpret) the final state of the system in terms of individual causes and effects.

Alternatively, if the weather is known to be good (i.e. it is not raining), then the probability that Sensor 1 is giving the correct answer increases and the Identity node now shows that the probability of foe is 0.7, as indicated in figure 9. This demonstrates explaining away: there is less support for the evidence of Sensor 2, given that we know it is not raining. If an intelligence report arrives indicating that the aircraft is a friend, then this evidence explains away the reading of Sensor 1 (e1). The Identity node then indicates that the probability of the platform being a friend is 0.75, as shown in figure 10.

This simple example demonstrates the utility of the approach. The contextual relationships embodied in the problem are encoded in the structure of the graph and its parameters. The junction tree algorithm allows marginalisation of unobserved variables so that incomplete information is handled in an elegant and efficient manner.
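The explaining-away behaviour described above can be seen numerically on just the Signature/Weather/Sensor 1 fragment of the graph, as in the short sketch below. The values are the reconstructed ones from tables 3, 4, 5 and 7, so the numbers differ from the paper's full-network results; what matters is the direction of the change once good weather is observed and the "bad weather" explanation for the foe-like reading is removed.

p_weather = {"rain": 0.8, "no rain": 0.2}                               # table 7
p_sig = {"f": 0.6 * 0.9 + 0.4 * 0.1,                                    # marginal of
         "e": 0.6 * 0.1 + 0.4 * 0.9}                                    # tables 3 and 4
p_sensor1_e1 = {("f", "no rain"): 0.1, ("e", "no rain"): 0.9,           # table 5, e1 column
                ("f", "rain"): 0.4, ("e", "rain"): 0.6}

def p_foe_signature(weather=None):
    """p(Signature = e | Sensor 1 = e1 [, Weather = weather])."""
    weathers = [weather] if weather is not None else list(p_weather)
    score = {sig: sum(p_sig[sig] * p_weather[w] * p_sensor1_e1[(sig, w)]
                      for w in weathers)
             for sig in ("f", "e")}
    return score["e"] / (score["e"] + score["f"])

print(p_foe_signature())            # weather unknown: roughly 0.58
print(p_foe_signature("no rain"))   # weather known to be good: rises to roughly 0.87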

[Figure 9: With the weather known to be good, the evidence of Sensor 2 is explained away.]

[Figure 10: An intelligence report explains away the inconsistent Sensor 1 and Sensor 2 readings.]

5 Conclusions

This paper postulates a set of requirements for a practical situation assessment system. It is shown that a Bayesian network approach provides a framework that not only fulfils these requirements but also fits the Endsley description of situation assessment. Furthermore, the framework provides a practical set of tools with which data and information can be placed in context and manipulated in a globally consistent way. Inference is demonstrated for a simple example using the junction tree algorithm. This shows the ability of this approach to:

1. cope with uncertainty,
2. cope with missing or incomplete data,
3. provide a parsimonious description of how inference is obtained,
4. explain away variables where no data is available.

The ability to learn the strength of relationships between system variables is an additional advantage of the framework, which relies on the utilisation of a number of recently developed machine learning methods. However, the application of any learning method will have to balance accuracy with speed to ensure timely input into any OODA loop. It should be remembered that situation assessment is only the first part of any decision process. To make effective decisions, it is also necessary to predict the outcome of any action and understand its consequences.

Acknowledgements

This work is supported by the UK MOD under the Probabilistic Graphical Methods for Situational Assessment project. The authors would also like to thank D. Nicholson of BAE SYSTEMS ATC for helpful comments and suggestions.

References

[1] M. Endsley, "Towards a theory of situation awareness in dynamic systems", Human Factors, Vol. 37, pp. 32-64, 1995.

[2] G. Bedney and D. Meister, "Theory of activity and situation awareness", International Journal of Cognitive Ergonomics, Vol. 3, pp. 63-72, 1999.

[3] K. Smith and P. Hancock, "Situation awareness is adaptive, externally directed consciousness", Human Factors, Vol. 37, pp. 137-148, 1995.

[4] E. Waltz and J. Llinas, Multisensor Data Fusion, Artech House, Norwood, Massachusetts, 1990.

[5] J. R. Boyd, "A discourse on winning and losing", unpublished set of briefing slides available at Air University Library, Maxwell AFB, Alabama, May 1987.

[6] J. M. Bernardo and A. F. Smith, Bayesian Theory, John Wiley, New York, 1994.

[7] E. J. Wright, S. M. Mahoney and K. B. Laskey, "Use of domain knowledge models to recognize cooperative force activities", Proceedings of the 2001 MSS National Symposium on Sensor and Data Fusion, San Diego, CA, 2001.

[8] E. J. Wright, S. M. Mahoney, K. C. Ng and K. B. Laskey, "Measuring performance for situation assessment", Proceedings of the 2001 MSS National Symposium on Sensor Data Fusion, 2001.

[9] J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, Morgan Kaufmann, San Mateo, CA, 1988.

[10] A. Dempster, N. Laird and D. Rubin, "Maximum likelihood from incomplete data via the EM algorithm", Journal of the Royal Statistical Society B, Vol. 39, pp. 1-38, 1977.

[11] M. I. Jordan (ed.), Learning in Graphical Models, MIT Press, Cambridge, MA, 1999.

[12] D. Chickering, "Learning Bayesian networks is NP-complete", in Learning from Data, eds. D. Fisher and H.-J. Lenz, pp. 121-130, Springer-Verlag, 1996.

[13] C. Boutilier, N. Friedman, M. Goldszmidt and D. Koller, "Context-specific independence in Bayesian networks", Proceedings of the Twelfth Conference on Uncertainty in Artificial Intelligence, pp. 115-123, 1996.

[14] M. I. Jordan and R. A. Jacobs, "Hierarchical mixtures of experts and the EM algorithm", Neural Computation, Vol. 6, pp. 181-214, 1994.