Forecasting Fetal Heartbeats with Neural Networks [1]

Claudia Ulbricht, Georg Dorffner
Austrian Research Institute for Artificial Intelligence, Schottengasse 3
and Institute of Medical Cybernetics and Artificial Intelligence, Freyung 6
A-1010 Vienna, Austria
claudia@ai.univie.ac.at, georg@ai.univie.ac.at

Andreas Lee
Department of Prenatal Diagnosis and Therapy, University Hospital Vienna
Währinger Gürtel 18-20, A-1090 Vienna, Austria
Andreas.Lee@akh-wien.ac.at

Abstract

The given task is to forecast the intervals between the heartbeats recorded from a fetus. The six tested neural network models combine input windows, hidden layer feedback, and self-recurrent unit feedback in different ways. The two networks combining an input window and hidden layer feedback performed best. One of them has additional self-recurrent feedback loops around the units in the state layer, which enable the system to deal with time-warped patterns. It turns out to be reasonable to combine several techniques for processing the temporal aspects inherent to the input sequence.

1 The Task

Using the cardiotocogram (CTG) is common for routine fetal monitoring. The CTG consists of fetal heartbeat and uterine contraction signals. At the site under investigation, such signals have been recorded and stored for further analysis. Usually, the heart rate is pre-processed before it is analyzed. In this study, though, each single heartbeat interval is recorded in order to obtain greater precision. The overall aim is the development of an intelligent alarm system which can be employed as a tool for decision support.

[1] This is an extended version of the paper: Ulbricht C., Dorffner G., Lee A.: Forecasting Fetal Heartbeats with Neural Networks, in Bulsari A.B., et al. (eds.), Solving Engineering Problems with Neural Networks, Systeemitekniikan seura ry, Turku, pp. 403-406, 1996.

The first step when processing the given data sets is to detect the artefacts so that they can be removed. In order to improve the detection of artefacts, the next value in the time series can be forecast and compared with the actual value. Values which deviate considerably from the forecast are more likely to be disturbed by measuring errors than those which are close to the forecast ones. As proposed in [Miksch et al., 1995], such a forecasting system could also be used for "repairing" the input signals: instead of replacing missing values by average or preceding ones, they could be replaced by the forecast values, which are more likely to resemble the true values.

2 Tested Neural Network Models

Most neural network research has focused on processing single patterns, but sequence processing requires a method for saving information for subsequent time steps. Overviews of such neural networks, which can be used for handling temporal aspects, are given, for instance, in [Ulbricht et al., 1992], [Mozer, 1993], [Rohwer, 1994], and [Chappelier and Grumbach, 1994]. All the networks tested on the task of forecasting heartbeat intervals had at least a single input unit for the sequence element, three hidden units, and one output unit for the forecast. The six tested variants of network models are depicted in Figs. 1 through 3. The focus lies on the layers and the links between them. The numbers of the layers refer to the order in which they are updated. A dashed arrow denotes a link for copying unit activations, whereas a full arrow denotes a set of links connecting each unit of one layer with all the units of the other layer. The following models were tested:

1. A network with an input window (Fig. 1): In this non-recurrent network, the window is obtained by delaying the single sequence element at the input several times in a row. The resulting window is of size 5, i.e. it contains the five most recent sequence elements $x(t-1), x(t-2), \ldots, x(t-5)$.

2. A network with hidden layer feedback (Fig. 1): In this simple recurrent network, the hidden layer is delayed and fed back as in the network described in [Elman, 1990].

3. A network with an input window of size 5 and hidden layer feedback (Fig. 2): It combines delays with and without feedback.

4. A network with a self-recurrent feedback loop around the input (Fig. 2): In this network, the temporal aspects are handled by delaying and feeding back the activation of a unit to itself. The feedback loop has a weight, denoted $\mu$ here, and the input is weighted by $1-\mu$.

5. A network with an input window of size 5 and self-recurrent feedback loops around all the units in the input window (Fig. 3): It uses both a window and unit feedback.

6. A network with an input window of size 5, hidden layer feedback, and self-recurrent feedback loops around the units in the memory layer (Fig. 3): In the taxonomy presented in [Mozer, 1993], such a memory with self-recurrent feedback loops is referred to as "exponential trace memory," because it contains an exponentially weighted average. This memory can also be regarded as the state of the network. Due to the feedback loops, the state changes more slowly. The speed of change depends on the weights. Such sluggish states can also be obtained by using any other nearly auto-associative next-state function. The important point is that "state vectors at nearby points in time must be similar," as stated in [Jordan, 1986]. The resulting models are better suited to dealing with patterns in sequences which are warped in time, because they have the intrinsic capability of generalizing over the temporal dimension.

Figure 1: Networks 1 and 2

Figure 2: Networks 3 and 4

In all the networks, the weights of the self-recurrent feedback loops were equal to 0.9. The forecast of the window network can be written as:

$\hat{x}(t) = F_1(x(t-1), x(t-2), x(t-3), x(t-4), x(t-5))$   (1)

Here $F_1$ denotes the mapping of the whole neural network.
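The paper gives no code, but the input window used by Network 1 in equation (1) is easy to make concrete. The following minimal sketch (assuming NumPy; the function name and the ordering of window elements are illustrative choices, not taken from the paper) builds (window, target) training pairs of size 5 from an interval sequence, using the scaling to [0, 1] described in Section 3:

```python
import numpy as np

def make_window_pairs(series, window_size=5):
    """Build (input window, target) pairs: each window holds the
    window_size most recent intervals x(t-1)..x(t-5); the target is x(t)."""
    X, y = [], []
    for t in range(window_size, len(series)):
        X.append(series[t - window_size:t][::-1])  # most recent element first
        y.append(series[t])
    return np.array(X), np.array(y)

# Example: intervals in milliseconds, scaled from [0, 1200] ms to [0, 1]
intervals_ms = np.array([430.0, 445.0, 440.0, 455.0, 450.0, 448.0, 452.0])
scaled = intervals_ms / 1200.0
X, y = make_window_pairs(scaled, window_size=5)
print(X.shape, y.shape)  # (2, 5) (2,)
```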

Figure 3: Networks 5 and 6

The forecast of the network with hidden layer feedback is:

$\hat{x}(t) = F_2(x(t-1), \mathbf{h}_1(t-1))$   (2)

where $\mathbf{h}_1(t-1)$ is a vector of length 3 (bold letters are used for vectors and mappings to vectors). Since the hidden layer is part of the feedback loop, it contains information about past inputs:

$\mathbf{h}_1(t-1) = \mathbf{f}_1(x(t-2), \mathbf{h}_1(t-2))$   (3)

where $\mathbf{f}_1$ represents the mapping from the input to the hidden layer. The forecast of the third network is:

$\hat{x}(t) = F_3(x(t-1), x(t-2), x(t-3), x(t-4), x(t-5), \mathbf{h}_2(t-1))$   (4)

where $\mathbf{h}_2(t-1)$ is:

$\mathbf{h}_2(t-1) = \mathbf{f}_2(x(t-2), x(t-3), x(t-4), x(t-5), x(t-6), \mathbf{h}_2(t-2))$   (5)

The fourth network has unit feedback in the input layer:

$\hat{x}(t) = F_4((1-\mu)\, x(t-1) + \mu\, i_{1,1}(t-1))$   (6)

where $i_{1,1}(t)$ stands for the single element of the contents of the input layer $\mathbf{i}_1(t)$:

$i_{1,1}(t) = (1-\mu)\, x(t-1) + \mu\, i_{1,1}(t-1)$   (7)

For the network with an input window and self-recurrent feedback loops, the output is:

$\hat{x}(t) = F_5\bigl((1-\mu)\, x(t-1) + \mu\, i_{2,1}(t-1),\; (1-\mu)\, x(t-2) + \mu\, i_{2,2}(t-1),\; (1-\mu)\, x(t-3) + \mu\, i_{2,3}(t-1),\; (1-\mu)\, x(t-4) + \mu\, i_{2,4}(t-1),\; (1-\mu)\, x(t-5) + \mu\, i_{2,5}(t-1)\bigr)$   (8)

In this equation, $i_{2,1}(t-1), \ldots, i_{2,5}(t-1)$ are the components of the vector $\mathbf{i}_2(t-1)$ representing the input layer.
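The self-recurrent update in equations (6) to (8) is just an exponentially weighted running average. As a minimal NumPy sketch (the function name and the zero initial state are assumptions; the loop weight 0.9 is the value stated above for all networks):

```python
import numpy as np

def exponential_trace(series, mu=0.9):
    """Self-recurrent unit update as in equations (6)-(8):
    trace(t) = (1 - mu) * x(t-1) + mu * trace(t-1).
    With mu = 0.9 the stored value changes slowly and acts as an
    exponentially weighted average of the past inputs."""
    trace = np.zeros(len(series))
    for t in range(1, len(series)):
        trace[t] = (1.0 - mu) * series[t - 1] + mu * trace[t - 1]
    return trace

# Example: the trace smooths out a sudden jump in a (scaled) interval sequence
x = np.array([0.37, 0.38, 0.37, 0.80, 0.38, 0.37, 0.38])
print(np.round(exponential_trace(x), 3))
```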

Finally, the forecast obtained with the sixth network can be described as:

$\hat{x}(t) = F_6\bigl(x(t-1), x(t-2), x(t-3), x(t-4), x(t-5),\; (1-\mu)\, h_{3,1}(t-1) + \mu\, s_1(t-1),\; (1-\mu)\, h_{3,2}(t-1) + \mu\, s_2(t-1),\; (1-\mu)\, h_{3,3}(t-1) + \mu\, s_3(t-1)\bigr)$   (9)

where $h_{3,1}(t-1), \ldots, h_{3,3}(t-1)$ are the components of the hidden layer $\mathbf{h}_3(t-1)$, and where $s_1(t-1), \ldots, s_3(t-1)$ are the components of the state layer $\mathbf{s}(t-1)$.

3 Comparative Analysis

A sequence consisting of 1200 elements was used as the training set. The validation set, which contained 600 elements, was used to determine when to stop training. Another 600 sequence elements were used for testing. A segment of the sequence of heartbeats is depicted in Fig. 4. The heartbeats were measured in milliseconds. The interval ranging from 0 to 1200 milliseconds was transformed for the network to the interval ranging from zero to one. The mean square error (MSE) is taken as the measure for evaluating the performance:

$\mathrm{MSE} = \frac{1}{N} \sum_{n=1}^{N} \bigl(x_d(n) - x_t(n)\bigr)^2$   (10)

where $x_d(n)$ is the $n$-th network output and $x_t(n)$ the $n$-th target output out of $N$ instances. An auto-regressive model, an AR[1] model as described in [Box and Jenkins, 1970],

$x(t) = \phi\, x(t-1) + \varepsilon(t)$   (11)

is set up for comparison with the networks. With $\phi$ equal to one, a random walk process is modeled. When using this model for forecasting, the estimate for $x(t)$ is equal to $x(t-1)$. If it turns out that this is the best estimate, it can only be said that subsequent intervals are likely to be similar. However, if better forecasting models can be found, more can be said about the sequence of beats. For such an AR[1] model with $\phi$ equal to one, the MSE on the test set is 0.0047.

Each type of network was tested three times with random weight initialization. The MSE on the test set is given in Table 1 for all the experiments, and the mean of the three experiments is provided in addition. The results are visualized in Fig. 5, which shows the MSE averaged over the three tests for each network. It turns out that it is possible to obtain better forecasts with appropriately designed neural networks than with a simple AR[1] model. The networks combining an input window and hidden layer feedback (Networks 3 and 6) perform best. According to the t-test, the results of these two networks are significantly better at the 95% level.
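Returning to the AR[1] baseline introduced above, the random-walk forecast ($\phi = 1$) and its MSE from equation (10) can be sketched as follows. The sequence values here are made-up placeholders, not the study's data; the paper's reported baseline MSE of 0.0047 refers to its own test set.

```python
import numpy as np

def mse(forecast, target):
    """Mean square error as in equation (10)."""
    return float(np.mean((forecast - target) ** 2))

def random_walk_forecast(series):
    """AR[1] baseline with phi = 1: the forecast for x(t) is simply x(t-1)."""
    return series[:-1], series[1:]  # (forecasts, targets), aligned pairwise

# Toy sequence of scaled heartbeat intervals (placeholder values only)
test = np.array([0.36, 0.37, 0.36, 0.38, 0.37, 0.39, 0.38])
forecasts, targets = random_walk_forecast(test)
print(round(mse(forecasts, targets), 6))
```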

Figure 4: Sequence of heartbeats (interval in milliseconds plotted against beat number, beats 1200 to 1300)

Figure 5: Overview of the results from forecasting heartbeat intervals (mean MSE per network: Net 1: 0.0061, Net 2: 0.0059, Net 3: 0.0039, Net 4: 0.0073, Net 5: 0.0076, Net 6: 0.0038)

Table 1: Results from forecasting heartbeat intervals

 Net  Window  Hidden    Self-recurrent  MSE                      Mean
 Nr.  Size    Feedback  Feedback        Test 1  Test 2  Test 3   Error
  1     5        -          -           0.0066  0.0053  0.0064   0.0061
  2     1        1          -           0.0078  0.0060  0.0039   0.0059
  3     5        1          -           0.0039  0.0038  0.0041   0.0039
  4     1        -        Input         0.0083  0.0071  0.0066   0.0073
  5     5        -        Window        0.0077  0.0077  0.0073   0.0076
  6     5        1        Memory        0.0038  0.0038  0.0038   0.0038

Moreover, the following points result from an analysis of the mean errors:

Network 3, having both an input window and hidden layer feedback, performs very well in all the experiments. This demonstrates how an appropriate combination of non-recurrent and recurrent mechanisms (an input window and hidden layer feedback) can lead to much better results than using only one of the two mechanisms, as is the case in Networks 1 and 2.

Self-recurrent feedback loops in the input layer, as used in Networks 4 and 5, are not well suited to this application. They do not suffice to provide enough information about the sequence. This is a typical result, as self-recurrent input feedback alone is not sufficient for capturing the temporal aspects which are relevant to most types of applications.

The network which combines several sequence-handling methods (Network 6) performs even slightly better than the same type of network without self-recurrent feedback loops (Network 3). This shows that unit feedback in the state layer, which leads to a slowly changing state, can improve the performance of the network.
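The text reports that the advantage of Networks 3 and 6 is significant at the 95% level according to the t-test, without spelling out the exact test used. One possible reading, comparing the three per-run test MSEs from Table 1 with an independent two-sample t-test, is sketched below; the pairing of networks and the test variant are assumptions for illustration only.

```python
from scipy import stats

# Per-run test MSEs from Table 1
net1 = [0.0066, 0.0053, 0.0064]  # input window only
net3 = [0.0039, 0.0038, 0.0041]  # input window + hidden layer feedback

# Independent two-sample t-test (one possible reading of "the t-test" in the text)
res = stats.ttest_ind(net1, net3)
print(f"t = {res.statistic:.2f}, p = {res.pvalue:.4f}")  # p < 0.05 -> significant at the 95% level
```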

4 Conclusion

The given task was to forecast the intervals between fetal heartbeats. The performances of six different neural network models and a simple auto-regressive model were tested empirically. The outcome can be regarded as an example demonstrating that good results can be obtained when various methods are combined in a single neural network. The best performance was obtained with a network which used layer delay, layer feedback, and unit feedback. It can be seen that the outcome is heavily dependent on how the state is formed. Additional self-recurrent feedback loops around the state layer can even slightly improve the network performance. They make the state change more slowly, which is better for dealing with time-warped sequences. Even though the results are different for each application, it can be concluded that combining several techniques for processing the temporal aspects inherent to the input sequence seems to be reasonable. Especially the combination of an input window and layer feedback turns out to lead to good results; it can work better than an input window or layer feedback alone.

Acknowledgements

This research is supported by the "Medizinisch-Wissenschaftlicher Fonds des Bürgermeisters der Bundeshauptstadt Wien."

References

[Box and Jenkins, 1970] G.E. Box and G.M. Jenkins. Time Series Analysis. Holden-Day, San Francisco, 1970.

[Chappelier and Grumbach, 1994] J.-C. Chappelier and A. Grumbach. Time in Neural Networks. SIGART Bulletin, Vol. 5, No. 3, pages 3-11, 1994.

[Elman, 1990] J.L. Elman. Finding Structure in Time. Cognitive Science, 14:179-211, 1990.

[Jordan, 1986] M.I. Jordan. Attractor Dynamics and Parallelism in a Connectionist Sequential Machine. In Proceedings of the Eighth Annual Conference of the Cognitive Science Society, pages 531-546. Erlbaum, Hillsdale, NJ, 1986.

[Miksch et al., 1995] S. Miksch, W. Horn, C. Popow, and F. Paky. Automated Data Validation and Repair Based on Temporal Ontologies. Technical Report TR-95-04, Österreichisches Forschungsinstitut für Artificial Intelligence, Wien, 1995.

[Mozer, 1993] M.C. Mozer. Neural Net Architectures for Temporal Sequence Processing. In A. Weigend and N. Gershenfeld, editors, Predicting the Future and Understanding the Past. Addison-Wesley Publishing, Redwood City, CA, 1993.

[Rohwer, 1994] R. Rohwer. The Time Dimension of Neural Network Models. SIGART Bulletin, Vol. 5, No. 3, pages 36-44, 1994.

[Ulbricht et al., 1992] C. Ulbricht, G. Dorffner, S. Canu, D. Guillemyn, G. Marijuan, J. Olarte, C. Rodriguez, and I. Martin. Mechanisms for Handling Sequences with Neural Networks. In C.H. Dagli et al., editors, Intelligent Engineering Systems through Artificial Neural Networks, ANNIE'92, volume 2, pages 273-278. ASME Press, New York, 1992.