Available online at www.sciencedirect.com

ScienceDirect

Procedia Computer Science 61 (2015) 18-23
doi:10.1016/j.procs.2015.09.138

Complex Adaptive Systems, Publication 5
Cihan H. Dagli, Editor in Chief
Conference Organized by Missouri University of Science and Technology
2015 - San Jose, CA

Semi-Supervised Clustering for Sparsely Sampled Longitudinal Data

Mariko Takagishi a, Hiroshi Yadohisa b,*

a Graduate School of Culture and Information Science, Doshisha University, Kyoto, 610-0394, Japan
b Department of Culture and Information Science, Doshisha University, Kyoto, 610-0394, Japan

* Corresponding author. Tel.: +81 774 65 7657.
E-mail addresses: applesan728@gmail.com (Mariko Takagishi), hyadohis@mail.doshisha.ac.jp (Hiroshi Yadohisa)

Abstract

Longitudinal data studies track the measurements of individual subjects over time. The features of the hidden classes in longitudinal data can be effectively extracted by clustering. In practice, however, longitudinal data analysis is hampered by sparse sampling and by sampling points that differ among subjects. These problems have been overcome by adopting a functional data clustering approach for sparsely sampled data, but this approach is unsuitable when the difference between classes is small. We therefore propose a semi-supervised approach for clustering sparsely sampled longitudinal data, in which the clustering result is aided and biased by certain labeled subjects. The effectiveness of the proposed method was evaluated in simulations. The proposed method proved especially effective when the difference between classes is blurred by interference such as noise. In summary, by adding some subjects with class information, we can enhance the existing information and realize successful clustering.

© 2015 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). Peer-review under responsibility of the scientific committee of Missouri University of Science and Technology.

Keywords: clustering; functional data; sparse

1. Introduction

Longitudinal data are measurements taken repeatedly from individual subjects at multiple time points. Longitudinal data analysis is often hampered by two problems: the data are sparsely sampled, and the sampling points differ among subjects.

To overcome the first problem, the data acquired over time can be analyzed using several basis functions, an approach known as functional data analysis. In addition, functional data analysis approaches modified for sparsely sampled data have been proposed, such as classification [9] and clustering [4]. There is an important difference between classification and clustering: the former groups subjects with given class labels; the latter groups subjects without class labels. In machine-learning terminology, classification and clustering are referred to as supervised and unsupervised learning, respectively. If class labels are assigned to only some of the subjects, the learning is called semi-supervised learning [1]. Semi-supervised learning for subject grouping can be categorized into semi-supervised classification and semi-supervised clustering. Semi-supervised classification adds unlabeled subjects to improve the generalization of the model, whereas semi-supervised clustering uses labeled subjects to aid and bias the clustering results.

In this study, we propose a semi-supervised clustering model based on a functional data approach for sparsely sampled longitudinal data. As related work, Kawano and Konishi proposed semi-supervised logistic discrimination for functional data [5]; however, their method is a semi-supervised classification that can be applied only when the initial labeled subjects provide information on all classes, and often the information on some classes is unavailable. Moreover, their method is not designed for sparsely sampled data. We therefore extend the functional clustering model (FCM) for sparsely sampled data proposed by James and Sugar [4] so that the proposed model can utilize existing class labels to aid the clustering result. In simulations, we investigate the effectiveness of the proposed method in a situation that we consider feasible in longitudinal data analysis.

2. Clustering model for sparsely sampled data with class labels

Our proposed clustering model for sparsely sampled longitudinal data exploits existing class labels. This section introduces the model and the objective function, and then derives the update formulas that estimate the parameters.

2.1. Model

The given data of subject $i$ are represented by two vectors: an $L_i$-dimensional observation vector $\mathbf{y}_i$, which contains the observed values for subject $i$ at each time point, and an $L_i$-dimensional time-point vector $\mathbf{t}_i$, which contains the times at which the observed values were obtained. We introduce $p$ basis functions to represent the observation vector, where the basis functions are natural cubic splines. The basis function matrix for subject $i$ is then the $L_i \times p$ matrix $S_i = (s_1(\mathbf{t}_i), \ldots, s_p(\mathbf{t}_i))$. In the proposed model, the observation vector of each subject is constructed as a linear combination of the basis functions, and the spline coefficients are modeled using a $p$-dimensional vector $\boldsymbol{\lambda}_0$ and a $p \times h$ matrix $\Lambda$ (where $h \le \min(p, K-1)$), which are common to all subjects, an $h$-dimensional vector $\boldsymbol{\alpha}_k$ $(k = 1, \ldots, K)$, which is common to each class, and a $p$-dimensional random vector $\boldsymbol{\gamma}_i$. Moreover, for some of the subjects, the class label vector $\mathbf{c}_i = (c_{ik})$ is given, for which

$$c_{ik} \in \{0, 1\}, \qquad \sum_{k=1}^{K} c_{ik} = 1 \qquad (i = 1, \ldots, m), \quad m < n, \tag{2.1}$$

and $c_{ik} = 1$ indicates that subject $i$ belongs to class $k$. In this formulation, the proposed model is written as

$$\mathbf{y}_i = S_i \boldsymbol{\eta}_i + \boldsymbol{\epsilon}_i, \tag{2.2a}$$

$$\boldsymbol{\eta}_i = \boldsymbol{\lambda}_0 + \Lambda \boldsymbol{\alpha}_k + \boldsymbol{\gamma}_i, \tag{2.2b}$$

where $k$ is the class to which subject $i$ belongs, $\boldsymbol{\epsilon}_i$ is an $L_i$-dimensional error vector, and the class effects are constrained by

$$\sum_{k=1}^{K} \boldsymbol{\alpha}_k = \mathbf{0}. \tag{2.3}$$
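To make the structure of (2.2a) and (2.2b) concrete, the following is a minimal Python sketch that builds a spline basis matrix for one sparsely sampled subject and draws that subject's observations from the model. It is illustrative only: it uses a truncated-power cubic basis rather than the natural cubic splines used in the paper, and all parameter values (the knots, `lam0`, `Lam`, `alpha`, `Gamma`, `sigma2`) are our placeholders, not values from the paper.

```python
import numpy as np

def spline_basis(t, knots):
    """Truncated-power cubic spline basis evaluated at time points t.

    Columns: 1, t, t^2, t^3, and (t - kappa)_+^3 for each interior knot.
    (The paper uses natural cubic splines; this simpler basis is for
    illustration only.)
    """
    t = np.asarray(t, dtype=float)
    cols = [np.ones_like(t), t, t ** 2, t ** 3]
    cols += [np.clip(t - kappa, 0.0, None) ** 3 for kappa in knots]
    return np.column_stack(cols)  # L_i x p basis matrix S_i

rng = np.random.default_rng(0)
knots = [25.0, 50.0]
p, h, K = 6, 1, 2                          # p = 4 + len(knots); h <= min(p, K - 1)

# Placeholder parameters, common to all subjects (illustrative values only).
lam0 = rng.normal(size=p)                  # lambda_0: mean-curve coefficients
Lam = rng.normal(size=(p, h))              # Lambda: low-dimensional loadings
alpha = np.array([[1.0], [-1.0]])          # alpha_k for K = 2, satisfying (2.3)
Gamma = 0.05 * np.eye(p)                   # covariance of the random vector gamma_i
sigma2 = 1.0                               # error variance

# One sparsely sampled subject belonging to class k = 1.
L_i = 4                                    # only a few observations per subject
t_i = np.sort(rng.uniform(0, 75, size=L_i))
S_i = spline_basis(t_i, knots)             # basis matrix in (2.2a)

gamma_i = rng.multivariate_normal(np.zeros(p), Gamma)
eta_i = lam0 + Lam @ alpha[0] + gamma_i    # coefficient model (2.2b)
y_i = S_i @ eta_i + rng.normal(0.0, np.sqrt(sigma2), size=L_i)  # observation model (2.2a)
print(t_i, y_i)
```

Because the subject-specific coefficients $\boldsymbol{\eta}_i$ live in a common $p$-dimensional space, subjects observed at entirely different time points can still be compared through their coefficients, which is what makes the model usable for sparse, irregular sampling.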

Here, $S$ contains the basis matrix values at the given time points of all subjects, and $\Lambda$ is subject to a normalization constraint (2.4). The proposed model (2.2a), (2.2b) is applicable to sparsely sampled data because the random vector $\boldsymbol{\gamma}_i$ in the coefficients captures individual variability. With the constraint (2.3), $S\boldsymbol{\lambda}_0$ represents the mean curve of all subjects. Another characteristic of this model is that, through this formulation, the class effects $\boldsymbol{\alpha}_k$ can be visualized in a low-dimensional space. In addition, under the constraint (2.4), the distance between each subject and the class mean can be visualized in a Euclidean metric (for details, see [4]). Moreover, if no class label vector is given in model (2.2a), i.e., no class label information is provided, the proposed model (2.2a), (2.2b) reduces to FCM.

2.2. Objective function

To estimate the parameters in the proposed model (2.2a), (2.2b), we use an expectation-maximization (EM) algorithm [2]. To derive the objective function maximized in the EM algorithm, latent $K$-dimensional random variables are assigned to the unlabeled subjects. Let $\mathbf{z}_i = (z_{ik})$ $(k = 1, \ldots, K)$ be a latent random variable such that

$$z_{ik} \in \{0, 1\}, \qquad \sum_{k=1}^{K} z_{ik} = 1 \qquad (i = m+1, \ldots, n), \tag{2.5}$$

where $z_{ik} = 1$ indicates that subject $i$ belongs to class $k$. Note that the same notation describes the class label vector (2.1); the difference is that $\mathbf{z}_i$ is unobservable. Under constraint (2.5), $\mathbf{z}_i$ follows a multinomial distribution. Therefore, the sample log-likelihood based on all random variables in the model, $\mathbf{z}_i$ $(i = m+1, \ldots, n)$, $\mathbf{y}_i$, and $\boldsymbol{\gamma}_i$, can be written as in (2.6).

2.3. Parameter estimation

The EM algorithm for estimating the parameters in the proposed model proceeds as follows.

Initialize parameters: Randomly allocate initial values to all parameters.

E-step: Calculate the conditional expectations of the latent variables in (2.6), namely $\mathbf{z}_i$ and $\boldsymbol{\gamma}_i$.

M-step: Update the parameters using the conditional expectations calculated in the E-step.

Check for convergence: Terminate the calculation if the change in the objective function between two consecutive steps is less than a convergence criterion; otherwise, return to the E-step.
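The closed-form update formulas were given in the original typeset equations. As an illustration of the semi-supervised EM idea only, the sketch below clamps the responsibilities of labeled subjects to their known classes and estimates posterior responsibilities for unlabeled ones, using a plain Gaussian mixture over per-subject feature vectors as a simplified stand-in for the full random-effects model; the function name and all simplifications are ours, not the authors'.

```python
import numpy as np
from scipy.stats import multivariate_normal

def semi_supervised_em(X, labels, K, n_iter=100, tol=1e-3, seed=0):
    """Toy semi-supervised EM on feature vectors X (n x d).

    labels[i] is the known class of subject i, or -1 if unlabeled.
    This is a plain Gaussian mixture, i.e., a simplified stand-in for
    the paper's random-effects model; the semi-supervised ingredient
    is that labeled subjects keep their responsibilities fixed.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    mu = X[rng.choice(n, K, replace=False)].astype(float)   # initial means
    cov = np.stack([np.cov(X.T) + 1e-6 * np.eye(d)] * K)
    pi = np.full(K, 1.0 / K)
    prev_ll = -np.inf

    for _ in range(n_iter):
        # E-step: responsibilities; clamp labeled subjects (cf. (2.1)).
        logp = np.stack([np.log(pi[k]) +
                         multivariate_normal.logpdf(X, mu[k], cov[k])
                         for k in range(K)], axis=1)
        r = np.exp(logp - logp.max(axis=1, keepdims=True))
        r /= r.sum(axis=1, keepdims=True)
        for i, lab in enumerate(labels):
            if lab >= 0:
                r[i] = 0.0
                r[i, lab] = 1.0

        # M-step: weighted updates of mixing proportions, means, covariances.
        Nk = r.sum(axis=0)
        pi = Nk / n
        for k in range(K):
            mu[k] = (r[:, k] @ X) / Nk[k]
            Xc = X - mu[k]
            cov[k] = (r[:, k] * Xc.T) @ Xc / Nk[k] + 1e-6 * np.eye(d)

        # Convergence monitor: mixture log-likelihood (label clamping ignored).
        ll = np.logaddexp.reduce(logp, axis=1).sum()
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return r.argmax(axis=1)
```

In the paper's model, the E-step additionally computes the conditional expectations of $\boldsymbol{\gamma}_i$, and the M-step updates $\boldsymbol{\lambda}_0$, $\Lambda$, $\boldsymbol{\alpha}_k$, and the variance parameters; the clamping of labeled responsibilities is the mechanism by which labeled subjects aid and bias the clustering.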

3. Simulations

The proposed method is demonstrated in a situation that is expected to highlight its advantage, and which we consider feasible in longitudinal data analysis. Given this situation, we generate artificial data and compare the clustering results of the proposed method and FCM [4].

3.1. Situation settings

As mentioned above, we evaluate a conceivably realistic situation that highlights the advantages of the proposed method. Suppose we are given measurements, e.g., indicators of disease progress, that change over time in one of two patterns: stable (cluster 1) or gradually increasing (cluster 2). The measured subjects are divided into three groups: those whose measurements remain stable over time (group 1), those whose measurements will increase at later times (group 2), and those whose measurements have already increased (group 3). Subjects in group 1 belong to cluster 1, whereas subjects in groups 2 and 3 belong to cluster 2. In addition, the class label of subjects in groups 1 and 2 is unknown, whereas that of subjects in group 3 is known (in the artificial data, the class label of every subject is in fact known). In this scenario, we expect that clustering can detect in advance those subjects whose measurements will later increase (group 2). If the measurement indicates the progress of a disease, clustering could thus help prevent incipient disease progression.

3.2. Data and evaluation procedures

In this subsection, we explain the generation of the artificial data and the evaluation of the results. The true functions are $f_1(t) = (1/10) \cdot 1.1^t$ for group 1, $f_2(t) = (1/10) \cdot 1.23^t$ for group 2, and $f_3(t) = (1/10) \cdot 1.24^t$ for group 3 (Fig. 1). The time point $t$ ranges from 0 to 75. The values of the time-point vector of each subject are randomly selected from 0 to 75, and the observation value for subject $i$ at time point $t_{il}$ is given by

$$y_{il} = f_b(t_{il}) + e_{il} \quad (b = 1, 2, 3), \qquad e_{il} \sim N(0, \sigma^2).$$
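A minimal sketch of this generating process, together with the ARI evaluation used later in Section 3.3 (via scikit-learn's `adjusted_rand_score`), is given below. We assume each subject is observed at a random subset of $T$ equally spaced candidate time points, which matches the $n \times T$ layout of Fig. 2; the exact sampling scheme and the number of observations per subject are our assumptions.

```python
import numpy as np
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(1)

def true_f(t, group):
    """True mean curves f_1, f_2, f_3 from Section 3.2."""
    base = {1: 1.1, 2: 1.23, 3: 1.24}[group]
    return (1.0 / 10.0) * base ** t

def generate(n_per_group=15, T=6, sigma2=3.0, n_obs=4):
    """Artificial data: an n x T observation matrix with missing entries.

    Each subject is observed at n_obs of T candidate time points on a
    grid over [0, 75] (our assumption; the paper specifies only T and
    the range of t).
    """
    grid = np.linspace(0, 75, T)
    rows, groups = [], []
    for g in (1, 2, 3):
        for _ in range(n_per_group):
            row = np.full(T, np.nan)                      # NaN = unobserved
            idx = rng.choice(T, size=n_obs, replace=False)
            row[idx] = true_f(grid[idx], g) + rng.normal(0, np.sqrt(sigma2), n_obs)
            rows.append(row)
            groups.append(g)
    return np.array(rows), np.array(groups), grid

Y, groups, grid = generate()
true_cluster = np.where(groups == 1, 1, 2)   # groups 2 and 3 form cluster 2

# Evaluation restricted to groups 1 and 2, as in the paper.
mask = groups <= 2
predicted = true_cluster.copy()              # placeholder for a fitted result
print(adjusted_rand_score(true_cluster[mask], predicted[mask]))
```

In an actual run, `predicted` would be the cluster assignments returned by the proposed method (fitted to all three groups, with group 3 labeled) or by FCM (fitted to groups 1 and 2 only).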

From the artificial data, we constructed two datasets: dataset 1, comprising the subjects in groups 1 and 2 (namely, the unlabeled subjects), and dataset 2, comprising groups 1, 2, and 3. In dataset 2, the subjects in group 3 are labeled as belonging to cluster 2 (Figs. 2 and 3). FCM was then applied to dataset 1, and the proposed method was applied to dataset 2. The number of basis functions was set to 5, and the convergence criterion was 0.001. The number of subjects in each group is n/3. In this simulation, three factors were manipulated: the number of subjects (n), the error variance ($\sigma^2$), and the number of time points (T) in the range of t; as T decreases, the data become sparser. Finally, the clustering result was evaluated by the adjusted Rand index (ARI) [7]. The ARI takes its maximal value of 1 when the underlying clustering structure is perfectly recovered, and it decreases as the recovery worsens. To ensure a fair comparison, the evaluation was restricted to the subjects in groups 1 and 2 in both datasets. For example, if n = 45, the number of subjects in each group is 15, and the 30 subjects in groups 1 and 2 were used to evaluate the ARI for both FCM and the proposed method.

3.3. Results

Figure 4 shows boxplots of the ARI. The ARI decreases as the error variance increases for both FCM and the proposed method. However, for small samples, n* = 30 and 40 (including group 3, n = 45 and 60, respectively), with error variances of 3 and 5, the ARI tends to remain high under the proposed method but drops to 0 under FCM. Meanwhile, despite the high error variance, the performances of FCM and the proposed method are similar at n* = 60. Because FCM uses the data of all subjects in the parameter estimation, it may compensate for the incomplete information of some subjects when the sample is sufficiently large.

4. Conclusions

This study proposed a semi-supervised clustering model for sparsely sampled longitudinal data. The model was formulated, and update formulas for the parameter estimation were derived. The effectiveness of the proposed method was demonstrated in simulations. The method proved particularly effective when the difference between classes was blurred by high noise variance and the number of subjects was relatively small. However, the proposed method performs well only when the measurements of the labeled and unlabeled subjects presumed to be in the same cluster are similar; otherwise, the labeled subjects do not contribute to the clustering result. In practice, therefore, it should be known that the measurements of at least some of the unlabeled subjects change similarly to those of the labeled subjects. Finally, the parameters in the proposed method are obtained by maximizing the likelihood, which does not necessarily yield the best classification performance. In future work, we will therefore modify the objective function to maximize the classification performance.

Fig. 1. True functions for groups 1, 2, and 3, respectively.

Fig. 2. Graphical image of the dataset (as a whole, an n × T matrix). The white parts correspond to dataset 1, whereas the white and gray parts together correspond to dataset 2.

Fig. 3. The left panel shows the artificial dataset with groups 1 and 2, whereas the right panel shows the dataset with groups 1, 2, and 3 (n = 60, T = 6, $\sigma^2$ = 3).

Fig. 4. Boxplots of the ARI. From the left, the boxplots show the results of the proposed method and FCM for $\sigma^2$ = 1, 3, 5, respectively. n* indicates the number of subjects in groups 1 and 2 used to evaluate the ARI, and n indicates the number of subjects in all groups.

References

1. Chapelle O., Schölkopf B., Zien A. Semi-Supervised Learning. MIT Press; 2006.
2. Dempster A.P., Laird N.M., Rubin D.B. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B 1977; 39: 1-38.
3. Green P.J., Silverman B.W. Nonparametric Regression and Generalized Linear Models: A Roughness Penalty Approach. CRC Press; 1993.
4. James G.M., Sugar C.A. Clustering for sparsely sampled functional data. Journal of the American Statistical Association 2003; 98: 397-408.
5. Kawano S., Konishi S. Semi-supervised logistic discrimination for functional data. Bulletin of Informatics and Cybernetics 2012; 44: 1-15.
6. Laird N.M., Ware J.H. Random-effects models for longitudinal data. Biometrics 1982; 38: 963-974.
7. Hubert L., Arabie P. Comparing partitions. Journal of Classification 1985; 2: 193-218.
8. Martinez-Uso A., Pla F., Sotoca J. A semi-supervised Gaussian mixture model for image segmentation. In: Proceedings of the International Conference on Pattern Recognition (ICPR 2010); 2010: 2941-2944.
9. Müller H.G. Functional modelling and classification of longitudinal data. Scandinavian Journal of Statistics 2005; 32: 223-240.
10. Ramsay J.O., Silverman B.W. Functional Data Analysis, 2nd ed. Springer New York; 2005.
11. Rice J.A. Functional and longitudinal data analysis: perspectives on smoothing. Statistica Sinica 2004; 14: 631-648.
12. Verbeke G., Molenberghs G. Linear Mixed Models for Longitudinal Data. Springer New York; 2009.