Childhood Obesity Epidemic Analysis Using Classification Algorithms
Suguna. M
M.Phil. Scholar
Trichy, Tamilnadu, India
suguna15.9@gmail.com
Abstract
Obesity is one of the most serious public health challenges of the 21st century. Globally there are more than one billion overweight adults, and this number is increasing threefold. In today's world people are affected by many diseases, and one disease often leads to others. Obesity is such a condition. In this paper several data mining classification algorithms are used to detect obesity and overweight conditions in children and to investigate their causes.
Keywords: Data mining, classification, childhood obesity, J48, CART, Naive Bayes
I. INTRODUCTION
Data mining is the process of extracting knowledge from large volumes of data. It is also called knowledge discovery in databases (KDD) and is performed as an iterative sequence of steps. Classification maps data into predefined groups or classes; it is often referred to as supervised learning because the classes are determined before the data are examined. A classification task begins with a data set in which the class assignments are known. Medical diagnosis is a classification process.
Fat is one of the main macronutrients. It plays an important role in protecting vital organs, maintains body temperature and stores calories for future use. However, when stored in excess it leads to obesity. Obesity is a serious worldwide health epidemic that affects one in four Americans. The phenomenon is global: about 30 million Indians are obese, and this number is expected to double in the next five years. Obesity is the condition that arises when body fat has accumulated to an extent that may have a negative effect on health. It also leads to many other diseases such as diabetes, sleep apnea, heart disease, stroke, gout, gallbladder disease, high blood pressure, osteoarthritis and even certain types of cancer.
Childhood obesity is one of the most serious public health challenges of the 21st century. Children and adolescents who are obese are likely to be obese as adults and are likely to have many health-related and psychological problems. In past years obesity has doubled in children and quadrupled in adolescents.
The paper is organized as follows: Section II presents related works, Section III the data set description, Section IV the proposed model, Section V the experimental results and Section VI the conclusion.
II. RELATED WORKS
Adnan M.H.M et al. [4] review various data mining methods and their use in predicting obesity, and point out the merits and demerits of particular data mining techniques. Adnan M.H.M et al. [5] provide a novel framework for childhood obesity prediction using NBtree. Abduh Elbanna et al. [6] apply data mining in gastroenterology and surgery, covering obesity-related gastrointestinal motility disorders (IBS) and bariatric surgical approaches. The Naive Bayes algorithm was used to obtain the results, and the RapidMiner tool was employed. Alex Aussem et al. [7] provide a framework for the analysis of visceral obesity and its determinants in women using Bayesian Networks. Archana Bhattarai et al. [8] use text classification to predict obesity and its co-morbidities. Various algorithms such as Naive Bayes, Support Vector Machine (SVM) and J48, along with extraction, were used. Seyed Taghi Heydari et al. [10] performed a comparative study on the detection of obesity using Artificial Neural Networks and Logistic Regression.
They concluded that Artificial Neural Networks provide better results than Logistic Regression; however, they cited the requirement of a customized classifier for the neural networks. Sivaranjani T. [11] provides a comparative study on the prediction of obesity using the KNN and ID3 algorithms. She found, using the RapidMiner tool, that the ID3 algorithm works efficiently and provides more accurate results than the KNN algorithm. Shaoyan Zhang et al. [12] compared logistic regression with six data mining techniques: decision trees (C4.5), association rules, Artificial Neural Networks (ANN), Naive Bayes, Bayesian networks and Support Vector Machines (SVM). They concluded that SVM provides more accurate results than the other five data mining techniques. Sunita Soni et al. [13] discussed Case Based Reasoning (CBR). Three data mining techniques, namely Nearest Neighborhood, Decision Tree and Bayesian Classification, were applied on distributed case bases for case retrieval and case adaptation. Their work discusses the use of data mining technology in ...
RES Publication 2012 Page 22
III. DATA SET DESCRIPTION
The data are collected from the Child and Adolescent Health Measurement Initiative (CAHMI): DRC Indicator Dataset, 2007 National Survey of Children's Health, Data Resource Center for Child and Adolescent Health, www.childhealthdata.org. The objective of the data set is to analyze obesity among children aged 10 to 17. The data set consists of 12 attributes that are used to predict obesity. A detailed description of the data set is given in Table 3.1.
Table 3.1 Data set description
No   Name of the attribute   Description
1    CID                     Children Id
2    Age                     Age of the children
3    Gender                  Gender of the children
4    Weight                  Weight (pounds)
5    Height                  Height (m)
6    Sleep hour              Sleep time
7    Exercise time           Exercise time
8    Time on computer        Time spent on the computer
9    Watching TV hours       Time spent watching TV
10   Depression              Depression level
11   Health status           Status of health
12   Body Mass Index         Value
13   CLS                     Class
14   Category                Category of the ...
The attributes are described according to their data type. The data types used are numerical and nominal: Category and Gender take nominal values, and the remaining attributes take numerical values.
IV. PROPOSED MODEL
In the proposed method decision trees are mainly used to predict obesity from the instances of the given data set. The framework is given below.
Table 4.1: Proposed Framework (stages shown: Input, Data Preprocessing, Collect DB, Data Cleaning, Decision Tree Algorithm, Output)
In the proposed model three different types of decision tree algorithms, namely Simple Cart, J48 and NB Tree, are applied to the obesity data set in the WEKA tool and their performance is calculated.
4.1 Simple Cart
CART stands for Classification And Regression Tree. It is a greedy algorithm used to build a binary decision tree: at each stage of the process it chooses the locally best discriminatory feature. As with ID3, an impurity measure such as entropy is used to choose the best splitting attribute and criterion (standard CART implementations typically use the Gini index). Unlike ID3, where a child is created for each subcategory, only two children are created. The splitting is performed around what is determined to be the best split point. At each step an exhaustive search is performed to determine the best split.
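The exhaustive search for the best binary split described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the WEKA Simple Cart implementation: it scores every candidate threshold on a single numeric attribute with weighted entropy, as the text describes, and the function names `entropy` and `best_binary_split` as well as the sample BMI values are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())


def best_binary_split(values, labels):
    """Exhaustively try every midpoint between distinct attribute values.

    Returns (threshold, weighted_entropy) for the split `value <= threshold`
    that gives the lowest weighted entropy of the two children, or
    (None, parent_entropy) if no split improves on the parent node.
    """
    n = len(labels)
    best_threshold, best_score = None, entropy(labels)
    distinct = sorted(set(values))
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [lab for v, lab in zip(values, labels) if v <= threshold]
        right = [lab for v, lab in zip(values, labels) if v > threshold]
        score = len(left) / n * entropy(left) + len(right) / n * entropy(right)
        if score < best_score:  # keep only strict improvements
            best_threshold, best_score = threshold, score
    return best_threshold, best_score


if __name__ == "__main__":
    # Hypothetical BMI values and class labels, for illustration only.
    bmi = [18.0, 19.0, 20.0, 24.0, 26.0, 30.0]
    cls = ["Normal", "Normal", "Normal", "Obese", "Obese", "Obese"]
    print(best_binary_split(bmi, cls))
```

Because only two children are produced per split, a multi-class attribute such as the obesity category is separated by applying this search repeatedly to each child node.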
CART handles missing data by simply ignoring the affected record when calculating the goodness of a split on that attribute. The tree stops growing when no split will improve the performance. CART also includes a pruning strategy.
4.2 J48 Algorithm
J48 builds decision trees from a set of labeled training data using the concept of information entropy. It uses the fact that each attribute of the data can be used to make a decision by splitting the data into smaller subsets. J48 examines the normalized information gain (difference in entropy) that results from choosing an attribute for splitting the data. To make the decision, the attribute with the highest normalized information gain is used. The algorithm then recurs on the smaller subsets. The splitting procedure stops if all instances in a subset belong to the same class; a leaf node is then created in the decision tree indicating that class. It can also happen that none of the features gives any information gain. In this case J48 creates a decision node higher up in the tree using the expected value of the class. J48 can handle both continuous and discrete attributes, training data with missing attribute values, and attributes with differing costs. It also provides an option for pruning trees after creation.
4.3 NB Tree
Naive Bayes classification is a type of Bayesian classification. It is a simple classification technique which assumes that the effect of an attribute value on a given class is independent of the values of the other attributes. This assumption is called class conditional independence. It exhibits high accuracy and speed when applied to large databases, and its results are often compared with those of decision tree and neural network classifiers. NB Tree combines this classifier with a decision tree by placing Naive Bayes models at the leaves.
V. EXPERIMENTAL RESULTS
The three decision tree algorithms, Simple Cart, J48 and NB Tree, are applied to the obesity data set in WEKA, and the performance of each algorithm is reported based on various factors. Performance is measured by the time taken to build the tree and by the number of correctly classified instances.
Fig 5.1 Performance of the Algorithms based on the time taken
Fig 5.2 Performance of the Algorithms based on the accuracy
Table 5.1 Time taken by the algorithms to build the decision tree
Name of the Algorithm   Time taken to build the decision tree
Simple Cart             0.03 seconds
J48                     0.01 seconds
NB Tree                 0.45 seconds
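As a companion to Section 4.2, the normalized information gain (gain ratio) used to choose a splitting attribute can be sketched in Python. This is a minimal illustration for a nominal attribute, not WEKA's J48 code; the function names `entropy` and `gain_ratio` and the toy attribute values are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(items):
    """Shannon entropy (in bits) of a list of hashable items."""
    n = len(items)
    return sum(-(c / n) * log2(c / n) for c in Counter(items).values())


def gain_ratio(attribute_values, labels):
    """Information gain of a nominal attribute, normalized by split information."""
    n = len(labels)
    # Group the class labels by the attribute value of each instance.
    groups = defaultdict(list)
    for value, label in zip(attribute_values, labels):
        groups[value].append(label)
    # Expected entropy of the class after splitting on the attribute.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    # Split information penalizes attributes with many distinct values.
    split_info = entropy(attribute_values)
    return info_gain / split_info if split_info > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical nominal attribute and class labels, for illustration only.
    exercise = ["low", "low", "high", "high"]
    category = ["Obese", "Obese", "Normal", "Normal"]
    print(gain_ratio(exercise, category))
```

The attribute with the highest value of `gain_ratio` would be selected at each node, after which the same computation is repeated on each resulting subset.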
The decision tree model for the obesity data set is given by the following tree structure.
Fig 5.3 Decision tree for obesity dataset (split attribute: Body Mass Index; leaf classes: Underweight, Normal, Overweight, Obese)
The above decision tree algorithm predicts the class label. The final output is a set of patterns used to determine the class into which each child falls: Underweight, Normal, Overweight or Obese.
A confusion matrix is a useful visualization tool for analyzing classifier accuracy. The structure of the confusion matrix is given below.
Table 5.2: Structure of the Confusion Matrix (cells: TP, FP, FN, TN)
Where,
TP is True Positive, obese children correctly diagnosed.
FP is False Positive, normal children incorrectly identified as obese.
TN is True Negative, normal children correctly identified as healthy.
FN is False Negative, obese children incorrectly identified as healthy.
The confusion matrix obtained by executing the decision tree classification algorithms Simple Cart, NB Tree and J48 using the WEKA tool is given below.
Table 5.3 Confusion matrix for Simple CART, NB Tree and J48
191    0    0    0
  0   34    0    0
  0    0   33    0
  0    0    0   18
From the various factors, namely the time taken to build the decision tree, the accuracy and the confusion matrix, we conclude that the J48 algorithm provides the better result.
VI. CONCLUSION
Data mining plays a vital role in mining large databases, and its use in uncovering hidden information is very valuable in medical mining. In the medical field classification techniques have high utility. This experimental model was built as a test case on the training dataset. The experiment was performed successfully on the children's health training dataset with several data mining classification algorithms, and the J48 algorithm was found to provide the best performance, with 100% accuracy and the minimum time taken.
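The 100% accuracy figure can be checked directly against the confusion matrix reported in Table 5.3. The short Python sketch below uses the matrix values from the table; the helper names `accuracy` and `per_class_recall` are hypothetical, and the class order of the rows is not assumed.

```python
# Confusion matrix from Table 5.3 (rows: actual class, columns: predicted class).
confusion = [
    [191, 0, 0, 0],
    [0, 34, 0, 0],
    [0, 0, 33, 0],
    [0, 0, 0, 18],
]


def accuracy(matrix):
    """Fraction of instances on the diagonal (correctly classified)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def per_class_recall(matrix):
    """Correctly classified fraction of each actual class (one value per row)."""
    return [row[i] / sum(row) for i, row in enumerate(matrix)]


if __name__ == "__main__":
    print(sum(sum(row) for row in confusion))  # total number of instances
    print(accuracy(confusion))
    print(per_class_recall(confusion))
```

All 276 instances lie on the diagonal, so every off-diagonal count (the misclassifications) is zero. Since the three algorithms share this matrix, the time taken to build the tree is the factor that distinguishes them.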
It is believed that data mining can help obesity research and improve the health care of people with overweight and obesity. The approach can also be implemented using several other classification techniques.
REFERENCE
[1] Han, J., Kamber, M., Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers.
[2] Margaret H. Dunham, Data Mining Techniques and Algorithms, Prentice Hall Publishers.
[3] Arun K. Pujari, Data Mining Techniques, University Press (India) Private Limited, 2001.
[4] Adnan, M.H.M., Husain, W., Rashid, N.A.A., A survey on utilization of data mining for childhood obesity prediction, Information and Telecommunication Technologies (APSITT), 2010 8th Asia-Pacific Symposium on, pp. 1-6, 15-18 July 2010.
[5] Adnan, M.H.M., Husain, W., Rashid, N.A.A., A framework for childhood obesity classifications and predictions using NBtree, Information Technology in Asia (CITA 11), 2011 7th International Conference, pp. 1-6, 12-13 July 2011.
[6] Abduh Elbanna, AbdElrazek M., Aly AbdElrazek, Is Advanced Statistical Computing Technology a Clue in Applied Medicine? A Study using Data Mining as a Predictor Technology in Gastroenterology & Bariatric Surgery; Novel Elbanna Operations, Global Journal of Computer Science and Technology: Software & Data Engineering, Volume 13, Issue 12, Version 1.0, 2013.
[7] Aussem et al., Analysis of lifestyle and metabolic predictors of visceral obesity with Bayesian Networks, BMC Bioinformatics 2010, 11:487.
[8] Archana Bhattarai, Vasile Rus, Dipankar Dasgupta, Classification of Clinical Conditions: A Case Study on Prediction of Obesity and its Co-morbidities, Department of Computer Science, The University of Memphis, 209 Dunn Hall.
[9] Lalitha Saroja Thota et al., A Review on Information Technology in Obesity Epidemic: Prediction and Prevention, International Journal of Advanced Research in Computer Science and Software Engineering 3(9), September 2013, pp. 775-780.
[10] Seyed Taghi Heydari et al., Comparison of Artificial Neural Networks with Logistic Regression for Detection of Obesity, Springer, April 2011.
[11] T. Sivaranjani, Comparative Study on Obesity Based on ID3 and KNN, International Journal of Advanced Research in Computer Science and Management Studies, vol. 2, Issue 9, pp. 389-396, September 2014.
[12] Shaoyan Zhang, C.T., Xiaojun Zeng, Hong Qiao, Iain Buchan, John Keane, Comparing data mining methods with logistic regression in childhood obesity prediction, Information Systems Frontiers, 2009, 11(4): p. 51.
[13] Soni, S., Pillai, J., "Usage of Nearest Neighborhood, Decision Tree and Bayesian Classification Techniques in Development of Weight Management Counseling System," Emerging Trends in Engineering and Technology, 2008. ICETET '08. First International Conference, pp. 691-694, 16-18 July 2008.
BIOGRAPHY
M. Suguna, M.Sc, B.Ed, M.Phil, completed her M.Phil at Bishop Heber College, Trichy (2014-2015). Paper published: A Technical Review on Obesity Analysis using Classification Algorithms, International Journal of Applied Engineering Research, ISSN 0973-4562, Vol. 10, No. 55 (2015).