DATA ANALYSIS IN ROAD ACCIDENTS USING ANN AND DECISION TREE

International Journal of Civil Engineering and Technology (IJCIET)
Volume 9, Issue 4, April 2018, pp. 214-221, Article ID: IJCIET_09_04_023
Available online at http://www.iaeme.com/ijciet/issues.asp?jtype=ijciet&vtype=9&itype=4
ISSN Print: 0976-6308 and ISSN Online: 0976-6316
IAEME Publication, Scopus Indexed

Roop Kumar R
PG Scholar, Department of Computer Science, CHRIST (Deemed to be University), Bengaluru, India

Ramamurthy B
Associate Professor, Department of Computer Science, CHRIST (Deemed to be University), Bengaluru, India

ABSTRACT

Road accidents have become one of the main causes of death globally. One report states that road accidents are the major cause of death other than wars and diseases. The World Health Organization (WHO) Global Status Report on Road Safety 2015 says that over 1.24 million people die every year in road accidents worldwide, and it predicts that this number could increase by 20-50% by 2020. This affects the GDP of a country, and developing countries are affected most adversely. This paper shows the use of data analytics techniques to build prediction models for road accidents, so that these models can be used in real-time scenarios to shape policies and avoid accidents. The paper also identifies the attributes that have a high impact on the accident severity class label.

Keywords: Road accident, Data analysis, ANN, Decision tree, Machine learning, Prediction/Classification, Back Propagation.

Cite this Article: Roop Kumar R and Ramamurthy B, Data Analysis in Road Accidents Using ANN and Decision Tree, International Journal of Civil Engineering and Technology, 9(4), 2018, pp. 214-221.
http://www.iaeme.com/ijciet/issues.asp?jtype=ijciet&vtype=9&itype=4

1. INTRODUCTION

Over 3,400 people die every day and 1,000,000 people are injured or disabled every year in road accidents worldwide.
These numbers are very large compared with the average death rate of the world. According to WHO, road accidents ranked as the 10th leading cause of death in 2015, with 18 deaths per 100,000 [1]. India ranks first in the world in road deaths [2]. The Global Status Report on Road Safety 2013 estimates more than 231,000 deaths every year in India alone due to road accidents, which is higher than the number of people killed in the country's wars. About 18.9 people per 100,000 are killed in accidents [3]. For a developing country, these road accidents can affect the country's economy.

http://www.iaeme.com/ijciet/index.asp 214 editor@iaeme.com

This paper analyses the collected road accident data using various data analytics methodologies. The factors that affect accident severity in the given data are identified using the information gain measure and the chi-square measure. Road accidents can be prevented, and their overall impact on the country and its development can be reduced drastically, if proper models are applied and policies are made. Before deciding on policies and models, we first have to analyse the causes and impact factors. This can be done using data analysis techniques. The knowledge extracted should then be converted into a useful model or policy-making algorithm; this is where machine learning comes into the picture.

Data analysis is a technique that helps discover the knowledge available in data. It gives insight into the data, so that the data takes a more informative and useful form from a business or domain perspective. Different types of learning technique are available, such as supervised learning and unsupervised learning. These techniques can be applied to road accident data to obtain different views of the data.

Machine learning is a branch of AI that helps build intelligent systems by making them learn from existing cases and experience. Many of the techniques are nature inspired. These concepts are available in data analysis as different classification and clustering algorithms, which can be used for building prediction models. These algorithms are self-learning: the system learns from the data without being explicitly programmed. This paper uses machine learning concepts for data analysis of road accidents to build an intelligent system that can predict future outcomes based on the available data used for training the system.

Road accidents have become a major health concern because of the increase in deaths and injuries.
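The two attribute-ranking measures named above can be illustrated with a short sketch. Python is used here purely for illustration (the study itself reports using R), the function names are hypothetical, and the records are made-up toy values, not rows from the UK data set. Information gain is computed as the drop in Shannon entropy of the class label after splitting on an attribute, and the chi-square measure as the Pearson statistic of the attribute-by-class contingency table.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(values, labels):
    """Drop in class entropy after splitting on one categorical attribute."""
    total = len(labels)
    remainder = 0.0
    for v in set(values):
        subset = [lab for x, lab in zip(values, labels) if x == v]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


def chi_square(values, labels):
    """Pearson chi-square statistic of the attribute-by-class table."""
    total = len(labels)
    observed = Counter(zip(values, labels))
    row = Counter(values)   # counts per attribute value
    col = Counter(labels)   # counts per class label
    stat = 0.0
    for v in row:
        for c in col:
            expected = row[v] * col[c] / total
            stat += (observed[(v, c)] - expected) ** 2 / expected
    return stat


# Made-up toy sample (assumption): eight records, two attributes, two classes.
severity = ["Slight"] * 4 + ["Serious"] * 4
light = [1, 1, 1, 1, 6, 6, 6, 6]      # separates the classes perfectly
weather = [1, 2, 1, 2, 1, 2, 1, 2]    # unrelated to the class

scores = {
    "light": (information_gain(light, severity), chi_square(light, severity)),
    "weather": (information_gain(weather, severity), chi_square(weather, severity)),
}
# Rank attributes by information gain, highest first.
ranking = sorted(scores, key=lambda name: scores[name][0], reverse=True)
print(ranking, scores)
```

On this toy sample the light attribute separates the two classes perfectly, so it scores highest on both measures, while the weather attribute carries no information about the class.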
Road accident data is collected and analysed to find the patterns available in the data. Before the analysis starts, the data is pre-processed according to the requirements of the analysis. The intensity of each attribute's contribution to accidents is then identified from the available data set. The key objective of data analysis in road accidents is to identify the key factors behind accidents and to form policies that would reduce the accident level. Four major groups of factors lead to road traffic accidents [4]:

  Driver factors
  Vehicle factors
  Road factors
  Environment factors

2. LITERATURE SURVEY

Different data mining techniques are available for the analysis of road accidents. The paper in [5] compares the accuracy of different techniques and models for road accidents proposed by different researchers. Some of the techniques are:

  Random forest
  Rough set
  Decision tree
  Artificial neural network
  Naïve Bayes, etc.

Of these, Naïve Bayes and decision tree give better accuracy for the respective data [5].

The authors of [6] identify the key factors for accidents using a proposed framework. The data is pre-processed and then clustered as time series, using Euclidean distance, dynamic time warping, a triangle distance metric and hierarchical cluster analysis. Trend analysis is applied to the pre-processed data set to identify the accident trend in two different districts and the similarities between them, from which conclusions are drawn. It is difficult and time consuming to analyse every time series of every cluster [6].

The paper in [7] is an in-depth study focusing on the application of event analysis by investigating accident details and reconstructing the scenario. The goals of this research study are as follows:

1. To identify the contributing factors based on the findings obtained from crash investigation and reconstruction, using a case study;
2. To apply an event analysis in establishing the links between the events to describe the crash scenario based on the available information.

The proposed model forms a conceptual view of the accidents and then analyses the data for accuracy. It makes an in-depth study of the accident. It identifies the key factors accurately because the accident is reconstructed from the reports given by the people present at that place at that time [7].

The paper in [8] deals with the prediction of traffic incident duration, so that drivers get prior information. A neural network prediction method is used to build the model. The paper uses real traffic incident data for building the prediction model: 660 records were used for training the model and 170 records were used for testing it. The predictions agreed with the actual results with 85.35% accuracy [8].

The paper in [9] uses rough set theory, which is a kind of uncertainty analysis method.
Initially, an information decision table is created from the available data set; then a simplified rough set algorithm is used to calculate the degree of importance of the attributes, that is, the importance of the different factors to their corresponding accident morphologies [9].

The researchers in [10] use a subgroup discovery algorithm to identify useful insights from a road accident dataset with multiple attributes. Unlike classification learning, subgroup discovery pursues rules that are general and unusual rather than accurate. Depending on which factors of the data are of interest, multiple relevant features can be combined into a synthetic target feature and given to the subgroup discovery algorithm. After a set of rules is derived, some post-processing steps are taken to make the rule set more compact and easier to understand [10].

3. METHODS

This paper uses two prediction models, an artificial neural network and a decision tree, to predict the outcome for the accident data.

Artificial Neural Network (ANN) is a supervised learning method used to classify results with a model built from training data. It is a nature-inspired concept imitating brain cells: every neuron processes data individually and provides input to the next level of neurons. The common goal is to predict values based on the training data. ANN is one of the important concepts of deep learning, where large data sets are used to build better classification models. Compared with other prediction models, ANN provides one of the best models in many cases. This paper uses the neuralnet package in R [11]. It uses the back propagation method, in which the actual and predicted results are compared and, based on the error, the weights of the nodes are adjusted. This is also known as the feedback method [12].

Decision Tree is one of the simple and commonly used prediction models. The tree consists of multiple if-else rules for classifying the output for a given input. A decision tree provides simple rules that are easy to understand. The model gives good results when the size of the data is limited. Decision trees resemble human reasoning and mapping of data [13]. The package used is C50 in R, which is also a rule-based model. It implements C5.0, an extended version of the C4.5 algorithm, which uses Shannon entropy to select the node with the best information gain. The C50 package can handle both numeric and non-numeric data types [14].

The dataset used in this study is obtained from the UK Data Service. It records the accidents that occurred in 2013 throughout the UK. The dataset consists of 138,660 records and 30 attributes. The class label attribute used for this research is Accident Severity. Figure 1 shows the procedure followed for building the model.

Figure 1: Flow diagram of the model building process

Data Collection: In this phase, road accident data for the year 2013 in the United Kingdom is collected from the UKDA (UK Data Service) website, which stores data from the major surveys taken under the UK government and provides it to researchers and business analysts for research purposes [15].

Data Pre-processing: Pre-processing transforms the data into the required form. Unwanted attributes are removed to reduce the complexity of the data, which improves the accuracy of the result, and data sampling is performed to train and test the model on the available data set. Two major methods are followed in this process:

  Data field selection: Based on domain knowledge, fields are selected for further processing in the analysis.
  Data cleaning: Missing values and outliers in the data are removed so that the quality of the model is maintained.

Data transformation is the end result of the data pre-processing phase, where the actual data is converted into the required form for analysis.
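The pre-processing steps described above can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the procedure used in the study: the field names are hypothetical, the sample records are made up, and treating None or -1 as a missing value is an assumption. The two recodings follow the coding listed in Table 2 (months grouped into quarters 1 to 4, and vehicle counts of four or more collapsed into code 4). Outlier removal is not shown.

```python
def quartile(month):
    """Map a month number (1-12) to the quarter code 1-4 listed in Table 2."""
    return (month - 1) // 3 + 1


def vehicles_code(count):
    """Collapse the vehicle count into 1, 2, 3 or 4 (four or more)."""
    return min(count, 4)


def preprocess(records, keep_fields):
    """Select fields, drop incomplete records, and recode two attributes.

    keep_fields must contain "month" and "number_of_vehicles"
    (hypothetical field names chosen for this sketch).
    """
    cleaned = []
    for rec in records:
        # Data field selection: keep only the chosen fields.
        row = {field: rec.get(field) for field in keep_fields}
        # Data cleaning: skip records with a missing value in any kept field.
        # Treating None and -1 as "missing" is an assumption of this sketch.
        if any(value is None or value == -1 for value in row.values()):
            continue
        # Data transformation: recode month and vehicle count.
        row["quartile"] = quartile(row.pop("month"))
        row["number_of_vehicles"] = vehicles_code(row["number_of_vehicles"])
        cleaned.append(row)
    return cleaned


# Made-up sample records (assumption), not taken from the UK data set.
raw = [
    {"accident_index": "A1", "month": 2, "number_of_vehicles": 1,
     "light_conditions": 1, "accident_severity": 3},
    {"accident_index": "A2", "month": 11, "number_of_vehicles": 6,
     "light_conditions": 4, "accident_severity": 2},
    {"accident_index": "A3", "month": 7, "number_of_vehicles": 2,
     "light_conditions": -1, "accident_severity": 3},  # missing light value
]
fields = ["month", "number_of_vehicles", "light_conditions", "accident_severity"]
rows = preprocess(raw, fields)
print(rows)
```

The third record is dropped because one of its kept fields is missing, and the identifier field is discarded because it was not selected.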

Table 1 shows the list of attributes available in the data set before pre-processing.

Table 1 Attributes of the data set before pre-processing

  Accident Index Number
  Police Force Code
  Accident Severity (class label)
  Number of Vehicles
  Number of Casualties
  Date, Month, Year
  Time (hour, minute of the accident)
  Local Authority
  Location (OSGR): Northing, Easting
  1st Road details (road class, road number, road type)
  Speed limit
  Junction details
  Junction Control
  2nd Road details (road class, road number)
  Pedestrian Crossing Details (human control, physical facilities)
  Light condition
  Weather condition
  Road surface condition
  Special condition at site
  Carriageway hazards
  Did police attend the scene

To reduce the complexity of the model, the dimensionality of the data is reduced based on domain knowledge and other dimensionality reduction techniques. Table 2 shows the list of attributes used for analysis. The attributes were modified for model building.
Table 2 Pre-processed attribute details used for analysis

  Attribute Name                            Possible values
  Number of Vehicles                        1: one vehicle; 2: two vehicles; 3: three vehicles; 4: four or more vehicles (vehicles involved in the accident)
  Quartile                                  1: January-March; 2: April-June; 3: July-September; 4: October-December
  Time period                               0: day; 2: night
  State                                     1: England; 2: Wales; 3: Scotland
  1st Road class                            1-6: road classes available in UK
  Junction details                          0-9: details of the nearby junction
  Junction control                          0-4: junction control details
  2nd Road class                            1-6: road classes available in UK
  Pedestrian Crossing-Physical Facilities   0: false; 1: true
  Light conditions                          1: daylight; 4: darkness, lights lit; 5: darkness, lights unlit; 6: darkness, no lighting; 7: darkness, lighting unknown
  Weather condition                         1-6: weather condition in the accident area
  Road surface condition                    0-5: condition of the road
  Special conditions at site                0: false; 1: true
  Accident severity (class label)           1: Fatal; 2: Serious; 3: Slight

Model building: Models are built using the Artificial Neural Network and Decision Tree algorithms. Model building is a two-level process in which the data is split into two portions.

  Training level: The prediction model is built from the previous data available for the specific domain.
  Testing level: The accuracy of the model built in the training level is measured by comparing the actual classification values with the predicted values.

Model extraction: The built model is extracted and used for future prediction of a given scenario, and the key factors (attributes) that affect the class label, accident severity, are identified.

4. EXPERIMENT RESULTS

The experiment used 4,841 records and 14 attributes, including the class label. Since the data is not evenly distributed, sampling was performed based on different criteria correlating with the class label. The accuracy achieved with the C50 method in R using all attributes was 79.8%, which is better than the 77.7% reported by Dipo T. Akomolafe et al. [16] using the ID3 decision tree method. C50 has more features than the ID3 method. The number of rules generated is 292. Table 3 lists the attributes and their usage percentage in building the model.

Table 3 Decision tree attribute usage

  Attribute name                            Percentage
  Light Conditions                          100%
  1st Road Class                            92.80%
  Quartile                                  90.71%
  2nd Road Class                            90.82%
  Number of Vehicles                        86.93%
  State                                     74.11%
  Pedestrian Crossing-Physical Facilities   56.16%
  Time period                               53.02%
  Junction Details                          45.62%
  Junction control                          37.69%
  Road Surface Conditions                   27.69%
  Weather Conditions                        26.16%

The neuralnet method was applied to the same data set to classify accident severity; the error reached 215.145 in 40,495 steps with 4 nodes in the hidden layer. The accuracy of this model is 79%. The data in [6] even include the time attribute to find the trend. Liping Guan et al. proposed work in which various details about accidents were collected and an NN model was built to predict accident duration [8]. The proposed model looks at the data from a different dimension: it predicts accident severity, which is more important, and preventive measures can be taken based on these patterns. Figure 2 shows the model generated using the backpropagation algorithm in ANN.

Figure 2 Neural network model built for road accident
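The back propagation idea behind a network like the one in Figure 2 can be illustrated with a minimal sketch. It is written in Python for illustration only and is not the neuralnet package's implementation or the model built in this study: it uses plain gradient descent, one hidden layer of 4 sigmoid nodes, a single sigmoid output, a sum-of-squares error, and a made-up two-feature toy sample. All function names and parameter values are assumptions of the sketch.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_weights(n_in, n_hidden, seed=0):
    """Random weights; the last weight of each node is its bias."""
    rng = random.Random(seed)
    w_hidden = [[rng.uniform(-1, 1) for _ in range(n_in + 1)]
                for _ in range(n_hidden)]
    w_out = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
    return w_hidden, w_out


def forward(x, w_hidden, w_out):
    """Return the hidden activations and the network output for one input."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x + [1.0])))
              for ws in w_hidden]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden + [1.0])))
    return hidden, output


def sse(data, w_hidden, w_out):
    """Sum-of-squares error over the whole sample."""
    return 0.5 * sum((forward(x, w_hidden, w_out)[1] - t) ** 2 for x, t in data)


def train(data, w_hidden, w_out, rate=0.5, epochs=2000):
    """Adjust the weights in place by back propagating the output error."""
    for _ in range(epochs):
        for x, target in data:
            hidden, out = forward(x, w_hidden, w_out)
            # Error term at the output: predicted minus actual, scaled by
            # the derivative of the sigmoid.
            delta_out = (out - target) * out * (1.0 - out)
            # Error terms at the hidden nodes, passed back through the
            # output weights (computed before those weights are changed).
            delta_hidden = [delta_out * w_out[j] * h * (1.0 - h)
                            for j, h in enumerate(hidden)]
            for j, h in enumerate(hidden + [1.0]):
                w_out[j] -= rate * delta_out * h
            for j, ws in enumerate(w_hidden):
                for i, xi in enumerate(x + [1.0]):
                    ws[i] -= rate * delta_hidden[j] * xi


# Made-up toy sample (assumption): two binary inputs, one binary target.
toy = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0),
       ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]

w_hidden, w_out = init_weights(n_in=2, n_hidden=4)
error_before = sse(toy, w_hidden, w_out)
train(toy, w_hidden, w_out)
error_after = sse(toy, w_hidden, w_out)
print(error_before, error_after)
```

Each pass compares the predicted output with the actual value and moves every weight a small step against its error gradient, which is the adjustment the method section describes; the error on the toy sample is lower after training than before.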

5. CONCLUSIONS

This paper shows the use of machine learning concepts for building a prediction model from road accident data. Two basic models are used: artificial neural network and decision tree. Both models emulate the human thought process. Since the data used consists of real records, building a generic model with higher accuracy is more difficult, so data sampling is performed on the dataset in the pre-processing phase based on the class label, state and date. Rules are extracted from the generated decision tree. The experiment shows that ANN and the decision tree give comparable performance on the same sample (79% and 79.8% accuracy, respectively). As countries move towards the digital era and accident data is recorded in a tabular format with appropriate attributes, this data can be analysed for a particular region so that appropriate policies and preventive measures can be taken.

REFERENCES

[1] World Health Organization, Global Health Observatory (GHO) data, 19 July 2017. http://www.who.int/gho/mortality_burden_disease/causes_death/top_10/en/
[2] Auto economic times, India ranks first in road deaths in the world, 19 July 2017. http://auto.economictimes.indiatimes.com/news/industry/india-ranks-first-in-road-deaths-in-the-world/56221070
[3] Global status report on road safety 2013: supporting a decade of action, World Health Organization, 2013
[4] Rui Tian and Zhaosheng Yang, Method of road traffic accidents causes analysis based on Data Mining, IEEE, conference paper, 2010
[5] Maninder Singh and Amrit Kaur, A Review on road accident in traffic system using data mining techniques, International Journal of Science and Research, 2016, 5(1), pp. 1531-1532
[6] Sachin Kumar and Durga Toshniwal, A novel framework to analyze road accident time series data, SpringerOpen, 3(8), 2016
[7] Mouyid Bin Islam and Kunnawee Kanitpong, Identification of Factors in Road Accidents Through In-Depth Accident Analysis, IATSS Research, 2008, 32(2), pp. 56-67
[8] Liping Guan et al., Traffic Incident Duration Prediction Based on Artificial Neural Network, International Conference on Intelligent Computation Technology and Automation, 2010
[9] Tao Gang et al., Cause Analysis of Traffic Accidents Based on Degrees of Attribute Importance of Rough Set, 2015, pp. 1665-1669
[10] Jeongmin Kim et al., Mining Traffic accident data by subgroup discovery using combinatorial targets, IEEE, 2015
[11] Cran.r, neuralnet package, 7 January 2017. https://cran.r-project.org/web/packages/neuralnet/neuralnet.pdf
[12] Kurt Hornik et al., Multilayer feedforward networks are universal approximators, Elsevier, 1989, 2(5), pp. 359-366

[13] S. B. Kotsiantis, Decision trees: a recent overview, Springer, 2011, 39(4), pp. 261-283
[14] Cran.r, C50 package, 7 January 2017. https://cran.r-project.org/web/packages/c50/c50.pdf
[15] Department for Transport. Road Accident Statistics Branch, Road Accident Data, 2013 [computer file]. Colchester, Essex: UK Data Archive [distributor], September 2014. SN: 7550, http://dx.doi.org/10.5255/ukda-sn-7550-1
[16] Dipo T. Akomolafe and Akinbola Olutayo, Using Data Mining Technique to Predict Cause of Accident and Accident Prone Locations on Highways, American Journal of Database Theory and Application, 2012, 1(1), pp. 26-38
[17] Jasvinder Singh, Mahipal Singh, Anil Baliram Ghubade and Manjinder Singh, Analytical Hierarchy Process for Road Accident of Motorcycle in India: A Case Study, International Journal of Mechanical Engineering and Technology, 8(7), 2017, pp. 1348-1356
[18] Aishwarya Patil and Deepthi Das, Comparative Analysis and Suggestion of Architectures for Reduction of Road Accidents, International Journal of Civil Engineering and Technology, 9(3), 2018, pp. 945-954