A SURVEY ON BI-LEVEL TRAIT EXTRACTION-BASED TEXT MINING FOR FLAW VERDICT OF RAILWAY SYSTEMS

Volume 119 No. 16 2018, 2763-2768
ISSN: 1314-3395 (on-line version)
url: http://www.acadpubl.eu/hub/

1. Priya.S 2. Preethi.V 3. Aarthi.S
1. Assistant Professor, Department of Information Technology, GKM College of Eng and Technology
2. Assistant Professor, Dept of Computer Science and Engineering, SRM Institute of Science and Technology
3. Assistant Professor, Dept of Computer Science and Engineering, SRM Institute of Science and Technology
priyasugam@gmail.com, murugan.preethi18@gmail.com, cse.aarthi@gmail.com

ABSTRACT
A vast quantity of text data is recorded in the form of repair verbatim in the railway maintenance sector. Efficient text mining of such maintenance data plays an important role in detecting anomalies and improving fault diagnosis efficiency. However, unstructured verbatim, high-dimensional data, and an imbalanced fault class distribution pose challenges for feature selection and fault diagnosis. We propose a bi-level feature extraction-based text mining method that integrates features extracted at both the syntax and semantic levels, with the aim of improving fault classification performance. We first perform an improved χ2 statistics-based feature selection at the syntax level to overcome the learning difficulty caused by an imbalanced data set. We then use a prior latent Dirichlet allocation-based feature selection at the semantic level to reduce the data set into a low-dimensional topic space. Finally, we fuse the fault features derived from the syntax and semantic levels via serial fusion. The proposed method uses fault features at different levels and enhances the precision of fault diagnosis for all fault classes, especially minority ones.
Its performance has been validated using a railway maintenance data set collected from 2008 to 2014 by a railway corporation.

KEYWORDS: Repair verbatim data, Dirichlet allocation, high-dimensional data.

INTRODUCTION:
Text mining techniques can be applied to repair verbatim data to uncover the associations between fault terms and fault classes, which improves the precision of fault diagnosis. In maintenance documents, the number of examples in one fault class (i.e., the majority class) is significantly greater than that in the others (i.e., the minority classes). Such imbalanced class distributions cause serious problems for most classifier learning algorithms. We use improved χ2 statistics at the syntax level. To meet these challenges, this work presents a bi-level feature extraction-based text mining method for fault diagnosis. To achieve the desired results, fault features should be extracted at both the syntactic and semantic levels. The features extracted at each level emphasize a particular aspect and have their own deficiencies; the proposed fusion of the two levels enhances the precision of fault diagnosis for all fault classes.

EXISTING SYSTEM
A large amount of text data is recorded in the railway maintenance sector. Efficient text mining of such maintenance data improves fault diagnosis efficiency. Unstructured fault data, high-dimensional data, and an imbalanced fault class distribution pose challenges for feature selection and fault diagnosis. When a malfunction occurs, the trouble symptoms are generated and sent to the monitoring centre database. After every diagnosis, a fault is recorded. This is how the existing system works.

EXISTING SYSTEM DISADVANTAGES
The disadvantages of the existing system are:
High-dimensional data.
Imbalanced fault class distribution.
Unsupervised text mining models.

PROPOSED SYSTEM
We propose a bi-level feature extraction-based text mining method that integrates features extracted at both the syntax and semantic levels, with the aim of improving fault classification performance. We first perform an improved χ2 statistics-based feature selection at the syntax level to overcome the learning difficulty caused by an imbalanced data set. Then, we perform a prior latent Dirichlet allocation-based feature selection at the semantic level to reduce the data set into a low-dimensional topic space.

PROPOSED SYSTEM ADVANTAGES
To reduce the data set into a low-dimensional topic space.
To overcome the learning difficulty caused by an imbalanced data set.

Fig 1: Architecture diagram (component labels: User, Start the train, Server, Database, Fault verbatim record, Train reached the destination)

IMPLEMENTATION
The proposed system works through four main modules. First, a login is created for the user who wants information about the railway system. Fault verbatim is then recorded over a period of time. By analysing this data, a fault is predicted, diagnosed, and rectified. The result is sent to the user whenever the user requests the details.
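The syntax-level selection and the serial fusion step of the proposed system can be illustrated with a minimal sketch. It uses the standard χ2 statistic over a 2x2 term/class contingency table, since the "improved" variant is not specified in this survey, and it treats serial fusion as plain concatenation of the two feature vectors. The toy verbatims, the class labels, the function names, and the placeholder topic vector (standing in for the latent Dirichlet allocation output) are all assumptions made for illustration.

```python
from collections import Counter


def chi_square(a, b, c, d):
    """Standard chi-square statistic for a 2x2 term/class table.

    a: documents of the class that contain the term
    b: documents of other classes that contain the term
    c: documents of the class without the term
    d: documents of other classes without the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def chi_square_scores(docs, labels, target):
    """Score every term against one fault class (one-vs-rest)."""
    n_target = sum(1 for y in labels if y == target)
    n_rest = len(labels) - n_target
    in_target, in_rest = Counter(), Counter()
    for text, y in zip(docs, labels):
        terms = set(text.lower().split())  # document frequency, not term count
        (in_target if y == target else in_rest).update(terms)
    scores = {}
    for term in set(in_target) | set(in_rest):
        a, b = in_target[term], in_rest[term]
        scores[term] = chi_square(a, b, n_target - a, n_rest - b)
    return scores


def serial_fusion(syntax_vec, topic_vec):
    """Serial fusion, taken here as concatenation of the two vectors."""
    return list(syntax_vec) + list(topic_vec)


# Hypothetical toy verbatims: "signal" is the minority fault class.
docs = [
    "switch motor stuck",
    "point blocked",
    "switch motor noise",
    "signal lamp dark",
]
labels = ["switch", "switch", "switch", "signal"]

scores = chi_square_scores(docs, labels, target="signal")
# Rank by descending score; ties are broken alphabetically.
top_terms = sorted(scores, key=lambda t: (-scores[t], t))[:3]

# Binary syntax-level vector for the last verbatim over the selected terms.
syntax_vec = [1 if t in docs[3].split() else 0 for t in top_terms]
topic_vec = [0.1, 0.9]  # placeholder for semantic-level topic proportions
fused = serial_fusion(syntax_vec, topic_vec)
print(top_terms, fused)
```

Scoring each class one-vs-rest is one simple way to let minority classes contribute their own discriminative terms; other handling of imbalance is equally possible.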
USER INTERFACE DESIGN:
This module deals with website login and user registration. The user must enter the details requested during registration, which are stored on the server so that the user can log in.
Fig 2: Customer login

GENERATE FAULT VERBATIM RECORDS
This module generates fault records. A fault is recorded if the train exceeds the speed limit or if it does not send a signal at a particular station.
Fig 3: Generation of verbatim record

RAILWAY RECORDS MAINTENANCE
Maintaining records over a period of time allows later reference to the performance in those years.
Fig 4: Maintenance record

TO START TRAIN
Once registered, the user is granted access to the website. The website provides a START TRAIN tab, which shows the recorded fault verbatim as a pop-up message.
Fig 5: Fault verbatim record

GENERATE SIGNAL CODE
This module generates a signal code when a train starts from a particular station. With this code, a person can tell whether a particular train has started. If generation of the signal code is delayed, an error message is sent to the verbatim record.
Fig 6: Signal code

GENERATE DESTINATION DETAILS
The destination details provide the date, time, km, station code, and arrival time once the train reaches its destination. All these details are stored in the database.
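The recording rule described for the fault verbatim and signal code modules (a fault is logged when the speed limit is exceeded or when no signal code is produced at a station) can be sketched as follows. This is only an illustration under stated assumptions: the `StationReading` type, its field names, the message wording, and the example speed limit are hypothetical and do not come from the system described here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StationReading:
    """Hypothetical per-station observation for one train."""
    station_code: str
    speed_kmh: float
    signal_code: Optional[str]  # None when no signal code was generated


def fault_verbatims(readings: List[StationReading],
                    speed_limit_kmh: float) -> List[str]:
    """Return one verbatim line per detected fault, in station order."""
    records = []
    for r in readings:
        if r.speed_kmh > speed_limit_kmh:
            records.append(
                f"{r.station_code}: speed {r.speed_kmh:g} km/h "
                f"exceeds limit {speed_limit_kmh:g} km/h"
            )
        if r.signal_code is None:
            records.append(f"{r.station_code}: no signal code generated")
    return records


# Hypothetical readings: station B is both over the limit and silent.
readings = [
    StationReading("A", 60, "SIG-A-001"),
    StationReading("B", 95, None),
]
records = fault_verbatims(readings, speed_limit_kmh=80)
for line in records:
    print(line)
```

Keeping each fault as a short free-text line mirrors the verbatim style that the text mining stage later consumes.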

Fig 7: Destination details

TRAIN DESTINATION DETAIL
This module allows us to check the destination schedule and status of a particular train, with the arrival time and halt time at each station. From this we come to know whether the train has reached its destination.
Fig 8: Train schedule
