A SURVEY ON BI-LEVEL TRAIT EXTRACTION-BASED TEXT MINING FOR FLAW VERDICT OF RAILWAY SYSTEMS


Volume 119 No. 16 2018, 2763-2768
ISSN: 1314-3395 (on-line version)
url: http://www.acadpubl.eu/hub/

1. Priya. S 2. Preethi. V 3. Aarthi. S
1. Assistant Professor, Department of Information Technology, GKM College of Eng and Technology
2. Assistant Professor, Dept of Computer Science and Engineering, SRM Institute of Science and Technology
3. Assistant Professor, Dept of Computer Science and Engineering, SRM Institute of Science and Technology
priyasugam@gmail.com murugan.preethi18@gmail.com cse.aarthi@gmail.com

ABSTRACT
A vast quantity of text data is recorded in the form of repair verbatim in railway maintenance sectors. Efficient text mining of such maintenance data plays an important role in detecting anomalies and improving fault diagnosis efficiency. However, unstructured verbatim, high-dimensional data, and an imbalanced fault class distribution pose challenges for feature selection and fault diagnosis. We propose a bi-level feature extraction-based text mining method that integrates features extracted at both the syntax and semantic levels with the aim of improving fault classification performance. We first perform an improved χ² statistics-based feature selection at the syntax level to overcome the learning difficulty caused by an imbalanced data set. We then use a prior latent Dirichlet allocation-based feature selection at the semantic level to reduce the data set into a low-dimensional topic space. Finally, we fuse the fault features derived from both the syntax and semantic levels via serial fusion. The proposed method uses fault features at different levels and enhances the precision of fault diagnosis for all fault classes, especially minority ones. Its performance has been validated using a railway maintenance data set collected from 2008 to 2014 by a railway corporation.

KEYWORDS: Repair verbatim data, Dirichlet allocation, high-dimensional data.

INTRODUCTION:
Text mining techniques can be applied to repair verbatim data to uncover the associations between fault terms and fault classes, which improves the precision of fault diagnosis. In maintenance documents, the number of examples in one fault class (i.e., the majority class) is significantly greater than that in the others (i.e., the minority classes). Such imbalanced class distributions cause serious problems for most classifier learning algorithms. We use an improved χ² statistic at the syntax level. This work presents a bi-level feature extraction-based text mining approach for fault diagnosis to meet these challenges. To achieve the desired results, fault features should be extracted at both the syntactic and semantic levels. The features extracted at each level emphasize a particular aspect and have their own deficiencies; the proposed fusion of the two levels enhances the precision of fault diagnosis for all fault classes.

EXISTING SYSTEM
A large amount of text data is recorded in railway maintenance sectors. Efficient text mining of such maintenance data improves fault diagnosis efficiency. Unstructured fault data, high-dimensional data, and an imbalanced fault class distribution pose challenges for feature selection and fault diagnosis. When a malfunction occurs, the trouble symptoms are generated and sent to the monitoring centre database. After every diagnosis, a fault is noted. This is how the existing system works.

EXISTING SYSTEM DISADVANTAGES
The disadvantages of the existing system are high-dimensional data, an imbalanced fault class distribution, and unsupervised text mining models.

PROPOSED SYSTEM
We propose a bi-level feature extraction-based text mining method that integrates features extracted at both the syntax and semantic levels with the aim of improving fault classification performance. We first perform an improved χ² statistics-based feature selection at the syntax level to overcome the learning difficulty caused by an imbalanced data set. Then, we perform a prior latent Dirichlet allocation-based feature selection at the semantic level to reduce the data set into a low-dimensional topic space.

PROPOSED SYSTEM ADVANTAGES
The proposed system reduces the data set into a low-dimensional topic space and overcomes the learning difficulty caused by an imbalanced data set.

Fig 1: Architecture diagram (components: User, Start the train, Server, Database, Fault verbatim record, Train reached the destination)

IMPLEMENTATION
The proposed system works in four main modules. First, a login is created for the user who wants to get information about the railway system. Fault verbatim is recorded for some period of time. By analysing this data, information is predicted and the fault is diagnosed and rectified. The result is sent to the user whenever the user requests details.
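The syntax-level selection and the serial fusion described for the proposed system can be illustrated with a minimal sketch. It rests on several assumptions: it uses the standard χ² statistic for a 2×2 term/class table, not the improved variant the paper refers to (which is not defined here); it picks the top terms separately for each fault class, which is one common way to keep minority classes represented and not necessarily the authors' method; and the topic proportions are hypothetical values standing in for the output of a latent Dirichlet allocation model. All function names and the toy records are made up for illustration.

```python
from collections import Counter


def chi_square(a, b, c, d):
    """Standard chi-square statistic for a 2x2 term/class table.

    a: records in the class that contain the term
    b: records outside the class that contain the term
    c: records in the class that lack the term
    d: records outside the class that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def select_terms_per_class(docs, labels, k):
    """Return the k highest-scoring terms for every fault class.

    Selecting per class (an assumption of this sketch) stops the
    majority class from crowding out terms of minority classes.
    """
    class_sizes = Counter(labels)
    n_docs = len(docs)
    term_df = Counter()   # term -> number of records containing it
    class_df = Counter()  # (term, class) -> records of that class containing it
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            term_df[term] += 1
            class_df[(term, label)] += 1

    selected = {}
    for cls, size in class_sizes.items():
        scores = {}
        for term, df in term_df.items():
            a = class_df[(term, cls)]
            b = df - a
            c = size - a
            d = n_docs - size - b
            # Keep only terms positively associated with the class.
            if a * d > b * c:
                scores[term] = chi_square(a, b, c, d)
        # Sort by descending score, breaking ties alphabetically.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        selected[cls] = ranked[:k]
    return selected


def serial_fusion(syntax_vec, topic_vec):
    """Serial fusion: concatenate the two feature vectors end to end."""
    return list(syntax_vec) + list(topic_vec)


# Toy, made-up repair records: four of one class, one of a minority class.
docs = [
    ["signal", "lamp", "dark"],
    ["signal", "lamp", "flicker"],
    ["signal", "relay", "fault"],
    ["signal", "lamp", "dark"],
    ["switch", "blade", "stuck"],
]
labels = ["signal_fault"] * 4 + ["switch_fault"]

selected = select_terms_per_class(docs, labels, k=2)
vocab = sorted({t for terms in selected.values() for t in terms})

# Binary syntax-level vector for the minority-class record.
syntax_vec = [1 if term in docs[4] else 0 for term in vocab]
topic_vec = [0.1, 0.9]  # hypothetical topic proportions from an LDA model
fused = serial_fusion(syntax_vec, topic_vec)
print(selected)
print(fused)
```

Concatenation keeps both views of a record in a single vector, so any downstream classifier can weigh term-level and topic-level evidence together.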
USER INTERFACE DESIGN:
This design deals with website login and user registration. The user must enter the details requested in this session, which are stored on the server to enable the user to log in.

Fig 2: Customer login

TO START TRAIN
As soon as the user has registered, the user is granted access to the website. The website provides a START TRAIN tab, which shows the recorded fault verbatim as a pop-up message.

Fig 3: Generation of verbatim record

RAILWAY RECORDS MAINTENANCE
Maintaining records over a period of time supports future reference to the performance in those years.

Fig 4: Maintenance record

GENERATE FAULT VERBATIM RECORDS
This module generates faults. A fault is recorded if the train exceeds the speed limit or if it does not send a signal at a particular station.

Fig 5: Fault verbatim record

GENERATE SIGNAL CODE
This module generates a signal code when a train starts from a particular station. With this code, a person can identify whether a particular train has started. If the signal code is delayed, an error message is sent to the verbatim record.

Fig 6: Signal code

GENERATE DESTINATION DETAILS
The destination details provide the date, time, km, station code, and arrival time once the train reaches the destination. All these details are stored in the database.
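The fault-recording rules described in the modules above (speed limit exceeded, no signal sent at a station, delayed signal code) can be sketched as a small rule check. This is only an illustration under stated assumptions: the class names, field names, message wording, and threshold values are all hypothetical, since the paper does not specify a data model or limits.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StationEvent:
    """One observation of a train at a station (all fields are hypothetical)."""
    train_id: str
    station_code: str
    speed_kmph: float
    signal_sent: bool
    signal_delay_s: float = 0.0


@dataclass
class VerbatimLog:
    """Collects fault verbatim records as plain text messages."""
    records: List[str] = field(default_factory=list)

    def check(self, event: StationEvent, speed_limit_kmph: float,
              max_signal_delay_s: float) -> List[str]:
        """Record one message per violated rule and return the new messages."""
        new = []
        where = f"train {event.train_id} at {event.station_code}"
        if event.speed_kmph > speed_limit_kmph:
            new.append(f"{where}: speed limit exceeded")
        if not event.signal_sent:
            new.append(f"{where}: no signal sent")
        elif event.signal_delay_s > max_signal_delay_s:
            new.append(f"{where}: signal code delayed")
        self.records.extend(new)
        return new


# Assumed limits for the example: 80 km/h and a 30-second signal delay.
log = VerbatimLog()
log.check(StationEvent("T1", "STN1", 95.0, True), 80.0, 30.0)
log.check(StationEvent("T2", "STN2", 60.0, False), 80.0, 30.0)
log.check(StationEvent("T3", "STN3", 60.0, True, 45.0), 80.0, 30.0)
log.check(StationEvent("T4", "STN4", 60.0, True, 5.0), 80.0, 30.0)
for line in log.records:
    print(line)
```

Keeping each rule as an independent check lets a single event produce more than one verbatim record, for example when a train is both speeding and late with its signal code.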

Fig 7: Destination details

TRAIN DESTINATION DETAIL
This allows us to check the destination schedule and status of a particular train, with the arrival time and halt time at each station. From this we come to know whether the train has reached the destination.

Fig 8: Train schedule
