World Journal of Engineering Research and Technology WJERT


WJERT, 2018, Vol. 4, Issue 1, 462-466. Original Article. ISSN 2454-695X. www.wjert.org. SJIF Impact Factor: 4.326

PREDICTING STUDENT PERFORMANCE USING RESULT MINING AND KNOWLEDGE FLOW IN WEKA

Dr. A. Kanaka Durga*
IT Department, Stanley College of Engineering and Technology for Women, Hyderabad, India.

Article Received on 24/11/2017. Article Revised on 15/12/2017. Article Accepted on 05/01/2018.

*Corresponding Author: Dr. A. Kanaka Durga, IT Department, Stanley College of Engineering and Technology for Women, Hyderabad, India.

ABSTRACT

It is natural that the quantity of data collected will continue to expand rapidly because of the increasing ease, availability and popularity of the web. Data mining has wide application in organizations because they collect large amounts of data. By applying data mining techniques, people can extract hidden, historical and previously unknown information from large databases. In this paper we use the WEKA tool for the pre-processing, classification and analysis of institutional results of engineering students. The results show an analysis of marks and backlogs. Knowledge flow analysis has been carried out on the engineering students' results.

KEYWORDS: Classification, clustering, WEKA, data mining, Knowledge flow, engineering students.

1. INTRODUCTION

Data mining, the extraction of hidden predictive information from large databases, is a powerful new technology with great potential to help companies focus on the most important information in their data warehouses. Data mining tools predict future trends and behaviors, allowing businesses to make proactive, knowledge-driven decisions. The automated, prospective analyses offered by data mining move beyond the analyses of past events provided by retrospective tools typical of decision support systems. Data mining tools [1] can answer business questions that traditionally were too time consuming to resolve. They scour databases for hidden patterns, finding predictive information that experts may miss because it lies outside their expectations.

Huge amounts of data are currently being accumulated. Extracting information manually is very difficult because the volume of data to be processed is so large. Data mining tools provide a better alternative to the manual method. In this paper we use the WEKA tool to analyze engineering students' data. This paper uses Result Data Mining (RDM) techniques to provide more accurate result analysis.

Data mining is a dynamic technology [2] for handling data and extracting hidden, potentially valuable data that can be converted into useful information. It discovers information within the data that queries and reports cannot effectively reveal. After data has been gathered from the university results, data mining techniques need to be applied to determine the status of the students in various subjects.

2. RESULT DATA MINING

Predicting underperforming students in educational organizations is a very challenging task if it is done manually. Data mining plays a major role in analyzing the weak areas of students and in focusing on the key areas where poor performance is likely. Student results in various engineering subjects have been mined to identify the thrust areas that affect student performance. Student result mining involves extracting information from various subjects, and the gathered information is processed using classification algorithms [3] of data mining. Tools can be developed using data mining so that performance evaluation becomes easier for teachers. [5]

3. METHODOLOGY

The results of engineering students are taken from the university website and preprocessed using the WEKA tool. [4] WEKA is a popular data mining tool developed at the University of Waikato, New Zealand.
It provides many algorithms for filtering, classification, clustering, regression, association analysis, etc., which are useful in data mining and in related fields such as Natural Language Processing and Machine Learning. The algorithms in WEKA can be applied through the GUI or invoked from the programmer's own code.

The WEKA GUI offers four interfaces: Explorer, Experimenter, Knowledge Flow and Simple CLI. This paper implements a knowledge flow on the selected data set, i.e. the engineering students' results, and displays the experimental results.

4. RESULTS

The dataset is taken from the university website and preprocessed using the Explorer interface of the WEKA tool. The dataset is prepared in Notepad; any text editor can be used for this purpose. The data should be written in ARFF format, and the file containing the target data has the extension .arff. The general structure of an ARFF file is shown in Fig. 1.

    @relation relationname
    @attribute attributename type
    @attribute attributename {options}
    @data
    ----data comes here----

Fig. 1: ARFF file general structure.

Any ARFF file consists of two sections. One is the header section, which includes the name of the relation and the names and types of the attributes. The other is the data section, which follows the @data line and holds the instances. The attribute types can be numeric, nominal, string, date, etc. Numeric attributes are used in this dataset (engineering students' results) for the marks scored by each student. Nominal attributes take values only from a fixed set, which is specified by the options in Fig. 1. The attributes result, backlog and appeared in our dataset are nominal.

Once the dataset is ready, the WEKA tool is used to preprocess the data and visualize it, as shown in Fig 1 (Data Visualization). Fig 2 shows the classification of the data using the Naïve Bayesian classifier. Knowledge flow is applied in the WEKA tool and is illustrated in two cases, i.e. Fig 3 and Fig 4.
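To make the ARFF structure of Fig. 1 concrete, the following is a minimal sketch in Python (not WEKA's own loader) that builds a small ARFF text and parses its header and data sections. The attribute names marks, result, backlog and appeared follow the paper; the relation name, the value sets and the three data rows are invented for illustration, and the parser handles only numeric and nominal attributes.

```python
# Hypothetical ARFF content: attribute names follow the paper, but the relation
# name, the nominal value sets and the data rows are assumptions of this sketch.
ARFF_TEXT = """\
@relation student_results
@attribute marks numeric
@attribute result {pass,fail}
@attribute backlog {yes,no}
@attribute appeared {yes,no}
@data
72,pass,no,yes
35,fail,yes,yes
0,fail,yes,no
"""


def parse_arff(text):
    """Parse a minimal ARFF subset: numeric and nominal attributes, dense rows."""
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and ARFF comments
            continue
        lower = line.lower()
        if in_data:
            values = [v.strip() for v in line.split(",")]
            row = {}
            for (name, kind), value in zip(attributes, values):
                if kind == "numeric":
                    row[name] = float(value)
                else:  # nominal: kind is the list of allowed options
                    if value not in kind:
                        raise ValueError(f"{value!r} is not allowed for {name}")
                    row[name] = value
            rows.append(row)
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, spec = line.split(None, 2)
            if spec.startswith("{"):
                kind = [opt.strip() for opt in spec.strip("{}").split(",")]
            elif spec.lower() in ("numeric", "real", "integer"):
                kind = "numeric"
            else:
                raise ValueError(f"attribute type {spec!r} is not handled here")
            attributes.append((name, kind))
        elif lower.startswith("@data"):
            in_data = True  # everything after this line is an instance
    return relation, attributes, rows


relation, attributes, rows = parse_arff(ARFF_TEXT)
print(relation, len(attributes), len(rows))
```

Keeping the nominal options as a list lets the parser reject a value that the header does not declare, which mirrors the role of the options in the header section.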

Fig 1: Data Visualization.

Fig 2: Data Classification.

Fig 3: Knowledge Flow.

Fig 4: Knowledge Flow.

The knowledge flow in Fig. 3 contains the following components: ArffLoader, ClassAssigner, NaiveBayesUpdateable classifier, IncrementalClassifierEvaluator, TextViewer and StripChart. The flow of Fig. 3 is as follows. The dataset is loaded using the ArffLoader, and the output of the loader is passed as an instance stream to the ClassAssigner, which in turn passes each instance as input to the NaiveBayesUpdateable classifier. The purpose of the ClassAssigner is to designate one column as the class for any data set, training set or test set. The incremental output of the NaiveBayesUpdateable classifier is given as input to the IncrementalClassifierEvaluator, whose purpose is to evaluate the performance of incrementally trained classifiers. The output of the classifier is given as input to the TextViewer and the StripChart in their respective formats. The final outputs can be viewed in these components of the knowledge flow.

The flow of Fig. 4 is as follows. The dataset is loaded using the ArffLoader, and the output of the loader is passed as a dataset to the ClassAssigner, which in turn passes the dataset as input to the CrossValidationFoldMaker. The output of the CrossValidationFoldMaker is given as input to two TextViewers, one of which displays the training set and the other the test set.

5. CONCLUSION

This paper presents a study of a data mining tool applied to a student result set. With the WEKA tool one can pre-process the data, classify it for different subjects and carry out result analysis on the engineering students' university results. We have applied data mining to analyze the engineering students' results. The development of such analysis tools helps in applications of e-educational systems [6], which aid the faculty in predicting the performance of students.

6. REFERENCES

1. Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, 2nd ed., Morgan Kaufmann Publishers, San Francisco, 2006.
2. Fayyad, U., & Stolorz, P. Data mining and KDD: promise and challenges. Future Generation Computer Systems, 1997; 13(2): 99-115.
3. Guerra L, McGarry M, Robles V, Bielza C, Larrañaga P, Yuste R. Comparison between supervised and unsupervised classifications of neuronal cell types: A case study. Developmental Neurobiology, 2011; 71(1): 71-82.
4. An Introduction to WEKA data mining tool.
5. Romero C. and Ventura S., Educational data mining: A survey from 1995 to 2005. Expert Systems with Applications, 2007; 33: 135-146.
6. International Educational Data Mining Society, www.educationaldatamining.org.
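As a supplementary illustration of the two knowledge flows described in Section 4, the sketch below mimics their data path in plain Python. It is not WEKA code: the class IncrementalNaiveBayes and the functions evaluate_incrementally and make_folds are hypothetical stand-ins for the NaiveBayesUpdateable, IncrementalClassifierEvaluator and CrossValidationFoldMaker components, and the six instances are invented. The sketch assumes a test-then-train order for the incremental evaluation, handles nominal attributes only, and splits folds by simple striding, without the shuffling or stratification a full fold maker may apply.

```python
from collections import defaultdict


class IncrementalNaiveBayes:
    """Naive Bayes over nominal attributes, updated one instance at a time."""

    def __init__(self):
        self.class_counts = defaultdict(int)
        # value_counts[(attribute, value, label)] -> number of occurrences
        self.value_counts = defaultdict(int)
        self.values_seen = defaultdict(set)

    def update(self, features, label):
        self.class_counts[label] += 1
        for attr, value in features.items():
            self.value_counts[(attr, value, label)] += 1
            self.values_seen[attr].add(value)

    def predict(self, features):
        if not self.class_counts:
            return None  # nothing learned yet
        total = sum(self.class_counts.values())
        best_label, best_score = None, -1.0
        for label, count in self.class_counts.items():
            score = count / total  # class prior
            for attr, value in features.items():
                # Laplace smoothing keeps unseen values from zeroing the score
                distinct = len(self.values_seen.get(attr, set()) | {value})
                seen = self.value_counts.get((attr, value, label), 0)
                score *= (seen + 1) / (count + distinct)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def evaluate_incrementally(instances, class_attribute):
    """Fig. 3 style flow: test each instance on the current model, then train on it."""
    model, correct, accuracy_trace = IncrementalNaiveBayes(), 0, []
    for seen, instance in enumerate(instances, start=1):
        # Separating the class column plays the role of the class assigner.
        features = {k: v for k, v in instance.items() if k != class_attribute}
        label = instance[class_attribute]
        if model.predict(features) == label:
            correct += 1
        model.update(features, label)
        accuracy_trace.append(correct / seen)  # running accuracy, as a chart would plot
    return accuracy_trace


def make_folds(instances, k):
    """Fig. 4 style flow: yield one (training set, test set) pair per fold."""
    for fold in range(k):
        test = instances[fold::k]
        train = [x for i, x in enumerate(instances) if i % k != fold]
        yield train, test


# Hypothetical instances using the nominal attributes named in the paper.
DATA = [
    {"backlog": "no", "appeared": "yes", "result": "pass"},
    {"backlog": "yes", "appeared": "yes", "result": "fail"},
    {"backlog": "no", "appeared": "yes", "result": "pass"},
    {"backlog": "yes", "appeared": "no", "result": "fail"},
    {"backlog": "no", "appeared": "yes", "result": "pass"},
    {"backlog": "yes", "appeared": "yes", "result": "fail"},
]

trace = evaluate_incrementally(DATA, "result")
print([round(a, 2) for a in trace])
for train, test in make_folds(DATA, 3):
    print(len(train), len(test))
```

The running accuracy starts at zero because the first instances are tested before the model has seen their class, which is the expected behavior of a test-then-train evaluation on a short stream.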