The Application of C4.5 Method in Determining the Passing of English Proficiency Test (EPT)

Edy Victor Haryanto
Universitas Potensi Utama, Jl. K.L. Yos Sudarso Km. 6,5 No. 3 A Medan
edyvictor@gmail.com
International Conference on Computer System 2014

Abstract

The English Proficiency Test (EPT) is mandatory for all students of Universitas Potensi Utama before they complete their studies. The test measures how well students have mastered the English taught during their studies, both spoken and written. It consists of grammar, vocabulary, reading, listening and speaking sections. Until now the test results have been processed manually, so when there is a lot of data it takes a long time to determine the results and find out who passed. The writer is therefore interested in developing a system that helps process the EPT results quickly and accurately. In this study, C4.5 is used with five criteria to determine whether a student passes: Grammar, Vocabulary, Reading, Listening and Speaking. The results indicate that the Grammar score has the highest gain. It can therefore be concluded that the higher a test taker's Grammar score, the more likely s/he is to pass. The writer also used WEKA software in this study.

Keywords: EPT, English, C4.5, Decision Tree, WEKA

I. Introduction

English is one of the international languages widely used in many countries around the world. Mastery of English is also a crucial factor in job competition and job promotion. Considering the importance of English nowadays, Universitas Potensi Utama feels the need to prepare graduates who master this language.

One program that has already been implemented is to make English a mandatory subject in every semester of the study period. The hope is that teaching English every semester will significantly improve students' mastery of the language, both oral and written, so that after they graduate they are ready to compete in the world of work both nationally and internationally.

To measure the students' mastery of English and to evaluate the success of this program, an evaluation test is needed. For this purpose, a test that refers to the American Council on the Teaching of Foreign Languages (ACTFL) was developed. ACTFL is an international organization of professionals that includes teachers, researchers and all who care about improving English language proficiency. This organization provides clear descriptions of the proficiency scale and levels for candidates. The test is called the English Proficiency Test (EPT). It measures students' English proficiency both spoken and written, and consists of a written test (grammar, vocabulary and reading sections), a Listening Comprehension Test and a Speaking Test.

Until now the students' scores have been processed manually, and when there is a lot of data it takes a long time to determine the results of the test and find out who passed. It is therefore necessary to develop a system that can process a large number of test results quickly and accurately. C4.5 is a method that can calculate test results from large amounts of data and provide accurate results. The author therefore uses this method, with the help of WEKA software, to process the EPT data. The criteria used to determine whether a student passes the EPT are Grammar, Vocabulary, Reading, Listening and Speaking.

Rahmayuni explains that data mining with the C4.5 and CART methods, which that study compares, can help determine from students' first-semester scores whether they passed the first semester of the Computer Engineering program at the State Polytechnic of Padang [5]. Data mining classification with C4.5 has also been applied to help students choose a major that matches their ability, talent and background [6]. The C4.5 algorithm has been reported to reach an accuracy of up to 94% in the training phase and 93% in the test phase, and it can be applied to large amounts of data [7].

II. Research Methodology

The methods used in this study are interviews, data collection, processing of the collected data, manual testing of the data, and application with the help of WEKA software. The data were collected from the unit that organizes the EPT and then processed with the C4.5 method and WEKA.

III. Analysis and Discussion

a. Data Analysis

The data in this study were obtained from interviews with the related unit and cover 20 people. The data can be seen in Table 1.

Table 1. Results of the EPT test

b. Data Transformation

Transformation groups the data into classes. The data to be classified are grammar, vocabulary, reading, speaking and listening. The data are divided according to the following variables:

Table 2. Grammar
Table 3. Vocabulary
Table 4. Reading
Table 5. Speaking
Table 6. Listening
Table 7. Result

Score (x)           Result
x <= 299            Failed
301 < x <= 350      Novice
351 <= x <= 400     Intermediate
x > 400             Superior

For the EPT, a test score >= 300 means the student passed; a score below that value means the student did not pass. After transformation according to the predetermined criteria, the resulting data are given in Table 7. The transformed data are then processed with the C4.5 algorithm to build a decision tree.

Table 7. Results of Transformation

c. Algorithm C4.5

The transformed data are analyzed to generate a decision tree using the C4.5 algorithm. In general, the C4.5 algorithm builds a decision tree as follows:

1. Calculate Entropy and Gain.
2. Select the attribute with the highest gain as the root (node).
3. Repeat the Entropy and Gain calculation to find the branches until all cases in a branch have the same class, that is, until all variables have become part of the decision tree or each variable has a leaf or a decision.
4. Create rules based on the decision tree.

The attribute chosen as the root is the one with the highest gain value among the existing attributes. Gain is calculated with the following formula:

$$\mathrm{Gain}(S, A) = \mathrm{Entropy}(S) - \sum_{i=1}^{n} \frac{|S_i|}{|S|}\,\mathrm{Entropy}(S_i) \qquad (1)$$

where:
1. S : set of cases
2. A : attribute
3. n : number of partitions of attribute A
4. |S_i| : number of cases in partition i
5. |S| : number of cases in S

The entropy value is calculated with the following equation [1]:

$$\mathrm{Entropy}(S) = -\sum_{j=1}^{k} p_j \log_2 p_j \qquad (2)$$

where k is the number of classes in S and p_j is the proportion of cases in S that belong to class j.

d. Discussion

1. Application of C4.5 Algorithm

The data resulting from the transformation (data preprocessing) are then processed with the C4.5 algorithm. The results of the first iteration are given in Table 8.

Table 8. Iteration Results

According to the first-iteration results, the highest gain values belong to the Grammar and Vocabulary variables. Grammar becomes the root because its gain is the highest. The Grammar variable has five criteria: [...], Pretty, Medium and Nice. For the [...] criterion a decision cannot yet be obtained, so the gain search must continue to find the next branch. The Enough criterion leads to the decision Novice. The Medium criterion leads to the decision Superior.

Figure 1. Decision tree at the first iteration (root: Grammar; leaves shown: Novice, Superior)

The next stage searches for the branches under the two criteria Less and Good. The gain search continues until all branches have a decision. The tree resulting from the C4.5 algorithm can be seen in Figure 2.

Figure 2. Decision tree generated by the implementation of the C4.5 algorithm (nodes: Grammar, Reading, Speaking; leaves: Failed, Novice, Intermediate, Superior)

2. Implementation Using the WEKA Application

The C4.5 algorithm was also applied using the WEKA application. The data used are those of Table 7, converted to .csv form. The result produced by WEKA is shown in Figure 3.
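The transformation and gain steps described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the author's implementation: the four records are hypothetical (the 20 records of the study are not reproduced here), the function names are invented for the example, and scores of 300 and 301, which the printed Result ranges leave unassigned, are treated as Novice because the text states that a score >= 300 passes. The sketch uses the plain gain of the formula above; Quinlan's C4.5 normally ranks splits by gain ratio.

```python
from collections import Counter
from math import log2


def result_class(score):
    """Map a total EPT score to a Result class (Table 7 ranges).

    Assumption: 300 and 301 fall into Novice, since the text says >= 300 passes.
    """
    if score <= 299:
        return "Failed"
    if score <= 350:
        return "Novice"
    if score <= 400:
        return "Intermediate"
    return "Superior"


def entropy(labels):
    """Base-2 entropy of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def gain(rows, attribute, target="Result"):
    """Entropy of the whole set minus the weighted entropy of each partition."""
    base = entropy([r[target] for r in rows])
    partitions = {}
    for r in rows:
        partitions.setdefault(r[attribute], []).append(r[target])
    remainder = sum(len(p) / len(rows) * entropy(p) for p in partitions.values())
    return base - remainder


# Hypothetical records: (Grammar class, Reading class, total score).
raw = [
    ("Enough", "Medium", 320),
    ("Enough", "Enough", 340),
    ("Medium", "Medium", 410),
    ("Medium", "Enough", 450),
]
rows = [
    {"Grammar": g, "Reading": r, "Result": result_class(score)}
    for g, r, score in raw
]

# The attribute with the highest gain becomes the root of the tree.
gains = {a: gain(rows, a) for a in ("Grammar", "Reading")}
root = max(gains, key=gains.get)
print(gains, root)
```

In this toy data Grammar separates the Result classes completely while Reading does not separate them at all, so Grammar is selected as the root; repeating the same calculation inside each branch gives the later iterations.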
Figure 3. Decision tree generated using the WEKA application

Figure 2 and Figure 3, obtained by calculating the C4.5 algorithm manually and by using the WEKA application respectively, show the same result. Both extend the first-iteration tree of Figure 1, in which the [...] criterion still required a gain search to find its branch and subsequent decisions. The knowledge, or rules, derived from the decision tree in Figure 2 and Figure 3 is as follows:

1. If Grammar = Enough then Result = Novice
2. If Grammar = Medium then Result = Superior
3. If Grammar = [...] and Reading = [...] then Result = Failed
4. If Grammar = [...] and Reading = Enough then Result = Novice
5. If Grammar = [...] and Reading = Medium then Result = Failed
6. If Grammar = [...] and Reading = [...] then Result = Failed
7. If Grammar = [...] and Speaking = [...] then Result = Superior
8. If Grammar = [...] and Speaking = Quite then Result = Intermediate
9. If Grammar = [...] and Speaking = Medium then Result = Superior
10. If Grammar = [...] and Speaking = Excellent then Result = Superior

e. Testing

The tests conducted on the EPT results are shown in Table 9.

Table 9. Data Testing

Table 9 shows the result of testing the knowledge generated by the C4.5 implementation: the generated rules agree with the data in the testing table.

IV. Conclusion

The conclusions are as follows:

1. The knowledge generated from the decision tree is consistent with all of the trial data.
2. Processing the test data manually and with the WEKA application generates the same decision tree.
3. The Listening variable was pruned, because the decision could be generated from three variables.
4. The amount of data used influences the decision tree and the knowledge generated: the more data used, the more complex the resulting decision tree.

Reference

[1] Hssina, Badr, et al., A comparative study of decision tree ID3 and C4.5, International Journal of Advanced Computer Science and Application, pp. 13-19.
[2] Tariq O. Fald Elsid, Mergani A. Eltahir, An Empirical Study of the Application of Techniques in Students Database, International Journal of Engineering Research and Applications, pp. 1-10, 2014.
[3] Haryanto, Edy Victor, Decision Support System for Determining Graduation EPT (Case Study: STMIK Potensi Utama), SNIKOM, 2014.
[4] Magdalena, Hilyah, Decision Support System to Determine the Best Graduate Students in Higher Education (Case Study STMIK ATMA Luhur Pangkal Pinang), SENTIKA, March 10, 2012.
[5] Rahmayuni, Indri, Performance Comparison Algorithm C4.5 and CART in the of Data Value Student Computer Engineering Department State Polytechnic Padang, Journal of TEKNOIF, Vol. 2, No. 1, April 2014.
[6] Swastina, Liliana, Application of C4.5 Algorithm for Determining Subject Students, Journal of GEMA AKTUALITA, Vol. 2, No. 1, June 2013.
[7] Anand, Dr. Sheila and K. Ranjesh, Analyst Of Seer Dataset For Breast Cancer Diagnosis Using C4.5 Algorithm, International Journal of Advanced Research in Computer and Communication Engineering, Vol. 1, Issue 2, April 2012, Thandhalam.