The Web-Based Computerized Adaptive Testing


Porawat Visutsak
Dept. of Computer and Information Science, Faculty of Applied Science, King Mongkut's University of Technology North Bangkok (KMUTNB), Bangkok, Thailand
porawatv@kmutnb.ac.th

Abstract — Improving teaching and learning activities based on student ability requires an efficient evaluation process. This study delivers individually tailored assessments to students based on Computerized Adaptive Testing (CAT), adapting both the difficulty of the test questions and the number of items to each student. The proposed CAT algorithm guarantees that difficult items are chosen for high-ability students while easier items are presented to lower-ability students. The algorithm is implemented with web-based technology: the computer chooses and displays the questions, then records and processes the student's answers. Item selection is adaptive and is influenced by two factors: the student's answers to the previous questions, and the specific statistical qualities of the administered and candidate items. Compared to conventional paper-based testing, where all students receive the same items, designing the test with this interactive computer application and intelligent CAT algorithm can yield a larger percentage of items at appropriate difficulty levels, higher precision of test scores, and shorter test lengths.

Keywords — Computerized adaptive testing, Web-based adaptive testing, Item response theory, Item analysis, Difficulty index, Discrimination index

I. INTRODUCTION

As is well recognized, developing the quality of learners requires cooperation among all the processes that link the aspects of teaching and learning activities, as well as the evaluation of the learners. Developing learners capable of lifelong learning is therefore crucial and required by all; these learners will one day become human resources able to contribute to sustainable development [9]. The rapid advancement of computer technology in education has radically changed educational activities and curricula, as well as teaching practices in the classroom. Many applications are involved in teaching and testing activities, collectively called Learning Management Systems (LMS) or e-learning. Recently, the use of e-learning applications via mobile devices has been spotlighted in Thailand with the announcement of the policies and guidelines for the second decade of education reform (2009-2018). Hundreds of studies related to the education reform policies were presented at the National e-Learning Conference 2013 (Strengthening Learning Quality: Bridging Engineering and Education), organized by Thailand Cyber University. In order to achieve specific learning objectives, the e-learning content designer must focus on design principles that make the e-course enjoyable while motivating learners with built-in learning and evaluation processes [3], [10]. Another paradigm for improving learning ability is interactive courseware. Many e-learning coursewares use animation to illustrate the content instead of text-based content; realistic character animation with learning contents and exercises has been proposed for this purpose [7], [15].
Traditionally, tests for evaluating knowledge, skills, abilities, and other characteristics have been assembled in a paper-and-pencil format. In a 120-item multiple-choice paper-based test, the student must answer all the questions, which causes a number of problems, some obvious and some more subtle. The most obvious disadvantage of the conventional paper-based test is that it is quite inefficient: all 120 items are not necessary for all students. If a student were to take only the 40 most difficult items and answer 35 of them correctly, a high score could be assigned without wasting time administering the other 80 easier questions [13]. Another big issue is the large difference in precision among students. Traditional tests are typically built with large numbers of medium-difficulty items, at the expense of low- and high-ability students. Low-ability students are exhausted and discouraged because most items are too difficult to be relevant, while high-ability students are not appropriately challenged because most items are too easy; both groups are measured with much less precision [13]. This paper proposes a practical Computerized Adaptive Testing (CAT) approach adopted from Item Analysis [11] together with Item Response Theory (IRT) [2], [8]. The study addresses item pools, test administration, test security, and examinee issues as the four general areas of practical concern in developing and maintaining IRT-based CAT programs. The system implementation is illustrated in the next section; the experiment results, the analysis for future work, and the summary are then presented, respectively.

II. METHODS

IRT, which uses probability to explain the relationship between an examinee's ability and the response to a question item, was invented to overcome the major drawbacks of traditional Classical Test Theory (CTT) [4]. This section describes the algorithms used to deliver CAT based on IRT. IRT has a few basic assumptions, including unidimensionality, local independence, a non-speeded test, and the know-correct assumption, which need to be sustained before the model can be used to analyze the data [5]. In practical IRT, the system randomly selects an item from the item pool and treats it as the basic unit for measuring the examinee's ability. The execution follows a simple principle: if an examinee responds correctly to an item, then the next item will be one level up in terms of difficulty, and vice versa. After each item response, the examinee's estimated ability is re-evaluated, and an item at the appropriate level is given next [12]. The process is repeated until a certain psychometric criterion is met; some students will need more items than others, so the test can finish quickly for some examinees.

Two indices are used for administering the test items [13]: the difficulty index and the discrimination index. The difficulty index is used to categorize the level of the test items, while the discrimination index is used to categorize the level of the examinees. These values measure the statistical qualities of the administered items in this study. The two indices are given in eq. (1) and eqs. (2)-(3), respectively.

P = R / N (1)

where
P = difficulty index
R = number of examinees who choose the correct answer
N = number of all examinees

Note that a higher P value means an easier test item. A well-administered item should have P near 0.5, and most CATs are composed of administered items whose difficulty index lies in [0.20, 0.80]. Table 1 shows the ranges of the difficulty index and their meanings.

Table 1: The difficulty index and its meaning

Difficulty Index | Difficulty Level | Quality of Test Item
0.80-1.00 | Very easy | (this item should be changed)
0.60-0.79 | Easy | Fair item
0.40-0.59 | Fair | Good item
0.20-0.39 | Difficult | Fair item
0.00-0.19 | Very difficult | (this item should be changed)

In a four-choice multiple-choice test [13] (one choice is correct, the others are wrong), the wrong choices are also used to assess the ability of the examinees; a wrong choice selected by many examinees is a good wrong choice (distractor). The discrimination index measures the efficiency of the item and its wrong choices:

r = (H − L) / (N / 2) (2)
r = P_H − P_L (3)

where
r = discrimination index
H = number of examinees who choose the correct answer in the high-ability group
L = number of examinees who choose the correct answer in the low-ability group
N = number of examinees in the two groups combined
P_H, P_L = proportions of correct answers in the high- and low-ability groups

r is used to classify the students into the high-examinee group (high-ability examinees) and the low-examinee group (low-ability examinees). r ranges over [−1, 1]; in a practical CAT, items should have r >= 0.2. Table 2 shows the ranges of the discrimination index and their meanings [11].

Table 2: The discrimination index and its meaning

Discrimination Index | Quality of Test Item
0.40 and above | Very good item
0.30-0.39 | Good item
0.20-0.29 | Fair item
0.19 and below | Poor item (this item should be changed)

III. IRT-BASED CAT MODEL

The system has been implemented with PHP and MySQL.
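To make eqs. (1)-(3) and Tables 1-2 concrete, the following minimal PHP sketch computes both indices and maps a difficulty index to its Table 1 quality label. This is our illustration, not the paper's code; all function names are assumptions.

<?php
// Difficulty index (eq. 1): P = R / N,
// R = examinees answering the item correctly, N = all examinees.
function difficultyIndex(int $correct, int $total): float
{
    return $correct / $total;
}

// Discrimination index (eqs. 2-3): r = (H - L) / (N / 2),
// H, L = correct answers in the high- and low-ability groups,
// N = examinees in both groups combined.
function discriminationIndex(int $high, int $low, int $total): float
{
    return ($high - $low) / ($total / 2);
}

// Quality label for a difficulty index, following Table 1.
function difficultyQuality(float $p): string
{
    if ($p >= 0.80) return 'Very easy (this item should be changed)';
    if ($p >= 0.60) return 'Easy (fair item)';
    if ($p >= 0.40) return 'Fair (good item)';
    if ($p >= 0.20) return 'Difficult (fair item)';
    return 'Very difficult (this item should be changed)';
}

// Quality label for a discrimination index, following Table 2.
function discriminationQuality(float $r): string
{
    if ($r >= 0.40) return 'Very good item';
    if ($r >= 0.30) return 'Good item';
    if ($r >= 0.20) return 'Fair item';
    return 'Poor item (this item should be changed)';
}

// Worked example: 50 examinees, 25 correct overall; 20 of the top 25
// and 8 of the bottom 25 answered correctly.
echo difficultyIndex(25, 50), "\n";            // 0.5
echo difficultyQuality(0.5), "\n";             // Fair (good item)
echo discriminationIndex(20, 8, 50), "\n";     // 0.48
echo discriminationQuality(0.48), "\n";        // Very good item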
The IRT-based CAT system design and development was based on usability testing [6], focused on usability for both the student and the instructor. The GUI (Graphical User Interface) is easy to use, and the student does not need much time to become familiar with the layout and tools of the test page. The test items were designed for evaluating students in the Computer and Information subject at the Mathayom Suksa level (Thai secondary education), Kanchanaburi Provincial Administrative Organization School, Kanchanaburi, Thailand.
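The paper does not list its PHP source, but as a rough sketch under stated assumptions, the Adaptive Item Selection step described in the next section might query the MySQL item pool as follows. The database name, the cat_items table, and its columns are hypothetical, as is the choice of filtering by the index ranges of Tables 1 and 2.

<?php
// Hypothetical Adaptive Item Selection query against a MySQL item pool.
$pdo = new PDO('mysql:host=localhost;dbname=cat', 'user', 'password');

// Draw one random candidate item at the requested level ('easy', 'fair',
// or 'difficult'), keeping only well-behaved items: difficulty index in
// [0.20, 0.80] and discrimination index >= 0.20.
function nextItem(PDO $pdo, string $level): ?array
{
    $stmt = $pdo->prepare(
        'SELECT * FROM cat_items
          WHERE level = :level
            AND difficulty_index BETWEEN 0.20 AND 0.80
            AND discrimination_index >= 0.20
          ORDER BY RAND() LIMIT 1'
    );
    $stmt->execute(['level' => $level]);
    return $stmt->fetch(PDO::FETCH_ASSOC) ?: null;
}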

Figure 1: The architecture of the IRT-based CAT.

The items pool has three levels: easy, fair, and difficult. When the system starts, the first item is randomized from the fair level. If the student chooses the correct answer, the system randomizes a second item one difficulty level up; otherwise a lower-level item is shown next. Figure 1 shows the architecture of the IRT-based CAT, which consists of 7 modules: User-profile Database, Account Management, Item Pool Management, Items Pool, Test Management, Test Result, and Adaptive Item Selection. The Account Management module provides creation and management of user accounts, and all user accounts are stored in the User-profile Database. The functions of Item Pool Management include item creation, unit updates, modification, and management. The Test Management module sets the approach of test administration and records all test results in the Test Result database. The Adaptive Item Selection module administers tests according to different adaptive test algorithms; in the experiment, the proposed adaptive test algorithm was implemented in this module. All test activities are recorded and analyzed in the Test Management part, which collects statistical data such as the most frequently randomized items, the items most often answered correctly, and the items most often answered wrongly. These statistics are used for evaluating the test items and for later improvement of the teaching of the subject. A screen capture of the system is shown in figure 2: in the example item, assuming the bridge has collapsed, the examinee must find the shortest path from Home to School; the correct answer is A-D-E-H, with a total time of 7 minutes.

Figure 2: The GUI of the system.

IV. EXPERIMENT RESULTS

The system was tested against two requirements:

1) Examinee experience

Figure 3: The organization of the items pool (easy, fair, difficult).

Figure 3 shows the organization of the items pool. The test starts with a fair-level item. If the examinee chooses the correct answer, the examinee gets 3 points for this item, and the next item, one level up at the difficult level, is retrieved. If the examinee also responds correctly to the second item, the examinee gets 5 points; otherwise the examinee gets 0 points, and the third item, one level down at the fair level, is administered next. The demonstration of the system is shown in table 3. Because the system adapts to the examinee's previous answers, it provides an appropriate challenge for each examinee: low-ability examinees are not discouraged or intimidated, while high-ability examinees enjoy receiving difficult items. All examinees are measured with the same level of precision, even though they potentially see different items and the numbers of test items received by low- and high-ability examinees may not be equal. This makes the test extremely efficient from a psychometric perspective [1].

2) Precision

The system has been designed to be more precise than a conventional test while still using fewer items. For example, the system randomizes 7 test items from the items pool; the score of each item is assigned as follows: 1 for an easy item, 3 for a fair item, and 5 for a difficult item. The test starts from the fair level and is repeated as described in figure 3. Table 3 shows the demonstration of a 7-item test in the system.

Table 3: The demonstration of a 7-item test in the system.

Item | Status | Level of current item | Level of next item | Item score
1 | correct | fair | difficult | 3
2 | correct | difficult | difficult | 5
3 | wrong | difficult | fair | 0
4 | wrong | fair | easy | 0
5 | correct | easy | fair | 1
6 | correct | fair | difficult | 3
7 | wrong | difficult | fair | 0
Total | | | | 12

Note: item scores are easy = 1, fair = 3, difficult = 5.

As the demonstration shows, the examinee gets 3 points from test item No. 1, so an item one level up, at the difficult level, is shown as test item No. 2. The examinee answers it correctly and gets 5 points from this item; the test is repeated using the proposed algorithm, and when the test is done, the total score is 12. A minimal code sketch of this walk is given below.
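The following PHP sketch reproduces the level up/down scoring walk of Table 3. The level names and scores follow the paper (easy = 1, fair = 3, difficult = 5; start at fair, one level up on a correct answer, one level down on a wrong one, clamped to the pool's range); all identifiers are our own illustration.

<?php
// Simulate the 7-item demonstration of Table 3.
$levels = ['easy', 'fair', 'difficult'];
$scores = ['easy' => 1, 'fair' => 3, 'difficult' => 5];

// Outcomes of the 7 responses in Table 3 (true = correct).
$responses = [true, true, false, false, true, true, false];

$level = 1; // index into $levels: the test starts at 'fair'
$total = 0;
foreach ($responses as $i => $correct) {
    $name = $levels[$level];
    $gain = $correct ? $scores[$name] : 0;
    $total += $gain;
    printf("Item %d: %-9s %-9s %d\n", $i + 1, $name,
           $correct ? 'correct' : 'wrong', $gain);
    // One level up on correct, one level down on wrong, clamped.
    $level = max(0, min(2, $level + ($correct ? 1 : -1)));
}
echo "Total: $total\n"; // 12, matching Table 3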
V. SUMMARIES AND FUTURE WORKS

In this paper, the structure and interface of CAT were described. The paper also gave the motivation for designing built-in learning and evaluation processes according to the policies and guidelines for education reform in Thailand.

The IRT-based CAT controls the test length, the number of test items, and the difficulty level of the test items by adaptively responding to the examinees' ability, and further classifies examinees into different levels in an efficient way. Instructors can create test items easily and save them in the item pools. The system responds to individual examinees: low-ability examinees feel better, and high-ability examinees feel challenged; both will try harder than with a conventional test. It also provides shorter test lengths compared to a paper-and-pencil test, and it provides statistical reports for evaluating the quality of the test as well as for later use in improving the learning ability of the examinees. One significant limitation should be noted for future work: item exposure. Since the system is designed to select the best items in the item pool, these items will often become overexposed; the implementation of an exposure-control algorithm, as well as of stopping rules, remains a challenge for the next study. The Emotional Quotient (EQ) is also a big issue in the assessment of student learning ability, so another challenge is the emotional evaluation of the student during the test [14]. Real-time emotion classification will be embedded into the IRT-based CAT in order to classify the emotion of the student into a seven-basic-emotion scheme, namely neutral, angry, disgust, fear, happy, sad, and surprise. Together with the scores gained on the test, the result of the emotion classification will also be shown, so the instructor can use this information for course development as well.

ACKNOWLEDGMENT

This research was funded by King Mongkut's University of Technology North Bangkok, contract no. KMUTNB-GOV-58-63. The test items pool was supported by The Institute for the Promotion of Teaching Science and Technology (IPST).

REFERENCES

[1] Acton, G. Scott and Revelle, William. 2004. Evaluation of Ten Psychometric Criteria for Circumplex Structure. Methods of Psychological Research Online, Vol. 9, No. 1.
[2] Baker, Frank. 1994. Item Response Theory: Parameter Estimation Techniques. Journal of Educational Measurement, Vol. 31, No. 3.
[3] Daoruang, Beesuda and Visutsak, Porawat. 2013. Promoting Learning of Primary Children through Games. In Proceedings of the National e-Learning Conference 2013 (NEC 2013): Strengthening Learning Quality: Bridging Engineering and Education, Bangkok, Thailand, 5-6 Aug.
[4] de Klerk, Gerianne. 2008. Classical Test Theory (CTT). In M. Born, C. D. Foxcroft and R. Butter (Eds.), Online Readings in Testing and Assessment, International Test Commission. Available from: http://www.intestcom.org/publications/orta/classical+test+theory.php.
[5] Ackerman, T. A. 1989. Unidimensional IRT calibration of compensatory and noncompensatory multidimensional items. Applied Psychological Measurement, 13, 113-127.
[6] Dumas, Joseph S. and Redish, Janice C. 1999. A Practical Guide to Usability Testing. Intellect Ltd, Rev Sub edition, 1 Jan.
[7] Edelson, Daniel C. 1998. "Realising Authentic Science Learning through the Adaptation of Scientific Practice." International Handbook of Science Education 1.
[8] Hambleton, Ronald K. and Jones, Russell W. 1993. "Comparison of classical test theory and item response theory and their applications to test development." Educational Measurement: Issues and Practice 12(3).
[9] Hanchanlash, Jingjai. 2010. Annual Report 2010: Office for National Education Standards and Quality Assessment (ONESQA). Bangkok, Thailand.
[10] Jatuweerapong, Panpaporn and Suwanrattanachot, Praweenya. 2013. Dynamic Visual Cues in Game-Based Multimedia Lessons: Enhancing Spatial Ability of Learners with Low Spatial Ability. National e-Learning Conference 2013 (NEC 2013): Strengthening Learning Quality: Bridging Engineering and Education, Bangkok, Thailand, 5-6 Aug.
[11] Kamolsin, Chiranun and Visutsak, Porawat. 2013. A Practical Items Selection for Adaptive Testing. In Proceedings of the 9th SRU National Research Conference 2013: Research for Local Development Towards the ASEAN Community, Suratthani Rajabhat University, Suratthani, Thailand, 21-22 Nov.
[12] Tao, Yu-Hui, Wu, Yu-Lung, and Chang, Hsin-Yi. 2008. "A Practical Computer Adaptive Testing Model for Small-Scale Scenarios." Educational Technology & Society 11(3): 259-274.
[13] Visutsak, Porawat. 2013. A Practical Items Selection for Web-Based Adaptive Testing. In Proceedings of the 1st International Conference on Technical Education (ICTechEd1), Faculty of Technical Education, King Mongkut's University of Technology North Bangkok, Thailand, 28-29 Nov.
[14] Visutsak, Porawat. 2013. "Emotion Classification through Lower Facial Expressions using Adaptive Support Vector Machines." JMMT: Journal of Man, Machine and Technology, Vol. 2, No. 1.
[15] Visutsak, Porawat and Prachumrak, Korakot. 2013. The Skeleton Pruning-Smoothing Algorithm for Realistic Character Animation. JMMT: Journal of Man, Machine and Technology, Vol. 2, No. 1.