Determining Factors Influencing Listening Test Item Difficulty and Predicting Reading Proficiency


Determining Factors Influencing Listening Test Item Difficulty and Predicting Reading Proficiency
Vahid ARYADOUST, Centre for English Language Communication, National University of Singapore

Outline: two studies
1. Exploring oral-text-related variables influencing item difficulty (ANFIS)
2. Classifying readers based on lexico-grammatical knowledge

Study 1: Predicting Item Difficulty
How do you develop an in-class reading test? How do you make your tests easy or difficult for students? How do you design an easy/difficult reading or listening test item? Some items are easier and some are more difficult. Study 1: predicting listening test item difficulty.

Predicting Item Difficulty
Item difficulty is affected by various factors. To design items, those factors must be considered, hence the need for theories that predict item difficulty. Developing a predictive theory of item difficulty matters because listening test development has been "thin on substantive theory" (Stenner, Stone, & Burdick, 2011, p. 3).

Predicting Listening Test Item Difficulty 1
TOEFL listening mini-talks: negation, referential words, rhetorical organization, fronted structures, lexical overlap, vocabulary, and length and abstractness of sentences (Freedle & Kostin, 1990, 1993, 1999).

Predicting Listening Test Item Difficulty 2
Word-level variables (e.g., vocabulary); sentence-level variables (e.g., dependent clauses); discourse-level variables (e.g., questions or statements); and task-processing variables (e.g., inference making). Together these explained 41% of the variance in item difficulty (Kostin, 2004).

Predicting Listening Test Item Difficulty 3
Rule space methodology: Buck and Tatsuoka (1998) extracted 15 attributes and 14 interactions in 5 categories:
- Task identification attributes: the ability to identify the task by determining what type of information to search for in order to complete it;
- Context attributes, such as the density of information in the text;
- Information location attributes, such as the ability to use previous items to help locate information;
- Information processing attributes, such as the ability to process very fast text automatically; and
- Response construction attributes, such as the ability to construct a response quickly and efficiently.

Predicting Listening Test Task Difficulty 4
Surveys of teachers and students: listener-related, speaker-related, and material- and medium-related factors (Boyle, 1984). Goh (1999): text-related (e.g., speech rate), listener-related (e.g., prior knowledge), speaker-related (e.g., accent), and environmental variables (e.g., physical conditions).

Statistical Tools to Predict Item Difficulty
- T-tests: limited
- Regression models (Freedle & Kostin, 1990, 1993, 1999; Ginther, 2000)
- Rule space methodology (Buck & Tatsuoka, 1998)
- The fusion model (Aryadoust, 2011; Y. W. Lee & Sawaki, 2009; Sawaki, Kim, & Gentile, 2009)
- Artificial neural networks (Perkins, Gupta, & Tammana, 1995)

Problems Pertaining to the Nature of These Data Analysis Tools 1
Prescriptive (deterministic) models such as linear regression rely on restrictive assumptions; when linearity is violated, theory-informed hypotheses are likely to be disqualified, so the validity of studies in which multiple regression is used to predict item difficulty is not high. "It is hypothesized that using variables in combination and introducing forms of nonlinearity might improve the validity" of item difficulty studies (Perkins et al., 1995, p. 35). (Graph from: http://amosdevelopment.com/video/mixture/regression/mixtureregression.html)

Problems Pertaining to the Nature of These Data Analysis Tools 2
Multidimensional item response theory (IRT) models, such as the fusion and rule space models, have a limitation: they require a large sample size, a requirement that is not always fulfilled (Aryadoust, 2011).

Current Study
A class of neuro-fuzzy models: Adaptive Neuro-Fuzzy Inference Systems (ANFIS)

Neuro-fuzzy Synergy: Artificial Neural Networks (ANN) and Fuzzy Set Theory
Landín, Rowe, and York (2009, p. 325): rule sets generated by neuro-fuzzy logic are "completely in agreement with the findings based on statistical analysis" and advantageously generate understandable and reusable knowledge; they conclude that neuro-fuzzy logic "is easy and rapid to apply" and its outcomes "[yield] knowledge not revealed via statistical analysis."

A Fuzzy Inference System: Fuzzification

Types of Neuro-Fuzzy Models
- Mamdani systems (e.g., Mamdani & Assilian, 1975)
- Takagi-Sugeno-Kang rule-based hybrid systems
This study uses a variant of the latter: the Adaptive Neuro-Fuzzy Inference System (ANFIS). ANFIS optimizes parameter estimation and output prediction and reduces estimation time. Approximately 80% to 90% of the data are used for training; the rest for testing.
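The training/testing partition described above can be sketched as a simple random split. This is only an illustration: the 40-item dataset, the 7 features, and the `train_frac` value are assumptions for the example, not the study's exact procedure.

```python
import numpy as np

def split_train_test(X, y, train_frac=0.9, seed=0):
    """Shuffle item indices and hold out the remainder for testing."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_train = int(train_frac * len(X))
    return X[idx[:n_train]], y[idx[:n_train]], X[idx[n_train:]], y[idx[n_train:]]

# Illustrative data: 40 items described by 7 coded features each
X = np.zeros((40, 7))
y = np.linspace(-2.0, 2.0, 40)   # e.g., Rasch item difficulties
X_tr, y_tr, X_te, y_te = split_train_test(X, y, train_frac=0.9)
print(len(X_tr), len(X_te))      # 36 training items, 4 testing items
```

With `train_frac=0.9` the split lands at the upper end of the slide's 80-90% range; the study itself used 37 training and 3 testing items.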

Method
Participants: 209 Iranian, Chinese, and Malay test takers, between 16 and 45 years old. Materials: the Official IELTS Practice Materials (University of Cambridge Local Examinations Syndicate, 2007); 40 test items in 4 sections.

Data Analysis
Rasch item difficulty (WINSTEPS); item coding (expert judgments); ANFIS with a 37/3 training/testing split (MATLAB, Version 7.11; MathWorks, Inc., 2011). Seven ANFIS models were generated, incorporating between 1 and 7 hypothesized explanatory variables. For models comprising between 2 and 6 variables, numerous sub-models consisting of all possible variable combinations were tested.

Item Coding
- Word count
- Prepositional phrases
- Modal verbs
- Propositional density of oral texts
- Propositional density of test items
- Item format
- Information type
These variables were normalized before the modeling to take values between zero and one.
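The normalization step can be sketched as standard min-max scaling, which maps each coded variable onto [0, 1]; the word-count figures below are hypothetical.

```python
import numpy as np

def min_max_normalize(x):
    """Rescale a coded variable to the [0, 1] range (min-max normalization)."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

# Hypothetical word counts for five items
word_counts = [12, 30, 21, 48, 12]
normalized = min_max_normalize(word_counts)
print(normalized.tolist())   # [0.0, 0.5, 0.25, 1.0, 0.0]
```

Scaling all predictors to a common range keeps variables with large raw units (such as word count) from dominating the fuzzy membership functions.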

Data Analysis: Goodness-of-fit Indices
- Coefficient of efficiency (E_f): ranges between negative infinity and one.
- Squared correlation coefficient (r²): ranges between zero and one, with values near one indicating good fit.
- Root mean squared error (RMSE): lower values indicate smaller error terms.
- Mean absolute error (MAE): lower values indicate smaller error terms.
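The four indices can be computed as follows. The slides do not give formulas, so these are the standard definitions (the coefficient of efficiency is taken in its Nash-Sutcliffe form, 1 − SS_res/SS_tot, which matches the stated range of negative infinity to one); the observed/predicted values are made up.

```python
import numpy as np

def fit_indices(observed, predicted):
    """Compute E_f, r^2, RMSE, and MAE for a set of model predictions."""
    o = np.asarray(observed, dtype=float)
    p = np.asarray(predicted, dtype=float)
    resid = o - p
    # Coefficient of efficiency: 1 - residual SS / total SS (range: -inf to 1)
    e_f = 1.0 - np.sum(resid**2) / np.sum((o - o.mean())**2)
    r2 = np.corrcoef(o, p)[0, 1] ** 2      # squared correlation, 0 to 1
    rmse = np.sqrt(np.mean(resid**2))      # lower = smaller error terms
    mae = np.mean(np.abs(resid))           # lower = smaller error terms
    return e_f, r2, rmse, mae

# Made-up Rasch difficulties (observed) vs. model predictions
e_f, r2, rmse, mae = fit_indices([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8])
print(round(e_f, 3), round(mae, 3))   # 0.98 0.15
```

A model that predicted every item at the mean difficulty would score E_f = 0, so positive values indicate the model outperforms that baseline.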

Training Data

Testing Data

Discussion and Conclusion 1
ANFIS found that all variables exerted an effect on item difficulty. MCQs were easier than open-ended items; MCQs appear to tax test takers' cognitive processes less than constructed-response items. Propositional density of both spoken texts and test items affected item difficulty, which agrees with Buck and Tatsuoka (1998). That the propositional density of the items themselves affected difficulty points to a likely source of construct-irrelevant variance (CIV).

Discussion and Conclusion 2
Information type and use of modal verbs were also predictors of item difficulty (Buck & Tatsuoka, 1998). Word count affected item difficulty, echoing Freedle and Kostin (1993, 1999).

Future Applications of Neuro-fuzzy Models (NFMs)
The ANFIS model is more intuitive and theory-informed. NFMs hold great promise for behavioral prediction in language assessment, e.g., computerized adaptive language testing (CALT). Coh-Metrix: a useful technique to explore oral/reading texts.

Study 2: Reading Comprehension
Reading: textbase. Reading: grammar + vocabulary + strategies. Lexico-grammatical knowledge predicts reading: vocabulary knowledge + grammar knowledge → reading. Can the relationship be (mathematically) modelled?

Data Mining: Predictive Modelling
Data mining: the discovery of knowledge in data. It captures meaningful patterns (i.e., information) in data to constitute understandable structures. Some predictive modelling techniques:
1) Linear regression
2) Classification and regression tree (CART)
3) Artificial neural networks
These represent three schools of thought for conjecturing and predicting relationships and results.

Artificial Neural Networks (ANNs)
Nonlinear mathematical models inspired by the human brain (Haykin, 1998). They consist of interconnected units, or neurons, and are capable of pattern recognition, prediction, classification, and learning. ANNs learn by example: the network finds out how to solve the problem by itself. (Source: http://www.doc.ic.ac.uk/~nd/surprise_96/journal/vol4/cs11/report.html)

Advantages of ANNs
No assumptions about the relationships between dependent and independent variables (normality, linearity, & homoskedasticity).

Technical Information of the ANN
One hidden layer; six inputs; six hidden units. Activation function: hyperbolic tangent. Error function: sum of squares.
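A minimal numpy sketch of that topology, with random untrained weights: six inputs feed one hidden layer of six tanh units and a linear output, scored by a sum-of-squares error. The weight initialization and the toy examinee data are assumptions for illustration, not the study's fitted network.

```python
import numpy as np

rng = np.random.default_rng(42)

# Topology from the slide: 6 inputs -> 6 tanh hidden units -> 1 output
W1, b1 = rng.normal(scale=0.5, size=(6, 6)), np.zeros(6)
W2, b2 = rng.normal(scale=0.5, size=6), 0.0

def forward(x):
    """Forward pass with a hyperbolic-tangent hidden layer and a linear output."""
    hidden = np.tanh(W1 @ x + b1)
    return float(W2 @ hidden + b2)

def sum_of_squares_error(X, y):
    """The slide's stated error function: sum of squared residuals over a batch."""
    preds = np.array([forward(x) for x in X])
    return float(np.sum((preds - y) ** 2))

X = rng.normal(size=(10, 6))   # ten hypothetical examinees, six predictors
y = rng.normal(size=10)        # hypothetical reading scores
error = sum_of_squares_error(X, y)
print(error >= 0.0)            # True: the error is a non-negative scalar
```

Training would adjust `W1`, `b1`, `W2`, and `b2` to reduce this error, typically by backpropagation; only the forward computation is shown here.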


Epilogue
"A science is exact only in so far as it employs mathematics." (Immanuel Kant) Mathematics contributes to our understanding of the world: physics, biology, chemistry, the social sciences, and applied linguistics.

Summary
The assumptions of linear regression are restrictive, and ID variables might be highly correlated in LT data. CART and ANNs offer promise for (re-)examining some of the data-driven LT theories. Due to their flexibility, ANNs can predict DVs with lower degrees of error of measurement.

Predicting Listening Test Item Difficulty 5
Information density (Dunkel, Henning, & Chaudron, 1993). Buck and Tatsuoka (1998): dividing the number of content words by the number of words surrounding the information necessary to answer the test item (found influential). Rupp, Garcia, and Jamieson (2001, p. 211): the type/token ratio (i.e., a ratio of function or grammatical units to lexical units), sentence length, word count, and item-text interactions (found influential but not robust).

Applying the Fuzzified Input to the Rules' Antecedents
µ1(X1 = A[low]) = 0.40; and µ2(X1 = B[high]) = 0.80 (Zadeh, 1965)
Rule 1: IF X1 = A[low], THEN Y = 2.
Rule 2: IF X1 = B[high], THEN Y = 5.
(Y is the output and could take any value depending on the range of the data to be predicted.)

Fuzzy Inference System: Two Inputs
Inputs: X1 = 14 and X2 = 18; membership functions: low & high. The joint functions of the two inputs might be rewritten as:
Rule 1: IF X1 = A[low] AND X2 = A[low], THEN Y1 = 2 (1)
Rule 2: IF X1 = A[low] AND X2 = B[high], THEN Y2 = 3 (2)
Rule 3: IF X1 = B[high] AND X2 = A[low], THEN Y3 = 3 (3)
Rule 4: IF X1 = B[high] AND X2 = B[high], THEN Y4 = 5 (4)

Example of Two Inputs
Low and high membership values of X1 = 0.8 and 0.4; low and high membership values of X2 = 0.3 and 0.6.
Rule 1: µ1 = 0.80 × 0.30 = 0.24, THEN Y1 = 2; and so forth.
The rules are then defuzzified and the value of Y is estimated as a weighted average:
Y = (µ1Y1 + µ2Y2 + µ3Y3 + µ4Y4) / (µ1 + µ2 + µ3 + µ4) (see Vilém, 1989, 2005).
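The worked example can be reproduced in a few lines. The product is used as the AND operator (as the slide's 0.80 × 0.30 multiplication implies), and the output is the standard weighted average of the rule consequents (zero-order Takagi-Sugeno defuzzification):

```python
import numpy as np

# Membership values from the slide: low/high of X1 = 0.8/0.4, low/high of X2 = 0.3/0.6
low1, high1 = 0.8, 0.4
low2, high2 = 0.3, 0.6

# Firing strengths of the four rules (product AND operator)
mu = np.array([low1 * low2, low1 * high2, high1 * low2, high1 * high2])
y_rule = np.array([2.0, 3.0, 3.0, 5.0])   # THEN-part outputs of rules 1-4

# Weighted-average defuzzification
Y = np.sum(mu * y_rule) / np.sum(mu)
print(round(Y, 3))   # 3.222
```

The output falls between the smallest and largest rule consequents (2 and 5), pulled toward Rule 2's output of 3 because that rule fires most strongly (µ2 = 0.48).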

5-MF model (source: http://www.atp.ruhr-uni-bochum.de/dynlab/dynlabmodules/examples/fuzzysystems/fuzzification.html)

Discussion and Conclusion 4
Path modeling enabled the consideration of test section as a moderating variable. Test section represents a change in the purpose of the oral text (e.g., academic or non-academic input) and in the speed of delivery. Its effect supports the validity argument: Flowerdew (1994) found texts with a high speed of delivery more difficult. The path and ANFIS models converge to some degree, and also differ in some respects. The ANFIS model is more intuitive and theory-informed.