Determining Factors Influencing Listening Test Item Difficulty and Predicting Reading Proficiency
Vahid ARYADOUST
Centre for English Language Communication, National University of Singapore
Outline
Two studies:
1. Exploring oral-text-related variables influencing item difficulty (ANFIS)
2. Classifying readers based on lexico-grammatical knowledge
Study 1: Predicting Item Difficulty
- How do you develop an in-class reading test?
- How do you make your tests easy or difficult for students?
- How do you design an easy or difficult reading or listening test item?
- Some items are easier and some are more difficult.
- Study 1: predicting listening test item difficulty.
Predicting Item Difficulty
- Item difficulty is affected by various factors.
- To design items, those factors must be considered.
- Theories are needed to predict item difficulty.
- Developing a predictive theory of item difficulty: listening test development has been thin on substantive theory (Stenner, Stone, & Burdick, 2011, p. 3).
Predicting Listening Test Item Difficulty 1
TOEFL listening minitalks (Freedle & Kostin, 1990, 1993, 1999):
- negation
- referential words
- rhetorical organization
- fronted structures
- lexical overlap
- vocabulary
- length and abstractness of sentences
Predicting Listening Test Item Difficulty 2
- Word-level variables, e.g., vocabulary
- Sentence-level variables, e.g., dependent clauses
- Discourse-level variables, e.g., questions or statements
- Task-processing variables, e.g., inference making
- Together these explained 41% of the variance in item difficulty (Kostin, 2004).
Predicting Listening Test Item Difficulty 3
Rule space methodology: Buck and Tatsuoka (1998) extracted 15 attributes and 14 interactions (in 5 categories):
- Task identification attributes: the ability to identify the task by determining what type of information to search for in order to complete it
- Context attributes, such as the density of information in the text
- Information location attributes, such as the ability to use previous items to help locate information
- Information processing attributes, such as the ability to process very fast text automatically
- Response construction attributes, such as the ability to construct a response quickly and efficiently
Predicting Listening Test Task Difficulty 4
Surveys of teachers and students:
- Boyle (1984): listener-related, speaker-related, and material- and medium-related factors
- Goh (1999): text-related (e.g., speech rate), listener-related (e.g., prior knowledge), speaker-related (e.g., accent), and environmental variables (e.g., physical condition)
Statistical Tools to Predict Item Difficulty
- t-tests? Limited
- Regression models (Freedle & Kostin, 1990, 1993, 1999; Ginther, 2000)
- Rule space methodology (Buck & Tatsuoka, 1998)
- The fusion model (Aryadoust, 2011; Y. W. Lee & Sawaki, 2009; Sawaki, Kim, & Gentile, 2009)
- Artificial neural networks (Perkins, Gupta, & Tammana, 1995)
Problems Pertaining to the Nature of These Data Analysis Tools 1
- They rely on prescriptive (deterministic) models such as linear regression.
- When linearity is violated, theory-informed hypotheses are likely to be disqualified.
- The validity of studies that use multiple regression to predict item difficulty is not high.
- It is hypothesized that using variables in combination and introducing forms of nonlinearity might improve the validity of item difficulty studies (Perkins et al., 1995, p. 35).
(Graph from: http://amosdevelopment.com/video/mixture/regression/mixtureregression.html)
Problems Pertaining to the Nature of These Data Analysis Tools 2
- Multidimensional item response theory (IRT) models: the fusion and rule space models
- Limitation: they require a large sample size, a requirement that is not always fulfilled (Aryadoust, 2011).
Current Study
- A class of neuro-fuzzy models
- Adaptive Neuro-Fuzzy Inference Systems (ANFIS)
Neuro-fuzzy Synergy: Artificial Neural Networks (ANN) and Fuzzy Set Theory
Landín, Rowe, and York (2009, p. 325) report that rule sets generated by neuro-fuzzy logic are completely in agreement with the findings based on statistical analysis and advantageously generate understandable and reusable knowledge, and conclude that neurofuzzy logic is easy and rapid to apply and outcomes [yield] knowledge not revealed via statistical analysis.
A Fuzzy Inference System: Fuzzification
Types of Neuro-Fuzzy Models
- Mamdani (e.g., Mamdani & Assilian, 1975)
- Takagi-Sugeno-Kang rule-based hybrid systems
- This study uses a variant of the latter: the Adaptive Neuro-Fuzzy Inference System (ANFIS).
- ANFIS optimizes parameter estimation and output prediction and reduces estimation time.
- Approximately 80% to 90% of the data are used for training; the rest are used for testing.
Method
- Participants: 209 Iranian, Chinese, and Malay participants, between 16 and 45 years old
- Materials: the Official IELTS Practice Materials (University of Cambridge Local Examinations Syndicate, 2007); 40 test items in 4 sections
Data Analysis
- Rasch item difficulty (WINSTEPS)
- Item coding (expert judgments)
- ANFIS (37/3) in MATLAB, Version 7.11 (MathWorks, Inc., 2011)
- Seven ANFIS models were generated, incorporating between 1 and 7 hypothesized explanatory variables.
- For models comprising between 2 and 6 variables, numerous sub-models consisting of all possible variable combinations were tested.
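The search over variable combinations described above can be sketched in Python. This is only an illustration of how the candidate sub-models might be enumerated, not the MATLAB procedure used in the study; the variable names are shortened, hypothetical labels for the seven coded variables named on the item coding slide.

```python
from itertools import combinations

# Hypothetical short labels for the seven coded explanatory variables.
VARIABLES = [
    "word_count",
    "prepositional_phrases",
    "modal_verbs",
    "text_propositional_density",
    "item_propositional_density",
    "item_format",
    "information_type",
]


def candidate_subsets(variables, size):
    """Return every combination of `size` variables as a list of tuples."""
    return list(combinations(variables, size))


# Number of candidate sub-models for each model size from 1 to 7.
counts = {
    size: len(candidate_subsets(VARIABLES, size))
    for size in range(1, len(VARIABLES) + 1)
}

for size, n in counts.items():
    print(f"{size} variable(s): {n} candidate sub-model(s)")
```

Each tuple returned by `candidate_subsets` would correspond to one sub-model to train and evaluate; selecting the best sub-model per size is left out of this sketch.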
Item Coding
- Word count
- Prepositional phrases
- Modal verbs
- Propositional density of oral texts
- Propositional density of test items
- Item format
- Information type
The variables were normalized before modeling to take values between zero and one.
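The slide does not say which normalization was used. A minimal sketch, assuming min-max scaling (one common way to map values onto the zero-to-one range), is given below; the function name and the word counts are hypothetical.

```python
def min_max_normalize(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1.

    Assumption: min-max scaling; the source only states that the coded
    variables took values between zero and one after normalization.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant variable carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical word counts for four test items.
word_counts = [12, 30, 18, 45]
normalized = min_max_normalize(word_counts)
print([round(v, 3) for v in normalized])
```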
Goodness-of-Fit Indices
- Coefficient of efficiency (E_f): ranges between negative infinity and one.
- Squared correlation coefficient (r²): ranges between zero and one, with values near one indicating good fit.
- Root mean squared error (RMSE): lower RMSE values indicate smaller error terms.
- Mean absolute error (MAE): lower MAE values indicate smaller error terms.
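The four indices can be computed as in the following sketch. The coefficient of efficiency is written here in its usual form, one minus the ratio of the squared prediction errors to the squared deviations of the observations from their mean, which is an assumption about the exact formula used in the study. The observed and predicted difficulty values are hypothetical.

```python
import math


def coefficient_of_efficiency(observed, predicted):
    """1 - SSE/SST; equals 1 for perfect prediction and has no lower bound."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst


def r_squared(observed, predicted):
    """Squared Pearson correlation between observed and predicted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov ** 2 / (var_o * var_p)


def rmse(observed, predicted):
    """Root mean squared error."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mae(observed, predicted):
    """Mean absolute error."""
    n = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n


# Hypothetical Rasch item difficulties and model predictions.
observed = [-1.2, -0.4, 0.3, 1.1]
predicted = [-1.0, -0.5, 0.5, 0.9]

print(f"E_f  = {coefficient_of_efficiency(observed, predicted):.3f}")
print(f"r^2  = {r_squared(observed, predicted):.3f}")
print(f"RMSE = {rmse(observed, predicted):.3f}")
print(f"MAE  = {mae(observed, predicted):.3f}")
```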
Training Data
Testing Data
Discussion and Conclusion 1
- ANFIS found that all variables exerted an effect on item difficulty.
- MCQs were easier than open-ended items: MCQs appear to tax test takers' cognitive processes less than constructed-response items.
- Propositional density of both spoken texts and test items affected item difficulty, which agrees with Buck and Tatsuoka (1998).
- Propositional density of items affected item difficulty: a likely source of construct-irrelevant variance (CIV).
Discussion and Conclusion 2
- Information type and use of modal verbs were also predictors of item difficulty (Buck & Tatsuoka, 1998).
- Word count affected item difficulty (Freedle & Kostin, 1993, 1999).
Future Applications of Neuro-Fuzzy Models (NFMs)
- ANFIS model: more intuitive and theory-informed
- NFMs hold great promise for behavioral prediction in language assessment.
- Computerized adaptive language testing (CALT)
- Coh-Metrix: a useful technique to explore oral/reading texts
Study 2: Reading Comprehension
- Reading: textbase
- Reading: grammar + vocabulary + strategies
- Lexico-grammatical knowledge predicts reading.
- Vocabulary knowledge + grammar knowledge → reading
- Can the relationship be (mathematically) modelled?
Data Mining: Predictive Modelling
- Data mining: discovery of knowledge in data
- Captures meaningful patterns (i.e., information) in data to constitute understandable structures
- Some predictive modelling techniques:
1) Linear Regression
2) Classification And Regression Tree (CART)
3) Artificial Neural Networks
- These represent three schools of thought.
- Conjecturing and predicting relationships and results
Artificial Neural Networks (ANNs)
- Nonlinear mathematical models inspired by the human brain (Haykin, 1998)
- Consist of interconnected units or neurons
- Capable of pattern recognition, prediction, classification, and learning
- Learn by example: the network finds out how to solve the problem by itself.
(source: http://www.doc.ic.ac.uk/~nd/surprise_96/journal/vol4/cs11/report.html)
Advantages of ANN
- No assumptions about the relationships between dependent and independent variables
- Normality, linearity, and homoskedasticity are not required.
Technical Information of the ANN
- One hidden layer
- Six inputs
- Six units
- Mathematical activation function: hyperbolic tangent
- Error function: sum of squares
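The architecture listed above (six inputs, one hidden layer of six units, hyperbolic tangent activation, sum-of-squares error) can be sketched as a forward pass in plain Python. The single linear output unit, the random weight initialization, and the input values are assumptions made for illustration; the slide does not describe the output layer or the training procedure.

```python
import math
import random

N_INPUTS = 6  # six input variables
N_HIDDEN = 6  # six units in the single hidden layer


def init_params(seed=0):
    """Draw small random weights; biases start at zero (illustrative choice)."""
    rng = random.Random(seed)
    w_hidden = [
        [rng.uniform(-0.5, 0.5) for _ in range(N_INPUTS)]
        for _ in range(N_HIDDEN)
    ]
    b_hidden = [0.0] * N_HIDDEN
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(N_HIDDEN)]
    b_out = 0.0
    return w_hidden, b_hidden, w_out, b_out


def forward(x, params):
    """Hidden units apply tanh; the output unit is linear (an assumption)."""
    w_hidden, b_hidden, w_out, b_out = params
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


def sum_of_squares_error(targets, outputs):
    """Sum of squared differences between targets and network outputs."""
    return sum((t - o) ** 2 for t, o in zip(targets, outputs))


params = init_params()
example_input = [0.2, 0.5, 0.1, 0.9, 0.4, 0.7]  # hypothetical normalized scores
print(f"network output: {forward(example_input, params):.4f}")
```

Training would adjust the weights to reduce `sum_of_squares_error` over the training cases; that step is omitted here.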
Epilogue
- A science is exact only in so far as it employs mathematics. (Immanuel Kant)
- Math: contribution to our understanding of the world
- Physics, biology, chemistry, social sciences, and applied linguistics
Summary
- Assumptions of linear regression are restrictive.
- ID variables might be highly correlated in LT data.
- CART and ANNs offer promise for re-examining some of the data-driven LT theories.
- Due to their flexibility, ANNs can predict DVs with lower degrees of error of measurement.
Predicting Listening Test Item Difficulty 5
- Information density (Dunkel, Henning, & Chaudron, 1993)
- Buck and Tatsuoka (1998): dividing the number of content words by the number of words surrounding the information necessary to answer the test item (found influential)
- Rupp, Garcia, and Jamieson (2001, p. 211): the type/token ratio (i.e., a ratio of function or grammatical units to lexical units), sentence length, word count, and item-text interactions (found influential but not robust)
Applying the Fuzzified Input to the Rules' Antecedents
$\mu_1(X_1 = A_{\text{low}}) = 0.40$ and $\mu_2(X_1 = B_{\text{high}}) = 0.80$ (Lotfi Zadeh, 1965)
Rule 1: IF $X_1 = A_{\text{low}}$, THEN $Y = 2$.
Rule 2: IF $X_1 = B_{\text{high}}$, THEN $Y = 5$.
($Y$ is the output and could take any value depending on the range of the data to be predicted.)
Fuzzy Inference System: Two Inputs
Inputs: $X_1 = 14$ and $X_2 = 18$. MFs: low and high.
The joint functions of the two inputs might be rewritten as:
Rule 1: IF $X_1 = A_{\text{low}}$ AND $X_2 = A_{\text{low}}$, THEN $Y_1 = 2$ (1)
Rule 2: IF $X_1 = A_{\text{low}}$ AND $X_2 = B_{\text{high}}$, THEN $Y_2 = 3$ (2)
Rule 3: IF $X_1 = B_{\text{high}}$ AND $X_2 = A_{\text{low}}$, THEN $Y_3 = 3$ (3)
Rule 4: IF $X_1 = B_{\text{high}}$ AND $X_2 = B_{\text{high}}$, THEN $Y_4 = 5$ (4)
Example of Two Inputs
Low and high membership values of $X_1$: 0.8 and 0.4.
Low and high membership values of $X_2$: 0.3 and 0.6.
Rule 1: $\mu_1 = 0.80 \times 0.30 = 0.24$, THEN $Y_1 = 2$; and so forth.
The rules are then defuzzified and the value of $Y$ is estimated as follows:
$Y = \mu_1 Y_1 + \mu_2 Y_2 + \mu_3 Y_3 + \mu_4 Y_4$ (see Vilém, 1989, 2005).
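The two-input example can be worked through in a few lines of Python. The membership degrees and rule outputs are those given on the slides, and AND is taken as the product of the two membership degrees, as in Rule 1. The weighted sum follows the formula on the slide; the normalized weighted average is added as an assumption, since it is a common Sugeno-style alternative and the slide does not mention it.

```python
from itertools import product

# Membership degrees of each input in the "low" and "high" sets (from the slide).
x1 = {"low": 0.8, "high": 0.4}
x2 = {"low": 0.3, "high": 0.6}

# Rule outputs keyed by (X1 label, X2 label), from the two-input rule base.
consequents = {
    ("low", "low"): 2,
    ("low", "high"): 3,
    ("high", "low"): 3,
    ("high", "high"): 5,
}

# Firing strength of each rule: product of the two membership degrees.
firing = {(a, b): x1[a] * x2[b] for a, b in product(x1, x2)}

# Defuzzification as written on the slide: sum of firing strength times output.
weighted_sum = sum(firing[rule] * consequents[rule] for rule in consequents)

# Assumption: normalized variant, dividing by the total firing strength.
normalized = weighted_sum / sum(firing.values())

for rule, strength in firing.items():
    print(f"IF X1={rule[0]} AND X2={rule[1]}: mu = {strength:.2f}, Y = {consequents[rule]}")
print(f"weighted sum       = {weighted_sum:.2f}")
print(f"normalized average = {normalized:.2f}")
```

With these values the four firing strengths are approximately 0.24, 0.48, 0.12, and 0.24, and the weighted sum is approximately 3.48.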
5 MF model
http://www.atp.ruhr-uni-bochum.de/dynlab/dynlabmodules/examples/fuzzysystems/fuzzification.html
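A model with five membership functions (MFs) can be illustrated with a short fuzzification sketch. Everything below is an assumption made for illustration: the triangular shape, the even spacing over the zero-to-one range of the normalized inputs, and the five labels are hypothetical and are not taken from the study or the linked page.

```python
def triangular(x, left, peak, right):
    """Membership degree of x in a triangle with feet at left/right and apex at peak."""
    if x == peak:
        return 1.0
    if x <= left or x >= right:
        return 0.0
    if x < peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Hypothetical labels and evenly spaced peaks for five MFs on [0, 1].
LABELS = ["very_low", "low", "medium", "high", "very_high"]
PEAKS = [0.0, 0.25, 0.5, 0.75, 1.0]
WIDTH = 0.25  # distance from each peak to either foot


def fuzzify(x):
    """Map a crisp normalized input to its degree of membership in each set."""
    return {
        label: triangular(x, peak - WIDTH, peak, peak + WIDTH)
        for label, peak in zip(LABELS, PEAKS)
    }


memberships = fuzzify(0.6)
for label, degree in memberships.items():
    print(f"{label:>9}: {degree:.2f}")
```

With this layout a crisp input belongs to at most two neighbouring sets at once, and its degrees of membership sum to one.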
Discussion and Conclusion 4
- Path modeling enabled the consideration of test section as a moderating variable.
- Test section represents a change in the purpose of the oral text (e.g., academic or nonacademic input) and the speed of delivery.
- Its effect supports the validity argument: Flowerdew (1994) found texts with a high speed of delivery more difficult.
- The path and ANFIS models converge to some degree and also differ in some respects.
- ANFIS model: more intuitive and theory-informed