The Effect of Context-Dependent and Context-Independent Test Design on Iranian EFL Learners' Performance on Vocabulary Tests


International Research Journal of Applied and Basic Sciences, 2013
Available online at www.irjabs.com
ISSN 2251-838X / Vol. 4 (8): 2129-2136
Science Explorer Publications

The Effect of Context-Dependent and Context-Independent Test Design on Iranian EFL Learners' Performance on Vocabulary Tests

Esmaeil Bagheridoust & Mona Karagahi*
Department of English, South Tehran Branch, Islamic Azad University, Iran
Corresponding author email: m.karagahi@gmail.com

ABSTRACT: The purpose of this study was to explore the role of context in vocabulary assessment and, by extension, to inform the choice of format for vocabulary tests. To this end, the performance of identical items on a matching test (context-independent) and a C-test (context-dependent) was compared for elementary-level English L2 students (n = 40). The results showed that the students performed slightly better on the matching test than on the C-test; context therefore did not play a major role in their performance on the C-test. Both teachers and learners may benefit from these findings.

Key words: Context-dependent, Context-independent, Vocabulary tests

INTRODUCTION

The role that knowledge of vocabulary plays in second and foreign language acquisition and learning has long been neglected (Nunan, 1999). In language testing, the prevalent views tend to be shaped by the requirements of proficiency testing. Because vocabulary is not assessed as a separate component in current proficiency tests, language testers have paid little attention to the problem of how to design tests of vocabulary knowledge, let alone the broader construct of lexical ability. Although classroom tests and other measures of learner achievement differ in objectives and content from proficiency tests, the assessment of vocabulary is more significant in such tests, and the design of these tests should therefore rest on a precise definition of vocabulary as a construct. In trying to define what a particular vocabulary test measures, several important issues arise: what counts as a word in the first place, whether vocabulary includes multi-word lexical items, and how word meaning is influenced by context. The variety of theoretical concepts and frameworks proposed to account for vocabulary acquisition, knowledge, and use in the field of second language vocabulary makes construct definition all the more difficult (Bachman & Palmer, 1996). Despite some widely used concepts, such as Nation's (1999, cited in Read, 2000) analysis of what it means to know a word and the notion of vocabulary size, there is no comprehensive, generally accepted framework for L2 vocabulary work.

REVIEW OF THE RELATED LITERATURE

Historical Overview of Vocabulary Testing
Understanding learners' progress in L2 vocabulary learning has always drawn teachers to vocabulary testing. The first modern tests were published by Daniel Starch in 1916, at the time of the gradual establishment of psychometrics. In such tests vocabulary was commonly one of the language elements to be measured; Starch measured vocabulary by having test-takers match a list of foreign words to their English translations. In the 1930s, standardized objective tests became the norm in the United States, and vocabulary remained one of the components routinely included (Nation, 2001). By 1964 this tendency was further evidenced by the widespread use of the Test of English as a Foreign Language (TOEFL), which, like other standardized tests of the time, included a separate vocabulary section. Spolsky (1995, cited in Schmitt, 2000) holds that language testing has evolved through three main phases: a pre-scientific phase relying on examinations subjectively marked by a single examiner; a second phase focusing on objectivity and reliability; and a third phase emphasizing validity along with the concerns of the second phase. In the case of vocabulary testing, this last phase is unfortunately still in its infancy. The evolution seems to have stopped at the second phase: given the emphasis on appropriate difficulty, reliability, and speededness, construct validity has been neglected.

The Significance of Vocabulary Testing
A critical need in second language acquisition research is for valid and reliable vocabulary tests. Such tests can help resolve theoretical and practical problems raised in this field, for example the percentage of words learners know at a given frequency level, the rate of vocabulary acquisition, and the relationship of frequency to other factors contributing to item difficulty (Meara & Fitzpatrick, 2000). The reasons for testing vocabulary in proficiency tests seem similar to those for testing grammar. Hughes (2003) contends that "one suspects that much less time is devoted to the regular, conscious teaching of vocabulary than to the similar teaching of grammar" (p. 179). Although many might accept what Hughes says, there is still the hope that vocabulary learning is taking place. In institutional testing, vocabulary learning is measured through achievement tests, and many believe in the desirability of vocabulary achievement tests for their washback effect. This concern with the usefulness of vocabulary testing has led several researchers to estimate learners' basic knowledge of the common meanings of words at the 2,000, 3,000, 5,000, and 10,000 word-frequency levels and at the university word level (Beglar & Hunt, 1999).

The Role of Context in Vocabulary Testing
The role of context in vocabulary testing is crucial: words do not have meaning in isolation but in relation to other words. Schmitt (2000) observes that context can radically change the meaning of words, making familiar words opaque and unfamiliar words completely transparent. Cameron (2002) supports this view, asserting that "any approach to vocabulary testing that fails to appreciate this is missing out on a fundamental aspect of word meaning; the fact that meaning is not given, but has to be negotiated" (p. 150). This is most apparent when the target word is a low-frequency one that learners are not expected to know; here the ability to infer its meaning from contextual clues is particularly useful (Laufer & Goldstein, 2004). Below is an example of how contextual clues help in inferring the meaning of the target word: "Mary can be quite gauche; yesterday she blew her nose on the new linen tablecloth" (cited in Read, 2000, p. 163).

In contemporary testing, integrative and communicative test formats receive a great deal of attention. This is particularly true of language proficiency testing, where the emphasis is placed on presenting vocabulary items in the context of a sentence or a larger discourse unit, so that learners can show they understand words as they occur in connected written or spoken discourse rather than as independent items. Contextualizing words, however, adds a reading or listening component to a vocabulary test, which may reduce the number of items that can be included, a concern especially for vocabulary size tests. For such purposes, the merits and flaws of using context have to be weighed against the need to cover a large sample of words (Meara & Nation, 2002).

Statement of the Problem
Few studies have so far investigated the possible differences between context-dependent and context-independent vocabulary test designs and the possible impact of context on learners' performance. The matching test is a well-recognized example of a context-independent format: testees are given a set of options, one of which must be matched with each stem. There is no context to draw on; everything needed is in the item itself. The C-test, by contrast, a newer variant of the cloze test, is a context-dependent vocabulary test based on production: context plays a crucial role in identifying the mutilated content words. This type of test is demanding, since testees must draw on comprehension and production in addition to recognition. The current study focuses on context and on whether testees actually use it.

Research Question
Q: Is there any significant difference between a C-test and a matching test measuring the vocabulary knowledge of Iranian foreign language learners?

Research Null Hypothesis
H0: There is no significant difference between the results of a C-test and a matching test measuring the vocabulary knowledge of Iranian foreign language learners.

METHODOLOGY

Participants
A group of 40 students studying at a private institute in Sanandaj participated in this study. The participants included both male and female students between 20 and 29 years of age. A pre-test developed by the researcher, based on several long-taught coursebooks (A Course in English, Developing Reading Skills, and First Certificate in English, all at the elementary level), was administered first.

Instruments
Three tests were used in this study: a standardized teacher-made general proficiency test, a matching test, and a C-test. (A sketch of how a C-test passage is typically constructed is given below.)
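For readers unfamiliar with the format, the following sketch illustrates the classic C-test construction rule (the "rule of two" associated with Klein-Braley and Raatz): the first sentence is left intact, and thereafter the second half of every second word is deleted, with one blank per missing letter. This is an illustration of the general format only, with an invented example passage, not a reconstruction of the instrument used in this study.

```python
import re

def make_c_test(passage: str) -> str:
    """Apply the classic C-test deletion rule to a short passage."""
    sentences = re.split(r"(?<=[.!?])\s+", passage.strip())
    out = [sentences[0]]          # the first sentence stays intact as a lead-in
    counter = 0
    for sentence in sentences[1:]:
        rebuilt = []
        for word in sentence.split():
            m = re.match(r"([A-Za-z]+)([^A-Za-z]*)$", word)
            if not m:
                rebuilt.append(word)          # numbers, stray tokens, etc.
                continue
            letters, tail = m.groups()
            counter += 1
            # Mutilate every second word that is long enough to truncate.
            if counter % 2 == 0 and len(letters) > 1:
                keep = (len(letters) + 1) // 2   # keep the first half
                rebuilt.append(letters[:keep] + "_" * (len(letters) - keep) + tail)
            else:
                rebuilt.append(word)
        out.append(" ".join(rebuilt))
    return " ".join(out)

passage = ("The weather was fine. We decided to walk to the park. "
           "Many people were already there.")
print(make_c_test(passage))
# The weather was fine. We deci___ to wa__ to th_ park. Ma__ people we__ already the__.
```

Because the deleted halves can usually only be restored by processing the surrounding sentence, the format is taken to be context-dependent, which is exactly the property at issue in this study.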

Procedures
The aim of this study, as mentioned above, was to determine whether learners performed differently on vocabulary C-tests and matching tests. The research was carried out in a private institute, and carrying out research in a specific setting entails some limitations; the kind of test that could be administered was one of the problematic issues. Administering a PET or KET was not allowed there, and the institute did not cooperate in this respect. A teacher-made test was therefore prepared, based on several coursebooks that had been taught for many years (A Course in English, Developing Reading Skills, and First Certificate in English, all at the elementary level), and was put through several steps of standardization. In the first step, this test was administered to the 40 students as a general proficiency pre-test. Since the test was researcher-made, its reliability and validity were initially open to question. To validate the test, item facility and item discrimination were calculated for each individual item: items with an item facility (IF) between 0.30 and 0.70 were considered acceptable, and items with item discrimination (ID) indexes above 0.50 were retained (a sketch of these computations appears at the end of this section). Overall, 29 items were discarded, reducing the test from 90 to 61 items, and the students' proficiency scores were based on the remaining items. The reliability coefficient of the test was 0.84, indicating that the test was reliable. The proficiency test was then used as a valid and reliable basis for constructing the next two tests.

In the second step, the subjects took the matching test, one week after the proficiency test. To minimize a possible learning effect, especially on the vocabulary measures, the C-test was given to the participants two weeks later. The whole administration took about one month.
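The item-analysis step can be sketched as follows. The study reports the IF and ID cut-offs but not the exact computing procedure, so this sketch assumes the standard definitions: IF as the proportion of test-takers answering an item correctly, and ID as the difference in proportion correct between upper- and lower-scoring groups (27% groups here, a common convention). The data in the example are random, not the study's actual responses.

```python
import numpy as np

def item_analysis(responses: np.ndarray, group_frac: float = 0.27):
    """responses: (n_students, n_items) array of 0/1 item scores."""
    n_students, _ = responses.shape
    totals = responses.sum(axis=1)                   # total score per student
    order = np.argsort(totals)
    k = max(1, int(round(group_frac * n_students)))  # size of upper/lower groups
    lower, upper = responses[order[:k]], responses[order[-k:]]

    item_facility = responses.mean(axis=0)                         # IF per item
    item_discrimination = upper.mean(axis=0) - lower.mean(axis=0)  # ID per item

    # Retention rule reported in the study: 0.30 <= IF <= 0.70 and ID > 0.50.
    keep = (item_facility >= 0.30) & (item_facility <= 0.70) \
           & (item_discrimination > 0.50)
    return item_facility, item_discrimination, keep

# Illustrative run on random data (not the study's responses):
rng = np.random.default_rng(0)
scores = (rng.random((40, 90)) < 0.55).astype(int)   # 40 students, 90 items
IF, ID, keep = item_analysis(scores)
print(f"{keep.sum()} of {keep.size} items would be retained")
```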
RESULTS AND DISCUSSION

In this study the researchers investigated the effect of context-dependent and context-independent vocabulary test formats on the performance of elementary EFL students studying English in a private institute; in other words, whether learners perform differently on a matching test (context-independent) and a C-test (context-dependent). The study addressed the following research question:

Q: Is there any significant difference between a C-test and a matching test measuring the vocabulary knowledge of Iranian foreign language learners?

Validating the Proficiency Test
To answer this question, a valid and reliable measure of the learners' knowledge of English was needed as a basis for the two vocabulary tests. Since, as noted under Procedures, standard tests such as PET and KET could not be administered, the researcher instead drew on EFL coursebooks that have long been taught in institutes and prepared a proficiency test consisting of 30 vocabulary items, 30 grammar items, and 30 reading comprehension items, 90 items in all. After the item analysis described above, the remaining 61 items determined the students' scores, and the KR-21 reliability of the test was about 0.84, meaning that the teacher-made test was quite reliable. Table 1 shows the descriptive statistics of the proficiency test: the mean scores for the vocabulary, grammar, and reading sections were 13.00, 8.10, and 15.02 respectively.

Table 1. Descriptive Statistics of the Proficiency Test

              N    Minimum   Maximum   Mean      Std. Deviation   Skewness (Statistic / Std. Error)
Vocabulary    40    7.00     18.00     13.0000   2.80110          -0.287 / 0.374
Grammar       40    1.00     15.00      8.1000   3.27226          -0.061 / 0.374
Reading       40    7.00     23.00     15.0250   4.97423           0.169 / 0.374
Valid N (listwise) = 40

Table 1 presents each section of the test (vocabulary, grammar, and reading comprehension) separately. Table 2 presents the descriptive statistics of the proficiency test as a whole.

Table 2. Descriptive Statistics of the Proficiency Test as a Whole

              N    Minimum   Maximum   Mean      Std. Deviation   Variance   Skewness (Statistic / Std. Error)
Proficiency   40   20.00     51.00     36.1250   9.41409          88.625     0.089 / 0.374
Valid N (listwise) = 40

To probe the normality of the data, the ratio of skewness (and of kurtosis) to its standard error should fall within +/- 1.96. As displayed in Table 3, the ratios of skewness and kurtosis over their respective standard errors are all within this range; that is, the data are normally distributed on all three tests.

Table 3. Normality Test

              N    Skewness (Statistic / Std. Error / Ratio)   Kurtosis (Statistic / Std. Error / Ratio)
Matching      40   -0.22 / 0.37 / -0.58                        -0.79 / 0.73 / -1.07
C-test        40    0.25 / 0.37 /  0.66                        -0.96 / 0.73 / -1.30
Proficiency   40    0.09 / 0.37 /  0.24                        -1.23 / 0.73 / -1.68

Inferential Statistics: Paired-Samples t-test
A paired-samples t-test was run to compare the students' mean scores on the C-test and the matching test. As displayed in Table 4, the mean scores on the matching test and the C-test were 19.93 and 11.98 respectively; that is, the students performed better on the matching test.

Table 4. Descriptive Statistics: C-Test and Matching Test

              Mean    N    Std. Deviation   Std. Error Mean
Matching      19.93   40   5.02             0.79
C-test        11.98   40   4.62             0.73

The t-observed value of 7.93 exceeds the critical value of 2.02 at 39 degrees of freedom (Table 5). It can therefore be concluded that there is a significant difference between the students' mean scores on the C-test and the matching test, and the null hypothesis that there is no significant difference between a C-test and a matching test measuring the vocabulary knowledge of Iranian EFL learners is rejected: the students performed better on the matching test, with a mean score of 19.93.

Table 5. Paired-Samples t-test: C-Test and Matching Test

                     Paired Differences                                               t      df   Sig. (2-tailed)
                     Mean   Std. Deviation   Std. Error Mean   95% CI (Lower, Upper)
Matching - C-test    7.95   6.33             1.00              (5.92, 9.97)           7.93   39   .000
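Since the raw scores are not reported, the t value and confidence interval in Table 5 can be checked directly from the paired-difference summary statistics. The short sketch below does exactly that; it is a verification of the reported arithmetic, not a re-analysis of the data.

```python
from math import sqrt
from scipy import stats

# Recompute Table 5 from the reported summary statistics alone.
n = 40
mean_diff = 7.95   # mean of the (matching - C-test) differences
sd_diff = 6.33     # standard deviation of the differences

se = sd_diff / sqrt(n)                      # standard error of the mean difference
t_observed = mean_diff / se                 # ~7.94, matching the reported 7.93
t_critical = stats.t.ppf(0.975, df=n - 1)   # ~2.02 at df = 39
ci = (mean_diff - t_critical * se, mean_diff + t_critical * se)
p_value = 2 * stats.t.sf(abs(t_observed), df=n - 1)

print(f"t = {t_observed:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f}), p = {p_value:.4f}")
# t = 7.94, 95% CI = (5.93, 9.97), p < .001 -- in line with Table 5 up to rounding.
```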

Figure 1 (bar chart omitted here) displays the students' mean scores on the C-test and the matching test.

Figure 1. Mean Scores on the C-Test and Matching Test

Cohen's Effect Size
Cohen (1988, p. 25) hesitantly defined effect sizes as "small, d = .2," "medium, d = .5," and "large, d = .8," stating that there is a certain risk inherent in offering conventional operational definitions for these terms for use in power analysis in as diverse a field of inquiry as behavioural science. Effect sizes can also be interpreted in terms of the percent of non-overlap of the treated group's scores with those of the untreated group (Cohen, 1988). The effect size obtained in the present study is 0.63, which falls in the middle part of Table 6, between the medium and large benchmarks; the nearest row (d = 0.6) is marked in the table. On this basis, the result of the study was taken to be generalizable.

Table 6. Cohen's Effect Sizes, Percentile Standing, and Percent of Non-overlap

Cohen's Standard   Effect Size (d)   Percentile Standing   Percent of Non-overlap
                   2.0               97.7                  81.1%
                   1.9               97.1                  79.4%
                   1.8               96.4                  77.4%
                   1.7               95.5                  75.4%
                   1.6               94.5                  73.1%
                   1.5               93.3                  70.7%
                   1.4               91.9                  68.1%
                   1.3               90                    65.3%
                   1.2               88                    62.2%
                   1.1               86                    58.9%
                   1.0               84                    55.4%
                   0.9               82                    51.6%
LARGE              0.8               79                    47.4%
                   0.7               76                    43.0%
                   0.6               73                    38.2%   <- nearest to this study's d = 0.63
MEDIUM             0.5               69                    33.0%
                   0.4               66                    27.4%
                   0.3               62                    21.3%
SMALL              0.2               58                    14.7%
                   0.1               54                     7.7%
                   0.0               50                     0%

Reliability Indices
The KR-21 reliability indices for the matching test, the C-test, and the proficiency test are 0.76, 0.69, and 0.84 respectively, as shown in Table 7.

Table 7. KR-21 Reliability Indices

              k (items)   Mean    Variance   KR-21
Matching      30          19.92   25.19      0.76
C-test        30          11.97   21.35      0.69
Proficiency   62          36.12   88.62      0.84
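KR-21 needs only the number of items k, the test mean M, and the test variance V, so Table 7 can be reproduced directly from the reported values. The sketch below applies the standard formula KR-21 = (k / (k - 1)) * (1 - M(k - M) / (kV)), reading k from the first column of the table.

```python
# A minimal check of Table 7 using the standard KR-21 formula.

def kr21(k: int, mean: float, variance: float) -> float:
    """Kuder-Richardson formula 21 from item count, test mean, and variance."""
    return (k / (k - 1)) * (1 - mean * (k - mean) / (k * variance))

for name, k, mean, var in [("Matching", 30, 19.92, 25.19),
                           ("C-test", 30, 11.97, 21.35),
                           ("Proficiency", 62, 36.12, 88.62)]:
    print(f"{name:12s} KR-21 = {kr21(k, mean, var):.2f}")
# Matching     KR-21 = 0.76
# C-test       KR-21 = 0.69
# Proficiency  KR-21 = 0.84
```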

CONCLUSIONS AND IMPLICATIONS

The findings of the study can be briefly summarized as follows. Regarding the research question, the performances of the students on the two vocabulary tests were significantly different; that is, learners performed differently on different test formats. In general, the students performed better on the matching test (mean 19.92) than on the C-test (mean 11.97).

This finding lends itself to the assumption that the subjects treated most of the words as individual items even when the words were embedded in a context-dependent test. It falls into line with the study by Stalnaker and Kurath (cited in Read, 2000), who compared two methods of testing knowledge of German vocabulary. There were two tests: a context-independent test (multiple-choice items in which each target word was presented in isolation) and a context-dependent test (a reading passage containing 100 of the same target words, for each of which the test-takers had to supply the English equivalent of the underlined word). The two tests were administered to students of German at the University of Chicago, with similar results: context did not play a major role for the elementary-proficiency students.

As discussed above, there was a significant difference between the students' mean scores on the matching test and the C-test, with better performance on the matching test. Since the subjects treated most of the words as standing alone even when they were embedded in a context-dependent test, it is justifiable to say that the test-takers did not treat the short texts of the C-test as a particular context of use. An inquiry into what a C-test actually measures therefore still seems promising. At the same time, the advantages of the C-test as a selective, controlled format should not be ignored, because its items simply require test-takers to recall the target words with the appropriate part of speech.

There is plenty of scope for the development of innovative types of vocabulary assessment in line with educational measurement theory, which emphasizes test purposes and the factors intervening between purpose and design. There are, however, still basic problems in conceptualizing and measuring the two types of vocabulary knowledge (receptive and productive). Much of this difficulty stems from the lack of an adequate conceptual definition of the difference between receptive and productive vocabulary knowledge, which has resulted in the use of tests with unanticipated outcomes. For example, according to Read and Chapelle (2001), the use of discrete, selective, context-independent tests may in fact have a negative educational impact if, for instance, language teachers avoid vocabulary assessment because the only measures available seem irrelevant to their needs.

REFERENCES

Bachman, L. F., & Palmer, A. S. (1996). Language testing in practice. Oxford: Oxford University Press.
Beglar, D., & Hunt, A. (1999). Revising and validating the 2000 word level and university word level vocabulary tests. Language Testing, 16(2), 131-162.
Cameron, L. (2002). Measuring vocabulary size in English as an additional language. Language Teaching Research, 6(2), 145-173.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.
Hughes, A. (2003). Testing for language teachers. Cambridge: Cambridge University Press.
Laufer, B., & Goldstein, Z. (2004). Testing vocabulary knowledge: Size, strength, and computer adaptiveness. Language Learning, 54(3), 399-436.
Meara, P., & Fitzpatrick, T. (2000). Lex30: An improved method of assessing productive vocabulary in an L2. System, 28(3), 19-30.
Meara, P., & Nation, I. S. P. (2002). Vocabulary. In N. Schmitt (Ed.), An introduction to applied linguistics. London: Arnold.
Nation, I. S. P. (2001). Learning vocabulary in another language. Cambridge: Cambridge University Press.
Nunan, D. (1999). Second language teaching and learning. Boston: Heinle and Heinle.
Read, J. (2000). Assessing vocabulary. Cambridge: Cambridge University Press.
Read, J., & Chapelle, C. A. (2001). A framework for second language vocabulary assessment. Language Testing, 18(1), 1-32.
Schmitt, N. (1999). The relationship between TOEFL vocabulary items and meaning, association, collocation and word-class knowledge. Language Testing, 16(2), 189-216.
Schmitt, N. (2000). Vocabulary in language teaching. Cambridge: Cambridge University Press.