Language Testing. Types of Tests

Why do we want to test? - Assess - Evaluate - Test

Assessment of: - people - processes - products - environments

Types of tests (test techniques): What comes to mind first? - Multiple choice - True-false - Fill-in-the-blank - Short answer - Essay questions - Dictation - Cloze and C-Test - Summary

What other types of tests can you think of? - Listening comprehension test - Composition test - Oral interview test - Reading comprehension test - Translation - Guided composition, summary

Tests are one form of assessment, but other forms include: - Observations of language performance - Portfolio assessment - Peer assessment - Self-assessment

Qualitative Assessment Traditional - Oral interview - Essay/composition - Translation

Portfolio assessment: - Personal context - Learner empowerment - Life-long learning - Individual record of achievement, progress - criticism (creative and repetitive)

Flexible assessor: - self-assessment - peer-assessment - other-assessment

Testing and assessment fundamentally involve: - Collecting learner data - Analyzing learner data - Using the results of the data analysis to make interpretations about learners' language abilities

Thus, types of tests are really types of learner data

Most common types of learner data: - Observations of spontaneous language behavior, e.g. in the classroom - Elicited experimental (or highly structured and controlled) data - Elicited clinical (or unstructured) data - Elicited metalingual judgments - Self-report data, e.g. learner diaries

What's the best kind of test? = What's the best kind of data? All types of data are useful, but none tells the whole story. The more types of data you use, the more confident you can be in your interpretations.

Test types can also be distinguished according to their purposes: - Research - Diagnosis - Placement, including entrance tests - Achievement - Proficiency - Aptitude

Classic goal-based test categories Proficiency test: - relative to a given standard, often a qualification test - frequently very large scale

Classic goal-based test categories The test itself should be tested in terms of: - reliability: does it achieve consistent results on various occasions? - validity: does it test the level and abilities which it purports to test? - objectivity: do different testers come to the same conclusions? Face validity is desirable (but not necessary).
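One way to make the reliability and objectivity questions above concrete is to put numbers on them. The following is a minimal sketch, assuming that reliability is estimated as the correlation between two sittings of the same test (test-retest) and objectivity as the share of scripts on which two raters agree. Both are common choices but not the only ones, and all scores and grades below are invented for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equally long lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def agreement_rate(rater_a, rater_b):
    """Share of scripts on which two raters award the same grade."""
    same = sum(1 for a, b in zip(rater_a, rater_b) if a == b)
    return same / len(rater_a)

# Hypothetical scores of five learners on two sittings of the same test.
first_sitting = [12, 15, 9, 18, 14]
second_sitting = [13, 14, 10, 17, 15]
test_retest = pearson_r(first_sitting, second_sitting)

# Hypothetical grades given by two raters to the same five essays.
grades_a = ["B", "A", "C", "A", "B"]
grades_b = ["B", "A", "C", "B", "B"]
objectivity = agreement_rate(grades_a, grades_b)

print(round(test_retest, 2), objectivity)
```

A value near 1 on either measure suggests consistent results; neither measure says anything about validity.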

Achievement test: - relative to teaching goals - sometimes very large scale, sometimes very small scale - also tested in terms of reliability and validity

Diagnostic test: - aims to give feedback on performance - based on error detection and error analysis, e.g. grammar vs. vocabulary; or: the verb phrase vs. the noun phrase

Aptitude Tests Aptitude tests are structured, systematic ways of evaluating how people perform on tasks or react to different situations. They attempt to predict possible success, e.g. the American SAT (Scholastic Aptitude Test), which is used as an entrance-level qualification for college/university studies.

Aptitude Tests They are characterized by standardized methods of administration and scoring, with the results quantified and compared with how others have done on the same tests.

Classic form-based test categories Written: - objective test (results automatically gradable): multiple choice test, cloze (gap-filling) test - guided test: translation, guided essay

- free test: open answer, essay

Oral: - interview - recording-based - transcription - repetition - response (Q+A, gap-filling,...)

Multiple choice Parameters: questions vs. incomplete statements Parameter values: - a subset of possible parameter values (answers) is provided - one is correct; the others are "distractors"

What causes night and day? A. The earth spins on its axis. B. The earth moves around the sun. C. Clouds block out the sun's light. D. The earth moves into and out of the sun's shadow. E. The sun goes around the earth. (Source: P. M. Sadler, "Psychometric Models of Student Conceptions in Science," Journal of Research in Science Teaching, 1998, vol. 35, no. 3, pp. 265-296.) The correct answer is A; the other answers are so-called distractors.

If the distractors are (in some sense) equally plausible and randomly ordered, then - the probability of obtaining a correct answer by guessing is 1/n, where n is the number of values provided (i.e. the correct answer plus distractors) - thus: if there are 4 equally plausible answers, the probability of guessing correctly is ¼ = 0.25.
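The 1/n rule above extends naturally from a single item to a whole test. The sketch below assumes that every item has the same number of equally plausible options and that guesses on different items are independent, so the number of lucky hits follows a binomial distribution. The 20-item test and the pass mark of 10 are hypothetical.

```python
from math import comb

def p_correct_guess(n_options):
    """Chance of guessing one item correctly when all options are equally plausible."""
    return 1 / n_options

def expected_chance_score(n_items, n_options):
    """Average number of items a pure guesser gets right."""
    return n_items * p_correct_guess(n_options)

def p_at_least(k_correct, n_items, n_options):
    """Binomial probability of at least k_correct right answers by guessing alone."""
    p = p_correct_guess(n_options)
    return sum(
        comb(n_items, k) * p**k * (1 - p) ** (n_items - k)
        for k in range(k_correct, n_items + 1)
    )

# Hypothetical 20-item test with 4 options per item and a pass mark of 10.
print(p_correct_guess(4))            # 0.25, as in the text
print(expected_chance_score(20, 4))  # 5.0 items right on average by guessing
print(p_at_least(10, 20, 4))         # chance of reaching the pass mark by guessing
```

If some distractors are implausible, the effective n shrinks and all three figures rise, which is why distractor quality matters.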

Multiple choice: example Example: The phonemic transcription of sourdough in Southern Educated British English is 1. /so rdu / 2. /sa do / 3. /sa d / 4. /sa rd f/ The probability of guessing the correct transcription of sourdough is 1/4 = 0.25

Multiple choice: pros and cons... Pro: - reasonably easy for the teacher to construct - easy for the teacher to correct - easy to evaluate statistically - easy for the student to understand the procedure

...Multiple choice: pros and cons Con: - tests written knowledge only (unless acoustic media are used) - tests receptive knowledge only - tests metalinguistic knowledge rather than performance - if the distractors are not carefully selected, the probability of guessing correctly may be high - tends to test knowledge rather than understanding

Cloze Tests... - A gap-completion task - Written - Many different kinds of gap possible: letters, parts of words (morphs), words, word sequences

...Cloze Tests Gap selection procedure may be: - random or systematic, e.g. gaps every 4-10 words; hard words may be filtered out, or specific word types may be filtered - manual or computerized
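A computerized, systematic gap selection of the kind described above can be sketched in a few lines. This is only an illustration under simple assumptions: words are split on whitespace, every n-th word is deleted, and no filtering of hard words or word types is attempted. The function name and parameters are hypothetical.

```python
def make_cloze(text, every=7, lead_in=0, gap="_____"):
    """Replace every `every`-th word with a gap, after skipping `lead_in` words.

    Returns the gapped text and the list of deleted words (the answer key).
    Punctuation attached to a deleted word disappears with it.
    """
    gapped, answers = [], []
    for i, word in enumerate(text.split()):
        position = i - lead_in + 1  # 1-based position after the lead-in
        if position > 0 and position % every == 0:
            answers.append(word)
            gapped.append(gap)
        else:
            gapped.append(word)
    return " ".join(gapped), answers

cloze_text, answer_key = make_cloze(
    "one two three four five six seven eight", every=4
)
print(cloze_text)
print(answer_key)
```

A random procedure would pick the positions with a random number generator instead of a fixed interval; the answer key is built the same way.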

Modified cloze (UBI Entrance Test) Ten years ago representatives from 178 nations met in Rio to plan how to protect the world's resources. Pledges we___ given t___ safeguard ecosy___, reduce glo___-warming ga___, and pro___ humanity thr___ sustainable devel___. Last ye___ world lea___, scientists a___ activists m___ again. The agenda: to check whether Rio had changed the world.
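In the passage above, every second word of the middle sentences appears to be cut to roughly its first half, which is the usual C-test pattern. The sketch below reproduces that pattern under stated assumptions: only alphabetic runs count as words, the kept part is the first half rounded down, and one-letter words are left intact. It is not the procedure actually used for the entrance test, and the function name is hypothetical.

```python
import re

def make_c_test(text, gap="___"):
    """Cut every second word to its first half (rounded down) and append a gap marker."""
    word_count = 0

    def cut(match):
        nonlocal word_count
        word_count += 1
        word = match.group(0)
        # Odd-numbered words and one-letter words are left intact.
        if word_count % 2 == 1 or len(word) < 2:
            return word
        return word[: len(word) // 2] + gap

    # Only alphabetic runs count as words; digits and punctuation are untouched.
    return re.sub(r"[A-Za-z]+", cut, text)

print(make_c_test("The agenda: to check whether Rio had changed the world."))
```

As in the passage above, the opening and closing sentences would normally be left unmodified to give the reader context, so only the middle of a text would be passed to the function.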

What is in a test? Some criteria Technical: - comparison of two performances Personal: - interaction between examiner and examinee Institutional: - qualification: license to participate in certain activities

Psychological: - performance under highly constrained conditions Linguistic: - Language understanding and language production - Metalinguistic activity

Finally, test types can be distinguished by scoring approach: - Norm-referenced - Criterion-referenced

Norm-referenced versus Criterion-referenced Norm-referenced tests are scored according to how well each person does in relation to the mean (or average) score on the test, e.g. national standard reading tests in the U.S. Criterion-referenced tests are scored according to predetermined criteria, so that a person's score is not affected by how well everyone else does on the test.

Norm-referenced versus Criterion-referenced Proficiency tests tend to be norm-referenced. Most other types of tests tend to be criterion-referenced.
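The difference between the two scoring approaches shows up when the same raw score is placed in two different groups. The sketch below assumes that norm-referenced results are reported as z-scores (distance from the group mean in standard deviations) and that the criterion is a single cut score. The groups and the cut score of 65 are invented for illustration.

```python
from statistics import mean, pstdev

def norm_referenced(scores):
    """z-score of each raw score relative to the group mean."""
    group_mean = mean(scores)
    group_sd = pstdev(scores)
    return [(s - group_mean) / group_sd for s in scores]

def criterion_referenced(scores, cut_score):
    """Pass/fail decision for each raw score against a fixed cut score."""
    return [s >= cut_score for s in scores]

# The same raw score of 70 appears in a weaker and a stronger hypothetical group.
weaker_group = [50, 60, 70]
stronger_group = [70, 80, 90]

z_in_weaker = norm_referenced(weaker_group)[2]      # 70 is above this group's mean
z_in_stronger = norm_referenced(stronger_group)[0]  # 70 is below this group's mean

passed_in_weaker = criterion_referenced(weaker_group, 65)[2]
passed_in_stronger = criterion_referenced(stronger_group, 65)[0]

print(z_in_weaker, z_in_stronger, passed_in_weaker, passed_in_stronger)
```

The norm-referenced result changes sign with the group, while the criterion-referenced decision is the same in both.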

Testers do not assume that a test score is a person's true score. There is always a margin of error in a test, so the true score (the person's true ability) may be somewhat higher or lower than the test score. The amount by which the true score may differ from the test score is calculated from the test's overall reliability and the standard deviation of its scores.
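The margin of error described above is commonly expressed as the standard error of measurement from classical test theory, SEM = SD × sqrt(1 − reliability). The sketch below assumes that formula; the standard deviation of 10, the reliability of 0.91 and the observed score of 70 are hypothetical values.

```python
from math import sqrt

def standard_error_of_measurement(sd, reliability):
    """Classical test theory: SEM = SD * sqrt(1 - reliability)."""
    return sd * sqrt(1 - reliability)

def score_band(observed, sd, reliability, z=1.0):
    """Band of z standard errors around an observed score."""
    sem = standard_error_of_measurement(sd, reliability)
    return observed - z * sem, observed + z * sem

# Hypothetical test: score standard deviation 10, reliability 0.91.
sem = standard_error_of_measurement(10, 0.91)
low, high = score_band(70, 10, 0.91)
print(round(sem, 2), round(low, 2), round(high, 2))
```

The more reliable the test, the narrower the band: with a reliability of 1.0 the band collapses onto the observed score.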

Mastery of Reliability Expert testers and testing companies have mastered the art of test reliability. However, the more important question of test validity remains a challenge for everyone, both novice and expert.

Analogies How would you test whether a person knows the capital cities of all 50 states? How would you test a person's ability to play tennis? How would you test whether a person can tie his/her shoe? How would you test whether a person can build a cabinet?