Language Testing. Types of Tests

Why do we want to test?
- Assess
- Evaluate
- Test

Assessment of:
- people
- processes
- products
- environments

Types of tests (test techniques): What comes to mind first?
- Multiple choice
- True-false
- Fill-in-the-blank
- Short answer
- Essay questions
- Dictation
- Cloze and C-Test
- Summary

What other types of tests can you think of?
- Listening comprehension test
- Composition test
- Oral interview test
- Reading comprehension test
- Translation
- Guided composition, summary

Tests are one form of assessment, but other forms include:
- Observations of language performance
- Portfolio assessment
- Peer assessment
- Self assessment

Qualitative Assessment
Traditional:
- Oral interview
- Essay/composition
- Translation

Portfolio assessment:
- Personal context
- Learner empowerment
- Life-long learning
- Individual record of achievement, progress
- Criticism (creative and repetitive)

Flexible assessor:
- self-assessment
- peer-assessment
- other-assessment

Testing and assessment fundamentally involve:
- Collecting learner data
- Analyzing learner data
- Using the results of the data analysis to make interpretations about learners' language abilities

Thus, types of tests are really types of learner data

Most common types of learner data
- Observations of spontaneous language behavior, e.g. in the classroom
- Elicited experimental (or highly structured and controlled) data
- Elicited clinical (or unstructured) data
- Elicited metalingual judgments
- Self-report data, e.g. learner diaries

What's the best kind of test? = What's the best kind of data?
- All types of data are useful, but none tells the whole story
- The more types of data you use, the more confident you can be in your interpretations

Test types can also be distinguished according to their purposes
- Research
- Diagnosis
- Placement, including entrance tests
- Achievement
- Proficiency
- Aptitude

Classic goal-based test categories
Proficiency test:
- relative to a given standard, often a qualification test
- frequently very large scale

Classic goal-based test categories
The test itself should be tested in terms of:
- reliability: does it achieve consistent results on various occasions?
- validity: does it test the level and abilities which it purports to test?
- objectivity: do different testers come to the same conclusions?
Face validity is desirable (but not necessary).

Achievement test:
- relative to teaching goals
- sometimes very large scale, sometimes very small scale
- also tested in terms of reliability and validity

Diagnostic test:
- aims to give feedback on performance
- based on error detection and error analysis, e.g. grammar vs. vocabulary, or the verb phrase vs. the noun phrase

Aptitude Tests
- Aptitude tests are structured, systematic ways of evaluating how people perform on tasks or react to different situations
- They attempt to predict possible success, e.g. the American SAT (Scholastic Aptitude Test), which is used as an entrance-level qualification for college/university studies

Aptitude Tests
They are characterized by standardized methods of administration and scoring, with the results quantified and compared with how others have done on the same tests.

Classic form-based test categories
Written:
- objective test (results automatically gradable)
  - multiple choice test
  - cloze (gap-filling) test
- guided test
  - translation
  - guided essay
- free test
  - open answer
  - essay

Oral:
- interview
- recording-based
- transcription
- repetition
- response (Q+A, gap-filling, ...)

Multiple choice
Parameters: questions vs. incomplete statements
Parameter values:
- a subset of possible parameter values (answers) is provided
- one is correct; the others are "distractors"

What causes night and day?
A. The earth spins on its axis.
B. The earth moves around the sun.
C. Clouds block out the sun's light.
D. The earth moves into and out of the sun's shadow.
E. The sun goes around the earth.
(Source: P. M. Sadler, "Psychometric Models of Student Conceptions in Science," Journal of Research in Science Teaching, 1998, Vol. 35, No. 3, pp. 265-296.)
The correct answer is A; the other answers are so-called distractors.

If the distractors are (in some sense) equally plausible and randomly ordered, then:
- the probability of obtaining a correct answer by guessing is 1/n, where n is the number of values provided (i.e. the correct answer plus the distractors)
- thus, if there are 4 equally plausible answers, the probability of guessing correctly is 1/4 = 0.25

Multiple choice: example Example: The phonemic transcription of sourdough in Southern Educated British English is 1. /so rdu / 2. /sa do / 3. /sa d / 4. /sa rd f/ The probability of guessing the correct transcription of sourdough is 1/4 = 0.25
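The 1/n guessing rule above can be checked with a few lines of Python. This is only a sketch: the function names are hypothetical, and the binomial step (the chance of reaching a given score by guessing alone) goes slightly beyond the slides, assuming every item is guessed independently with equally plausible options.

```python
from fractions import Fraction
from math import comb


def guess_probability(n_options: int) -> Fraction:
    """Chance of guessing one item correctly: 1/n for n equally plausible options."""
    if n_options < 1:
        raise ValueError("an item needs at least one answer option")
    return Fraction(1, n_options)


def expected_guess_score(n_items: int, n_options: int) -> Fraction:
    """Expected number of correct answers from blind guessing on every item."""
    return n_items * guess_probability(n_options)


def prob_at_least(k: int, n_items: int, n_options: int) -> float:
    """Probability of at least k correct guesses (binomial; assumes independent items)."""
    p = 1 / n_options
    return sum(
        comb(n_items, i) * p**i * (1 - p) ** (n_items - i)
        for i in range(k, n_items + 1)
    )


print(guess_probability(4))          # 1/4
print(float(guess_probability(4)))   # 0.25
print(expected_guess_score(20, 4))   # 5
```

On a hypothetical 20-item test with four options per item, a pure guesser is expected to get 5 items right, which is one reason raw multiple-choice scores are often read against a chance baseline.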

Multiple choice: pros and cons...
Pro:
- reasonably easy for the teacher to construct
- easy for the teacher to correct
- easy to evaluate statistically
- easy for the student to understand the procedure

...Multiple choice: pros and cons
Con:
- tests written knowledge only (unless acoustic media are used)
- tests receptive knowledge only
- tests metalinguistic knowledge rather than performance
- if the distractors are not carefully selected, the probability of guessing correctly may be high
- tends to test knowledge rather than understanding

Cloze Tests...
- A gap-completion task
- Written
- Many different kinds of gap possible:
  - letters
  - parts of words (morphs)
  - words
  - word sequences

...Cloze Tests
Gap selection procedure may be random or systematic:
- e.g. gaps every 4-10 words
- hard words may be filtered out
- or specific word types may be filtered
- manual or computerized
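A computerized gap selection procedure of the kind listed above might look like the following sketch. All names (`systematic_gaps`, `random_gaps`, `make_cloze`, the `keep` filter) are hypothetical and chosen for illustration; the filter simply protects a given set of words from deletion and stands in for whatever "hard word" or word-type filter a real test designer would use.

```python
import random


def systematic_gaps(n_words: int, every: int, lead_in: int = 0) -> list[int]:
    """Indices of every `every`-th word, counted after `lead_in` untouched words."""
    return [i for i in range(lead_in, n_words) if (i - lead_in + 1) % every == 0]


def random_gaps(n_words: int, n_gaps: int, seed=None) -> list[int]:
    """Indices of `n_gaps` words chosen at random (reproducible with a seed)."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_words), n_gaps))


def make_cloze(text: str, gap_indices, keep=frozenset()):
    """Blank out the words at `gap_indices`, except words listed in `keep`.

    Returns the gapped text and the list of deleted words (the answer key).
    """
    gaps = set(gap_indices)
    gapped, answers = [], []
    for i, word in enumerate(text.split()):
        core = word.rstrip(".,;:!?")   # keep trailing punctuation visible
        tail = word[len(core):]
        if i in gaps and core and core.lower() not in keep:
            answers.append(core)
            gapped.append("____" + tail)
        else:
            gapped.append(word)
    return " ".join(gapped), answers


passage = ("Ten years ago representatives from 178 nations met in Rio "
           "to plan how to protect the world's resources.")
n = len(passage.split())
gapped, answers = make_cloze(passage, systematic_gaps(n, every=5), keep={"rio"})
print(gapped)
print(answers)   # ['from', 'protect']
```

Separating gap selection from gap rendering keeps the random and systematic procedures interchangeable: either function yields word indices, and `make_cloze` applies the same filtering to both.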

Modified cloze (UBI Entrance Test) Ten years ago representatives from 178 nations met in Rio to plan how to protect the world's resources. Pledges we given t safeguard ecosy, reduce glo -warming ga, and pro humanity thr sustainable devel. Last ye world lea, scientists a activists m again. The agenda: to check whether Rio had changed the world.

What is in a test? Some criteria
Technical:
- comparison of two performances
Personal:
- interaction between examiner and examinee
Institutional:
- qualification: license to participate in certain activities
Psychological:
- performance under highly constrained conditions
Linguistic:
- language understanding and language production
- metalinguistic activity

Finally, test types can be distinguished by scoring approach
- Norm-referenced
- Criterion-referenced

Norm-referenced versus Criterion-referenced
- Norm-referenced tests are scored according to how well each person does in relation to the mean (or average) score on the test, e.g. national standard reading tests in the U.S.
- Criterion-referenced tests are scored according to predetermined criteria, so that a person's score is not affected by how well everyone else does on the test.

Norm-referenced versus Criterion-referenced
- Proficiency tests tend to be norm-referenced.
- Most other types of tests tend to be criterion-referenced.
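The difference between the two scoring approaches can be illustrated with a small Python sketch. The assumptions are mine, not the slides': norm-referencing is represented by z-scores (distance from the group mean in standard deviations), criterion-referencing by a fixed cut score, and the names and scores are invented.

```python
from statistics import mean, pstdev


def norm_referenced(scores: dict[str, float]) -> dict[str, float]:
    """Score each person relative to the group: z = (score - mean) / SD."""
    values = list(scores.values())
    m, sd = mean(values), pstdev(values)
    # If everyone has the same score there is no spread to compare against.
    return {name: (s - m) / sd if sd else 0.0 for name, s in scores.items()}


def criterion_referenced(scores: dict[str, float], cut_score: float) -> dict[str, bool]:
    """Score each person against a predetermined criterion (pass if >= cut score)."""
    return {name: s >= cut_score for name, s in scores.items()}


group = {"Ana": 50, "Ben": 60, "Chen": 70}
stronger_group = {**group, "Dee": 100}   # same people plus one high scorer

# Ben's norm-referenced standing drops when a stronger candidate joins...
print(norm_referenced(group)["Ben"], norm_referenced(stronger_group)["Ben"])
# ...but his criterion-referenced result does not change.
print(criterion_referenced(group, 60)["Ben"], criterion_referenced(stronger_group, 60)["Ben"])
```

With the same raw score of 60, Ben sits exactly at the mean of the first group but below the mean of the second, while his pass/fail result against the cut score of 60 stays the same.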

Testers do not assume that a test score is a person's true score
There is always a margin of error in a test, so the true score (the person's true ability) may be somewhat higher or lower than the test score.
The amount by which the true score may differ from the test score (the standard error of measurement) is calculated from the test's overall reliability and the standard deviation of the scores.
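The margin of error described here is conventionally expressed as the standard error of measurement, SEM = SD x sqrt(1 - reliability). The following sketch applies that formula; the function names and the example figures (a standard deviation of 10, a reliability of 0.91, an observed score of 70) are hypothetical and serve only to show the arithmetic.

```python
from math import sqrt


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """SEM = SD * sqrt(1 - reliability), with reliability between 0 and 1."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * sqrt(1.0 - reliability)


def score_band(observed: float, sd: float, reliability: float, z: float = 1.0):
    """Range in which the true score plausibly lies: observed +/- z * SEM."""
    sem = standard_error_of_measurement(sd, reliability)
    return observed - z * sem, observed + z * sem


sem = standard_error_of_measurement(sd=10, reliability=0.91)
low, high = score_band(observed=70, sd=10, reliability=0.91)
print(round(sem, 2))                    # 3.0
print(round(low, 2), round(high, 2))    # 67.0 73.0
```

A perfectly reliable test (reliability 1.0) would give an SEM of zero, and the band widens as reliability falls, which is why the same observed score is more or less trustworthy on different tests.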

Mastery of Reliability
Expert testers and testing companies have mastered the art of test reliability. However, the more important question of test validity remains a challenge for everyone, both novice and expert.

Analogies
- How would you test whether a person knows the capital cities of all 50 states?
- How would you test a person's ability to play tennis?
- How would you test whether a person can tie his/her shoe?
- How would you test whether a person can build a cabinet?