Multiple Choice Test Item Construction and Item Analysis

Multiple Choice Test Item Construction and Item Analysis College of Pharmacy September 17, 2014

Objectives
Apply current research in educational measurement specific to test item construction
Identify the basic parts of a test item
Distinguish good test items from ones that should be rewritten
Apply Bloom's Revised Taxonomy when writing or evaluating a test item for cognitive level
Consider difficulty and item discrimination when reviewing a test item's effectiveness
Consider strategies for improving test items (and tests) over time

Anatomy of a Multiple-choice Question
Stem: Patients with congenital adrenal hyperplasia present with excessive circulating levels of
Options:
a) ACTH
b) Aldosterone
c) BAM22
d) Cortisol
e) CXCR7
The options consist of the key (the correct answer) and the distractors.

Power Button: Press once. Blue light indicates on. Automatically turns off after 5 minutes of non-use.
Response Buttons A-E: Changed your mind? Press a different response button.
Good Dog: A
Bad Dog: B

From: Kubiszyn, T. & Borich, G. (2000). Educational testing and measurement: Classroom application and practice (6th edition). Wiley.
Question 1
U.S. Grant was an
a) president
b) man
c) alcoholic
d) general
Issues:
Grammatical clue (a/an will fix that)
Multiple defensible answers

Question 2 (Kubiszyn & Borich, 2000)
The free floating structures within the cell that synthesizes protein are called ______.
a) chromosomes
b) lysosomes
c) mitochondria
d) free ribosomes
Issues:
Stem clue

Question 3 (Kubiszyn & Borich, 2000)
The square root of 256 is ______.
a) 14
b) 16
c) 4 X 4
d) both a and b
e) both b and c
f) all of the above
Issues:
All/none of the above should be avoided
Can likely be figured out even if you can't do the math!

Question 4 (Kubiszyn & Borich, 2000)
When 53 Americans were held hostage in Iran,
a) the US did nothing to try to free them
b) the US declared war on Iran
c) the US first attempted to free them by diplomatic means and later attempted a rescue
d) the US expelled all Iranian students
Issues:
Put US in the stem to shorten the options
Test writers tend to make the correct option longer than the distractors

Items to avoid: Type K (complex multiple-choice)
Which of the following behaviors suggests that you're losing it?
A. You light a match to check a gas leak.
B. You pick apart your relationship with your significant other.
C. You advise your teenage son to use his own best judgment.
D. A and B
E. B and C
F. All of the above
Berk, R. (1996). A consumer's guide to multiple choice item formats that measure complex cognitive outcomes. Pearson Publishing.

Type K and What Research Shows
Complex multiple-choice items offer multiple combinations of answer choices (for example: 1) A only; 2) both A and C; 3) both B and D; 4) A, B, and C; 5) all of the above).
Fewer can be answered in a given time period
May be more dependent on test-taking skills than subject knowledge
Often have lower item discrimination scores
Haladyna, T. M. (1992). The effectiveness of several multiple-choice formats. Applied Measurement in Education, 5, 73-88.

Items to avoid: Type X (complex true/false)
According to the laws of psychology, which of the following are true (A) and which are false (B)?
1. Never ring a bell when a Pavlov's dog is sitting on your lap.
2. Laws of behavior modification only apply to your neighbor's children.
3. The right hand does know what the left hand is doing, it just doesn't care.
4. Adults get older faster than children, and adults with children age the fastest.
Berk, R. (1996). A consumer's guide to multiple choice item formats that measure complex cognitive outcomes. Pearson Publishing.

Items to avoid: Type K (complex multiple choice)
Which of the following are needed to calculate simple interest?
I. The amount of money borrowed
II. The interest rate
III. The length of the borrowing period
a) I only
b) I and II
c) I and III
d) I, II, and III

Type X: Research Shows (True/False)
It is difficult to write questions that avoid ambiguous statements without making the answer obvious.
Writing true or false statements with no exceptions is difficult.
Students have a 50-50 chance of getting the answer right.
Students can make educated guesses, increasing their odds beyond 50-50 without knowing the answer outright.

Rules for MCQ Test Items
Each item should focus on a single important concept
Each item should assess application of knowledge, not recall of an isolated fact
The stem of the item must pose a clear question
All incorrect options should be homogeneous and plausible
Avoid technical flaws

And Remember: Test-wiseness. It's real!
Grammatical cues (e.g., tense/case, singular/plural, nonparallel construction)
Logical cues (e.g., some options illogical given the lead-in)
Absolute terms (e.g., never, always)
Long correct answer (e.g., the correct option is longer and more specific than the others)
Word repeats (e.g., same/similar words in stem and correct option)

Test items from the prof: Question 1
The pharmacological action of cortisol in the kidney is most similar to that of
a) Angiotensin II
b) Triamcinolone
c) Dexamethasone
d) Fludrocortisone
e) Betamethasone

Test items from the prof: Question 2
An increase in the amplitude of cortisol secretion, with no change in the frequency or phase of cortisol secretion, in ______ is thought to result in ______.
a) females, increased anxiety
b) females, reduced anxiety
c) males, cowardice
d) males, reduced anxiety
e) males, increased anxiety

Test items from the prof: Question 3
Long-term therapy with prednisone (oral) in a female asthmatic patient would likely suppress levels of ______ in that patient.
I. ACTH
II. Cortisone
III. Aldosterone
a) I only
b) III only
c) I and II only
d) II and III only
e) I, II, and III

Analyze and Re-write

On to matching test items with instructional goals

The mid-term, the perfect test question and the tearful prof In assessing Mr. Delgado, which behavior is the most reassuring sign that he has been following his treatment plan for his hypertension and diabetes? A. He has a list of glucose readings for the past 10 days B. He has a list of medications along with newly refilled meds. C. He has kept a nutritional log for a 3-day period D. He can verbalize the side effects of all his medications

The consultation
Goal: Learn all the important content. Learn how to think critically about the subject.
Teaching Activities? Lecture: experts conduct hour-long lectures
Feedback/Assessment: Mid-term exam
Result: Students could not reason through to the right answer
Discussion: Should you assess what you haven't taught?

The Cognitive Domain: Bloom's Taxonomy
Original taxonomy (Bloom, 1956), highest level to lowest: Evaluation, Synthesis, Analysis, Application, Comprehension, Knowledge
Revised taxonomy (Anderson et al., 2001), highest level to lowest: Creating, Evaluating, Analyzing, Applying, Understanding, Remembering
Bloom, B. S. (1956). Taxonomy of Educational Objectives, Handbook I: The Cognitive Domain. New York: David McKay Co Inc.
Anderson, L.W. (Ed.), Krathwohl, D.R. (Ed.), Airasian, P.W., Cruikshank, K.A., Mayer, R.E., Pintrich, P.R., Raths, J., & Wittrock, M.C. (2001). A taxonomy for learning, teaching, and assessing: A revision of Bloom's Taxonomy of Educational Objectives (Complete edition). New York: Longman.

Before you can
understand a concept, you have to remember it
apply a concept, you must understand it
analyze a concept, you must be able to apply it
evaluate its impact, you must have analyzed it
create, you must have remembered, understood, applied, analyzed, and evaluated.

Verb use to guide question depth
Levels run from higher-order thinking skills (HOTS) at the top to lower-order thinking skills (LOTS) at the bottom.

Taxonomy level | Verbs to trigger thinking at this level
Creating: can the student create a new product or point of view? | assemble, construct, create, design, develop, formulate, write
Evaluating: can the student justify a stand or decision? | appraise, argue, defend, judge, select, support, value, evaluate
Analyzing: can the student distinguish between the different parts? | appraise, compare, contrast, criticize, differentiate, discriminate, distinguish, examine, experiment, question, test
Applying: can the student use the information in a new way? | choose, demonstrate, dramatize, employ, illustrate, interpret, operate, schedule, sketch, solve, use, write
Understanding: can the student explain ideas or concepts? | classify, describe, discuss, explain, identify, locate, recognize, report, select, translate, paraphrase
Remembering: can the student recall or remember the information? | define, duplicate, list, memorize, recall, repeat, reproduce, state
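As a rough illustration of using the verb table when reviewing an objective or item stem, the sketch below looks up the first word of an objective in the verb lists from the slide. The function name and the first-word heuristic are assumptions made for this example, not part of the presentation; several verbs (appraise, select, write) appear under two levels, so the lookup returns every matching level.

```python
# Verb lists copied from the slide's table. The function name and the
# "first word is the verb" heuristic are illustrative assumptions.
BLOOM_VERBS = {
    "Creating": ["assemble", "construct", "create", "design", "develop",
                 "formulate", "write"],
    "Evaluating": ["appraise", "argue", "defend", "judge", "select",
                   "support", "value", "evaluate"],
    "Analyzing": ["appraise", "compare", "contrast", "criticize",
                  "differentiate", "discriminate", "distinguish", "examine",
                  "experiment", "question", "test"],
    "Applying": ["choose", "demonstrate", "dramatize", "employ", "illustrate",
                 "interpret", "operate", "schedule", "sketch", "solve", "use",
                 "write"],
    "Understanding": ["classify", "describe", "discuss", "explain", "identify",
                      "locate", "recognize", "report", "select", "translate",
                      "paraphrase"],
    "Remembering": ["define", "duplicate", "list", "memorize", "recall",
                    "repeat", "reproduce", "state"],
}


def candidate_levels(objective: str) -> list[str]:
    """Return every taxonomy level whose verb list contains the first word."""
    words = objective.strip().lower().split()
    if not words:
        return []
    verb = words[0]
    return [level for level, verbs in BLOOM_VERBS.items() if verb in verbs]


print(candidate_levels("Compare two treatment plans"))
print(candidate_levels("Select the most appropriate therapy"))
```

A verb that maps to more than one level is a prompt to read the whole item, since the verb alone does not settle the cognitive level.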

What was the learning objective? And what level of the taxonomy was tapped? In assessing Mr. Delgado, which behavior is the most reassuring sign that he has been following his treatment plan for his hypertension and diabetes? A. He has a list of glucose readings for the past 10 days B. He has a list of medications along with newly refilled meds. C. He has kept a nutritional log for a 3-day period D. He can verbalize the side effects of all his medications

Gotta love Iowa State Retrieved from: http://www.celt.iastate.edu/teaching-resources/effective-practice/revised-blooms-taxonomy/

What's the Bloomin' Level?

On to Psychometrics

Nine out of Ten Psychometricians Say
The best tests:
Include questions from across the spectrum of the curriculum being tested
Have a mix of item difficulty
Do not include difficult items just for the sake of it
Are analyzed after administration
Use item discrimination to think about an item's effectiveness
NOTE: You can't estimate item effectiveness in advance

Two measures of item effectiveness: Difficulty and Discrimination
Difficulty (p-value): the proportion of examinees who answer an item correctly
Discrimination (ID and/or point biserial): a comparison of top scorers with low scorers

Item Difficulty: p-value
42 students answered the item; 8 got it correct.
$$p = \frac{\#\text{ who got the item correct}}{\#\text{ of students who answered the item}} = \frac{8}{42} \approx 0.19$$
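The same calculation as a minimal Python sketch; the function name is an assumption chosen for this example.

```python
def item_difficulty(num_correct: int, num_answered: int) -> float:
    """Proportion of examinees who answered the item correctly (the p-value)."""
    if num_answered <= 0:
        raise ValueError("at least one student must have answered the item")
    return num_correct / num_answered


# Slide example: 8 of 42 students answered correctly.
p = item_difficulty(8, 42)
print(f"{p:.2f}")  # 0.19
```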

Item Difficulty: p-value range
The higher the value, the easier the item.
Above 0.90: too easy; review for the question's purpose (warm up? fundamental?)
Below 0.20: too difficult; review for confusing language, remove from subsequent exams, and/or identify as an area for re-instruction.
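A small sketch of how these cut-offs might be applied when scanning an item report. The function name and the returned labels are assumptions for illustration; the 0.90 and 0.20 thresholds come from the slide.

```python
def review_flag(p_value: float, too_easy: float = 0.90, too_hard: float = 0.20) -> str:
    """Flag an item for review using the slide's difficulty cut-offs."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p-value must lie between 0 and 1")
    if p_value > too_easy:
        return "too easy"
    if p_value < too_hard:
        return "too difficult"
    return "within range"


# An item answered correctly by 8 of 42 students (p of about 0.19).
print(review_flag(8 / 42))  # too difficult
```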

Item Difficulty: Trivia
When guessing is taken into account, let g be the chance of guessing the item correctly, $g = 1 / (\text{number of answer options})$. Then
$$\text{Optimal } p = \frac{1.0 + g}{2}$$
True/False, 2 options (g = .50): optimal p = .75
MCQ, 4 options (g = .25): optimal p = .63
MCQ, 5 options (g = .20): optimal p = .60
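A quick check of the optimal values, assuming (as the g values on the slide imply) that the guessing chance is 1 divided by the number of answer options. The function name is hypothetical.

```python
def optimal_p(num_options: int) -> float:
    """Optimal difficulty, halfway between the guessing chance g and 1.0."""
    if num_options < 2:
        raise ValueError("an item needs at least two options")
    g = 1 / num_options  # assumed: every option is equally likely under guessing
    return (1.0 + g) / 2


for k in (2, 4, 5):
    print(k, "options:", f"{optimal_p(k):.3f}")
```

For four options the exact value is 0.625, which the slide reports rounded as .63.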

Item Discrimination: point-biserial correlation
Compare the top 27% of scorers (upper group) with the bottom 27% (lower group):
$$\frac{(\#\text{ upper group correct}) - (\#\text{ lower group correct})}{\text{number of students in the upper group}} = \frac{5 - 2}{6} = 0.50$$
Image Sources: http://www.allarounddrivingschool.com/bigstockphoto_happy_group_of_friends_2134478.jpg http://gosupermarche.com/deardiary/wp-content/uploads/2009/06/sad_group2.jpg
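The sketch below computes two quantities. The first is the upper-group versus lower-group comparison worked on the slide. The second is a point-biserial correlation, taken here to be the ordinary Pearson correlation between a 0/1 item score and the total test score. Function names and the small data set are assumptions for illustration.

```python
import math


def discrimination_index(upper_correct: int, lower_correct: int,
                         upper_group_size: int) -> float:
    """(# upper group correct - # lower group correct) / upper group size."""
    if upper_group_size <= 0:
        raise ValueError("upper group must contain at least one student")
    return (upper_correct - lower_correct) / upper_group_size


def point_biserial(item_scores: list[int], total_scores: list[float]) -> float:
    """Pearson correlation between 0/1 item scores and total test scores."""
    if len(item_scores) != len(total_scores) or not item_scores:
        raise ValueError("need one item score and one total per examinee")
    n = len(item_scores)
    mean_x = sum(item_scores) / n
    mean_y = sum(total_scores) / n
    cov = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(item_scores, total_scores))
    var_x = sum((x - mean_x) ** 2 for x in item_scores)
    var_y = sum((y - mean_y) ** 2 for y in total_scores)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when scores do not vary")
    return cov / math.sqrt(var_x * var_y)


# Slide example: 5 correct in the upper group, 2 in the lower, 6 per group.
print(discrimination_index(5, 2, 6))  # 0.5

# Hypothetical data: four examinees, item score and total score for each.
print(point_biserial([1, 1, 0, 0], [4, 3, 2, 1]))
```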

Item Discrimination: point biserial range
Negative ID: Unacceptable; check for item error
0% - 24%: Usually unacceptable
25% - 39%: Good item
40% - 100%: Excellent item
Adapted from University of Wisconsin Oshkosh: http://www.uwosh.edu/testing/facultyinfo/itemdiscrimone.php
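One way to apply these bands in code, as a sketch only. The function name is an assumption, and so is the handling of fractional values between the whole-number bands (for example 24.5%), which the slide does not address.

```python
def rate_discrimination(id_percent: float) -> str:
    """Map a discrimination value, expressed as a percentage, to the slide's bands."""
    if id_percent > 100 or id_percent < -100:
        raise ValueError("discrimination must lie between -100% and 100%")
    if id_percent < 0:
        return "Unacceptable; check for item error"
    if id_percent < 25:  # assumed: values below 25 fall in the 0% - 24% band
        return "Usually unacceptable"
    if id_percent < 40:  # assumed: values below 40 fall in the 25% - 39% band
        return "Good item"
    return "Excellent item"


# A discrimination index of 0.50 corresponds to 50%.
print(rate_discrimination(50))  # Excellent item
```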

Scantron Analysis

T-values and Statistical Significance
The t-value is the score obtained when you perform a t-test. It represents the difference between the mean (average) scores of two groups while taking into account any variation in scores.
Is the t-value big enough for you to say that one group is significantly different from the other?
Was the result something that could have just happened by chance?
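For illustration, the sketch below computes a t-value for two independent groups of scores. The slide does not say which form of the t-test the report uses, so the unequal-variance (Welch) form here is an assumption, as are the function name and the sample data. Judging significance additionally requires comparing the t-value against a t distribution with the appropriate degrees of freedom, which this sketch does not do.

```python
import math
from statistics import mean, variance


def welch_t(group_a: list[float], group_b: list[float]) -> float:
    """Difference in group means divided by its estimated standard error."""
    if len(group_a) < 2 or len(group_b) < 2:
        raise ValueError("each group needs at least two scores")
    # statistics.variance is the sample variance (divides by n - 1).
    se = math.sqrt(variance(group_a) / len(group_a)
                   + variance(group_b) / len(group_b))
    if se == 0:
        raise ValueError("t is undefined when neither group varies")
    return (mean(group_a) - mean(group_b)) / se


# Hypothetical scores for two groups of examinees.
print(welch_t([2, 4, 6], [1, 2, 3]))
```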

A Kinder, Gentler Scantron Report

Reliability
Kuder-Richardson Formula 20 (KR-20): a measure of internal consistency, computed from a single administration of a test whose items are scored right or wrong.
Test-retest reliability: the measure obtained by administering the same test twice over a period of time to the same individuals. Scores from time 1 and time 2 are correlated to evaluate the test for stability over time.
Acceptable reliability coefficients? 0.60 is an acceptable lower value
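KR-20 is commonly written as $\frac{k}{k-1}\left(1 - \frac{\sum p_i q_i}{\sigma^2}\right)$, where k is the number of items, $p_i$ is the proportion answering item i correctly, $q_i = 1 - p_i$, and $\sigma^2$ is the variance of total scores. A minimal sketch follows. The function name and data are hypothetical, and the use of the population variance is an assumption (some reports use the sample variance instead).

```python
from statistics import pvariance


def kr20(responses: list[list[int]]) -> float:
    """KR-20 for a matrix of 0/1 scores: one row per examinee, one column per item."""
    n = len(responses)
    k = len(responses[0]) if n else 0
    if n < 2 or k < 2:
        raise ValueError("need at least two examinees and two items")
    totals = [sum(row) for row in responses]
    var_total = pvariance(totals)  # assumed: population variance of total scores
    if var_total == 0:
        raise ValueError("KR-20 is undefined when all total scores are equal")
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in responses) / n  # item difficulty
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / var_total)


# Hypothetical data: four examinees, three items.
scores = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(kr20(scores))
```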

From 30,000 Feet

Other Statistical Terms

Finding Good Dogs and Bad Dogs
Which items had the best difficulty scores? The best discrimination scores?
Which items were good foundational questions?
Comparing difficulty AND discrimination, which items had the best balance of the two?
What is your overall take about this exam?

Objectives review
Apply current research in educational measurement specific to test item construction
Identify the basic parts of a test item
Distinguish good test items from ones that should be rewritten
Apply Bloom's Revised Taxonomy when writing or evaluating a test item for cognitive level
Consider difficulty and item discrimination when reviewing a test item's effectiveness
Consider strategies for improving test items (and tests) over time