Standard Setting in a Small Scale OSCE: A Comparison of the Modified Borderline-Group Method and the Borderline Regression Method


Advances in Health Sciences Education (2006) 11: 115–122. © Springer 2006. DOI 10.1007/s10459-005-7853-1

TIMOTHY J. WOOD¹,*, SUSAN M. HUMPHREY-MURTO² and GEOFFREY R. NORMAN³
¹Medical Council of Canada and Faculty of Medicine, University of Ottawa, Ottawa, ON K1G 3H7, Canada; ²Faculty of Medicine, University of Ottawa; ³Faculty of Medicine, McMaster University
(*Author for correspondence. Phone: (613) 521-6012; Fax: (613) 521-8059; E-mail: twood@mcc.ca)

(Received 14 May 2004; accepted 24 May 2005)

Abstract. When setting standards, administrators of small-scale OSCEs often face several challenges, including a lack of resources, a lack of available expertise in statistics, and difficulty in recruiting judges. The Modified Borderline-Group Method is a standard setting procedure that compensates for these challenges by using physician examiners, and it is easy to use, making it a good choice for small-scale OSCEs. Unfortunately, this approach may introduce a new challenge: because a small-scale OSCE has a small number of examinees, there may be few examinees in the borderline range, which could introduce an unintentional bias. A standard setting method called the Borderline Regression Method will be described. This method is similar to the Modified Borderline-Group Method but incorporates a linear regression approach, allowing the cut score to be set using the scores from all examinees rather than a subset. The current study uses confidence intervals to analyze the precision of cut scores derived from both approaches when applied to a small-scale OSCE.

Key words: standard setting, OSCE

Although a large number of methods for setting standards on performance examinations exist (Cusimano, 1996), there is no gold standard.
Several studies confirm that various methods produce different cut scores, so examination administrators must choose a defensible yet feasible method for their examination. Ideally, the method chosen would produce the most accurate result, but small-scale university-based OSCEs face several additional constraints. Administrators may have limited access to experts in psychometrics or statistics, or few resources for data entry and analysis, making the complex statistical analyses required by some standard setting methods difficult to perform. In addition, finding clinicians who are able to devote time to the extensive standard setting procedures required by some methods is increasingly difficult.

For the last six years, the University of Ottawa Medical School has used the Modified Borderline-Group Method to set the cut score in a 2nd-year student OSCE, a clerkship OSCE, and an Internal Medicine resident OSCE (Humphrey-Murto and MacFadyen, 2002; MacFadyen, 1996). This standard setting method has also been used by the University of Otago, New Zealand (Wilkinson et al., 2001) and by the Medical Council of Canada (MCC) for the MCC Qualifying Examination Part II (Dauphinee et al., 1997; Smee and Blackmore, 2001).

With the Modified Borderline-Group Method, a physician examiner evaluates an examinee's performance at a station by completing a station-specific checklist and then a rating on a global rating scale. The number of points on the scale can vary as long as there is a cohort of examinees labeled as borderline. The MCC and the University of Ottawa use six-point scales with adjective descriptors corresponding to inferior, poor, borderline unsatisfactory, borderline satisfactory, good, and excellent. To determine a cut score for a station, the mean checklist score for the cohort of examinees rated as borderline is calculated and then applied to all examinees. By averaging the checklist scores of the borderline satisfactory and borderline unsatisfactory groups, it is assumed that the result corresponds to an examinee exactly at the pass/fail cut point between the two categories. The sum of the station cut scores becomes the cut score for the overall exam.

For the small-scale OSCE administrator, there are several benefits to using the Modified Borderline-Group Method: it does not require any complex statistical procedures, and the cut point is easy to calculate.
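The station-level calculation described above can be sketched in a few lines of Python. This is a minimal illustration with invented data; the function name and the choice of categories 3 and 4 as the borderline cohort follow the six-point scale described in the text.

```python
def mbg_station_cut(scores, ratings, borderline=(3, 4)):
    """Modified Borderline-Group cut score for one station: the mean checklist
    score of examinees whose global rating falls in the borderline categories
    (3 = borderline unsatisfactory, 4 = borderline satisfactory)."""
    grp = [s for s, r in zip(scores, ratings) if r in borderline]
    return sum(grp) / len(grp)

# One station: checklist scores (0-10) paired with 6-point global ratings
scores  = [4.0, 5.5, 6.0, 6.5, 7.0, 8.0, 9.0]
ratings = [2,   3,   3,   4,   4,   5,   6]
print(mbg_station_cut(scores, ratings))  # 6.25, this station's cut score
# The exam-level cut score is the sum of the station cut scores.
```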
It is also based on actual examinee performance rather than a review of checklist items, so it appears to have higher face validity than other methods. Finally, it is an efficient use of the clinicians' time, since the evaluation occurs at the time of the exam.

Although the Modified Borderline-Group Method meets the needs of small-scale OSCE administrators, there are some potential problems. For example, because the method only uses the checklist scores from those examinees rated as borderline, it does not use all of the data. This may not be a problem for a large-scale OSCE like the MCCQE Part II, but for a small-scale OSCE there is the risk that a cut score could be based on a relatively small number of examinees, which would increase the statistical error associated with the cut score.

Another potential problem is related to the calculation of the mean of the checklist scores. For any given station, the cut score will always fall toward the extreme of the distribution of checklist scores, no matter how the borderline group is chosen (i.e., as an average over two categories, as in the present example, or as a single category such as borderline). The reason is that there will typically be more candidates with scores at the high end of the category than at the low end (except, of course, for statistical fluctuations). Consequently, any average of scores within the group will correspond to an individual who is higher than the middle of the category, so the mean score computed by averaging over the borderline category introduces a consistent upward bias into the computed cut point.

Recent research has described a standard setting procedure called the Borderline Regression Method that uses a linear regression approach to set cut scores (Kramer et al., 2003; see also MacFadyen, 1996 and Woehr et al., 1991). This method is very similar to the Modified Borderline-Group Method, but rather than selecting out a cohort of borderline examinees and calculating their mean checklist score, it regresses all of the examinees' checklist scores onto their global ratings to produce a linear equation. By inserting the midpoint of the global rating scale corresponding to the borderline group(s) (e.g., 3.5 on the current six-point scale) into the equation, a corresponding predicted checklist score can be determined. This predicted score becomes the cut score for the station. The advantage of the Borderline Regression Method is that it uses all of the examinee data for setting the pass mark, not just the scores from examinees rated as borderline. In addition, this approach is less susceptible to variation due to unequal weighting of examinees in the borderline groups.
The borderline regression approach has also been compared with standards set using two different Angoff procedures and was found to have less variance associated with it than either (Kramer et al., 2003). Because this regression approach to standard setting is similar to the Modified Borderline-Group Method, it shares the same advantages: it uses actual examinee performances, has better face validity, and is an efficient use of the clinicians' time. The Borderline Regression Method is somewhat more complicated in that an OSCE administrator needs to be able to run a linear regression, but this can be done easily in common statistical and spreadsheet programs. The question remains, therefore, whether the use of a regression approach actually leads to a more accurate decision than the Modified Borderline-Group Method. The purpose of this study is to compare the accuracy of the two standard setting methods. A 95% confidence interval around the cut score for each station will be used to determine the accuracy of the pass/fail decision. This study will add to the existing literature because these methods of standard setting have not previously been compared for accuracy and feasibility with the small-scale OSCE administrator in mind.

Methods

Design

A 10-station OSCE using physician examiners and standardized patients was administered to 59 clinical clerks at the University of Ottawa in 1998. Eight stations involved patient encounters and two stations used written questions. Only the patient encounter stations were used for this analysis.

Analysis

Descriptive statistics, including the number of examinees, station cut scores, and pass rates for each standard setting method, were calculated. To determine the accuracy of the station cut scores, 95% confidence intervals were calculated. For the Modified Borderline-Group Method, the confidence interval was calculated using the formula found in any introductory statistics book (1.96 × standard error of the mean). For cut scores derived from the Borderline Regression Method, a linear regression in which the checklist scores were regressed onto the ratings was first conducted for each station. The standard error of the resulting regression line was calculated as follows (Kleinbaum et al., 1988):

S(Ŷ|X₀) = S(Y|X) · √( 1/n + (X₀ − X̄)² / ((n − 1) · S²_X) )

where S(Ŷ|X₀) is the standard error of the regression line at X₀, S(Y|X) is the standard error of the estimate, n is the number of examinees, X₀ is the point on the rating scale at which the cut score is predicted, X̄ is the mean of the examinees' ratings, and S²_X is the variance of the examinees' ratings. After generating the standard error of the regression line, the confidence interval is calculated by multiplying this standard error by the t value at p = 0.05 and d.f. = n − 2.

Because the Borderline Regression approach is a relatively novel method of setting a standard, some diagnostic tests were conducted to determine whether a linear regression was justified. First, residuals for each station were analyzed to ensure that there were no outliers.
Second, a lack of fit test was conducted for each station to determine whether a linear regression was appropriate (Dixon and Massey, 1969). This analysis tests whether the means for the groups (as defined by the six points on the rating scale) fall on a straight line. For each station, the Sum of Squares(residual) term, available from the regression analysis, is determined. A second Sum of Squares(pure error) term is computed from a one-way ANOVA using the values on the rating scale as a grouping factor; this term estimates the random error within each rating category. A Sum of Squares(lack of fit) term is then computed by subtracting the Sum of Squares(pure error) term from the Sum of Squares(residual) term. A Mean Square(lack of fit) term is created by dividing the Sum of Squares(lack of fit) by k − 2 (where k is the number of points on the rating scale). The F ratio (F(k − 2, n − k)) for the lack of fit test is calculated by dividing the Mean Square(lack of fit) by the Mean Square(error) term available from the one-way ANOVA.

Results

An analysis of the residuals for each station indicated that scores from three examinees were outliers, and these scores were removed from three stations. As shown in Table I, the lack of fit test for each station revealed non-significant results, indicating that a linear relationship existed between the checklist scores and the global ratings for each station; the linear analyses were therefore justified. Table II displays the number of examinees, mean score, pass rate, and 95% confidence interval for each station as a function of standard setting method. Comparing across standard setting methods, the cut score derived using the Borderline Regression Method was, on average, 0.14 points lower (M = 5.14 vs. M = 5.28, respectively), and was lower than the cut score derived using the Modified Borderline-Group Method on six of the eight stations. The pass rate for the regression approach was, on average, 4% higher (M = 71% vs. M = 67%, respectively), and the two approaches differed on five of the eight stations.

Table I. Values used to calculate the lack of fit test for each station

Station  N   SS(res)  SS(err)  SS(lof)  d.f.(res)  d.f.(err)  d.f.(lof)  MS(lof)  MS(err)  F     Sig.
01       57  54.37    51.66    2.71     55         51         4          0.68     0.99     0.68  0.61
03       58  124.73   117.13   7.60     56         52         4          1.90     2.21     0.86  0.49
04       59  88.25    84.25    4.00     57         53         4          1.00     1.56     0.64  0.64
05       59  31.71    27.89    3.82     57         53         4          0.95     0.52     1.85  0.13
07       59  27.15    24.78    2.37     57         53         4          0.59     0.47     1.27  0.29
08       59  78.74    71.16    7.58     57         53         4          1.89     1.34     1.41  0.24
09       59  100.36   98.83    1.53     57         53         4          0.38     1.83     0.21  0.93
10       58  39.23    38.45    0.78     56         52         4          0.19     0.75     0.26  0.90

Notes: n = number of examinees. SS(res) = SS residual from the regression analysis. SS(err) = SS residual from an ANOVA with scale as grouping factor. SS(lof) = SS(res) − SS(err). d.f.(res) = n − 2. d.f.(err) = n − k (k = number of points on the scale). d.f.(lof) = d.f.(res) − d.f.(err) = k − 2. MS(lof) = SS(lof)/d.f.(lof). MS(err) = error term from an ANOVA with scale as grouping factor. F = MS(lof)/MS(err).
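Both diagnostics described in the Analysis section can be sketched as follows. This is a minimal illustration with invented data, not the study's data; the critical t value is a placeholder that should be looked up at p = 0.05 and d.f. = n − 2 (roughly 2.0 for n near 60).

```python
import math
from statistics import mean

def fit_line(x, y):
    """Least-squares slope, intercept, and centered SS of x for y regressed on x."""
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    return slope, ybar - slope * xbar, sxx

def brm_ci_halfwidth(ratings, scores, x0=3.5, t_crit=2.0):
    """t * standard error of the regression line at x0 (Kleinbaum et al., 1988).
    t_crit is a placeholder: use the t value at p = 0.05 and d.f. = n - 2."""
    n = len(ratings)
    slope, intercept, sxx = fit_line(ratings, scores)
    ss_res = sum((y - (intercept + slope * x)) ** 2
                 for x, y in zip(ratings, scores))
    s_yx = math.sqrt(ss_res / (n - 2))  # standard error of the estimate
    # (n - 1) * S_X^2 is simply the centered sum of squares, sxx
    se_line = s_yx * math.sqrt(1 / n + (x0 - mean(ratings)) ** 2 / sxx)
    return t_crit * se_line

def lack_of_fit_F(ratings, scores):
    """F(k-2, n-k) ratio testing whether rating-category means are collinear."""
    n, cats = len(ratings), sorted(set(ratings))
    k = len(cats)
    slope, intercept, _ = fit_line(ratings, scores)
    ss_res = sum((y - (intercept + slope * x)) ** 2
                 for x, y in zip(ratings, scores))
    # pure error: within-category SS around each rating category's own mean
    ss_err = 0.0
    for c in cats:
        g = [y for x, y in zip(ratings, scores) if x == c]
        ss_err += sum((y - mean(g)) ** 2 for y in g)
    ms_lof = (ss_res - ss_err) / (k - 2)  # SS(lof) = SS(res) - SS(err)
    ms_err = ss_err / (n - k)
    return ms_lof / ms_err
```

When the category means lie exactly on a line, SS(lack of fit) is zero and the F ratio collapses to zero, which is the non-significant pattern reported in Table I.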

Table II. Number of examinees, cut score, pass rate, and 95% confidence interval for each standard setting method

         Modified Borderline-Group Method             Borderline Regression Method
Station  N   Cut score  Pass rate (%)  95% CI    |    N   Cut score  Pass rate (%)  95% CI
1        18  6.00       71             ±0.58     |    57  6.10       64             ±0.44
3        28  4.55       98             ±0.51     |    58  4.64       98             ±0.48
4        18  4.54       69             ±0.53     |    59  4.51       69             ±0.48
5        24  5.21       56             ±0.35     |    59  5.14       56             ±0.27
7        39  5.98       34             ±0.21     |    59  5.77       39             ±0.19
8        26  5.35       73             ±0.42     |    59  5.17       75             ±0.42
9        12  5.49       69             ±0.83     |    59  4.79       92             ±0.57
10       26  5.14       69             ±0.42     |    58  5.00       75             ±0.29
Overall      5.28       67             ±0.48     |        5.14       71             ±0.39

Notes: Checklist scores range from 0 to 10. The number of examinees for the Modified Borderline-Group Method corresponds to those examinees rated as borderline, whereas the number for the Regression Method corresponds to all examinees.

More importantly, the 95% confidence intervals were smaller for the cut scores derived using the regression approach than for the Modified Borderline-Group Method, by an average of 0.09 (M = 0.39 vs. M = 0.48, respectively), t = 2.93, p < 0.05. Interestingly, the differences between the two approaches were quite large for Station 9. Using the Modified Borderline-Group Method, the cut score was 5.49, with 69% of the examinees passing; with the Borderline Regression approach, the cut score was 4.79 and 92% of the examinees passed. A subsequent analysis of this station revealed that all 12 of the examinees rated as borderline using the Modified Borderline-Group Method had been rated as borderline satisfactory, and therefore the cut score for the station was set relatively high. This station demonstrates one of the potential problems of using the Modified Borderline-Group Method when there are few examinees.
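The paired comparison of confidence-interval half-widths reported above can be reproduced directly from the Table II values:

```python
import math
from statistics import mean, stdev

# 95% CI half-widths for the eight stations, from Table II
mbg = [0.58, 0.51, 0.53, 0.35, 0.21, 0.42, 0.83, 0.42]
reg = [0.44, 0.48, 0.48, 0.27, 0.19, 0.42, 0.57, 0.29]

d = [a - b for a, b in zip(mbg, reg)]         # per-station differences
t = mean(d) / (stdev(d) / math.sqrt(len(d)))  # paired t statistic, d.f. = 7
print(round(mean(d), 2), round(t, 2))         # 0.09 2.93
```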
Discussion

In this study, we describe two standard setting methods appropriate for use with a small-scale OSCE, discuss the strengths and potential weaknesses of both methods, and demonstrate that a cut score derived from the Borderline Regression Method was more accurate than one derived using the Modified Borderline-Group Method.

There were relatively small differences in the cut scores and pass rates derived from the two methods, with cut scores determined using the regression approach being slightly lower for most stations. This is anticipated based on the earlier discussion, which suggests that simple averaging of the borderline groups will lead to generally biased results. In addition, there was decreased statistical error using the regression estimates.

There are other advantages to using a regression approach. First, as demonstrated by Station 9 on the OSCE, a cut score determined using the Modified Borderline-Group Method is more susceptible to variations in the distribution of scores in the borderline groups than is the regression approach. Second, because the borderline group(s) are in the lower tail of the overall distribution, the actual distribution of scores within the group(s) will usually be skewed to the left, so the average will be biased on the high side. Linear regression uses values across a continuous dimension and therefore avoids this computational bias.

The principal disadvantage of the Borderline Regression approach is its statistical complexity. We performed a number of exploratory analyses of the regression approach, including a residual analysis to look for outliers, a lack of fit test, and the calculation of the 95% confidence interval associated with the regression line. These analyses were done primarily because this was a new method of standard setting: we wanted to ensure that its use met the assumptions associated with linear regression and to determine the accuracy of the decision compared to the Modified Borderline-Group Method. They are beyond what the typical university OSCE administrator would need to do. Simple linear regression is quite easy to perform and can be conducted in most popular statistics packages.
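The upward bias of within-group averaging, and how regression avoids it, can be illustrated with a small invented station in which three of the five borderline examinees sit in the higher of the two borderline categories (all data below are fabricated for illustration):

```python
from statistics import mean

# Invented station: checklist scores (0-10) and 6-point global ratings
scores  = [2, 3, 4, 5, 6, 6, 7, 7, 8, 8, 9, 10]
ratings = [1, 2, 2, 3, 3, 4, 4, 4, 5, 5, 6, 6]

# Modified Borderline-Group cut: mean score of the borderline cohort (ratings 3-4).
# More examinees fall in category 4 than 3, pulling the mean toward the high end.
borderline = [s for s, r in zip(scores, ratings) if r in (3, 4)]
mbg_cut = mean(borderline)

# Borderline Regression cut: least-squares line of score on rating, read at 3.5
xbar, ybar = mean(ratings), mean(scores)
slope = (sum((x - xbar) * (y - ybar) for x, y in zip(ratings, scores))
         / sum((x - xbar) ** 2 for x in ratings))
brm_cut = ybar + slope * (3.5 - xbar)

print(mbg_cut, round(brm_cut, 2))  # 6.2 5.88 -- the averaged cut sits higher
```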
The improved accuracy of a cut score calculated using a regression-based approach compares favorably to a study reported by Kramer et al. (2003). Using senior post-graduate trainees and general practitioners as examinees on an OSCE, they calculated the root mean squared error (RMSE) derived from a generalizability analysis to determine the amount of error associated with cut scores calculated using a regression approach and two types of Angoff procedures. The RMSE associated with the regression approach was less than the RMSE associated with the Angoff procedures. In addition, the overall pass rates of the examinees were more credible than those associated with the Angoff procedures.

In summary, small-scale OSCE administrators have various constraints that dictate which standard setting method is used to determine a cut score. The Modified Borderline-Group Method has many advantages in that it is easy to use, doesn't require a great deal of statistical support, and is an efficient and defensible method of standard setting. Despite these advantages, potential problems can occur when it is applied to a small-scale OSCE. A linear regression approach, similar in nature to the Modified Borderline-Group Method, demonstrated all of the benefits of the latter method and also proved to have less statistical error associated with it. Before implementing the Borderline Regression Method at the University of Ottawa, future studies will track, across time and examinations, the pass/fail marks derived from both methods. The possibility of extending this approach to a high-stakes clinical examination like that used by the Medical Council of Canada will also be investigated.

References

Cusimano, M. (1996). Standard setting in medical education. Academic Medicine 71: s112–s120.

Dauphinee, W.D., Blackmore, D.E., Smee, S.M., Rothman, A.I. & Reznick, R.K. (1997). Using the judgments of physician examiners in setting standards for a national multi-centre high stakes OSCE. Advances in Health Sciences Education: Theory and Practice 2: 201–211.

Dixon, W.F. & Massey, F.J. (1969). Introduction to Statistical Analysis. New York: McGraw-Hill.

Humphrey-Murto, S. & MacFadyen, J.C. (2002). Standard setting: A comparison of case-author and modified borderline-group methods in a small-scale OSCE. Academic Medicine 77: 134–137.

Kleinbaum, D.G., Kupper, L.L. & Muller, K.E. (1988). Applied Regression Analysis and Other Multivariable Methods. Belmont, CA: Duxbury Press.

Kramer, A., Muijtjens, A., Jansen, K., Dusman, H., Tan, L. & van der Vleuten, C. (2003). Comparison of a rational and an empirical standard setting procedure for an OSCE. Medical Education 37: 132–139.

MacFadyen, I.J.C. (1996). A Modified Borderline Groups Method to Establish Case-Based Pass/Fail Decisions for an Undergraduate Objective Structured Clinical Exam: Exploring Issues of Validity. Masters dissertation, University of Illinois at Chicago, Chicago.

Smee, S.M. & Blackmore, D.E. (2001). Setting standards for an objective structured clinical examination: the borderline group method gains ground on Angoff. Medical Education 35: 1009–1010.

Wilkinson, T.J., Newble, D.I. & Frampton, C.M. (2001). Standard setting in an objective structured clinical examination: use of global ratings of borderline performance to determine the passing score. Medical Education 35: 1043–1049.

Woehr, D.J., Arthur, W. & Fehrmann, M.L. (1991). An empirical comparison of cutoff score methods for content-related and criterion-related validity settings. Educational and Psychological Measurement 51: 1029–1039.