Educational Methodologies

Designing Evaluation Forms to Facilitate Student Learning

Frank W. Licari, D.D.S., M.P.H., M.B.A.; G. William Knight, D.D.S., M.S., M.S.; Pamela J. Guenzel, Ph.D.

Abstract: Most dental school instructors struggle to develop course evaluation criteria that can be applied effectively as valid and reliable learning instruments. Vague and unreliable learning assessments often lead to increased dissatisfaction among both faculty and students. Students complain about the lack of faculty calibration, and faculty are often unable to evaluate competence adequately because of the need to provide an overall course grade by the end of the term. By systematically addressing Mackenzie et al.'s list of sixteen factors that contribute to faculty disagreements on student evaluation, we developed Criteria for Writing Effective Evaluation Forms as a guide for developing evaluation criteria. By using this guide to develop evaluation forms for student learning, course directors will have the components necessary to ensure the validity and reliability of student assessment methodology. By providing students and faculty with clearly defined criteria and the training to apply those criteria, Mackenzie et al.'s concerns may be overcome.

Dr. Licari is Executive Associate Dean for Academic Affairs, University of Illinois at Chicago College of Dentistry; Dr. Knight is Assistant Dean for Clinical Education, University of Illinois at Chicago College of Dentistry; and Dr. Guenzel is an Evaluation Consultant, University of Detroit Mercy School of Dentistry. Direct correspondence and requests for reprints to Dr. Frank W. Licari, Executive Associate Dean, Office of Academic Affairs, University of Illinois at Chicago College of Dentistry, 801 South Paulina Street, Room 102A, M/C 621, Chicago, IL 60612; 312-355-3644 phone; 312-413-9050 fax; licari@uic.edu.

Key words: criteria, evaluation, instruction, calibration

Submitted for publication 6/4/07; accepted 10/10/07

Our observations of student performance and behavior led to our interest in examining the critical features of evaluation forms used in schools of dentistry and to the creation of a set of criteria for developing effective evaluation forms that could contribute to student learning. Over a period of several years, we observed that students who required remediation in early operative courses frequently required remediation in subsequent courses. In informal conversations with faculty members, we heard concerns that students' problem-solving skills were lacking and that students didn't use technical language well. In similar discussions, when students were asked about their performance on practical examinations, they commented that they were hopeful they had done well but were unsure how the faculty would grade their work. The students also voiced concern about the lack of uniformity in faculty feedback. Our examination of the learning environment discovered only a few examples of excellent preparations/restorations available to the students and, where present, the models of good quality (for example, prepared teeth) were often three to four times larger than life size. The evaluation forms that did exist were generally used for summative grading purposes only; very rarely were they used by either students or faculty in day-to-day activities.
When item analyses were conducted on student examination products, we found that the weakest performance on student projects corresponded with the least clearly described criteria on the evaluation form.

In 1982, Mackenzie et al. observed that obtaining agreement on observations is not a trivial matter in dental education.1 Their critical analysis of the factors that contributed to faculty disagreements identified sixteen issues that, in their opinion, needed to be addressed (Figure 1). They further argued that the reliability of any checklist was not sufficient and that validity (the correlation of an item with ultimate clinical success) must also be considered.

Figure 1. Mackenzie et al.'s factors related to disagreements
 1. Checkpoint Ambiguity
 2. Faulty Memory
 3. Incomplete Coverage of Dimensions
 4. Unspecified Exceptions
 5. Untrained Estimation of Size
 6. Unstandardized Aids to Judgment
 7. Unspecified Methods of Observing
 8. Incomplete Operational Definition
 9. Unsystematic Inspection
10. Discrepancies in Visual Acuity
11. Degrees of Leniency
12. Inadequacy of Verbal Definitions
13. Inadequate Communication with Nonverbal Examples
14. Definition Ambiguities
15. Differences in Background
16. Differences in Mental Processing

In a 1997 article, Knight described a three-part program to enhance faculty calibration.2 The process began with the development of valid and reliable criteria for more than sixty procedures commonly performed by students. Once developed, a formal training program was instituted for faculty, utilizing morning continuing education programs.

The final phase was incorporating into the school's promotion and tenure documents a requirement that faculty demonstrate skill in student evaluation using the evaluation forms. While no formal outcome measures were reported, Knight stated that by emphasizing the systematic nature of evaluation and the application of the tests, further refinement of the evaluation skill occurs, at least for students.

Haj-Ali and Feil described an attempt to improve faculty calibration using a three-point rating scale describing an amalgam preparation.3 Acknowledging Mackenzie et al.'s concerns related to reliability, they undertook an extensive training program with selected faculty. Results showed that, with use of a standardized rating form and training program, faculty agreement rose and held for the ten-week study. Overall, there was a 10 percent improvement in agreement, with improvement seen in nine of the thirteen individual criteria.

In a study describing a nongraded clinical assessment program, Taleghani et al. attempted to address Mackenzie et al.'s factors by requiring faculty to document, in writing, all student clinical performance that fell short of clinical acceptability.4 Using new forms and faculty training sessions, these authors concluded that verbal interactions between faculty and students and student satisfaction with the nongraded system were both viewed positively. Taleghani et al. reported that faculty calibration was better organized and sequenced, but did not actually document improvement in the calibration of either group.

Recently, the Commission on Dental Accreditation's standards for American dental schools5 suggested the clear need for evaluation forms. Standard 2-8 states that the dental school must employ student evaluation methods that measure the defined competencies. Standard 2-23 requires that graduates be competent in the use of critical thinking and problem solving related to the comprehensive care of patients. Finally, Standard 2-25 requires documentation of the competency of graduates in fourteen (2-25a-n) specific clinical skill domains. The implications of these standards for dental educators are: 1) for every procedure or process that requires determination of learner competence, evaluation forms and their criteria must be predetermined, standardized, and communicated to faculty and learners in writing; 2) faculty and learners must be calibrated to apply each criterion on each evaluation form correctly and consistently; and 3) learners must be provided the opportunity to demonstrate preclinical and clinical competency, evaluated through the use of reliable and valid evaluation forms. It is apparent that corresponding actions must be taken to respond to these challenges.

First, for each clinical procedure that requires learner competence, faculty must write or revise evaluation forms (Develop Evaluation Forms). Second, faculty and learners must have opportunities to discriminate (recognize good and bad examples of a criterion) and to apply the established evaluation criteria (Train Criteria). Third, the criteria on each evaluation form must be applied to demonstrate the attainment of competency (Use Evaluation Forms).

In the education literature, a learning paradigm6,7 has been described and validated stating that the ability to recognize the critical features of an end product is a subskill of the learner's ability to produce that end product. Recognition is the ability to make the necessary discriminations to distinguish good outcomes from poor ones. If a problem or error is never recognized, product improvement occurs only by chance, not by problem solving. The implication for dental educators (and learners) is that recognition, the ability to distinguish good outcomes from poor ones, must be trained first. Feil and Reed identified this sequence and described its critical importance in their work on knowledge of results in student motor skill acquisition.8

In a study examining the relationship between student recognition skills and resulting product performance, a correlation was demonstrated between recognition and production.9 Over half of the variance in product scores was accounted for by the students' recognition scores (self-evaluation), which implies a correlation of roughly 0.7 or higher. Only those students who improved their recognition skills showed an improvement in their product quality: only those who improved their ability to accurately evaluate their preparations, no matter at what level of evaluation they started, were able to improve the quality of their preparations. If, as this research suggests, improved evaluation skill leads to improved performance, it is imperative to determine the conditions that will best enable students to develop recognition skill. The first condition may be the availability of valid criteria. However, valid criteria alone may not be sufficient. What may be needed are valid criteria within a format (an evaluation form) that can facilitate useful feedback and the active participation of the learner in the learning environment.

The guidelines for writing criteria for evaluation forms presented here address the two domains of validity and reliability. A criterion is considered valid if it is a vital determinant of a successful outcome. In clinical dentistry, success is the prevention of disease or the restoration of health. A valid criterion, therefore, must measure what a practitioner actually does in patient care. Validity is determined through evidence obtained from laboratory and clinical research. No criterion can be considered for inclusion on an evaluation form used in product analysis without the establishment of its validity. Reliability refers to the correct and consistent application of the valid evaluation criteria. If an evaluation criterion is vague or imprecise, or if it is awkwardly formatted, both inter- and intra-rater reliability suffer. The end result is then the worst possible outcome: a confused learner. Mackenzie et al.'s list of sixteen factors serves as a firm foundation for the issues to address.

Criteria for Writing Effective Evaluation Forms

These criteria are divided into three categories, one addressing validity and two addressing reliability (Figure 2).

Validity: Establish Valid Criteria
1. Criteria are individually valid. (Mackenzie et al. concerns addressed: faulty memory, differences in background, and differences in mental processing.) Each individual criterion in the set used to evaluate a product or process must be valid. It must be grounded in evidence gained through clinical or laboratory research. Each criterion considered for inclusion on an evaluation form must also directly contribute to the success (or failure) of the product or process. In our quest for evidence-based practice, nothing less can be considered. The learner, and the faculty, will attend to items that truly make a difference.

2. Criteria are collectively valid. (Mackenzie et al. concerns addressed: faulty memory, incomplete coverage of dimensions, differences in background, and differences in mental processing.) It is not sufficient to establish validity only for each criterion. Designers of evaluation forms must also ensure that, taken as a whole, the set of criteria completely covers the essential features of the procedure. That is, if a practitioner performs to a clinically acceptable level on each of the listed criteria, then the product will be clinically acceptable. There can be no other unidentified item (i.e., one not listed on the evaluation form) that can be included in the evaluation. By ensuring the completeness of the evaluation form's description of the process, the learners know what they need to know.

Figure 2. Criteria for writing effective evaluation forms
(The figure is itself a checklist: for each criterion, the reviewer marks Yes or No and notes a Problem/Plan.)

Validity: Establish Valid Criteria
 1. Criteria are individually valid.
 2. Criteria are collectively valid.
 3. Criteria are noncompensatory.
 4. Criteria are sequenced to reflect the production of the procedure or process.

Reliability: Establish Format
 5. Criteria descriptions are aligned horizontally.
 6. Criteria are consistently numbered.
 7. Format facilitates process: procedures, problem solving, and planning for corrective actions (steps, tests, statements, corrective actions).
 8. Levels of acceptable and unsatisfactory performance are visually distinguishable.
 9. Evaluation form is labeled appropriately (e.g., title, date, task, patient, and student identifier).
10. Format is consistent with evaluation forms for other products and procedures.

Reliability: Establish Clarity
11. Number of degrees of excellence promotes high reliability.
12. Degrees of excellence are operationalized (described specifically and/or accompanied by examples).
13. Terminology is consistent.
14. Tests are described specifically.
15. The set of criteria covers an entire range of tasks and clinical conditions.

3. Criteria are noncompensatory. (Mackenzie et al. concern addressed: degrees of leniency.) If each criterion is valid and if the set of criteria is collectively valid, then it must follow that the criteria are noncompensatory. This means that a practitioner cannot do exceedingly well on one criterion to make up for a substandard performance on another. The necessary consequence of criteria being noncompensatory is that assigning point values to criteria is meaningless. Simply adding or averaging points will necessarily hide poor performance on a specific criterion and thereby threaten the clinical outcome. It is important to remember that all evaluation forms produce categorical data, which means that arithmetic manipulations are inappropriate. Summative evaluation becomes not an adding of points but rather a pattern-matching exercise (see the code sketch following criterion 4 below). For example, in Figure 3 there are fifteen criteria that describe the process. The grading pattern for fourth-year students (Figure 4) requires achievement of 80 percent of the criteria in the Excellent category and none in the Standard Not Met category in order to be assigned an A. It follows, however, that another student who achieves 80 percent of the fifteen criteria in the Excellent category and one Standard Not Met does not pass. Note that the grading scale changes depending on the student's year. It is critical that faculty realize that while the grading scale may change, the evaluation call itself must never change. If a performance on a criterion is deemed Standard Not Met, it must be called a Standard Not Met regardless of the student's year. Changing the assessment of any criterion based on the year group of the learner (an act of leniency) only confuses the learner, because the standard changes. In other words, it is acceptable to change the grading scale but not the evaluation standard.

4. Criteria are sequenced to reflect the production. (Mackenzie et al. concerns addressed: checkpoint ambiguity, faulty memory, and differences in background.) Sequencing patient assessments, treatment plans, and treatments is a routine part of patient care. It is how the professional orders thinking to ensure that nothing is overlooked. The sequencing of steps within patient care procedures has been validated through clinical practice, treatment protocols, and task analyses over the history of the profession. Precisely because sequencing is critical to the provision of patient care, it is critical that novices learn the preferred procedural sequence early and well. Purposeful design of evaluation forms to reflect the sequence is an educational opportunity that must not be overlooked. (See Figure 3, left column, "Step.") Sequencing the criteria on evaluation forms provides several key benefits. For the learner, sequenced criteria segment procedures into discrete parts that can be identified, practiced, and related to other components of the skill being learned. An additional benefit for the learner is that each and every time self-evaluation is performed, the sequence of performance is reinforced. There is a benefit for the faculty, too, as they are more likely to remember the sequence and thus recall the relevant criteria. We would suggest that establishing the sequence of valid criteria is best accomplished by having an expert perform the procedure and describe out loud what is being done and why. The expert's dialogue is recorded. Prompting questions from the recorder (such as "What do you do first? Second?" and "How do you know when you have completed this step?") can be invaluable in writing an evaluation form.
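To make the noncompensatory, pattern-matching rule of criterion 3 concrete, here is a minimal sketch in Python. The function and data layout are our illustrative assumptions, not part of the article; the count thresholds follow the Figure 4 scales discussed later, as we have reconstructed them. The point is that a grade is assigned by matching the pattern of calls, never by summing or averaging points.

```python
# Minimal sketch (illustrative names, not from the article) of the
# noncompensatory, pattern-matching grading rule. Thresholds follow the
# Figure 4 scales as reconstructed here.
from collections import Counter

# Per year group: ordered (grade, minimum Excellent calls, maximum SNM calls).
# Clinically Acceptable counts are unconstrained (the XX cells in Figure 4).
SCALES = {
    "D2": [("A", 8, 0), ("B", 0, 0), ("C", 0, 1)],   # E grade: 2 or more SNM
    "D3": [("A", 10, 0), ("B", 7, 0), ("C", 0, 0)],  # E grade: 1 or more SNM
    "D4": [("A", 12, 0), ("B", 9, 0), ("C", 0, 0)],  # E grade: 1 or more SNM
}

def assign_grade(calls, year):
    """Match evaluation calls ("E", "CA", or "SNM") against a year's scale."""
    counts = Counter(calls)
    for grade, min_excellent, max_snm in SCALES[year]:
        if counts["E"] >= min_excellent and counts["SNM"] <= max_snm:
            return grade
    return "E"  # failing grade: too many Standard Not Met calls

# Twelve Excellent calls (80 percent of fifteen) and no SNM earn a D4 an A...
print(assign_grade(["E"] * 12 + ["CA"] * 3, "D4"))  # -> A
# ...but a single SNM fails, no matter how many Excellent calls accompany it.
print(assign_grade(["E"] * 14 + ["SNM"], "D4"))     # -> E
```

Note that only the scale (the thresholds) varies by year; the calls themselves, including Standard Not Met, are identical regardless of the learner's year.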
Reliability: Establish Format

5. Criteria descriptions are aligned horizontally. (Mackenzie et al. concerns addressed: checkpoint ambiguity and unsystematic inspection.) One issue in using evaluation forms is ease of use for both educator and learner. To give meaningful formative assessment, it is useful for the faculty assessment to be recorded congruently with the learner's assessment. It is simply quicker to do this in a horizontal format than in a vertical one, and there is less chance of error in underlining or checking the correct criterion statement. In some cases, especially for essentially yes-or-no criteria, there may not be a statement about a specific criterion in each degree of excellence. While there are no examples of this in Figure 3, on an evaluation form used for root planing there is a criterion for calculus removal; the only two assessments are Calculus Removed (Excellent) and Calculus Remaining (Standard Not Met). Through horizontal alignment, less time is spent searching for the criterion in each degree of excellence. Listing the criteria horizontally also leads the evaluator to consider all criteria for a given product/procedure. This is critical in ensuring that the learner receives complete feedback on the entire task. All too often, evaluators will make a call only for the poorest criterion of a given task, which fails to provide information on the rest of the criteria. Further, failure to make all the calls robs the course leader of valuable information that can be obtained when item analyses of performance examinations are performed to provide the data to direct and support course improvement.

6. Criteria are consistently numbered. (Mackenzie et al. concerns addressed: checkpoint ambiguity and unsystematic inspection.) Numbering criteria consistently across all the degrees of excellence (Figure 3: Excellent, Clinically Acceptable, and Standard Not Met) provides the learner and the educator with much-needed information in formative assessments to detect specific learner problems and to suggest remediation strategies. The learner can, and should, be encouraged to independently chart performance on each criterion over time. In so doing, the learner can identify specific areas for concentrated practice rather than just doing the procedure all over again. The faculty can also use the student-collected data to design specific instructional tasks for the individual learner to address any identified deficiency. Having clear information allows the faculty to design purposely focused remediation programs. Again, the course leader can use the item analysis data to determine where instructional methods are failing to achieve the desired learner outcomes (a sketch of such an item analysis appears after criterion 9 below). Once a deficiency is identified, the course leader can determine whether the criterion was taught, where it was taught, and how it was taught, and can then implement data-based improvement strategies for the next iteration of the course.

7. Format facilitates process. (Mackenzie et al. concerns addressed: checkpoint ambiguity, faulty memory, incomplete coverage of dimensions, and unsystematic inspection.) Simply listing the criteria is not sufficient for either the learner or the faculty. As Mackenzie et al. point out, nothing should be left to chance: the more information given to the evaluator (whether a learner or a faculty member), the more likely the evaluator will perform in an acceptable fashion. The evaluation form should have columns for the steps, tests, criteria, and problem solving. By listing steps, the evaluator is led systematically through each criterion to ensure the entire process is considered. Having the tests listed also ensures that the evaluator is applying the criteria correctly. Providing a column or designated space for student-generated written statements is especially valuable. Requiring the learner to commit to writing what the observed problem is and to speculate on how to correct it provides the faculty with solid data on what the learner believes and understands. How many times have we identified a problem in a learner's product and had him or her agree with us, when we suspect the learner either didn't recognize the problem, couldn't recognize it, or wouldn't recognize it? By having the learner identify the problem and write the solution, the faculty member can quickly assess whether this is a problem of recognition, a misunderstanding of information, or perhaps a lack of critical relevant information. With this information in hand, a faculty member can either directly correct the deficiency immediately or design a piece of instruction for the learner that addresses the identified deficiency.

8. Levels of acceptable and unsatisfactory performance are visually distinguishable. (Mackenzie et al. concerns addressed: unstandardized aids to judgment and degrees of leniency.) While perhaps self-evident, this point is important to the learner in developing problem-solving skills. With a clear demarcation, the learner can efficiently identify the severity of clinically significant problems at a glance.
Problem-solving skill is enhanced on two fronts. First, the learner can develop corrective strategies that address the specific deficiency. Second, and equally beneficial, the learner can then assess the effect(s) of the corrective action on other criteria before initiating the correction. Learning is enhanced as the learner develops an understanding of the interrelationships between the criteria, i.e., how a modification to improve one criterion may lead to a decline in quality in another. For example, in attempting to smooth a wall of a cavity preparation, the outline form extension may be affected.

9. Evaluation form is labeled appropriately. (Mackenzie et al. concerns addressed: checkpoint ambiguity and faulty memory.) The title of the evaluation form quickly alerts the evaluator to the task at hand, thereby eliminating confusion and directing focus. The importance of patient and provider identifiers (preferably chart numbers and identification numbers) is obvious. Date-stamping evaluation forms allows both the learner and the faculty member to track progress on a specific task over time. Labeling also provides evidence of breadth of experience, an important consideration in assessing competency. Finally, attending to these features will contribute to data collection for clinical research. It is also important to give instructions to the student and the faculty on how the evaluation form is to be used. Directions for completing the form should be clearly delineated for both the learner and the faculty to facilitate data collection related to common errors and criterion ambiguity. (See Figure 3, "Instructions.")
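Because every form uses the same consistently numbered criteria, completed forms lend themselves directly to the item analysis described under criterion 6. The sketch below is a hypothetical illustration (the data layout and the flagging threshold are our assumptions, not the article's): it tallies the distribution of calls per criterion across all students and flags criteria with a high Standard Not Met rate, which may indicate a weakly described criterion or a gap in instruction.

```python
# Hypothetical item-analysis sketch: tally calls per numbered criterion
# across all completed evaluation forms and flag high-SNM criteria.
from collections import Counter, defaultdict

# One completed form = {criterion number: call}, where a call is "E"
# (Excellent), "CA" (Clinically Acceptable), or "SNM" (Standard Not Met).
completed_forms = [
    {1: "E", 2: "CA", 3: "SNM", 4: "E"},
    {1: "E", 2: "E", 3: "SNM", 4: "CA"},
    {1: "CA", 2: "E", 3: "CA", 4: "E"},
]

def item_analysis(forms):
    """Return, for each criterion, the distribution of calls across forms."""
    tallies = defaultdict(Counter)
    for form in forms:
        for criterion, call in form.items():
            tallies[criterion][call] += 1
    return tallies

for criterion, counts in sorted(item_analysis(completed_forms).items()):
    snm_rate = counts["SNM"] / sum(counts.values())
    flag = "  <- review criterion wording/instruction" if snm_rate >= 0.5 else ""
    print(f"Criterion {criterion}: {dict(counts)}, SNM rate {snm_rate:.0%}{flag}")
```

Run on the sample data above, criterion 3 (two SNM calls out of three) is flagged, which is exactly the signal the authors used when they found that the weakest student performance corresponded with the least clearly described criteria.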

Figure 3. Example of an evaluation form (continued on next page)

Patient Chart Number ____  Student Name ____  Date ____

Instructions: The student completes each checkpoint and obtains the faculty member's initials. For each presentation, the student underlines each criterion immediately after making the presentation and makes comments in the space provided below the criteria. Faculty circle the appropriate criteria for the student's performance and make comments as needed (comments are required for Standard Not Met).

1. Patient Interview

Steps and tests (checkpoints: chief complaint/history of present illness [CC/HPI] documented; patient understanding noted):
- Chief complaint: What do you have, and where do you have it? What have you noticed, and when did it begin (triggering factor or spontaneous)? What are the quality, frequency, duration, and intensity of the CC? What triggers, increases, and/or decreases the CC? Are there any other symptoms? What has been done by you or others, and what were the outcomes?
- Health history (medical): general health; frequency of medical/dental visits; risk factors; present illnesses/hospitalizations/surgeries; vital signs; sleep patterns.
- Current medications: prescription and OTC/herbal/vitamin/mineral; actions/indications recorded; reactions/precautions recorded; allergies noted.
- Health history (dental): general oral health; frequency of dental visits; recent treatments; risk factors present; dentifrice/mouthwash use.
- Social history: effects of CC; profession/job; financial situation; marital status/children; home location/life; alcohol/tobacco/recreational history; caffeine/soft drink.
- Family history: sibling history; parental history.

Presentation criteria:
Excellent:
1. Significant findings and implications for treatment presented and documented.
2. Demonstrated trust developed between interviewer and patient.
3. Well organized, with findings developed.
4. Correct terminology; clear and concise presentation.
Clinically Acceptable:
1. Significant findings presented, documented, and emphasized versus insignificant findings.
2. Patient comfortable but not forthcoming, though essential information obtained.
3. Organized, but relationships not fully developed.
4. Some use of lay terms, lack of clarity, or overly lengthy.
Standard Not Met:
1. Lack of documentation, differentiation between significant and insignificant findings, or omission of significant findings.
2. Patient ill at ease, and information is compromised.
3. Random presentation; findings not addressed.
4. Use of slang or incorrect terms; unclear or incomplete presentation.
Comments: ____

2. Physical Assessment and Exam

Steps and tests:
- Visual cues: overall physical condition; symmetry of body/face; movement; extremities.
- Extraoral exam: color/texture of skin; eye/facial/neck movements; palpable masses; lymphadenopathy (submental, submandibular, cervical); cranial nerves; musculature and TM joints.
- Intraoral exam: lips (vermilion and oral surfaces); gingiva; buccal mucosa; salivary flow/consistency; tongue (lateral border, dorsum, ventral); floor; oropharynx; musculature; dentition.
- Lesion(s): location, size, uni/bilateral; color, texture, consistency; ulcerated, thickened; painful/asymptomatic/numbness; adjacent tissue.

Presentation criteria:
Excellent:
5. All extraoral findings correct.
6. All intraoral findings correct.
7. Significant findings and implications for treatment presented and documented.
8. Well organized, with findings developed.
9. Correct terminology; clear and concise presentation.
Clinically Acceptable:
5. Most extraoral findings correct (no effect on diagnosis).
6. Most intraoral findings correct (no effect on diagnosis).
7. Significant findings presented, documented, and emphasized over insignificant.
8. Organized, but relationships not fully developed.
9. Some use of lay terms, lack of clarity, or overly lengthy.
Standard Not Met:
5. Extraoral findings incorrect; diagnosis compromised.
6. Intraoral findings incorrect; diagnosis compromised.
7. Lack of documentation or differentiation of significant and insignificant findings; omissions of findings.
8. Random presentation; findings not addressed.
9. Use of slang or incorrect terms; unclear or incomplete.
Comments: ____

3. Diagnosis or Differential Diagnosis

Steps and tests:
- Synthesize findings: identification of essential elements of history and examination and differentiation from noncritical information.
- Summarize: coherence in the recapitulation of history and examination findings.
- Differential diagnosis: generate working diagnosis.

Presentation criteria:
Excellent:
10. Evidence of synthesis; analysis supported by findings.
11. Well organized, with findings developed.
Clinically Acceptable:
10. Minor errors in analysis not affecting treatment.
11. Organized, but relationships not fully developed.
Standard Not Met:
10. Error in thinking that led to inappropriate treatment.
11. Random; findings not addressed.
Comments: ____

4. Treatment Planning, Patient Presentation, and Consent

Steps and tests:
- Treatment plan: ideal and appropriate alternative treatment plans developed; maintenance plan present and accurate; patient questions answered; patient informed consent signed. Or:
- Referral plan: referral to whom; referral letter present.
- Faculty interview: patient, student, and faculty signatures obtained.

Presentation criteria:
Excellent:
12. Ideal and appropriate alternative treatment plans correct and complete, with maintenance plan.
13. Patient's questions answered correctly, and informed consent documented.
14. Procedures sequenced appropriately to ensure delivery and prognosis.
15. Maintenance plan completed, accurate, and ethical.
Clinically Acceptable:
12. Ideal and appropriate alternative treatment plans correct but incomplete; need amplification.
13. Patient not fully clear about treatment plan; informed consent obtained.
14. Slight variation in sequence not affecting prognosis.
15. Minor variances in timing not affecting prognosis.
Standard Not Met:
12. Ideal or alternative treatment plans inappropriate for diagnosis; maintenance plan inadequate or missing.
13. Patient has many questions unanswered; informed consent missing.
14. Variation in sequence compromises treatment plan and/or prognosis.
15. Maintenance plan not completed, accurate, and ethical.
Comments: ____

Figure 3 (continued). Example of an evaluation form

10. Format is consistent with evaluation forms for other products and procedures. (Mackenzie et al. concerns addressed: checkpoint ambiguity, unsystematic inspection, differences in background, and differences in mental processing.) With the emphasis on comprehensive care in contemporary dental education, consistency is a vital consideration for both the faculty and the learner. The learner needs to grasp the salient features of each procedure to be mastered. With a uniform format for all evaluation forms, the learner only needs to learn the format once and can then focus on the criteria for each procedure, a much more efficient learning strategy. Similarly, faculty teaching in a multidisciplinary environment do not need to struggle with learning various departmental formats, focusing instead on understanding and applying the criteria on the form.

Figure 4. Grading scales for student performance (Clinically Acceptable counts, shown as XX in the original, are not constrained)

D2 (second year):
  A: 8-15 Excellent, 0 Standard Not Met
  B: 0-7 Excellent, 0 Standard Not Met
  C: 1 Standard Not Met
  E: 2 or more Standard Not Met

D3 (third year):
  A: 10-15 Excellent, 0 Standard Not Met
  B: 7-9 Excellent, 0 Standard Not Met
  C: 0-6 Excellent, 0 Standard Not Met
  E: 1 or more Standard Not Met

D4 (fourth year):
  A: 12-15 Excellent, 0 Standard Not Met
  B: 9-11 Excellent, 0 Standard Not Met
  C: 0-8 Excellent, 0 Standard Not Met
  E: 1 or more Standard Not Met

Reliability: Establish Clarity

11. Number of degrees of excellence promotes high reliability. (Mackenzie et al. concern addressed: degrees of leniency.) This issue may appear to be a difficult one to resolve. At first consideration, one might be convinced that the fewer criterion categories there are, the higher the reliability will be. We would suggest that, for competency-based education, serious consideration should be given to having two categories: Clinically Acceptable and Standard Not Met. One can make a cogent argument that the line between clinically acceptable and unacceptable is the critical discrimination to be made in patient care, and, we believe, that having only two categories is all that is necessary for licensure examination determinations. In academia, many will suggest that because most institutions require discrete grades (as opposed to Pass/Fail systems), we need to separate Excellent from Clinically Acceptable. We have also heard this argument extended to suggest that, without defining Excellent, students will not be motivated to achieve. A third consideration is that, without an Excellent category, the learner may be denied useful feedback on performance. Each of these considerations has merit and needs to be decided locally. There are those who might also argue that there should be two categories for Standard Not Met: situations that are Unacceptable But Correctable and those that are Unacceptable and Not Correctable. We would suggest that in a competency-based system this distinction not be made. The rationale is that a correctable error that is not corrected is still an error, threatens the success of the procedure, and is therefore a Standard Not Met. The key to determining the number of degrees of excellence is the ability to clearly define the parameters for each category. If three or four (or more) categories can be defined so well that reliability can be demonstrated, then use them. Reliability of assessment for both the faculty and the student is the desired outcome. However, we urge writers not to try to force definitions that do not exist.
For instance, if three categories can be well defined for most of the criteria, then use three, recognizing that one or two of the criteria within the form might lend themselves only to two categories. In these situations, we highly recommend that meeting the criterion be listed in the Excellent category and that the Clinically Acceptable category be left blank (see the example in number 5 above).

12. Degrees of excellence are operationalized. (Mackenzie et al. concerns addressed: faulty memory, untrained estimation of size, incomplete operational definition, inadequacy of verbal definitions, and differences in mental processing.) It has been said that always and never are words we should always remember never to use. The same admonition applies to slightly, moderately, and severely. In writing criteria, authors should operationalize each criterion, i.e., provide measurement ranges, positional relationships, texture statements, etc. (Figure 3, criterion 1). It is also most beneficial when training criteria include actual-size (or video clip) examples of the target (the ideal), as well as examples of errors for discussion and discrimination. An excellent source of errors is student performance examination products from previous years. Using these past examination products is valuable because student products reveal almost every error in every degree of excellence, and the faculty evaluations are the answer keys.

13. Terminology is consistent. (Mackenzie et al. concerns addressed: faulty memory, inadequacy of verbal definitions, and definition ambiguities.) It is imperative that evaluation forms use consistent terminology. Learners are not only learning psychomotor skills; they are also learning a profession's language. Using recognized terminology consistently reinforces the learners' practice with the new language of the profession, especially if the evaluation process includes dialogue. For example, in describing the preparation of a tooth for restoration in amalgam, we would suggest consistent use of "extension" and "over/underextension" rather than "extension" followed by "wide" and "narrow." Figure 3, criterion 4 addresses this very issue in an evaluation form used in oral medicine.

14. Tests are described specifically. (Mackenzie et al. concerns addressed: untrained estimation of size, unstandardized aids to judgment, unspecified methods of observing, discrepancies in visual acuity, inadequacy of verbal definitions, inadequate communication with nonverbal examples, and differences in background.) A key to reliability in evaluation is knowing exactly how to apply each criterion. Specific instructions, included on the evaluation form, need to be given to the evaluator. Issues such as the instruments to use, the reference point, and/or the method of observation need to be addressed. These instructions, useful for both the faculty and the learner, can be powerful tools for identifying key learning issues to be addressed. Figure 3, criterion 10 is an example.

15. The set of criteria is broad enough to cover an entire range of tasks and clinical conditions. (Mackenzie et al. concerns addressed: incomplete coverage of dimensions, unspecified exceptions, incomplete operational definition, differences in background, and differences in mental processing.) To facilitate learning, it is helpful for the learner to understand the complete set of criteria for a clinical task. For example, for the concepts of the amalgam preparation, it is most likely easier for the learner to see all of the concepts and critical features on one evaluation form than to have separate forms for Class I, II, V, etc. Multiple narrowly focused evaluation forms can parse a procedure to such an extent that the learner loses track of the interrelationships among the individual criteria. Having multiple evaluation forms also creates the potential for using an inappropriate form.
Having the evaluation form cover a broad range of clinical conditions demands that educators look carefully to ensure that the criteria selected are useful in promoting the application of clinical skills rather than the production of widgets. For instance, are we interested in ensuring instrument control for effectiveness and safety, or are we interested in creating a fourth-finger rest?

Summary

Valid and reliable evaluation forms are an essential element in student learning and faculty calibration. Spending the time to develop evaluation forms will pay dividends in making progress toward both outcomes. The evaluation form presented in this article is clearly one that involves a process; it was chosen to demonstrate the key features of the criteria for effective evaluation forms. To date, we have generated evaluation forms for restorative dentistry, periodontics, endodontics, radiology, dental hygiene, prosthodontics (fixed and removable), and oral surgical procedures, as well as process forms like the one presented here. We believe that, once generated, criteria forms should be the basis of curriculum and course design. If evaluation forms are designed as suggested here, they will embody the learning objectives, the sequence of presentation, and the design of learning exercises. Effective evaluation forms will provide feedback to the learner, the supervising faculty, the course designer, and the curriculum manager as well. We would also suggest that the design of evaluation forms may prompt clinical and basic science research problems to address and solve.

REFERENCES
1. Mackenzie RS, Antonson DE, Weldy PL, Welsch BB, Simpson WJ. Analysis of disagreement in the evaluation of clinical products. J Dent Educ 1982;46(5):284-9.
2. Knight GW. Toward faculty calibration. J Dent Educ 1997;61(12):941-6.
3. Haj-Ali R, Feil F. Rater reliability: short- and long-term effects of calibration training. J Dent Educ 2006;70(4):428-33.
4. Taleghani M, Solomon ES, Wathen WF. Nongraded clinical evaluation of dental students in a competency-based education program. J Dent Educ 2004;68(6):644-55.
5. Commission on Dental Accreditation. Accreditation standards for dental education programs. Chicago: American Dental Association, 2006.
6. Smith JM. A technology of reading and writing. Vol. 4: designing instructional tasks. New York: Academic Press, 1978.
7. Smith DEP. A technology of reading and writing. Vol. 1: learning to read and write, a task analysis. New York: Academic Press, 1976.
8. Feil PH, Reed T. The effect of knowledge of the desired outcome on dental motor performance. J Dent Educ 1988;52(4):198-201.
9. Knight GW, Guenzel PJ, Fitzgerald M. Teaching recognition skills to improve products. J Dent Educ 1990;54(12):739-42.