Rubrics & Assessment Data Collection: Making Things Good, Better, & Innovative. March 7, 2014. Presented by: Kathleen Voge & Kathi Trawver

Session Topics Academic Assessment Committee Purpose Developing and Evaluating Rubrics Samples of Rubrics Methods/Trends in Assessment Data Collection Innovative Data Collection & Analysis Discussion and Sharing

AAC Purpose. The Academic Assessment Committee (AAC) was created to provide peer leadership, support, and review of academic assessment. The AAC serves as a cross-campus forum for the exchange of ideas, information, and advice on methods and practices of academic assessment.

A Symbol for Wisdom & Knowledge. Rubrics: Dimensions/Criteria, Usage, Rating Scales, Descriptors.

Why Use Rubrics? They help us evaluate student performance, improve communication of expectations, improve transparency of what matters, enhance thinking and learning, and increase grading/assessment objectivity.

Structure and Components of a Typical Rubric (University of Colorado, 2014). A rubric is a matrix of criteria and their descriptors, and generally contains three primary components:
- Dimensions: the left side of the rubric matrix lists the elements that make up the full criteria for the scale/performance standards.
- Scale: across the top of the rubric matrix is the rating scale (including qualitative markers and numbers) that provides a set of values for rating the quality of performance on each criterion (levels of performance, milestones).
- Descriptors: the cells under the rating scale provide examples or concrete indicators for each level of performance.

Anatomy of a Rubric. [Diagram: a sample matrix titled "Rubric to Assess X Skills." The scale runs across the top (Exemplary 4, Satisfactory 3, Developing 2, Unsatisfactory 1), the dimensions/criteria (Performance Indicators 1-4) run down the left, and each cell contains a descriptor.]
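For readers who track assessment data programmatically, the matrix above maps naturally onto a nested lookup of criterion, then scale level, then descriptor. The sketch below is only an illustration of that structure in Python; the dimension names, descriptors, and scores are invented placeholders, not the presenters' rubric.

```python
# Minimal sketch of a rubric as a matrix: dimension/criterion -> scale level -> descriptor.
# All names, descriptors, and scores below are illustrative placeholders.
scale = {4: "Exemplary", 3: "Satisfactory", 2: "Developing", 1: "Unsatisfactory"}

rubric = {
    "explanation of issues": {
        4: "States the issue clearly and considers it comprehensively",
        3: "States and describes the issue, leaving some ambiguity",
        2: "States the issue, but the description is incomplete",
        1: "Issue is stated without clarification or description",
    },
    "use of evidence": {
        4: "Interprets and evaluates sources thoroughly",
        3: "Interprets sources with some evaluation",
        2: "Takes sources mostly at face value",
        1: "Sources are missing or misused",
    },
}

# Scoring one piece of student work against each dimension.
scores = {"explanation of issues": 3, "use of evidence": 4}
for dimension, level in scores.items():
    print(f"{dimension}: {scale[level]} ({level}) - {rubric[dimension][level]}")
```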

Rubric Dimensions (Down the Left). These list the performance indicators that make up the whole SLO (e.g., critical thinking = explanation of issues, evidence, influence of context and assumptions, student's perspective/hypothesis, and conclusions/outcomes). Dimensions should be:
- Measurable
- Inclusive of what is most important to you
- Essential to the learning objective being measured
- Distinct and not overlapping
- Specific and clear

Rubric Scale (Across the Top). Read the exemplar/aspirational description first so you have the ideal in mind as you read the work (left to right). Rating scales should be:
- Clear to everyone
- Inclusive of qualitative and quantitative descriptions
- Differentiated between scale points
- Able to be used consistently
There is disagreement in the field over the number of scale points and whether to include a neutral point.

Rubric Descriptors (The Cells). Descriptors should:
- Be positively worded: describe what is present rather than what is absent (strengths-based rather than deficit-based)
- Say what students are DOING
- Use verbs in developmental order (e.g., from identify to evaluate)
Writing descriptors this way also helps you determine whether what you consider exemplary is actually part of the assignment/measure.

Reporting Rubric Data. A rubric should measure SLOs consistently across evaluators (reliability) and should measure what it was designed to measure (validity). Treatment of data from rubrics:
- Rubric scores are ordinal-level data only
- Report percentages
- Use non-parametric tests (e.g., Spearman's rank coefficient for correlations)
- Report median distributions or percentage distributions rather than means
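As a concrete illustration of that treatment, here is a minimal sketch, assuming scores are recorded as integers on a 1-4 scale and that pandas and scipy are available. The dimension names and values are invented; the point is the ordinal handling: medians and percentage distributions instead of means, and Spearman's rank correlation instead of a parametric test.

```python
# Minimal sketch: treating rubric scores as ordinal data.
# Dimension names and scores are illustrative only.
import pandas as pd
from scipy.stats import spearmanr

scores = pd.DataFrame({
    "evidence":    [3, 4, 2, 3, 4, 3, 2, 4],
    "conclusions": [3, 4, 2, 2, 4, 3, 3, 4],
})

# Report medians and percentage distributions rather than means.
print(scores.median())
print(scores["evidence"].value_counts(normalize=True).sort_index() * 100)

# Non-parametric association between two dimensions: Spearman's rank correlation.
rho, p = spearmanr(scores["evidence"], scores["conclusions"])
print(f"Spearman's rho = {rho:.2f} (p = {p:.3f})")
```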

Evaluating Your Rubric. Norming or calibrating your measure helps you answer how well the rubric:
- Reflects what is really important to you in terms of student learning and competencies
- Addresses performance levels/scoring
- Describes performance levels
- Uses specific language
- Provides useful ratings
- Performs in terms of inter-rater reliability
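One simple way to quantify the inter-rater reliability question during a norming session is to have two raters score the same set of artifacts and compute exact and within-one-point agreement. The snippet below is a minimal sketch with invented ratings; programs might also add a chance-corrected statistic such as Cohen's kappa.

```python
# Minimal sketch: agreement between two raters who scored the same papers on a 1-4 scale.
# The ratings are invented for illustration.
rater_a = [4, 3, 3, 2, 4, 1, 3, 2, 4, 3]
rater_b = [4, 3, 2, 2, 4, 2, 3, 2, 3, 3]

n = len(rater_a)
exact = sum(a == b for a, b in zip(rater_a, rater_b)) / n
adjacent = sum(abs(a - b) <= 1 for a, b in zip(rater_a, rater_b)) / n

print(f"Exact agreement: {exact:.0%}")
print(f"Within-one-point agreement: {adjacent:.0%}")
```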

Rubric to Evaluate Your Rubric Example 1 Source: (Teaching, Learning, and Technology Group, 2002)

Scale: 4 = Exemplary, 3 = Good, 2 = Acceptable, 1 = Unacceptable.
Clarity of criteria
- Exemplary (4): Each criterion is distinct, clearly delineated, and fully appropriate for the assignment(s)/course.
- Good (3): Criteria being assessed are clear, appropriate, and distinct.
- Acceptable (2): Criteria being assessed can be identified, but are not clearly differentiated or are inappropriate.
- Unacceptable (1): Criteria being assessed are unclear, inappropriate, and/or have significant overlap.
Distinction between levels
- Exemplary (4): Each level is distinct and progresses in a clear and logical order.
- Good (3): Distinction between levels is apparent.
- Acceptable (2): Some distinction between levels is evident, but remains unclear.
- Unacceptable (1): Little/no distinction can be made between levels.
Reliability of scoring
- Exemplary (4): Cross-scoring of assignments using the rubric results in consistent agreement between scorers.
- Good (3): There is general agreement between different scorers when using the rubric.
- Acceptable (2): Cross-scoring occasionally produces inconsistent results.
- Unacceptable (1): Cross-scoring often results in significant differences.
Clarity of expectations/guidance to learners
- Exemplary (4): Rubric serves as the primary reference point for discussion and guidance for the course/assignment(s) and for evaluation of the assignment(s).
- Good (3): Rubric is used to explicitly introduce an assignment and guide learners.
- Acceptable (2): Rubric is shared and provides some idea of the assignment/expectations.
- Unacceptable (1): Rubric is not shared with learners.
Support of metacognition
- Exemplary (4): Rubric is regularly referenced and used to help learners identify the skills and knowledge they are developing throughout the course/assignment(s).
- Good (3): Rubric is shared and identified as a tool for helping learners understand what they are learning through the assignment/in the course.
- Acceptable (2): Rubric is shared but no further reference is made to it in the course/assignment(s).
- Unacceptable (1): Learners do not see/know of the rubric.
Engagement of learners in rubric use
- Exemplary (4): Faculty and learners are jointly responsible for the design of rubrics, and learners use them in peer and/or self-evaluation.
- Good (3): Learners discuss and offer feedback/input into the design of the rubric, and are responsible for using rubrics in peer and/or self-evaluation.
- Acceptable (2): Learners are offered the rubric and may choose to use it for self-assessment.
- Unacceptable (1): Learners are not engaged in either development or use of the rubrics.
Scoring: 0-10 = needs improvement; 11-15 = workable; 16-20 = solid/good; 21-24 = exemplary. (TLT Group, 2002)
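The scoring bands at the bottom of the TLT rubric assume six criteria each rated 1-4, so totals fall between 6 and 24. A minimal sketch of that arithmetic follows; the individual ratings shown are invented for illustration.

```python
# Minimal sketch of the TLT Group (2002) scoring bands: six criteria rated 1-4, total 6-24.
# The individual ratings below are illustrative.
def quality_band(total: int) -> str:
    if total <= 10:
        return "needs improvement"
    if total <= 15:
        return "workable"
    if total <= 20:
        return "solid/good"
    return "exemplary"

ratings = {
    "clarity of criteria": 4,
    "distinction between levels": 3,
    "reliability of scoring": 3,
    "clarity of expectations/guidance": 4,
    "support of metacognition": 2,
    "engagement of learners": 3,
}
total = sum(ratings.values())
print(total, quality_band(total))  # 19 solid/good
```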

Rubric to Evaluate Your Rubric Example 2 Source: (Educational Testing Service, 2006)

Coverage/Organization (1A: Covers the Right Content) (Educational Testing Service, 2006)
Strong = 5:
1. The content of the rubric represents the best thinking in the field about what it means to perform well on the skill or product under consideration.
2. The content of the rubric aligns directly with the content standards/learning targets it is intended to assess.
3. The content has the ring of truth: the content is truly what you look for when you evaluate the quality of a performance or product. In fact, the rubric is insightful; it helps you organize your own thinking about what it means to perform well.
Medium = 3:
1. Much of the content represents the best thinking in the field, but there are a few places that are questionable.
2. Some features don't align well with the content standards/learning targets the rubric is intended to assess.
3. Much of the content is relevant, but you can easily think of some important things that have been left out or given short shrift, or it contains an irrelevant criterion or descriptor that might lead to an incorrect conclusion about the quality of student performance.
Weak = 1:
1. You can't tell what learning target(s) the rubric is intended to assess; or you can guess at the learning targets, but they don't seem important; or the content is far removed from current best thinking in the field about what it means to perform well on the skill or product under consideration.
2. The rubric doesn't seem to align with the content standards/learning targets it is intended to assess.
3. You can think of many important dimensions of a quality performance or product that are not in the rubric, or the content focuses on irrelevant features. You find yourself asking, "Why assess this?" or "Why should this count?" or "Why should students have to do it this way?"

Coverage/Organization (1B: Criteria Are Well Organized)
Strong = 5:
1. The rubric is divided into easily understandable criteria. The number of criteria reflects the complexity of the learning target. If a holistic rubric is used, a single criterion adequately describes performance.
2. The details used to describe a criterion go together; you can see how they are facets of the same criterion.
3. The emphasis on various features of performance is right: things that are more important are stressed more; things that are less important are stressed less.
4. The criteria are independent. Each important feature that contributes to quality work appears in only one place in the rubric.
Medium = 3:
1. The number of criteria needs to be adjusted a little: either a single criterion should be made into two criteria, or two criteria should be combined.
2. Some details used to describe a criterion are in the wrong criterion, but most are placed correctly.
3. The emphasis on some criteria or descriptors is either too small or too great; others are all right.
4. Although there are instances when the same feature is included in more than one criterion, the criteria structure holds up pretty well.
Weak = 1:
1. The rubric is holistic when an analytic one is better suited to the intended use or learning targets; or the rubric is an endless list of everything, with no organization, looking like a brainstormed list.
2. The rubric seems mixed up: descriptors that go together don't seem to be placed together, and things that are different are put together.
3. The rubric is out of balance: features of more importance are emphasized the same as features of less importance.
4. Descriptors of quality work are represented redundantly in more than one criterion, to the extent that the criteria are really not covering different things.

Coverage/Organization (1C: Number of Levels Fits Targets and Uses)
Strong = 5: The number of levels of quality used in the rating scale makes sense. There are enough levels to show student progress, but not so many that it is impossible to distinguish among them.
Medium = 3: Teachers might find it useful to create more levels to make finer distinctions in student progress, or to merge levels to suit the rubric's intended use. The number of levels could be adjusted easily.
Weak = 1: The number of levels is not appropriate for the learning target being assessed or the intended use. There are so many levels that it is impossible to reliably distinguish between them, or too few to make important distinctions. It would take major work to fix the problem.

Clarity (2A: Levels Defined Well)
Strong = 5:
1. Each score point (level) is defined with indicators and/or descriptors. A plus: there are examples of student work that illustrate each level of each trait.
2. There is enough descriptive detail, in the form of concrete indicators, adjectives, and descriptive phrases, to allow you to match a student performance to the right score.
3. Two independent users, with training and practice, assign the same rating most of the time. A plus: there is information on rater agreement rates showing that raters agree exactly on a score 65% of the time, and within one point 98% of the time.
Medium = 3:
1. Only the top level is defined; the other levels are not defined.
2. There is some attempt to define terms and include descriptors, but some key ideas are fuzzy in meaning.
3. You question whether independent raters, even with practice, could assign the same rating most of the time.
Weak = 1:
1. No levels are defined; the rubric is little more than a list of categories to rate followed by a rating scale.
2. Wording of the levels, if present, is vague or confusing. You find yourself saying, "I'm confused," or "I don't have any idea what this means." Or the only way to distinguish levels is with words such as "extremely," "very," "some," "little," and "none"; or "completely," "substantially," "fairly well," "little," and "not at all."
3. It is unlikely that independent raters could consistently rate work the same, even with practice.

Clarity (2A: Levels Defined Well, Continued)
Strong = 5:
4. If counting the number or frequency of something is included as an indicator, changes in such counts really are indicators of changes in quality.
5. There is enough descriptive detail, in the form of concrete indicators, adjectives, and descriptive phrases, to allow you to match a student performance to the right score.
Medium = 3:
4. There is some descriptive detail in the form of words, adjectives, and descriptive phrases, but counting the frequency of something or vague quantitative words are also present.
5. Wording is mostly descriptive of the work, but there are a few instances of evaluative labels.
Weak = 1:
4. Rating is almost totally based on counting the number or frequency of something, even though quality is more important than quantity.
5. Wording is mostly descriptive of the work, but there are a few instances of evaluative labels.

Clarity (2B: Levels Are Parallel)
Strong = 5: The levels of the rubric are parallel in content: if an indicator of quality is discussed in one level, it is discussed in all levels. If the levels are not parallel, there is a good explanation why.
Medium = 3: The levels are mostly parallel in content, but there are some places where an indicator present at one level is not present at the other levels.
Weak = 1: Levels are not parallel in content and there is no explanation why, or the explanation doesn't make sense.

Rubric to Evaluate the Quality of Your Rubric Example 3 Source: SBE Design Team, 1997

Scale: Clearly Written / Acceptable, but needs more clarity if used for high-stakes assessment / Needs to Be Reworked.
Performance levels addressed
- Clearly written: Scoring guide is descriptive of each level of performance.
- Acceptable: Scoring guide provides for different performance levels.
- Needs rework: Scoring is open-ended.
Description of performance levels
- Clearly written: The descriptions define clear and significant differences between the performance levels.
- Acceptable: Differences between the levels rely on looking for a number of examples or responses.
- Needs rework: There are no specific descriptions of the different performance levels.
Language specificity
- Clearly written: The critical attributes between each level of performance are included.
- Acceptable: Subjective words (good, excellent, some) are used to discriminate between levels, but are further defined.
- Needs rework: Vague words (some, many, few, good, excellent) are used to discriminate between levels.
Usefulness
- Clearly written: Ratings provide useful instructional information.
- Acceptable: Ratings provide instructional information that needs further task analysis.
- Needs rework: Ratings do not provide useful instructional information.

Methods & Trends in Assessment Data Collection: student self-efficacy surveys, indirect versus direct assessment, activity-based learning, computer-based assessment, and online learning. As trends in education constantly evolve, so will assessment of learning!

Analyzing Collected Data. Data can be compared to previous assessment results, baseline data, existing standards, and specific competencies/criteria.
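For example, a program might compare the share of students scoring at or above a target performance level this cycle against a baseline year. The sketch below is illustrative only; the level names and percentages are invented.

```python
# Minimal sketch: comparing current rubric results to baseline data.
# Percentages of students at each performance level are invented.
baseline = {"Unsatisfactory": 15, "Developing": 35, "Satisfactory": 35, "Exemplary": 15}
current = {"Unsatisfactory": 10, "Developing": 30, "Satisfactory": 40, "Exemplary": 20}

target_levels = ("Satisfactory", "Exemplary")
baseline_at_target = sum(baseline[level] for level in target_levels)
current_at_target = sum(current[level] for level in target_levels)

print(f"At or above target: baseline {baseline_at_target}% -> current {current_at_target}%")
```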

Trends in Rubrics & Data Collection: students like rubrics; rubrics promote transparency and consistency; and summarized rubric results promote an ongoing dialogue about teaching and learning.

Examples / Discussion Handouts Online resources

Online Resources
- Evaluating Rubrics: http://business.fullerton.edu/centers/CollegeAssessmentCenter/RubricDirectory/evaluatingrubrics.pdf (accessed February 3, 2014)
- Creating a Rubric, An Online Tutorial for Faculty: http://www.ucdenver.edu/faculty_staff/faculty/center-for-faculty-development/documents/tutorials/rubrics/index.htm (accessed February 3, 2014)
- Practical Assessment, Research & Evaluation: http://pareonline.net/getvn.asp?v=7&n=25 (accessed February 3, 2014)
- Using Data to Guide Instruction and Improve Student Learning: http://www.sedl.org/pubs/sedl-letter/v22n02/using-data.html (accessed February 3, 2014)
- VALUE: Valid Assessment of Learning in Undergraduate Education: http://www.aacu.org/value/rubrics/index_p.cfm (accessed March 5, 2014)

References
Educational Testing Service. (2006). Creating and recognizing quality rubrics. Retrieved from http://www.ecu.edu/cs-educ/opd/upload/rubricforrubrics.pdf
Teaching, Learning, and Technology Group. (2002). A rubric for rubrics: A tool for assessing the quality and use of rubrics in education. Retrieved from http://www.tltgroup.org/resources/rubrics/a_rubric_for_rubrics.htm
University of Colorado. (2014). Creating a rubric: An online tutorial for faculty. Retrieved from http://www.ucdenver.edu/faculty_staff/faculty/center-for-faculty-development/documents/tutorials/rubrics/1_what_is/index.htm