Investigating Test Bias via the Cleary Rule as Reinterpreted by Lautenschlager and Mendoza


Prepared by Greg Hurtz, Ph.D., California State University, Sacramento
For, Inc.
Feb. 2007

Investigating Test Bias

The Cleary Rule stems from an article by Cleary (1968), in which she stated that test bias can be evaluated by testing two hypotheses about the linear relation between test scores and a criterion measure: (1) equality of slopes, and then (2) equality of intercepts (given that the slopes are equal). Cleary used the mathematics of the analysis of covariance (ANCOVA) to test these hypotheses. ANCOVA tests for group differences on a criterion variable after partialling out a covariate. For the test bias analysis, the criterion variable is a measure of job performance, the covariate is the test score, and the groups (e.g., male vs. female) are represented by a dummy variable. An assumption of ANCOVA is that the slope of the line from regressing the criterion onto the covariate is equivalent across the groups. Stated another way, it is assumed that there is no group-by-covariate interaction, meaning that the covariate is related to the criterion to an equivalent degree in each group. Thus, following Cleary's example, we should first test whether there are slope differences across groups. If there are slope differences, then we cannot test for intercept differences, for the same reasons that ANCOVA requires the homogeneity-of-regression assumption (but see the discussion of Lautenschlager & Mendoza, 1986, below). If there are no slope differences, then we proceed to testing for intercept differences by looking for covariate-adjusted group mean differences in ANCOVA; these adjusted mean differences are the intercept differences. If there are no such group differences, then the regression of the criterion (i.e., job performance) on the covariate (i.e., test scores) is equivalent across groups, and the conclusion is that there is no bias.

Later writings on the Cleary Rule have framed it within the context of moderated multiple regression (MMR) analysis in the Cohen and Cohen (1975) tradition, but this is mathematically identical to the ANCOVA approach because the moderator is a dummy variable. Thus, we test for slope differences with a test-by-group product term, and for intercept differences with the group dummy code. Following Cleary's example, we would test the interaction first (later named "slope bias"), and then, if there is no interaction, test for intercept differences (later named "intercept bias"). In the absence of either type of difference, we conclude that there is no bias.
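Written out as a regression model (the notation here is added for illustration and is not taken from Cleary's article), with T the test score, G the 0/1 group dummy code, and Y the criterion, the full MMR model is

\hat{Y} = b_0 + b_1 T + b_2 G + b_3 (T \times G)

where a significant b_3 indicates slope differences (slope bias) and, once b_3 is constrained to zero, a significant b_2 indicates intercept differences (intercept bias).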

Lautenschlager and Mendoza's Reframing of Test Bias Analysis

Lautenschlager and Mendoza (1986) discuss testing for test bias via regression analysis as either a "step up" or a "step down" process. They suggest that Cleary, or at least later users of the Cleary method who followed the Cohen and Cohen (1975) tradition for MMR, followed a step-up process in which, in essence, intercept differences were tested first, before slope differences: bias was tested by adding terms to the regression equation and testing for significance, rather than by starting with the complete interaction model and dropping terms. The description of the Cleary Rule in Nunnally and Bernstein (1994) is evidence of this claim, as it clearly describes a step-up approach. Lautenschlager and Mendoza (1986) suggested instead that a step-down procedure be used that begins with the full interaction model. In concept, this is consistent with what Cleary had suggested from the start (i.e., testing slope differences, and then testing intercept differences if there are no slope differences). However, Lautenschlager and Mendoza's approach does not start with an isolated test of slope differences followed by an isolated test of intercept differences; it starts with an omnibus test of whether there are slope and/or intercept differences. Because of the reduction in the error sum of squares for this test, they suggest it will have more power to detect bias than separate tests of slope and intercept differences would, and therefore less likelihood of a Type II error when concluding that no bias is present.

Carrying out Lautenschlager and Mendoza's Approach in SPSS

To carry out Lautenschlager and Mendoza's (1986) procedure in SPSS, take the following steps, which follow the flowchart presented in their article. See also the adapted flowchart at the end of this document, which outlines the analysis and decision steps pictorially.

1. Create a dummy code for the group variable (e.g., gender, ethnicity).
2. Create the product term by multiplying test scores by the dummy code.
3. Enter the performance criterion as the dependent variable in the regression analysis.
4. Conduct the omnibus test: in Block 1 enter test scores, and in Block 2 enter the dummy code and the product term together. Click the Statistics button and select the "R squared change" option to obtain the significance test of the change in R². (An illustrative Python equivalent of this omnibus test appears after this list.)
   a. If the change in R² from Block 1 to Block 2 is NOT significant, this suggests no test bias; there are no slope or intercept differences. The analysis stops.
   b. If the change in R² from Block 1 to Block 2 IS significant, this suggests that slope bias or intercept bias (or both) is present. If this answer is good enough, the analysis can stop here. However, to understand the nature of the bias, further exploration is needed into whether slopes, intercepts, or both differ.
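The steps above describe the SPSS point-and-click procedure. As a purely illustrative aside (not part of the original instructions), a minimal sketch of the same omnibus R²-change test in Python's statsmodels follows. The variable names match those in the example SPSS output below; the data file name is hypothetical.

    import pandas as pd
    import statsmodels.formula.api as smf
    from statsmodels.stats.anova import anova_lm

    # Load the data (hypothetical file name); expects columns
    # JobPerformance, TestScore, and a 0/1 Sex dummy code.
    df = pd.read_csv("test_bias_data.csv")

    # Step 2: create the test-by-group product term.
    df["SexByScore"] = df["Sex"] * df["TestScore"]

    # Block 1: test scores only.
    block1 = smf.ols("JobPerformance ~ TestScore", data=df).fit()

    # Block 2: add the dummy code and the product term (full interaction model).
    block2 = smf.ols("JobPerformance ~ TestScore + Sex + SexByScore", data=df).fit()

    # Omnibus test of the R-squared change from Block 1 to Block 2
    # (slope and/or intercept differences).
    print(anova_lm(block1, block2))
    print("R-squared change:", block2.rsquared - block1.rsquared)

A non-significant model comparison corresponds to outcome (a) above; a significant one corresponds to outcome (b) and leads to the follow-up tests described next.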

Example Model Summary output from the omnibus test:

Model   R       R Square   Adjusted R Square   Std. Error of the Estimate   R Square Change   F Change   df1   df2   Sig. F Change
1       .782a   .612       .606                10.925                       .612              110.290    1     70    .000
2       .825b   .681       .667                10.041                       .070              7.430      2     68    .001
a. Predictors: (Constant), TestScore
b. Predictors: (Constant), TestScore, Sex, SexByScore

The Sig. F Change for Model 2 (.001) is significant; therefore, there is some form of bias.

   i. Test for slope differences. This is basically Cleary's first step. To run this test, move the group dummy code from Block 2 to Block 1 in your SPSS screens, so that test scores and the dummy code are in Block 1 and the product term alone is in Block 2. Rerun the analysis. Now the test for the change in R² tests for the presence of slope differences. The next step depends on whether or not there are slope differences at this stage. (An illustrative Python version of these follow-up tests appears after the Subgroup Scatterplots section below.)
      1. If there are NO slope differences (i.e., the change in R² is non-significant), test for intercept differences. Drop the product term from the analysis and move the dummy code to Block 2. With test scores in Block 1 and the dummy code alone in Block 2, the change in R² now tests for the presence of intercept differences.

         a. If there ARE intercept differences (i.e., the change in R² is significant), evaluate the effect size (the change in R²) and the practical significance of the score differences in observed score units before crying "bias" too loudly.
         b. If there are NO intercept differences (i.e., the change in R² is non-significant), this contradicts the omnibus test. This is an unlikely event; however, given that the omnibus test has more power, it probably would not be wise to conclude "no bias" at this stage. Instead, you should probably conclude that bias of unknown form exists. Examining the R² values may give some insight into where the bias lies even if the significance tests did not pinpoint it.
      2. If there ARE slope differences (i.e., the change in R² is significant), test for the presence of simultaneously occurring intercept differences. In Block 1, enter test scores and the product term; in Block 2, enter the dummy code. Now the change in R² statistic tests for intercept differences in the presence of the slope differences.

         a. If there ARE intercept differences (i.e., the change in R² is significant), then the conclusion is that both slope bias and intercept bias are present. Evaluate the effect size and the practical significance of the score differences in observed score units before crying "bias" too loudly.
         b. If there are NO intercept differences (i.e., the change in R² is non-significant), then there is only slope bias. Again, evaluate the effect size and the practical significance of the score differences in observed score units before crying "bias" too loudly.

Subgroup Scatterplots

If any differences are found, it will probably be useful to view the subgroup scatterplots and regression lines in order to visualize the differences. To get a single scatterplot with both subgroups on it, use the Interactive Graph procedure in SPSS (Graphs menu, Interactive submenu, Scatterplot option). Drag and drop the performance criterion measure from the variable list to the vertical axis box, then drag and drop the test score variable to the horizontal axis box. Finally, drag and drop the group variable (e.g., sex) into the style box; if SPSS asks whether to convert that variable to a categorical variable, click Yes. To add the regression lines, click the Fit tab and choose the Regression method from the drop-down menu, then check the Subgroups box toward the bottom of the screen (and deselect Total). After you click OK and generate the graph, you may need to double-click it and move the equation fields so they can be viewed more easily.
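Continuing the illustrative statsmodels sketch from earlier (again, an aside rather than part of the original SPSS instructions; it assumes the data frame df and the SexByScore product term created above), the follow-up R²-change tests and a subgroup scatterplot with separate regression lines could be run as follows. In practice you would run only the branch that applies (step 1 or step 2), not all three comparisons.

    import matplotlib.pyplot as plt
    import statsmodels.formula.api as smf
    from statsmodels.stats.anova import anova_lm

    # Step i: test for slope differences.
    # Block 1 = TestScore + Sex; Block 2 adds the product term.
    slope_reduced = smf.ols("JobPerformance ~ TestScore + Sex", data=df).fit()
    slope_full = smf.ols("JobPerformance ~ TestScore + Sex + SexByScore", data=df).fit()
    print(anova_lm(slope_reduced, slope_full))   # significant => slope differences

    # Step 1: if slopes do NOT differ, test intercept differences alone.
    # Block 1 = TestScore; Block 2 adds Sex.
    int_reduced = smf.ols("JobPerformance ~ TestScore", data=df).fit()
    int_full = smf.ols("JobPerformance ~ TestScore + Sex", data=df).fit()
    print(anova_lm(int_reduced, int_full))       # significant => intercept differences

    # Step 2: if slopes DO differ, test intercept differences in the presence of
    # the slope differences. Block 1 = TestScore + SexByScore; Block 2 adds Sex.
    both_reduced = smf.ols("JobPerformance ~ TestScore + SexByScore", data=df).fit()
    both_full = smf.ols("JobPerformance ~ TestScore + SexByScore + Sex", data=df).fit()
    print(anova_lm(both_reduced, both_full))     # significant => slope and intercept differences

    # Subgroup scatterplot with a separate regression line for each group
    # (an analogue of the SPSS Interactive Graph described above).
    fig, ax = plt.subplots()
    for sex, grp in df.groupby("Sex"):
        fit = smf.ols("JobPerformance ~ TestScore", data=grp).fit()
        xs = grp["TestScore"].sort_values()
        ax.scatter(grp["TestScore"], grp["JobPerformance"], label=f"Sex = {sex}")
        ax.plot(xs, fit.params["Intercept"] + fit.params["TestScore"] * xs)
    ax.set_xlabel("TestScore")
    ax.set_ylabel("JobPerformance")
    ax.legend()
    plt.show()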

[Figure: interactive scatterplot of JobPerformance (vertical axis) against TestScore (horizontal axis), with points marked by Sex (0 vs. 1) and a separate linear regression line for each group. The fitted equations shown on the plot are JobPerformance = 54.97 + 1.97 * Score (R-Square = 0.51) and JobPerformance = 36.64 + 2.82 * Score (R-Square = 0.78).]

Relation to Other Tests

It should be noted that Lautenschlager and Mendoza's (1986) initial omnibus test is identical to the F test derived by Chow (1960), commonly referred to as the Chow test.
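For reference, the R² change reported by SPSS is tested with the standard hierarchical-regression F statistic (stated here for completeness; it is not reproduced in the original document). Moving from a reduced model with k_R predictors to a full model with k_F predictors in a sample of N cases,

F = \frac{(R^2_{F} - R^2_{R}) / (k_F - k_R)}{(1 - R^2_{F}) / (N - k_F - 1)}, \qquad df_1 = k_F - k_R, \qquad df_2 = N - k_F - 1.

Plugging in the example Model Summary values shown earlier (R² change = .070, full-model R² = .681, df1 = 2, df2 = 68) gives approximately 7.46, which matches the reported F Change of 7.430 to within rounding of the displayed R² values.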

Serlin and Levin (1980) also provide an identical F test for the same purpose: to determine whether slopes, intercepts, or both differ across a set of groups. The omnibus test is therefore well documented. Lautenschlager and Mendoza (1986) also suggest that follow-up analyses exploring regions of significance may be fruitful in determining whether the regressions differ within the range of test scores where decisions are made. It may be the case that the lines differ at the low end of the score continuum but do not differ appreciably at the upper end, where hiring, promotion, and similar decisions are made. If the regions-of-significance calculation suggests that this is the case, then the apparent bias may never influence any actual decisions, which could be valuable information in the defense of a test. Serlin and Levin (1980) discuss alternative methods of calculating regions of significance.

References

Chow, G. C. (1960). Tests of equality between sets of coefficients in two linear regressions. Econometrica, 28, 591-605.

Cleary, T. A. (1968). Test bias: Prediction of grades of Negro and white students in integrated colleges. Journal of Educational Measurement, 5, 115-124.

Cohen, J., & Cohen, P. (1975). Applied multiple regression/correlation analysis for the behavioral sciences. Hillsdale, NJ: Lawrence Erlbaum.

Lautenschlager, G. J., & Mendoza, J. L. (1986). A step-down hierarchical multiple regression analysis for examining hypotheses about test bias in prediction. Applied Psychological Measurement, 10, 133-139.

Serlin, R. C., & Levin, J. R. (1980). Identifying regions of significance in aptitude-by-treatment interaction research. American Educational Research Journal, 17, 389-399.

Flowchart of Test Bias Analysis (Adapted from Lautenschlager and Mendoza, 1986)

1. Test the hypothesis of no bias (the omnibus test). Block 1: Scores; Block 2: Dummy, Dummy*Scores. Is the change in R² significant?
   a. No: conclude no bias (but consider whether power was too low to detect it).
   b. Yes: go to step 2.
2. Is it slope bias? Block 1: Scores, Dummy; Block 2: Dummy*Scores. Is the change in R² significant?
   a. Yes: go to step 3 (is there also intercept bias?).
   b. No: go to step 4 (is it intercept bias?).
3. Is there also intercept bias? Block 1: Scores, Dummy*Scores; Block 2: Dummy. Is the change in R² significant?
   a. Yes: there are slope and intercept differences.
   b. No: there are slope differences only.
4. Is it intercept bias? Block 1: Scores; Block 2: Dummy. Is the change in R² significant?
   a. Yes: there are intercept differences.
   b. No: there are slope or intercept differences of unknown form (unlikely, given the earlier rejection of the omnibus hypothesis of no bias).
Whenever differences are found, call it bias only after asking whether the differences in predicted scores are practically meaningful; if they are, conclude that real bias is present.