Quantitative Research Critique

The article by Zane Olina and Howard J. Sullivan, "Effects of Classroom Evaluation Strategies on Student Achievement and Attitudes", describes a quasi-experimental study of the effect of different evaluation strategies on student performance and attitudes. During a 12-lesson instructional program called Learning Explorations, student classes were assigned one of three evaluation strategies: 1) no evaluation; 2) formative teacher evaluation; or 3) formative teacher evaluation plus formative student self-evaluation. The authors administered a post-test, rated the final reports produced by the students, and collected survey data to determine whether these assessment strategies produced differences in student performance and attitudes.

Review of the literature

The literature review cites four studies on the positive impact of formative evaluation on student performance, and two others that found no effect of teacher evaluation on student performance. A list of characteristics of effective evaluation is drawn from five works and used by the authors to develop their own teacher-evaluation instruments (Olina and Sullivan, 2002, page 63). Nine studies showing positive results from student self-evaluation were included in the review. One article comparing teacher evaluation with student self-evaluation and peer evaluation found no performance differences among the three assessment models, but significant differences in levels of motivation (Olina and Sullivan, page 62). For this level of research, the literature review was adequate. The authors also drew on previous literature to justify the design of their treatment instruments (Olina and Sullivan, page 63) and their choice of the quality of the students' final reports as a criterion measure (Olina and Sullivan, page 63).
Research hypotheses

The authors describe the environment of the Latvian school system to provide a general context for the research. This context establishes both the need for the study and its audience: little formative evaluation is practiced or encouraged in Latvian schools. From a broad statement of purpose, the authors develop three research questions. In the introduction to the article they identify the criterion measures used for the first two research questions, which deal with student performance, but they do not mention the post-survey used to collect data on student attitudes. The Criterion Measures section of the article identifies two additional measures: student attitude surveys and teacher attitude surveys. The teacher attitude surveys do not appear to apply directly to any of the research statements.

Table: Broad concept statement, specific research statements, and criterion measures

Broad concept statement: "The present study investigated the effects of teacher evaluation and student self-evaluation on student posttest scores, the quality of student research reports, and student attitudes."

Research statement 1: Does teacher evaluation have a positive effect on student performance?
Criterion measures: scores on post-test; quality of student research reports

Research statement 2: Does the combination of teacher evaluation and student self-evaluation have a different effect on student performance than teacher evaluation alone?
Criterion measures: scores on post-test; quality of student research reports

Research statement 3: Does the combination of teacher evaluation and student self-evaluation have a different effect on student attitudes than teacher evaluation alone?
Criterion measures: student attitude surveys; teacher attitude surveys
Participants

The description of the sampling method would have been much clearer if the authors had stated that the student classes used in the study were convenience samples: intact classes taught by the six participating teachers. That would also have required the authors to acknowledge the limitations of such groups:

As it does not represent any group apart from itself, it does not seek to generalize about the wider population; for a convenience sample that is an irrelevance. The researcher, of course, must take pains to report this point: that the parameters of generalizability in this type of sample are negligible. (Cohen, Manion and Morrison, 2000, page 103)

The authors state that the twelve classes selected for the study represented both rural and urban areas and varied socio-economic backgrounds, and that the classes came from five schools in different regions of Latvia. This creates a significant problem. Because each teacher's two classes were assigned a treatment together, the 186 students were effectively divided into six subject groups that were already dissimilar before the study began. Since treatment was assigned per teacher, two classes of rural students might receive the self-evaluation treatment while two classes of urban students received no evaluation. Differences in post-test and post-survey results between these groups could then stem either from the treatments or from the groups' respective urban and rural cultures.

Procedure

The application of treatments, as described by the authors, is unclear:

In order to assign teachers to treatments, the researcher ranked all pairs of classes for each teacher from the highest achieving to the lowest achieving, based on the student 9th Grade Graduation Exam scores in mathematics and the Latvian language. The pairs of classes for each teacher were divided into high-achieving and low-achieving classes using a median split.
Teachers with classes from each group were then randomly assigned to one of the three treatments. (Olina and Sullivan, page 66)
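The quoted procedure is ambiguous. One plausible reading, offered here only as an assumption, is a blocked (stratified) random assignment: each teacher's pair of classes is ranked by a combined exam score, the six pairs are split at the median into three high-achieving and three low-achieving pairs, and the three treatments are then randomly assigned within each half, so that every treatment receives one high-achieving and one low-achieving teacher. The Python sketch below illustrates that reading. The function name `assign_treatments`, the use of a simple mean of the two class averages, and the example scores are hypothetical and are not taken from Olina and Sullivan.

```python
import random
from statistics import mean

# The three conditions described in the study.
TREATMENTS = [
    "no evaluation",
    "teacher evaluation",
    "teacher evaluation + self-evaluation",
]


def assign_treatments(pair_scores, seed=None):
    """Hypothetical blocked random assignment of teachers to treatments.

    pair_scores maps each teacher to the mean exam scores of that
    teacher's two classes. This is one assumed reading of the procedure,
    not the authors' documented method.
    """
    rng = random.Random(seed)
    # Rank teachers by the combined mean of their pair of classes, highest first.
    ranked = sorted(pair_scores, key=lambda t: mean(pair_scores[t]), reverse=True)
    # Median split: top half = high-achieving pairs, bottom half = low-achieving.
    half = len(ranked) // 2
    strata = [ranked[:half], ranked[half:]]
    assignment = {}
    for stratum in strata:
        # Each stratum must contain exactly one teacher per treatment.
        if len(stratum) != len(TREATMENTS):
            raise ValueError("each stratum needs one teacher per treatment")
        shuffled = list(stratum)
        rng.shuffle(shuffled)
        # Randomly pair the teachers in this stratum with the three treatments.
        for teacher, treatment in zip(shuffled, TREATMENTS):
            assignment[teacher] = treatment
    return assignment


# Invented placeholder scores for six teachers (not the study's data).
example = {
    "Teacher A": (72.0, 70.5),
    "Teacher B": (65.0, 61.0),
    "Teacher C": (80.5, 78.0),
    "Teacher D": (58.0, 60.5),
    "Teacher E": (69.0, 73.0),
    "Teacher F": (62.5, 59.0),
}
print(assign_treatments(example, seed=2002))
```

Under this reading, prior achievement is balanced across the three conditions, but nothing in the procedure balances rural and urban classes, which is the confound raised under Participants.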
It is difficult to determine whether the treatments were in fact assigned randomly. Taken literally, ranking "all pairs of classes for each teacher" means ranking only two classes per teacher. The next step, dividing the pairs of classes "into high-achieving and low-achieving classes using a median split," might mean that students were actually reassigned between classes, but it could also mean that the two classes were simply labeled relative to a median score. Finally, teachers with classes from each group were randomly assigned to one of the three treatments. This wording suggests that, after a pair of classes had been divided on a median split, some teachers somehow did not have a class in each group. At this point, anyone critiquing the study is forced to make his or her own assumptions and move on, and this lack of clarity would make the study very difficult to replicate.

Before the experiment began, another extraneous variable was introduced that made the three treatment groups unequal:

All teachers received the same version of the instructional program. Teachers in the no-evaluation group received no additional instructions for use of the program. Teachers in the remaining two treatments received additional instructions describing the evaluation procedures that they were expected to complete for their evaluation condition. (Olina and Sullivan, page 66)

Instructors (and possibly students) who received additional instructions on evaluation procedures may have had a better understanding of the program's objectives, which could have affected their performance and the results of the study.

Instrumentation

The four criterion measures used in this experiment are post-test scores, the quality of student research reports, student attitude surveys, and teacher attitude surveys.
The authors describe each of these in detail and cite widely accepted standards, such as interrater reliability scores for the rating of student projects (Olina and Sullivan, page 65) and Cronbach's alpha for the internal reliability of the post-test (Olina and Sullivan, page 65, para. 2). The only significant weakness in this section of the study is that the teacher attitude survey does not address any of the specific research questions posed.

Results

Tables of mean project report scores and post-test scores by treatment group are provided, along with mean ratings, by treatment group, for eight distinct statements on the student attitude survey. A narrative of responses from the teacher attitude surveys is also provided, although these responses do not address any of the specific research questions posed. Classroom observations are reported as well. These observations were not put forward as a criterion measure, and the fact that the visits took place raises the question of how, if at all, they affected the progress and results of the experiment.

Discussion and Conclusions

The discussion and conclusions offer recommendations for classroom application and future studies that are derived from the results. However, the lack of external validity resulting from convenience sampling, together with the confusing description of how treatments were assigned, means that the study cannot readily be generalized or reproduced. The authors point out some limitations of the study, such as the lack of a self-assessment-only treatment and the Latvian teachers' unfamiliarity with the teaching and assessment methods. However, they do not identify the different instructions given to the treatment groups as a confounding variable. Overall, the study suffers from a lack of consistency among the convenience samples and a lack of control over extraneous variables.
References

Cohen, Louis; Manion, Lawrence; Morrison, Keith. (2000). Research Methods in Education (5th ed.). London, England: RoutledgeFalmer.

Olina, Zane; Sullivan, Howard J. (2002). Effects of Classroom Evaluation Strategies on Student Achievement and Attitudes. Educational Technology Research and Development, 50(3), 61-75.

Perry, Lenora; Crocker, Robert. (2006). Module 2: Introduction to Quantitative Research. Education 6100: Research Designs and Methods in Education. Memorial University of Newfoundland. Course lecture notes, retrieved February 14, 2007.

Perry, Lenora; Crocker, Robert. (2006). Module 3: Experimental and Quasi-experimental Research. Education 6100: Research Designs and Methods in Education. Memorial University of Newfoundland. Course lecture notes, retrieved February 14, 2007.