THE IMPACT OF DIFFERENT APPROACHES TO SCHOOL SELF-EVALUATION UPON STUDENT ACHIEVEMENT: A GROUP RANDOMIZATION STUDY

ABSTRACT

Background

Improving the quality of education is currently a central concern of educational policy in many countries. Several countries have created, or are working on, legislation and monitoring in the field of School Self-Evaluation (SSE), which stresses schools' own responsibility for quality (Hofman, Hofman & Gray, 2010). Barber (1996) argues that the essence of a successful organization is the search for improvement and that effective self-evaluation is the key to it. However, a critical review and synthesis of existing research on SSE reveals that quantitative studies should be conducted in order to examine the relationship between SSE and school effectiveness as measured through the performance of students (Kyriakides & Campbell, 2004). Such studies might identify the extent of the contribution that SSE makes to school improvement and may also identify which of the main approaches to establishing SSE mechanisms is more effective.

The first approach to SSE is based on the assumption that the involvement of school stakeholders in defining the criteria of SSE may eventually encourage their active participation in using SSE for improvement purposes (MacBeath, 1999). Teacher participation in school-level decision making has been advanced for a wide variety of reasons (Smylie, Lazarus & Brownlee-Conyers, 1996). Most often, participation is thought to enhance communication among teachers and administrators and to improve the quality of educational decision making. Participation has also been promoted on the basis of ethical arguments for "professionalizing" teaching and "democratizing" school workplaces (Murphy & Beck, 1995). However, Mijs, Houtveen, Wubells and Creemers (2005) conducted a meta-analysis of successful school improvement programs and provided support for a systematic approach to change directed at internal conditions with respect to teaching and learning, and for support at the school level aimed at improving the quality of teaching and learning. No evidence was found that the content of the improvement program has to be developed by the school itself.

The second approach is concerned with the establishment of a climate in the school that supports change. This approach is based not only on findings of school improvement projects but also on the view of schools as mini political systems with diverse constituencies. The term "micropolitics of education" emerged in clearly articulated form in the research literature within the past 30 years (Hoyle & Skrla, 1999). Micropolitics recognizes divergence of interests, multiple sources of power, and the potential for conflict (Ball, 1987). Such a lens allows for the possibility that coalitions and conflict may occur both across and within organizations such as schools (Firestone & Fisler, 2002). Therefore, the development and adoption of an SSE system is not just a technocratic affair but is also shaped by political influences (Stronge & Tucker, 2000). When SSE is introduced, the various constituency groups within education which have the ability to influence educational arrangements may try to promote their own interests in order to increase their professional power. Therefore, the success of SSE may be determined by the ways used to design and introduce the use of SSE for improvement purposes.
This is because, before setting out the design of an SSE mechanism, it is important to recognize and take into account the factors that might influence the constituencies' proposals regarding the framework (content) of the SSE system and that may lead certain constituency groups into vigorous resistance against the proposed SSE mechanism. Thus, this approach to SSE is based on the assumption that the concerns of the various stakeholders about SSE should be faced and reduced before encouraging them to establish their own SSE mechanisms and to design relevant improvement strategies and actions (Kyriakides & Demetriou, 2007).

The third approach is based on the assumption that the knowledge base of Educational Effectiveness Research (EER) should be taken into account in developing SSE mechanisms (Teddlie & Reynolds, 2000; Teddlie & Stringfield, 2007). A major element of this approach is the emphasis on evidence stemming from theory and research; the value of a theory-driven approach is thus stressed. The need to collect multiple data about student achievement and about classroom and school processes is emphasized by making use of a theoretical framework based on the main findings of EER. For the purposes of this study, the dynamic model of educational effectiveness (Creemers & Kyriakides, 2008) is used as a framework for establishing SSE mechanisms, since it was developed in order to establish stronger links between EER and the improvement of practice. Moreover, a series of studies has provided support for the validity of the model (e.g., Creemers & Kyriakides, 2010; Kyriakides & Creemers, 2008, 2009). A distinctive feature of the dynamic model is that it does not only refer to factors that are important for explaining variation in educational effectiveness but also attempts to explain why these factors are important by integrating different theoretical orientations to effectiveness. In this way, teachers and other school stakeholders involved in improvement efforts may become aware of both the empirical support for the factors involved in their project and the way these factors operate within a conceptual framework. Through this approach, school stakeholders are also offered the opportunity to use this knowledge base in a flexible way, adapt it to their specific needs, and develop their own strategies for school improvement (Heck & Moriyama, in press).

Purpose

An experimental study concerned with the use of the three main approaches to establishing SSE mechanisms was conducted. The main aim of this study was to identify the effect of each of these approaches on student achievement in mathematics.

Intervention

During the school year 2007-2008, a sample of 60 primary schools of Cyprus was selected. The school sample was randomly split into four groups. Different types of support were provided to the first three groups of schools to establish SSE mechanisms for improvement purposes, whereas no SSE mechanism was established in the schools of the fourth group. Each of the treatments offered to the other three groups of schools was in line with one of the main approaches to SSE mentioned above.

In each school of the first group, group interviews with each group of school stakeholders (i.e., teachers, parents and students) were initially conducted. A questionnaire on the appropriateness of different criteria of SSE emerged from each interview and was administered to the relevant population of each group of school stakeholders. Analysis of the data helped us to identify criteria of SSE which were considered important by all groups of stakeholders. By conducting a group interview with representatives of teachers, parents, and students, each school then managed to develop its own SSE mechanisms. Based on the results of SSE, the stakeholders of each school designed their own strategies and action plans for school improvement purposes.
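As an illustration of this criteria-screening step, the sketch below shows one way such questionnaire data could be scanned for criteria endorsed by every stakeholder group. It is not the authors' analysis; the data layout, the column names and the cut-off of 4.0 on an assumed 1-5 rating scale are all hypothetical choices made for the example.

# Minimal sketch (not the authors' analysis): flag SSE criteria whose mean
# appropriateness rating reaches a cut-off in every stakeholder group.
# The DataFrame layout, column names and the 4.0 cut-off are assumptions.
import pandas as pd

def shared_criteria(ratings: pd.DataFrame, cutoff: float = 4.0) -> list:
    # Mean appropriateness of each criterion within each stakeholder group
    means = (ratings
             .groupby(["criterion", "stakeholder_group"])["appropriateness"]
             .mean()
             .unstack("stakeholder_group"))
    # Keep criteria rated at or above the cut-off by teachers, parents and students alike
    endorsed = means[(means >= cutoff).all(axis=1)]
    return list(endorsed.index)

# Example with hypothetical data:
# ratings = pd.DataFrame({"criterion": [...], "stakeholder_group": [...], "appropriateness": [...]})
# print(shared_criteria(ratings))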
The second group of schools followed the same process to design SSE mechanisms as the first one, but before this approach was introduced the stakeholders of these schools were encouraged to express their concerns about SSE and exchange them with each other. For this reason, group interviews were conducted and support was offered to stakeholders in order to face and reduce their concerns about SSE. At the next stage, support was provided to the schools of the second group to establish their own SSE mechanisms by generating their own criteria of SSE and collecting data in relation to these criteria, in order to identify priorities for improvement and design their strategies and action plans.

Finally, the third group was asked to establish SSE mechanisms in line with the knowledge base of EER as reflected in the dynamic model. Beyond presenting the dynamic model and its assumptions to the various stakeholders of these schools, the instruments used to test the validity of the model were administered and data about the improvement priorities of each of these schools were collected. The results of this investigation were presented to the school stakeholders, who were encouraged to design school improvement initiatives in such a way that one of the first three priorities of their school could be addressed. Guidelines on actions and strategies that could be considered in designing their improvement strategies were also offered to the school stakeholders. Finally, all three groups of schools were asked to develop mechanisms to monitor the implementation of their school improvement plans, and the research team was available to provide support, acting as the critical friend of the school (MacBeath, 1999).

Data collection and analysis

Multilevel modeling techniques were used in order to measure the impact of these three approaches on the mathematics achievement of the grade 4 and grade 5 students of each school. It was also examined whether variation in student achievement can be explained by explanatory variables associated with the effective implementation of a school-based reform. Our interest in taking the impact of these variables into account arises from the fact that evaluation studies of reform efforts reveal that, irrespective of the nature/content of the reform implemented in schools, there is variation in the ability of teachers and schools to implement the reform (Worthen et al., 1997). Thus, in our effort to compare the impact of the three approaches on student achievement, we also collected data on school factors which may have an effect on the attempt of schools to use SSE for improvement purposes. The dependent and the explanatory variables of this study are presented below.

Dependent variable (student achievement in mathematics): Curriculum-based written tests in mathematics were administered to all grade 4 and grade 5 students of our school sample (n = 4212) at the beginning and at the end of school year 2007-2008. The written tests were checked for reliability and validity. Test equating was done using IRT modeling. The method of equating follows the same procedure as that used in PISA studies (OECD, 2002); however, whereas in PISA the equating is horizontal, in this study the equating was vertical (see Demetriou, 2009). Estimation was made with the Extended Logistic Model of Rasch (Andrich, 1988). Since each scale had satisfactory psychometric properties, two scores for each student, at the beginning and at the end of the intervention, were generated by calculating the relevant Rasch person estimate.

Student background factors: Information was collected on two student background factors: sex and socio-economic status (SES). Five SES variables were available: father's and mother's education level, the social status of father's job, the social status of mother's job, and the economic situation of the family. Relevant information for each child was taken from the school records.
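As an illustration of the scoring step described above, the sketch below computes a maximum-likelihood Rasch person estimate for a dichotomously scored test with known item difficulties. It is only a simplified sketch under these assumptions; the study itself used the Extended Logistic Model of Rasch with vertical equating, which this example does not reproduce.

# Minimal sketch: Newton-Raphson maximum-likelihood person estimate under
# the dichotomous Rasch model, given known item difficulties b and a 0/1
# response vector x. Illustrative only; not the authors' scaling procedure.
import numpy as np

def rasch_person_estimate(x: np.ndarray, b: np.ndarray, n_iter: int = 20) -> float:
    theta = 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(theta - b)))   # P(correct) under the Rasch model
        grad = np.sum(x - p)                     # first derivative of the log-likelihood
        info = np.sum(p * (1.0 - p))             # test information (minus second derivative)
        theta += grad / info                     # Newton-Raphson step
    return theta

# Example: six hypothetical items, four answered correctly.
print(rasch_person_estimate(np.array([1, 1, 1, 1, 0, 0]),
                            np.array([-1.5, -0.5, 0.0, 0.5, 1.0, 1.5])))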
School climate: A questionnaire measuring teachers' perceptions of the climate of their school was administered to all teachers of the school sample (n = 1316), and a high response rate (76.9%) was obtained. A first-order confirmatory factor analysis model provided support for the construct validity of the questionnaire. Although the scaled chi-square for the two-factor structure (χ² = 108.1, d.f. = 53, p < .001) was, as expected, statistically significant, the values of RMSEA (0.031) and CFI (0.969) met the criteria for an acceptable level of fit. Thus, a decision was made to consider the two-factor structure as reasonable and to estimate the factor scores concerned with the extent to which (i) the school gives emphasis to achievement and (ii) a climate of openness and trust can be observed in the school. One-way analysis of variance revealed that the variation in teachers' responses to each factor score was substantially greater between schools than within schools (p < .001). Thus, the factor scores aggregated at the school level were treated as explanatory variables.

The priority area for which school improvement efforts took place: The content of the priority area for which school improvement efforts took place in each school was classified into two groups by investigating whether each priority area was in line with the factors included in the dynamic model. Schools of the first and second experimental groups were not aware of the dynamic model, but some of the schools of these two groups identified priority areas which were in line with the factors included in the dynamic model. By investigating the effect of this dummy variable upon student achievement, one can identify whether the impact of the treatments offered to the first two groups of schools depends on the extent to which their chosen priority area was in line with the dynamic model.

Implementation effort: One of the main threats to the internal validity of experimental studies has to do with the extent to which all the groups put the same amount of effort into achieving the schooling outcomes (this threat is known in the literature as "experimental mortality"). Different sources of data were therefore used to find out the extent to which each participating school had put effort into implementing its improvement strategies. Specifically, we conducted content analysis of the reflective diaries that school stakeholders kept, in order to identify the extent to which each school put effort into using SSE for improvement purposes. Moreover, the constant comparative method was used to analyze data that emerged from interviews with head teachers, school coordinators, and teachers. These interviews were concerned with the experiences and views of school stakeholders about the implementation of the intervention that took place in their schools. The analysis of the qualitative data from each source helped us generate a scale measuring the extent to which schools put effort into implementing their improvement strategies and action plans (see Demetriou, 2009). Specifically, the Extended Logistic Model of Rasch was used in order to identify the extent to which the measures that emerged from each source of data could be reduced to a common unidimensional scale. The Rasch model does not only test the unidimensionality of the scale; it is also able to establish whether the items from each source of data can be ordered according to their degree of difficulty and, at the same time, whether the schools can be ordered according to their performance on the construct under investigation. The model was initially applied to the whole sample of schools and all 21 measures concerned with their involvement in the intervention, using the computer program Quest (Adams & Khoo, 1996), but two items did not fit the model. The results of the various approaches used to test the fit of the Rasch model to our data revealed that there was a good fit to the model when schools' performance on the other 19 tasks was analyzed. Moreover, the indices of school and item separation were found to be higher than 0.85, indicating that the separability of the scale is satisfactory (Bond & Fox, 2001).
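The decision to aggregate the climate factor scores to the school level rests on the between-school versus within-school comparison described above. The sketch below shows one conventional way to run that check (a one-way ANOVA plus ICC(1)); it is not the authors' code, and the data frame and its column names ('school', 'factor_score') are assumptions made for the example.

# Minimal sketch: one-way ANOVA of a school-climate factor score across
# schools, plus ICC(1), to gauge whether aggregation to the school level
# is defensible. DataFrame and column names are hypothetical.
import pandas as pd
from scipy import stats

def aggregation_check(climate: pd.DataFrame) -> dict:
    groups = [scores.values for _, scores in climate.groupby("school")["factor_score"]]
    f_stat, p_value = stats.f_oneway(*groups)      # between- vs within-school variation
    k = climate.groupby("school").size().mean()    # average number of teachers per school
    icc1 = (f_stat - 1.0) / (f_stat + k - 1.0)     # ICC(1) from the ANOVA F ratio
    return {"F": f_stat, "p": p_value, "ICC(1)": icc1}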
Thus, the Rasch person estimates were used to estimate the effort that each school put into implementing the intervention.

Results

Table 1 presents the results of the multilevel analysis conducted in order to measure the impact of each of the three approaches to SSE on student achievement. In model 1 the context variables at each level were added to the empty model. This model reveals that the effects of all contextual factors (i.e., SES, prior knowledge, sex) were statistically significant. Prior knowledge was the only contextual variable which had a significant effect on student achievement when aggregated at the classroom or the school level. In model 2 the school explanatory variables concerned with the school climate were added to model 1. Only the factor referring to the extent to which pressure for success was put on teachers and students (achievement press) was found to be associated with student achievement. In model 3 the impact of the three school improvement approaches was tested by adding three dummy variables to model 2. With the first treatment as the reference group, it was found that the first and the second groups achieved similar results, both better than those of the control group (effect size = 0.16)¹. However, the third treatment group, concerned with the use of EER for establishing SSE mechanisms, had better results than any other group (effect size = 0.35).

At the next step, we attempted to identify any variable that might explain why the three approaches to school improvement had a differential impact upon achievement in mathematics. For this reason, we conducted a multilevel analysis of the mathematics achievement of all students except those participating in the control group. The results of this analysis are presented in Table 2. By comparing the figures of this table with those of Table 1, we can observe that the same variables which were found to be associated with achievement in the whole sample are also associated with the achievement of students in the three experimental groups. In addition, model 2 reveals that the third approach had a stronger effect than the other two approaches to SSE. Furthermore, model 3 revealed that the effect of the third approach was still stronger than that of the other two approaches even when other explanatory variables concerned with the implementation of the school improvement approaches were taken into account. The variable addressing experimental mortality (implementation effort) was found to be associated with student achievement: as expected, schools which put more effort into using SSE for improvement purposes were more effective. However, taking this variable into account did not affect the added value of the third approach, since the effect sizes that emerged from models 2 and 3 are very similar (i.e., 0.20 and 0.17, respectively). Finally, it was found that the schools which had the smallest effect on student achievement were those schools of the first and second experimental groups which designed an improvement strategy that was not in line with any of the factors of the dynamic model.

Conclusions

Beyond the fact that all three experimental groups had better results than the control group, implying that SSE can contribute to establishing effective school improvement strategies, the third approach to SSE had the strongest impact. The essential difference of the third approach has to do with the fact that a specific theoretical framework guided the design of the SSE mechanisms. Moreover, the schools of this experimental group were asked to develop their improvement strategies and action plans by taking into account the evidence of EER which shows how the functioning of the relevant factors could be improved. These findings seem to reveal that the dynamic model may contribute to the establishment of effective SSE mechanisms, since not only was the knowledge in the field about what works in education and why offered to the school stakeholders, but support was also provided to schools in order to identify their priorities for improvement and design their strategies and actions to improve the relevant school factors and ultimately improve their effectiveness. It is also acknowledged that the effect size of each approach to SSE is relatively small. However, this result is in line with the results of evaluation studies measuring the impact of interventions in education, which show that during the early phases of interventions their impact on student achievement is relatively small (Konstantopoulos & Hedges, 2008; Slavin, Lake & Groff, 2009). There is a need for longitudinal studies, involving both quantitative and qualitative research methods, which could provide answers to questions dealing with the effect of each approach to SSE, such as its duration and the factors that contribute to or inhibit that duration. Such studies could also look at both the short- and long-term effects of each approach to SSE and help in deciding how to support schools in establishing SSE mechanisms which will have a significant and lasting impact on improving their effectiveness.
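The effect sizes quoted above follow the conversion described in the end note below: a fixed effect from the multilevel model is divided by the outcome standard deviation in the treatment groups. A minimal illustration of that arithmetic, with purely hypothetical numbers, is:

# Minimal sketch of the end-note conversion: a treatment fixed effect
# divided by the outcome standard deviation in the treatment groups gives
# a Cohen's-d-type standardized effect. The values below are illustrative.
def standardized_effect(fixed_effect: float, sd_outcome: float) -> float:
    return fixed_effect / sd_outcome

print(standardized_effect(0.10, 0.50))  # hypothetical coefficient and SD -> d = 0.2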

End Note

1. The fixed effects obtained with multilevel analysis can readily be converted to standardized effects (Cohen's d) by dividing them by the standard deviations in the treatment groups.

References

Adams, R.J., & Khoo, S. (1996). Quest: The interactive test analysis system, Version 2.1. Melbourne: ACER.

Andrich, D. (1988). A general form of Rasch's Extended Logistic Model for partial credit scoring. Applied Measurement in Education, 1(4), 363-378.

Ball, S.J. (1987). The micro-politics of the school: Towards a theory of school organization. London: Methuen / Routledge & Kegan Paul.

Barber, M. (1996). The learning game: Arguments for an education revolution. London: Victor Gollancz.

Bond, T.G., & Fox, C.M. (2001). Applying the Rasch model: Fundamental measurement in the human sciences. Mahwah, NJ: Lawrence Erlbaum Associates.

Creemers, B.P.M., & Kyriakides, L. (2008). The dynamics of educational effectiveness: A contribution to policy, practice and theory in contemporary schools. London and New York: Routledge.

Creemers, B.P.M., & Kyriakides, L. (2010). School factors explaining achievement on cognitive and affective outcomes: Establishing a dynamic model of educational effectiveness. Scandinavian Journal of Educational Research, 54(1), 263-294.

Demetriou, D. (2009). Using the dynamic model to improve educational practice. Unpublished doctoral dissertation, University of Cyprus.

Firestone, W.A., & Fisler, J.L. (2002). Politics, community, and leadership in a school-university partnership. Educational Administration Quarterly, 38(4), 449-493.

Heck, R.H., & Moriyama, K. (in press). Examining relationships among elementary schools' contexts, leadership, instructional practices, and added-year outcomes: A regression discontinuity approach. School Effectiveness and School Improvement.

Hofman, R.H., Hofman, W.H., & Gray, J.M. (2010). Institutional contexts and international performances in schooling: Comparing patterns and trends over time in international surveys. European Journal of Education, 1, 153-173.

Hoyle, R.J., & Skrla, L. (1999). The politics of superintendent evaluation. Journal of Personnel Evaluation in Education, 13(4), 405-419.

Konstantopoulos, S., & Hedges, L.V. (2008). How large an effect can we expect from school reforms? Teachers College Record, 110(8), 1611-1638.

Kyriakides, L., & Campbell, R.J. (2004). School self-evaluation and school improvement: A critique of values and procedures. Studies in Educational Evaluation, 30(1), 23-36.

Kyriakides, L., & Creemers, B.P.M. (2008). Using a multidimensional approach to measure the impact of classroom-level factors upon student achievement: A study testing the validity of the dynamic model. School Effectiveness and School Improvement, 19(2), 183-306.

Kyriakides, L., & Creemers, B.P.M. (2009). The effects of teacher factors on different outcomes: Two studies testing the validity of the dynamic model. Effective Education, 1(1), 61-85.

Kyriakides, L., & Demetriou, D. (2007). Introducing a teacher evaluation system based on teacher effectiveness research: An investigation of stakeholders' perceptions. Journal of Personnel Evaluation in Education, 19, 43-64.

MacBeath, J. (1999). Schools must speak for themselves: The case for school self-evaluation. London: Routledge.

Mijs, D., Houtveen, T., Wubells, T., & Creemers, B.P.M. (2005, January). Is there empirical evidence for school improvement? Paper presented at the ICSEI 2005 Conference, Barcelona, Spain.

Murphy, J., & Beck, L. (1995). School-based management as school reform: Taking stock. Thousand Oaks, CA: Corwin.

OECD (2002). PISA 2000 technical report. Paris: OECD.

Slavin, R.E., Lake, C., & Groff, C. (2009). Effective programs in middle and high school mathematics: A best-evidence synthesis. Review of Educational Research, 79(2), 839-911.

Smylie, M.A., Lazarus, V., & Brownlee-Conyers, J. (1996). Instructional outcomes of school-based participative decision making. Educational Evaluation and Policy Analysis, 18(3), 181-198.

Stronge, J.H., & Tucker, P.D. (2000). The politics of teacher evaluation: A case study of new system design and implementation. Journal of Personnel Evaluation in Education, 13(4), 339-359.

Teddlie, C., & Reynolds, D. (2000). The international handbook of school effectiveness research. London: Falmer Press.

Teddlie, C., & Stringfield, S. (2007). A quarter-century of U.S. research on school effectiveness and school improvement. In T. Townsend (Ed.), International handbook of research on school effectiveness and improvement (pp. 131-166). Dordrecht, NL: Springer.

Worthen, B.R., Sanders, J.R., & Fitzpatrick, J.L. (1997). Program evaluation: Alternative approaches and practical guidelines (2nd ed.). USA: Longman Publishers.

Table 1: Parameter estimates (and standard errors) for the analysis of student achievement in mathematics (students within classes, within schools)

Factors                                       Model 0       Model 1       Model 2       Model 3
Fixed part
  Intercept                                   0.90 (.13)    0.72 (.12)    0.64 (.13)    0.41 (.13)
Student level
  Prior achievement                           -             0.31 (.08)    0.30 (.07)    0.31 (.07)
  Sex (0 = girls, 1 = boys)                   -             0.06 (.02)    0.06 (.02)    0.06 (.02)
  SES                                         -             0.15 (.06)    0.15 (.05)    0.14 (.05)
Classroom level
  Average prior achievement                   -             0.12 (.05)    0.11 (.04)    0.12 (.04)
  Average SES                                 -             N.S.S.        N.S.S.        N.S.S.
  Percentage of boys                          -             N.S.S.        N.S.S.        N.S.S.
School level
  Average prior achievement                   -             0.09 (.04)    0.09 (.03)    0.09 (.03)
  Average SES                                 -             N.S.S.        N.S.S.        N.S.S.
  Percentage of boys                          -             N.S.S.        N.S.S.        N.S.S.
School climate
  A) Openness and trust                       -             -             N.S.S.        N.S.S.
  B) Achievement press                        -             -             0.08 (.03)    0.08 (.03)
SSE approach (reference: 1st approach)
  Control group                               -             -             -             -0.12 (.03)
  Dealing with the concerns of
    stakeholders (2nd approach)               -             -             -             N.S.S.
  Using the dynamic model (3rd approach)      -             -             -             0.14 (.04)
Variance components
  School                                      14.2%         11.0%         9.5%          6.2%
  Class                                       18.5%         15.2%         14.8%         13.0%
  Student                                     67.3%         51.1%         51.0%         49.4%
  Explained                                   -             22.7%         24.7%         31.4%
Significance test
  χ²                                          1113.4        797.3         762.1         702.1
  Reduction                                   -             316.1         35.2          60.0
  Degrees of freedom                          -             5             1             2
  p-value                                     -             .001          .001          .001

Note: The models presented in this table were estimated without the variables that did not have a statistically significant effect at the .05 level. N.S.S. = no statistically significant effect at the .05 level.
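Table 1 summarizes a three-level model (students within classes within schools). A minimal sketch of how a model of this structure could be specified, assuming the statsmodels package and hypothetical column names, is given below; it illustrates the nesting and the treatment dummies (with the first SSE approach as the reference category), not the authors' actual analysis, and classroom- and school-level aggregates such as average prior achievement would be added as further fixed effects.

# Minimal sketch of a students-within-classes-within-schools model of the
# kind summarized in Table 1. All column names are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

def fit_three_level(df: pd.DataFrame):
    model = smf.mixedlm(
        "post ~ prior + sex + ses + control + concerns_approach + dynamic_model_approach",
        data=df,
        groups="school",                               # random intercept for schools
        re_formula="1",
        vc_formula={"classroom": "0 + C(classroom)"},  # classes nested within schools
    )
    return model.fit(reml=False)  # ML estimation so nested models can be compared by deviance

# result = fit_three_level(df)
# print(result.summary())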

Table 2: Parameter estimates (and standard errors) for the analysis of mathematics achievement of students in the three experimental groups only

Factors                                       Model 0       Model 1       Model 2       Model 3
Fixed part
  Intercept                                   0.87 (.13)    0.72 (.12)    0.64 (.13)    0.35 (.13)
Student level
  Prior achievement in maths                  -             0.30 (.08)    0.31 (.07)    0.31 (.07)
  Sex (0 = girls, 1 = boys)                   -             0.06 (.03)    0.06 (.02)    0.06 (.02)
  SES                                         -             0.16 (.06)    0.15 (.04)    0.15 (.04)
Classroom level
  Average prior achievement                   -             0.11 (.05)    0.12 (.04)    0.12 (.04)
  Average SES                                 -             N.S.S.        N.S.S.        N.S.S.
  Percentage of boys                          -             N.S.S.        N.S.S.        N.S.S.
School level
  Average prior achievement                   -             0.09 (.04)    0.10 (.03)    0.10 (.03)
  Average SES                                 -             N.S.S.        N.S.S.        N.S.S.
  Percentage of boys                          -             N.S.S.        N.S.S.        N.S.S.
School climate
  A) Openness and trust                       -             -             N.S.S.        N.S.S.
  B) Achievement press                        -             -             0.09 (.04)    0.09 (.03)
SSE approach (reference: 1st approach)
  Dealing with the concerns of
    stakeholders (2nd approach)               -             -             N.S.S.        N.S.S.
  Using the dynamic model (3rd approach)      -             -             0.15 (.05)    0.13 (.04)
Priority area
  Not in line with the factors included
    in the dynamic model                      -             -             -             -0.08 (.03)
Amount of effort put into SSE                 -             -             -             0.12 (.03)
Variance components
  School                                      14.0%         11.1%         9.5%          5.5%
  Class                                       17.5%         16.0%         14.5%         13.3%
  Student                                     68.5%         51.1%         50.1%         49.0%
  Explained                                   -             21.8%         25.9%         32.2%
Significance test
  χ²                                          1023.4        797.3         768.1         704.8
  Reduction                                   -             226.1         29.2          63.3
  Degrees of freedom                          -             5             2             2
  p-value                                     -             .001          .001          .001

Note: Abbreviations as in Table 1.