British Educational Research Journal, Vol. 34, No. 2, April 2008, pp. 195–211. DOI: 10.1080/01411920701532194

Where are we at? An empirical study of levels and methods of evaluating continuing professional development

Daniel Muijs (a) and Geoff Lindsay* (b)
(a) University of Manchester, UK; (b) Centre for Educational Development, Appraisal and Research, University of Warwick, UK

*Corresponding author. CEDAR, University of Warwick, Coventry, CV4 7AL, UK. Email: geoff.lindsay@warwick.ac.uk

(Submitted 29 July 2005; resubmitted 13 February 2006; accepted 27 February 2006)

Continuing professional development (CPD) is increasingly recognised as important for all professionals in order to maintain and develop their competence. Many professions, especially in the health field, require evidence of CPD in order for professionals to be granted continuing registration as practitioners. Given its accreditation as well as developmental uses, it is important that CPD is evaluated. The present study examines the usefulness of a hierarchical model for the evaluation of CPD for teachers. The data were derived from a sample of 223 CPD coordinators and 416 teachers from a randomly selected sample of 1000 schools in England. Questionnaire data were analysed using Rasch modelling. The results suggest a reasonable fit with the model, with participant satisfaction being the most commonly evaluated outcome, while participants' use of new skills and student outcomes were the least likely to be evaluated, together with value for money according to teachers only. The implications for teachers' CPD are discussed.

Introduction

The international research literature has consistently shown that professional development is an essential component of successful school development and of teacher growth, well-being and success (Hargreaves, 1994; Day, 1999). It has confirmed that where teachers are able to reflect, access new ideas, experiment and share experiences within school cultures, and where leaders encourage appropriate levels of challenge and support, there is greater potential for school and classroom improvement (Muijs & Reynolds, 2000). Improving schools invest in the development of their staff and create opportunities for teachers to collaborate and to share best practice (McLaughlin & Talbert, 2001). Evidence also suggests that

attention to teacher learning can impact directly and indirectly upon improvements in student attitudes to learning, teaching processes and achievement. Where teachers have clear professional identities, and have intrinsic as well as extrinsic rewards for their work, they are more satisfied and expand and develop their own teaching repertoires. In relation to their moral and instrumental purposes, it is more likely that they will provide sustained commitment and an increased range of learning opportunities for students (Joyce et al., 1998). In short, the research literature demonstrates that continuing professional development can have a positive impact on curriculum and pedagogy, as well as on teachers' sense of efficacy and their relationships with students (Talbert & McLaughlin, 1994).

Continuing professional development (CPD) is increasingly seen, then, as a key part of the career development of all professionals, which is a shared responsibility with their employers because it serves the interests of both (Madden & Mitchell, 1993). Evidence of CPD is often a requirement for continuing recognition as fit to practise and inclusion on the relevant professional register (British Psychological Society, 2004; Health Professions Council, 2005). The concept is often left ill defined, however, being in many cases conflated with the related concepts of in-service training and on-the-job learning. Both are more limited than CPD, as CPD can encompass a wide variety of approaches and teaching and learning styles in a variety of settings (inside or outside of the workplace). It is distinguishable from the broader concept of lifelong learning, which can include all kinds of learning. It is seen primarily as being related to practitioners' professional identities and roles and the goals of the organisation for which they are working (Galloway, 2000).

However, while the importance of CPD is widely acknowledged by the professions, evaluation of the impact of CPD is rarely undertaken in a systematic and focused manner. The research evidence about evaluation practices in relation to CPD shows that current practice in many cases is limited in a number of ways. Guskey (2000, pp. 8–10) suggests that these limitations can be summarised as follows:

1. Most evaluation consists merely of summarising the activities undertaken as part of the professional development programme: what courses were attended, how many credits accrued, etc. This clearly gives no indication of the effectiveness of the activities undertaken, making this form of data collection inadequate as a means of examining the effects of CPD.

2. Where some evaluation does exist, it usually takes the form of participant satisfaction questionnaires. Obviously, these allow one to gauge whether participants consider the event to have been enjoyable and successful, but this method does not engage with issues such as gains in knowledge or changes in practice expected from professional development, and certainly does not evaluate whether there have been associated changes in student outcomes.

3. Evaluations are also typically brief, one-off events, often undertaken post hoc. As most meaningful change will tend to be long term, and many professional development activities will take place over a longer period of time, evaluation efforts need to reflect

this and likewise take place over time. Evaluation will also need to be built in to run alongside professional development activities.

While Guskey's (2000) study relies largely on American research, the situation in the UK seems similar. In a recent study of CPD activity in England, Edmonds and Lee (2002) found that in most cases evaluation took the form of a feedback form completed by teachers, including questions on delivery, content, and whether they felt the course had met its objectives. Only rarely were questions posed as to whether it was cost-effective and likely to impact on teaching and learning. Follow-up was unusual, with actual effects on teaching and learning hardly ever being studied, and long-term monitoring of impact usually not present. Teachers reported that they thought CPD improved teaching and learning, but were unable to provide hard evidence of impact. Another recent study examined the impact of leadership development programmes. A survey of local education authorities found that barely half claimed to study the impact of CPD (Bennett & Smith, 2000). Qualitative follow-up using in-depth interviews found that managers reported a variety of impacts, but that there was substantial disagreement between different members of the same organisation on the impact of any one CPD programme. The qualitative section of the study also showed that only one school had made any explicit assessment of the impact of professional development on the school, and this was done using a fairly unsophisticated self-assessment method.

Where evaluation does occur, this is no guarantee of quality. In one study of evaluations of after-school programmes in the USA, the authors concluded that most suffered from severe reliability and validity problems. Implementation was fully studied in only a minority of studies and, where this occurred, only a very small number of evaluations used direct observation methods (Scott-Little et al., 2002).

Frameworks for evaluating CPD

A number of useful conceptual frameworks have been developed for the evaluation of professional development. Two well-known general evaluation frameworks are Stake's countenance model and Stufflebeam's CIPP model. Stake's (1967) model distinguishes two main countenances of evaluation, description and judgement, both of which are used to organise the evaluation and analyse the data. Within both headings, evaluators need to distinguish three elements: antecedents, the situation before and at the start of the programme; transactions, what occurs during the programme; and outcomes. Data collected at each level are analysed by looking for congruence (are objectives achieved?) and contingency (what is the cause-and-effect relationship, i.e. was what happened caused by a factor within the CPD?).

A not dissimilar model is presented by Stufflebeam (1983), who focuses on four main elements: context, input, processes and product (the CIPP model). In this model, context refers to the identification of problems, needs and opportunities that can guide programme planning. Input evaluation refers to the allocation of resources to the evaluated programme, and also allows for the evaluation of alternative strategies to achieve programme goals. Process evaluation focuses on implementation, while

product evaluation focuses on outcomes. This model has been found to be useful for evaluating complex systems (Clough & Lindsay, 1991).

Building on these evaluation models to develop one specifically for the evaluation of CPD in schools, Guskey (2000) distinguishes a hierarchy of levels of impact:

Level 1: participants' reactions. Currently this is the most common and easily collectable form of evaluative evidence. However, in many ways it is also the least informative, as participants' reactions to CPD tend to be impressionistic and highly subjective. Questions addressed at level 1 will include whether the participants enjoyed the event and thought it was useful, and whether it addressed their needs, was well presented and well organised. Three main types of questions can be answered using this approach: content questions (e.g. were the issues addressed relevant, was the material pitched at an appropriate level?), process questions (e.g. was the session leader well prepared, were the materials suitable?) and context questions (e.g. was the room the right size or temperature?) (Guskey, 2000). As can be seen from these questions, while they address possible prerequisites of professional development that can facilitate CPD leading to change, they do not themselves measure this.

Level 2: participants' learning from CPD. Level 2 in Guskey's framework addresses participants' learning from CPD. There are several types of learning that can result from CPD: cognitive, affective or behavioural. Harland and Kinder (1997) provide a more detailed description of outcomes at this level, distinguishing informational outcomes, new awareness, and knowledge and skills as cognitive outcomes, and value congruence, affective, and motivational and attitudinal outcomes as affective outcomes. Knight (2002) distinguishes two main types of knowledge: procedural/practical knowledge (lower-level skills) and declarative/propositional knowledge (higher-order knowledge, including knowledge of facts, abstract knowledge of principles, and abstract knowledge of ideas). These various types of knowledge are acquired and modified in different ways, thus probably requiring different methods of evaluation. As well as specific knowledge and skills and affective outcomes, CPD may result in renewed commitment of teachers as change agents, and in renewed or extended moral purpose. These outcomes are crucial to teacher effectiveness, and need to be taken into account at this level of evaluation.

Level 3: organisational support and change. Guskey's third level of evaluation concerns organisational support and change. It is clear from the research on school improvement, and the growing body of literature on change, that CPD programmes are unlikely to have a lasting effect without organisational support (Muijs et al., 2004). A supportive school ethos and an expectation that all teachers engage in CPD have been found to be important factors in securing change as a result of CPD (Edmonds & Lee, 2002). CPD activities have been found to transfer more easily into changed behaviours and teaching practices if there is a good fit with individuals' professional and personal values and with existing professional development approaches in the organisation (Knight, 2002). As well as being important in influencing the success of CPD programmes, organisational change can often be a prime goal of CPD programmes. Therefore, organisational-level outcomes and

support are important parts of CPD evaluation, since they would have an impact upon motivation on the one hand and sustainability of change on the other (Guskey, 2000).

Level 4: participants' use of new knowledge and skills. When a CPD programme is directly intended to change practice, it is essential to evaluate whether participants are actually using the new knowledge and skills acquired. Evaluation at this level will have to take place after a reasonable time (which will depend on the complexity of the knowledge or skills to be acquired and the amount of time participants have had to develop and practise these skills), allowing the participants to practise and assimilate the new method or skill (Guskey, 2000; Grace, 2001). It is also important to take into account the fact that most learners go through different phases of implementation, described by Hall and Hord (1987) as non-use, orientation (information seeking), preparation, mechanical use of the skill (day-to-day), routine use of the skill (establishes pattern of use), refinement (varies use depending on context), integration (coordinates use with colleagues to gain greater impact) and renewal (re-evaluates quality of use and modifies to increase impact). Not all learners will reach all levels of use. Level of use must be defined differently depending on goals, and can range from simple use/non-use distinctions to more subtle gradations of use (e.g. non-use, novice use, expert use).

Level 5: student outcomes. The fifth level identified by Guskey (2000) is the impact of CPD on student learning. This can be defined and measured in a number of ways, one distinction being that between cognitive outcomes, such as mathematical attainment, and non-cognitive outcomes, such as attitudes to school. Both require different methods to determine programme effects (Guskey, 2000). Finding a direct impact of CPD on student outcomes will depend on the goals of the activity and who has been involved, as well as on the effectiveness of CPD at the four previous levels. The many different initiatives in schools also make it difficult to disentangle the impact of any CPD programme from other factors and programmes in the school. However, the centrality of student outcomes to the educational endeavour makes at least some consideration of this factor imperative in any evaluation of CPD in schools.

Guskey (2002) suggests that when designing CPD evaluations one works backwards, starting with level 5, both in planning the CPD activity and the evaluation thereof. This ensures that the final goal of improving student outcomes is central to the process. While Guskey suggests five levels of evaluation, we would add a further level, focusing on the issue of cost-effectiveness of CPD. As Belfield et al. (2001) rightly point out in the context of medical practice, CPD should not be undertaken if the costs to the system outweigh the benefits. Also, if other ways of raising the performance of teachers and students are more cost-effective, doubts would have to be raised over the validity of conducting CPD. It would also be useful to know the cost-effectiveness of different modes of CPD, on which we currently possess little information.

This model is therefore predicated on the view that the goal of education and schools is the cognitive, social and emotional development of students, and that

therefore professional development ongoing in schools should ultimately result in some benefits to them if it is worth pursuing. As a result, while important in itself, participant satisfaction is rated at the lowest level while student outcomes are rated higher. This is, of course, a strong and contestable value judgement, and it is clear that this model is not compatible with forms of CPD that have resulted from different value positions. It should be noted that this approach does not address the specific content of CPD or the technical quality of the evaluation procedures. These are important considerations but may apply to all levels.

Methodology

While some of the studies mentioned above suggest that evaluation is limited to the lower levels of this model, no systematic studies exist in the UK that have attempted to study the extent of CPD evaluation at all levels of this hierarchy. This study is aimed at starting to plug this gap and at examining the benefits of a hierarchical model, derived from that proposed by Guskey, in English schools.

Survey methodology was used to gather the views of CPD coordinators and teachers in English schools on the extent to which CPD activities in their schools were evaluated and how this was done. A survey was developed that aimed to collect respondents' views of both the uses and usefulness of CPD evaluation, and the extent to which CPD activities were evaluated at different levels. Respondents were asked to think about evaluation in their school in general, and not about any particular evaluation. Some additional data on the different types of CPD employed and biographical data on respondents were also collected.

Sample

Separate questionnaires for CPD coordinators and teachers were sent to 1000 randomly selected schools in England in the autumn term of 2003: 223 CPD coordinator and 416 teacher questionnaires were returned. As we had relatively (though not untypically) high levels of non-response, we conducted a number of analyses designed to compare actual respondents to the random sample drawn. These included Key Stage 1, Key Stage 2 and General Certificate of Secondary Education (GCSE) results, the percentage of students eligible for free school meals and on the special educational needs register, and the ethnicity profile of the school. No statistically significant differences were found for any of the studied variables, although there was a slight over-representation of London schools and a slight under-representation of schools from the South-East region in the final sample. No statistically significant differences between respondent and non-respondent schools (analysed using cross-tabulation tables and chi-square tests) were found with regard to gender, urban/rural location, specialist school status, Education Action Zone (EAZ) or Excellence in Cities (EIC) status, training school status, or Investor in People, Leading Edge or Leadership Incentive Grant status.
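The non-response checks just described can be made concrete with a small sketch. This is not the authors' code, and the counts below are invented for illustration; only the method, a chi-square test on a cross-tabulation of respondent status against a categorical school characteristic, mirrors what the paper reports.

```python
# Hypothetical illustration of the respondent/non-respondent comparison.
# The counts are invented; only the method mirrors the paper.
import numpy as np
from scipy.stats import chi2_contingency

# Rows: respondent vs non-respondent schools.
# Columns: e.g. specialist school status (yes / no). Counts are made up.
crosstab = np.array([
    [95, 128],   # respondents: specialist / non-specialist
    [310, 467],  # non-respondents: specialist / non-specialist
])

chi2, p, dof, expected = chi2_contingency(crosstab)
print(f"chi-square = {chi2:.2f}, df = {dof}, p = {p:.3f}")
# A p-value above .05 would be read, as in the paper, as no evidence
# that respondents differ from non-respondents on this characteristic.
```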

In order both to look more closely at the applicability of a hierarchical model, and of the measurement instrument based on it that we have developed, and to study the prevalence of different types of evaluation in English schools, Rasch modelling was used to analyse the survey data. Rasch modelling has been developed to aid the construction of more valid measurement instruments in the social sciences that, rather than using ordinal scales, construct invariant, continuous scales that approximate the measures used in the natural sciences (Bond, 2003). The usefulness of the model lies in the precision it allows in determining how much more or less prevalent types of evaluation are, and in the fact that, if the model fits, we can develop linear measurement instruments, which have more desirable statistical properties and allow for the use of more powerful statistical methods, such as the analysis of relationships, when analysing the data.

The Rasch model is in essence a one-parameter logistic model within item response theory, in which a person's level on a latent trait and the level of various items on the same latent trait can be estimated independently yet still compared explicitly to one another. In other words, person ability and item level are measured independently and iteratively (Hambleton et al., 2002). When using ordinal variables rather than dichotomous items, as in the present analyses, a variant of the Rasch model adapted to take this into account is needed. We have used the Partial Credit Model. This can be used to estimate models for questionnaire data using ordered polytomies in which the response structure is modelled to be unique to each item (Wright, 1999). Data will be presented for both the CPD coordinator and teacher surveys. The Winsteps software programme was used to analyse the data.
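For reference, the two models referred to here can be written out explicitly. The display below is an editorial addition in standard notation, not something reproduced from the paper: theta_n is the person measure, delta_i the item difficulty, and delta_ik the item-specific step difficulties that give the Partial Credit Model its item-unique response structure.

```latex
% Dichotomous Rasch model: probability that person n endorses item i,
% given person measure \theta_n and item difficulty \delta_i.
P(X_{ni} = 1 \mid \theta_n, \delta_i)
  = \frac{\exp(\theta_n - \delta_i)}{1 + \exp(\theta_n - \delta_i)}

% Partial Credit Model, used here for the five ordered response categories:
% probability of a response in category x \in \{0, 1, \ldots, m_i\} of item i,
% with step difficulties \delta_{ik} unique to each item.
P(X_{ni} = x \mid \theta_n)
  = \frac{\exp \sum_{k=0}^{x} (\theta_n - \delta_{ik})}
         {\sum_{h=0}^{m_i} \exp \sum_{k=0}^{h} (\theta_n - \delta_{ik})},
\qquad \text{with the convention } \sum_{k=0}^{0} (\theta_n - \delta_{ik}) \equiv 0.
```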

Results

1. CPD coordinators

The percentages of CPD coordinator respondents engaged in evaluation at each level of the hierarchical model are given in Table 1.

Table 1. CPD coordinator use of evaluation: how often have the following elements of CPD been evaluated? (percentage valid responses to each category)

                                            Never   Rarely   Sometimes   Usually   Always
Participant satisfaction                      0.0      5.1        18.9      40.8     35.2
Participant learning                          7.9     17.3        30.9      26.7     17.3
Organisational change                         8.3     18.2        37.5      29.7      6.3
Participants' use of new knowledge/skills     6.7     15.4        40.0      31.8      6.2
Student learning outcomes                    11.4     14.5        34.7      33.7      5.7
Value for money                              13.4     15.5        19.6      33.5     18.0

From this table it is clear that participant satisfaction is by far the most frequently evaluated element, with over 75% of coordinators claiming it is evaluated usually or always. None of the other categories reaches 50% usually or always. Participant learning is also relatively frequently evaluated, with over 40% of respondents making this claim, while organisational change is least likely to be evaluated usually or always.

When using Rasch modelling, it is first necessary to test the fit of the model to the data, to ascertain whether the assumption of a simple linear hierarchy holds. To do this, measures of model infit and outfit can be studied. Infit MNSQ is a mean square fit statistic indicating the fit of the model to the data; in other words, it indicates whether the items fit the Partial Credit Rasch model, and therefore form a measurement scale. The expected value is 1. Values substantially below 1 indicate dependency in the data; values substantially above 1 indicate noise. As a rule of thumb, values between 0.7 and 1.3 have been suggested as acceptable for this sample size, while other authors have suggested values between 0.6 and 1.4 (Green & Frantom, 2002). Outfit MNSQ is a similar measure that is more sensitive to outliers in the data. Another fit measure is the root mean square error (RMSE). Winsteps provides two RMSE statistics: Model RMSE is computed based on the hypothesis that misfit in the data is essentially random, and can be interpreted as the upper boundary of model reliability; Real RMSE is computed based on the hypothesis that misfit in the data results from model misspecification, and can be interpreted as the lower boundary of reliability. These measures can be seen as equivalent to traditional measures of reliability such as Cronbach's alpha or KR-20.

Table 2 shows that the model fits well for CPD coordinators, and has appropriate levels of reliability (Real RMSE reliability 0.77, Model RMSE reliability 0.82). Mean score and standard deviation (SD) values suggest that on average respondents score well above the midpoint, which suggests that they employ evaluation up to a reasonably high level of the hierarchy, but that there is substantial variation between respondents.

Table 2. CPD coordinator and teacher levels of evaluation: model fit statistics

Statistic                  CPD coordinators   Teachers
Infit MNSQ                             0.98       0.97
Outfit MNSQ                            0.98       1.01
Real RMSE reliability                  0.77       0.80
Model RMSE reliability                 0.82       0.85
Mean score                             22.5       18.4
SD                                      5.1        4.8
Minimum score                             7          9
Maximum score                            34         34

As well as model fit, it is important to calculate item fit, the extent to which the individual items fit the Rasch model. This measure allows examination of the extent to which individual items fit a continuous measurement scale. As with model fit as a whole, Infit and Outfit MNSQ are used to examine the fit of the items, and are interpreted in a similar way.
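The fit statistics reported here were produced by Winsteps. Purely to make the infit/outfit definitions concrete, the following is a minimal sketch of how mean-square fit statistics are computed for a dichotomous Rasch model on synthetic data; the Partial Credit Model used in the paper generalises the response variances to multiple categories, but the logic is the same. This is an editorial illustration, not the authors' code.

```python
# Minimal sketch of infit/outfit mean-square (MNSQ) fit statistics for a
# dichotomous Rasch model, on synthetic data. Illustrative only.
import numpy as np

rng = np.random.default_rng(42)
n_persons, n_items = 200, 6
theta = rng.normal(0.0, 1.0, n_persons)   # person measures
delta = np.linspace(-1.5, 1.5, n_items)   # item difficulties

# Simulate responses under the Rasch model.
p = 1.0 / (1.0 + np.exp(-(theta[:, None] - delta[None, :])))
x = (rng.random((n_persons, n_items)) < p).astype(float)

resid = x - p              # score residuals
var = p * (1.0 - p)        # model variance of each response
z2 = resid**2 / var        # squared standardized residuals

outfit = z2.mean(axis=0)                           # unweighted mean square
infit = (resid**2).sum(axis=0) / var.sum(axis=0)   # information-weighted

for i in range(n_items):
    print(f"item {i + 1}: infit MNSQ = {infit[i]:.2f}, "
          f"outfit MNSQ = {outfit[i]:.2f}")
# Values near 1 indicate fit; well below 1 suggests dependency (overfit),
# well above 1 suggests noise (underfit), as described in the text above.
```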

In general, the statistics indicate that the items fit the Rasch model, with both Infit MNSQ and Outfit MNSQ at 0.98, very close to the expected value of 1 (Table 3).

Table 3. CPD coordinator and teacher levels of evaluation: item fit statistics

                                             CPD coordinators             Teachers
                                          Infit MNSQ  Outfit MNSQ   Infit MNSQ  Outfit MNSQ
Participant satisfaction                        1.03         0.99         1.88         2.01
Participant learning                            1.08         1.04         0.84         0.80
Organisational change                           0.62         0.66         0.70         0.70
Participants' use of new knowledge/skills       0.83         0.84         0.65         0.65
Student learning outcomes                       0.85         0.84         0.92         0.89
Value for money                                 1.48         1.42         1.24         1.24

Table 3 reveals that two items appear somewhat borderline, however. Value for money appears to suffer from noise in the data; in other words, there appears to be some randomness in the extent to which coordinators express use of this item compared to their overall evaluation use. The variable organisational change may suffer from a slight dependency on other items.

While these statistics suggest that a hierarchical framework fits the data in this sample, they also indicate at what levels evaluation is most likely to occur or, in other words, whether coordinators are more likely to evaluate participant satisfaction than participant learning, for example. One major advantage of the Rasch model is that it allows, in the present case, measurement of the extent to which particular types of evaluation are used by respondents with different levels of usage of evaluation overall. Hence, an accurate assessment can be made of where each type of evaluation falls on a scale of evaluation prevalence or difficulty. Item-level estimates are given in Table 4. Estimates are standardised around 0 (the mean). Negative numbers indicate items at a lower level in the scale (i.e. they are completed by respondents with low use of evaluation). Positive numbers indicate items at higher levels (i.e. completed by respondents with high use of evaluation of CPD).

Table 4. CPD coordinator and teacher levels of evaluation: item level

                                             CPD coordinators         Teachers
Item                                        Estimate     Error   Estimate     Error
Participant satisfaction                       -1.40      0.10      -2.17      0.09
Participant learning                           -0.04      0.09      -0.69      0.08
Organisational change                           0.15      0.09       0.68      0.07
Participants' use of new knowledge/skills       0.33      0.09       0.70      0.07
Student outcomes                                0.25      0.09       0.39      0.07
Value for money                                -0.01      0.09       0.80      0.07

Table 4 confirms that participant satisfaction is by far the lowest level item; in other words, coordinators are most likely to evaluate CPD at this level. Other items cluster around the mean, with participants' use of new knowledge and skills being most positively removed from the mean. Therefore, it can be said that only those coordinators who operate at the highest level of evaluation evaluate use of new knowledge and skills. Conversely, this is the type of evaluation least likely to be encountered in schools.

2. Teachers

The percentage of teacher respondents engaged in evaluation at each level of the hierarchical model is given in Table 5.

Table 5. Teacher use of evaluation: how often have the following elements of CPD been evaluated? (percentage valid responses to each category)

                                            Never   Rarely   Sometimes   Usually   Always
Participant satisfaction                      1.5      2.6        11.1      48.2     36.6
Participant learning                          4.5      8.4        28.1      44.6     14.4
Organisational change                        10.2     22.8        39.5      21.5      5.9
Participants' use of new knowledge/skills    10.5     22.2        39.9      22.4      5.0
Student learning outcomes                    11.3     18.0        34.3      28.7      7.8
Value for money                              16.4     20.4        33.4      22.1      7.6

As for the coordinators, participant satisfaction is by far the most frequently evaluated element, with even more teachers (over 85%) claiming this usually or always happens in their school. Participant learning is also evaluated usually or always according to a majority of teachers, again a higher proportion than among the coordinators. All other categories are far less frequently evaluated, with organisational change, use of new knowledge and skills, and value for money all below 30% usually or always, lower percentages than found in the responses of the CPD coordinators.

The same analyses were conducted for the teacher survey as for the coordinators' survey. Model fit statistics are given in Table 2. Findings are similar to those for the coordinators, and indicate that the model fits well. Mean score and SD are both lower than those of the CPD coordinators, suggesting both that teachers consider that evaluation occurs less often than coordinators do, and that there is somewhat less variance in their views.

Item fit statistics are given in Table 3. As with the coordinators, the statistics indicate that in general the items fit the Rasch model, but one item appears to be problematic in the teacher survey, namely participant satisfaction. This item shows substantial noise, suggesting unpredictability of responses. As will be shown, this is largely due to the overly low-level nature of this item compared to the others. Participant satisfaction is by far the lowest level item or, in other words, the item that teachers report as being most often evaluated. It is more of an outlier than in the coordinators' survey, being far lower level than the other items (Table 4). Also, while

participants' use of new knowledge/skills was the highest level item for coordinators, this was not the case for teachers, for whom value for money was just highest. In other words, teachers feel that value for money is the element least likely to be evaluated. However, participants' use of new knowledge and skills was the second highest level item. Hence, according to the teachers, value for money and participants' use of new knowledge and skills are the least likely to be evaluated. Again, substantial variance in the prevalence of evaluation in schools is evident from teachers' reports, although this differs from the views of coordinators. According to teachers, in the average school only participant satisfaction is likely to be evaluated.

3. Relations with opinions on evaluation

To gain further understanding of the factors that differentiate evaluation practices between schools, the relationship was examined between the extent of evaluation perceived by coordinators to be happening in their schools and the perceived usefulness of evaluating CPD. Coordinators were asked to indicate their levels of agreement with a number of statements about evaluating CPD. The findings on model fit suggest that it is valid to construct a level of evaluation scale based on the items discussed previously. This scale was correlated with the statements on CPD evaluation (Table 6). Statistically significant moderate correlations were found with coordinators' views on CPD evaluation, suggesting that higher levels of use of evaluation are associated with more positive views on evaluating CPD.

Table 6. Correlations between the levels of evaluation scale and CPD coordinators' views on evaluating CPD

                                                                      Correlation coefficient
Evaluating CPD is a waste of my time                                                -.25**
Evaluating CPD is only useful if feedback is given to the provider                  -.13
Evaluating CPD is necessary to see whether it is having a positive
  effect on students                                                                 .30***
Evaluating CPD is necessary to see whether it is having a positive
  effect on teachers                                                                 .29***
Evaluating CPD is necessary to see whether it is having a positive
  effect on the school                                                               .28**

** p < .01; *** p < .001.
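As an illustration of this step, the sketch below builds a simple summed 'level of evaluation' score and correlates it with one attitude item. The data are invented and this is not the authors' code; the paper also does not name the correlation coefficient used, so Spearman's rho is shown here as one reasonable choice for ordinal ratings.

```python
# Hypothetical sketch: correlate a 'level of evaluation' scale score with
# agreement ratings on a statement about CPD evaluation. Data are invented;
# the paper does not publish raw responses or name the coefficient used.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(7)
n = 223  # number of CPD coordinator respondents in the study

# Six evaluation-level items rated 0 (never) .. 4 (always); the scale
# score is the simple sum, justified here by the fitting Rasch model.
levels = rng.integers(0, 5, size=(n, 6))
scale = levels.sum(axis=1)

# One attitude item rated 1 (strongly disagree) .. 5 (strongly agree),
# generated with a weak positive link to the scale, for illustration.
attitude = np.clip(np.round(2.5 + 0.05 * scale + rng.normal(0, 1, n)), 1, 5)

rho, pval = spearmanr(scale, attitude)
print(f"Spearman rho = {rho:.2f}, p = {pval:.4f}")
```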

As well as being carried out at different levels, evaluation can be undertaken using different methods. For example, many evaluations may be carried out using questionnaires but, in some cases, student achievement data may be collected, or interviews may be undertaken with participants in CPD. An analysis was carried out to examine whether there was a relationship between the sophistication of evaluating CPD, in terms of the different levels, and the type of methods used. Therefore, both teachers and coordinators were asked what evaluation methods were used, and these were correlated with the level of evaluation scale.

Among both coordinators and teachers, statistically significant correlations were found between evaluation level and use of all methods except questionnaires (Table 7). This implies that respondents with higher levels of evaluation usage are more likely to use all methods except questionnaires. Correlations were stronger, and of moderate size, among CPD coordinators; use of documentary evidence and interviews were particularly strongly correlated with evaluation level. Among teachers, collecting documentary evidence was also most strongly correlated with evaluation level. Other correlations were modest, possibly due to the lower levels of involvement with CPD evaluation among teachers. These results suggest that engaging in higher level evaluation is associated with using a diversity of evaluation methods other than just questionnaires, pointing to an overall higher level of sophistication in evaluation where more levels are studied.

Table 7. Correlation between the levels of evaluation scale and evaluation methods

                              Coordinators   Teachers
Questionnaires                       .05        .05
Interviews                           .44***     .12*
Learning logs and journals           .37***     .11*
Classroom observation                .23**      .24**
Documentary evidence                 .44***     .30***
Student interviews                   .34**      .18**
Student outcome measures             .17*       .16**

* p < .05; ** p < .01; *** p < .001.

Discussion

Continuing professional development is one of the main elements in ensuring the maintenance and further development of quality provision in any profession. While in education, in contrast to many other professions such as medicine and dentistry, no statutory CPD requirement exists as yet, there has nevertheless been an intensification of emphasis on this aspect over recent years. Large investments in CPD have been made at the national, local and individual school level, and the expectation that CPD is a part of school plans and policies is enshrined in both government strategies and Office for Standards in Education inspection frameworks.

In view of this investment, a question has to be asked as to the effectiveness of CPD activities. It would be surprising if all activities a school undertook were equally effective, and certainly research undertaken in a variety of contexts suggests that this is not the case (e.g. Muijs et al., 2005). Therefore, it is important that schools evaluate the effectiveness of the CPD they have undertaken, in order to inform future policies and activities in this regard, and that use is made of the most appropriate measure for the purpose (Scannell, 1996). The extent to which this actually occurs, however, was not previously clear. In this study we have attempted to shed some light on this question using a hierarchical model that built on Guskey's (2000) work, in which he

distinguishes five levels of CPD evaluation, with participant satisfaction as the simplest level, through participant learning, organisational support, participant behaviour changes and student learning outcomes, to which we have added value for money.

Overall, our findings suggest that this model can be usefully employed to characterise levels of CPD evaluation in English schools. We were able to construct fitting Rasch models for both teachers and CPD coordinators, suggesting that a linear scale ranking levels of evaluation by difficulty could be constructed for both groups. These scales largely followed the hypothesised hierarchical ordering. Among both teachers and CPD coordinators, participant satisfaction was, as expected, the most common form of evaluation. These findings therefore confirm previous research suggesting that evaluation is most likely to occur at the participant satisfaction level. For each group, evaluation of participant learning from the CPD was the next most common. However, evaluation of student outcomes was more common than evaluation of participants' behaviour change in the form of their use of new knowledge and skills, in contrast to what would be predicted from the model. This probably results from the strong emphasis on assessment of students through examination in the English education system, which makes this factor both easily accessible, in terms of the existence of published test results, and highly salient in the minds of teachers.

The ordering of the other factors was similar to that proposed by the model, although some differences emerged between teachers and coordinators. Participants' use of new knowledge/skills was least frequently evaluated according to CPD coordinators, but second least frequent according to teachers, for whom value for money was the least frequent. The differences between coordinators and teachers may reflect different experience of the evaluation process, whereby value for money is an element that would be studied at a management level, often without involving classroom teachers, while impact on outcomes may be evaluated directly by teachers observing what is going on in their classrooms in terms of student learning. Learning may not, however, be systematically evaluated to the same extent, which may explain the views of coordinators regarding this level of evaluation. That teachers generally seem to see fewer areas being typically evaluated than CPD coordinators may also reflect their lesser involvement in the evaluation process. However, some caution is obviously necessary when interpreting results from self-report data such as surveys, which may suffer from bias (such as a tendency to overstate the actual occurrence of evaluation) and a lack of opportunity to discuss participants' understanding of different levels and methods of evaluation.

These findings also show that levels of evaluation are by no means equally spaced in terms of use. Participant satisfaction measures are clearly distinct from other levels of evaluation among coordinators and, even more strongly, among teachers, almost forming an outlier item because of this. Among teachers, participant learning is also distinct and appears to be evaluated in the average school, while the other levels appear grouped together and are evaluated only in schools with higher overall levels of evaluation. Among coordinators, participant learning and value for money were seen as evaluated in schools with average levels of evaluation.

This article suggests that this hierarchical framework may be a useful way of examining the evaluation of CPD in English schools, with the Rasch modelling indicating that it is possible to construct reliable scales measuring levels of evaluation along these lines, taking into account the fact that the levels are not equidistant with regard to commonality of usage. Substantively, the findings support earlier studies suggesting that evaluation of CPD is most likely to take the form of measuring participant satisfaction, found to be the level most likely to be evaluated according to both teachers and coordinators. However, in some ways these findings present a less gloomy picture of CPD evaluation than that reported in some of the earlier studies (e.g. Harland & Kinder, 1997; Guskey, 2002). If these (self-report) data can be trusted, there is evidence of multilevel evaluation happening in many schools, and even in the average school in this sample evaluation occurs at more than just the participant satisfaction level, encompassing participant learning (according to teachers) and participant learning and value for money (according to CPD coordinators). On the other hand, evaluation of such key outcomes as changes in participants' actual behaviour (use of new knowledge and skills) and impacts on students is more limited.

Furthermore, it has to be recognised that the relatively low response rate may mean that this sample is not entirely representative of the population. Self-selection may mean that our respondents are more likely to be at the higher end of the evaluation continuum than the population as a whole. Also, the definitions employed may mean that, while present, not all levels are evaluated at a high level of sophistication. In particular, it is unlikely that, where respondents claim to be evaluating value for money, they are using full cost-effectiveness methodology (Levin & McEwan, 2001), and likewise it is unlikely that direct measures of participant learning are used in most schools.

The correlational analyses do suggest, however, that there is a relationship between the levels of evaluation employed and the methods used to evaluate the impact of CPD. Schools that appear to evaluate CPD at more levels also appear more sophisticated in their use of multiple research methods to study this. It is also important to point out here that evaluating CPD at different levels will not in itself lead to useful or effective evaluation if the methodologies used are not valid, or if there are no effective feedback mechanisms built in to the evaluation.

Research in school effectiveness and school improvement (e.g. Harris, 2003) suggests that schools that are improving invest in professional development and are able to engage staff in various forms of professional learning. It has been argued that creating a collaborative professional learning environment for teachers is the single most important factor for successful school improvement and the first order of business for those seeking to enhance the effectiveness of teaching and learning (Eastwood & Louis, 1992, p. 215). Consequently, it would seem imperative that schools adopt evaluative approaches to CPD that accurately gauge outcomes at organisational, teacher and student level, and so ensure that CPD activities meet school and teacher needs and contribute to enhancing students' development (Joyce & Showers, 1995; Hopkins & Harris, 2001; Smith, 2002).
Our study suggests that at present the evaluation of CPD is too focused on participant satisfaction.

Consequently, we argue that such evaluation mechanisms do not appear to be in place with respect to the key intended outcomes of most CPD: changes in what teachers actually do, resulting, it is hoped, in improved student outcomes, together with information on value for money. The content and focus of each CPD initiative must be for each school to decide, guided by its own priorities, which will in turn be influenced to a greater or lesser extent by government imperatives. The model examined in the present study is generalisable across such initiatives. For example, any CPD which addresses teaching and learning will benefit from evaluation of teacher and pupil behaviours, not just teacher satisfaction with the CPD initiative. Evaluation of teachers' own learning would provide the opportunity for evidence of its impact on cognitive, affective and/or actual or intended behavioural changes. This process also allows reflection, which may itself enhance the impact on the teacher. Evaluation of impact on pupils is also necessary, however, to examine whether changes in teacher behaviour actually impact positively on pupils. Again, changes may be in different domains. To achieve pupil-level impact, teachers' behaviours are likely to need to change, and there will often be a requirement for different organisational support: both these factors would also be evaluated. Finally, the management of CPD will benefit from evaluation of value for money, to facilitate comparisons of different means of achieving the same end, for example the use of external consultants, attendance at external courses or within-school initiatives.

It is accepted that some CPD will be aimed primarily at teachers, where the effects on pupils may be less easy to identify, or may require longer term follow-up. Nevertheless, this model still provides a useful framework for evaluation of the CPD. It is also recognised that the success of this proposed framework is ultimately dependent upon the relevance and quality of the evaluations undertaken. Furthermore, it is accepted that the higher levels may be more difficult and more costly to evaluate. These limitations notwithstanding, we would still argue that a broadening of the range of evaluative methods, as well as investigation of the different levels proposed, including value for money, has the potential to give meaningful formative and summative feedback to schools and teachers. These need to be adapted to the aims and goals of CPD and integrated into ongoing programme development rather than operating as standalone processes (Torres & Preskill, 2002). Without these evaluative approaches, gauging the relative effectiveness of different forms of CPD will remain elusive and, by implication, investing in forms of CPD that have little or no impact on the teacher and learner will remain a real possibility. As a contribution to this endeavour, the research study from which this article derives has developed a route map to support schools in evaluating their CPD (Goodall et al., 2005).

Acknowledgement

We are grateful to the Department for Education and Skills (DfES) in England for allowing us to draw upon some of the work from a recent DfES-funded project, Evaluating the Impact of CPD.

References

Belfield, C. R., Morris, Z. S., Bullock, A. D. & Frame, J. W. (2001) The benefits and costs of continuing professional development (CPD) for general dental practice: a discussion, European Journal of Dental Education, 5(2), 47–52.
Bennett, N. & Smith, B. (2000) Assessing the impact of professional development in educational leadership and management: the IMPPEL project, Management in Education, 14(2), 25–27.
Bond, T. G. (2003) Validity and assessment: a Rasch measurement perspective, Metodologia de las Ciencias del Comportamiento, 5(2), 179–194.
British Psychological Society (BPS) (2004) Continuing professional development (Leicester, BPS).
Clough, P. & Lindsay, G. (1991) Integration and the support services: changing roles in special education (Windsor, NFER-Nelson).
Day, C. (1999) Developing teachers: the challenges of lifelong learning (London, Falmer Press).
Eastwood, K. & Louis, K. (1992) Restructuring that lasts: managing the performance dip, Journal of School Leadership, 2(2), 213–224.
Edmonds, S. & Lee, B. (2002) Teachers' feelings about continuing professional development, Education Journal, 61, 28–29.
Galloway, S. (2000) Issues and challenges in continuing professional development, in: S. Galloway (Ed.) Continuous professional development: looking ahead. Proceedings of a symposium by the Centre on Skills, Knowledge and Organisational Performance (Oxford, May 2000).
Goodall, J., Day, C., Harris, A., Lindsay, G. & Muijs, D. (2005) Evaluating the impact of continuing professional development. Research report RR659 (Nottingham, Department for Education and Skills).
Grace, M. (2001) Evaluating the training, British Dental Journal, 191(5), 229–250.
Green, K. E. & Frantom, C. G. (2002) Item grouping effects on invariance of attitude items, Journal of Applied Measurement, 3, 38–49.
Guskey, T. R. (2000) Evaluating professional development (Thousand Oaks, CA, Corwin Press).
Guskey, T. R. (2002) Does it make a difference? Evaluating professional development, Educational Leadership, 59(6), 45–51.
Hall, G. & Hord, S. (1987) Change in schools: facilitating the process (Albany, NY, SUNY Press).
Hambleton, R., Swaminathan, H. & Rogers, H. J. (2002) Fundamentals of item response theory (Newbury Park, CA, Sage).
Hargreaves, A. (1994) Changing teachers, changing times (London, Cassell).
Harland, J. & Kinder, K. (1997) Teacher continuing professional development: framing a model of outcomes, British Journal of In-Service Education, 23(1), 71–84.
Harris, A. (2003) Leadership in schools facing challenging circumstances, paper presented at the International Congress of School Effectiveness and School Improvement, Copenhagen, 5 January.
Health Professions Council (2005) Continuing professional development. Available online at: www.hpc-uk.org/registrants/cpd.htm (accessed 5 April 2005).
Hopkins, D. & Harris, A. (2001) Creating the conditions for teaching and learning: handbook of staff development activities (London, David Fulton).
Joyce, B. & Showers, B. (1995) Student achievement through staff development (New York, Longman).
Joyce, B., Calhoun, E. & Hopkins, D. (1998) Models of teaching: tools for learning (Buckingham, Open University Press).
Knight, P. (2002) A systemic approach to professional development: learning as practice, Teaching and Teacher Education, 18(3), 229–241.
Levin, H. M. & McEwan, P. J. (2001) Cost effectiveness analysis (2nd edn) (Thousand Oaks, CA, Sage).
McLaughlin, M. & Talbert, J. (2001) Professional communities and the work of high school teaching (Chicago, IL, University of Chicago Press).
Madden, C. A. & Mitchell, V. A. (1993) Professions, standards and competence: a survey of continuing education for the professions (Bristol, University of Bristol).