Calibrating Standards-based Assessment Tasks for English as a First Foreign Language Standard-setting Procedures in Germany

Claudia Harsch, Hans Anand Pant, Olaf Köller (Eds.)
Calibrating Standards-based Assessment Tasks for English as a First Foreign Language
Standard-setting Procedures in Germany
Volume 2
Waxmann 2010
Münster / New York / München / Berlin

Bibliographic information published by the Deutsche Nationalbibliothek The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available in the Internet at http://dnb.d-nb.de. ISBN 978-3-8309-2299-5 Waxmann Verlag GmbH, Münster 2010 www.waxmann.com info@waxmann.com Cover design: Christian Averbeck, Münster Layout: Stoddart Satz- und Layoutservice, Münster Print: Hubert & Co., Göttingen Printed on age-resistant paper, acid-free as per ISO 9706 All rights reserved Printed in Germany

Table of Contents

Preamble

Claudia Harsch and Simon P. Tiffin-Richards
Chapter 1: Setting Standards in Line with the Common European Framework of Reference
1.1 Educational Standards and Standards-based Assessment
1.1.1 Educational Reform in Germany
1.1.2 The IQB
1.1.3 Tests to Evaluate the NES for English as the first Foreign Language (EFL)
1.1.4 Test Development
1.2 Relating IQB Tests to CEF-levels
1.2.1 Familiarization
1.2.2 Specification
1.2.3 Standardization
1.2.4 Empirical Validation
1.3 Rationale for Selecting the Item Pool
1.4 Description of Panelists
1.5 Standard-setting Procedure at IQB
1.5.1 Possible Avenues
1.5.2 Standard-setting Context and Methods Chosen
1.6 Research into Validity of the Standard-setting Procedure

André A. Rupp and Raphaela Porsch
Chapter 2: Standard-setting Item Pool
2.1 Overview of Pilot Study Design
2.1.1 Sampling Process for Students
2.1.2 Test Design
2.1.3 Administration of Test Booklets
2.2 Methodology
2.3 Analyses for Reading and Listening Comprehension Scales
2.3.1 CTT Analyses
2.3.2 IRT Analyses
2.3.3 Item Fit Analyses
2.3.4 Item Bias Analyses
2.3.5 Basic Validity Analyses
2.3.6 Scale Linking Across Designs
2.4 Analyses for Written Expression Scale
2.4.1 Rating Criteria
2.4.2 Rater Training
2.5 Descriptive Summary Statistics
2.6 Results from Multi-faceted Rasch Analyses

Simon P. Tiffin-Richards
Chapter 3: The Bookmark Standard-setting Method
3.1 Background
3.2 Bookmark Standard-setting Method
3.2.1 Design
3.2.2 Participants
3.2.3 Materials
3.2.4 Training
3.2.5 Methodological Issues
3.2.6 Data Collection
3.3 Standard-setting Workshop Results
3.3.1 Cut-score Summaries

Karen Draney and Cathleen Kennedy
Chapter 4: The Standard-setting Criterion Mapping Method
4.1 Background
4.2 Preparation
4.2.1 Preparation of the Software
4.2.2 Preparation of the Participants

Karen Draney, Cathleen Kennedy, Steve Moore and Linda Morell
Chapter 5: Procedural Standard-setting Issues
5.1 Criterion Mapping
5.1.1 Report of Conduct for Criterion Mapping
5.1.2 Data Collection
5.1.3 Issues, Problems, Future Recommendations
5.2 Bookmark
5.2.1 Training Session
5.2.2 Data Collection
5.3 Summary and Conclusions

Simon P. Tiffin-Richards and Olaf Köller
Chapter 6: Comparison and Synthesis of Multiple Standard-setting Methods and Panels
6.1 Setting Educational Standards
6.2 The Standard-setting Study
6.2.1 Panel Composition Factor
6.2.2 Method Modification Factor
6.3 General Conclusions
6.4 Synthesizing and Reporting Results

References
Appendices

Preamble

This document is the second in a multi-part technical report series describing the development, calibration, and validation of standards-based tests for English as a first foreign language at the Institute for Educational Progress (Institut zur Qualitätsentwicklung im Bildungswesen, IQB) in Berlin, Germany. The entire report series can be viewed as the documentary cornerstone providing evidence for what Bachman (2005) calls the assessment use argument for these tests, which comprises the assessment validity argument and the assessment utilization argument.

This second report provides empirical evidence from the calibration study of the test item pool and from the standard-setting procedures. Its aim is to describe transparently the decisions, methods and procedures that led to setting cut-scores and standards in alignment with the National Educational Standards (NES) (i.e., the Länderübergreifende Bildungsstandards) for English as a first foreign language at the lower secondary level for the Hauptschulabschluss (KMK, 2003) and the Mittlerer Bildungsabschluss (KMK, 2004), and in alignment with the Common European Framework of Reference for Languages (CEF) (Council of Europe, 2001).

In chapter 1, we outline the aims and charges of the IQB within the context of evaluating the NES in Germany by means of standardized proficiency tests. Since the NES refer to the CEF with regard to the competency model and the proficiency levels they delineate, the process of aligning the proficiency tests to the CEF levels takes center stage in this chapter. We describe the purpose of the tests, the aims of the standard-setting procedures, the rationale for the chosen procedures, and the use of the Manual for Linking Language Examinations to the CEF (Council of Europe, 2003a) [1].

Chapter 2 contains the empirical information that formed the basis of the item and task selection for the standard-setting process, which is described in more detail in the following chapters. Specifically, this chapter describes the key characteristics of the item pool and the respondent population that were used for the pilot study. A detailed description of the scaling process for the three domains of reading, listening, and writing is then provided. This description includes a summary of information about model fit and item bias analyses.

The following two chapters delineate the two chosen methods: the Bookmark method and the Criterion Mapping method, the latter developed by the Berkeley Assessment and Evaluation Research Center (BEAR), University of California, United States. We describe the background, design, participants' preparation, and processes.

Chapter 5 provides information on conducting the actual standard-setting sessions, including procedures followed, data gathered, and issues and problems that arose, for both the Criterion Mapping and Bookmark methods.

[1] Since the revised version of the Manual was not yet available at the time when the standard-setting project was planned and conducted, we refer to the Preliminary Pilot Version of the Manual (Council of Europe, 2003a).

The final chapter provides a synthesis of the standard-setting study's results, thoughts on their interpretation, and implications for how the results are reported and presented to stakeholders and policy makers.

The standard-setting workshop took place in July 2008 in Potsdam, Germany, and was conducted in collaboration with Karen Draney, Cathleen Kennedy, Steve Moore and Linda Morell (BEAR). It was a joint effort of numerous collaborators who were instrumental in planning, preparing and conducting this study. We would particularly like to thank our colleagues from the BEAR Assessment Center; all participants in the workshop, some of whom traveled from as far as Switzerland and Norway; and our expert team consisting of Dr. Neus Figueras, Dr. Rita Green, Dr. Felianka Kaftandjieva, Gabriele Kecker, Prof. Dr. Dr. Rainer Lehmann, Dr. Eli Moe, Prof. Dr. Günter Nold, Prof. Dr. Konrad Schroeder and Prof. Dr. Wolfgang Zydatiss for their advice during regular meetings. Last but not least, we would like to thank the colleagues and student workers at the IQB who were dedicated to making this workshop and study possible.

Claudia Harsch and Simon P. Tiffin-Richards Chapter 1: Setting Standards in Line with the Common European Framework of Reference

1.1 Educational Standards and Standards-based Assessment

1.1.1 Educational Reform in Germany

Plans for educational reforms in Germany were made after international large-scale achievement studies in the late 1990s and early 2000s (such as TIMSS and PISA; e.g., Baumert et al., 2001; Prenzel et al., 2004) revealed mediocre results amongst German 9th graders (for a more detailed discussion, cf. Rupp, Vock, Harsch, & Köller, 2008, in the Technical Report I). The Standing Conference of the Ministers of Education and Cultural Affairs of the Federal States in Germany (KMK) took several steps to improve educational outcomes. The ministers agreed that only a teaching approach promoting the development of competencies that are based on real-life demands in authentic contexts, coupled with the teaching of basic skills, will achieve the long-term goal of improving the situation of education in general (http://www.kmk.org/bildung-schule/qualitaetssicherung-in-schulen.html).

The following essential fields of action were identified by the KMK: increased support for students with learning difficulties; improvement of classroom-based quality development and quality assurance; timely identification of poor readers; revision of the structure of the educational system, specifically with respect to school-leaving certificates and the paths that lead to them; more intensive use of learning periods and of learning opportunities; and development of improved structures for staffing and school organization.

The aim of the KMK is to improve the quality of schooling, to make school-leaving certificates comparable across states, and to achieve a general permeability of the educational system within Germany. For this purpose, National Educational Standards (NES) were seen as an important steering instrument (Klieme et al., 2003), because they represent binding conventions for educational output.
NES were released in 2003/2004 for the end of primary and lower secondary schooling, and they are obligatory for all schools (http://www.kmk.org/schul/home1.htm). The NES targeting the secondary phase of schooling formulate school-leaving qualifications for the core subjects German, Mathematics, and the first foreign language (English or French). They are a blend of content and performance standards describing core competencies to be acquired by students by the end of lower secondary school. Two different standard documents were released by the KMK for each core subject: the first describes standards for the track leading to a basic general education school-leaving certificate (Hauptschulabschluss) at the end of grade 9 (hereinafter referred to as HSA-track standards); the second defines the core competencies for the track leading to a more extensive basic general education school-leaving certificate (Mittlerer Schulabschluss) at the end of grade 10 (hereinafter referred to as MSA-track standards). Policy-makers have decided that the NES should reflect the typical achievement of the students. Formally, the NES are normative guidelines for monitoring educational systems, serving to provide information about cross-sectional achievement status as well as longitudinal achievement trends (Schecker & Parchmann, 2007).

In order to coordinate and concentrate the different means of steering educational improvement, the KMK (2006) agreed on a Comprehensive Strategy for Educational Quality Monitoring. This strategy proclaims a paradigm shift within educational policy in Germany towards the concepts of output orientation, reporting and system monitoring. In order to monitor the development of quality within the educational system, educational processes and outcomes are to be evaluated by external means. For that purpose, the KMK (2006) proposed four fundamental procedures: international large-scale assessment studies (such as PISA, PIRLS and TIMSS); centralized evaluation of the NES in a nationwide sample comparing the German federal states (Ländervergleich); state-wide testing schemes at the end of grade 3 and grade 8 linked to the NES (Vergleichsarbeiten); and centrally coordinated reporting with regard to educational achievement and other aspects of the educational system (Bildungsberichterstattung von Bund und Ländern).

The NES are thus a pivotal means for system monitoring and accountability purposes. In order to evaluate, implement and further develop the Standards, a new academic institution was established, the Institute for Educational Progress (Institut zur Qualitätsentwicklung im Bildungswesen, IQB).

1.1.2 The IQB

The IQB was founded in 2004 by the KMK and is institutionally bound to the Humboldt University at Berlin. The institute is financed by the 16 federal states. The administrative structure of the IQB is shown in figure 1.1.

[Figure 1.1 Administrative structure of the IQB (from Rupp et al., 2008). Elements shown: Federal states of Germany; Standing Conference of the Ministers of Education and Cultural Affairs of the federal states in Germany; Humboldt University Berlin; General Secretary; Official Committee: Quality assurance in schools; Faculty of Philosophy IV; Academic experts; Board; IQB (Institut zur Qualitätsentwicklung im Bildungswesen, Institute for Educational Progress).]

In brief, the IQB pursues the aim of operationalizing the NES, providing standards-based proficiency scales for each domain, assessing achievement levels on these proficiency scales, providing suggestions for further refinement of the Standard documents, and supporting their implementation. Of particular importance for this report is the development of a large pool of standards-based test items. These items are used to assemble standards-based proficiency tests, which, in turn, are employed to establish the national proficiency scales in alignment with the NES.

On the one hand, the test items developed by the IQB serve to define national proficiency scales, and will be used in the nationwide studies starting in 2009 (Ländervergleich, as described above). On the other hand, they are made available to the federal states for their own independent use in standardized tests for state-wide or multi-state system monitoring. A small portion of the items will be publicly released to illustrate their design, layout, and purpose.

1.1.3 Tests to Evaluate the NES for English as the first Foreign Language (EFL)

The NES for the first foreign language are based on the Common European Framework of Reference (CEF). The CEF is described as follows by the Council of Europe:

"Developed through a process of scientific research and wide consultation, this document provides a practical tool for setting clear standards to be attained at successive stages of learning and for evaluating outcomes in an internationally comparable manner. (...) The CEF provides a basis for the mutual recognition of language qualifications, thus facilitating educational and occupational mobility. It is increasingly used in the reform of national curricula and by international consortia for the comparison of language certificates. On this subject, also consult the following sections: 1. Manual for relating language examinations to the CEF (Pilot project) 2. Illustrations of levels of language proficiency. (...) The CEF is a document which describes in a comprehensive manner i) the competences necessary for communication, ii) the related knowledge and skills and iii) the situations and domains of communication. The CEF defines levels of attainment in different aspects of its descriptive scheme with illustrative descriptors scale." (Source: www.coe.int/t/dg4/linguistic/cadre_en.asp)

The CEF thus provides a competence model which distinguishes relevant categories of communicative competence on six successive levels of proficiency. The levels are A1 (Breakthrough) and A2 (Waystage), characterizing basic users; B1 (Threshold) and B2 (Vantage), characterizing independent users; and C1 (Effective Operational Proficiency) and C2 (Mastery), characterizing proficient users. The NES for the first foreign language target the CEF-level A2 for the HSA-track, and B1/B1+ for the MSA-track of the German school system (cf. appendix A and KMK 2003, 2004). The competence model used in the NES is directly derived from the model in the CEF:

Functional Communicative Competencies
  Communicative Skills
    Reading comprehension
    Listening and audio-visual comprehension
    Speaking: (a) Participation in conversations, (b) Connected speech
    Writing
    Mediation
  Availability of Linguistic Resources
    Vocabulary
    Grammar
    Pronunciation and Intonation
    Orthography / Spelling
Intercultural Competencies
    Socio-cultural orientation knowledge
    Sensitivity for cultural diversity
    Practical skills for intercultural encounters
Methodological Competencies
    Text reception (reading comprehension and listening comprehension)
    Interaction
    Text production (speaking and writing)
    Learning strategies
    Presentation skills and skills for media usage
    Learning awareness and organization of learning processes

Figure 1.2 Competence model in the NES (from Rupp et al., 2008)

The competence areas and individual competencies in the NES are grounded in the CEF: they were selected and adapted from the descriptions therein and rearranged into a competence model that suits the German context for standards-based language learning and standards-based assessment. (For a detailed exposition on how these competencies are linked to the competencies and levels of the CEF, cf. Rupp et al., 2008, Section II.) As a result of the complexity of many of the competence areas, neither the CEF (cf. Weir, 2005) nor the NES can provide competence scales precise enough to guide the development of test items. Thus, the challenge for the development of standards-based tests at the IQB was to operationalize the coarse-grained descriptions in the NES and in the CEF into test specifications that could guide trained item writers to produce test items that are (a) construct valid, in the sense that they are consistent with theoretical models of language competence, and (b) standard valid, in the sense that they are consistent with the formulations in the NES and the CEF. Section IV of the Technical Report I describes in depth how these challenges were met.
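The level structure described in this section can be restated compactly in code. The following is only an illustrative sketch, not part of the IQB materials: the names `CEFLevel`, `user_band`, `NES_TARGET` and `meets_target` are hypothetical, the ordering of the enum simply encodes the "six successive levels" of the CEF, and the MSA-track target is simplified to B1 (the plus level mentioned in the text is not modeled).

```python
from enum import IntEnum


class CEFLevel(IntEnum):
    """The six successive CEF proficiency levels, ordered from lowest to highest."""
    A1 = 1  # Breakthrough
    A2 = 2  # Waystage
    B1 = 3  # Threshold
    B2 = 4  # Vantage
    C1 = 5  # Effective Operational Proficiency
    C2 = 6  # Mastery


def user_band(level: CEFLevel) -> str:
    """Return the broad CEF user category for a level."""
    if level <= CEFLevel.A2:
        return "basic user"
    if level <= CEFLevel.B2:
        return "independent user"
    return "proficient user"


# Hypothetical lookup of the CEF level targeted by the NES for each track.
# Assumption: the MSA target is simplified to B1; the plus level is omitted.
NES_TARGET = {"HSA": CEFLevel.A2, "MSA": CEFLevel.B1}


def meets_target(track: str, level: CEFLevel) -> bool:
    """Hypothetical helper: does a level reach the NES target for a track?"""
    return level >= NES_TARGET[track]
```

Under these assumptions, `meets_target("HSA", CEFLevel.B1)` evaluates to true and `meets_target("MSA", CEFLevel.A2)` to false. Actual classification of students into levels depends on the cut-scores established in the standard-setting procedures, which this sketch does not attempt to represent.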
The overall purpose of the IQB project Standard Evaluation of the first foreign language English was to develop national proficiency scales for the competence areas of reading comprehension, listening comprehension and writing, whereby the levels of proficiency should reflect the CEF levels A1 to C1. Owing to experience from previous large-scale language assessments in Germany, especially the Study of German and English Language Proficiency (Deutsch-Englisch Schülerleistungen International, DESI; Beck & Klieme, 2007), an expected level of proficiency for students in the pilot study sample was available. Based on this information, it was decided that tasks for the HSA-track sample should range from CEF proficiency levels A1 to B1, while tasks for the MSA-track sample should range from CEF proficiency levels A2 to C1. No C2 tasks were used in either sample, both because strong floor effects were expected and in order to avoid frustrating test-takers.

In order to develop the proficiency scales, the IQB was charged with first developing criterion-oriented proficiency tests which operationalize the three above-mentioned domains as described in the NES as well as in relevant CEF scales targeting the CEF-levels A1 to C1 (see Rupp et al., 2008, for an overview of the relevant CEF scales). These standards-based tests are designed for large-scale assessment studies for the purpose of system monitoring. However, because the small number of items per domain limits statistical precision, the actual IQB test booklets are not suitable for individual diagnostic testing.

1.1.4 Test Development

The test development was a collaborative endeavor carried out between September 2005 and July 2007 by an interdisciplinary team of language assessment specialists, didacticians, psychometricians, and a team of up to 20 teachers representing all federal states and school types of Germany. The process was monitored and advised by a group of internationally renowned experts in the field of language assessment. As the test development in its different stages is described in Rupp et al. (2008, sections IV to VI), we will focus here on those features necessary to understand and judge the process of standard-setting and aligning the IQB tests to the CEF.
The test development started with the teachers being trained in item development by an internationally renowned expert. The teachers were familiarized with the CEF competence model and proficiency levels, and trained in writing test specifications as well as in writing test items. They attended nine one-week workshops, during which they developed domain- and level-specific test specifications. In between the workshops, the teachers worked in regional groups, receiving feedback and guidance on a regular basis. The item developers characterized and specified each test item with regard to the targeted domain and level, using a catalogue of characteristics based on the Dutch Grid and the ALTE grids (see appendix B for the respective lists of item characteristics employed by IQB). Each test item within the domain of the receptive skills targets one specific CEF-level only; therefore, the items are called level-specific items. Nevertheless, one task (i.e., a testlet consisting of standardized instruction, reading or listening stimulus and test items) may entail items targeting two adjacent levels. The items testing receptive skills focus on one reading or listening behavior within one task and there is only one format per task. The receptive item pool includes a variety of