Designing Listening Tests: A Practical Approach
Rita Green has spent many years at the coalface of language test development and training in a variety of international contexts; this book is the sum of that experience. It is a fantastic resource for anyone looking to develop listening tests: a highly practical, theoretically grounded guide for teachers and practitioners everywhere. Green covers a range of important principles and approaches; one highlight is the introduction to the textmapping approach to working with sound files. This book is highly recommended for anyone involved in the development of listening tests.

Luke Harding, Senior Lecturer, Lancaster University, UK
Designing Listening Tests: A Practical Approach

Rita Green
Rita Green
UK

ISBN 978-1-137-45715-8
ISBN 978-1-349-68771-8 (ebook)
DOI 10.1057/978-1-349-68771-8

Library of Congress Control Number: 2016950461

© The Editor(s) (if applicable) and The Author(s) 2017

The author(s) has/have asserted their right(s) to be identified as the author(s) of this work in accordance with the Copyright, Designs and Patents Act 1988.

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made.

Printed on acid-free paper

This Palgrave Macmillan imprint is published by Springer Nature
The registered company is Macmillan Publishers Ltd.
The registered company address is: The Campus, 4 Crinan Street, London, N1 9XW, United Kingdom
Preface

Who is this book for?

This book is primarily aimed at teachers who would like to develop listening tests for their students but who have little, if any, training in the field of assessment. It is also designed for test developers who have some experience of assessing the listening ability of test takers, but who would like a practical reference book to remind them of the procedures they should follow, and of the many dos and don'ts that litter the field of task development. Those who are engaged in MA studies, or other types of research, should also find Designing Listening Tests (DLT) of interest, as the book raises many issues which would benefit from further investigation.

DLT offers a systematic approach to the development of listening tasks, starting with a discussion of what listening involves and the importance of drawing up test specifications. It also explores how to exploit sound files and investigates a range of issues related to task development. The book concludes with a look at the benefits of trialling and data analysis, and at how to report test scores and set pass marks.

Not everyone reading this book will be able to carry out all of these recommended stages. In many cases, even where test developers would like to do so, the demands and limitations of their assessment contexts make some stages very difficult to achieve. What is important is to attempt as many as possible.
The organisation of this book

Each chapter focuses on one major aspect of the task development cycle.

Chapter 1 starts with an overview of the issues which a test developer needs to consider when developing a listening test. These include the processes involved in real-life listening, how the spoken and written forms of the language differ, and what makes listening difficult. The chapter ends with a discussion of why listening is important and introduces the reader to the task development cycle.

Chapter 2 discusses the role that test specifications play in assisting the test developer to define the construct underlying the test, and to describe the conditions under which the test taker's performance will be measured.

Chapter 3 introduces the reader to a procedure called textmapping, which helps test developers to determine the appropriateness of the sound files they would like to use in their task development work, and explores how those sound files can be exploited.

Chapter 4 focuses on task development, investigates many of the decisions that need to be made at this stage, and provides a set of item writing guidelines to help in this process. The chapter also discusses the role of peer review in task development and outlines how this feedback could work.

Chapter 5 consists of a range of sample listening tasks taken from a number of different testing projects. Each task is discussed in turn, providing insights into the listening behaviour, the sound file and the task. Links to the sound files are also provided.

Chapter 6 focuses on the benefits to be gained from trialling the listening tasks and carrying out data analysis.

Chapter 7 explores the different ways test scores can be reported and how pass marks (or cut scores) can be calculated. Readers are provided with insights into how a standard setting session can be run, and the importance of producing a post-test report is discussed.
Good luck with the book and the task development process!

Rita Green
UK
Acknowledgements

I would like to start by thanking my colleagues and friends for their feedback on previous versions of these chapters. A special mention goes to Karmen Pižorn, Irene Thelen-Schaefer, Caroline Shackleton, David Gardner, Heidi Ford-Schmidt and Astrid Dansoko.

I would also like to express my thanks to the following people and organisations who have provided me with copyright permission to include the tasks and/or sound files used in this book: Graham Hyatt, Länderverbundprojekt VerA6, Germany; Julia Grossmann & Linnet Souchon; Walter Indra; The Bundesinstitut, Zentrum für Innovation und Qualitätsentwicklung (Bifie), Austria; The Institut zur Qualitätsentwicklung im Bildungswesen (IQB), Humboldt-Universität zu Berlin, Germany; Devawongse Varopakarn Institute of Foreign Affairs (DVIFA), Ministry of Foreign Affairs, Thailand; Paul Vogel; The Department of Foreign Affairs and Trade (DFAT), Australian Government; ipod traveller: www.ipodtraveller.net; Star Radio, Cambridge, UK; Nathan Turner, Centro de Lenguas Modernas, Granada University, Spain; and Luke Harding, Lancaster University, UK.

Reprint of SPSS screen images courtesy of International Business Machines Corporation, SPSS, Inc., an IBM Company.
Contents

1 What is involved in assessing listening? 1
1.1 What the listening process involves 2
1.2 How listening differs between contexts and listeners 5
1.3 How listening input varies 7
1.4 How the spoken and written forms of the language differ 8
1.5 What makes listening difficult? 11
1.5.1 Nature of listening 11
1.5.1.1 No permanent record 11
1.5.1.2 Lack of real gaps 12
1.5.1.3 Lack of redundancy 12
1.5.2 Complexity of processing 13
1.5.2.1 Multi-tasking 13
1.5.2.2 Controlled versus automatic processing 14
1.5.3 Input 14
1.5.3.1 Content 14
1.5.3.2 Topic 14
1.5.3.3 Sound quality 15
1.5.3.4 Mode of delivery 16
1.5.4 Task 16
1.5.5 Listening environment 17
1.5.6 Speaker characteristics 17
1.5.6.1 Speed of delivery 17
1.5.6.2 Number and type of voices 18
1.5.7 Listeners' characteristics 18
1.6 Why is assessing listening important? 19
1.7 Summary 20
1.7.1 Task development cycle 21

2 How can test specifications help? 27
2.1 What are test specifications? 27
2.2 Purpose of the test 28
2.3 Target test population 28
2.4 The construct 29
2.5 Performance conditions 34
2.5.1 Input 35
2.5.1.1 Source 35
2.5.1.2 Authenticity 37
2.5.1.3 Quality 38
2.5.1.4 Level of difficulty 39
2.5.1.5 Topics 40
2.5.1.6 Discourse type 40
2.5.1.7 Nature of content 41
2.5.1.8 Number of sound files needed 41
2.5.1.9 Length of sound files 42
2.5.1.10 Mode of delivery 43
2.5.1.11 Number of times heard 43
2.5.1.12 Speaker characteristics 45
2.5.2 Task 46
2.5.2.1 Instructions and the example 46
2.5.2.2 Test method 46
2.5.2.3 Number of items 47
2.5.2.4 Number of tasks 48
2.5.3 Criteria of assessment 48
2.6 Why do we need test specifications? 49
2.7 Summary 51
3 How do we exploit sound files? 55
3.1 Identifying the potential use of a sound file 55
3.2 A procedure for exploiting sound files: Textmapping 57
3.3 Textmapping for gist 59
3.3.1 Defining the listening behaviour 59
3.3.2 Checking for consensus 61
3.3.3 The Gist textmap table 64
3.3.4 Summary of the gist textmapping procedure 66
3.3.5 Textmapping multiple gist files 67
3.4 Textmapping for specific information and important details (SIID) 68
3.4.1 Defining the listening behaviour 68
3.4.2 Checking for consensus 70
3.4.3 The SIID textmap table 72
3.4.4 Summary of the SIID textmapping procedure 74
3.4.5 Textmapping longer SIID sound files 75
3.4.6 Textmapping multiple SIID sound files 76
3.5 Textmapping for main ideas and supporting details (MISD) 76
3.5.1 Defining the listening behaviour 76
3.5.2 Checking for consensus 77
3.5.3 The MISD textmap table 78
3.5.4 Summary of the MISD textmapping procedure 80
3.6 Re-textmapping 82
3.7 Useful by-products 82
3.8 Summary 83

4 How do we develop a listening task? 85
4.1 Task identifier (TI) 85
4.2 Task instructions 88
4.3 Task issues 90
4.3.1 Test method 90
4.3.1.1 Multiple matching (MM) 91
4.3.1.2 Short answer questions (SAQ) 92
4.3.1.3 Multiple choice questions (MCQ) 94
4.3.1.4 Other test methods 95
4.3.2 Number of times heard 96
4.3.3 Number of items needed 96
4.3.4 Task layout 96
4.3.5 Mode of delivery 97
4.3.6 Integrated listening tasks 97
4.3.7 Grading issues 97
4.4 Guidelines for developing listening items 98
4.4.1 Sound file 99
4.4.2 Task instructions 100
4.4.3 Item/task development 101
4.4.3.1 General issues 101
4.4.3.2 Test method 103
4.4.3.2.1 General issues 103
4.4.3.2.2 Short answer questions (SAQ) 104
4.4.3.2.3 Multiple matching (MM) 105
4.4.3.2.4 Multiple choice questions (MCQ) 106
4.4.4 Layout issues 107
4.5 Peer review and revision 107
4.5.1 Peer review 108
4.5.2 Revision 112
4.6 Summary 112

5 What makes a good listening task? 115
Introduction 115
Part 1: Multiple matching tasks 116
5.1 Task 1: Reading habits (MM) 116
5.1.1 Sound file 117
5.1.2 Task 117
5.1.2.1 Listening behaviour 117
5.1.2.2 Suitability of test method 118
5.1.2.3 Layout 118
5.2 Task 2: School class (MM) 118
5.2.1 Sound file 120
5.2.2 Task 120
5.2.2.1 Listening behaviour 120
5.2.2.2 Suitability of test method 121
5.2.2.3 Layout 122
5.3 Task 3: A diplomat speaks (MM) 122
5.3.1 Sound file 124
5.3.2 Task 124
5.3.2.1 Listening behaviour 124
5.3.2.2 Suitability of test method 126
5.3.2.3 Layout 126
Part 2: Short answer tasks 127
5.4 Task 4: Winter holidays (SAQ) 127
5.4.1 Sound file 127
5.4.2 Task 128
5.4.2.1 Listening behaviour 128
5.4.2.2 Suitability of test method 128
5.4.2.3 Layout 129
5.5 Task 5: Message (SAQ) 129
5.5.1 Sound file 129
5.5.2 Task 130
5.5.2.1 Listening behaviour 130
5.5.2.2 Suitability of test method 130
5.5.2.3 Layout 130
5.6 Task 6: Oxfam Walk (SAQ) 131
5.6.1 Sound file 132
5.6.2 Task 132
5.6.2.1 Listening behaviour 132
5.6.2.2 Suitability of test method 132
5.6.2.3 Layout 133
Part 3: Multiple choice tasks 133
5.7 Task 7: Hospital (MCQ) 133
5.7.1 Sound file 133
5.7.2 Task 134
5.7.2.1 Listening behaviour 134
5.7.2.2 Suitability of test method 135
5.7.2.3 Layout 135
5.8 Task 8: Tourism in Paris (MCQ) 135
5.8.1 Sound file 138
5.8.2 Task 138
5.8.2.1 Listening behaviour 138
5.8.2.2 Suitability of test method 139
5.8.2.3 Layout 139
5.9 Summary 139
5.10 Keys to the sample tasks 141

6 How do we know if the listening task works? 145
Introduction 145
6.1 Why do we trial? 146
6.1.1 Task instructions 146
6.1.2 Amount of time allocated 147
6.1.3 Different test methods 147
6.1.4 Task key 148
6.1.5 Task bias 148
6.1.6 Sample tasks/benchmark performances 149
6.1.7 Tasks for standard setting 149
6.1.8 Test administration guidelines 150
6.1.9 Feedback questionnaires 151
6.1.10 Feedback to stakeholders 153
6.1.11 Test specifications 153
6.1.12 Summary 153
6.2 How do we trial? 154
6.2.1 The test population 154
6.2.2 Trial dates 154
6.2.3 Size of the trial population 155
6.2.4 Test booklet preparation 155
6.2.5 Administration and security issues 157
6.2.6 Marking 158
6.3 Trial results 160
6.3.1 Why carry out a data analysis? 160
6.3.2 How do we carry out a data analysis? 161
6.3.2.1 Stage 1: Frequencies 162
Summary 166
6.3.2.2 Stage 2: Discrimination 166
Summary 168
6.3.2.3 Stage 3: Internal consistency (reliability) 168
Summary 171
6.3.2.4 Overall task difficulty 171
6.3.3 Drop, revise or bank? 172
6.4 Conclusions 172

7 How do we report scores and set pass marks? 175
7.1 Reporting test scores 175
7.1.1 Separate skills or all skills? 175
7.1.2 Weighting of different skills 177
7.1.3 Method of reporting used 178
7.1.4 Norm-referenced approach 179
7.1.5 Criterion-referenced approach 180
7.1.6 Pass marks 181
7.2 Standard setting 182
7.2.1 What is standard setting? 182
7.2.2 Why do we standard set? 183
7.2.3 Who is involved in standard setting? 185
7.2.3.1 Before standard setting 185
7.2.3.2 During standard setting 186
7.2.4 Importance of judge selection 187
7.2.5 Training of judges 188
7.2.6 Selecting a standard setting method 190
7.2.7 Role of statistics in standard setting 191
7.2.8 Standard setting procedure 192
7.2.9 Confirming item and task difficulty levels 194
7.3 Stakeholder meetings 195
7.4 Sample tasks and test website 195
7.5 Post-test reports 197
7.5.1 Post-test item analysis 197
7.5.2 Recommendations 199
Final thoughts 199

DLT Bibliography 203

Index 205
Acronyms

CAID Cronbach's Alpha if Item Deleted
CEFR Common European Framework of Reference
CITC Corrected Item Total Correlation
EFL English as a Foreign Language
ICAO International Civil Aviation Organization
IELTS International English Language Testing System
MCQ Multiple choice questions
MISD Main ideas and supporting details
MM Multiple matching
SAQ Short answer questions
SEM Standard error of measurement
SEQ Sequencing
SIID Specific information and important details
SHAPE Supreme Headquarters Allied Powers Europe
SLP Standardized Language Profile
STANAG Standardization Agreement
TI Task identifier
List of figures

Fig. 1.1 Extract from lecture 12
Fig. 1.2 Task development cycle 21
Fig. 2.1 CEFR B2 descriptors 31
Fig. 2.2 STANAG Level 1 descriptors 32
Fig. 2.3 ICAO Level 3 descriptors 32
Fig. 2.4 General listening focus 33
Fig. 2.5 Talking points 36
Fig. 2.6 Test specifications template 51
Fig. 3.1 Instructions for gist textmapping 61
Fig. 3.2 Gist textmapping results 62
Fig. 3.3 Highlighted communalities (gist) 63
Fig. 3.4 Communalities (gist) 63
Fig. 3.5 Gist textmap table 65
Fig. 3.6 Gist textmapping procedure 66
Fig. 3.7 Different types of SIID 69
Fig. 3.8 SIID textmapping results 71
Fig. 3.9 SIID: Textmap Table 1 72
Fig. 3.10 SIID: Textmap Table 2 73
Fig. 3.11 SIID textmapping procedure 74
Fig. 3.12 Main ideas, supporting details and SIID 77
Fig. 3.13 MISD Textmap Table 79
Fig. 3.14 MISD textmapping procedure 80
Fig. 4.1 Task identifier 86
Fig. 5.1 Jane's reading habits (MM) 117
Fig. 5.2 School class (MM) 119
Fig. 5.3 A diplomat speaks (MM) 123
Fig. 5.4 Winter holidays (SAQ) 127
Fig. 5.5 Message (SAQ) 129
Fig. 5.6 Oxfam Walk (SAQ) 131
Fig. 5.7 Hospital (MCQ) 134
Fig. 5.8 Tourism in Paris (MCQ) 136
Fig. 6.1 Feedback questionnaire: Example 1 152
Fig. 6.2 Feedback questionnaire: Example 2 152
Fig. 6.3 Frequencies on Q1 162
Fig. 6.4 Frequencies on Q2-Q4 164
Fig. 6.5 Frequencies on Q5-Q8 165
Fig. 6.6 Popham (2000) Discrimination levels 167
Fig. 6.7 Discrimination indices 167
Fig. 6.8 Reliability statistics 170
Fig. 6.9 Overall task difficulty 171
Fig. 7.1 Extract from CEFR familiarisation exercise (listening) 189
Fig. 7.2 Website materials 196