A Game-based Assessment of Children's Choices to Seek Feedback and to Revise

Maria Cutumisu, Kristen P. Blair, Daniel L. Schwartz, Doris B. Chin
Stanford Graduate School of Education

Please address all correspondence to:
Maria Cutumisu
Stanford Graduate School of Education
485 Lasuen Mall, Stanford, CA 94305
Tel: (650) 666-9021
cutumisu@stanford.edu

Paper to be presented at the Annual Meeting of the American Educational Research Association (AERA), Chicago, April 16-20, 2015

Abstract

We introduce a game-based assessment approach to measure students' learning choices. We describe our overarching assessment principles and present Posterlet, a game in which students create posters and learn graphical design principles. We designed Posterlet to assess children's choices to seek constructive positive or negative feedback and to revise their work. Several hundred middle-school students played Posterlet. Results show that seeking negative feedback correlates with in-game learning and with standardized measures of school achievement. Our research presents a first-of-its-kind examination of feedback choices, showing that the willingness to seek negative feedback is a wise strategy for learning and that it can be measured. We can now develop and evaluate models of instruction that help students choose feedback effectively.

OBJECTIVE

A major goal of formal and informal education is to prepare students to be independent learners who can make choices about what and how to learn (Schwartz & Arena, 2013). An impediment to achieving this goal is knowing whether we are succeeding. We need new assessment tools that go beyond measuring students' knowledge at the end of instruction: we need to measure students' abilities to make good learning choices. We are developing game-based assessments to detect specific choices students make while learning, choices indicative of whether educational experiences foster independent learners. First, we describe our three assessment commitments. Second, we introduce Posterlet, a game designed to collect children's feedback and revision choices. Third, we present empirical evidence that Posterlet measures an important set of choices for learning.

BACKGROUND

Three Assessment Principles

1) Typical Performance. Assessments should measure typical behaviors, not test behaviors, so we can investigate how students will perform in non-didactic contexts. Therefore, we create short and engaging games, such as Posterlet, where children design posters.

2) Preparation for Future Learning. Assessments should include opportunities to learn, so we can measure whether educational experiences have prepared students to make choices relevant to learning (Bransford & Schwartz, 1999). Posterlet includes 21 graphical design principles that students can learn.

3) Choice. Choices about learning should be free, not right or wrong. In Posterlet, students can design posters and complete the game regardless of their learning choices.

Posterlet Design

Figure 1 shows Posterlet's flow. Students pick a funfair booth and design a poster for it using graphical design tools. Upon completing the poster, students pick three characters from a focus group to provide feedback. From each character, students choose a box for negative ("I don't like") or positive ("I like") feedback, as shown in Figure 2. Next, they choose whether to revise their poster before submitting it. Finally, they see the ticket sales for their booth. Students complete this poster design cycle for two more booths. A schematic data record for one design cycle is sketched below, after the figure captions.

Figure 1. The Posterlet game flow: students create three posters.

Figure 2. Students may choose positive ("I like") or negative ("I don't like") feedback from each of the three characters they selected from the focus group.
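To make the cycle concrete, the sketch below shows how one poster round and a full session might be represented in the game's logs. This is a minimal illustration; the field and method names are assumptions, not Posterlet's actual logging schema.

```python
# Hypothetical representation of one poster round and a full session.
# All names here are illustrative assumptions, not the authors' schema.
from dataclasses import dataclass, field
from typing import List

@dataclass
class PosterRound:
    booth: str                    # funfair booth the student chose
    feedback_valences: List[str]  # "negative" or "positive", one per character (3)
    revised: bool                 # whether the student revised before submitting
    ticket_sales: int             # outcome shown at the end of the round

@dataclass
class GameSession:
    rounds: List[PosterRound] = field(default_factory=list)

    def negative_feedback_count(self) -> int:
        """0-9: three feedback choices on each of three posters."""
        return sum(r.feedback_valences.count("negative") for r in self.rounds)

    def revision_count(self) -> int:
        """0-3: one revision opportunity per poster."""
        return sum(r.revised for r in self.rounds)
```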

An intelligent feedback system evaluates each poster against the 21 graphical design principles (Figure 3) and uses a prioritization scheme to select which principles to emphasize in the feedback. Positive and negative feedback are equally informative. For instance, if children choose negative feedback and it is appropriate, they might receive: "People need to be able to read it. Some of your words are too small." If children choose positive feedback and it is appropriate, they might receive: "Your poster has big letters. Really easy to read." Students can seek feedback on initial posters but not on revisions. A schematic sketch of this feedback-selection step appears at the end of this section.

Figure 3. The three categories of the 21 graphical design principles used by the system to generate feedback: information, readability, and space use.

Data Sources

We keep complete log files, but here we focus on three types of data. How many times did children choose negative feedback, out of a maximum of 9 (3 feedback choices x 3 posters)? How many times did children revise, out of a maximum of 3 (one opportunity per poster)? What was the quality of the students' posters? We also included a posttest to independently measure how many of the design principles students learned. Additionally, we had access to the children's standardized achievement scores. Finally, we sampled children from different schools.

The Measurement Construct

The feedback literature yields mixed results regarding the impact of negative feedback on learning (Kluger & DeNisi, 1998). Moreover, negative feedback signals a need to change and learn, but it runs the risk of ego threat, which leads people to shut down rather than revise (Hattie & Timperley, 2007). This suggests that students' attitudes towards seeking feedback could have large implications for learning. In feedback research, students rarely have independent control over their feedback; the feedback arrives without choice (but see Roll, Aleven, McLaren, & Koedinger, 2011). So, while there is reason to believe that attitudes towards feedback influence learning, there is no evidence on whether independent learning choices about seeking feedback are important. Hence, we designed Posterlet to address this question.
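As promised above, here is a minimal sketch of the feedback-selection step. The rubric representation, the priority ordering, and the messages are assumptions for illustration; the paper does not specify the actual prioritization scheme.

```python
# Minimal sketch of valence-conditioned feedback selection. Each of the 21
# principles is assumed to score a poster as 1 (used correctly), 0 (absent),
# or -1 (used incorrectly); the highest-priority principle matching the
# student's chosen valence determines the message. All names are hypothetical.

def select_feedback(scores, valence, priority, messages):
    """scores: {principle: -1 | 0 | 1}; valence: 'negative' or 'positive'."""
    wanted = -1 if valence == "negative" else 1
    for principle in priority:                 # highest priority first
        if scores.get(principle) == wanted:
            return messages[(principle, valence)]
    return "Nothing stands out." if valence == "negative" else "Looks good overall."

# Example: the poster's text is too small, so a negative-feedback request
# surfaces the readability principle.
scores = {"readable_text": -1, "big_letters": 0, "has_event_info": 1}
priority = ["readable_text", "has_event_info", "big_letters"]
messages = {
    ("readable_text", "negative"):
        "People need to be able to read it. Some of your words are too small.",
    ("has_event_info", "positive"):
        "Your poster tells people what the event is.",
}
print(select_feedback(scores, "negative", priority, messages))
```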

LOGIC OF THE INVESTIGATION

We hypothesize that seeking negative feedback and revising yield better learning. If so, choices about feedback are worthwhile to assess. This is especially true if the goal is to foster independent learners, who will likely have to make choices about seeking feedback beyond school.

Unlike assessments of knowledge and skills, assessments of choice face a special challenge. For 2 + 2, we know 4 is right and 5 is wrong. The best learning choice is not so clear cut. Ideally, we could rely on an empirical literature, but for choosing negative feedback and revision there is none. We carry the burden of showing that some choices are better for learning. This is the major goal of the current research. It is important to highlight that we do not analyze choice as a source of motivation for learning (Iyengar & Lepper, 1999). Instead, we are asking whether we can measure the value of specific choices for learning.

We examine two classes of evidence that the choices to seek negative feedback and to revise correlate with better learning:

1) Internal: Do choices to seek negative feedback or revise correlate with learning graphical design principles in the game?

2) External: Do children who seek negative feedback or revise exhibit better learning outside the game?

We also address whether these choices can be influenced by experience (i.e., whether they are within the reach of education). Therefore, we consider a third class of evidence:

3) Experiential: Do choices to seek negative feedback or revise reflect differences in prior experiences?

METHODS

Participants and Procedures

Participants were students from two public middle schools, in New York City and Chicago, as shown in Table 1. They played Posterlet (~15 minutes) followed by an online posttest (~4 minutes), taken individually in a classroom setting, as one of several assessments administered by external school evaluators. Not all students completed the posttest, and we did not receive achievement records for all students. We also removed students whose game duration was not within ±2 SD of the mean, to account for nonadherence to the proctors' rules (this exclusion rule is sketched after Table 1). Thus, sample sizes vary across analyses depending on the available data.

Table 1. Poster and Posttest Participant Information.
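A minimal sketch of the duration-based exclusion rule mentioned above, assuming the session logs sit in a pandas DataFrame with a hypothetical game-duration column:

```python
# Drop sessions whose game duration falls outside mean ± 2 SD.
# The DataFrame layout and column name are assumptions for illustration.
import pandas as pd

def filter_by_duration(sessions: pd.DataFrame,
                       col: str = "game_minutes") -> pd.DataFrame:
    mean, sd = sessions[col].mean(), sessions[col].std()
    within = sessions[col].between(mean - 2 * sd, mean + 2 * sd)
    return sessions[within]
```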

Dependent Measures

Choices. Negative Feedback counts the number of "I don't like" choices a student made (0-9). Revision counts the number of posters a student chose to revise (0-3).

In-Game Learning. To gauge performance improvement, Posterlet generated a cumulative Poster Quality score based on the 21 design principles. The quality of each poster is the sum of the scores of the 21 features: 1 if a feature is always used correctly on a poster, 0 if a feature is not included on the poster, and -1 if a feature is used incorrectly on a poster. Poster Quality is the sum of the three posters' quality scores. A separate Posttest evaluated student learning. Students had to describe common mistakes, provide written feedback on a given poster, and note what was good and bad about the same poster using a checklist, as shown in Figure 4. Open responses were coded by two evaluators with a reliability of r > .8. The first two questions were scored by assigning one point for each of the 21 graphical design principles mentioned in the answer. The last two questions were scored by assigning one point for each correct answer that was not contradicted in the answers to the other question. The Posttest score is the sum of the normalized scores of the four questions. Both in-game scores are sketched after Figure 4.

Out-of-Game Learning. We received standardized reading and mathematics achievement scores from the respective state tests.

Figure 4. Posttest questions: checked items are the correct answers for Questions 3 and 4, respectively.
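The two scores above reduce to simple sums; a minimal sketch follows, with the data layouts assumed for illustration.

```python
# Poster Quality: each of the 21 features contributes 1 (always used
# correctly), 0 (not included), or -1 (used incorrectly) per poster, and the
# score sums over the three posters, so it ranges from -63 to 63.
def poster_quality(feature_scores_per_poster):
    """feature_scores_per_poster: three dicts mapping each of the 21
    features to a score in {-1, 0, 1}."""
    return sum(sum(scores.values()) for scores in feature_scores_per_poster)

# Posttest: the sum of per-question scores, each normalized by its maximum.
def posttest_score(question_scores, max_scores):
    return sum(s / m for s, m in zip(question_scores, max_scores))
```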

RESULTS

Do choices to seek negative feedback or revise correlate with in-game learning?

Table 2 shows zero-order correlations among the choice and learning-outcome variables across schools. Negative Feedback and Revision choices correlate with both measures of in-game learning (Poster Quality and Posttest). Poster Quality can be taken as a measure of learning, as students improve over the levels of the game, F(2, 471) = 14.8, p < .001.

Table 2. Correlations between negative feedback, revision, and in-game learning outcomes.

Negative Feedback and Revision were also highly correlated with each other. To determine whether Negative Feedback and Revision are independent predictors of learning outcomes, we entered both into regressions (sketched after Table 3). For Poster Quality, Negative Feedback and Revision were significant predictors, t(470) = 3.2, p = .002 and t(470) = 5.4, p < .001, respectively. For Posttest, Negative Feedback, t(411) = 2.9, p = .004, and Revision, t(411) = 3.0, p < .003, were also significant predictors. The same pattern of results occurs when analyzing the data per school. Thus, the relation between these choices and learning appears to be stable, and the choice to seek negative feedback seems beneficial for learning, even though both positive and negative feedback were informative.

Do choices to seek negative feedback or revise correlate with academic achievement?

Are we only measuring behaviors that are useful in the context of the game? To find out, we correlated game choices with achievement scores. Table 3 shows the correlations by school. The association of Negative Feedback with achievement outcomes is significant across the board, whereas Revision shows more modest and variable correlations. The fact that choices to seek negative feedback exhibit similar correlations across different state tests and demographics indicates the stability of the measure. It also indicates that in-game choice assessments can predict out-of-game achievement.

Table 3. Correlations between negative feedback, revision, and outside assessments.
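The regression reported above can be reproduced in outline as follows. The tiny synthetic DataFrame stands in for the real per-student data, and statsmodels' formula API is one reasonable choice, not necessarily the authors' tooling.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for the per-student data (one row per student).
df = pd.DataFrame({
    "poster_quality":    [12, 18, 7, 22, 15, 9, 20, 11],
    "negative_feedback": [3, 6, 1, 8, 5, 2, 7, 4],
    "revision":          [1, 2, 0, 3, 2, 1, 3, 1],
})

# Enter both choice measures together to test whether each predicts
# Poster Quality independently, as in the analysis above.
model = smf.ols("poster_quality ~ negative_feedback + revision", data=df).fit()
print(model.summary())  # per-predictor t statistics and p-values
```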

Do choices to seek negative feedback and revise reflect differences in out-of-game experiences?

Can the choice to seek negative feedback be influenced by experience? NYC students chose Negative Feedback significantly more often than Chicago students, M_NYC = 3.96 (SD = 2.48) and M_Chicago = 3.24 (SD = 2.14), t(402) = -3.2, p = .001. There were no appreciable differences in rates of revision, M_NYC = 1.1 (SD = 1.14), M_Chicago = 1.2 (SD = 1.14), p = .5. The schools did not differ in overall posttest performance, p > .25. However, on the two checklist measures of the posttest (Questions 3 and 4), NYC students performed significantly better, F(1, 390) = 5.9, p = .015. Thus, students' prior experiences appear to mediate the choice to seek negative feedback.

DISCUSSION

Is a choice-based assessment feasible and potentially useful? We tested the general proposition by collecting choices about negative versus positive feedback and about revising versus not. We found that the degree to which students sought negative over positive feedback correlated with children's learning within the assessment, their performance on standardized achievement measures, and their school experiences. The results also persisted across the ages included in this sample. To our knowledge, this is the first demonstration that choosing negative feedback is associated with better learning. We found that the choice to revise also correlated with learning outcomes and with seeking negative feedback. However, the correlations of revision with achievement and school differences were less consistent, so we focus on revision less in this discussion.

Our research was designed to evaluate choice as an assessment construct, not to determine causes. The differences in seeking negative feedback could be a function of students' different cities, parental income, school curricula, teachers, climate, and other possibilities (Aikens & Barbarin, 2008). Now that we have demonstrated a way to measure these choices and shown that they are important, researchers can begin to determine why some students seek negative feedback. Additionally, educators can evaluate whether a curriculum prepares students to make such independent learning choices.

In this work, we sought convergent validity by showing that negative feedback choices correlated with several outcomes. An important next step will involve collecting evidence of divergent validity. For instance, it might be useful to know whether seeking negative feedback shows a different pattern of correlation with learning outcomes than other relevant predictors (e.g., self-efficacy, fixed mindset). In the meantime, the effect of seeking negative feedback on learning raises interesting psychological questions. We are currently investigating the effect of choosing versus receiving negative feedback on learning. For example, letting patients choose their level of pain medication led to lower doses than when doses were prescribed by the medical staff (Haydon et al., 2011). Similarly, choosing negative feedback may defuse ego threat. Further, if students are assigned negative feedback, would that lead to less learning than if they choose it? The question has relevance to many instructional technologies.

CONCLUSION

We developed a choice-based assessment game, Posterlet, to track behaviors that we hypothesized are important for learning. The data provide a first-of-its-kind demonstration that choosing negative feedback predicts better learning in-game and in-school. We are working to demonstrate that it predicts out-of-school learning as well, although the game itself may be considered out-of-school. We have gathered an initial warrant that we can measure independent learning by looking at children's behavior in a fun game where there is something to learn and where children can choose whether to do so.

ACKNOWLEDGEMENTS

Funding from the Gordon and Betty Moore Foundation and the NSF (Grant #1228831) supported this work. The results and interpretations do not represent those of the granting agencies. We thank Richard Arum for including our assessment within his larger project, Neil Levine for his artwork, Jacob Haigh for programming Posterlet, Howard Palmer for his work on Choicelets, and all the students and teachers who participated in our study.

REFERENCES

Aikens, N. L., & Barbarin, O. (2008). Socioeconomic differences in reading trajectories: The contribution of family, neighborhood, and school contexts. Journal of Educational Psychology, 100(2), 235-251.

Bransford, J. D., & Schwartz, D. L. (1999). Rethinking transfer: A simple proposal with multiple implications. Review of Research in Education, 61-100.

Hattie, J., & Timperley, H. (2007). The power of feedback. Review of Educational Research, 77(1), 81-112.

Haydon, M. L., Larson, D., Reed, E., Shrivastava, V. K., Preslicka, C. W., & Nageotte, M. P. (2011). Obstetric outcomes and maternal satisfaction in nulliparous women using patient-controlled epidural analgesia. American Journal of Obstetrics and Gynecology, 205(3), 271.e1.

Iyengar, S. S., & Lepper, M. R. (1999). Rethinking the value of choice: A cultural perspective on intrinsic motivation. Journal of Personality and Social Psychology, 76(3), 349.

Kluger, A. N., & DeNisi, A. (1998). Feedback interventions: Toward the understanding of a double-edged sword. Current Directions in Psychological Science, 7(3), 67-72.

Roll, I., Aleven, V., McLaren, B. M., & Koedinger, K. R. (2011). Improving students' help-seeking skills using metacognitive feedback in an intelligent tutoring system. Learning and Instruction, 21(2), 267-280.

Schwartz, D. L., & Arena, D. (2013). Measuring what matters most: Choice-based assessments for the digital age. MIT Press.