Game-based formative assessment: Newton's Playground. Valerie Shute, Matthew Ventura, & Yoon Jeon Kim (Florida State University), NCME, April 30, 2013

Fun & Games Assessment Needs

Game-based stealth assessment

Games (fun): Control, Interactivity, Feedback, Goals/rules
Assessment (rigor): Validity, Reliability, Fairness, Efficiency

Control & Games
[Bar chart: enjoyment (y-axis) by level of control, low vs. high (x-axis); plotted values 2.93 and 3.66.]
Klimmt, C., Hartmann, T., & Frey, A. (2007). Effectance and Control as Determinants of Video Game Enjoyment. CyberPsychology & Behavior, 10(6), 845-848. doi: 10.1089/cpb.2007.9942

Control & Assessment
[Figure: difficulty plotted against time for Student A and Student B.]

Feedback & Assessment
QUESTION: When students are given good feedback on their task solutions, does their learning render the assessment less valid, reliable, or efficient?
ANSWER: No.
SEE: Shute, V. J., Hansen, E. G., & Almond, R. G. (2008). You can't fatten a hog by weighing it - Or can you? Evaluating an assessment for learning system called ACED. International Journal of Artificial Intelligence and Education, 18(4), 289-316.

Stealth Assessment Features
Seamless & Ubiquitous Assessment
"When the cook tastes the soup, that's formative; when the guests taste the soup, that's summative."
Formative & Diagnostic
Accurate & Rich Learner Models
Invisible assessment, transparent support!

ECD (e.g., Mislevy, Steinberg, & Almond, 2003) Assessment Models & Metrics Monitor & Diagnose Success

Newton's Playground
Goal: guide a ball to a balloon.
Everything obeys basic rules of physics (e.g., gravity, Newton's three laws of motion).
Player draws physical objects that "come to life" when drawn (e.g., levers, ramps, pendulums) to get the ball to the balloon.
Players can solve problems in many different ways, striving for the awesomest one.
[Screenshot: Perfect Pendulum]

Qualitative Physics (Ploetzner & VanLehn, 1997)
Nonverbal understanding of:
1. Newton's three laws of motion
2. Balance
3. Mass
4. Gravity

Agents of Force/Motion
Ramp: Used to change the direction of the motion of the ball (or another object).
Lever: Rotates around a fixed point, usually called a fulcrum or pivot point.
Pendulum: Directs an impulse tangent to its direction of motion. Secured at the top by a pin.
Springboard: Stores elastic potential energy from a falling weight; becomes kinetic as the weight is released.

Difficulty Indices
Relative location of ball to balloon (0-1). If the balloon is above the ball, the player is forced to use a lever, springboard, or pendulum to solve the problem.
Obstacles (0-2). If the pathway between ball and balloon is obstructed, the player must project the ball in a specific trajectory.
Distinct agents of force/motion (0-1). A problem may require one or more agents to get the ball to the balloon.
Novelty (0-2). A problem is unlike any other problem played, so the solution is not easily determined from prior experience.
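The four indices and their ranges can be captured in a small data structure. The sketch below is illustrative only: the field names are hypothetical, and combining the indices by simple addition (giving a 0-6 scale) is an assumption, since the slides do not say how an overall difficulty rating was derived.

```python
from dataclasses import dataclass

# Allowed ranges come from the slide; the field names are hypothetical.
RANGES = {
    "relative_location": (0, 1),
    "obstacles": (0, 2),
    "distinct_agents": (0, 1),
    "novelty": (0, 2),
}


@dataclass(frozen=True)
class DifficultyIndices:
    relative_location: int
    obstacles: int
    distinct_agents: int
    novelty: int

    def __post_init__(self):
        # Reject any index that falls outside its stated range.
        for name, (low, high) in RANGES.items():
            value = getattr(self, name)
            if not low <= value <= high:
                raise ValueError(f"{name}={value} is outside {low}-{high}")

    def total(self) -> int:
        # Assumption: overall difficulty is the plain sum of the indices (0-6).
        return sum(getattr(self, name) for name in RANGES)
```

For example, a problem with the balloon above the ball, one obstacle, a single required agent, and no novelty would be `DifficultyIndices(1, 1, 0, 0)`, whose `total()` is 2 under the additive assumption.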

Game design choices in NP
Control: Freedom to play any problem anytime (set up in playgrounds of increasing difficulty).
Interactivity: Players create their own responses; multiple valid solutions.
Feedback: Gold vs. silver trophies.
Goals/rules: Super clear (get ball to balloon).

Task-level design choices
Balance evidence elicitation
» All agents used
» Playgrounds balanced
Focus evidence
» Some levels target just 1 agent (e.g., pendulum only)
Increase difficulty (Playgrounds 1-7)
» Discrimination
Don't suck out the fun
» Construction of colorful responses
» Variation of challenges

Springboard: Difficulty
Sunny Day: Easy SB
Jurassic Park: Medium SB

Pendulum problem
Used features of the game task to (subtly) constrain players' choice of agent.

How did our game-design decisions affect the quality of the assessment, learning, and enjoyment?
Games (fun): Control, Interactivity, Feedback, Goals/rules
Assessment (rigor): Validity, Reliability, Fairness, Efficiency

Construct Validity: External & In-game Physics (N = 166)
External measure of physics knowledge (pretest) correlated with in-game measures of mastery (number of gold trophies per agent).
Correlations: Pretest Scores and NP Trophies
Posttest: 0.60**
Ramp-silver: 0.09
Lever-silver: -0.04
Pendulum-silver: -0.02
Springboard-silver: 0.15
Ramp-gold: 0.24**
Lever-gold: 0.23**
Pendulum-gold: 0.34**
Springboard-gold: 0.41**
N = 166; ** p < .01
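As a rough cross-check of which correlations carry the ** flag, each r can be converted to a t statistic with N - 2 degrees of freedom. This is only a sketch: it assumes the reported values are Pearson correlations tested two-tailed, which the slide does not state, and the cutoff of about 2.61 for df = 164 is an approximate table value rather than a figure from the talk.

```python
import math


def t_statistic(r: float, n: int) -> float:
    """t statistic for testing a Pearson correlation r against zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


N = 166
# Approximate two-tailed p = .01 critical value for df = 164 (assumed).
CRITICAL_T = 2.61

# Correlations as reported on the slide; True marks the ** flag.
REPORTED = {
    "Posttest": (0.60, True),
    "Ramp-silver": (0.09, False),
    "Lever-silver": (-0.04, False),
    "Pendulum-silver": (-0.02, False),
    "Springboard-silver": (0.15, False),
    "Ramp-gold": (0.24, True),
    "Lever-gold": (0.23, True),
    "Pendulum-gold": (0.34, True),
    "Springboard-gold": (0.41, True),
}


def flags_consistent() -> bool:
    """True if every ** flag matches |t| > CRITICAL_T."""
    return all(
        (abs(t_statistic(r, N)) > CRITICAL_T) == starred
        for r, starred in REPORTED.values()
    )
```

Under these assumptions every gold-trophy correlation clears the cutoff and no silver-trophy correlation does, which matches the pattern of flags on the slide.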

Results: Construct Consistency
1. CFA, gold trophies by four agents: χ²/df < 3, CFI > .95, RMSEA < .05, SRMR < .05
[Path diagram: Physics Competency with indicators Ramp gold, Lever gold, Pendulum gold, Springboard gold; values shown: .82, .80, .80, .80 and .33, .35, .37, .37]
2. Intraclass correlation = .85 (Ramp, Lever, Pendulum, Springboard gold trophies)
3. Pairwise correlations: RxL = .67; RxP = .64; RxS = .66; LxP = .64; LxS = .63; PxS = .65
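One way to read the two sets of four numbers in the CFA diagram is as standardized loadings (.82, .80, .80, .80) and residual variances (.33, .35, .37, .37). That reading is an assumption, since the transcript does not label them. In a standardized one-factor model the residual variance of an indicator is 1 minus its squared loading, so the sketch below checks whether the two sets agree to within rounding.

```python
# Values as they appear in the transcript, in the order listed.
# Interpreting them as loadings and residual variances is an assumption.
LOADINGS = [0.82, 0.80, 0.80, 0.80]
RESIDUALS = [0.33, 0.35, 0.37, 0.37]


def residual_variance(loading: float) -> float:
    """Residual variance implied by a standardized factor loading."""
    return 1.0 - loading ** 2


def consistent(tolerance: float = 0.02) -> bool:
    """True if each reported residual is within tolerance of 1 - loading^2."""
    return all(
        abs(residual_variance(l) - r) <= tolerance
        for l, r in zip(LOADINGS, RESIDUALS)
    )
```

With a tolerance of 0.02, which allows for the loadings having been rounded to two decimals, the reported values are compatible with this interpretation.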

Results: Construct Consistency
1. Intraclass correlation = .82 (Easy, Medium, Hard gold trophies)
[Diagram: Easy gold, Medium gold, Hard gold; correlations shown: r = .77, r = .66, r = .53]
2. Cronbach's alpha = .87
Data: gold trophy info (NA, 0, 1)
Valid cases: 110 (out of 169)
Levels: 29 (out of 74)
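For readers unfamiliar with the statistic, Cronbach's alpha on gold-trophy data coded NA/0/1 could be computed along these lines. This is a minimal sketch, not the authors' procedure: it assumes listwise deletion of players with any NA (suggested by, but not confirmed by, the "valid cases" count), and the function name and data layout are hypothetical.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for rows of item scores; None marks NA.

    Rows containing any None are dropped (listwise deletion, an assumption).
    """
    complete = [row for row in rows if all(v is not None for v in row)]
    if len(complete) < 2:
        raise ValueError("need at least two complete cases")
    k = len(complete[0])
    if k < 2:
        raise ValueError("need at least two items")
    # Sample variance of each item (column) and of the total score.
    item_variances = [variance(column) for column in zip(*complete)]
    total_variance = variance([sum(row) for row in complete])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)
```

Each row holds one player's gold-trophy indicators across levels (1 = gold earned, 0 = not earned, `None` = level not played). Alpha approaches 1 when players who earn gold on one level tend to earn it on the others.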

Results: Learning & Fun
How did the decisions work out?
Learning: Significant difference between pretest and posttest scores after just 4 hours of gameplay: F(1, 153) = 4.24; p < .05.
Enjoyment: Kids enjoyed the game (1 = dislike; 5 = like; M = 4, SD = 1). Males and females enjoyed it equally (after controlling for pretest).

Next Steps: Formative Assessment
Info on competencies used by (a) teachers (to adjust instruction and give good feedback), (b) students (to reflect on how they're doing), and (c) the system (to select new gaming experiences), such as:
Present problems requiring agents not yet mastered
Provide hints re: agent solutions
Give rewards for novel agent use
Include formalizations (and values) in simulation (e.g., level editor)
Display current estimates of competency levels in NP (progress indicators) so students act to improve them.
Develop curriculum to wrap around the game: lesson plans, activities (e.g., student levels demoed and discussed in class), etc.

Thank you! Questions? Email: vshute@fsu.edu Website: http://www.myweb.fsu.edu/vshute Download NP: http://www.gameassesslearn.org/newton/

Physics Test

Persistence Test VALIDATION OF THE MEASURE Ventura, M., Shute, V. J., & Zhao, W. (2012). The relationship between video game use and a performance-based measure of persistence. Computers & Education, 60, 52-58.

Feedback in AfL System
[Line chart: test score (y-axis) at pretest and posttest (x-axis) for two feedback conditions, C/I FB and Elab FB.]

Jackknife Variance Estimation (Consistency of assessment)
Jackknife resampling: Compared variance of full sample (74 levels) with variance caused by different task formats (i.e., levels).
Used gold trophy information (NA, 0, and 1).
JK variance (1.1) divided by full sample variance (77.57) = 0.015; reliability = .985!
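The slide does not spell out the exact resampling design, so the following is only a generic sketch of delete-one jackknife variance estimation together with the ratio-based reliability the slide reports. The function names are hypothetical. Note that with the rounded figures shown (1.1 and 77.57) the ratio comes out near 0.014, so the slide's 0.015 presumably reflects unrounded inputs.

```python
from statistics import mean


def jackknife_variance(values, statistic=mean):
    """Delete-one jackknife estimate of the variance of a statistic."""
    values = list(values)
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    # Recompute the statistic with each observation left out in turn.
    leave_one_out = [statistic(values[:i] + values[i + 1:]) for i in range(n)]
    center = mean(leave_one_out)
    return (n - 1) / n * sum((t - center) ** 2 for t in leave_one_out)


def reliability_from_variances(jk_variance, full_variance):
    """Reliability as 1 minus the jackknife-to-full variance ratio."""
    return 1 - jk_variance / full_variance
```

In a design like the one described, each "observation" dropped would be a level (task format), and the statistic would be whatever score summary the full-sample variance refers to; both choices are assumptions here.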

Convergent Validity: Persistence
[Diagram: Persistence (external measure) related to Time Unsolved (NP measure) and Time Silver (NP measure); correlations shown: r = .28** and r = .35**. Time on Gold Trophies: r = .07.]

Convergent Validity: Persistence (just low performers)
[Diagram: Persistence (external measure) related to Time Unsolved (NP measure) and Time Silver (NP measure); correlations shown: r = .47** and r = .42**. Time on Gold Trophies: r = .004.]

Can there be validity without reliability? (Moss, 1994) Although the focus here is on reliability (consistency among independent measures intended as interchangeable), it should be clear that reliability is an aspect of construct validity (consonance among multiple lines of evidence supporting the intended interpretation over alternative interpretations). And as assessment becomes less standardized, distinctions between reliability and validity blur.