How to build tutoring systems that are almost as effective as human tutors?

Kurt VanLehn
School of Computing, Informatics and Decision Systems Engineering
Arizona State University

Outline
  Types of tutoring systems (next)
  Step-based tutoring ≈ human tutoring
  How to build a step-based tutor
  Increasing their effectiveness
  Flame

Two major design dimensions
Personalization of assignments:
  Non-adaptive
  Competency gating: uses sequestered assessments; one factor per module
  Adaptive task selection: uses embedded assessments; one factor per knowledge component
Granularity of feedback, hints and other interaction:
  Assignment (e.g., conventional homework)
  Answer (e.g., most regular tutoring systems)
  Step (e.g., most Intelligent Tutoring Systems)
  Sub-step (e.g., human tutors and some ITS)
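To make "one factor per module" concrete, here is a minimal Python sketch of competency gating: progress depends on a single score from the module's sequestered assessment, and the student advances only after passing it. The module names, the `gate` function, and the 0.8 pass mark are hypothetical illustrations, not details from the talk.

```python
from typing import List, Optional


def gate(modules: List[str], current: int, score: float,
         pass_mark: float = 0.8) -> Optional[str]:
    """Competency gating: one score (factor) per module decides progress.

    modules   -- ordered module names (hypothetical)
    current   -- index of the module the student is working on
    score     -- fraction correct on that module's sequestered assessment
    pass_mark -- hypothetical competency threshold
    Returns the module to work on next, or None once every module is passed.
    """
    if score < pass_mark:
        return modules[current]          # not yet competent: repeat this module
    if current + 1 < len(modules):
        return modules[current + 1]      # competent: unlock the next module
    return None                          # final module passed


modules = ["kinematics", "forces", "energy"]
print(gate(modules, 0, 0.65))  # kinematics
print(gate(modules, 0, 0.90))  # forces
print(gate(modules, 2, 0.95))  # None
```

Within a module the task sequence is still fixed; only the module boundary adapts, which is what separates this option from adaptive task selection.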

Example: Pearson's Mastering Physics. Personalization: competency gating. Granularity: answer.

Example: Andes Physics Tutor. Personalization: non-adaptive. Granularity: step.

Example: Cordillera Physics Tutor (the screenshot marks a single step). Personalization: non-adaptive. Granularity: sub-step.

Example: Carnegie Learning's Tutors. Personalization: adaptive task selection. Granularity: step.

Carnegie Learning's skillometer shows the knowledge components and the student's current competence on each, e.g.: Entering a given; Identifying units; Finding X, any form; Writing expression; Placing points; Changing axis intervals; Changing axis bounds.

Example: Entity-relation Tutor. Personalization: non-adaptive. Granularity: step.

Availability (rough number of systems of each type)
                                 Non-adaptive               Competency gating   Adaptive task selection
  Answer-based feedback/hints    Thousands                  Hundreds            Few
  Step-based feedback/hints      Hundreds (few on market)   Tens                Few
  Sub-step-based feedback/hints  Tens                       None                None

Same availability table, annotated: answer-based systems are called CAI, CBT, or CAL.

Same availability table, annotated: step-based and sub-step-based systems are called Intelligent Tutoring Systems (ITS).

Outline, next: Step-based tutoring ≈ human tutoring

A widely held belief: human tutors are much more effective than computer tutors.
[Figure: effect size (0 to 2) for no tutoring, computer-aided instruction (CAI), intelligent tutoring systems (ITS), ITS with natural language dialogue, and human tutors.]
Effect size is defined as
$$\text{Effect size} = \frac{\text{Gain(tutored)} - \text{Gain(no\_tutor)}}{\text{Standard\_deviation}}$$
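The effect-size formula can be computed directly from gain scores. The Python sketch below assumes the denominator is the pooled sample standard deviation of the two groups (the slide only says "Standard_deviation"; some studies use the control group's standard deviation instead), and the gain scores are invented for illustration.

```python
from statistics import mean, stdev


def effect_size(gains_tutored, gains_control):
    """(mean gain of tutored group - mean gain of untutored group) / SD.

    Assumption: the denominator is the pooled sample standard deviation
    of the two groups.
    """
    n1, n2 = len(gains_tutored), len(gains_control)
    s1, s2 = stdev(gains_tutored), stdev(gains_control)
    pooled_sd = (((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)) ** 0.5
    return (mean(gains_tutored) - mean(gains_control)) / pooled_sd


# Hypothetical gain scores (post-test minus pre-test), not data from the talk.
tutored = [0.5, 0.6, 0.7]
untutored = [0.3, 0.4, 0.5]
print(round(effect_size(tutored, untutored), 2))  # 2.0
```

An effect size of 2.0 means the tutored group's mean gain lies two standard deviations above the untutored group's, the top of the chart's scale.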

A widely held belief: human tutors are much more effective than computer tutors.
[Figure: the same effect-size chart, citing Anderson et al. (1995), VanLehn et al. (2005), and Bloom (1984).]

Common belief: the finer the granularity, the more effective the tutoring.
[Figure: effect size (0 to 2) by interaction granularity: assignment, answer, step, sub-step, human. Annotations: CAI is answer-based tutoring; most ITS are step-based tutoring.]

Granularity of tutoring: number of inferences (→) between interactions
  Answer-based tutoring (CAI): problem → → → … → → Student enters answer
  Step-based tutoring (ITS with ordinary GUI): problem → → → → Student enters step → → → → Student enters step → → → Student enters last step
  Human tutoring: problem → Student utters reasoning → → Student utters reasoning → Student utters reasoning → → Student utters answer


Granularity of tutorial interaction, with a human-tutoring variant in which the student also enters steps (answer-based and step-based rows unchanged):
  Human tutoring: problem → Student utters reasoning → → Student utters & enters step → Student utters reasoning → → Student enters last step

Hypothesis: the smaller the grain size of interaction, the more effective the tutoring, because:
  Negative feedback is more effective: the shorter the chain of inferences, the easier it is to find the mistake in it.
  Hinting and prompting are more effective: the shorter the chain of inferences, the easier it is to infer them from a hint or prompt.
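The negative-feedback half of this hypothesis can be illustrated with simple arithmetic. In the hypothetical Python sketch below, a solution is a chain of 12 inferences, one of which is faulty. After negative feedback, the student must search the whole run of inferences since the last interaction. The chain length and interaction points are invented for illustration.

```python
from typing import List


def segments(num_inferences: int, interaction_points: List[int]) -> List[range]:
    """Split inferences 0..num_inferences-1 into the runs between interactions.

    interaction_points: indices of inferences after which the student enters
    something the tutor can respond to; the final inference always ends with one.
    """
    bounds = sorted(set(interaction_points) | {num_inferences - 1})
    runs, start = [], 0
    for end in bounds:
        runs.append(range(start, end + 1))
        start = end + 1
    return runs


def search_space(num_inferences: int, interaction_points: List[int], error_at: int) -> int:
    """After negative feedback, how many inferences might contain the mistake?"""
    for run in segments(num_inferences, interaction_points):
        if error_at in run:
            return len(run)
    raise ValueError("error_at is outside the chain")


n = 12  # hypothetical chain length
print(search_space(n, [], 5))              # answer-based: 12
print(search_space(n, [3, 7], 5))          # step-based: 4
print(search_space(n, list(range(n)), 5))  # every inference voiced: 1
```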

Evidence for an interaction plateau: 2 studies from my lab, 3 studies from other labs, and a meta-analysis.

Dialogue and text have the same content
Dialogue of Andes-Atlas:
  T: Here are a few things to keep in mind when computing the acceleration vector for a body at rest. Acceleration is change in what over time?
  S: velocity
  T: Right. If the velocity is not changing, what is the magnitude of the acceleration?
  S: zero
  T: Sounds good. ...
Text of Andes:
  Here are a few things to keep in mind when calculating acceleration for a body at rest. Acceleration is change in velocity over time. If velocity is not changing, then there is zero acceleration. ...

Results comparing Andes-Atlas to Andes
  Study 1: Andes-Atlas > Andes, but content was not controlled properly
  Study 2 (N=26): Andes-Atlas ≈ Andes (p > .10)
  Study 3 (N=21): Andes-Atlas < Andes (p < .10, d = 0.34)
  Study 4 (N=12): Andes-Atlas ≈ Andes (p > .10)
Conclusion: sub-step tutoring is not more effective than step-based tutoring.

The WHY2 studies (VanLehn, Graesser et al., 2007, Cognitive Science)
5 conditions:
  Human tutors
  Sub-step-based tutoring systems: Why2-Atlas; Why2-AutoTutor (Graesser et al.)
  Step-based tutoring system
  Text
Procedure: pretraining, pre-test, training (~4 to 8 hours), post-test.

User interface for human tutoring and Why2-Atlas [screenshot labels: dialogue history, problem, student's essay, student's turn in the dialogue]

Why2-AutoTutor user interface [screenshot labels: tutor, task, dialogue history, student types response]

The only difference between the tutoring conditions was the contents of the yellow box.
[Flowchart: Tutor poses a WHY question; Student response → analyzed as steps; Step is incorrect or missing → yellow box; Tutor congratulates.]

Human tutoring: when a step is incorrect or missing, the yellow box is a dialogue consisting of hints, analogies, and references to the dialogue history.

Why2-Atlas: when a step is incorrect or missing, the yellow box is a knowledge construction dialogue.

Why2-AutoTutor: when a step is incorrect or missing, the yellow box is a hint, prompt, or assertion.

A step-based tutor: a text explanation with the same content. When a step is incorrect or missing, the yellow box is text (the Why2-Atlas dialogue rewritten as a monologue).

Experiments 1 & 2: adjusted post-test scores
[Figure: adjusted post-test scores (0 to 1) for read textbook (no tutor), step-based tutor, AutoTutor (sub-step-based), Atlas (sub-step-based), and human tutoring, annotated "No significant differences".]

Results from all 7 experiments
  Human tutoring = sub-step-based tutoring systems = step-based tutoring system
  Tutors > textbook (no tutoring)
  Atlas (symbolic NLP) = AutoTutor (statistical NLP)

Evens & Michael (2006) also show human tutoring = sub-step-based tutoring = step-based tutoring.
[Figure: mean gain (0 to 6) for reading a text (1993, 1999, 2002; no tutoring), Circsim step-based tutor (1999), Circsim-Tutor sub-step-based (1999 and 2002), and expert human tutors (1993 and 1999), annotated "No significant differences".]

Reif & Scott (1999) also show human tutors = step-based tutoring.
[Figure: scores (0 to 100) for no tutoring, step-based tutoring, and human tutoring, annotated "No significant differences".]

Katz, Connelly & Allbritton (2003), post-practice reflection: human tutoring = step-based tutoring.
[Figure: scores (0 to 0.35) for no tutoring, step-based tutoring, and human tutoring, annotated "No significant differences".]

Meta-analytic results for all possible pairwise comparisons (VanLehn, 2011)
  Tutoring type    vs. other tutoring type   Num. of effects   Mean effect   % reliable
  Answer-based     no tutoring               165               0.31          40%
  Step-based       no tutoring               28                0.76          68%
  Sub-step-based   no tutoring               26                0.40          54%
  Human            no tutoring               10                0.79          80%
  Step-based       answer-based              2                 0.40          50%
  Sub-step-based   answer-based              6                 0.32          33%
  Human            answer-based              1                 -0.04         0%
  Sub-step-based   step-based                11                0.16          0%
  Human            step-based                10                0.21          30%
  Human            sub-step-based            5                 -0.12         0%

Graph of comparisons of 4 tutoring types vs. no tutoring
[Figure: effect size (-0.5 to 2) for no tutoring, answer-based (CAI), step-based (ITS), sub-step-based (ITS with NL), and human tutoring.]

Graphing all 10 comparisons: no tutor < CAI < ITS = ITS with NL = human
[Figure: effect sizes (-0.5 to 2) for each tutoring type, plotted as separate series vs. no tutoring, vs. answer-based, vs. step-based, and vs. sub-step-based.]

Graph of comparisons of 4 tutoring types vs. no tutoring: expected vs. observed
[Figure: effect size (-0.5 to 2) for no tutoring, answer-based (CAI), step-based (ITS), sub-step-based (ITS with NL), and human tutoring, with "expected" and "observed" series.]

The interaction plateau hypothesis
  The smaller the grain size of interaction, the more effective the tutoring: assignments < answers < steps.
  But grain sizes smaller than steps are no more effective than steps: steps = sub-steps = human.

Limitations and caveats
  Task domain: must allow computer tutoring; only STEM, not language, music, sports, ...
  Normal learners: not learning disabled; prerequisite knowledge mastered
  Human tutors must teach the same content as computer tutors: only the type of tutoring (human, ITS, CAI) varies
  One-on-one tutoring

Outline, next: How to build a step-based tutor

Main modules of a non-adaptive step-based tutoring system: the student interface, the step analyzer, and the feedback and hint generator, connected in a step loop.
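A minimal Python sketch of the step loop, assuming steps are short strings that can be compared exactly (the "trivial" case) and may be entered in any order. The function names mirror the module names but are hypothetical, and the hint wording stands in for a real feedback and hint generator.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Analysis:
    correct: bool
    target: Optional[str]  # ideal step used for hinting when the entry is wrong


def analyze_step(entry: str, ideal_steps: List[str], done: Set[str]) -> Analysis:
    """Step analyzer: accept the entry if it matches any ideal step not yet done."""
    remaining = [s for s in ideal_steps if s not in done]
    if entry in remaining:
        return Analysis(True, entry)
    return Analysis(False, remaining[0] if remaining else None)


def feedback(analysis: Analysis) -> str:
    """Feedback and hint generator (placeholder wording)."""
    if analysis.correct:
        return "Correct."
    return f"Not quite. Hint: work toward '{analysis.target}'."


def step_loop(ideal_steps: List[str], entries: List[str]) -> List[str]:
    """Run the step loop over entries coming from the student interface."""
    done: Set[str] = set()
    responses = []
    for entry in entries:
        analysis = analyze_step(entry, ideal_steps, done)
        if analysis.correct:
            done.add(entry)
        responses.append(feedback(analysis))
        if done == set(ideal_steps):
            break  # problem solved
    return responses


steps = ["F = 10 N", "a = F/m", "a = 2 m/s^2"]
for line in step_loop(steps, ["F = 10 N", "a = 3 m/s^2", "a = F/m", "a = 2 m/s^2"]):
    print(line)
```

Feedback arrives after every entry rather than after a final answer. That per-step response is what makes the tutor step-based.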

Main modules of an adaptive step-based tutoring system: the same step loop (student interface, step analyzer, feedback and hint generator) runs inside a task loop, which adds a task selector and an assessor (containing the learner model).
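The slide does not say which learner model the assessor uses. As one illustrative possibility, the sketch below swaps in Bayesian Knowledge Tracing (BKT), a widely used per-knowledge-component model, with hypothetical parameter values. The task selector simply picks the unsolved task whose knowledge components are least mastered, and the `solve` callback (hypothetical) stands in for running the step loop on a task.

```python
from typing import Callable, Dict, List, Optional, Tuple


def bkt_update(p_known: float, correct: bool,
               slip: float = 0.1, guess: float = 0.2, learn: float = 0.15) -> float:
    """One BKT step: Bayesian update on the observed step, then a learning transition."""
    if correct:
        evidence = p_known * (1 - slip)
        posterior = evidence / (evidence + (1 - p_known) * guess)
    else:
        evidence = p_known * slip
        posterior = evidence / (evidence + (1 - p_known) * (1 - guess))
    return posterior + (1 - posterior) * learn


class Assessor:
    """Holds the learner model: P(known) for each knowledge component (KC)."""

    def __init__(self, kcs: List[str], prior: float = 0.3):
        self.p = {kc: prior for kc in kcs}

    def observe(self, kc: str, correct: bool) -> None:
        self.p[kc] = bkt_update(self.p[kc], correct)

    def all_mastered(self, threshold: float = 0.95) -> bool:
        return all(v >= threshold for v in self.p.values())


def select_task(tasks: Dict[str, List[str]], assessor: Assessor,
                solved: List[str]) -> Optional[str]:
    """Task selector: unsolved task whose KCs have the lowest mean P(known)."""
    candidates = [t for t in tasks if t not in solved]
    if not candidates:
        return None
    return min(candidates,
               key=lambda t: sum(assessor.p[kc] for kc in tasks[t]) / len(tasks[t]))


def task_loop(tasks: Dict[str, List[str]], assessor: Assessor,
              solve: Callable[[str], List[Tuple[str, bool]]]) -> List[str]:
    """Outer loop: select a task, run it, update the learner model, repeat."""
    solved: List[str] = []
    while not assessor.all_mastered():
        task = select_task(tasks, assessor, solved)
        if task is None:
            break  # ran out of tasks before mastery
        for kc, correct in solve(task):  # one (KC, correct?) observation per step
            assessor.observe(kc, correct)
        solved.append(task)
    return solved


tasks = {"p1": ["units"], "p2": ["units"], "p3": ["units"], "p4": ["units"]}
assessor = Assessor(["units"])
print(task_loop(tasks, assessor, lambda t: [("units", True)]))  # ['p1', 'p2', 'p3']
```

The per-KC probabilities in `Assessor.p` are the kind of estimate a skillometer can display.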

Main types of step analyzers
Three main methods for generating ideal steps:
  Model tracing: one expert system that can solve all problems in all ways
  Example tracing: for each problem, all acceptable solutions
  Constraint-based: example + recognizers of bad steps + recognizers of steps equivalent to the example's steps
Comparing student and ideal steps:
  Trivial if steps are menu choices, numbers, or short texts
  Harder if steps are math, logic, chemistry, or programming
  Use statistical NLP for essays and long explanations
  Use probabilistic everything for gestures
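A minimal sketch of example tracing for the "harder" case where steps are math. Each acceptable solution is stored as a list of (quantity, expression) steps in Python syntax. Two expressions are judged equivalent by evaluating both at random points, a cheap heuristic swapped in here for a computer-algebra system. All names and the sample problem are hypothetical, and `eval` must never be applied to untrusted input in a real tutor.

```python
import math
import random
from typing import List, Tuple

Step = Tuple[str, str]  # (quantity, expression), e.g. ("ke1", "0.5*m*v1**2")


def equivalent(expr_a: str, expr_b: str, variables: List[str],
               trials: int = 20, tol: float = 1e-9) -> bool:
    """Heuristic: equal values at random points => probably equivalent."""
    rng = random.Random(0)  # fixed seed so results are reproducible
    for _ in range(trials):
        env = {"__builtins__": {}, "sqrt": math.sqrt}
        env.update({v: rng.uniform(1.0, 10.0) for v in variables})
        if not math.isclose(eval(expr_a, env), eval(expr_b, env),
                            rel_tol=tol, abs_tol=tol):
            return False
    return True


def matches(entry: Step, ideal: Step, variables: List[str]) -> bool:
    return entry[0] == ideal[0] and equivalent(entry[1], ideal[1], variables)


def trace_step(entry: Step, history: List[Step],
               solutions: List[List[Step]], variables: List[str]) -> bool:
    """Accept the entry if some acceptable solution contains every earlier
    student step and also contains a step equivalent to this entry."""
    for solution in solutions:
        consistent = all(any(matches(h, s, variables) for s in solution)
                         for h in history)
        if consistent and any(matches(entry, s, variables) for s in solution):
            return True
    return False


# Hypothetical problem with two acceptable solutions.
variables = ["m", "v1", "h", "g"]
solutions = [
    [("ke1", "0.5*m*v1**2"), ("pe1", "m*g*h")],
    [("v1", "sqrt(2*g*h)"), ("ke1", "m*g*h")],
]
print(trace_step(("ke1", "m*v1**2/2"), [], solutions, variables))  # True
print(trace_step(("ke1", "m*v1**2"), [], solutions, variables))    # False
```

Because variables are sampled independently, expressions that are equal only after substituting other steps (such as `0.5*m*v1**2` and `m*g*h`) are not recognized as equivalent. Handling such cases is part of what makes comparing math steps harder.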

Outline, next: Increasing their effectiveness

The details can make a huge difference. How can we get them right? (Such comparisons are called A/B testing in the game industry.)
Example: during example-based tutoring, when should the tutor tell the student an inference, and when should it elicit the inference from the student?
Can machine-learned policies improve the tell vs. elicit decision? (Min Chi's Ph.D. thesis)

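Tell vs. elicit: two examples
[Figure: two decision points, each with a Tell and an Elicit option. First: student responses shown are "Definition of Kinetic Energy" or "Other answer". Second: student responses shown are "ke1=(1/2)*m*v1^2" or "Other answer".]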

5-stage procedure
  Stage 1: Study: 64 students using a random policy.
  Stage 2: Calculate a sub-optimal policy.
  Stage 3: Study: 37 students using the sub-optimal policy.
  Stage 4: Calculate enhancing and diminishing policies.
  Stage 5: Study: 29 students using the enhancing policy vs. 28 students using the diminishing policy.
The diminishing policy is calculated to decrease learning; the other policies are calculated to increase learning.

Calculated policies are composed of many rules, such as:
  If the problem is difficult
  and the last tutor action was tell
  and student performance is high
  and duration since last mention of the current principle 50 sec
  then elicit.
The machine learner selected the features on the left-hand side of each rule from 50 possible features defined by humans.
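One way to represent such an induced policy is as an ordered list of condition/action rules with a fallback. The Python sketch below encodes the rule quoted above. The comparison between the duration feature and 50 seconds did not survive transcription, so the duration test is a caller-supplied predicate. The feature encodings, the first-match semantics, and the default action are assumptions, not details from Min Chi's thesis.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class TutorState:
    problem_difficulty: str        # discretized, e.g. "difficult" or "easy"
    last_tutor_action: str         # "tell" or "elicit"
    student_performance: str       # discretized, e.g. "high" or "low"
    secs_since_principle: float    # time since the current principle was last mentioned


Rule = Tuple[Callable[[TutorState], bool], str]


def slide_rule(duration_test: Callable[[float], bool]) -> Rule:
    """The example rule; duration_test supplies the lost comparison with 50 s."""
    def condition(s: TutorState) -> bool:
        return (s.problem_difficulty == "difficult"
                and s.last_tutor_action == "tell"
                and s.student_performance == "high"
                and duration_test(s.secs_since_principle))
    return condition, "elicit"


def decide(policy: List[Rule], state: TutorState, default: str = "tell") -> str:
    """First matching rule wins; otherwise fall back to a (hypothetical) default."""
    for condition, action in policy:
        if condition(state):
            return action
    return default
```

For example, `decide([slide_rule(lambda secs: secs >= 50)], state)` would read the lost operator as "at least"; that choice is purely illustrative.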

Results (NLG = normalized learning gain)
[Figure: pretest, posttest, and NLG scores (0 to 0.70) for the Enhancing, Sub-optimal, Exploratory, and Diminishing policies, annotated p = 0.02, p = 0.77, and p < 0.001.]
Enhancing > everything else, which were about the same.
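The slide does not define the normalized learning gain. The definition commonly used in this line of work, and assumed here, is the fraction of the possible improvement actually achieved: NLG = (posttest - pretest) / (1 - pretest), for scores scaled to 0..1. A minimal Python sketch with invented scores:

```python
from statistics import mean


def normalized_learning_gain(pre: float, post: float, max_score: float = 1.0) -> float:
    """(post - pre) / (max_score - pre): share of the possible improvement achieved."""
    if pre >= max_score:
        raise ValueError("NLG is undefined when the pretest is already at ceiling")
    return (post - pre) / (max_score - pre)


# Hypothetical (pretest, posttest) pairs for one condition, scores scaled to 0..1.
scores = [(0.40, 0.70), (0.50, 0.75), (0.20, 0.60)]
print(round(mean(normalized_learning_gain(pre, post) for pre, post in scores), 2))  # 0.5
```

Averaging per-student gains, as here, is one choice; computing the gain from the group's mean pretest and posttest is another, and the two can differ.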

Conclusions from Min Chi's thesis
  Details do matter, e.g., the tell vs. elicit decision.
  Improved policies for tell vs. elicit can be induced from modest amounts of data (103 students).
  Induced policies can have a large effect on learning gains (d = 0.8).
  Developers should do many such A/B studies.

Overall conclusion: we need to use more step-based tutors. (Same availability table as above.)

Outline, next: Flame

Why are there so few step-based tutoring systems?
  K-12 curriculum and standardized tests have evolved to favor answer-based tasks.
  K-12 instructors do not view homework as the problem area; it's classroom time that concerns them.
  Instructors need to share knowledge, policies, and authority with a tutoring system.

Why are competency-gated tutoring systems so rare?
  Schools are time-gated, not competency-gated.
  Deadlines are difficult to enforce.
  Grading based on time-to-mastery may be pointless and harmful.

Recommendations for instructors
Use a competency-gated tutoring system:
  Flip: videos/reading at home, exercises in class
  Half group work (paper?) and half individual work (tutor)
  Noisy study halls instead of lecture halls
  Deadlines and exams for the core; badges for enrichment
Use a step-based tutoring system:
  Buy one if you can
  If you build one, use example tracing first
  If you will use it repeatedly, plan on A/B testing

Recommendations for parents
  Human tutors ≈ step-based tutoring systems.
  If you can do the task, then you can tutor the task.
  Do not lecture or demo! Be step-based.

Thank you!

Bibliography (all papers available from public.asu.edu/~kvanlehn)
The meta-analysis: VanLehn, K. (2011). The relative effectiveness of human tutoring, intelligent tutoring systems and other tutoring systems. Educational Psychologist, 46(4), 197-221.
Why2 experiments: VanLehn, K., Graesser, A. C., Jackson, G. T., Jordan, P., Olney, A., & Rose, C. P. (2007). When are tutorial dialogues more effective than reading? Cognitive Science, 31(1), 3-62.
Andes, the physics tutor: VanLehn, K., Lynch, C., Schultz, K., Shapiro, J. A., Shelby, R. H., Taylor, L., et al. (2005). The Andes physics tutoring system: Lessons learned. International Journal of Artificial Intelligence and Education, 15(3), 147-204.
Andes-Cordillera study: in preparation.

Andes-Atlas studies: Siler, S., Rose, C. P., Frost, T., VanLehn, K., & Koehler, P. (2002, June). Evaluating knowledge construction dialogues (KCDs) versus minilesson within Andes2 and alone. Paper presented at the Workshop on Dialogue-Based Tutoring at ITS 2002, Biarritz, France.
Machine learning of Cordillera policies: Chi, M., VanLehn, K., Litman, D., & Jordan, P. (2011). Empirically evaluating the application of reinforcement learning to the induction of effective and adaptive pedagogical strategies. User Modeling and User-Adapted Interaction, 21(1-2), 99-135.
Teaching a meta-cognitive strategy (MEA): Chi, M., & VanLehn, K. (2010). Meta-cognitive strategy instruction in intelligent tutoring systems: How, when and why. Journal of Educational Technology and Society, 13(1), 25-39.