Modeling Dialogue Building Highly Responsive Conversational Agents


Modeling Dialogue Building Highly Responsive Conversational Agents ESSLLI 2016 David Schlangen, Stefan Kopp with Sören Klett CITEC // Bielefeld University

Who we are: Stefan Kopp, Professor for Computer Science, Faculty of Technology, Uni. Bielefeld (stefan.kopp@uni-bielefeld.de). Head of the research group Social Cognitive Systems at CITEC, U. Bielefeld. Research interests: understanding social minds and their interaction; adaptive and responsive conversational agents; multimodal communication. http://scs.techfak.uni-bielefeld.de

Who we are: Sören Klett, Ph.D. student in the Social Cognitive Systems group at Uni. Bielefeld (sklett@techfak.uni-bielefeld.de). Research on user-adaptive decision-making in dialogue systems. Developed and prepared the toolkit you will be using in this course; here to provide technical support.

Who we are: David Schlangen, Professor for Applied Computational Linguistics, Uni Bielefeld (david.schlangen@uni-bielefeld.de). Leads the Dialogue Systems Group at Bielefeld / CITEC. Research interests: understanding understanding; highly responsive dialogue systems / incremental processing; grounded semantics. http://www.dsg-bielefeld.de

Who are you? Show of hands: undergrad, master, post-grad, beyond? Familiarity with dialogue theory? Timo & Arne's class in week 1? Experience with building dialogue systems / conversational agents?

Responsive Agents. Working definition:
- are responsive to the needs of the dialogue partner(s), at all times
- minimize the time between event and response

Traditional approach:
- only optimize coherence between event and response
- event and response are full speech acts

The status quo: non-incremental processing. [Figure: timeline of a User turn and a System turn, with 750 ms of silence]
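
A minimal sketch of the non-incremental status quo on the slide above, assuming that the 750 ms figure is a silence threshold for end-of-turn detection: the system may only start to respond once the user has been silent for that long. The function name, the 10 ms frame size and the boolean voice-activity input are illustrative assumptions, not part of the course material.

```python
from typing import List, Optional


def end_of_turn_frame(is_speech: List[bool],
                      frame_ms: int = 10,
                      silence_ms: int = 750) -> Optional[int]:
    """Return the index of the frame at which a silence-threshold
    endpointer would declare the user's turn finished, or None."""
    frames_needed = silence_ms // frame_ms
    silent_run = 0
    heard_speech = False
    for i, speech in enumerate(is_speech):
        if speech:
            heard_speech = True
            silent_run = 0      # any speech resets the silence counter
        elif heard_speech:
            silent_run += 1
            if silent_run >= frames_needed:
                return i        # threshold reached: only now may the system respond
    return None


# Hypothetical input: 1 s of speech (frames 0..99) followed by 1 s of silence.
frames = [True] * 100 + [False] * 100
decision = end_of_turn_frame(frames)
latency_ms = (decision - 99) * 10   # time between end of speech and decision
print(latency_ms)                   # 750
```

Under this assumption, every response is delayed by at least the full threshold, however early the system could have understood the utterance.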

[Figure: timelines of contributions by speakers A and B over time t]

Responsive Agents. Working definition:
- responsive to the needs of the dialogue partner(s)
- minimize the time between event and response
Questions:
- why? how?
- what needs?
- what type of events?
- which types of responses?
- who / what creates these events?
- does an event have to have occurred to respond to it?
- what are the optimization criteria?

Overview of Course
- Day 1: Motivation, Phenomena, State of the Art
- Day 2: Technical Challenges, Approaches
- Day 3: Introduction to Task & Technical Framework
- Day 4: Hands-On Exercises
- Day 5: Reports, Discussion

Day 1: Motivation, Phenomena, Theoretical Terms

Overview of Day 1 What does responsiveness mean here? What do people do in dialogues? Dialogue as coordinated, joint action / as process. Grounding, Turn-Taking, etc. State of the art in responsive conversational agents

Example Datum: Pentomino/Noise Corpus, 2006 (Fernández & Schlangen 2006; Zarrieß et al., LREC 2016). 3:05 to 5:02 in 20161123_run1_pento, using the wonderful ELAN annotation tool (https://tla.mpi.nl/tools/tla-tools/elan/).

In what sense responsive to the needs of the partner? Orderly sequence of contributions? [Figure: timeline of alternating A and B contributions]

P: so basically okay draw your eye from the bottom of the backwards L?   [reference in installments]
E: yeah? okay?
P: go to the left the first square you come to?
E: yeah? okay! alright I got it.
P: that's where the bottom of the long twin-tower piece goes.
E: okay   [levels of understanding]
E: alright I got it yeah I'm putting it in there right now
E: it is in there.
P: good   [acknowledgement of acknowledgement]

P: there is the straight line from the top down?
E: yeah
P: fit it all the way to the bottom and it should be:
E: ehm pff oh I have to flip it then   [interruption, realises own misunderstanding]
P: then you must flip it yeah
E: yeah
P: right so the angle would be eh pointing I guess to the
E: okay I got that..
P: the open part you got that? now then
E: wait i'm sticking it in there right now
P: okay okay

P: (and then it + the top of the T) fits (into: + next to) the first piece   [self correction]
P: where the L is the backwards L   [lack of uptake, expansion]
E: the top of the T fits next to the first piece?
P: yeah
P: first piece that you put in was the backwards L?
E: all the way on the bottom right?
P: yeah yeah
P: and then the top of the T fits into lets say the lap of the L
E: eh unfortunately not.
P: no?
E: <laughter/> no! it will overlap with the first piece.   [laughter events]
P: okay.

A second example (Kimbara 2007, U. Chicago): multimodal co-completion

Observations
- reference in installments
- signal level of understanding
- (invited?) interruption; continuation
- self corrections (= self interruption)
- expand until successful
- completion by partner
But why do people do that, and why should we model that in practical systems?

Spoken Dialogue
- Uses an evanescent medium.
- Consists of spontaneously and autonomously produced contributions.
- Participants want to understand and be understood.
- They need to coordinate what they are doing.

Herb Clark (Clark, 1996) synthesising much of what was originally researched in the field of conversation analysis (Sacks, Schegloff, Jefferson & others, 1960s ff)

Dialogue as joint process: from dialogue as exchange of propositions to dialogue as joint process aimed at creating mutual understanding about joint projects.
- joint action in dialogue
- temporal coordination

Coordinating a joint process: what needs to be coordinated here? Beginning / entry, main part, end / exit. (Image: https://www.flickr.com/photos/124247024@n07/13903385550, www.flazingo.com)

Coordinating a process [Figure: participants A and B]

Shaking hands
1. extend arms, give hand
2. shake hands
   2.1 grab hand
   2.2 and up and down
   2.3 and release
3. retract hands

Coordinating a process (A, B): what needs to be coordinated, and how?
- beginning / entry: as successor of previous action sequence
- main part: who's doing what?
- end / exit: when to stop

Coordinating a process (A, B): coordination devices
- one party leads (e.g., dancing)
- external beat (e.g., dancing, playing music)
- convention (e.g., shaking hands)
- predictability (e.g., language?)

Dialogue as a process (A, B)
- greetings
- goodbyes
- stories, arguments, pieces of a larger task, ...
- exchanges, adjacency pairs
- turns

P: so basically okay draw your eye from the bottom of the backwards L?
E: yeah? okay?
P: go to the left the first square you come to?
E: yeah? okay! alright I got it.
P: that's where the bottom of the long twin-tower piece goes.
P: (and then it + the top of the T) fits (into: + next to) the first piece
P: where the L is the backwards L

H. Clark's Grounding Model (Clark 1996; Clark & Wilkes-Gibbs 1986)
Speaker -- Hearer:
- propose j project -- consider proposal
- signal p -- recognize p
- present signal -- identify signal
- execute behaviour -- attend to behaviour

She is getting the elevator to come
She is calling the elevator
She is activating the up button
She is pressing the up button
She is pressing her finger against the up button

She is getting the elevator to come
She is calling the elevator
She is activating the call button
She is pressing the call button
She is pressing her finger against the call button
"Upwards Completion: In a ladder of actions, it is only possible to complete an action from the bottom level up through any level in the ladder."
"Downward evidence: In a ladder of actions, evidence that one level is complete is also evidence that all levels below it are complete."

H. Clark's Grounding Model
Speaker -- Hearer:
- propose j project -- consider proposal
- signal p -- recognize p
- present signal -- identify signal
- execute behaviour -- attend to behaviour
"Upwards Completion: In a ladder of actions, it is only possible to complete an action from the bottom level up through any level in the ladder."
"Downward evidence: In a ladder of actions, evidence that one level is complete is also evidence that all levels below it are complete."
"Holistic evidence: Evidence that an agent has succeeded on a whole action is also evidence that the agent has succeeded on each of its parts."
"Principle of joint closure: The participants in a joint action try to establish the mutual belief that they have succeeded well enough for current purposes."

Grounding: Clark's (1996) 4-level model (cf. also Allwood 1995)
Level: Speaker -- Hearer
4: proposal & consideration
3: meaning & understanding
2: presentation & identification
1: execution & attention
- give evidence for understanding on all levels (with downwards entailment)
- types of evidence: continued attention, relevant next contribution, acknowledgement, demonstration, display
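
The two ladder principles quoted earlier (upward completion, downward evidence) can be sketched over these four levels. This is only an illustration: the enum member names are shorthand for the hearer side of each level, and both function names are assumptions, not terminology from Clark (1996).

```python
from enum import IntEnum
from typing import Set


class Level(IntEnum):
    ATTENTION = 1        # execution & attention
    IDENTIFICATION = 2   # presentation & identification
    UNDERSTANDING = 3    # meaning & understanding
    CONSIDERATION = 4    # proposal & consideration


def levels_evidenced(evidence_for: Level) -> Set[Level]:
    """Downward evidence: evidence that one level is complete is also
    evidence that all levels below it are complete."""
    return {level for level in Level if level <= evidence_for}


def can_complete(level: Level, completed: Set[Level]) -> bool:
    """Upward completion: a level can only be completed if every level
    below it in the ladder is already complete."""
    return all(lower in completed for lower in Level if lower < level)


# A relevant answer to a question is evidence at level 3 and therefore
# also at levels 1 and 2.
print(sorted(levels_evidenced(Level.UNDERSTANDING)))
# Understanding cannot be completed when the signal was never identified.
print(can_complete(Level.UNDERSTANDING, {Level.ATTENTION}))
```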

Conversational tracks
- Track 2: metacommunicative acts; is about Track 1
- Track 1: communicative acts; is about the "official business" of dialogue

Grounding
- Track 2: Do you understand this? --- Yes
- Track 1: "Who came to the party?" --- "Peter." ("official business" of dialogue)

Evidence of success
A: I saw a tiger.
B: Ok [, you saw a tiger.]
A: Ok [, you understood that I saw a tiger.]
B: Ok [, you understood that I understood that you saw a tiger.]
A: Ok [, you understood that I understood that you understood that I saw a tiger.]
B: Ok [, you understood that I understood that you understood that I understood that you saw a tiger.]
A: Ok [, you understood that I understood that you understood that I understood that you understood oh my god is this ever going to stop I am trapped in a recursion someone send help]
B: Ok [, lorem ipsum dolor sit amet or something like this I'm just typing words now]
Well enough for current purposes!

Grounding - Clarification Requests
... or signal non-understanding, and request repair:
Level: Speaker -- Hearer
4: proposal & consideration
3: meaning & understanding
2: presentation & identification
1: execution & attention
[Figure: a problem marker (xx) on the level table]
A: Who came to the party?
B: Which party?

Grounding - Clarification Requests
- frequent: around 5% of utterances in task-oriented dialogues (Purver et al. 2001, Rodríguez & Schlangen 2004)
- multi-dimensional classification in (Schlangen 2004): level of problem, extent, severity

Clarification Requests, Dimension 1: Level of problem
Level: Speaker -- Hearer
4: proposal & consideration
3: meaning & understanding
2: presentation & identification
1: execution & attention
[Figure: a problem marker (xx) on the level table]
A: Who came to the party?
B: Which party?
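
As a rough illustration of the "level of problem" dimension, the sketch below picks a clarification request according to the lowest level at which processing failed. Everything in it is hypothetical: the function names, the example phrasings, and in particular the placement of "Which party?" at level 3, which is an assumption rather than something the slide states.

```python
from typing import Dict, Optional

LEVELS: Dict[int, str] = {
    1: "execution & attention",
    2: "presentation & identification",
    3: "meaning & understanding",
    4: "proposal & consideration",
}

# Hypothetical example phrasings, one per level (not taken from the slides,
# except "Which party?", whose assignment to level 3 is an assumption).
EXAMPLE_REQUESTS: Dict[int, str] = {
    1: "Sorry, were you talking to me?",
    2: "Pardon, what did you say?",
    3: "Which party?",
    4: "Why do you want to know?",
}


def lowest_problem_level(succeeded: Dict[int, bool]) -> Optional[int]:
    """Return the lowest level that did not succeed, or None if all did."""
    for level in sorted(LEVELS):
        if not succeeded.get(level, False):
            return level
    return None


def clarification_request(succeeded: Dict[int, bool]) -> Optional[str]:
    """Choose an example request for the lowest problematic level."""
    level = lowest_problem_level(succeeded)
    return None if level is None else EXAMPLE_REQUESTS[level]


# The hearer attended to and identified the words but cannot resolve "the party".
print(clarification_request({1: True, 2: True, 3: False, 4: False}))
```

Asking about the lowest failing level follows from upward completion: a higher level cannot be completed while a lower one is still open.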

H. Clark's Grounding Model
Speaker -- Hearer:
- propose j project -- consider proposal
- signal p -- recognize p
- present signal -- identify signal
- execute behaviour -- attend to behaviour
"Principle of joint closure: The participants in a joint action try to establish the mutual belief that they have succeeded well enough for current purposes."
Principle of opportunistic closure: Agents consider an action complete just as soon as they have evidence sufficient for current purposes that it is complete.
Principle of repair: When agents detect a problem serious enough to warrant a repair, they try to initiate and repair the problem at the first opportunity after detecting it.

Principle of repair: When agents detect a problem serious enough to warrant a repair, they try to initiate and repair the problem at the first opportunity after detecting it.
P: (and then it + the top of the T) fits (into: + next to) the first piece
P: where the L is the backwards L

Turn-taking: how do participants in a dialogue organise the distribution of the right to speak? [Figure: timeline t with A: "Who came to the party?" followed by B: "Peter."]

Turn-taking: observations to account for
- overlaps are fairly rare in dialogue (less than 5%)
- pauses between turns are very short (around 200 ms), shorter than motor-planning of a new utterance!
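
Gaps and overlaps of this kind can be measured from time-aligned turns. The sketch below is an assumption-laden illustration: the tuple layout, the function name and the sample timings are made up for the example and are not taken from any corpus mentioned in the course.

```python
from typing import List, Tuple

Turn = Tuple[str, float, float]   # (speaker, start in seconds, end in seconds)


def transition_offsets_ms(turns: List[Turn]) -> List[int]:
    """For each speaker change, return the next turn's start minus the
    previous turn's end, in milliseconds.
    Positive values are gaps, negative values are overlaps."""
    offsets = []
    for previous, following in zip(turns, turns[1:]):
        if previous[0] != following[0]:
            offsets.append(round((following[1] - previous[2]) * 1000))
    return offsets


# Made-up timings: B starts 200 ms after A ends; A then starts 100 ms
# before B has finished.
turns = [("A", 0.0, 1.2), ("B", 1.4, 2.0), ("A", 1.9, 3.0)]
print(transition_offsets_ms(turns))   # [200, -100]
```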

Turn-taking: Sacks et al. model (1974)
At each transition-relevant-point (TRP) of each turn, the following holds:
1. If during this turn the current speaker has selected A as the next speaker, then A must speak next.
2. If the current speaker does not select the next speaker, any other speaker may take the next turn.
3. If no one else takes the next turn, the current speaker may take the next turn.
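
The three rules can be read as an ordered decision procedure applied at each TRP. The following is a simplified sketch, not the model itself: the function and parameter names are assumptions, rule 2 is resolved by letting the first other speaker to start take the turn, and the "may" in rule 3 is treated as "does".

```python
from typing import List, Optional


def next_speaker(current: str,
                 selected: Optional[str],
                 self_selectors: List[str]) -> str:
    """Apply the three rules at one transition-relevant point (TRP).

    current        -- the speaker holding the turn
    selected       -- the speaker chosen by `current`, if any (rule 1)
    self_selectors -- speakers who start to talk, in order of onset (rule 2)
    """
    if selected is not None:
        return selected              # rule 1: the selected speaker must speak next
    for candidate in self_selectors:
        if candidate != current:
            return candidate         # rule 2: another speaker self-selects
    return current                   # rule 3: the current speaker continues


print(next_speaker("A", "B", ["C"]))        # B
print(next_speaker("A", None, ["C", "B"]))  # C
print(next_speaker("A", None, []))          # A
```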

Turn-taking: selection, how? By asking a question, making a suggestion, etc. --> adjacency pairs
A: Who came to the party?
B: <silence>
A: What's up? Did I say something wrong?

Turn-taking
- Model is projective, i.e. the utterance itself indicates whether a TRP is coming up, and whether the other speaker is selected; not a "signal-reaction" model
- can explain "significant silence"
- Although turn-taking works exactly the same way in non-visual modalities (on the phone), if visual info is there, then gaze etc. give additional indications.

Turn-taking
- holds only for "track-1" contributions: backchannels systematically overlap!
- rules can be broken: competition for getting the floor, upgrading, shouting matches...

H. Clark's Grounding Model & turn taking
Speaker -- Hearer:
- propose j project -- consider proposal
- signal p -- recognize p
- present signal -- identify signal
- execute behaviour -- attend to behaviour
Principle of opportunistic closure: Agents consider an action complete just as soon as they have evidence sufficient for current purposes that it is complete.
Principle of repair: When agents detect a problem serious enough to warrant a repair, they try to initiate and repair the problem at the first opportunity after detecting it.
* Only one primary presentation at a time
* If it's your turn, start ASAP.

Our takeaways. Dialogue participants
- try to reach mutual understanding;
- need evidence that they have;
- continuously monitor whether they have reached it and, if necessary, repair ASAP;
- so if you don't react, you risk repair.

Our takeaways. Why ASAP? Life's too short! Responsiveness is built into the fabric of dialogue. Reducing it makes dialogue harder. (Cf. e.g. Brannigon et al. 2011)

Responsive Agents. Working definition:
- are responsive to the needs of the dialogue partner(s), at all times
- minimize the time between event and response
- respond to many more types of events than end of turn
Questions:
- why? Because they optimize mutual understanding.
- how?
- what type of events? Presentation events, understanding events.
- which types of responses? Feedback responses, repair responses.
- who / what creates these events?
- does an event have to have occurred to respond to it?
- what are the optimization criteria?

[Figure: timeline from before 1960 to the 2000s across the disciplines sociology, philosophy, CL / AI, anthropology, psychology, linguistics and speech engineering]
- J.L. Austin, 1955: How to do things with words
- P. Grice, 1957, '69, '75: Logic and Conversation
- T. Schelling, 1960: The Strategy of Conflict
- H. Sacks, E. Schegloff, G. Jefferson, 1960ff.: Conversation Analysis
- J. Searle, 1969: Speech Acts
- D. Lewis, 1969: Convention
- H. Clark, 1978ff.: Joint Action Theory
- B. Grosz, C. Sidner, J. Allen, et al.: Communication & Planning
- mid '80s: Discourse Structure (DRT, RST, SDRT, D-TAG, ...)
- mid '90s: Formal Semantics / Pragmatics of Dialogue (SDRT, KOS, ...)
- gestures, cultural (in)variants
- eye tracking, visual world paradigm; mechanistic theories of dialogue

The NUMBERS system: fast turn-taking. Joint work with Gabriel Skantze (Skantze & Schlangen, EACL 2009)

The NUMBERS system: fast turn-taking
- user dictates a string of digits to the system
- system tries to ground its understanding, as quickly as possible
- processing based on the IU-model: minimal units trigger updates; processors implement update functions
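
The idea that minimal units trigger updates in a chain of processors can be illustrated with a toy pipeline for dictated digits. This is a sketch under stated assumptions, not the NUMBERS implementation and not the API of the course toolkit: the `Processor` class, its `receive` method and the feedback wording are all hypothetical.

```python
from typing import Callable, List, Optional

WORD_TO_DIGIT = {"zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
                 "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9"}


class Processor:
    """Applies an update function to every incoming unit and passes any
    resulting unit on to its listeners immediately."""

    def __init__(self, update: Callable[[str], Optional[str]]) -> None:
        self.update = update
        self.listeners: List["Processor"] = []
        self.output: List[str] = []

    def receive(self, unit: str) -> None:
        result = self.update(unit)
        if result is not None:          # the unit led to a new output unit
            self.output.append(result)
            for listener in self.listeners:
                listener.receive(result)


# Word units become digit units; each digit unit triggers grounding feedback.
understanding = Processor(lambda word: WORD_TO_DIGIT.get(word))
feedback = Processor(lambda digit: f"heard {digit}")
understanding.listeners.append(feedback)

# Words arrive one at a time, so feedback is available before the
# utterance is finished; the filler "uh" produces no update.
for word in "three five uh nine".split():
    understanding.receive(word)

print(understanding.output)   # ['3', '5', '9']
print(feedback.output)        # ['heard 3', 'heard 5', 'heard 9']
```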

the numbers system

The PENTO-10 system: fast turn-taking, immediate execution. Joint work with Okko Buß (Buß et al., SIGdial 2010; semdial 2010, 2011)

Pentomino System
U: delete the blue cross
S: which piece?
U: top right.
S: ok?
U: right, now take the yellow [one]...
S: yes?
U: ... and turn it...
S: yes?
U: ... to the left
S: ok.
U: now flip the stairs...
S: ok
U: horizontally
U: that's right
U: erm now delete the red [one]
S: *wh-*
U: bottom right
U: correct.

Evaluation
- Faster task completion compared to non-incremental versions of the systems
- Higher subjective ratings ("would use again", "behaves as expected", "natural")
- Not higher task success rate
(Skantze & Schlangen 2009; Buß et al. 2011)

Embodied Conversational Agents: computer interfaces that hold up their end of the conversation, have bodies and know how to use them for conversational behaviors as a function of the demands of dialogue and of emotion, personality, and social convention (Cassell 2000)
Required features:
- Recognize and interpret verbal and nonverbal input behavior
- Generate verbal and nonverbal output behavior
- Process multiple functions of conversational behavior
- Take an active role in dialogue (mixed-initiative)

Embodied Conversational Agents [Figure: timeline of embodied conversational agent systems from the mid-1990s to the late 2000s; legible labels include Steve, MRE, C3IT, SASO-ST, SASO-EN and SGT Blackwell; other labels are truncated]

Virtual Real Estate Agent (Rea), MIT Media Lab (J. Cassell et al.)

Tutoring / communication training: Conversation Coach by MIT (R. Picard et al.)

Information kiosk: Ada & Grace @ Boston Science Museum (ICT)

Personal assistant: Elder Companion Billie (CITEC, U. Bielefeld) [Slide shows an example dialogue between Billie (B) and a user (R) about appointments during the week; the overlaid transcript text is not recoverable]

Questions?

End of Day 1. Tomorrow: Technical Challenges, Background