Situated Generation (Situierte Generierung)

Situated Generation: Introduction (Situierte Generierung: Einführung). Konstantina Garoufi, 18 October 2011

Non-situated language. Context: form and content of discourse (purpose of discourse). Example: "I'm going to a party. Do you want to come?" "OK."

Non-situated language. Example: "Use the wheelpuller to remove the flywheel." Context: form and content of discourse (purpose of discourse)?

Situated language Appelt (1982)

Situated language. Context: form and content of discourse; purpose of discourse; objects of the scene in the visual field; spatial configuration; gestures, gaze; history of interaction; task at hand; ...

Challenges of situated language generation: non-linguistic context, in addition to the linguistic one; interplay between language and action: language can itself bring about changes to the non-linguistic context, e.g. by causing the hearer to perform an action; real-time system performance required.

Outline: What is situated language generation? Challenges: modeling linguistic and non-linguistic context; understanding the interplay between language and action; performing in a dynamic environment in real time. Summary and discussion.

What is context? "Context is what constrains a problem solving without intervening in it explicitly." Brézillon (1999)

Context modeling for language generation: What is the relationship between formalizations of context and natural-language ideas of context? Which phenomena and inferences observed in natural language are context-independent, and which ones always depend on context? How can context-provided constraints that result in conveying additional or different aspects of information be identified automatically?

Linguistic context modeling: the Dial Your Disc (DYD) system. One of the first generation systems with a dedicated context model. Generates spoken monologues about W. A. Mozart's instrumental compositions. van Deemter & Odijk (1997)

How DYD works

Context modeling in DYD: Find a level of representation that is both rich and explicit enough for a system of rules to exploit the information in it for contextually appropriate utterances. Set up a data structure (the context model) and fill it with information. Formulate rules that exploit this data structure.

DYD's context model: Knowledge state: which information has been expressed so far, and when? Topic state: which topics have already been dealt with, and which are still to be considered? Context state: which objects have been introduced, how, and when? Dialogue state: which recordings have been selected so far?
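A minimal Python sketch of such a context model, assuming a plain in-memory representation: the four states become fields of one data structure, and one toy rule (use a pronoun for an object introduced in the immediately preceding utterance) reads from it. The class name, field names, and the rule are hypothetical illustrations, not DYD's actual data structures or rules.

```python
from dataclasses import dataclass, field


@dataclass
class ContextModel:
    """Hypothetical container for the four DYD-style context states."""
    # Knowledge state: fact -> index of the utterance that first expressed it
    knowledge: dict[str, int] = field(default_factory=dict)
    # Topic state: topics already dealt with vs. topics still to be considered
    topics_done: list[str] = field(default_factory=list)
    topics_pending: list[str] = field(default_factory=list)
    # Context state: object -> (expression used to introduce it, utterance index)
    introduced: dict[str, tuple[str, int]] = field(default_factory=dict)
    # Dialogue state: recordings selected so far
    recordings: list[str] = field(default_factory=list)
    utterances: int = 0

    def record_utterance(self, facts, objects):
        """Update knowledge and context state after one utterance is produced."""
        for fact in facts:
            self.knowledge.setdefault(fact, self.utterances)
        for obj, expression in objects.items():
            self.introduced.setdefault(obj, (expression, self.utterances))
        self.utterances += 1

    def finish_topic(self, topic):
        """Move a topic from 'still to be considered' to 'dealt with'."""
        if topic in self.topics_pending:
            self.topics_pending.remove(topic)
        self.topics_done.append(topic)

    def refer(self, obj, full, pronoun):
        """Toy rule: use the pronoun only if obj was introduced in the previous utterance."""
        entry = self.introduced.get(obj)
        if entry is not None and entry[1] == self.utterances - 1:
            return pronoun
        return full


ctx = ContextModel(topics_pending=["serenades", "concertos"])
ctx.recordings.append("K. 525, first movement")
ctx.record_utterance(["k525_is_serenade"], {"K525": "the serenade K. 525"})
print(ctx.refer("K525", "the serenade K. 525", "it"))  # → it
```

Keeping all four states in one structure means any generation rule can combine them, e.g. checking the topic state before deciding whether a fact in the knowledge state is worth repeating.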

What information does that encompass? Both syntactic and semantic information; some generally required, some system-specific; granularity depends on the application (here: speech generation, so prosody is important).

Has DYD's context model solved all our problems? Consider the following text: "M. Walker will give a presentation later today in the same room as where the opening session was held. He is currently in the coffee room, just around the corner, and he might be an interesting person for setting up a project on ubiquitous computing." Is DYD's context model sufficient here?

Multidimensional context modeling: the Parrot-Talk system. Human agents in the physical world are supported by software agents. Text is generated for output on a wearable device (the "parrot"). Conference center application: parrots search for information and for encounters with other users who share the same interests. Geldof (1999)

Context dimensions in Parrot-Talk: Linguistic: how far back in the discourse have objects been mentioned? Extra-linguistic: temporal (date, time); physical (how close is the target user?); social implicature (what is the target user doing?). User profile: interest in which topics and persons?
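To make the dimensions concrete, here is a hypothetical Python sketch in which content selection for the M. Walker example consults each dimension in turn. The ParrotContext fields, the 50-metre proximity threshold, and the two-utterance pronoun window are assumptions for illustration; Geldof (1999) does not specify them this way.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ParrotContext:
    """Hypothetical bundle of Parrot-Talk-style context dimensions."""
    mentions: dict[str, int]   # linguistic: entity -> utterances since its last mention
    now: datetime              # extra-linguistic, temporal
    distance_m: float          # extra-linguistic, physical: metres to the target user
    target_activity: str       # extra-linguistic, social: what the target user is doing
    interests: set[str]        # user profile


def describe_person(ctx, name, location, topics, talk_time):
    """Toy content selection that consults each context dimension (thresholds assumed)."""
    # Linguistic: pronoun if the person was mentioned within the last two utterances
    ref = "he" if ctx.mentions.get(name, 99) <= 2 else name
    parts = []
    # Temporal: announce the talk only if it still lies in the future
    if talk_time > ctx.now:
        parts.append(f"{ref} will give a presentation at {talk_time:%H:%M}")
        ref = "he"
    # Physical + social: suggest a meeting only if close by and not busy presenting
    if ctx.distance_m < 50 and ctx.target_activity != "presenting":
        parts.append(f"{ref} is currently in the {location}, just around the corner")
        ref = "he"
    # User profile: mention topics the user and the target person share
    shared = sorted(ctx.interests & set(topics))
    if shared:
        parts.append(f"{ref} might be interesting to talk to about {', '.join(shared)}")
    # Capitalise each clause and turn it into a sentence
    return " ".join(p[0].upper() + p[1:] + "." for p in parts)


ctx = ParrotContext(mentions={}, now=datetime(2011, 10, 18, 10, 0), distance_m=20.0,
                    target_activity="coffee break", interests={"ubiquitous computing"})
print(describe_person(ctx, "M. Walker", "coffee room", ["ubiquitous computing"],
                      datetime(2011, 10, 18, 15, 0)))
```

The point of the sketch is that the same person yields different texts as the time, distance, or activity changes, even when the linguistic context stays fixed.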

Multimodal context in GRE: richer notions of multimodal context, with a focus on the generation of referring expressions (GRE): deictic pointing gestures; current focus of attention; three-dimensional salience: linguistic, inherent, and focus space salience. Examples from the accompanying figure (a scene of blocks with pointing gestures and a marked focus space): "this black block"; "the white block"; "that white block to the left of the black one". van der Sluis & Krahmer (2001)
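One way to picture how salience shortens descriptions is a variant of Dale and Reiter's Incremental Algorithm in which only objects at least as salient as the target count as distractors. The Python sketch below assumes a simple weighted sum of the three salience dimensions and ignores pointing gestures and spatial relations; the Obj class, the weights, and the attribute order are hypothetical, and this is an illustration, not the model of van der Sluis & Krahmer (2001).

```python
from dataclasses import dataclass


@dataclass
class Obj:
    name: str
    props: dict[str, str]          # e.g. {"type": "block", "colour": "black"}
    linguistic: float = 0.0        # salience from recent mentions (assumed 0..1 scale)
    inherent: float = 0.0          # salience from perceptual prominence (assumed)
    in_focus: bool = False         # inside the current focus space


def salience(o, weights=(1.0, 1.0, 1.0)):
    """Hypothetical weighted sum of linguistic, inherent, and focus-space salience."""
    wl, wi, wf = weights
    return wl * o.linguistic + wi * o.inherent + wf * (1.0 if o.in_focus else 0.0)


def describe(target, scene, preferred=("colour", "size")):
    """Incremental-algorithm-style selection over salient distractors only.
    Returns None if the listed attributes cannot single out the target."""
    s = salience(target)
    # Only objects at least as salient as the target compete with it
    distractors = [o for o in scene if o is not target and salience(o) >= s]
    # Always include the type, then keep filtering with preferred attributes
    chosen = {"type": target.props["type"]}
    distractors = [o for o in distractors if o.props.get("type") == chosen["type"]]
    for attr in preferred:
        if not distractors:
            break
        value = target.props.get(attr)
        if value is None:
            continue
        remaining = [o for o in distractors if o.props.get(attr) == value]
        if len(remaining) < len(distractors):   # attribute rules something out
            chosen[attr] = value
            distractors = remaining
    if distractors:
        return None
    words = [chosen[a] for a in ("size", "colour") if a in chosen] + [chosen["type"]]
    return "the " + " ".join(words)


black = Obj("b1", {"type": "block", "colour": "black"}, linguistic=0.5, in_focus=True)
white_in = Obj("w1", {"type": "block", "colour": "white"}, in_focus=True)
white_out = Obj("w2", {"type": "block", "colour": "white"})
scene = [black, white_in, white_out]
for o in scene:
    print(o.name, describe(o, scene))
```

For the most salient block the sketch returns "the block", for the white block inside the focus space "the white block", and for the white block outside it None: that is the case where a relational description such as "that white block to the left of the black one" would be needed.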

A few years later: the OSU Quake 2004 corpus of two-party situated problem-solving dialogs. Byron & Fosler-Lussier (2006)
Slide callouts: deictic and exophoric (i.e. situational) reference; language calibrated against the spatial arrangement of the world; perceptual limitations.
Excerpt from the paper (with Figure 6: sketch of recording hardware): For each session, the corpus contains two movies, one recording the virtual-world experience of each partner, a separate audio recording in WAV format, and orthographic transcriptions of the audio. Partners spoke to each other through headset-mounted microphones with enclosed-earcup headphones (Sennheiser HMD280-…); [...] a slight amount of bleed-through of the other speaker's voice into the wrong channel. The video stream going to the leader's computer monitor was also sent to the video input of the digital video camera to be recorded. Therefore, the audio signal and video experience of the person playing the leader role are aligned by virtue of simultaneous recording to the video camera. For the person playing the follower's role, the video track of their experience in the QuakeII world was recorded after the session was completed, using the replay capability available in QuakeII, and once again feeding the video stream from the computer monitor to the video camera. The audio track containing both audio streams was added onto the video record of the follower's experience and manually aligned. In order to confirm that the re-recording of the playback of the follower's experience was accurate, we also replayed the leader's viewpoint and verified that it was identical to that which was captured on the camera.
2.6 Annotation: The dialog recordings have been orthographically transcribed. The transcripts do not show timing information, such as overlapping speech or word alignment with the audio file, but plans are in place to complete an alignment using the Anvil toolkit (Kipp, 2004). Transcription practices for non-words and abandoned utterances used the ICSI meeting corpus guidelines (Janin et al., 2003).
3. Sample data from the corpus: Figure 7 shows a portion of the dialog in session 10. The partners are in a room together, and the leader (dialog lines marked L) is describing the task that must be accomplished to the Follower (marked F). Events external to the dialog are marked with symbols at approximately the point at which they occur. Once the Follower finds the correct trigger …

Outline: What is situated language generation? Challenges: modeling linguistic and non-linguistic context; understanding the interplay between language and action; performing in a dynamic environment in real time. Summary and discussion.

A sister corpus: SCARE. 15 spontaneous English dialogue sessions. Each session records the joint problem-solving of a pair of human partners working through a treasure-hunt style task in a 3D virtual world. Stoia et al. (2008)

The SCARE corpus: the instruction giver (IG) guides the instruction follower (IF) through completing tasks. Figure panels: the IF's view of the world, as displayed on the IG's monitor; the IG's map of the world. Stoia et al. (2008)

Example interaction http://slate.cse.ohio-state.edu/quake-corpora/scare

Example, transcribed: "walk forward and go through the first door you see [pause] and then go through the next one right in front of it [pause] yeah that one [pause] ok [disfluency - w] and then turn to your right [pause] and then hit the button in the middle [pause]"

Example, step by step: "walk forward and go through the first door you see and then go through the next one right in front of it" (navigation); "and then turn to your right and then hit the button in the middle" (referring expression generation)

What is happening here? More than mere referring expression generation! It looks like the IG is manipulating the extralinguistic context of the discourse in a way that allows him to use a linguistic utterance of lower cognitive complexity. How can a generation system model that?
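A hypothetical Python sketch of one possible answer: treat navigation as a way of changing what the hearer sees, and compare the cost of referring right away with the cost of moving first and then referring. Word counts stand in for cognitive complexity, and the visibility table, the costs, and the position names are invented for illustration.

```python
def description_cost(target, visible, colours):
    """Crude proxy for cognitive complexity: words needed to single out target
    among the visible objects (values are assumptions)."""
    others = [o for o in visible if o != target]
    if not others:
        return 2          # "the button"
    if all(colours[o] != colours[target] for o in others):
        return 3          # "the blue button"
    return 8              # needs a relational description (assumed cost)


def plan(target, position, moves, visibility, colours, move_cost=4):
    """Compare referring now with moving first (one step) and referring then.
    moves maps a navigation instruction to the position it leads to.
    Returns (navigation instructions, total cost) for the cheapest option."""
    options = []
    if target in visibility[position]:
        options.append(((), description_cost(target, visibility[position], colours)))
    for instruction, new_position in moves.items():
        if target in visibility[new_position]:
            cost = move_cost + description_cost(target, visibility[new_position], colours)
            options.append(((instruction,), cost))
    if not options:
        raise ValueError("target is not visible from any considered position")
    return min(options, key=lambda option: option[1])


# Hypothetical room: two blue buttons and a red one are visible from the door;
# after turning right, only the target button is in view.
visibility = {"door": {"b1", "b2", "b3"}, "turned_right": {"b1"}}
colours = {"b1": "blue", "b2": "blue", "b3": "red"}
print(plan("b1", "door", {"turn to your right": "turned_right"}, visibility, colours))
```

Here moving first wins (4 + 2 words against 8), mirroring the IG's "turn to your right and then hit the button" strategy; for a target that colour alone distinguishes, referring immediately stays cheaper.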

Leßmann et al. (2006) Virtual Baufix game

Virtual agent Max. Can interpret: natural language instructions; gestures; 3D object manipulations. Can generate: natural language instructions; gestures; 3D object manipulations; prosody, speech; eye gaze; emotional facial expressions.

Cooperative problem solving with Max (Leßmann et al., 2006):
1 User: Let us build a propeller.
2 Max: Okay.
3 Max: First, a bolt needs to be inserted
4 User: Which bolt?
5 Max: in the middle of a bar.
6 Max: Any bolt.
7 User puts the yellow bolt into the first hole of a three-hole bar.
8 Max: No, that was the wrong hole. (Max undoes the action.)
9 Max: Again, a bolt needs to be inserted in the middle hole of a bar.
10 User puts the yellow bolt into the middle hole of a three-hole bar.
11 Max: Alright. Now, insert <pointing gesture> this bolt into another three-hole bar.
12 User: Do it yourself.
13 Max: Okay, I will do it.
14 Max connects the objects as instructed.
15 Max: And now the bars need to be turned <iconic gesture> crosswise.
(A long pause occurs in which the user does not react.)
16 Max: Well, then I will perform the construction step.
17 The objects are rotated as instructed.
18 Max: Great. We have just completed the assembly of a propeller.
Figure 2: Max assists the human partner in building a propeller.

Outline: What is situated language generation? Challenges: modeling linguistic and non-linguistic context; understanding the interplay between language and action; performing in a dynamic environment in real time. Summary and discussion.

The GIVE Challenge: the user plays a 3D game in a virtual world, connected over the Internet to a natural language generation system that generates instructions in real time, e.g. "move forward 2 steps!", "press the blue button!". Koller et al. (2010)
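A toy Python sketch of the real-time loop, assuming a grid world in which the player has a position and a compass heading and the target button is known: each time the game state changes, the generator emits one next instruction in the style of "move forward 2 steps!" and "press the blue button!". This is not the GIVE software interface; the world model, the helper names, and the cooperative-player simulation are assumptions.

```python
HEADINGS = {"north": (0, 1), "east": (1, 0), "south": (0, -1), "west": (-1, 0)}
CLOCKWISE = ["north", "east", "south", "west"]


def next_instruction(pos, heading, button_pos, colour):
    """Return one instruction for the current (hypothetical) grid-world state."""
    dx, dy = button_pos[0] - pos[0], button_pos[1] - pos[1]
    if dx == 0 and dy == 0:
        return f"press the {colour} button!"
    hx, hy = HEADINGS[heading]
    ahead = dx * hx + dy * hy          # steps still to go along the current heading
    if ahead > 0:
        return f"move forward {ahead} step{'s' if ahead > 1 else ''}!"
    # Otherwise turn towards the button, fixing the x offset before the y offset
    if dx != 0:
        want = "east" if dx > 0 else "west"
    else:
        want = "north" if dy > 0 else "south"
    turn = (CLOCKWISE.index(want) - CLOCKWISE.index(heading)) % 4
    return {1: "turn right!", 2: "turn around!", 3: "turn left!"}[turn]


def apply(instruction, pos, heading):
    """Simulate a perfectly cooperative player (an assumption for the demo)."""
    i = CLOCKWISE.index(heading)
    if instruction == "turn right!":
        return pos, CLOCKWISE[(i + 1) % 4]
    if instruction == "turn left!":
        return pos, CLOCKWISE[(i - 1) % 4]
    if instruction == "turn around!":
        return pos, CLOCKWISE[(i + 2) % 4]
    if instruction.startswith("move forward"):
        n = int(instruction.split()[2])
        hx, hy = HEADINGS[heading]
        return (pos[0] + n * hx, pos[1] + n * hy), heading
    return pos, heading


# Re-plan after every state change, as a real-time generator would
pos, heading = (0, 0), "north"
while True:
    instruction = next_instruction(pos, heading, (2, 3), "blue")
    print(instruction)
    if instruction.startswith("press"):
        break
    pos, heading = apply(instruction, pos, heading)
```

Because the next instruction is always computed from the current state rather than from a fixed plan, the same function would also cope with a player who walks off in an unexpected direction.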

Website

Demo http://www.give-challenge.org

Outline: What is situated language generation? Challenges: modeling linguistic and non-linguistic context; understanding the interplay between language and action; performing in a dynamic environment in real time. Summary and discussion.

Summary: Situated language generation is a useful task, but it comes with many challenges. Fundamental questions about the nature of context in situated communication are still open; no unified account of the notions of situated context exists. The interplay between language and action is not yet fully explored. However, as we'll see over the next weeks, a lot has been achieved. Stay tuned!

Course slides and literature: http://www.ling.uni-potsdam.de/~garoufi/page.php?id=generierung

References
Appelt (1982). Planning natural-language utterances to satisfy multiple goals.
Brézillon (1999). Context in problem solving: a survey.
Byron & Fosler-Lussier (2006). The OSU Quake 2004 corpus of two-party situated problem-solving dialogs.
Geldof (1999). Parrot-Talk requires multiple context dimensions.
Koller et al. (2010). The First Challenge on Generating Instructions in Virtual Environments.
Leßmann et al. (2006). Situated interaction with a virtual human: perception, action, and cognition.
Stoia et al. (2008). SCARE: A Situated Corpus with Annotated Referring Expressions.
van Deemter & Odijk (1997). Context modeling and the generation of spoken discourse.
van der Sluis & Krahmer (2001). Generating referring expressions in a multimodal context.