Designing Social Behavior for Humanlike Robots. Bilge Mutlu, Assistant Professor, Department of Computer Sciences. CS-202, December 7, 2009.
Humanlike Robots: Robots that embody human physical, cognitive, and social features, with varying degrees of humanlikeness. Examples: Aethon TUG, Matsushita Hospi, Honda ASIMO, ATR Geminoid.
Research Space: Human interaction with robots; designing social behavior. Verbal and vocal behavior: speech, vocal tone, intonation, prosody, etc. Nonverbal behavior: gaze, gestures, proximity, posture, etc. Why design social behavior? Observations of a hospital delivery robot.
Robot with No Manners: Workers found the robot to lack manners; humanlike features create the need to design appropriate social behavior. Worker comments: "[The robot] doesn't have the manners that we teach our children and it takes precedence over people most of the time." "I sort of find it insulting that I stand out of the way for patients or a gurney or a wheelchair coming through, but [the robot] just barrels right on." "You need get out of the way [for the robot]." "I called them nasty names and told them, 'Would you shut the hell up? Can't you see I'm on the phone? I'll get to you.'" "If you say, '[robot] has arrived,' one more time, I'm about to kick you in your camera."
Research Question: How can we design social behaviors for a humanlike robot? Social scientific: What social behaviors are important or appropriate? Computational: How do we represent and implement these behaviors? Human-computer interaction: How do we get social and cognitive benefits?
Research Approach: Understand, Represent & Implement, Evaluate.
Research Approach: Understand (existing theory on social behavior; computational models of social behavior). Represent & Implement (implementation of behavioral models on an interactive platform). Evaluate (theoretically based manipulations; evaluating the social outcome of human-computer interaction; new understanding of social behavior).
Designing Social Gaze Behavior: Study I, Communicating attention (U.S.); Study II, Establishing conversational roles (Japan); Study III, Communicating intentions (Japan). Each study follows the Understand, Represent & Implement, Evaluate approach.
Designing Gaze Behavior: Communicating attention (Humanoids06)
Research Question: Increased teacher gaze leads to improved learning (Otteson and Otteson, 1980; Sherwood, 1987). Can a robot increase attention by looking at a student more?
Understanding Human Behavior: People shift their gaze at significant points in speech (Kendon, 1967). In the example utterance "The old woman had made some starch.", speakers look away from the listener 70% of the time at the theme and look at the listener 73% of the time at the rheme. [Figure: speaker gaze (away/toward) aligned with speech, divided into theme and rheme]
Understanding Human Behavior: Storyteller gaze data were collected from a human storyteller telling a story to two addressees (Addressee 1 and Addressee 2), recorded with a camera.
Understanding Human Behavior: [Figure: views of Addressee 1 and Addressee 2]
Understanding Human Behavior: Gaze targets
Understanding Human Behavior: Target clusters. Gaze targets were grouped into four clusters: Addressee 1, Addressee 2, Environment, and Down.
Understanding Human Behavior: Times spent and frequencies of gaze at each target cluster. Environment: time spent 5%, frequency 38%; Down: time spent 30%, frequency 38%; Addressee 1: time spent 38%, frequency 13%; Addressee 2: time spent 27%, frequency 11%.
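The time-spent and frequency percentages are proportions over a coded gaze log. A minimal sketch of how such proportions could be computed, assuming each fixation is coded as a (target, duration in ms) pair; the function name `gaze_shares` and the sample log are hypothetical and are not the study's data.

```python
from collections import defaultdict

def gaze_shares(fixations):
    """Return {target: (time_share, frequency_share)} for (target, duration_ms) fixations.

    time_share: fraction of total gaze time spent on the target.
    frequency_share: fraction of all fixations that landed on the target.
    """
    total_time = sum(duration for _, duration in fixations)
    total_count = len(fixations)
    time_by_target = defaultdict(float)
    count_by_target = defaultdict(int)
    for target, duration in fixations:
        time_by_target[target] += duration
        count_by_target[target] += 1
    return {
        t: (time_by_target[t] / total_time, count_by_target[t] / total_count)
        for t in time_by_target
    }

# Hypothetical log for illustration only (not the study's data).
log = [("Addressee 1", 3000), ("Down", 1000), ("Environment", 250),
       ("Addressee 2", 2500), ("Environment", 250), ("Down", 1000)]
for target, (time_share, freq_share) in sorted(gaze_shares(log).items()):
    print(f"{target}: time {time_share:.0%}, frequency {freq_share:.0%}")
```

Many short glances (high frequency, low time share) and a few long looks (the reverse) are exactly the pattern these two measures separate.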
Understanding Human Behavior: Length distributions. [Figure: histograms of gaze length (ms) vs. count for each target cluster (Environment, Down, Addressee 1, Addressee 2), each fit with a gamma distribution; fitted parameters: k=2.19, θ=0.49; k=1.38, θ=1.92; k=2.72, θ=0.97; k=3.32, θ=0.68]
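A fitted gamma distribution can drive how long a generated gaze lasts. A minimal sketch using Python's standard `random.gammavariate`, with k as the shape and θ as the scale; `sample_gaze_lengths` is a hypothetical name, the sketch does not assume which target a given (k, θ) pair belongs to, and because the unit of θ is not recoverable from the extracted slide, samples are left in θ's unit.

```python
import random
import statistics

def sample_gaze_lengths(k, theta, n, seed=None):
    """Draw n gaze lengths from a gamma distribution with shape k and scale theta.

    The expected length is k * theta, in whatever unit theta was fitted in.
    """
    rng = random.Random(seed)
    # random.gammavariate(alpha, beta) takes alpha as shape and beta as scale.
    return [rng.gammavariate(k, theta) for _ in range(n)]

# One fitted (k, theta) pair from the slide; no particular gaze target is implied.
k, theta = 2.19, 0.49
samples = sample_gaze_lengths(k, theta, n=10000, seed=1)
print(f"expected mean {k * theta:.3f}, sample mean {statistics.mean(samples):.3f}")
```

A gamma distribution suits gaze lengths because it is strictly positive and right-skewed: most looks are short, with a tail of long ones.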
Computational Model: For the utterance "The old woman had made some starch.", the robot looks away at the theme with p = 0.70, choosing Down or Environment (p = 0.50 each); looks at an addressee at the rheme with p = 0.73, choosing Addressee 1 or Addressee 2 (p = 0.50 each); and then looks away with p = 1.00, again to Down or Environment (p = 0.50 each).
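The model can be read as a small probabilistic plan per utterance. A minimal sketch, assuming that a failed draw simply keeps the current gaze target and that the p = 1.00 look-away is the last shift of the utterance (its exact placement is not recoverable from the slide); `gaze_plan` and the segment labels are hypothetical names, and gaze lengths (which would come from the fitted gamma distributions) are omitted.

```python
import random

AWAY_TARGETS = ("Down", "Environment")
ADDRESSEES = ("Addressee 1", "Addressee 2")

def gaze_plan(rng, addressee_weights=(0.5, 0.5)):
    """Return the (segment, target) gaze shifts planned for one utterance.

    theme: look away with p = 0.70, to Down or Environment (p = 0.50 each)
    rheme: look at an addressee with p = 0.73, chosen by addressee_weights
    final: look away with p = 1.00, to Down or Environment (p = 0.50 each)
    A failed draw adds no shift, i.e. the current gaze target is kept.
    """
    plan = []
    if rng.random() < 0.70:
        plan.append(("theme", rng.choice(AWAY_TARGETS)))
    if rng.random() < 0.73:
        addressee = rng.choices(ADDRESSEES, weights=addressee_weights)[0]
        plan.append(("rheme", addressee))
    plan.append(("final", rng.choice(AWAY_TARGETS)))
    return plan

rng = random.Random(7)
for _ in range(3):
    print(gaze_plan(rng))
```

Keeping the addressee weights as a parameter means the same plan can express both the balanced model and an attention manipulation.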
Evaluation: ASIMO told a Japanese fairy tale to two participants. Between-participants design: one participant received 20% and the other 80% of the robot's attention. The model's addressee choice was changed from p = 0.50 for each addressee to p = 0.20 for Addressee 1 (Participant 1) and p = 0.80 for Addressee 2 (Participant 2), so that 20% and 80% of the total gaze directed at addressees went to each participant, respectively. 20 American participants (12 males, 8 females, aged 19-33).
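The attention manipulation only changes the addressee weights in the look-at step. A minimal sketch, assuming Participant 1 is the 20% participant as the slide layout suggests; `pick_addressee` is a hypothetical name. It reweights which participant each look-at goes to; the share of gaze time also depends on gaze lengths, which this sketch does not model.

```python
import random
from collections import Counter

PARTICIPANTS = ("Participant 1", "Participant 2")

def pick_addressee(rng, weights):
    """Choose which participant a single look-at shift is directed to."""
    return rng.choices(PARTICIPANTS, weights=weights)[0]

rng = random.Random(42)
n = 10000
baseline = Counter(pick_addressee(rng, (0.50, 0.50)) for _ in range(n))
manipulated = Counter(pick_addressee(rng, (0.20, 0.80)) for _ in range(n))
for name in PARTICIPANTS:
    print(f"{name}: baseline {baseline[name] / n:.2f}, manipulated {manipulated[name] / n:.2f}")
```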
Hypotheses: H.1: Participants who are looked at more will perform better than others (Fry and Smith, 1975; Otteson and Otteson, 1980; Sherwood, 1987). H.2: Participants who are looked at more will evaluate ASIMO more positively than others (Kleck and Nuessle, 1968; Cook and Smith, 1975). [Figure: predicted task performance and positive evaluation for the 20% and 80% attention conditions]
Results, objective measure: information recall. [Figure: task performance (scale 1-5) for the 20% and 80% attention conditions, predicted vs. measured, and measured by participant gender (F, M); reported p-values: p=ns, p=0.03*, p<0.01*]
Results, subjective measure: positive evaluation. [Figure: positive evaluation (scale 1-7) for the 20% and 80% attention conditions, predicted vs. measured, and measured by participant gender (F, M); reported p-values: p=ns, p=ns, p=0.04*]
Conclusions: More robot gaze leads to improved learning.
Designing Gaze Behavior: Regulating conversational roles (speaker, addressee)
Designing Gaze Behavior: Regulating conversational roles. Levels of participation (Goffman, 1979; Clark, 1996): active participants (speaker, addressee, side participant); acknowledged non-participants (bystander); unacknowledged non-participants (overhearer or eavesdropper). Together these make up all participants.
Research Question: Gaze cues are instrumental in establishing conversational roles (Bales et al., 1951; Schegloff, 1968; Sacks et al., 1974; Goodwin, 1981). What are these cues? How can a robot use these signals?
Understanding Human Behavior: Conversational mechanisms: gaze cues that help manage turn exchanges; gaze cues induced by discourse structure; gaze cues that signal conversational roles. [Figure: spatial and temporal patterns of speaker gaze (away, toward A1, toward A2) aligned with the speech of the speaker (S) and two addressees (A1, A2), marking turn exchanges, discourse structure, and role signaling]
Understanding Human Behavior: Three conversational structures: two-party (speaker and addressee, with an overhearer behind a screen); two-party-with-bystander (speaker, addressee, and bystander); three-party (speaker and two addressees).
Understanding Human Behavior: Interaction space with the speaker, the addressee, and a third participant (overhearer, bystander, or addressee); gaze data were collected for all participants.
Understanding Human Behavior: Floor-holding gaze signal: the speaker looks at the environment. [Figure: 8000 ms timeline of speaker gaze (environment, Addressee 1's face/body, Addressee 2's face/body) and speech (speaker, Addressee 1, Addressee 2), with question, answer, minimal response, casual narrative segment, talk, simultaneous (overlapping) talk, and backchannel responses marked]
Understanding Human Behavior: Turn-yielding gaze signal: the speaker looks at the addressee's face. [Figure: same speaker gaze and speech timeline]
Understanding Human Behavior: Turn-taking gaze signal: the speaker looks at the addressee's face. [Figure: same speaker gaze and speech timeline]
Understanding Human Behavior: Floor-holding gaze signal: the speaker looks at the environment. [Figure: same speaker gaze and speech timeline]
Computational Model: Floor-holding gaze signal: the robot looks at the environment. [Figure: 8000 ms timeline of robot gaze (environment, Addressee 1's face/body, Addressee 2's face/body) and speech (robot, Addressee 1, Addressee 2), with question, answer, minimal response, and casual narrative segment marked]
Computational Model: Turn-yielding gaze signal: the robot looks at the addressee's face. [Figure: same robot gaze and speech timeline]
Computational Model: Turn-taking gaze signal: the robot looks at the addressee's face. [Figure: same robot gaze and speech timeline]
Computational Model: Floor-holding gaze signal: the robot looks at the environment. [Figure: same robot gaze and speech timeline]
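Example interaction: The robot speaks a casual narrative with a floor-holding signal (gaze away): "Hello. My name is Robovie." It asks a question with a turn-yielding signal (gaze toward the addressee): "What is your name?" During the addressee's answer [Addressee's response], it gives a turn-taking signal (gaze toward the addressee). It then resumes the casual narrative with a floor-holding signal (gaze away): "It is nice to meet you..."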
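The example can be encoded as a lookup from dialogue segment type to the robot's gaze signal. A minimal sketch, assuming segments are already labeled by type; `SIGNAL_FOR_SEGMENT`, `Segment`, and `gaze_script` are hypothetical names, and the sketch ignores timing (for instance, exactly when during the addressee's answer the turn-taking glance occurs) as well as which addressee is targeted.

```python
from dataclasses import dataclass

# Robot gaze signal and direction for each dialogue segment type, as in the
# example interaction: floor-holding looks away (at the environment), while
# turn-yielding and turn-taking look toward the addressee's face.
SIGNAL_FOR_SEGMENT = {
    "narrative": ("floor-holding", "away"),
    "question": ("turn-yielding", "toward"),
    "answer": ("turn-taking", "toward"),
}

@dataclass
class Segment:
    kind: str  # "narrative", "question", or "answer"
    text: str

def gaze_script(segments):
    """Pair each dialogue segment with the robot's gaze signal and direction."""
    return [(seg.text, *SIGNAL_FOR_SEGMENT[seg.kind]) for seg in segments]

dialogue = [
    Segment("narrative", "Hello. My name is Robovie."),
    Segment("question", "What is your name?"),
    Segment("answer", "[Addressee's response]"),
    Segment("narrative", "It is nice to meet you..."),
]
for text, signal, direction in gaze_script(dialogue):
    print(f"{signal} ({direction}): {text}")
```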
Evaluation: Robovie conversed with two participants as a travel agent. Between-participants design: in c.1 the robot signaled the roles of a two-party conversation; in c.2, the roles of a two-party-with-bystander conversation; in c.3, the roles of a three-party conversation. 72 Japanese participants (all-male pairs), aged 18-24. One participant's role varied by condition (c.1: overhearer; c.2: bystander; c.3: addressee); the other participant was the addressee in all three conditions (treated as a covariate).
Hypotheses: H.1: Addressees will take more speaking turns and speak longer than bystanders and overhearers do. H.2: Addressees will recall the details of the information presented by the robot better than bystanders and overhearers do. [Figure: predicted number of turns taken and task performance for overhearers (two-party), bystanders (two-party with bystander), and addressees (three-party)]
Hypotheses: H.3: Addressees and bystanders will evaluate the robot more positively than overhearers do. H.4: Addressees will express stronger feelings of groupness than bystanders and overhearers do. [Figure: predicted liking and feelings of groupness for overhearers (two-party), bystanders (two-party with bystander), and addressees (three-party)]
Results, behavioral measure: turn-taking. Participants conformed to their roles 97% of the time.
Results, behavioral measure: turn-taking. [Figure: predicted and measured number of turns taken (0-6) and time spent speaking (seconds, 0-8) for overhearers (Condition 1, two-party), bystanders (Condition 2, two-party with bystander), and addressees (Condition 3, three-party); reported p-values for number of turns: p=ns, p<0.01*, p=0.01*; for time spent speaking: p=ns, p=0.01*, p=0.03*]
Results, objective measures: task performance and attentiveness. [Figure: predicted and measured task performance (0-6) and task attentiveness (1-7) for overhearers (Condition 1, two-party), bystanders (Condition 2, two-party with bystander), and addressees (Condition 3, three-party); reported p-values: p=ns, p=ns, p=ns, p=ns, p<0.01*, p=0.01*]
Results, subjective measure: liking/rapport. [Figure: predicted and measured liking (1-7) for overhearers (Condition 1, two-party), bystanders (Condition 2, two-party with bystander), and addressees (Condition 3, three-party); reported p-values: p=0.01*, p=0.05*, p=ns]
Results, subjective measure: groupness. [Figure: predicted and measured feelings of groupness (1-7) for overhearers (Condition 1, two-party), bystanders (Condition 2, two-party with bystander), and addressees (Condition 3, three-party); reported p-values: p=ns, p<0.01*, p=0.03*]
Conclusions: Robots can manage the roles of their conversational partners using gaze cues. [Figure: two-party conversation, two-party conversation with bystander, three-party conversation]
Designing Gaze Behavior: Communicating mental states using subtle cues (HRI09). A guessing game: the picker picks an item, and the guesser asks yes/no questions (e.g., "Is your pick black?") to identify it.
Research Question: Subtle gaze cues lead to mental state attribution (Emery, 2000). Could robots use subtle cues to communicate mental states?
Understanding Human Behavior: Participants playing the guessing game in the picker and guesser roles.
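Understanding Human Behavior: Cue length distribution. Gaze cue lengths followed a normal distribution (µ = 527 ms, σ = 127 ms); the Shapiro-Wilk W test did not detect a departure from normality (W = 0.98, p = .70). [Figure: histogram of cue length (ms, 100-1000) vs. count]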
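The fitted normal distribution can set the length of each generated cue. A minimal sketch using the standard-library `random.Random.gauss` with the slide's µ and σ; `sample_cue_length` is a hypothetical name, and the 100 ms floor is a hypothetical safeguard not taken from the source.

```python
import random
import statistics

CUE_MEAN_MS = 527.0  # mean of the fitted normal distribution
CUE_SD_MS = 127.0    # standard deviation of the fitted normal distribution

def sample_cue_length(rng, floor_ms=100.0):
    """Draw a cue length in ms from Normal(527, 127).

    floor_ms is a hypothetical safeguard: a normal draw can be very short or
    even negative, which a gaze controller cannot execute.
    """
    return max(floor_ms, rng.gauss(CUE_MEAN_MS, CUE_SD_MS))

rng = random.Random(3)
lengths = [sample_cue_length(rng) for _ in range(10000)]
print(f"sample mean {statistics.mean(lengths):.0f} ms, shortest {min(lengths):.0f} ms")
```

With σ = 127 ms the floor lies more than three standard deviations below the mean, so it almost never changes a draw.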
Computational Model (Robovie and Geminoid): In the stationary state, the robot looks at the task space. After the guesser's question, the robot's gaze travels to its pick (the leakage cue), then travels to the guesser's face (eye contact), and the robot gives its answer. [Figure: 2000 ms timeline of robot gaze (task space, robot's pick, guesser's face) and speech (guesser's question, robot's answer)]
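The gaze sequence above can be laid out as a simple schedule. A minimal sketch, assuming the cue starts when the guesser's question ends and that the robot answers once its gaze reaches the guesser's face; `leakage_schedule`, `GazePhase`, and the 150 ms travel time are hypothetical, while the 527 ms cue length is the mean of the fitted cue distribution.

```python
from dataclasses import dataclass

@dataclass
class GazePhase:
    start_ms: float
    end_ms: float
    phase: str

def leakage_schedule(question_end_ms, cue_ms, travel_ms, leak=True):
    """Lay out the robot's gaze after the guesser's question.

    With leak=True: travel, leakage cue at the robot's pick, travel.
    With leak=False: a single travel from the task space to the guesser.
    Returns the phases and the time gaze reaches the guesser's face,
    which this sketch treats as the moment the robot starts answering.
    """
    if leak:
        steps = [("gaze travel", travel_ms),
                 ("leakage cue (robot's pick)", cue_ms),
                 ("gaze travel", travel_ms)]
    else:
        steps = [("gaze travel", travel_ms)]
    phases, t = [], question_end_ms
    for name, duration in steps:
        phases.append(GazePhase(t, t + duration, name))
        t += duration
    return phases, t

phases, answer_at = leakage_schedule(question_end_ms=0, cue_ms=527, travel_ms=150)
for p in phases:
    print(f"{p.start_ms:>5} to {p.end_ms:>5} ms: {p.phase}")
print(f"eye contact and answer at {answer_at} ms")
```

The `leak` flag mirrors the with-cue and without-cue versions of the behavior.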
Geminoid, without the cue
Geminoid, with the cue
Robovie, without the cue
Robovie, with the cue
Evaluation: Guessing game with the robots (Robovie and Geminoid). The robots played the picker and the participants played the guesser, for 8 rounds. Between-participants factor: Geminoid vs. Robovie. Within-participants factor: the robot glanced vs. did not glance at its pick (4 of 8 rounds each). 26 Japanese participants (17 males, 9 females; aged 18-24).
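The within-participants factor can be expressed as a per-participant round order. A minimal sketch, assuming the 4 glance and 4 no-glance rounds were randomly ordered for each participant (the slide does not say how the order was set, and a counterbalanced scheme would be an equally valid alternative); `round_order` is a hypothetical name.

```python
import random

def round_order(rng, n_rounds=8):
    """Return a shuffled sequence with half glance and half no-glance rounds."""
    if n_rounds % 2:
        raise ValueError("n_rounds must be even to split rounds evenly")
    conditions = ["glance"] * (n_rounds // 2) + ["no glance"] * (n_rounds // 2)
    rng.shuffle(conditions)
    return conditions

rng = random.Random(2009)
print(round_order(rng))
```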
Hypotheses: H.1: Participants will perform better when the robot leaks information. H.2: The effect will be stronger with Geminoid than with Robovie. [Figure: predicted task performance (time/number of questions; lower values indicate better performance) without vs. with the leakage cue, for Geminoid and Robovie]
Results, Hypothesis I: Lower values indicate better performance. [Figure: predicted and measured task performance, as number of questions (0-6) and time in seconds (0-60), without vs. with the gaze (leakage) cue; reported p-values: p=0.04*, p=0.02*]
Results, Hypothesis II: [Figure: predicted and measured task time in seconds (0-60) without vs. with the gaze cue, for Geminoid and Robovie (reported p-values: p=0.05*, p=ns), and reported gaze cues (yes/no proportion, 0.00-1.00) for Geminoid vs. Robovie (p<0.01*)]
Conclusions: Subtle social cues can communicate intentions.
High-level Conclusions: Computational models of human behavior: oratorical gaze; conversational gaze (turn-taking, role-signaling cues, gaze patterns induced by discourse structure, etc.); subtle leakage cues. Social and cognitive impact: better information recall; higher conversational participation; heightened task attentiveness; stronger rapport and feelings of groupness; better task performance driven by stronger interpretations of mental states.
Current Work: Applications in learning and games, diagnosis and therapy, collaborative interfaces, and entertainment.
If You Are Interested: Take my courses (undergraduate: Introduction to Human-Computer Interaction; graduate: Human-Computer Interaction). Join my lab, the Human-Computer Interaction Lab. More information: http://bilgemutlu.com