Learning Interrogation Strategies while Considering Deceptions in Detective Interactive Stories

Proceedings of the Ninth AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment

Guan-Yi Chen, Edward Chao-Chun Kao, and Von-Wun Soo
Institute of Information Systems and Applications, National Tsing Hua University
guanyinu@gmail.com, edkao@cs.nthu.edu.tw, soo@cs.nthu.edu.tw

Abstract

The strategies by which interactive characters select appropriate dialogues remain an open issue in related research areas. In this paper we propose an approach based on reinforcement learning to learn the strategy of interrogation dialogue from one virtual agent toward another. The emotion variation of the suspect agent is modeled with a hazard function, and the detective agent must learn its interrogation strategies based on the emotion state of the suspect agent. Several reinforcement learning reward schemes are evaluated to choose the proper reward in the dialogue. Our contribution is twofold. First, we propose a new framework of reinforcement learning to model dialogue strategies. Second, background knowledge and the emotion states of agents are brought into the dialogue strategies. The dialogue strategy learned in our experiment is sensitive in detecting lies from the suspect, and with it the interrogator may receive more correct answers.

Introduction

As dialogue systems gain importance in the domain of virtual agents as interactive characters, dialogue strategies are receiving more and more attention. In interactive scenarios played by interactive characters, knowing when and how to select proper dialogues in a specific social context is not a trivial task. Lies, threats, betrayals, interrogations, and so forth, are the kinds of dialogues that intensify stories. Here we focus on one essential type of dialogue with significant importance: interrogation. While the main purpose of interrogation is the same as that of the fundamental ask-inform protocol in agent communication, there is always a probability that the agent being interrogated will attempt to deceive the interrogator. If the interrogator keeps asking questions that touch certain sensitive words or issues, it might cause a suspect to become even more deceptive out of fright. To verify a reply from a suspect, a detective may ask a question while already knowing the answer. Besides asking while knowing the answer, a detective can also change the subject temporarily and attempt to pacify the suspect. All of these possible directions of dialogue, compared to ask-inform, obviously require more intermediate states than simple knowledge states. The extra speech acts give flavor to interactive stories, but they also diversify dialogues and may even be misused if ill-planned. To strike a balance between dramatic effect and communication effectiveness, the detective needs proper interrogation strategies based on the information that emerges during dialogues. Because the dialogue context is rather complex, it is hard to implement the proper dialogue knowledge a priori. To allow an agent to learn strategies from background contexts, we propose a reinforcement learning scheme to learn interrogation strategies. The goal of our study is to learn proper interrogation dialogue strategies with background knowledge by using reinforcement learning. To conduct the reinforcement learning, however, the reward scheme plays an important role: providing proper rewards (or punishments) in the right contexts determines whether a policy function for an interrogation dialogue can be properly acquired in a given social context. For this reason, we also have to simulate varying contexts, especially the mental states of the suspect, to ensure that the reward scheme is properly designed, through various experiments in terms of actual performance.
The rest of this paper is organized as follows. First, we describe work related to our learning method. Second, we show our method of applying Q-learning to the emotion model in detail. Third, we present our experiment settings, results, and discussion. Finally, we conclude our research.

Related Work

Machine Learning in Form-filling Dialogues

Many applications of reinforcement learning for finding dialogue strategies have been developed in the framework of slot-filling tasks, such as negotiation dialogue (Selfridge and Heeman 2011), restaurant recommendations (Jurčíček, Thomson, and Young 2012) and virtual museum guides (Misu et al. 2012). In these dialogue systems, the agent asks the user for the value of a slot, and the state representation of the slot-filler is straightforward. For each slot in the form, variables are used to track the system's knowledge of the slot. For a given value of the state variables, it does not matter how the system got to that point, as the cost to the end of the dialogue only depends on the system's knowledge of the slots. With such a state representation, good policies can be learned. Thus, the variables needed for the reinforcement learning state are mainly action-decision variables, and few if any bookkeeping variables are needed.

Machine Learning in Unstructured Dialogues

Once we move away from form-filling dialogues, characterizing the state becomes more difficult. Walker used reinforcement learning to learn the dialogue policy of a system that helps a user read email (Walker 2000). Reinforcement learning is used to learn, at each utterance, which dialogue strategy choice is effective. The choice is then memorized as part of the reinforcement learning state, which further constrains the system's subsequent behavior. A dialogue strategy can also be acquired using Markov decision processes (MDPs) and reinforcement learning algorithms (Biermann and Long 1996; Levin, Pieraccini, and Eckert 2000; Walker, Fromer, and Narayanan 1998; Singh et al. 1999; Litman et al. 2000; Cuayáhuitl 2011). In order to reduce the rich search space of reinforcement learning with updating rules, Heeman proposed combining reinforcement learning and information-state update rules to generate complex dialogues between the system and a simulated user (Heeman 2007). In a flight information dialogue, the system needs to quickly obtain the origin of a flight, the airline, the departure time, etc. Such work does not take the possibility of lying into consideration; in this type of dialogue the fewer steps taken the better, so only the cost of dialogue steps needs to be considered. Recently, there have been studies on learning negotiation policies (Heeman 2009; Georgila and Traum 2011a; Georgila and Traum 2011b), in which the system and the user need to exchange information in order to decide on a good compromise solution. The exchanged information, though, unlike in a slot-filling dialogue, is not part of the final solution. In contrast, the superintendent agent in our detective story scenario not only needs to efficiently obtain information about the crime, but also needs to make sure the answer is true, since there is a possibility that the suspect may lie.

Method

System Architecture

The system architecture consists of two agents: agent1 represents the superintendent (the detective), and agent2 the suspect of the crime. Interactions between the two agents are shown in Figure 1. The superintendent agent contains two parts, a Cognition module and a Learning module, whereas the suspect agent contains an Emotion simulation instead of a Learning module. Both of their Cognition modules consist of an Understanding module, social contexts, and speech acts.

Figure 1: System Architecture

The superintendent uses its Understanding module to interpret the answer and the emotion fed back from the suspect. When the superintendent receives an answer, it becomes a known predicate in the social context knowledge of the superintendent.
The superintendent obtains a reward depending on the correctness and relevance of the answer to the criminal case, in terms of a reward table, or reward function. After getting the reward, the superintendent updates its interrogation policy and social contexts accordingly, based on the reinforcement learning method described below. When initialized, the suspect is given known social context facts and relations as prior social context knowledge, and it uses its Understanding module to interpret the question asked by the superintendent. If the question mentions a sensitive keyword during interrogation, it may trigger the fear emotion of the suspect, and through the Emotion simulation the suspect decides whether to tell a lie or the truth. To simplify the problem without losing generality for automated simulation, we assume that both agents' Understanding modules can simply interpret the predicates implemented in the social context, regardless of whether their truth values are known or unknown.
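To fix the terminology before turning to the learning algorithm, the following minimal Python sketch mirrors the two-agent loop described above. The class and method names (SocialContext, Superintendent, Suspect, understand, answer), the fear-arousal step, and the lying threshold value are illustrative assumptions; the paper does not specify an implementation.

```python
# Minimal sketch of the two-agent architecture (illustrative names, not the
# authors' implementation). The superintendent tracks known/unknown predicates;
# the suspect holds ground-truth facts plus an accumulated fear value.
from dataclasses import dataclass, field

@dataclass
class SocialContext:
    known: dict = field(default_factory=dict)   # predicate -> truth value
    unknown: set = field(default_factory=set)   # predicates still to resolve

    def learn(self, predicate, value):
        """Move a predicate from the unknown set into known knowledge."""
        self.unknown.discard(predicate)
        self.known[predicate] = value

@dataclass
class Superintendent:
    context: SocialContext
    q_tables: dict = field(default_factory=dict)  # one Q-table per emotion level

    def understand(self, answer, observed_emotion):
        """Understanding module: record the answer as a known predicate."""
        predicate, value = answer
        self.context.learn(predicate, value)
        return observed_emotion

@dataclass
class Suspect:
    context: SocialContext
    fear: float = 0.0          # accumulated fear emotion in [0, 1]
    lie_threshold: float = 0.52

    def answer(self, question, sensitive_keywords):
        """Emotion simulation: raise fear on sensitive keywords, lie above threshold."""
        if any(k in question for k in sensitive_keywords):
            self.fear = min(1.0, self.fear + 0.09)   # arousal step (illustrative value)
        truth = self.context.known.get(question, False)
        told = (not truth) if self.fear > self.lie_threshold else truth
        return (question, told), self.fear
```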

Q-learning

The Q-learning algorithm uses a Q-function to calculate the quality score of a state-action combination:

Q : S × A → R   (1)

where S is the current state and A is the next action. Before learning, the Q-function is initialized to a very low fixed value. Then, at each action step in which the superintendent receives a reward, a new score is calculated for the combination of the current state from S and the next action from A. The core of the algorithm is a simple value-update iteration: it updates the value of Q based on its old value and the new reward information obtained after the state transition caused by executing an action:

Q(S_t, A_t) ← Q(S_t, A_t) + α_t(S_t, A_t) [ R_{t+1} + γ max_{A_{t+1}} Q(S_{t+1}, A_{t+1}) − Q(S_t, A_t) ]   (2)

where R_{t+1} is the reward observed after performing A_t in S_t, α(S, A) (0 < α ≤ 1) is the learning rate, and the discount factor γ satisfies 0 ≤ γ < 1. In traditional Q-learning, the environment moves to a new state S_{t+1} and the reward R_{t+1} associated with the transition (S_t, A_t, S_{t+1}) is determined. The agent then selects the action that optimizes Q according to equation (3):

argmax_{a ∈ A} Q*(s, a)   (3)

where the Q-function specifies the cumulative reward for each state-action pair. In contrast to previous reinforcement learning methods, the learning goal here is to make all unknown predicates of the superintendent become known, which differs from previous work whose goal state lies in its own state space.
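To make the update concrete, the sketch below implements the tabular update of equation (2) and the action selection of equation (3) in Python. The dictionary-based Q-table, the epsilon-greedy selection, the helper names q_update and greedy_action, and the initial value of -1000 (the paper only says "a very low fixed value") are our assumptions for illustration, not the authors' code.

```python
import random
from collections import defaultdict

def q_update(Q, s, a, reward, s_next, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step, equation (2):
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[(s_next, a_next)] for a_next in actions)
    Q[(s, a)] += alpha * (reward + gamma * best_next - Q[(s, a)])

def greedy_action(Q, s, actions, epsilon=0.1):
    """Equation (3), wrapped in epsilon-greedy exploration (exploration scheme assumed)."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(s, a)])

# Q-values start at a low fixed value, as stated in the text (exact value assumed).
Q = defaultdict(lambda: -1000.0)
```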
Simulating the Suspect's Emotional Response

The major reason why suspects lie is fear, and fear can be treated in general as an emotion arousal concept. Following psychological studies in which emotion arousal is represented by a hazard distribution (Avinadav and Raz 2008; Verduyn et al. 2009; Verduyn, Mechelen, and Tuerlinckx 2011), the emotion change of the suspect is modeled with the hazard function in equation (4):

h(t) = kρ(ρt)^(k−1) / (1 + (ρt)^k)   (4)

The hazard function h(t) models the tendency of the lying event, where t is the time over which fear emotion accumulates, ρ is the failure rate (hazard rate), and k is a dimensionless shape parameter. As time grows, the larger ρ is, the sooner (at smaller t) the suspect will lie, while the smaller ρ is, the later the suspect will lie. Using the hazard function with different parameter values, we can model the personality of a suspect in terms of his or her tendency to lie in the face of fear. We use two different values of ρ in the hazard function of equation (4) to simulate two different personalities of a suspect lying in response to fear: a fixed value k = 2.464 and the two values ρ = 2.232 and ρ = 2.993 represent the two kinds of personalities. In Figure 2, the x-axis indicates the fear emotion value and the y-axis indicates the hazard function value that determines the tendency toward lying behavior for an interactive character; the two curves represent the two lying-personality curves in terms of fear emotion accumulation. To simplify the emotion simulation, we take the emotion value of maximum lying probability as the lying threshold of the fear emotion. When the suspect's fear exceeds the threshold, she will always lie.

Figure 2: The simulation of two kinds of personalities

Emotion Detection for the Superintendent

The superintendent needs to conduct a learning algorithm according to the suspect's emotion variation and the social context, and learn a proper dialogue strategy. We divide the suspect's fear emotion value into the five discrete, rough levels shown in Table 1, which the superintendent can sense by observation. Nevertheless, the superintendent is unable to determine by observation whether the suspect will lie.

Table 1: Emotion levels for learning

Emotion level        Emotion value
level 1: normal      0 ≤ E ≤ 0.2
level 2: worried     0.2 < E ≤ 0.4
level 3: afraid      0.4 < E ≤ 0.6
level 4: scared      0.6 < E ≤ 0.8
level 5: terrified   0.8 < E ≤ 1

There are two reasons why we divide the emotion value into five levels: to simplify the differences between Q-tables with respect to different emotion values, and to limit the observation ability of the superintendent. Each emotion level has its own Q-table, and there are transitions between the state-action pairs of the Q-tables, as shown in Figure 3.

Figure 3: The relation between emotion level and emotion state transition
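The hazard-based emotion model and the five-level discretization of Table 1 can be sketched as follows. The parameter values come from the text above; the per-level Q-table container and its -1000 initial value are illustrative assumptions.

```python
from collections import defaultdict

def hazard(t, k=2.464, rho=2.232):
    """Hazard function of equation (4): h(t) = k*rho*(rho*t)**(k-1) / (1 + (rho*t)**k).
    rho = 2.232 models the bolder suspect, rho = 2.993 the cowardly one."""
    return k * rho * (rho * t) ** (k - 1) / (1 + (rho * t) ** k)

def emotion_level(e):
    """Discretize the fear value into the five levels of Table 1."""
    if e <= 0.2:
        return 1   # normal
    if e <= 0.4:
        return 2   # worried
    if e <= 0.6:
        return 3   # afraid
    if e <= 0.8:
        return 4   # scared
    return 5       # terrified

# One Q-table per emotion level, as described above (low initial value assumed).
q_tables = {level: defaultdict(lambda: -1000.0) for level in range(1, 6)}
```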

Experimentation

Experiment Settings

Background Knowledge for Dialogue Agents. We adopt a scenario from the detective novel Cards on the Table, written by Agatha Christie and first published in 1936. The main characters are Jim Wheeler as the superintendent, Miss Meredith as one of the suspects, and Mr. Shaitana as the victim. In the story, there is a scene of interrogation dialogues between the superintendent Jim Wheeler and the suspect Miss Meredith. In the original scenario, Miss Meredith tells lies because she feels intensified fear during the interrogation. The goal of our experiment is to make Wheeler learn an interrogation strategy that can keep Miss Meredith from lying, with a minimal number of steps during the interrogation. We assume the following kinds of background social context facts and relations to be known by a detective:

Person(p) indicates that p is a person.
Object(o) indicates that o is an object in the crime scene.
Location(l) indicates that l is a location around the place where the crime happened.
Action(ac) indicates that ac is an action related to the crime scene.
Where-Object(o, l) indicates that Object(o) is at Location(l).
Where-Person(p, l) indicates that Person(p) is at Location(l).
Behavior(p, ac, l) indicates that Person(p) performs Action(ac) at Location(l).
Relation(p1, p2) indicates that p1 and p2 are friends.

In the social context for the case study, there are 84 possible states in total for the predicates above. The background knowledge of the superintendent can initially be assigned as context nodes in either the known or the unknown category. The ultimate goal of the superintendent agent is to interrogate the suspect and ensure that all unknown facts and relations become known ones with correct values.

Speech Acts for Interrogation. As shown in Table 2, three kinds of speech acts are defined in the interrogation strategies of the superintendent to search for facts. For each unknown predicate state, the superintendent can use ASK to interrogate the suspect. The superintendent uses CONFIRM only when asking about an already known predicate state. When the superintendent detects the suspect's fear emotion, it can use PACIFY to calm that emotion down.

Table 2: Three speech acts of the superintendent

Speech act   Description
ASK          The general form of ask-if and ask-ref; to ask the suspect a question.
CONFIRM      To verify whether the suspect tells a lie or the truth.
PACIFY       To calm down the suspect's emotion.

5000 dialogues with different orders and lengths of speech acts are used to learn an interrogation strategy. In every dialogue, there are 84 states and 3 kinds of speech acts (ASK, CONFIRM, and PACIFY) to be selected in the reinforcement learning, so the total size of the two-dimensional Q-table is (3*84)*(3*84). Sensitive words related to the murder case are defined in advance, and the fear emotion of the suspect is aroused if any of the keywords in Table 3 is mentioned during interrogation.

Table 3: Sensitive keywords in the scenario

The fireplace
The empire chair
The table near the fireplace
The stiletto
The door
The table near the door

Each interrogation dialogue is generated according to the exploration procedure in Table 4; a runnable paraphrase of this loop is sketched just below.

Table 4: The exploration procedure of reinforcement learning

CountEmpty(unknown) = N;
while (CountEmpty(unknown) != 0) {
    Obtain answer and emotion from the suspect;
    Update unknown table;
    Set emotion level and update its own Q-table;
    Randomly choose a next state and action;
}
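As a companion to Table 4, here is a hedged Python paraphrase of one exploration episode. It reuses the hypothetical q_tables, emotion_level, and q_update helpers from the earlier sketches, a suspect object like the one sketched in the System Architecture section, and the reward_of function sketched in the Reward Scheme section below; none of these names come from the paper, and the state/action encoding is a simplification.

```python
import random

SPEECH_ACTS = ["ASK", "CONFIRM", "PACIFY"]

def run_episode(suspect, predicates, sensitive_keywords):
    """One interrogation dialogue following the exploration procedure of Table 4.
    States and actions are (speech act, predicate) pairs, matching the (3*84)x(3*84)
    Q-table; predicates are plain strings."""
    pairs = [(sa, p) for sa in SPEECH_ACTS for p in predicates]
    unknown = set(predicates)                       # CountEmpty(unknown) = N
    s, prev_fear = random.choice(pairs), 0.0
    while unknown:                                  # while CountEmpty(unknown) != 0
        act, predicate = s
        answer, fear = suspect.answer(predicate, sensitive_keywords)  # answer and emotion
        newly_known = answer[0] in unknown
        unknown.discard(answer[0])                  # update the unknown table
        level = emotion_level(fear)                 # set the emotion level ...
        a = random.choice(pairs)                    # randomly choose next state and action
        reward = reward_of(act, newly_known, prev_fear, fear)
        q_update(q_tables[level], s, a, reward, a, pairs)  # ... and update its own Q-table
        s, prev_fear = a, fear
```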
Reward Scheme

A reward scheme for each type of speech act in the reinforcement learning is described in Table 5. The basic reward for both ASK and CONFIRM is -500, as a communication cost; however, the reward for successfully detecting a lie using CONFIRM is 1000, and the superintendent then records that the suspect is telling a lie. The PACIFY speech act is used to calm down the fear of the suspect. The base reward of PACIFY is -250, but this act also decreases the fear emotion of the suspect. If the fear emotion changes, an additional reward is given as the emotion change multiplied by 2000. According to the hazard function, the range of a single emotion change is between 0.03 and 0.06, so the corresponding reward is about 60 to 120.

Table 5: Reward scheme for reinforcement learning

Category        Item       Description                                          Reward
Speech act      ASK        Interrogate the suspect                              -500
                CONFIRM    Detects lying                                         1000
                           Does not detect lying                                -500
                PACIFY     Calm down the suspect's emotion                      -250
Social context  Behavior   Know where the suspect went, what the victim did,
                           and what the suspect did                              200
                Relation   Know the relation between object and location         200
Emotion         Fear Up    The fear emotion rises                               -2000 * emotion deviation
                Fear Down  The fear emotion goes down                           +2000 * emotion deviation
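The Table 5 reward scheme can be written as a single function, as sketched below. The sign convention for the emotion term (rising fear penalized, falling fear rewarded, scaled by 2000) and the flat 200 bonus applied to any newly known fact follow our reading of the table; the function name and signature are assumptions used by the episode sketch above, not the authors' code.

```python
def reward_of(act, newly_known, fear_before, fear_after, lie_detected=False):
    """Reward scheme of Table 5 (illustrative sign conventions).
    ASK/CONFIRM cost -500; a lie detected via CONFIRM earns 1000; PACIFY costs -250;
    a newly known fact earns 200; fear changes are scaled by 2000, with rising fear
    penalized and falling fear rewarded."""
    if act == "PACIFY":
        reward = -250.0
    elif act == "CONFIRM":
        reward = 1000.0 if lie_detected else -500.0
    else:  # ASK
        reward = -500.0                               # communication cost
    if newly_known:
        reward += 200.0                               # social-context reward
    reward += -2000.0 * (fear_after - fear_before)    # emotion term (+/- 2000 * deviation)
    return reward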

Results and Discussion

We compare the rate of correct answers and the cost in dialogue steps based on the performance of all 5 cases, as shown in Table 6.

Table 6: The comparison of 5 different cases

Case     Emotion change               Reward scheme                                     Lying threshold   Correct information rate   Number of dialogue steps
Case 1   Without detecting emotion    —                                                 0.52              0.41                       61
Case 2   ρ = 2.232                    Shown in Table 5                                  0.52              1                          103
Case 3   ρ = 2.993                    Shown in Table 5                                  0.39              0.79                       132
Case 4   ρ = 2.232                    10x the emotion-change reward of Case 2           0.52              1                          132
Case 5   ρ = 2.232                    Same as Case 2, except reward of CONFIRM = 200    0.52              0.79                       81

Baseline: Case 1

Case 1 serves as a baseline in our experiment, as the superintendent in Case 1 learns an interrogation strategy without the ability to detect fear. The resulting dialogues are shown in sections A, B and C of Table 7. Note that the Lie and Truth signals are not sent directly to the superintendent. In section A of Table 7, the superintendent (Superintendent Jim Wheeler) asks whether the victim (Mr. Shaitana) went to the bridge table or not, and the suspect (Miss Meredith) replies yes. In the next question, the superintendent asks what the victim did at the bridge table. Since this question is not related to the murder, the fear emotion of the suspect remains stable and the suspect replies with the truth. In section B, when questions containing the sensitive keywords fireplace and empire chair are asked, the suspect's fear emotion rises. By exhaustively asking about unknown predicate states, the superintendent learns from the suspect's replies that the victim was at the empire chair.

Table 7: The dialogues of sections A, B and C

Section A
Wheeler: [ASK] if: Where-Person(mr-shaitana, bridge-table)
Meredith: yes [True, Emotion: 0]
Wheeler: [ASK] ref: Behavior(mr-shaitana, ?, bridge-table)
Meredith: Behavior(mr-shaitana, see-bridge, bridge-table) [True, Emotion: 0]
...

Section B
Wheeler: [ASK] if: Where-Person(mr-shaitana, fireplace)
Meredith: yes [True, Emotion: 0.09]
Wheeler: [ASK] if: Where-Person(mr-shaitana, empire-chair)
Meredith: yes [True, Emotion: 0.27]
Wheeler: [ASK] if: Where-Person(ms-meredith, empire-chair)
Meredith: yes [True, Emotion: 0.36]
Wheeler: [ASK] ref: Behavior(ms-meredith, ?, empire-chair)
Meredith: found-mr-shaitana-dead [True, Emotion: 0.45]
...

Section C
Wheeler: [ASK] if: Where-Person(ms-meredith, table-near-fireplace)
Meredith: no [Lie, Emotion: 0.72]

However, the suspect begins to lie in section C, as shown in Figure 4, resulting in a low overall correct information rate.

Figure 4: The emotion variation in case 1

Interrogating Different Personalities: Case 2 and Case 3

In Case 2, we set ρ = 2.232 to model a rather bold Miss Meredith. In this case, her fear emotion accumulates slowly, and before reaching the lying threshold it is lowered effectively by PACIFY. We can see clearly that the suspect does not tell a lie from the beginning to the end, as shown in Figure 5.

Figure 5: The emotion variation in case 2

The dialogue result of section D is shown in Table 8. The general order of speech acts is CONFIRM, PACIFY, ASK. Since the fear emotion value lies in level 3, every time the fear emotion value of the suspect reaches the lying threshold, the superintendent uses CONFIRM to check whether the suspect lies or not. At level 3, upon detecting the fear emotion change, the superintendent uses PACIFY to lower the fear emotion and then uses ASK to continue asking questions.

Table 8: The dialogue of section D

Section D
Wheeler: [ASK] ref: Behavior(mr-shaitana, ?, table-near-fireplace)
Meredith: drink [True, Emotion: 0.51]
Wheeler: [CONFIRM] if: Behavior(ms-meredith, leave, bridge-seat)
Meredith: yes [True, Emotion: 0.51]
Wheeler: [PACIFY] (Do not be afraid, take a deep breath. Say it clearly.)
Meredith: (Silence) [True, Emotion: 0.45]
Wheeler: [ASK] if: Where-Person(ms-meredith, fireplace)
Meredith: yes [True, Emotion: 0.48]
Wheeler: [PACIFY] (Do not be afraid, take a deep breath. Say it clearly.)
Meredith: (Silence) [True, Emotion: 0.42]

In Case 3, we set ρ = 2.993 to model a cowardly Miss Meredith. In contrast to Case 2, the fear emotion of the suspect accumulates relatively faster and reaches the lying threshold sooner. Similar to Case 2, the superintendent learned a policy with detection of lying behavior. The result is shown in Figure 6: the superintendent detects the lying behavior at emotion level 3 (emotion value 0.4 to 0.6) and begins to use CONFIRM and PACIFY.

Figure 6: The emotion variation in case 3

The results of Case 2 and Case 3 show that the superintendent can comfort the suspect in time to decrease the suspect's fear emotion before lying occurs. Even when interrogating a cowardly suspect, the superintendent still performs PACIFY as soon as a lie is detected.

Different Reward Values: Case 4 and Case 5

In Case 4, we raise the reward on emotion change to ten times its value in Case 2, so that the reward for an emotion change lies between 600 and 1200. This results in a merciful superintendent who uses PACIFY after asking every important question, as shown in Figure 7. The reason we change only the reward on emotion change is to observe the difference in the use of the PACIFY speech act; the result of this case shows that the original reward of 2000 * (emotion difference) is rather appropriate.

Figure 7: The emotion variation with tenfold reward on emotion changes in case 4

In Case 5, the importance of CONFIRM is lowered by setting its reward value to 200. The superintendent learned a different interrogation strategy compared to that in Case 2. The resulting dialogue of section E is described in Table 9.

Table 9: The dialogue of section E

Section E
Wheeler: [ASK] if: Where-Person(ms-meredith, table-near-fireplace)
Meredith: no (Lie, Emotion: 0.52)
Wheeler: [CONFIRM] if: Behavior(ms-meredith, leave, bridge-seat)
Meredith: no (Lie, Emotion: 0.52)
Wheeler: [PACIFY] (Do not be afraid, take a deep breath. Say it clearly.)
Meredith: (Silence) (Emotion: 0.49)
...
Wheeler: [ASK] ref: Behavior(ms-meredith, ?, table-near-fireplace)
Meredith: drink (True, Emotion: 0.22)

At the end of section E, the fear emotion of the suspect is reduced to a level that leads the superintendent to believe that the suspect may tell the truth, so the superintendent's speech act strategy changes back to ASK.

Figure 8: The emotion variation of case 5, in contrast to case 2

In Figure 8, the performance of detecting lying when the CONFIRM reward is lowered to 200 is poor, shown as the dashed curve in contrast to the solid curve of Case 2: the superintendent may detect lying at the wrong emotion level and cannot calm the suspect's fear emotion back down once it exceeds the lying threshold.

Conclusion

In interactive stories, a detective conducting an interrogation dialogue must obtain true information in the face of the possibility that the suspect lies under fear. Previous work on learning from dialogue aims to use reinforcement learning to get answers from the user quickly in terms of the cost of dialogue steps; it does not take lying into consideration. We propose a framework of reinforcement learning with a reward scheme based on speech acts and social contexts to learn the best strategy in an interrogation dialogue, where formal rules cannot be given in advance. While both the suspect and the superintendent are modeled as agents, the suspect may lie if its fear is aroused during interrogation, and its fear emotion is simulated with a hazard function. The learned interrogation strategies are very sensitive in detecting lies of the suspect, allowing the superintendent to elicit more correct answers during interrogation with fewer dialogue steps. This work can be extended to acquire different interrogation strategies for a superintendent to deal with suspects with different personalities. It can also be augmented in future work with versatile types of speech acts and reward schemes, to investigate more elaborate dialogue strategies to be adopted by dialogue agents in an interactive scenario.

References

Avinadav, T., and Raz, T. 2008. A New Inverted U-Shape Hazard Function. IEEE Transactions on Reliability 57(1): 32-40.

Biermann, A. W., and Long, P. M. 1996. The Composition of Messages in Speech-Graphics Interactive Systems. In Proceedings of the 1996 International Symposium on Spoken Dialogue, 97-100.

Cuayáhuitl, H. 2011. Learning Dialogue Agents with Bayesian Relational State Representations. In Proceedings of the IJCAI Workshop on Knowledge and Reasoning in Practical Dialogue Systems (IJCAI-KRPDS), 9-15. Barcelona, Spain: International Joint Conferences on Artificial Intelligence, Inc.

Georgila, K., and Traum, D. 2011a. Learning Culture-Specific Dialogue Models from Non Culture-Specific Data. In Proceedings of the 6th International Conference on Universal Access in Human-Computer Interaction: Users Diversity - Volume Part II (UAHCI'11), Stephanidis, C. (Ed.), 440-449. Berlin, Heidelberg: Springer-Verlag.

Georgila, K., and Traum, D. 2011b. Reinforcement Learning of Argumentation Dialogue Policies in Negotiation. In Proceedings of INTERSPEECH 2011, 12th Annual Conference of the International Speech Communication Association, 2073-2076. Florence, Italy: ISCA.

Heeman, P. A. 2007. Combining Reinforcement Learning with Information-State Update Rules. In Proceedings of the North American Chapter of the Association for Computational Linguistics Annual Meeting, 268-275. Stroudsburg, PA: Association for Computational Linguistics.

Heeman, P. A. 2009. Representing the Reinforcement Learning State in a Negotiation Dialogue. In Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 450-455. Merano, Italy: IEEE.

Jurčíček, F., Thomson, B., and Young, S. 2012. Reinforcement Learning for Parameter Estimation in Statistical Spoken Dialogue Systems. Computer Speech and Language 26(3): 168-192.

Levin, E., Pieraccini, R., and Eckert, W. 2000. A Stochastic Model of Human-Machine Interaction for Learning Dialog Strategies. IEEE Transactions on Speech and Audio Processing 8(1): 11-23.

Litman, D. J., Kearns, M. S., Singh, S. P., and Walker, M. A. 2000. Automatic Optimization of Dialogue Management. In Proceedings of the 18th Conference on Computational Linguistics - Volume 1 (COLING '00), 502-508. Stroudsburg, PA: Association for Computational Linguistics.

Misu, T., Georgila, K., Leuski, A., and Traum, D. 2012. Reinforcement Learning of Question-Answering Dialogue Policies for Virtual Museum Guides. In Proceedings of the 13th Annual Meeting of the Special Interest Group on Discourse and Dialogue, 84-93. Stroudsburg, PA: Association for Computational Linguistics.

Selfridge, E. O., and Heeman, P. A. 2011. Learning Turn, Attention, and Utterance Decisions in a Negotiative Slot-Filling Domain. Technical Report CSLU-11-005, Center for Spoken Language Understanding, Oregon Health & Science University, Portland, OR.
Singh, S. P., Kearns, M. J., Litman, D. J., and Walker, M. A. 1999. Reinforcement Learning for Spoken Dialogue Systems. In Proceedings of NIPS 1999, 956-962. Cambridge, MA: The MIT Press.

Verduyn, P., Delvaux, E., Coillie, H. V., Tuerlinckx, F., and Mechelen, I. V. 2009. Predicting the Duration of Emotional Experience: Two Experience Sampling Studies. Emotion 9: 83-91.

Verduyn, P., Mechelen, I. V., and Tuerlinckx, F. 2011. The Relation between Event Processing and the Duration of Emotional Experience. Emotion 11: 20-28.

Walker, M. A., Fromer, J. C., and Narayanan, S. 1998. Learning Optimal Dialogue Strategies: A Case Study of a Spoken Dialogue Agent for Email. In Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics, 1345-1352. Stroudsburg, PA: Association for Computational Linguistics.

Walker, M. A. 2000. An Application of Reinforcement Learning to Dialog Strategy Selection in a Spoken Dialogue System for Email. Journal of Artificial Intelligence Research 12: 387-416.