A 3D Virtual Environment for Exploratory Learning in Mobile Robot Control

Ayanna M. Howard
Human-Automation Systems (HumAnS) Lab, School of Electrical and Computer Engineering
Georgia Institute of Technology, Atlanta, GA, USA

Wesley Paul
Jet Propulsion Laboratory, California Institute of Technology
4800 Oak Grove Drive, Pasadena, CA 91109, USA

Abstract

This paper discusses a virtual environment that enables human agents to develop the skills necessary to control a mobile robot through exploratory learning practices. The interface connects the human user to both a virtual robot and a physical robot in the real world, and it allows human performance to be evaluated with a framework that analyzes execution parameters recorded during operation. These execution data are then used to compare how well human agents learn the skills needed to control the robot in a novel task situation. We give an overview of the environment, together with experimental results comparing the performance of multiple operators learning to control a virtual robot.

Keywords: exploratory learning, human-robot interaction, virtual environment

1. Introduction

Exploratory learning is defined as the process of learning new skills, or becoming competent with a new technology, through trial and error during actual use. In real-world situations, it has been shown that human agents prefer to learn by exploration in human-computer interaction [4], yet there has been minimal research on what makes an interface explorable, i.e., which interface features can increase efficiency for the human agent. This is especially true for interfaces that give human agents access to the functionality of external robotic devices. Research in exploratory learning pervades efforts focused on human-computer interaction (HCI).
These efforts range from [4], in which a model is constructed of a human user employing exploratory learning techniques to learn an unfamiliar software application, to [1], in which exploratory learning is used to learn mathematical functions with a computer. Although research in exploratory learning is present in HCI, there has been little focus on applying it to human-robot interaction (HRI). HRI has many unique characteristics [3, 6], including the fact that, although humans interact with a computer system, the interaction also controls a remote device with a physical embodiment. As a result, the human operator must not only direct the robotic device through the interaction but must also be able to receive feedback to ensure that adequate control is applied. In this work, we develop a method for evaluating how well human-robot interfaces increase operator efficiency through exploratory learning practices. We present a virtual environment that assists a human agent in developing the skills necessary to control a mobile robot, and we apply our evaluation methodology to extract learning parameters for assessing the interface.

2. Exploratory Learning for Human-Robotic Systems

In [2], it was suggested that human users prefer to learn how to use a novel device through exploration in the context of real tasks. For robot applications, allowing the human to learn during real task execution poses a major challenge. Typically, we require the expertise of the human user for tasks that are unknown, unexpected, or uncertain; for simple or repetitive tasks, we assume that the robot has enough intelligence and capability to perform them without direct human input. We therefore suggest that exploratory learning for robotic devices provides sufficient training to allow a user to become more effective at implementing a new task in a novel situation than without such learning.
In [5], it was shown that the time needed to perform a task decreases with practice according to a power law (Figure 1), and that the rate and shape of this improvement are fairly similar across tasks. The learning curve can be expressed mathematically in the following forms:

$$t = t_0 + B\,(N + E)^{-\beta} \tag{1}$$
$$t = B\,N^{-\beta} \tag{2}$$
$$t = B\,e^{-\alpha N} \tag{3}$$

where t is the task time, B is the range of learning, N is the trial number, and α and β are learning rate parameters; in equation (1), t_0 is the asymptotic task time and E is the amount of practice completed before the first trial. Although equation (1) provides the most accurate description of the learning curve, its parameters are more difficult to compute. Equations (2) and (3) can therefore be used to evaluate the effects of practice in real-world scenarios. Equation (2) is better suited to determining the average performance time across multiple users, whereas equation (3) gives the best fit when evaluating the performance time of individual users. The power curve has been shown to be applicable to assessing improvements in human performance over time, regardless of the task specifics; differences in task complexity, novelty, and other factors are captured by different values of the learning rate parameters.

Figure 1. The learning curve: average task time (s) to perform a simple task versus trial number.

The power curve shows that performance improves with practice, and it has been shown to apply to a wide range of human-computer tasks. To reiterate, practice in this case is defined as performing a task, or variations of the same task, repeatedly over a given time period. This differs from exploratory learning, in which a user may interact with the system over a given time period but without a concrete task or goal to achieve. In our research, we are interested in determining whether exploratory learning can substitute for practice, since a primary benefit of human-robot interaction is good performance in novel situations.

For our application, we wish to determine how well a 3D virtual environment can help human agents improve task performance when directly controlling a robot operating in the real world. In effect, we wish to determine the learning range (B) and the learning rate parameter (β) defined in equation (2). We propose that a methodology for extracting these parameters allows us to assess the usability of the interface for exploratory learning. The primary goal is to provide a systematic approach that can help improve human-robot interaction schemes for future applications.

3. Implementation and Analysis

HumAnS-3D (Figure 2) is a 3D virtual test environment developed to give users access to a virtual representation of the world and control of a virtual robot. The control panel allows the human operator to command the robot to move forward or backward and to turn left or right. The graphical user interface also connects the virtual robot, viewable by the human user, to the real robot for seamless integration with the real-world environment. For our application, we use the Sony ERS-7 robot for human interaction.

Figure 2. HumAnS-3D environment showing the virtual robot, which is connected to the ERS-7 robot.

Our focus is to determine how well HumAnS-3D enables a user to develop the skills necessary to control a mobile robot through exploratory learning. Our experimental operator test set consists of 6 novice users with no previous experience with either the user interface or the robots themselves. The users were divided into two groups of 3 members each: one group that would use exploratory learning techniques and one that would not. Each member of the exploratory learning group was allowed 2 minutes to experiment with the interface, which was connected directly to the physical robot. A snapshot of the instructions provided to the exploratory learning group is shown in Figure 3. After this learning period, each member of the exploratory learning group was given a single novel task: to move the robot to a target block positioned within the environment and to
push it to a designated goal location (Figure 4). Each member of the other test group was given the same instructions and the same novel task, but without the benefit of experimenting with the interface. From this test setup, we extract two parameters: task execution time and the number of commands issued to the robot. Table I lists the average values for each group, where "Trained" denotes the exploratory learning group. We also record the task execution time and average number of commands for an expert user to provide a baseline for comparison (Table II). The expert user, who was not a member of either test group and had previous experience interfacing with the robot and performing the task, was designated as such once the 90% confidence interval of their task time had narrowed to 2.5 seconds, such that:

$$\frac{1.96\,\sigma}{\sqrt{N}} \le 2.5 \tag{4}$$

where N is the number of trials and σ is the standard deviation of the task time over those trials. We use this criterion to determine performance convergence for the task; convergence was reached after 18 iterative trials of the box-pushing exercise (Figure 5).

Figure 3. Snapshot of information provided to human agents.

Table I: Execution parameters for human control of the ERS-7 robot
Group       Avg. task time (s)   Std. dev. (s)   Avg. # commands   Std. dev.
Untrained   63.00                4.36            27.33             2.31
Trained     54.67                6.35            28.33             2.31

Table II: Execution parameters for the expert HumAnS-3D user
User        Avg. task time (s)   Avg. # commands
Expert      46.00                24

Figure 4. Screen snapshots of the block-pushing task.

To extract values for the learning range B and the learning rate parameter β, we assume that the average task time of the untrained users corresponds to the apex of the learning curve (N = 1). We also equate the expert's average task time to N = 18, based on the estimated performance convergence point. Substituting these two points into equation (2) gives:

$$63.00 = B \cdot 1^{-\beta} \tag{5}$$
$$46.00 = B \cdot 18^{-\beta} \tag{6}$$

Since 1^{-β} = 1, equation (5) gives B = 63.00, and dividing (5) by (6) gives β = ln(63.00/46.00)/ln 18 ≈ 0.11, which yields the learning curve (Figure 6):

$$t = 63\,N^{-0.11} \tag{7}$$

Inverting this equation for the average execution time of the exploratory learning group (54.67 s) gives N = (63/54.67)^{1/0.11} ≈ 3.6, i.e., approximately N = 4. Thus, the benefit of using the virtual
environment to improve the skill set of a novice user is a reduction in task learning time of 22.2% (equivalent to 4 of the 18 trials needed for convergence; 4/18 ≈ 22.2%). This reaffirms our hypothesis that exploratory learning for robotic devices provides enough training to allow a user to become more effective at implementing a new task in a novel situation. Thus, exploratory learning can be a suitable substitute for practice when training humans to interact with robots for task achievement.

Figure 5. Normal distribution with 90% confidence interval associated with the expert user.

Figure 6. Derived learning curve for the HumAnS-3D test environment (time (s) versus trial number).

Through this analysis process, we have also provided a systematic methodology for evaluating how well different interface designs increase the efficiency of human agents interacting with robotic devices. By constructing the learning curve for different interfaces, we can directly assess the effect of each interface on efficiency: the greater the reduction in task learning time achieved through exploratory learning, the more effective the interface is at enabling the human agent to acquire the skills necessary for robot control.

4. Conclusions

This paper discusses a method to evaluate the efficiency of human-robot interfaces for use with exploratory learning practices. We present a virtual environment that enables human agents to develop the skills necessary to control a mobile robot, and we apply an evaluation methodology to extract learning parameters for the interface. We show that exploratory learning techniques for robotic devices provide sufficient training to allow a user to become more effective at implementing a new task in a novel situation. In essence, we have determined that exploratory learning, as applied in our 3D virtual test environment, is a substitute for practice. The primary goal of this work is to provide a systematic approach that can assist in improving human-robot interaction schemes for future applications [7]. Accordingly, future work will use this infrastructure to develop a learning curve for assessing robot performance as well. By understanding the learning curve associated with individual agent performance, the system will be able to adjust task roles in human-robot scenarios to increase overall system performance. Future work will also apply this methodology to interaction schemes involving teams of human operators and robot agents.

5. Acknowledgement

This work was performed in part at the Georgia Institute of Technology and at the Jet Propulsion Laboratory, California Institute of Technology, under contract with the National Aeronautics and Space Administration, and was funded through the Draper Laboratory Research and Development Program.

References

[1] A. Bunt, C. Conati, and K. Muldner, "Scaffolding Self-Explanation to Improve Learning in Exploratory Learning Environments," Proceedings of the 7th International Conference on Intelligent Tutoring Systems, Maceio, Brazil, 2004.
[2] A. L. Cox, "What People Learn from Exploratory Device Learning," Proceedings of the Fourth International Conference on Cognitive Modeling, Mahwah, NJ, 2001.
[3] T. Fong and C. Thorpe, "Vehicle Teleoperation Interfaces," Autonomous Robots, vol. 11, no. 1, pp. 9-18, 2001.
[4] J. Rieman, "A Field Study of Exploratory Learning Strategies," ACM Transactions on Computer-Human Interaction, vol. 3, no. 3, pp. 189-218, 1996.
[5] F. E. Ritter and L. J. Schooler, "The Learning Curve," in W. Kintsch, N. Smelser, and P. Baltes (Eds.), International Encyclopedia of the Social and Behavioral Sciences, Oxford, UK: Pergamon, 2001.
[6] J. Scholtz, B. Antonishek, and J. Young, "Evaluation of a Human-Robot Interface: Development of a Situational Awareness Methodology," Proceedings of the 37th Hawaii International Conference on System Sciences, 2004.
[7] A. M. Howard, "A Methodology to Assess Performance of Human-Robotic Systems in Achievement of Collective Tasks," 2005 IEEE/RSJ International Conference on Intelligent Robots and Systems, August 2005.
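The parameter extraction in Section 3 (equations (4)-(7) and the N = 4 and 22.2% figures) can be reproduced numerically. The Python sketch below is illustrative only: the helper names are hypothetical, and it assumes, as the analysis does, that the untrained average (63.00 s) lies at N = 1 and the expert average (46.00 s) at N = 18 on the power-law curve of equation (2). It also assumes that the 22.2% figure is the ratio of the equivalent trial number to the convergence trial (4/18). The multiplier z in the equation (4) check is a parameter defaulting to 1.96; for reference, 1.96 is the two-sided 95% standard normal quantile, while a 90% interval corresponds to about 1.645. Because the expert's standard deviation is not reported, the convergence example uses a hypothetical σ = 5.0 s.

```python
import math

# Hypothetical helper names; the arithmetic follows equations (2) and (4)-(7).


def power_law_time(n: float, b: float, beta: float) -> float:
    """Equation (2): predicted task time on trial n, t = B * n**(-beta)."""
    return b * n ** (-beta)


def fit_two_points(t_first: float, t_last: float, n_last: float) -> tuple[float, float]:
    """Solve equations (5)-(6) for B and beta.

    At N = 1 the factor 1**(-beta) equals 1, so B is the first task time;
    dividing the two equations gives beta = ln(t_first / t_last) / ln(n_last).
    """
    b = t_first
    beta = math.log(t_first / t_last) / math.log(n_last)
    return b, beta


def equivalent_trial(t: float, b: float, beta: float) -> float:
    """Invert equation (2): the trial number at which the curve reaches time t."""
    return (b / t) ** (1.0 / beta)


def has_converged(sigma: float, n: int, half_width: float = 2.5, z: float = 1.96) -> bool:
    """Equation (4): is the half-width z * sigma / sqrt(n) within half_width?"""
    return z * sigma / math.sqrt(n) <= half_width


def min_trials(sigma: float, half_width: float = 2.5, z: float = 1.96) -> int:
    """Smallest n satisfying equation (4), i.e. n >= (z * sigma / half_width) ** 2."""
    return math.ceil((z * sigma / half_width) ** 2)


# Averages from Tables I and II; N = 18 is the convergence point reported in the text.
T_UNTRAINED = 63.00  # untrained group, assumed to lie at N = 1
T_TRAINED = 54.67    # exploratory learning ("Trained") group
T_EXPERT = 46.00     # expert user, assumed to lie at N = 18
N_CONVERGED = 18

b, beta = fit_two_points(T_UNTRAINED, T_EXPERT, N_CONVERGED)
n_trained = equivalent_trial(T_TRAINED, b, beta)
n_rounded = round(n_trained)
reduction = n_rounded / N_CONVERGED  # assumed meaning of the 22.2% figure

print(f"B = {b:.2f}, beta = {beta:.2f}")
print(f"trained group ~ trial {n_trained:.2f} -> rounded to {n_rounded}")
print(f"learning-time reduction = {n_rounded}/{N_CONVERGED} = {reduction:.1%}")

# Hypothetical expert spread: sigma = 5.0 s is NOT a value reported in the paper.
print(f"sigma = 5.0 s needs {min_trials(5.0)} trials to satisfy equation (4)")
```

Run as a script, it should print B = 63.00 and beta = 0.11, place the trained group's 54.67 s at about trial 3.68 (rounded to 4), and report 4/18 = 22.2%. The 3.68 comes from the unrounded β ≈ 0.109; with β rounded to 0.11 as in equation (7), the same inversion gives about 3.63, and either value rounds to N = 4.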