The Robot Vision Track at ImageCLEF 2010


Andrzej Pronobis 1, Marco Fornoni 3, Henrik I. Christensen 2, and Barbara Caputo 3

1 Centre for Autonomous Systems, The Royal Institute of Technology, Stockholm, Sweden (pronobis@kth.se)
2 Georgia Institute of Technology, Atlanta, GA, USA (hic@cc.gatech.edu)
3 Idiap Research Institute, Martigny, Switzerland ({mfornoni,bcaputo}@idiap.ch)
http://www.imageclef.org/2010/robot

Abstract. This paper describes the robot vision track proposed to the ImageCLEF 2010 participants. The track addressed the problem of visual place classification, with a special focus on generalization. Participants were asked to classify rooms and areas of an office environment on the basis of image sequences captured by a stereo camera mounted on a mobile robot, under varying illumination conditions. The algorithms proposed by the participants had to answer the question "where are you?" (I am in the kitchen, in the corridor, etc.) when presented with a test sequence acquired within the same building but on a different floor than the training sequence. The test data contained images of rooms seen during training, as well as additional rooms that were not imaged in the training sequence. The participants were asked to solve the problem separately for each test image (obligatory task). Additionally, results could be reported for algorithms exploiting the temporal continuity of the image sequences (optional task). A total of seven groups participated in the challenge, with 42 runs submitted to the obligatory task and 13 to the optional task. The best result in the obligatory task was obtained by the Computer Vision and Geometry Laboratory, ETHZ, Switzerland, with an overall score of 677. The best result in the optional task was obtained by the Idiap Research Institute, Martigny, Switzerland, with an overall score of 2052.
Keywords: place recognition, robot vision, robot localization

Acknowledgements. We would like to thank the CLEF campaign for supporting the ImageCLEF initiative. B. Caputo was supported by the EMMA project, funded by the Hasler Foundation. M. Fornoni was supported by the MULTI project, funded by the Swiss National Science Foundation. A. Pronobis was supported by the EU FP7 project ICT-215181-CogX. The support is gratefully acknowledged.

1 Introduction

ImageCLEF (http://www.imageclef.org/) [1-3] started in 2003 as part of the Cross Language Evaluation Forum (CLEF, http://www.clef-campaign.org/) [4]. Its main goal has been to promote research on multi-modal data annotation and information retrieval in various application fields. As such, it has always contained visual, textual and other modalities, mixed tasks and several subtracks. The robot vision track was proposed to the ImageCLEF participants for the first time in 2009. The track attracted considerable attention, with 19 registered research groups, 7 groups eventually participating, and a total of 27 submitted runs. The track addressed the problem of visual place recognition applied to robot topological localization. The second edition of the challenge was held in conjunction with ICPR 2010 and saw an increase in participation, with 9 participating groups and 34 submitted runs.

As in previous editions, in 2010 the challenge addressed the problem of visual place classification, this time with a special focus on generalization. Participants were asked to classify rooms and functional areas on the basis of image sequences captured by a stereo camera mounted on a mobile robot within an office environment. The test sequence was acquired within the same building but on a different floor than the training sequence. It contained rooms of the same categorical types (corridor, office, bathroom), and it also contained room categories not seen in the training sequence (meeting room, library). The system built by the participants had to be able to answer the question "where are you?" when presented with a test sequence imaging a room category seen during training, and it had to be able to answer "I do not know this category" when presented with a new room category. We received a total of 55 submissions, of which 42 were submitted to the obligatory task and 13 to the optional task.
The best result in the obligatory task was obtained by the Computer Vision and Geometry Laboratory, ETHZ, Switzerland. The best result in the optional task was obtained by the Idiap Research Institute, Martigny, Switzerland.

This paper provides an overview of the robot vision track and reports on the runs submitted by the participants. First, details concerning the setup of the robot vision track are given in Section 2. Then, Section 3 presents the participants and Section 4 provides the ranking of the obtained results. Conclusions are drawn in Section 5. Additional information about the task and on how to participate in future robot vision challenges can be found on the ImageCLEF web pages (http://www.imageclef.org/).

2 The Robot Vision Track

This section describes the details of the setup of the robot vision track. Section 2.1 describes the dataset used. Section 2.2 gives details on the tasks

proposed to the participants. Finally, Section 2.3 briefly describes the algorithm used for obtaining the ground truth and the evaluation procedure.

2.1 Dataset

The image sequences used for the contest are taken from the previously unreleased COLD-Stockholm database. The sequences were acquired using the MobileRobots PowerBot robot platform equipped with a stereo camera system consisting of two Prosilica GC1380C cameras. The acquisition was performed on three different floors of an office environment, consisting of 36 areas (usually corresponding to separate rooms) belonging to 12 different semantic and functional categories. The robot was manually driven through the environment while continuously acquiring images at a rate of 5 fps. Each data sample was then labeled as belonging to one of the areas according to the position of the robot during acquisition (rather than the contents of the images). Three sequences were selected for the contest:

- Training sequence: acquired in 11 areas, on the 6th floor of the office building, during the day, under cloudy weather. The robot was driven through the environment following a path similar to that of the test and validation sequences, and the environment was observed from many different viewpoints (the robot was positioned at multiple points and performed 360 degree turns).
- Validation sequence: acquired in 11 areas, on the 5th floor of the office building, during the day, under cloudy weather. A path similar to that of the training sequence was followed, but without the 360 degree turns.
- Testing sequence: acquired in 14 areas, on the 7th floor of the office building, during the day, under cloudy weather. The robot followed a path similar to that of the validation sequence.

2.2 The Task

Participants were given training data consisting of a sequence of stereo images.
The training sequence was recorded using a mobile robot that was manually driven through several rooms of a typical indoor office environment. The acquisition was performed under fixed illumination conditions and at a given time. Each image in the training sequence was labeled and assigned an ID and the semantic category of the area (usually a room) in which it was acquired. The challenge was to build a system able to answer the question "where are you?" (I am in the kitchen, in the corridor, etc.) when presented with a test sequence containing images acquired in a different environment (a different floor of the same building), with areas belonging to the semantic categories observed previously (present in the training sequence) or to new semantic categories (not

imaged in the training sequence). The test images were acquired under illumination settings similar to those of the training data, but in a different office environment. The system should assign each test image to one of the semantic categories of the areas present in the training sequence, or indicate that the image belongs to an unknown semantic category not included during training. Moreover, the system could refrain from making a decision (e.g. in the case of lack of confidence).

We considered two separate tasks: task 1 (obligatory) and task 2 (optional). In task 1, the algorithm had to provide information about the location of the robot separately for each test image, without relying on information contained in any other image (e.g. when only some of the images from the test sequences are available or the sequences are scrambled). In task 2, the algorithm was allowed to exploit the continuity of the sequences and rely on the test images acquired before the classified image (images acquired after the classified image could not be used). The same training, validation and testing sequences were used for both tasks. The reported results were compared separately.

The competition started with the release of annotated training and validation data. Moreover, the participants were given a tool for evaluating the performance of their algorithms. The test image sequences were released later. The test sequences were acquired in a different environment than the training and validation sequences (one more floor of the same building), under similar conditions, and contained additional rooms belonging to semantic categories that were not imaged previously. The algorithms trained on the training sequence were used to annotate each of the test images. The same tools and procedure as for the validation were used to evaluate and compare the performance of each method during testing.
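As an illustration, the per-image decision required in task 1 (pick a known category, declare an unknown category, or refrain) and the use of past frames allowed in task 2 could be sketched as follows. This is not any participant's method; the threshold values, function names, and majority-vote smoothing are illustrative assumptions:

```python
from collections import Counter

def decide(scores, reject_threshold=0.2, margin_threshold=0.05):
    """Map per-class confidence scores for one test image to a decision.

    `scores` is a dict {category: confidence}. Returns a known category,
    "unknown" (novel category), or None (the algorithm refrains).
    Thresholds are illustrative, not those used by any participant.
    """
    best, second = sorted(scores.values(), reverse=True)[:2]
    if best < reject_threshold:           # nothing looks familiar: novel category
        return "unknown"
    if best - second < margin_threshold:  # too close to call: refrain
        return None
    return max(scores, key=scores.get)

def smooth(past_decisions, current, window=5):
    """Task-2 style use of temporal continuity: majority vote over the
    current decision and up to window-1 earlier ones (past frames only,
    as the rules require)."""
    votes = [d for d in list(past_decisions)[-(window - 1):] + [current]
             if d is not None]
    return Counter(votes).most_common(1)[0][0] if votes else current
```

For example, `decide({"kitchen": 0.6, "corridor": 0.3})` yields "kitchen", while a uniformly low-scoring image yields "unknown" and a near-tie yields a refusal.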
2.3 Ground Truth and Evaluation

The image sequences used in the competition were annotated with ground truth. The annotations of the training and validation sequences were available to the participants, while the ground truth for the test sequence was released after the results were announced. Each image in the sequences was labelled, according to the position of the robot during acquisition, as belonging to one of the rooms used for training or as an unknown room. The ground truth was then used to calculate a score indicating the performance of an algorithm on the test sequence. The following rules were used when calculating the overall score for the whole test sequence:

- 1 point was granted for each correctly classified image belonging to one of the known categories.
- 1 point was subtracted for each misclassified image belonging to one of the known or unknown categories.
- No points were granted or subtracted if an image was not classified (the algorithm refrained from the decision).
- 2 points were granted for a correct detection of a sample belonging to an unknown category (true positive).
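The scoring rules can be sketched as a small function. This is not the official evaluation script; the function name, the "unknown"/None conventions, and this reading of the two-point penalty for a false unknown detection (given in the text) are illustrative assumptions:

```python
def sequence_score(ground_truth, predictions, known_categories):
    """Score a run under the stated rules (one reading of them).

    Both inputs are equal-length lists of category names; unknown-category
    images are labelled "unknown" in the ground truth, and None in the
    predictions marks an image the algorithm refrained from classifying.
    """
    score = 0
    for truth, pred in zip(ground_truth, predictions):
        if pred is None:                  # refrained: no points either way
            continue
        if truth in known_categories:
            if pred == truth:
                score += 1                # correct known category
            elif pred == "unknown":
                score -= 2                # false unknown detection
            else:
                score -= 1                # misclassified known category
        else:
            # truly unknown area: +2 for detecting it, -1 for misclassifying
            score += 2 if pred == "unknown" else -1
    return score
```

Under this reading, refraining is never worse than a wrong guess, which rewards algorithms that model their own confidence.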

- 2 points were subtracted for an incorrect detection of a sample belonging to an unknown category (false positive).

A script was available to the participants that automatically calculated the score for a specified test sequence given the classification results produced by an algorithm.

Table 1. Results obtained by each group in the obligatory task.

  #  Group                     Score
  1  CVG                         677
  2  Idiap-MULTI                 662
  3  NUDT                        638
  4  Centro Gustavo Stefanini    253
  5  CAOR                         62
  6  DYNILSIS                    -20
  7  UAIC2010                    -77

Table 2. Results obtained by each group in the optional task.

  #  Group        Score
  1  Idiap-MULTI   2052
  2  CAOR            62
  3  DYNILSIS       -67

3 Participation

In 2010, 7 groups participated in the robot vision task, namely:

- CVG: Computer Vision and Geometry Laboratory, ETH Zurich, Switzerland;
- Idiap-MULTI: Idiap Research Institute, Martigny, Switzerland;
- NUDT: Department of Automatic Control, College of Mechatronics and Automation, National University of Defense Technology, Changsha, China;
- Centro Gustavo Stefanini, La Spezia, Italy;
- CAOR, France;
- DYNILSIS: Univ. Sud Toulon Var, R229-BP 20132-83957 La Garde CEDEX, France;
- UAIC2010: Facultatea de Informatica, Universitatea Al. I. Cuza, Romania.

A total of 55 runs were submitted, with 42 runs submitted to the obligatory task and 13 runs submitted to the optional task. In order to encourage participation, there was no limit to the number of runs that each group could submit.

4 Results

This section presents the results of the robot vision track of ImageCLEF 2010. Table 1 shows the results for the obligatory task, while Table 2 shows the results

for the optional task. Scores are presented for each of the submitted runs that complied with the rules of the contest.

We see that the majority of runs were submitted to the obligatory task. A possible explanation is that the optional task requires a higher expertise in robotics than the obligatory task, which therefore represents a very good entry point. The same behavior was noted in the other editions of the robot vision task. These results indicate quite clearly that the capability to visually recognize a place under different viewpoints is still an open challenge for mobile robots. This is a strong motivation for proposing similar tasks to the community in future editions of the robot vision task.

5 Conclusions

The robot vision task at ImageCLEF 2010 attracted considerable attention and proved an interesting complement to the existing tasks. The approaches presented by the participating groups were diverse and original, offering a fresh take on the topological localization problem. We plan to continue the task in the coming years, proposing new challenges to the participants. In particular, we plan to focus on the problem of place categorization and to use objects as an important source of information about the environment.

References

1. Clough, P., Müller, H., Deselaers, T., Grubinger, M., Lehmann, T.M., Jensen, J., Hersh, W.: The CLEF 2005 cross-language image retrieval track. In: Cross Language Evaluation Forum (CLEF 2005). Springer Lecture Notes in Computer Science (September 2006) 535-557
2. Clough, P., Müller, H., Sanderson, M.: The CLEF cross-language image retrieval track (ImageCLEF) 2004. In: Peters, C., Clough, P., Gonzalo, J., Jones, G.J.F., Kluck, M., Magnini, B. (eds.): Multilingual Information Access for Text, Speech and Images: Results of the Fifth CLEF Evaluation Campaign. Volume 3491 of Lecture Notes in Computer Science (LNCS), Bath, UK, Springer (2005) 597-613
3.
Müller, H., Deselaers, T., Kim, E., Kalpathy-Cramer, J., Deserno, T.M., Clough, P., Hersh, W.: Overview of the ImageCLEFmed 2007 medical retrieval and annotation tasks. In: CLEF 2007 Proceedings. Volume 5152 of Lecture Notes in Computer Science (LNCS), Budapest, Hungary, Springer (2008) 473-491
4. Savoy, J.: Report on CLEF 2001 experiments. In: Report on the CLEF Conference 2001 (Cross Language Evaluation Forum), Darmstadt, Germany, Springer LNCS 2406 (2002) 27-43