The Visual Indexing Vocabulary: Developing a Thesaurus for Indexing Images Across Diverse Domains


Corinne Jorgensen
School of Information Studies, Florida State University, Tallahassee, FL 32306-2100. Email: cjorgensen@lis.fsu.edu

This paper reports progress in developing a thesaurus for indexing images from diverse domains using non-specialist terminology. The foundation for the vocabulary is a series of research inquiries into user description of images in relation to different tasks. The paper discusses problems with current thesauri for indexing visual materials and the evolution of the vocabulary from a class project to a tool for annotation for the purpose of performing ground-truth judgments in a benchmarking system. Methods for adding terminology and structuring the thesaurus are discussed, as are considerations that relate to term specificity and to the size and structure of the thesaurus. Unique aspects of visual indexing include problems of scale in relation to meaning and the unbounded nature of visual ontology, requiring a flexible approach to developing a visual indexing vocabulary (VIV).

Introduction

With the development of inexpensive end-user digitization technologies, the number of collections of images stored in digital format in libraries, museums, archives, information centers, research centers, hospitals, educational institutions, newspapers, and personal archives is growing exponentially. Institutions in government, higher education, and the library and museum community, among others, are all involved in a number of initiatives to provide networked access to these collections. The Institute of Museum and Library Services (www.imls.gov) has funded a number of collaborative digitization projects, as has the National Science Foundation through its digital library initiatives. In spite of this rapid growth in digital collections, there are many issues and unsolved problems associated with these digitization projects.
Technical problems often dominate the early stages of implementation. Other issues related to the realms of collection development (for instance, criteria for the selection of materials to digitize), management (personnel and workflow), and end-user access also enter into the process. One of the most difficult, however, is the question of intellectual access: access to the visual and interpretive content of digitized images.

Traditionally, intellectual access is accomplished by assigning textual indexing terms from controlled vocabularies and thesauri and by employing keyword searching in descriptive fields such as title or caption. Two well-known and widely used systems for image indexing are the Library of Congress Thesaurus for Graphic Materials (LCTGM or TGM, first edition, Parker 1987) and the Art and Architecture Thesaurus (AAT, second edition, Petersen 1994). Recent research (Jorgensen 1996; Jorgensen 1999a; Jorgensen 2003) demonstrates that these suffer from several problems when used as sources of terminology for indexing non-domain-specific image collections.

The TGM contains many terms covering historical concerns such as industrial, political, and social development. Its historical focus results in a lack of much contemporary terminology and overshadows terminology that would be useful in indexing visual content. It also retains some gender and cultural biases of the Library of Congress Subject Headings, and its syndetic structure is not fully developed, leaving many relationships incomplete and resulting in missing terms. The high degree of precoordination (e.g., "empty market basket") additionally limits the usefulness of much of the terminology.

The AAT is a highly specialized controlled vocabulary for describing and retrieving information on fine art, architecture, decorative art, and material culture and was created through collaboration among scholars in these fields. The AAT is widely used in slide, drawing, and photograph collections.
Lying outside the AAT's scope is the broader range of people, events, and activities that are important access points within more generalized picture collections, and research has demonstrated that only about 16% of the terms are deemed suitable in a more generalized image indexing environment (Jorgensen 1999a). Variations in generic and specific terminology also result in frustrating omissions. While the AAT shows good consistency in the construction of its hierarchies, this same hierarchical focus scatters terms as well, especially if one is seeking a generalized visual indexing language. Soergel (1995) provides a detailed review of the AAT and suggests a thorough restructuring to create a more useful polyhierarchical network of concept relationships.

2004 Proceedings of the 67th ASIS&T Annual Meeting, vol. 41 287

Rather than using a specialized classification system, librarians and picture archivists have sometimes tried to apply an existing general classification system, such as the Dewey Decimal System or the Library of Congress Subject Headings, to an image collection. These attempts
have generally not been successful (Enser 1993; Frost 1996), leading to the conclusion that such systems provide a very sparse language for image indexing, particularly in image content description.

Other libraries and archives have created their own thesauri, targeted to meet specific needs in indexing their collections. For example, the Australian Picture Thesaurus (http://www.picturethesaurus.gov.au/) contains many terms needed to index Australian visual materials. In-house thesauri are being developed to address indexing and searching in large commercial image collections. For instance, Corbis (Bellas 1999) uses a very large controlled vocabulary (some 25,000 terms at that time) to provide access to four different types of information describing images: factual information, depicted information, contextual information, and conceptual information. Common language is preferred, although some specialist language is sometimes used. While early queries in the Corbis system were for specific items, with increasing user familiarity there seems to be an increase in abstract queries. Schroeder reports the use of a similar guiding framework with extracted text and free-text terms. However, thesauri and other tools developed within the context of commercial applications are not for public distribution.

Recent efforts have focused both on expanding existing vocabularies and on creating metadata structures capable of handling the variety of terms necessary for visual indexing. A number of recent metadata efforts have been stimulated by the need to share object and collection information in a networked environment, in particular art/museum image/object information. These efforts hold promise in improving methods for storing image information in general and in increasing interoperability among systems using these methods. Some of the many metadata schemes currently in existence or being proposed are the Categories for the Description of Works of Art (CDWA, http://www.getty.edu/gri/standard/cdwa/); the Core Categories for Visual Resources (VRA Core, Version 3.0), an adaptation of the CDWA (Visual Resources Association 2000); the Dublin Core (Weibel and Koch 2000); and MPEG-7, an ISO/IEC (International Standards Organization/International Electrotechnical Commission) standard.

While these schemes are designed to carry information about a visual item, much of the information is external data such as artist/creator, publisher, medium, size, format, and administrative and access rights information. Information about the visual and interpretive content of an item is carried within only one or a few fields, such as subject or description, and this description is often completely unstructured, leaving those who are searching for image content (objects, people, scenes, activities) with little data to search on. The proliferation of metadata schemes has also necessitated the creation of tools (crosswalks) to map across the various schemes (McRae and White 1998) and to provide measures of interoperability among different metadata systems. Problems associated with crosswalks include the lack of exact one-to-one mappings, resulting in some degree of imprecision and information loss.

Other approaches utilize machine intelligence to parse visual syntactic elements such as color, texture, and shape (referred to as content-based retrieval or CBR); these methods currently provide a narrow set of syntactic features that can be searched (see the Hermitage Museum website at http://www.hermitagemuseum.org/). Researchers are now focusing on adding capabilities to process higher-level features using methods of object recognition, and on adding intelligence to recognize certain types of visual semantics, such as indoor-outdoor, day-night, or human-non-human. Machine learning is being used to group images that are similar on a range of features that, taken together, belong to a visual semantic template.
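The crosswalk problem noted above (no exact one-to-one mappings, hence imprecision and information loss) can be made concrete with a small sketch. The field names and the mapping below are simplified, hypothetical stand-ins chosen for illustration; they are not the published elements of any of the schemes named in the text, and the sample record is made up.

```python
# Illustrative crosswalk between two metadata schemes.
# All field names and the sample record are hypothetical.
CROSSWALK = {
    "creator": "creator",
    "title": "title",
    "subject": "subject",
    # Three distinct source fields collapse into one target field.
    "material": "format",
    "technique": "format",
    "measurements": "format",
}


def apply_crosswalk(record, crosswalk):
    """Map a source record onto the target scheme.

    Returns (target_record, unmapped_fields). Values from several source
    fields may end up in the same target field (loss of precision), and
    fields with no mapping are dropped (loss of information).
    """
    target = {}
    unmapped = []
    for field_name, value in record.items():
        target_field = crosswalk.get(field_name)
        if target_field is None:
            unmapped.append(field_name)
            continue
        target.setdefault(target_field, []).append(value)
    return target, unmapped


source = {
    "title": "Empty market basket",
    "material": "gelatin silver print",
    "technique": "photography",
    "measurements": "20 x 25 cm",
    "style_period": "Modernist",
}
target, unmapped = apply_crosswalk(source, CROSSWALK)
print(target)
print(unmapped)
```

In this sketch the target record can no longer distinguish material from technique from measurements, and the style information has no place to go at all, which is the kind of loss the text describes.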
Other automated approaches involve extracting text associated with images, such as captions, or processing audio to capture relevant text.

A challenge to all of these approaches is that searching now takes place in a networked and distributed environment such as the World Wide Web, where unmediated searching of these collections by end-users is becoming the norm. New user communities also increase the complexity of the search environment. The type of indexing applied to an image collection is generally designed for specific user communities or specific types of collections and is thus often composed from a specialized vocabulary for that community. However, with networked access to image collections, the user community becomes much more diversified. An artist may seek scientific images such as optical mineralogy specimens as inspiration for abstract design, while an author or a student wishes to find an illustration communicating a particular feeling to illustrate a story or poem, and the complex nature of human image interpretation may lead to these diverse needs being satisfied by the same image. Given this context, two major problems emerge: the need for description of a wider range of image attributes for each image than those addressed by traditional systems, and the need for a method to increase the precision of the search, as search result sets are often very large.

An Image Indexing Thesaurus

One of the keys to solving the problems of intellectual access to images in such a networked context could be the creation of a non-domain-specific image indexing thesaurus that permits end-user searching across a variety of domain and institutional contexts. The need for such a vocabulary has emerged from several recent projects. Researchers surveyed UK Art Libraries Association (ARLIS) members on their practices in the description and indexing of images (Graham 1999).
This survey revealed that the majority of respondents used in-house rules and that there was wide variation in the way
that images were described in terms of their content. Over 50% of respondents were not satisfied with the content indexing of their images, and the lack of appropriate indexing tools contributed to this. The report concluded that existing tools for the cataloguing and indexing of images are not very satisfactory, and that the majority of problems encountered in retrieval had to do with type, level, and depth of indexing. Similar problems were found in the indexing of moving images at the shot level. Hudon et al. (2001) report that among eleven organizations with fourteen similar moving image collections there was little overlap among the indexing vocabularies and that locally established practices have little standardization and compatibility.

An indexing language that permits indexing of everyday activities, events, objects, and settings could usefully expand the access points provided for the visual content of an image. It could also enhance the consistency of indexing across collections by providing a core vocabulary that can be supplemented, based upon the requirements of specific collections, by the more specialized tools that currently exist. This paper reports progress in a project currently underway to develop and test a core vocabulary (thesaurus) that addresses a full range of everyday activities, events, objects, locations, and concepts, for indexing still and moving images across multiple, broad domains. This project focuses on providing a data value tool and possibly data structure tools for the as yet unstructured metadata fields that are being used for visual content description.

One of the major needs in image indexing has been a solid research foundation upon which to base decisions about the design of image indexing tools and systems. A body of research conducted over the last decade by this author and others provides a foundation for the creation and testing of a generalized image indexing vocabulary (Jorgensen 1995; Jorgensen 1998).
The earlier exploratory research investigated attributes typically described by diverse, naive participants in several types of tasks, using pictorial images created by professional illustrators. This research used content analysis to derive a set of approximately forty image attributes associated with image content, grouped into ten broader perceptual and interpretive classes, and demonstrated variations in attribute distributions among different types of tasks (describing, sorting, and searching). Continuing research has demonstrated the robustness of the research results concerning the range and types of attributes across a wide variety of pictorial and abstract images, including scientific and informational images such as maps, charts, and graphs (Brunskill and Jorgensen 2002). While perceptual attributes such as color and objects within the image were (as would logically be expected) typically reported, the "story" perceived within an image emerges as an important framework for relating groups of image attributes (Jorgensen 1998; O'Connor, O'Connor, and Abbas 1999). Other research focusing on image professionals' searching strategies also demonstrated the need to tell a story with an image (Jorgensen and Jorgensen 2003).

Building the Thesaurus

A standard method for thesaurus creation is to gather terms used in the domain from a variety of sources, and then to edit and organize these terms into an initial structure. As work continues, experts may be consulted, and term acquisition and editing continue in parallel. The initial work on the visual indexing vocabulary (VIV) took place during two previous projects creating an Image Testbed Prototype (Jorgensen and Srihari 1999b; Jorgensen and Jorgensen 2002). Terms were gathered by students in an indexing and abstracting class using existing thesauri, subject dictionaries, data from the author's research in image description, and real-world observation.
For instance, one group used the following sources, among others: the Cambridge Encyclopedia of Ornithology, the Dictionary of Greek and Roman Mythology, the Thesaurus of ERIC Descriptors, Duden pictorial dictionaries, the Thesaurus for Graphic Materials I, the Occupational Outlook Handbook, the Dictionary of Gods & Goddesses, Devils & Demons, the National Audubon Society Field Guide to North American Mammals, and http://www.careersonline.com.au. Decisions about what terms to include were made using a framework modeled after the major attribute categories and attributes in Jorgensen (1998). The vocabulary facets were developed based on the attribute groups developed in the author's previous research, and the facets contained about 3,500 terms.

These categories were later modified, based on further research, into a hierarchical structure (Jorgensen et al. 2001) and used to further develop the indexing language to assist in performing image similarity judgments ("ground-truth" measures) for the testbed. This phase of vocabulary development was done by a Graduate Assistant, again using a variety of tools but at this stage relying primarily on real-world observation and interaction. The researcher has just completed major revisions of these earlier versions. The revisions were necessary because term uniformity and specificity varied considerably across the facets and there were considerable organizational problems (not a surprising result, considering that the first several versions were developed by about thirty participants). These revisions have resulted in a vocabulary of about 6,500 terms.

Theoretical and Pragmatic Questions

At this stage of the research, a number of interesting theoretical and related pragmatic questions have arisen; these questions point to the future direction of vocabulary development and guidelines for using the thesaurus.

Thesaurus Structure

As noted above, two theoretical frameworks for structuring this vocabulary had been employed: a faceted structure based upon attribute categories (discussed below under "Terminology") from the author's previous research (a deductive approach), and a purely hierarchical structure (a pyramid derived from levels of knowledge necessary to generate features at varying levels of abstraction) based upon research related to visual processing, cognitive categories, and class inclusion (an inductive approach) (Jorgensen et al. 2001). The faceted structure mirrored some recent suggestions for image indexing frameworks (e.g., Shatford-Layne 1994; Armitage and Enser 1997), while the hierarchical structure was proposed to better meet the requirements of machine processing (for instance, in facilitating the indexing of image regions and sub-regions). Neither structure, however, provided a particularly satisfactory solution to the practical requirements of visual indexing. Term scattering was a major problem with the hierarchical pyramid, and the structure in some cases resulted in duplication of terms. The faceted structure, derived from a set of approximately forty image features, also suffered from term scattering resulting from the fine level of granularity.

A thesaurus, by its structure, encapsulates and demonstrates relationships among terms. A hierarchical structure often poses conceptual problems, as when a term has two equally appropriate broader terms (the AAT has adopted a faceted approach to address this problem). A problem for the current project is balancing the faceted approach suggested by the previous research results with a hierarchical approach that will provide enough structure to convey relationships among terms.
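The difficulty of a term with two equally appropriate broader terms can be illustrated with a minimal sketch of a thesaurus that stores broader-term (BT) links as a many-to-many relation. This is an illustration only, not the structure used by the VIV or the AAT; the class, the sample terms ("houseboat", "boat", "dwelling"), and their placement under facets are hypothetical.

```python
from collections import defaultdict


class Thesaurus:
    """Minimal polyhierarchy: a term may have more than one broader term."""

    def __init__(self):
        self.broader = defaultdict(set)   # term -> its broader terms (BT)
        self.narrower = defaultdict(set)  # term -> its narrower terms (NT)

    def add_bt(self, term, broader_term):
        """Record a BT link and the reciprocal NT link."""
        self.broader[term].add(broader_term)
        self.narrower[broader_term].add(term)

    def ancestors(self, term):
        """Return every term reachable by following BT links upward."""
        seen = set()
        stack = [term]
        while stack:
            current = stack.pop()
            for parent in self.broader.get(current, ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


t = Thesaurus()
# Hypothetical placement: one term, two equally plausible broader terms.
t.add_bt("houseboat", "boat")
t.add_bt("houseboat", "dwelling")
t.add_bt("boat", "Produced Objects")
t.add_bt("dwelling", "Manmade Environments")

print(sorted(t.ancestors("houseboat")))
```

A strict single-parent hierarchy would force the indexer to file such a term in one place only, which is one source of the term scattering and duplication described above.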
The thesaurus was reorganized at the highest level based upon a "world view," organizing the image attributes into larger categories that are more readily familiar, such as natural and manmade objects, living things, activities and events, and abstract concepts such as object properties, time, and themes. The top-level structure is as follows, each with a few examples of second-level headings:

I. Image Type/Technique (Type, Technique, Media, Genre, Style)
II. Visual Elements (Color, Shape, Texture, Spatial Placement, Perspective, Lighting)
III. Natural Objects, Phenomena, Processes (Natural Materials, Landforms, Water Forms, Natural Phenomena)
IV. Living Beings (Human, Imaginary Beings, Animals, Plants)
V. Produced Objects (Arts, Decorations, Electronics, Food, Magic, Personal Items)
VI. Manmade Environments (Buildings and Structures, Spaces, Population Centers)
VII. Narrative Elements (Activities and Actions, Events, Occupation, Role, Emotion/Mental State)
VIII. Abstract Concepts (Theme, Atmosphere, Genre, Symbolism, Properties)

There are forty-eight top-level headings and 782 second-level headings; ninety percent of the thesaurus goes no deeper than three levels, although there are a few instances where terms go to a sixth and seventh level. This largely shallow structure is by design and results from the use of non-specialist terminology and the eight major facets.

Terminology

Indexing the visual content of materials requires a broader range of attributes than those that have been incorporated in existing thesauri. The thesaurus has been built relying on the range of attributes revealed in the author's previous research asking participants to describe images in the context of several types of tasks (Jorgensen 1995; Jorgensen 1999b; Jorgensen 2003). The broad categories are PERCEPTUAL ELEMENTS, COLOR, PLACEMENT, OBJECTS, LIVING THINGS, STATES OF BEING, STORY, ABSTRACT CONCEPTS, and ART HISTORICAL ELEMENTS.
Objects, Living Things, and Story, together with story-related attributes such as States of Being and Abstract Concepts, as well as Color, were more typically reported. Thus, the terminology has been especially focused on gathering useful terms in these areas. As the AAT already contains many terms associated with the art-historical aspects of visual materials, the terms included here that relate to those aspects are those most basic and familiar to a general audience.

The author's research also shows the "story" of the image as typically described, and therefore useful in a retrieval context, especially when images are being sought to convey a particular affective or emotive quality. Within a story, the event or activity can be thought of as an envelope encapsulating associated attributes such as who, when, what, and where. Previous research with the TGM I (Gordon 1998) demonstrated the utility of organizing thesaurus terms around such a construct; however, this research also demonstrated that the vocabulary of the TGM I limited this approach. In contrast to Gordon's research, which was concerned with using an activity-based schema in a software interface for browsing, the concept of activity is used here as a key theoretical framework within which to continue to build the indexing vocabulary. For this work, visual thesauri, such as the Duden dictionaries, which depict objects and people within common settings, activities, and scenes (such as a visit to the dentist's office), are being used as reference
sources, as will other visual dictionaries such as the Ultimate Visual Dictionary (DK Publishing 1994). The Longman Dictionary of Contemporary English, with its emphasis on commonly used terms, is another useful resource, as are children's books such as the Simon & Schuster Thesaurus for Children (2001). Another important step is determining the relationship of the vocabulary to other existing resources and providing guidelines for supplementing the vocabulary with more specialized terms in particular areas.

In relation to specific types of attributes, the author's research suggests that some of the attribute types call for a different approach in representation than that currently taken by other existing thesauri and indexing tools. For instance, the AAT contains extensive color terms, but research with describing tasks demonstrates that people use only a limited number of terms to describe color (Jorgensen 1998). This is in accordance with previous cognitive and anthropological research (Berlin and Kay 1969) demonstrating that, across cultures, basic color naming takes place in a very limited way (with fewer than twenty color terms commonly used). Additionally, content-based processing methods can be used to provide color composition information.

Similarly, in order to meet the goal of providing terms to adequately describe the narrative and emotive aspects of images, a good number of terms pertaining to emotions, mental states, and relationships are provided. However, humans can recognize only six basic emotions (anger, fear, happiness, sadness, disgust, and surprise) with some degree of accuracy (Ekman and Friesen 1975; Katsikitis 1997). In the absence of more specific information regarding the emotional content of a scene in an image, the indexer could more comfortably limit the indexing to one of these six.

Term Specificity

Research into naming of everyday objects has demonstrated that there is a "basic level" used to identify items (Rosch et al. 1976).
This basic level is neither the most general nor the most specific (e.g., "apple" rather than "fruit" or "Red Delicious"). Basic-level concepts are categorized faster, are used almost exclusively in free-naming tasks, are learned sooner than other types of concepts, and are employed similarly across different cultures. The author's research demonstrated that approximately 80% of terms used to describe images were at the basic level (Jorgensen 1995). However, what constitutes a basic-level term can vary with a person's level of expertise and with the context of naming (in a group of animals, a bird will probably be called a bird, but in a group of birds, they may be named individually, e.g., robin, blue jay, etc.).

One question that remains to be answered is how to achieve the most desirable ratio of generic to specific terms in a generalized vocabulary. The current thesaurus relies heavily on the notion of basic-level objects, as the overall shallow hierarchical structure would suggest. The researcher is also drawing upon experience developing an end-user product taxonomy for an Internet-based business.

Another insight from cognitive categorization that relates to this question is the notion of exemplars, members sharing certain sets of features that make them typical members of a category. The question of interest here is the number of typical and less-typical members that need to be listed in any given category in order for the category to be generally useful. Given that, in many cases, supplementary vocabularies exist, what is the most useful balance between inclusion of a greater number of more specific terms and a paring down to a more commonly used vocabulary? Use of several thesauri for indexing suggests that a generalized indexing vocabulary of a manageable size can be created that can be extended through use of supplementary vocabularies. In contrast to the notion of exemplars is the notion of anomaly.
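One way a shallow vocabulary built around basic-level terms could interact with more specific supplementary terms is sketched below: a specific term is walked up its broader-term chain until a term flagged as basic level is reached. This is only an illustration under stated assumptions. The term data reuse the examples in the text (apple, fruit, Red Delicious, bird, robin, blue jay), the "animal" parent is added for the sketch, and the fixed `BASIC_LEVEL` set is a simplification, since the text notes that the basic level shifts with expertise and context.

```python
# Hypothetical broader-term links (child -> parent).
BROADER = {
    "Red Delicious": "apple",
    "apple": "fruit",
    "robin": "bird",
    "blue jay": "bird",
    "bird": "animal",
}

# Simplifying assumption: a fixed set of basic-level terms.
BASIC_LEVEL = {"apple", "bird"}


def to_basic_level(term):
    """Return the nearest basic-level term at or above `term`, else None."""
    current = term
    while current is not None:
        if current in BASIC_LEVEL:
            return current
        current = BROADER.get(current)
    return None


print(to_basic_level("Red Delicious"))
print(to_basic_level("robin"))
print(to_basic_level("fruit"))
```

A term such as "fruit" that sits above the basic level returns `None` here, which reflects that a superordinate term cannot be resolved to a single basic-level term without more information about the image.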
The author's research demonstrated that people will more frequently describe objects that appear atypical or occur in unexpected settings. This is also supported by research in cognitive psychology (Pezdek et al. 1989). A number of visual representation methods can be used to create these atypical objects, such as distortion, unusual size, color, or position, as well as schema violations, and vocabulary is necessary to represent these variations in objects in an efficient way.

Size of the Thesaurus

Current thesauri vary in size, from small (approximately 6,300 terms for the TGM I) to very large (approximately 120,000 terms for the AAT). One current project proposes a 1000 Concept Ontology for supplementing content-based descriptors in visual indexing, modeled after ontology development work using a top-level ontology of this size. A smaller vocabulary is easier to learn and apply but may lack adequate terminology. A larger vocabulary, with its greater number of terms, may enable finer distinctions but also requires far more expertise (and more time and training) in its application. One question to be answered in this research is the ideal size for a thesaurus applicable to describing visual materials from a broad range of domains. Is it possible to construct a thesaurus that contains enough terms to be useful in describing everyday objects, activities, events, and other frequently occurring concepts without becoming overly complex? Based upon work done to date, the target size for the vocabulary is estimated to be around 10,000 terms.

A related concern is the extent to which term formation requirements in existing standards for thesaurus construction (National Information Standards Institute 1994) match the terms used most frequently by end users to describe objects. For instance, in a thesaurus,
nouns are usually given in plural form, while users generally ask for what they want in the singular, e.g., "a picture of an apple" rather than "apples." Additionally, adjectives are not considered thesaurus terms, while searchers may request a picture of "a large apple" or describe the desired image as simply "happy." These accepted constraints upon vocabulary formation can have an impact on precision in indexing. This thesaurus adopts the singular form of the word as the standard entry term, which clarifies the textual description of visual content. The current thesaurus also contains adjectives in certain areas such as themes or emotional state.

Unique Aspects of Visual Indexing

There are a number of other interesting conceptual problems that arise in creating and structuring vocabulary for visual indexing, and many of the relational constraints that operate in the real world do not always apply. For instance, scale in visual depiction must not only be represented but also be interpreted in the visual and narrative context of the depicted material. In a narrative sense, it becomes less easy to identify objects solely as objects when they can also function as "environments" or settings for action. Some examples would be an image of an aircraft carrier with much activity on deck or, in a children's book, the old woman who lived in a shoe. The scale of depiction and the function of an object can narrow down retrieval sets that may otherwise be large, if this information is incorporated into the indexing. While the discussion of adding this kind of structure to an indexing record is beyond the scope of the current vocabulary building effort, the vocabulary itself suggests that development of associated structures could facilitate more complete and nuanced visual description.
Similarly, as visual materials can depict anything that the human imagination can produce, the world of visual ontology is largely unbounded: a teapot can be happy, pigs can deposit money in a bank, and mountains can float in the sky. Physical laws need not operate, and narrative structure can be freely violated. What this means in relation to a typical thesaurus structure is that the universe of related terms becomes open-ended, and hierarchy must largely be used only as a simple organizing principle for grouping terms and not as a means of communicating the essential structure of the visual world.

Conclusion

As noted above, the thesaurus is still being built. Gaps in the vocabulary are being assessed, terms are being added, and the structure is being refined. The work will also continue by having groups use the thesaurus to index various visual materials and using their comments and findings to refine the terminology and structure and to assess the ease of use and complementarity of this vocabulary to other existing visual thesauri in a number of domains. This research hopes to answer the fundamental question of whether it is possible to create a generalized vocabulary useful for addressing frequently described image attributes and for addressing indexer needs in vocabulary choice and applicability.

REFERENCES

Armitage, Linda H., and Peter G. B. Enser (1997). Analysis of user need in image archives. Journal of Information Science 23(4): 287-299.
Bellas, Elizabeth. Panel on Subject Access to Visual Images: The Online Horizon from Where We Stand. A paper delivered at the Art Libraries Society of North America, 27th Annual Conference, Vancouver, BC, 1999.
Berlin, B., and P. Kay. Basic Color Terms. Berkeley: University of California Press, 1969.
Brunskill, Jeff, and Corinne Jörgensen. Image Attributes: A Study of Scientific Diagrams. Proceedings of the Annual Meeting of the American Society for Information Science, Philadelphia, PA, Nov. 20, 2002, 365-375.
DK Ultimate Visual Dictionary. Dorling Kindersley, Inc. 1st American ed. London; New York: Dorling Kindersley, 1994.
Ekman, P., and W. V. Friesen (1975). Unmasking the Face: A Guide to Recognizing Emotions from Facial Clues. Englewood Cliffs, N.J.: Prentice-Hall.
Enser, P. G. B. Query analysis in a visual information retrieval context. Journal of Document and Text Management 1, no. 1 (1993): 25-52.
Frost, Olivia. The University of Michigan School of Information Art Image Browser: Designing and Testing a Model for Image Retrieval. In Knowledge Organization and Change, ed. Rebecca Green, 5, 182-188. Frankfurt/Main: Indeks Verlag, 1996.
Gordon, Andrew S. The design of knowledge-rich browsing interfaces for retrieval in digital libraries. Ph.D., Northwestern University, 1998.
Graham, Margaret E. The description and indexing of images: Report of a survey of ARLIS members, 1998/99. Newcastle: University of Northumbria, 1999.
Hudon, Michèle, James Turner, and Yves Devin (2001). Description et Indexation des Collections d'Images en Mouvement: Résultats d'une Enquête. Documentation et bibliothèques 47, no. 1: 5-12.
Jörgensen, Corinne. Image Attributes: An Investigation. Ph.D., Syracuse University, 1995.
Jörgensen, Corinne (1996). The Applicability of Existing Classification Systems to Image Attributes: A Selected Review. In Knowledge Organization and Change, ed. Rebecca Green, 5, 189-197. Frankfurt/Main: Indeks Verlag.
Jörgensen, Corinne. Image attributes in describing tasks: an investigation. Information Processing & Management 34, no. 2/3 (1998): 161-174.
Jörgensen, Corinne. Image Indexing: An Analysis of Selected Classification Systems in Relation to Image Attributes Named by Naive Users: OCLC Library and Information Science Research Grant Program. Research Report, Annual Review of OCLC Research 1999a.
Jörgensen, Corinne (1999b). Retrieving the Unretrievable: Art, Aesthetics, and Emotion in Image Retrieval Systems. Human Vision and Electronic Imaging IV, San Jose, CA.
Jörgensen, Corinne. Image Retrieval: Theory and Research (2003). Lanham, MD: Scarecrow Press.
Jörgensen, Corinne, and Rohini Srihari. Creating a Web-based image database for benchmarking image retrieval systems: A progress report. A paper delivered at the Human Vision and Electronic Imaging IV, San Jose, CA, 1999b.
Jörgensen, Corinne, and Peter Jörgensen (2002). Testing a Vocabulary for Image Indexing and Ground Truthing. Internet Imaging III, Jan. 22, San Jose, CA. Proceedings SPIE (International Society for Optical Engineering) v. 4672, Giordano B. Beretta and Raimondo Schettini, eds., 212-215.
Jörgensen, Corinne, and Peter Jörgensen (2003). Image querying by image professionals. Proceedings of the American Society for Information Science and Technology (ASIST) 2003 Annual Meeting, Long Beach, California, October 21, 349-356.
Jörgensen, Corinne, Alejandro Jaimes, Ana Benitez, and Shih-Fu Chang (2001). A conceptual framework and empirical research for classifying visual descriptors. Journal of the American Society for Information Science and Technology 52(11): 938-947.
Katsikitis, Mary. The classification of facial expressions of emotion: A multidimensional-scaling approach. Perception 26, no. 5 (1997): 613-626.
McRae, Linda, and Lynda S. White, eds. ArtMARC Sourcebook: Cataloging Art, Architecture, and their Visual Images. Chicago: American Library Association, 1998.
National Information Standards Institute. American National Standard Guidelines for the Construction, Format, and Management of Monolingual Thesauri. Bethesda, MD: NISO Press, 1994. (ANSI/NISO Z39.19-1993).
O'Connor, Brian C., Mary K. O'Connor, and June M. Abbas (1999). User reactions as access mechanism: An exploration based upon captions for images. Journal of the American Society for Information Science 50(8): 681-697.
The Oxford-Duden pictorial Hungarian-English dictionary (1994). Hungarian text edited by Laszlo Anyos, with the assistance of the staff members at Akademiai Kiado, Budapest; English text edited by John Pheby et al. Oxford; New York: Oxford University Press.
Parker, Elizabeth Betz. LC Thesaurus for Graphic Materials: Topical Terms for Subject Access. With an introduction by Jackie M. Dooley. Washington, D.C.: Library of Congress, 1987.
Petersen, Toni. Art and Architecture Thesaurus. 2nd ed. New York: Oxford University Press, 1994.
Pezdek, Kathy, Tony Whetstone, Kirk Reynolds, Nusha Askari, and T. Dougherty (1989). Memory for real-world scenes: The role of consistency with schema expectation. Journal of Experimental Psychology: Learning, Memory, and Cognition 15(4): 587-595.
Rosch, E., C. B. Mervis, W. Gray, D. Johnson, and P. Boyes-Braem. Basic objects in natural categories. Cognitive Psychology 8 (1976): 382-439.
Shatford-Layne, S. S. (1994). Some issues in the indexing of images. Journal of the American Society for Information Science 45(8): 583-588.
Simon & Schuster Thesaurus for Children (2001). Latimer, Jonathan P., and Karen S. Nolting, eds. New York: Simon & Schuster, Inc.
Soergel, Dagobert. The Art and Architecture Thesaurus (AAT): A critical appraisal. Visual Resources 10 (1995): 369-400.
Visual Resources Association Data Standards Committee. VRA Core Categories, Version 3.0. 2000. Accessed 12/12 2000. Available from http://www.gsd.harvard.edu/-staffaw3lvrd vracore3.htm.
Weibel, Stuart L., and Traugott Koch. The Dublin Core Metadata Initiative: Mission, Current Activities, and Future Directions. D-Lib Magazine 6(12) (2000).

2004 Proceedings of the 67th ASIS&T Annual Meeting, vol. 41, 293