Indexing languages
6.2.2. Controlled vocabulary

Overview

Anyone who has struggled to find the exact search term to retrieve information about a certain subject can benefit from controlled vocabulary. Controlled vocabulary allows just one term, spelled one way, to represent a given concept. It is an indexing solution to problems stemming from the ambiguity of natural language that tend to result in imprecise and incomplete retrieval.

A controlled vocabulary is a set of authorized (standardized) terms. Most controlled vocabularies represent subjects and are listed in subject authority files called thesauri or subject headings lists. An indexer or cataloger chooses controlled-vocabulary terms from a particular authority file and assigns them to a controlled-vocabulary field in the metadata record. A searcher should also consult the authority file to find terms for searching the controlled-vocabulary field.

This module explains how a controlled vocabulary works. It describes three kinds of indexing problems, then shows how controlled vocabulary provides solutions. The problems are:

1. Naming single concepts: What is the best term for a given concept?
2. Showing relationships among single concepts: What concept is related to a given concept?
3. Showing relationships among multiple concepts: What if the subject of a document contains two concepts?

1. Naming single concepts

Problems

What is the best term for a given concept?
How does one choose among variant word forms for a concept?
INFO 5200 / Controlled vocabulary / p. 2

Solutions

The creator of the controlled vocabulary addresses these problems based on understandings of the users and the collection. The "best" term for a concept is the most accurate, common and current word at the time the controlled vocabulary is created.

Typical approaches are to:

    Approach                     Example
    Focus on concrete nouns      fish, fishing
    Include multiword terms      aircraft carrier
    Include proper nouns         American
    Exclude commercial names     IBM

Preferred word forms are shown by example:

    Word form     Preferred form
    Spelling      theater [not theatre]
    Singular      theater [general; the profession]
    Plural        theaters [specific; buildings]
    Multiword     performing arts [one term]

Some terms are more ambiguous than others and need further clarification. Most of these are homographs: terms that have the same spellings but different meanings. In a controlled vocabulary, these are often distinguished by parenthetical qualifiers:

    letter (correspondence) vs. letter (alphabet)
    port (opening) vs. port (wine)

Multiword terms, with more than one word representing a single concept, are also called compound terms. In some controlled vocabularies, terms and their parenthetical qualifiers are treated as compound terms: all the words must be kept together in indexing and searching.

2. Showing relationships among single concepts

Problems

What concept is related to a given concept? How is it related?

Suppose you have these terms:

    motor vehicles, automobiles, cars, sports cars, trucks

Clearly some concepts are broader than (encompass) others and some terms are actually synonyms.
Solutions

Again, the creator of the controlled vocabulary addresses these problems, based on understandings of the users and the collection.

Relationships based on word meanings are called semantic relationships. Three kinds of semantic relationships are equivalent, hierarchical, and associative. Each raises its own questions:

    Relationship                                                 Question
    Equivalent (synonymous or nearly synonymous)                 How to show preferred terms?
    Hierarchical (genus-species or broad-narrow)                 How to show levels of meaning?
    Associative (related but not synonymous or hierarchical)     How to link related terms?

The solutions are cross references that show the relationships. For example, in an authority file on transportation, all three of these relationships pertain to the term automobiles:

    Equivalent       USE FOR cars
    Hierarchical     BROADER TERM motor vehicles
                     NARROWER TERM sports cars
    Associative      RELATED TERM trucks

Each term in the authority file is listed separately. For each relationship, there must be a pair of cross references, called mandatory reciprocals:

    USE FOR and USE
    BROADER TERM and NARROWER TERM
    RELATED TERM and RELATED TERM

Cross references are commonly abbreviated UF, USE, BT, NT, and RT. All terms in the authority file are listed alphabetically. Here is the display for automobiles:

    automobiles
        UF cars
        BT motor vehicles
        NT sports cars
        RT trucks
    cars
        USE automobiles
    motor vehicles
        NT automobiles
    sports cars
        BT automobiles
    trucks
        RT automobiles
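The mandatory-reciprocal rule lends itself to a small illustration. The following is a minimal sketch, not part of the course materials: the `Thesaurus` class and its `relate`, `preferred`, and `display` methods are hypothetical names chosen for this example. It assumes that recording one cross reference should automatically record its reciprocal, which is one way to guarantee that every pair stays complete.

```python
from collections import defaultdict

# Reciprocal for each cross-reference type (abbreviations from the text).
RECIPROCAL = {"UF": "USE", "USE": "UF", "BT": "NT", "NT": "BT", "RT": "RT"}


class Thesaurus:
    """Hypothetical helper: stores cross references with their reciprocals."""

    def __init__(self):
        # entries[term][ref_type] -> set of target terms
        self.entries = defaultdict(lambda: defaultdict(set))

    def relate(self, term, ref_type, target):
        """Record a cross reference and its mandatory reciprocal."""
        self.entries[term][ref_type].add(target)
        self.entries[target][RECIPROCAL[ref_type]].add(term)

    def preferred(self, term):
        """Follow a USE reference from a lead-in term to the preferred term."""
        targets = self.entries.get(term, {}).get("USE")
        return sorted(targets)[0] if targets else term

    def display(self):
        """Return the alphabetical display, one cross reference per line."""
        lines = []
        for term in sorted(self.entries):
            lines.append(term)
            for ref_type in ["UF", "USE", "BT", "NT", "RT"]:
                for target in sorted(self.entries[term].get(ref_type, ())):
                    lines.append(f"  {ref_type} {target}")
        return "\n".join(lines)


# Only the four references on "automobiles" are entered by hand;
# the reciprocals on the other four terms are generated.
t = Thesaurus()
t.relate("automobiles", "UF", "cars")
t.relate("automobiles", "BT", "motor vehicles")
t.relate("automobiles", "NT", "sports cars")
t.relate("automobiles", "RT", "trucks")

print(t.display())
print(t.preferred("cars"))
```

If run as written, the sketch should print the same alphabetical display shown above, followed by the preferred term for the lead-in term cars. Generating reciprocals instead of typing them is a design choice made here for illustration; a thesaurus built by hand needs each pair checked manually.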
This example shows all reciprocals for automobiles. In the equivalent relationship, automobiles is the preferred term (or authorized term, or descriptor) and cars is the lead-in term (or nonpreferred term). The lead-in term is not used to represent or search for a subject: it is the term that people may look for first in the authority file and is included to lead them to the preferred term.

This is how you read a thesaurus entry:

Given:

    automobiles
        UF cars
        BT motor vehicles
        NT sports cars
        RT trucks

You can search using the term automobiles and find something. Search using automobiles instead of searching using cars. Also, you can search and find something using the broader term motor vehicles, or by the narrower term sports cars, or by the related term trucks.

We know we will find something using automobiles because it is bolded (bolded means there are guaranteed to be records found with this term). We also know we will find something using motor vehicles, or sports cars, or trucks because all three of those terms are bound into either a hierarchical relationship or an associative relationship, and only authorized terms can be so bound.

Project Alert!

You must show at least one example of each kind of semantic relationship in your sample thesaurus. Do not force a relationship on every term. You must have at least 15 authorized terms in the thesaurus.

Note: for the field on which you executed your thesaurus:

    All authorized terms in the thesaurus must be found in at least one of your Libib records
    No unauthorized terms should be found in any Libib records
    All terms in the records must be in the thesaurus as authorized terms

The arrangement of a controlled vocabulary using cross references to show relationships is known as its syndetic structure. See the assigned reading "Thesaurus construction and format" (2001) and the thesaurus tutorial module.

3. Showing relationships among multiple concepts

Problems

What if the subject of a document contains two concepts? What if it contains more than two concepts?

This problem is even more complicated when there are not only multiple concepts in one document...

    Drama in the lives of teachers
... but also multiple documents with similar multiple concepts!

    Methods for teaching drama
    Drama as a teaching method

A subject that includes more than one concept is known as a composite subject; it may also be called a complex or compound subject.

Solutions

Use precoordinate or postcoordinate indexing to link the concepts. These are rather mysterious terms for what are really simple concepts.

Precoordinate indexing is combining several terms in some logical order, as in library catalog subject headings. "Pre" means the terms are combined prior to searching, at the time of indexing.

Precoordination is the combination of indexing terms at the time of indexing.

    Combined terms represent composite or complex subjects.
    Typical combinations are controlled-vocabulary subject headings used in subject cataloging.
    Searching usually does not require the entry of all terms in the subject heading.
Some examples, with alternatives:

    Drama in the lives of teachers
        Education--Teachers
        Education--Teaching--Psychological aspects
    Methods for teaching drama
        Education--Drama--Teaching methods
        Drama--Teaching methods
    Drama as a teaching method
        Education--Teaching methods--drama

Postcoordinate indexing is combining single terms using boolean operators (AND, OR, NOT). "Post" means the terms are combined after indexing, at the time of searching.

Postcoordination is the combination of indexing terms at the time of searching.

    Terms represent single, simple concepts.
    Typical combinations are controlled-vocabulary descriptors used in indexing.
    The searcher uses boolean operators and other techniques to combine terms.

Some examples, with alternatives:

    Drama in the lives of teachers
        drama AND lives AND teachers
        (teachers OR teaching) AND psychology
        teachers AND psychology NOT methods
    Methods for teaching drama
        drama AND teaching AND methods
        drama AND (teaching OR methods)
        (drama AND education) AND methods
    Drama as a teaching method
        drama AND teaching AND methods
        drama AND (teaching OR methods)

As you study the examples above, you may wonder whether the order of the terms matters. In precoordinate indexing like the subject headings shown, the order of terms, or syntax, does matter: this is known as a syntactic relationship. In postcoordinate indexing, like the boolean combinations shown, syntax may or may not matter, depending on the database. For more information, see the module on indexing, searching, and retrieval.

In the examples above, you may also notice that none of the alternatives for either precoordinate or postcoordinate indexing fully conveys the meanings of the titles. Unfortunately, some meaning is almost always lost in a representation.
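The difference between the two approaches can be sketched with sets. This is only an illustration under stated assumptions: the descriptors assigned to each title in `records` are invented for the example, the `having` function is a hypothetical name, and real databases evaluate boolean searches in their own ways. The precoordinated headings in `headings` are taken from the examples above.

```python
# Postcoordinate: each title is indexed with single-concept descriptors.
# These descriptor assignments are assumptions made for illustration.
records = {
    "Drama in the lives of teachers": {"drama", "teachers", "psychology"},
    "Methods for teaching drama": {"drama", "teaching", "methods"},
    "Drama as a teaching method": {"drama", "teaching", "methods"},
}


def having(term):
    """Return the set of titles indexed with the given descriptor."""
    return {title for title, terms in records.items() if term in terms}


# Boolean operators map onto set operations: AND is &, OR is |, NOT is -.
q1 = having("drama") & having("teaching") & having("methods")
q2 = (having("teachers") | having("teaching")) & having("psychology")
q3 = (having("teachers") & having("psychology")) - having("methods")

# Precoordinate: terms are combined, in order, at the time of indexing.
headings = {
    "Methods for teaching drama": "Drama--Teaching methods",
    "Drama as a teaching method": "Education--Teaching methods--drama",
}

# The two titles share identical descriptors, so no boolean search on
# these descriptors can separate them; their ordered headings differ.
same_descriptors = (
    records["Methods for teaching drama"] == records["Drama as a teaching method"]
)
same_heading = (
    headings["Methods for teaching drama"] == headings["Drama as a teaching method"]
)

print(sorted(q1))
print(sorted(q2))
print(sorted(q3))
print(same_descriptors, same_heading)
```

Under these assumed descriptors, the first query retrieves both drama-teaching titles together, which illustrates the point about lost meaning: the postcoordinate terms cannot say whether drama is the thing being taught or the method of teaching, while the ordered subject headings at least keep the two titles apart.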
Summary

Indexing problems stem from the ambiguity of natural language. In controlled vocabulary approaches, most of the burden of solving these problems falls on the indexers who create and use subject authority files. Searchers must also assume some of the burden, however, in knowing how and when to consult subject authority files and how to use boolean operators to search multiple concepts.

This module contains many key concepts and terms. It is especially important to distinguish among concepts in these sets of terms:

    semantic, syndetic, syntactic
    equivalent, hierarchical, associative

You may also want to compare the solutions in this module with those in the module on natural language.

Cites & sites

Thesaurus construction and format. (2001). In Thesaurus of ERIC Descriptors (14th ed., pp. xxvii-xxxi). Phoenix, AZ: Oryx Press.

All INFO 5200/4200 course materials are copyrighted and may not be copied, revised, or distributed in any form or venue, beyond their use by students for purposes of fulfilling course requirements, without prior permission of the authors or the University of North Texas.