Controlled vocabulary

Indexing languages
6.2.2. Controlled vocabulary

Overview

Anyone who has struggled to find the exact search term to retrieve information about a certain subject can benefit from controlled vocabulary. Controlled vocabulary allows just one term, spelled one way, to represent a given concept. It is an indexing solution to problems stemming from the ambiguity of natural language that tend to result in imprecise and incomplete retrieval.

A controlled vocabulary is a set of authorized (standardized) terms. Most controlled vocabularies represent subjects and are listed in subject authority files called thesauri or subject headings lists. An indexer or cataloger chooses controlled-vocabulary terms from a particular authority file and assigns them to a controlled-vocabulary field in the metadata record. A searcher should also consult the authority file to find terms for searching the controlled-vocabulary field.

This module explains how a controlled vocabulary works. It describes three kinds of indexing problems, then shows how controlled vocabulary provides solutions. The problems are:

1. Naming single concepts: What is the best term for a given concept?
2. Showing relationships among single concepts: What concept is related to a given concept?
3. Showing relationships among multiple concepts: What if the subject of a document contains two concepts?

1. Naming single concepts

Problems

What is the best term for a given concept?
How does one choose among variant word forms for a concept?

INFO 5200 / Controlled vocabulary

Solutions

The creator of the controlled vocabulary addresses these problems based on understandings of the users and the collection. The "best" term for a concept is the most accurate, common and current word at the time the controlled vocabulary is created. Typical approaches are to:

    Focus on concrete nouns       fish, fishing
    Include multiword terms       aircraft carrier
    Include proper nouns          American
    Exclude commercial names      IBM

Preferred word forms are shown by example:

    Spelling      theater [not theatre]
    Singular      theater [general; the profession]
    Plural        theaters [specific; buildings]
    Multiword     performing arts [one term]

Some terms are more ambiguous than others and need further clarification. Most of these are homographs: terms that have the same spellings but different meanings. In a controlled vocabulary, these are often distinguished by parenthetical qualifiers:

    letter (correspondence) vs. letter (alphabet)
    port (opening) vs. port (wine)

Multiword terms, with more than one word representing a single concept, are also called compound terms. In some controlled vocabularies, terms and their parenthetical qualifiers are treated as compound terms: all the words must be kept together in indexing and searching.

2. Showing relationships among single concepts

Problems

What concept is related to a given concept? How is it related? Suppose you have these terms:

    motor vehicles, automobiles, cars, sports cars, trucks

Clearly some concepts are broader than (encompass) others and some terms are actually synonyms.

Solutions

Again, the creator of the controlled vocabulary addresses these problems, based on understandings of the users and the collection. Relationships based on word meanings are called semantic relationships. Three kinds of semantic relationships are equivalent, hierarchical, and associative. Each raises its own questions:

    Equivalent (synonymous or nearly synonymous): How to show preferred terms?
    Hierarchical (genus-species or broad-narrow): How to show levels of meaning?
    Associative (related but not synonymous or hierarchical): How to link related terms?

The solutions are cross references that show the relationships. For example, in an authority file on transportation, all three of these relationships pertain to the term automobiles:

    Equivalent:    USE FOR cars
    Hierarchical:  BROADER TERM motor vehicles
                   NARROWER TERM sports cars
    Associative:   RELATED TERM trucks

Each term in the authority file is listed separately. For each relationship, there must be a pair of cross references, called mandatory reciprocals:

    USE FOR and USE
    BROADER TERM and NARROWER TERM
    RELATED TERM and RELATED TERM

Cross references are commonly abbreviated UF, USE, BT, NT, and RT. All terms in the authority file are listed alphabetically. Here is the display for automobiles:

    automobiles
        UF cars
        BT motor vehicles
        NT sports cars
        RT trucks
    cars
        USE automobiles
    motor vehicles
        NT automobiles
    sports cars
        BT automobiles
    trucks
        RT automobiles
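
The cross references and mandatory reciprocals described above can be sketched as a small data structure. The following Python is only an illustration: the dictionary layout and the function names (missing_reciprocals, preferred_term, display) are assumptions made for this sketch and do not come from any thesaurus standard or software package.

```python
# Hypothetical in-memory authority file. The layout and all names here are
# illustrative assumptions, not part of any thesaurus standard or library.

# Each cross-reference type and the reciprocal it requires on the target term.
RECIPROCALS = {"UF": "USE", "USE": "UF", "BT": "NT", "NT": "BT", "RT": "RT"}

# term -> {cross-reference type -> set of target terms}
THESAURUS = {
    "automobiles": {"UF": {"cars"}, "BT": {"motor vehicles"},
                    "NT": {"sports cars"}, "RT": {"trucks"}},
    "cars": {"USE": {"automobiles"}},
    "motor vehicles": {"NT": {"automobiles"}},
    "sports cars": {"BT": {"automobiles"}},
    "trucks": {"RT": {"automobiles"}},
}


def missing_reciprocals(thesaurus):
    """Return (term, ref, target) triples whose reciprocal is absent."""
    missing = []
    for term, refs in thesaurus.items():
        for ref, targets in refs.items():
            for target in targets:
                back = thesaurus.get(target, {}).get(RECIPROCALS[ref], set())
                if term not in back:
                    missing.append((term, ref, target))
    return sorted(missing)


def preferred_term(thesaurus, term):
    """Follow a USE reference; terms without one are returned unchanged."""
    use = thesaurus.get(term, {}).get("USE")
    return next(iter(use)) if use else term


def display(thesaurus):
    """Render the alphabetical display, one cross reference per line."""
    order = ["UF", "USE", "BT", "NT", "RT"]
    lines = []
    for term in sorted(thesaurus):
        lines.append(term)
        for ref in order:
            for target in sorted(thesaurus[term].get(ref, ())):
                lines.append(f"    {ref} {target}")
    return "\n".join(lines)
```

In this sketch, missing_reciprocals(THESAURUS) returns an empty list because every cross reference has its partner, and preferred_term(THESAURUS, "cars") returns "automobiles". Deleting the "cars" entry's USE reference would make the check report the unmatched UF from automobiles.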

This example shows all reciprocals for automobiles. In the equivalent relationship, automobiles is the preferred term (or authorized term, or descriptor) and cars is the lead-in term (or nonpreferred term). The lead-in term is not used to represent or search for a subject: it is the term that people may look for first in the authority file and is included to lead them to the preferred term.

This is how you read a thesaurus entry. Given:

    automobiles
        UF cars
        BT motor vehicles
        NT sports cars
        RT trucks

You can search using the term automobiles and find something. Search using automobiles instead of searching using cars. You can also search and find something using the broader term motor vehicles, the narrower term sports cars, or the related term trucks.

We know we will find something using automobiles because it is bolded (bolded means there are guaranteed to be records found with this term). We also know we will find something using motor vehicles, sports cars, or trucks because all three of those terms are bound into either a hierarchical relationship or an associative relationship, and only authorized terms can be so bound.

Project Alert!

You must show at least one example of each kind of semantic relationship in your sample thesaurus. Do not force a relationship on every term. You must have at least 15 authorized terms in the thesaurus.

Note: for the field on which you executed your thesaurus:

    All authorized terms in the thesaurus must be found in at least one of your Libib records.
    No unauthorized terms should be found in any Libib records.
    All terms in the records must be in the thesaurus as authorized terms.

The arrangement of a controlled vocabulary using cross references to show relationships is known as its syndetic structure. See the assigned reading "Thesaurus construction and format" (2001) and the thesaurus tutorial module.

3. Showing relationships among multiple concepts

Problems

What if the subject of a document contains two concepts? What if it contains more than two concepts? This problem is even more complicated when there are not only multiple concepts in one document...

    Drama in the lives of teachers

... but also multiple documents with similar multiple concepts!

    Methods for teaching drama
    Drama as a teaching method

A subject that includes more than one concept is known as a composite subject; it may also be called a complex or compound subject.

Solutions

Use precoordinate or postcoordinate indexing to link the concepts. These are rather mysterious terms for what are really simple concepts.

Precoordinate indexing is combining several terms in some logical order, as in library catalog subject headings. "Pre" means the terms are combined prior to searching, at the time of indexing.

    Precoordination is the combination of indexing terms at the time of indexing.
    Combined terms represent composite or complex subjects.
    Typical combinations are controlled-vocabulary subject headings used in subject cataloging.
    Searching usually does not require the entry of all terms in the subject heading.
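
The precoordinate approach described above can be illustrated with a short Python sketch. Everything in it is an assumption made for illustration: the "--" separator, the record ids, the assignment of headings to records, and the rule that a search matches when every entered term is one component of a heading. Real catalogs apply their own matching rules.

```python
# Hypothetical sketch of precoordinate indexing. The "--" separator, the
# record ids, and the matching rule are illustrative assumptions.

SEPARATOR = "--"


def make_heading(*terms):
    """Combine terms, in the given order, into one subject heading."""
    return SEPARATOR.join(terms)


def split_heading(heading):
    """Return the component terms of a subject heading, in order."""
    return heading.split(SEPARATOR)


# Headings are combined and assigned at indexing time: record id -> heading.
CATALOG = {
    "rec1": make_heading("Education", "Drama", "Teaching methods"),
    "rec2": make_heading("Education", "Teaching", "Psychological aspects"),
    "rec3": make_heading("Drama", "Teaching methods"),
}


def search(catalog, *terms):
    """Find records whose heading contains every entered term.

    The searcher may enter only some of the heading's terms.
    """
    wanted = {t.lower() for t in terms}
    hits = []
    for rec_id, heading in catalog.items():
        parts = {p.lower() for p in split_heading(heading)}
        if wanted <= parts:
            hits.append(rec_id)
    return sorted(hits)
```

Under these assumptions, search(CATALOG, "Drama") returns ["rec1", "rec3"] even though neither heading was entered in full. The combining work was done once by the indexer in make_heading, which is the sense in which the terms are coordinated before searching.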

Some examples, with alternatives:

    Drama in the lives of teachers
        Education--Teachers
        Education--Teaching--Psychological aspects
    Methods for teaching drama
        Education--Drama--Teaching methods
        Drama--Teaching methods
    Drama as a teaching method
        Education--Teaching methods--Drama

Postcoordinate indexing is combining single terms using boolean operators (AND, OR, NOT). "Post" means the terms are combined after indexing, at the time of searching.

    Postcoordination is the combination of indexing terms at the time of searching.
    Terms represent single, simple concepts.
    Typical combinations are controlled-vocabulary descriptors used in indexing.
    The searcher uses boolean operators and other techniques to combine terms.

Some examples, with alternatives:

    Drama in the lives of teachers
        drama AND lives AND teachers
        (teachers OR teaching) AND psychology
        teachers AND psychology NOT methods
    Methods for teaching drama
        drama AND teaching AND methods
        drama AND (teaching OR methods)
        (drama AND education) AND methods
    Drama as a teaching method
        drama AND teaching AND methods
        drama AND (teaching OR methods)

As you study the examples above, you may wonder whether the order of the terms matters. In precoordinate indexing, like the subject headings shown, the order of terms, or syntax, does matter: this is known as a syntactic relationship. In postcoordinate indexing, like the boolean combinations shown, syntax may or may not matter, depending on the database. For more information, see the module on indexing, searching, and retrieval.

In the examples above, you may also notice that none of the alternatives for either precoordinate or postcoordinate indexing fully conveys the meanings of the titles. Unfortunately, some meaning is almost always lost in a representation.
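
Postcoordinate searching with boolean operators can be sketched with Python sets. This is a minimal illustration, not how any particular database works: the four records and their descriptors are invented, and AND, OR, and NOT are mapped onto set intersection (&), union (|), and difference (-).

```python
# Hypothetical postcoordinate index. The records and descriptors below are
# invented for illustration; each record carries single-concept descriptors.
RECORDS = {
    "doc1": {"drama", "teachers", "psychology"},
    "doc2": {"drama", "teaching", "methods"},
    "doc3": {"drama", "education"},
    "doc4": {"teachers", "psychology", "methods"},
}


def build_index(records):
    """Invert record -> descriptors into descriptor -> set of record ids."""
    index = {}
    for rec_id, descriptors in records.items():
        for descriptor in descriptors:
            index.setdefault(descriptor, set()).add(rec_id)
    return index


INDEX = build_index(RECORDS)


def term(descriptor):
    """Records indexed with one descriptor (empty set if there are none)."""
    return INDEX.get(descriptor, set())


# The searcher combines terms at search time: AND is &, OR is |, NOT is -.

# drama AND teaching AND methods
q1 = term("drama") & term("teaching") & term("methods")

# drama AND (teaching OR methods)
q2 = term("drama") & (term("teaching") | term("methods"))

# teachers AND psychology NOT methods
q3 = (term("teachers") & term("psychology")) - term("methods")

# (teachers OR teaching) AND psychology
q4 = (term("teachers") | term("teaching")) & term("psychology")
```

With these sample records, q1 and q2 both contain only doc2, q3 contains only doc1, and q4 contains doc1 and doc4. Because set intersection is commutative, the order of the AND operands does not change the result in this sketch; a real database may behave differently, as noted above.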

Summary

Indexing problems stem from the ambiguity of natural language. In controlled vocabulary approaches, most of the burden of solving these problems falls on the indexers who create and use subject authority files. Searchers must also assume some of the burden, however, in knowing how and when to consult subject authority files and how to use boolean operators to search multiple concepts.

This module contains many key concepts and terms. It is especially important to distinguish among concepts in these sets of terms:

    semantic, syndetic, syntactic
    equivalent, hierarchical, associative

You may also want to compare the solutions in this module with those in the module on natural language.

Cites & sites

Thesaurus construction and format. (2001). In Thesaurus of ERIC Descriptors (14th ed., pp. xxvii-xxxi). Phoenix, AZ: Oryx Press.

All INFO 5200/4200 course materials are copyrighted and may not be copied, revised, or distributed in any form or venue, beyond their use by students for purposes of fulfilling course requirements, without prior permission of the authors or the University of North Texas.