Controlled Vocabulary

Faceted Classifications and Thesauri

Introduction
Faceted Classifications
Thesauri
Conclusion
Bibliography

Introduction

Throughout history, as collections grew, classification systems were developed to organize materials. The Dewey Decimal Classification (DDC) and the Library of Congress Classification (LC) assign class numbers that arrange materials by subject. As more specialized reference materials were published within broad subject areas, more specific vocabularies were developed to identify the content of an item and to aid retrieval. A controlled vocabulary is a set of specific terms used to index the subject matter of documents (of all types: books, articles, media, even T-shirts) and to retrieve items on that subject from a collection. A controlled vocabulary can take the form of an index such as the New York Times Index, the descriptors in ERIC, or the Library of Congress Subject Headings. This paper describes the development and use of two kinds of controlled vocabulary: faceted classifications and thesauri. It does not cover automated procedures for controlled vocabulary development (PRECIS).

Faceted Classifications

There are three general types of classification scheme: enumerative, synthetic, and analytico-synthetic. An enumerative scheme is based on the concept of a universe of knowledge that is divided into successively narrower and more specific subjects; in theory, every topic is listed. LC is an enumerative scheme. A synthetic scheme is one in which new class numbers can be constructed for topics not already listed. DDC, although primarily enumerative, moves closer to a synthetic scheme with each revision. The Universal Decimal Classification (UDC) comes closest to a truly synthetic system (Aitchison, 1982; Aitchison and Gilchrist, 1987).

Faceted classification is an analytico-synthetic scheme. It is analytic because facet analysis breaks broad subjects down into single, clearly defined concepts; it is synthetic because these concepts can be combined to express new subjects. It originated with S.R. Ranganathan's Colon Classification in the 1930s and has been used more in England than in the United States. (The process of facet analysis can also be used to construct thesauri.) Interest in the approach has revived because some believe that older systems such as DDC and LC do not provide enough detail to describe all subjects in all media accurately, may not meet the needs of individual or special libraries, may not allow enough coordination of terms, may require complex or lengthy notation, and are often difficult to use for locating materials (Vickery, 1960).

The facet development process begins by defining the subject field to be covered, drawing on existing classifications or thesauri, or on the titles or objects in the prospective database. The topics derived in this way are sorted into facets, each with a distinct label. Terms are organized into homogeneous, mutually exclusive groups, each of which differs from the main group by a single characteristic of division. In chemistry, for example, the facets might logically be substance, state, property, reaction, and operation (Aitchison and Gilchrist, 1987; Vickery, 1960).
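As a rough illustration of the "mutually exclusive" requirement, the sketch below records the chemistry facets named above and reports any term filed under more than one facet, which would signal that two characteristics of division overlap. The terms under each facet, and the choice of Oxidation as the overlapping term, are hypothetical and for illustration only.

```python
from collections import defaultdict

# Chemistry facets from the text; the terms under each are hypothetical examples.
chemistry_facets = {
    "substance": ["Sodium chloride", "Ethanol"],
    "state": ["Liquid", "Gas"],
    "property": ["Boiling point", "Solubility"],
    "reaction": ["Oxidation", "Combustion"],
    "operation": ["Distillation", "Oxidation"],  # deliberate overlap
}


def overlapping_terms(facets):
    """Return {term: [facets]} for every term filed in more than one facet.

    In a well-formed facet analysis this should be empty, because each
    facet is produced by a single characteristic of division.
    """
    placements = defaultdict(list)
    for facet, terms in facets.items():
        for term in terms:
            placements[term].append(facet)
    return {term: found for term, found in placements.items() if len(found) > 1}


print(overlapping_terms(chemistry_facets))
# {'Oxidation': ['reaction', 'operation']}
```

A scheme designer would resolve such a clash by deciding which facet the concept belongs to, or by distinguishing its two senses.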

Within each facet, subfacets or more specific topics are listed, and the breakdown continues into subfacets within subfacets. In general, the items in each subfacet are ordered from the more general to the more specific, complex, or concrete (Vickery, 1960; Croghan, 1971). Within each hierarchy, each subfacet is indented under the next higher facet to make the relationships clear.

Each element in each hierarchy is assigned an alphabetic or alphanumeric notation (a kind of call number). Starting at the top, the notation for each element builds on the notation of the element immediately above it. Liu (1990) stresses that the notation should

be hospitable (able to interpolate new terms at all levels of all facets)
be expressive (able to convey the hierarchy)
use mnemonics (consistent notation for the same topic throughout)
be synthetic (able to form combinations with other items)
be brief and easy to use.
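The requirement that each notation build on the one above it can be sketched in a few lines of Python. The sketch assumes a hypothetical chemistry fragment and a simple scheme of one capital letter per facet followed by one lowercase letter per level. It is expressive, since the length of a code shows its depth, but it is not Liu's system and makes no attempt at hospitality: there are no gaps for interpolating new terms, and each level is limited to 26 items.

```python
from string import ascii_lowercase, ascii_uppercase

# A hypothetical fragment of a chemistry scheme: facet -> nested subfacets.
chemistry = {
    "Substance": {"Metals": {"Iron": {}, "Copper": {}}, "Non-metals": {"Carbon": {}}},
    "Operation": {"Analysis": {}, "Separation": {"Distillation": {}}},
}


def notate(tree, prefix):
    """Give each child its parent's notation plus one lowercase letter."""
    rows = []
    for letter, (term, children) in zip(ascii_lowercase, tree.items()):
        code = prefix + letter
        rows.append((code, term))
        rows.extend(notate(children, code))
    return rows


def build_schedule(facets):
    """Facets get capital letters; everything beneath them builds on those."""
    rows = []
    for letter, (facet, tree) in zip(ascii_uppercase, facets.items()):
        rows.append((letter, facet))
        rows.extend(notate(tree, letter))
    return rows


for code, term in build_schedule(chemistry):
    # Indent by depth: one level per letter after the facet letter.
    print("  " * (len(code) - 1) + f"{code}  {term}")
```

This prints an indented schedule in which, for example, Iron receives `Aaa` and Distillation receives `Bba`, so each code visibly contains the codes of all the headings above it.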

The top-to-bottom order of the groups in each hierarchy is the filing order. Reversing it gives the citation order, which runs from the most specific to the most general. For example, if the filing order is agents, operations, materials, parts, end-products, the citation order is end-products, parts, materials, operations, agents (Aitchison and Gilchrist, 1987). Combining terms from different facets in citation order, a process called "chaining," constructs the complex pre-coordinated index terms used for retrieval. This account oversimplifies the process.
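The reversal and the chaining step can be sketched as follows. The filing order is the one in the example above; the subject being indexed (oak chairs and their finishing), the function names, and the " : " separator are hypothetical choices for illustration.

```python
# Filing order from the example above (Aitchison and Gilchrist, 1987).
FILING_ORDER = ["agents", "operations", "materials", "parts", "end-products"]


def citation_order(filing_order):
    """Citation order is the filing order reversed."""
    return list(reversed(filing_order))


def chain(analysis, filing_order, separator=" : "):
    """Join the facet terms of one subject in citation order.

    `analysis` maps facet names to terms; facets it omits are skipped.
    """
    return separator.join(
        analysis[facet] for facet in citation_order(filing_order) if facet in analysis
    )


subject = {"operations": "Finishing", "end-products": "Chairs", "materials": "Oak"}
print(citation_order(FILING_ORDER))
# ['end-products', 'parts', 'materials', 'operations', 'agents']
print(chain(subject, FILING_ORDER))
# Chairs : Oak : Finishing
```

However the facets of a subject happen to be recorded, the chain always comes out in the same citation order, which is what makes pre-coordinated headings consistent.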

One problem with this system is how patrons will find specific items within a large collection of facets. Indexing every permutation of the terms in a string would produce many useless entries and would be costly and wasteful, so it is customary to develop an alphabetic index that leads users to the topics they want; the classification and the index are used together. A second problem is deciding the level of specificity of each chained term. Greenberg (1993) suggests that individual libraries adopt local policies and procedures, or authority-like records, on how strings should be constructed. A third problem may lie in generating the original hierarchies; sources ranging from print and electronic resources to panels of experts can be tapped for this process.
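The alphabetic index that accompanies the schedules can be sketched as a mapping from each term to the class notations of the chains in which it appears, so that each term needs only one heading. For comparison, listing every ordering of a chain of n terms would take n! entries. The notations and chains below are hypothetical.

```python
from collections import defaultdict
from math import factorial

# Hypothetical chains: class notation -> terms in citation order.
chains = {
    "Xb-Mc-Fa": ["Chairs", "Oak", "Finishing"],
    "Xb-Md": ["Chairs", "Pine"],
    "Xd-Mc-Fa-Rb": ["Tables", "Oak", "Finishing", "Hand tools"],
}


def alphabetical_index(chains):
    """Map every term to the sorted notations of chains containing it, filed A-Z."""
    index = defaultdict(set)
    for notation, terms in chains.items():
        for term in terms:
            index[term].add(notation)
    return {term: sorted(index[term]) for term in sorted(index, key=str.lower)}


index = alphabetical_index(chains)
for term, notations in index.items():
    print(f"{term}: {', '.join(notations)}")

# Index headings needed versus entries if every ordering of every chain were listed.
permuted = sum(factorial(len(terms)) for terms in chains.values())
print(len(index), permuted)  # 6 32
```

Even for these three short chains, permutation would generate 32 entries, most of them in orders no user would search, against 6 index headings.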

Thesauri

The second type of controlled vocabulary, the thesaurus, can in theory also be developed through facet analysis. The first of Lancaster's (1986) approaches, a deductive "top-down" one, is in fact facet analysis. His two criticisms of it, that it cannot identify all the subject matter and that it leaves the level of specificity of terms hard to determine, have been addressed above.

Instead, Lancaster recommends a "bottom-up" approach, in which terms are gathered from the major aspects of a field or subject. Terms selected should have literary warrant, that is, they should be used frequently in the literature. He also applies user warrant: terms should be of interest to users, which he determines through questionnaires or from users' questions. From these sources he identifies the scope of the field and the level of specificity needed, so that his decisions rest on the scope of the subject matter and on the intended users.

Lancaster (1985, 1986) uses a series of rules derived from the 1985 UNESCO guidelines to complete the thesaurus. He organizes the many separate terms into categories, as in an English-language thesaurus, and establishes separate genus/species hierarchies within each category. In each hierarchy he selects a frequently used concept as the valid term and labels the other terms in the hierarchy as BT (broader term) or NT (narrower term) relative to it. His hierarchies are smaller and less specific than those of a faceted scheme; in a faceted scheme, for example, the concepts might additionally be broken down by shape, by material, or by application.

He then identifies terms that have an associative relationship (RT, related term): terms in different hierarchies that are related in some way to one another. Synonyms have an equivalence relationship and are linked by cross references so that only one of them is the valid term for the concept. The valid term carries a UF (used for) note listing the synonyms that are not to be used, and each of those synonyms carries a USE note referring to the valid term. Homographs (words spelled the same but with different meanings) are each given a parenthetical qualifier to narrow the meaning, or a scope note (SN) with a definition, and further scope notes are added to other terms to control their use. Finally, the valid terms and cross references are arranged alphabetically, as are the terms within each small hierarchy.
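The reciprocal structure described above (BT paired with NT, UF paired with USE, RT recorded in both directions, qualifiers and scope notes on homographs, and a final alphabetical arrangement) can be modeled as a small data structure. This is a hypothetical sketch, not Lancaster's procedure or any standard's data model; the class and method names and the sample metals terms are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """Relationships recorded under one valid (preferred) term."""
    scope_note: str = ""
    broader: set = field(default_factory=set)   # BT
    narrower: set = field(default_factory=set)  # NT
    related: set = field(default_factory=set)   # RT
    used_for: set = field(default_factory=set)  # UF


class Thesaurus:
    def __init__(self):
        self.entries = {}  # valid term -> Entry
        self.use = {}      # non-valid synonym -> valid term (USE)

    def _entry(self, term):
        return self.entries.setdefault(term, Entry())

    def add_broader(self, term, broader):
        """Record BT under the term and the reciprocal NT under the broader term."""
        self._entry(term).broader.add(broader)
        self._entry(broader).narrower.add(term)

    def add_related(self, a, b):
        """RT is symmetric, so it is recorded under both terms."""
        self._entry(a).related.add(b)
        self._entry(b).related.add(a)

    def add_synonym(self, valid, synonym):
        """UF under the valid term, USE under the synonym."""
        self._entry(valid).used_for.add(synonym)
        self.use[synonym] = valid

    def set_scope_note(self, term, note):
        self._entry(term).scope_note = note

    def display(self, term):
        """Return the printed lines for one heading."""
        if term in self.use:
            return [term, f"  USE {self.use[term]}"]
        entry = self.entries[term]
        lines = [term] + ([f"  SN {entry.scope_note}"] if entry.scope_note else [])
        for tag, terms in (("UF", entry.used_for), ("BT", entry.broader),
                           ("NT", entry.narrower), ("RT", entry.related)):
            lines += [f"  {tag} {t}" for t in sorted(terms, key=str.lower)]
        return lines

    def alphabetical(self):
        """All headings, valid terms and USE references alike, filed A-Z."""
        return sorted(set(self.entries) | set(self.use), key=str.lower)


t = Thesaurus()
t.add_broader("Copper", "Metals")
t.add_broader("Mercury (metal)", "Metals")  # qualifier separates the homograph
t.set_scope_note("Mercury (metal)", "The chemical element, not the planet")
t.add_synonym("Metals", "Metallic elements")
t.add_related("Metals", "Metallurgy")
for heading in t.alphabetical():
    print("\n".join(t.display(heading)))
```

Because each method writes both halves of a relationship at once, the displayed thesaurus cannot contain a BT without its NT or a UF without its USE reference.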

In general, this approach seems easier to complete than a faceted classification, because the thesaurus is built almost term by term, like a dictionary.

Two published vocabularies combine a faceted classification and a thesaurus: the Thesaurofacet, developed by Aitchison in 1969 for electrical engineering, and the Art and Architecture Thesaurus, published in 1994. In both, the faceted schedules display the complete hierarchies, and the thesaurus serves as an index showing the relationships between terms across different parts of those hierarchies. The faceted scheme can be used to catalog items and develop subject headings, and the thesaurus can be used to retrieve items post-coordinately from a database. Within each publication, the two components are internally consistent.

Conclusion

Used pre-coordinately, a faceted classification seems especially well suited to small collections, including collections of non-book material. For example, it may be useful for classifying items in a vertical file, a photograph collection, or even a collection of auction house sales catalogs (McNulty, 1992). It can also be used for large collections in a subject area as broad as art and architecture, as the Art and Architecture Thesaurus (AAT) shows. Indeed, provision has been made for AAT subject headings in a new 654 field in the MARC record (Petersen, 1990).

Chaining terms together in a faceted classification lends itself to pre-coordination, and the use of compound terms and more specific terms can increase the precision of retrieval. For post-coordinate searching, a thesaurus can be designed to produce better search results. Devices that increase precision are structural relationships (general to specific), specificity of terms, homograph qualifiers, and scope notes. Devices that increase recall are control of word forms, control of synonyms, structural relationships (specific to general), and specific-to-general entries (Aitchison and Gilchrist, 1987).
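Some of these devices can be seen at work in a small post-coordinate search. In this hypothetical sketch, documents are indexed with valid thesaurus terms; each query term is first mapped through USE references (control of synonyms), optionally expanded to its narrower terms through the BT/NT hierarchy, and then combined with the other query terms by Boolean AND. The postings, terms, and function names are illustrative assumptions, not drawn from any real system.

```python
# Hypothetical postings: valid thesaurus term -> ids of documents indexed with it.
postings = {
    "Metals": {1},
    "Copper": {2, 3},
    "Iron": {4},
    "Corrosion": {1, 3, 4, 5},
}
use = {"Metallic elements": "Metals"}   # USE references (synonym control)
narrower = {"Metals": ["Copper", "Iron"]}  # NT links (hierarchy)


def expand(term):
    """Return the term plus every term below it in the hierarchy."""
    found, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(narrower.get(current, []))
    return found


def search(query_terms, explode=False):
    """Post-coordinate AND search over thesaurus-indexed documents."""
    hits = None
    for term in query_terms:
        term = use.get(term, term)  # map a synonym to its valid term
        terms = expand(term) if explode else {term}
        docs = set().union(*(postings.get(t, set()) for t in terms))
        hits = docs if hits is None else hits & docs  # coordinate with AND
    return hits if hits is not None else set()


print(sorted(search(["Metallic elements", "Corrosion"])))               # [1]
print(sorted(search(["Metallic elements", "Corrosion"], explode=True)))  # [1, 3, 4]
print(sorted(search(["Copper", "Corrosion"])))                           # [3]
```

The USE mapping lets a non-valid synonym still retrieve document 1, hierarchical expansion raises recall to documents 3 and 4, and coordinating a more specific term (Copper) with Corrosion narrows the result to document 3.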

Traditionally it was believed that faceted classifications could be used only pre-coordinately and thesauri only post-coordinately. However, thesaurus terms can be used as subject headings, and facet chains can be used as post-coordinated concepts (Aitchison, 1982). Gödert (1991) reports on how terms from a faceted structure can be synthesized post-coordinately to retrieve items. Certainly, the existence of one tool in a subject area will facilitate and contribute to the development of the other in that area. Even so, the two systems seem to be most advantageous when facets are used pre-coordinately and thesauri post-coordinately. Librarians need to publish accounts of how they have used facet hierarchies and thesauri in their libraries, whether in cataloging or in retrieval, and research grounded in practice or theory is needed to identify the specific conditions under which one system performs better than the other.

Bibliography

Aitchison, Jean. (1982). Indexing Languages, Classification Schemes and Thesauri. In Anthony, L.J. (ed.), Handbook of Special Librarianship and Information Work (5th ed., pp. 207-261). London: Aslib.

Aitchison, Jean and Gilchrist, Alan. (1987). Thesaurus Construction: A Practical Manual (2nd ed.). London: Aslib.

Croghan, Antony. (1971). A Manual on the Construction of an Indexing Language. London: Coburgh Publications.

Gödert, Winfried F. (1991). Facet Classification in Online Retrieval. International Classification, 18 (2), 98-109.

Greenberg, Jane. (1993). Intellectual Control of Visual Archives: A Comparison Between the Art and Architecture Thesaurus and the Library of Congress Thesaurus for Graphic Materials. Cataloging and Classification Quarterly, 16 (1), 85-117.

Lancaster, F.W. (1985). Thesaurus Construction and Use: A Condensed Course. New York: UNESCO.

Lancaster, F.W. (1986). Vocabulary Control for Information Retrieval. (2nd ed.). Arlington, Virginia: Information Resources Press.

Liu, Songqiao. (1990). Online Classification Notation: Proposal for a Flexible Faceted Notation System. International Classification, 17 (1), 14-27.

McNulty, Tom. (1992). A Subject Classification System for Auction House Sales Catalogues Based on Getty Trust's Art and Architecture Thesaurus. Art Documentation, 11 (4), 185-187.

Milstead, Jessica L. (1984). Subject Access Systems. New York: Academic Press.

Petersen, Toni. (1990). Developing a New Thesaurus for Art and Architecture. Library Trends, 38 (4), 644-658.

Vickery, B.C. (1960). Faceted Classification: A Guide to the Construction and Use of Special Schemes. London: Aslib.

Vickery, B.C. (1966). Faceted Classification Schemes. Rutgers Series on Systems for the Intellectual Organization of Information, v. 5. New Brunswick, New Jersey: Rutgers State University.

Weinberg, Bella Hass. (1995). In-depth Book Review. Journal of the American Society for Information Science, 46 (2), 152-160.