ILS 603: Individual Paper

Nancy Vlahakis--October 16, 1995

Discussion of Databases for the Stearns Collection of Musical Instruments: Issues and Options

Management Issues
The Stearns Collection of Musical Instruments comprises over 2,400 instruments of varying classifications, cultural origins, sizes, and quality. The majority of the collection is in storage.

The group that is working on digitizing the collection has developed a vision. The primary goal is to make the collection an educational tool. The target audience is undecided, but the group is leaning toward grades 6 through 12 (although the ideal expressed is to be able to present the collection on two or three different levels). The collection must therefore be accessible to this age group and presented in a nonscholarly manner. The dream of the group is to make the collection interactive using a multimedia format of images, videos of the instruments being played, recordings, and text detailing numerous aspects of the instruments.

Presently, the collection has been catalogued and the information stored in the relational database program dBASE. Several of the fields currently in use contain coded information: for instance, two-letter codes for country of origin (practically nonintuitive) and three-digit codes for various description elements. When you pull up a file, unless you have the code sheet in front of you, you have no idea what the displayed information means. Only a certain percentage of the instruments has been catalogued entirely, but all of them have been inventoried and entered into the database, the key field being the accession number. The database is currently accessible only at the School of Music. The goals of a DBMS for this group are to make the collection accessible on the Web, to include multimedia components, and to handle queries of varying complexity.
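The coded fields described above can be made readable with a simple lookup against the code sheet. The following is a minimal sketch in Python; the codes, labels, and field names are hypothetical, since the actual Stearns code sheet is not reproduced in this paper.

```python
# Hypothetical code sheet: the real Stearns codes are not given in this paper.
COUNTRY_CODES = {"JA": "Japan", "IN": "India", "GH": "Ghana"}
MATERIAL_CODES = {"101": "wood", "102": "brass", "103": "bamboo"}


def decode(code, table):
    """Look up a code, keeping the raw code visible if it is not on the sheet."""
    return table.get(code, "unknown code " + code)


def decode_record(record):
    """Return a copy of a coded record with codes replaced by readable labels."""
    decoded = dict(record)
    decoded["country"] = decode(record["country"], COUNTRY_CODES)
    decoded["material"] = decode(record["material"], MATERIAL_CODES)
    return decoded


# A coded record as it might appear on screen without the code sheet.
coded = {"accession_no": "0001", "country": "JA", "material": "103"}
print(decode_record(coded))
```

Keeping the codes in the stored data and decoding only at display time would let the existing records be presented to a nonscholarly audience without re-entering them.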

Database Options
There are basically four database options available to the group: flat-file, relational, object-oriented, and a device-independent structured-document approach. I will discuss each option briefly, with an accompanying example of current tools.

Flat-file data-management systems comprise straight tables with no connection between them. Their searching capabilities within fields are satisfactory in that they will search on strings of text anywhere within the field, which releases the user from having to know the exact order of the entry. Panorama, Lotus 1-2-3, and FileMaker Pro are popular examples of this simple structure. They are fast performers, but they offer limited types of queries and can accommodate only text.

A relational database management system (RDBMS) "...lets you cross-reference information between two or more files that share a common, or key, field....[and lets] you establish a variety of relationships among sets of data" (MacUser, June 1992). This would satisfy the need for queries of varying complexity.
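The cross-referencing that the quotation describes can be sketched with two small tables that share a key field. This example uses Python's built-in sqlite3 module rather than dBASE, and the table names, field names, and sample rows are all hypothetical.

```python
import sqlite3

# In-memory database with two hypothetical "files" that share a key field.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE instruments (
    accession_no TEXT PRIMARY KEY,
    name         TEXT,
    country_code TEXT
);
CREATE TABLE countries (
    country_code TEXT PRIMARY KEY,
    country_name TEXT
);
INSERT INTO instruments VALUES
    ('0001', 'koto', 'JA'),
    ('0002', 'sitar', 'IN'),
    ('0003', 'shakuhachi', 'JA');
INSERT INTO countries VALUES ('JA', 'Japan'), ('IN', 'India');
""")


def instruments_from(country_name):
    """Cross-reference the two tables through the shared country_code field."""
    query = (
        "SELECT i.accession_no, i.name "
        "FROM instruments AS i "
        "JOIN countries AS c ON i.country_code = c.country_code "
        "WHERE c.country_name = ? "
        "ORDER BY i.accession_no"
    )
    return conn.execute(query, (country_name,)).fetchall()


print(instruments_from("Japan"))
```

The user asks for a readable country name, and the join resolves it to the coded value stored with each instrument.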

Object-oriented database management systems (ODBMSs) "...are based on manipulating objects, which encapsulate complex data structures and processes...for manipulating that data" (Mullins 1994). This ability to handle complex objects is the primary distinction from RDBMSs. One type of complex object significant to the Stearns group's purposes is the multimedia binary large object, or BLOB, which translates into image and sound files, for instance. "Although some relational databases can process these types of objects, it is seldom easy or efficient to do so" (Mullins 1994). Another benefit of ODBMSs is that feature standards are being developed by the Object Database Management Group, started in 1991, which hopes they will be adopted as an ANSI/ISO standard (Mullins 1994).

It is difficult to discuss many products by slotting them into the either/or of database construction. Many of the most successful RDBMSs have been integrating object-oriented development tools. Borland's dBASE is one example. dBASE has taken on a Windows face and has "a new binary field type [that] lets you embed...objects such as bitmap images and WAV files" (PC Computing, July 1994). Borland's Paradox 4.5 for Windows is a relational database with the benefit of being able to import and export data from other programs, such as dBASE, and from delimited ASCII text. ACI US's 4th Dimension is a relational tool that has recently gone cross-platform (Mac/Windows). It also has the capability to generate relationships horizontally as well as vertically. That is to say, it "...lets you create parent/child, or hierarchical, relationships between files and their subfiles" (MacUser, June 1992).

Embark, a fairly new database system, has been acquired by the University's art museum for creating an image database with text. This is of interest to the Stearns group because of the emphasis on images and accessibility. It was chosen for the ease with which it incorporates images and for its public-access module, which can establish some fields as read-only. (This feature is available with other software as well, for instance, with Paradox.) Using this system, the museum will create kiosks throughout the building with stand-alone stations. A Web interface is being developed for Embark, whose design makes it easy to pull out pieces that will go online. Constant staff use made the collection management features attractive as well. Embark is not cross-platform, but in a homogeneous environment of Macs, such as the museum's, this is of minimal significance. (Interview with Toni Kramer, computer systems analyst, Museum of Art.)

A radical deviation from these choices is to structure the information and various other components of the Collection using the Standard Generalized Markup Language (SGML). By way of building a rationale for this kind of construct, I borrow the words of Edward A. Fox: "...[modern] technology encourages individuality, integration, interfaces that are universal but personalizable, indexing that is open rather than controlled, and information that is multimedia and organized using hypermedia as well as hierarchical structures." Further, "if this program of developing requirements and architectures is to be successful, it must be embodied through the development of interoperable systems that follow suitable standards..." (Fox, "Images of Digital Libraries," 1994). The key words here are "interoperable" and "standards."

SGML speaks to content and its structure, as does a database. Instead of fields or tables, it uses tags to delineate these elements. SGML is object-oriented in the sense that the SGML grammar deals with entities comprising hierarchies within a structure and links to other kinds of data, i.e., multimedia files. "It is...a standard designed to express the organization of documents and to accommodate even the most complex multimedia materials" (Price-Wilkin, "Using the World-Wide Web to Deliver Complex Electronic Documents," 1994).
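To illustrate how tags can stand in for fields, here is a minimal sketch that builds one tagged instrument record with a small hierarchy and a link to an image file. Python's standard library has no SGML parser, so the sketch uses well-formed XML-style markup as a stand-in; the element names, attribute names, and sample values are hypothetical and do not come from any actual DTD for the Collection.

```python
import xml.etree.ElementTree as ET


def build_record(accession_no, name, country, image_file):
    """Build one tagged instrument record (hypothetical element names)."""
    instrument = ET.Element("instrument", {"accession": accession_no})
    ET.SubElement(instrument, "name").text = name
    # A nested element shows the hierarchy that flat fields cannot express.
    origin = ET.SubElement(instrument, "origin")
    ET.SubElement(origin, "country").text = country
    # The image is linked by file name rather than stored in the record.
    ET.SubElement(instrument, "image", {"file": image_file})
    return instrument


record = build_record("0001", "koto", "Japan", "koto.gif")
print(ET.tostring(record, encoding="unicode"))
```

Each tag plays the role a field would play in a DBMS, while nesting and links carry the structure and the multimedia references.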

The next component in this process is text retrieval from the tagged structures. This can be accomplished using the PAT search engine, which makes it possible to perform complex queries. The results are then presented on the Web by converting the text to HTML on the fly. Common browsers such as Netscape and Mosaic require this conversion because they do not read SGML. Panorama is a browser that can, but it is not yet widely used because of its expense. This construct will endure because it is device-independent and has a very rich grammar accommodating many kinds of data. Moreover, you can create a Document-Type Definition (DTD) tailored to your data, much as you would set up fields in more widely used DBMSs.
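The retrieve-then-convert process described above can be sketched in a few lines. This is not PAT: a naive substring match stands in for the search engine, well-formed XML-style markup stands in for SGML, and the element names, function names, and sample records are hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical tagged records standing in for the SGML-encoded collection.
SOURCE = """<collection>
<instrument accession="0001"><name>koto</name><country>Japan</country></instrument>
<instrument accession="0002"><name>sitar</name><country>India</country></instrument>
</collection>"""


def search(root, tag, term):
    """Return instruments whose <tag> text contains term, ignoring case."""
    term = term.lower()
    return [
        inst for inst in root.iter("instrument")
        if term in (inst.findtext(tag) or "").lower()
    ]


def to_html(instruments):
    """Convert matching records to an HTML list at request time."""
    items = []
    for inst in instruments:
        items.append("<li>%s (%s), accession %s</li>" % (
            escape(inst.findtext("name") or ""),
            escape(inst.findtext("country") or ""),
            escape(inst.get("accession", "")),
        ))
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


root = ET.fromstring(SOURCE)
print(to_html(search(root, "country", "japan")))
```

Because the HTML is generated only when a query is answered, the tagged source remains the single authoritative copy of the data.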

There are many needs and choices for a database management system. The desired characteristics include a user-friendly interface to accommodate entry by people who may not normally work on databases (i.e., music students); relational capabilities to handle queries of varying complexity; an object-oriented structure for the incorporation of image, video, and sound files; and accessibility through some interface on the Web. The solution would be simpler, though the choices would still be numerous, if the database didn't have to go online. One of the most desirable features of Embark, according to Kramer at the Art Museum, is that you can pull out bits and pieces to go online. So flexibility seems to be important as well.

As explained earlier, the Collection's cataloging information is already in dBASE. Its recent move to Windows and to cross-platform use, its object-oriented features for embedding multimedia files, and its current use by the Collection are compelling reasons to stay with that software. On the other hand, SGML is attractive because it offers these desired attributes as well as being browsable on the Web through programs such as PAT (and it is becoming more so as browser support for SGML increases on the Web). Its basis in international standards and its device independence make it a strong candidate.

More exploration needs to be done on these two options. How would dBASE fit into the online scheme? Would SGML be acceptable to Collection staff as the entry tool for cataloging? Perhaps the possibility of using dBASE for data entry and then converting to SGML for the Web should be discussed. I propose to create a prototype of the solution for my individual project.