THE INTEGRATED DIGITAL ARCHIVE OF LOS ANGELES

 

 

Imagine a resource for the study of Los Angeles, available on the World Wide Web, that brought together and provided access to millions of digital images of photographs, manuscripts, records, newspapers, maps, movies, and artifacts from area archives, museums, and libraries. Imagine an interface that would let anyone, from an elementary school child to a PhD candidate to a university faculty member, search across the collections by keyword, by format, by collection, by time, or by space.

The power of digital technology and the Internet has made such a resource more than just a researcher's fantasy. In 1994, the Information System for Los Angeles project (ISLA) at the University of Southern California (USC) was conceived as a digital research archive that would include a wide range of information in many formats and from many historical periods. ISLA would facilitate research and teaching, but would also be available to the public via the World Wide Web. It would be a system that would allow users to search and view materials housed in many different institutions.

ISLA is a university-wide project that draws on the participation of researchers and faculty from many disciplines, including the social sciences, humanities, natural sciences, and computer science, as well as information and library professionals. But ISLA is also meant to extend beyond the university to become a collaboration with other Los Angeles area institutions, organizations, and communities.

USC found that the biggest challenge in putting together a digital archive is the integration of materials from different sources: not just the practical integration, but also the conceptual integration. By conceptual integration, ISLA means building relationships between the digital objects and creating new and different means of access.

The ISLA project at USC created a method of "conceptual integration" that explores the space and time dimensions of information, in addition to the more familiar subject and format dimensions. The idea was to enable the user to access information by all four of these attributes. All database objects will be linked by spatial coordinates (north, south, east, west), with provisions for loose as well as precise definitions of the coordinates. The ISLA ingest procedures comply with those of Project Alexandria (a digital library funded by NASA, NSF, and ARPA), the Federal Geographic Data Committee (FGDC), the Library of Congress, and the Text Encoding Initiative (TEI).
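As a rough illustration of linking objects by spatial coordinates, the sketch below models an item's footprint as a north/south/east/west bounding box and widens loosely located items by a tolerance before testing for overlap. The class name, field names, tolerance value, and sample coordinates are all assumptions made for illustration; nothing here describes ISLA's actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """Hypothetical footprint in decimal degrees (not ISLA's actual schema)."""
    north: float
    south: float
    east: float
    west: float
    tolerance: float = 0.0  # assumed padding, in degrees, for loosely located items

    def padded(self) -> "BoundingBox":
        """Return a copy widened by the tolerance on every side."""
        t = self.tolerance
        return BoundingBox(self.north + t, self.south - t, self.east + t, self.west - t)

    def intersects(self, other: "BoundingBox") -> bool:
        """True if the two padded boxes overlap (ignores the antimeridian)."""
        a, b = self.padded(), other.padded()
        return (a.south <= b.north and b.south <= a.north
                and a.west <= b.east and b.west <= a.east)


# Illustrative values only: a query area and three invented item footprints.
query = BoundingBox(north=34.10, south=34.00, east=-118.20, west=-118.30)
precise_item = BoundingBox(north=34.06, south=34.05, east=-118.24, west=-118.25)
loose_item = BoundingBox(north=34.25, south=34.15, east=-118.20, west=-118.30, tolerance=0.1)
far_item = BoundingBox(north=33.80, south=33.70, east=-118.10, west=-118.20)

matches = [item for item in (precise_item, loose_item, far_item) if query.intersects(item)]
```

In this sketch the loosely located item matches the query only because of its tolerance; with a tolerance of zero it would fall just outside the query area.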

There are two primary components to the ISLA system. The first is a large database built from USC collections and the collections of Los Angeles area archives, museums, historical societies, and similar institutions. The second component is a user interface that would weave together all the data formats (textual, quantitative, still photographic, spatial, and time-based formats such as audio, video, and film) and allow an open, user-defined search. The prototype interface screen enables the user to perform quite complex searches, but it appears to be too complicated for a school child or inexperienced user.

Intellectual property rights are always an issue in a collaborative project such as ISLA. In this project, the contributing institutions retain all intellectual property rights and make all decisions regarding the licensing of those rights to users. These decisions will be embedded in the cataloging data for each item.

The ISLA concept is a bold and ambitious design, but like many digital library projects, the full implementation of the original blueprint has run into many detours and roadblocks. The release of ISLA has been delayed, mainly because of the user interface. In an effort to make the material accumulated to date accessible, the two components have been separated. Individual collections have been mounted, but searching by temporal or spatial coordinates is not yet available. The Integrated Digital Archive of Los Angeles (IDA-LA) is the database of digital material collected so far, without the ISLA interface. IDA-LA is accessible on the Web through a search interface that allows searching by keyword, with the ability to limit the search by collection or format.
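A keyword search that can be limited by collection or format might be sketched as follows. This is a minimal illustration, not the IDA-LA implementation: the record titles, the field names, and the `search` function are invented for the example.

```python
from typing import Dict, List, Optional

# Hypothetical catalog entries; titles and field names are invented for illustration.
RECORDS: List[Dict[str, str]] = [
    {"title": "Road map of Los Angeles County",
     "collection": "Automobile Club of Southern California",
     "format": "map"},
    {"title": "Street scene, Los Angeles",
     "collection": "California Historical Society",
     "format": "photograph"},
    {"title": "Ceramic bowl",
     "collection": "Chinese Historical Society of Southern California",
     "format": "artifact image"},
]


def search(records: List[Dict[str, str]], keyword: str,
           collection: Optional[str] = None,
           fmt: Optional[str] = None) -> List[Dict[str, str]]:
    """Return records whose title contains the keyword (case-insensitive),
    optionally limited to one collection and/or one format."""
    needle = keyword.lower()
    hits = []
    for rec in records:
        if needle not in rec["title"].lower():
            continue
        if collection is not None and rec["collection"] != collection:
            continue
        if fmt is not None and rec["format"] != fmt:
            continue
        hits.append(rec)
    return hits


all_hits = search(RECORDS, "los angeles")
map_hits = search(RECORDS, "los angeles", fmt="map")
```

A search of this kind rewards users who already know which words to try, since it offers no way to browse what a collection contains.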

The collections mounted so far are:

The IDA-Southwestern Native Americans, which contains some 1,000 images created or collected by George Wharton James, a prolific writer and booster of Southern California and the Southwest. The images date from between 1890 and 1905.

The IDA-Automobile Club of Southern California, which provides access to historic maps and photographs from the Club's corporate archives.

The IDA-Chinese Historical Society of Southern California, which provides images of artifacts from early Chinese settlements in Los Angeles and Santa Barbara.

The IDA-California Historical Society, which offers photographs of the development of California, with emphasis on the Southern California region, dating from 1860 to 1960.

 

The IDA-LA search interface is not particularly user-friendly. Users who are not sure what they are looking for will have difficulty finding relevant materials. Browsing is hard to do and there is no index of available materials. Only the Chinese Historical Society interface has a list of suggested search terms. This list gives the user some idea of what is included in that collection.

The Automobile Club and Chinese Historical Society collections utilize the Dublin Core metadata elements. Some of the collections that were the first to be mounted are being converted to Dublin Core. The decision to use Dublin Core as the base record was not reached without much back and forth between those who wanted to capture as much information as possible and those who wanted to get the resources up and available as quickly as possible. One dean felt that Dublin Core was too complicated, while others wanted as much information as possible to enable future applications.

The ISLA model for creating records was to have the different participating institutions input their own records. In the case of small, non-profit organizations, this would mean relying on volunteers. These volunteers often do not have any formal cataloging training or experience with such concepts as authority records and controlled vocabularies. USC discovered that, in practice, this method resulted in some unusual records. Though Dublin Core is intended to be simple enough that any individual should be able to assign terms, in the end USC itself mapped each organization's information to Dublin Core in the case of the Automobile Club and the Chinese Historical Society.
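Mapping an organization's own record fields to Dublin Core can be pictured as a simple crosswalk. The sketch below is only an illustration: the local field names on the left, the sample record, and the `to_dublin_core` function are hypothetical, while the element names on the right are standard Dublin Core elements. Fields with no mapping are set aside for a cataloger to review rather than silently dropped.

```python
from typing import Dict, List, Tuple

# Hypothetical crosswalk from a contributor's local field names (assumed)
# to Dublin Core element names.
CROSSWALK: Dict[str, str] = {
    "item_name": "title",
    "maker": "creator",
    "keywords": "subject",
    "notes": "description",
    "year": "date",
    "accession_no": "identifier",
    "where_found": "coverage",
    "usage_terms": "rights",
}


def to_dublin_core(local_record: Dict[str, str]) -> Tuple[Dict[str, List[str]], List[str]]:
    """Map known local fields to Dublin Core elements.

    Returns the Dublin Core record (each element holds a list, since
    elements may repeat) and the list of local fields that had no mapping.
    Empty values are skipped.
    """
    dc: Dict[str, List[str]] = {}
    unmapped: List[str] = []
    for field, value in local_record.items():
        element = CROSSWALK.get(field)
        if element is None:
            unmapped.append(field)
        elif value not in (None, ""):
            dc.setdefault(element, []).append(value)
    return dc, unmapped


# Invented sample record, for illustration only.
local = {
    "item_name": "Ceramic bowl",
    "year": "",
    "usage_terms": "Rights retained by the contributing institution",
    "shelf": "B-12",
}
dc_record, needs_review = to_dublin_core(local)
```

Keeping the rights statement as an element of each record is one way the contributing institution's licensing decisions could travel with the cataloging data for that item.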

IDA-LA does not keep track of the scanning documentation (resolution, machine, date, etc.) in any of its metadata, or anywhere else for that matter. The Automobile Club collection was scanned by USC. The Chinese Historical Society contracted with someone to take digital photographs of their artifacts. The digital photos were then scanned by USC.

The master images are archived and are not accessible by the public. If a user wants a higher quality image, he or she is referred to the institution that houses the collection. The derivative image resolution is dependent on the quality of the master digital image. Some of the contributing organizations did not want high quality images on the Web. The Automobile Club was concerned about the misappropriation of its material. The Chinese Historical Society did not want the viewing of images on the Web to take the place of visiting the museum and viewing the objects in person. In some cases, the description of an artifact mentions some quality or feature that cannot be seen in the image.

The digital objects that are available as part of IDA-LA to date are mostly photographs, with a few historic documents and selected textual materials, including USC theses. Sophisticated means of navigation are not in place. The pages of each thesis are all mounted on one screen, and the user can click on an unreadably small page image to get a full image of that page. The user can then navigate to the previous page or the next page, or can go back to the original screen with all the page images.

One of the partners in IDA-LA is the Huntington Library, which is making available its collection of two 19th-century newspapers, the Los Angeles Star and El Clamor Publico. The newspapers present special display problems. USC has scanned the newspapers, but the image of each page is large. If the whole page is displayed on a computer monitor, the print is too small to be of any use to the viewer.

USC and IDA-LA are experimenting with MrSID, an image file format that can reduce the size of high-resolution images to less than 3% of their original size while maintaining the quality and integrity of the original image. In addition, the original image can provide all the resolutions necessary for a range of purposes and sizes, for example, a 72 dpi web image or a 300 dpi publication-quality version. MrSID makes these versions automatically, on the fly.
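To give a sense of scale for these figures, the short calculation below works through the arithmetic only; it does not implement or call MrSID. The 100 MB master file and the 8 x 10 inch page are assumed values chosen for illustration, and 3% is used as the upper bound on the compressed size.

```python
from typing import Tuple


def compressed_size_mb(original_mb: float, ratio: float = 0.03) -> float:
    """Approximate file size after reduction to the given fraction of the original."""
    return original_mb * ratio


def pixel_dimensions(width_in: float, height_in: float, dpi: int) -> Tuple[int, int]:
    """Pixel width and height needed to render a page of the given size at a given dpi."""
    return round(width_in * dpi), round(height_in * dpi)


# Assumed values, for illustration only.
master_mb = 100.0            # hypothetical uncompressed master image
page_w_in, page_h_in = 8, 10  # hypothetical page size in inches

upper_bound_mb = compressed_size_mb(master_mb)            # about 3 MB at the 3% bound
web_px = pixel_dimensions(page_w_in, page_h_in, 72)       # (576, 720)
print_px = pixel_dimensions(page_w_in, page_h_in, 300)    # (2400, 3000)
```

Under these assumptions the publication-quality version needs roughly seventeen times as many pixels as the web version, which is why deriving both from a single stored master is attractive.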

The creation and implementation of the ISLA project has required the project team to wrestle with issues that all digital libraries face as they try to use the full power of digital technology to create vital new information resources. A look at how ISLA has handled these issues offers insight for other digital resource projects.

 

 

This paper could not have been written without the time and assistance of Ann Lynch, Head of Project Discovery and Management, Center for Scholarly Technology, USC.

 

For more information on ISLA, go to: http://www.usc.edu/isd/locations/cst/IDA/prospectus.html.

To access IDA-LA, go to: http://library.usc.edu/. On the Homer library catalogue page, click the DIGITAL MEDIA ARCHIVES tab.

 

This paper was written for Visual Materials: Metadata, Standards, and Best Practices for Digital Libraries, IS 208, taught by Howard Besser, UCLA, Spring 2000.