"Improvements on California Heritage Collection"
Prof. Howard Besser
IS246 Tu 1-4
May 4, 1999
The California Heritage Collection consists of photographs, pictures and manuscripts from the collections of the Bancroft Library at the University of California, Berkeley. It is an online archive, organized around "finding aids," that contains well over 28,000 images illustrating the history and culture of California. Because it is part of the Online Archive of California, more than thirty institutions now have exposure to the California Heritage Collection. To understand the purpose of this resource, one first needs to understand what finding aids are. The Frequently Asked Questions section of the California Heritage web site defines them as follows: "Finding aids are inventories, registers, indexes or guides to collections held by archives and manuscript repositories, libraries, and museums. Finding aids provide detailed descriptions of collections, their intellectual organization and, at varying levels of analysis, of individual items in the collections". The collection is the outcome of the California Heritage Digital Image Access Project, which was funded by the National Endowment for the Humanities. The purpose of the project was to demonstrate how USMARC collection-level cataloging records and electronic versions of archival finding aids can be linked and work together in a networked environment to provide access to, and control of, digitized images. California Heritage uses Standard Generalized Markup Language (SGML) finding aid technology and Encoded Archival Description (EAD) for its database design, which was established at Berkeley.
While this archival collection has several exceptional features, one should not overlook the possibilities for further improvement. During the creation of the California Heritage Collection there was a question whether to concentrate on describing the material already digitized in more detail or to continue digitizing, and the decision was made to digitize more. This decision can be debated from two perspectives. On one side it is a positive aspect of the collection, since more information is accessible; on the other, that information may or may not be useful, depending on its content. Concentrating on improvements that could enhance the California Heritage Collection, one may observe three things: it lacks description of the context of the photographs’ subjects, its navigation is somewhat awkward and could be simplified, and two similar words should not pull up different lists during a search.
In the California Heritage Collection the photographs under Container Listing lack description of each photograph’s context. The process used to bring the pictures into California Heritage was as follows: first the images were selected from the Manuscript and Pictorial Collections of The Bancroft Library, then captured on 35mm film and scanned to Kodak Photo CD. The 1024 x 1536 grayscale images were pulled from the Photo CDs for viewing. Finding aids were encoded using commercial SGML authoring tools. A database was used to keep track of the items as they were processed. With this information it is understandable what a time-consuming process that must have been. Along with the photographs there were caption fields that recorded information such as the photographer, the photographer’s number and the series. At the same time, word-processed introductory information was prepared and merged with the listing generated from the project database to create the finding aid. This word-processed information is the part that actually describes the context of the photographs. Having it allowed the creators to complete a full finding aid for every collection; the intention was to complete the project first and, if enough time was left, to come back and add more information to each collection.
While searching for photographs of architecture, both homes and public buildings, Lisa Parks, a student in IS 246 at Berkeley, was disappointed not to find information about the buildings themselves. Lisa says that the photographs were "very antiseptic, rarely including inhabitants or neighboring buildings." She further explains that "the breadth of the material represented is limited, but this is not the fault of the library. These photographs may represent biographical or historical perspectives, and the record of their existence may be as important as the details of the photographs themselves." Without details that explain what the photographs are about, university students such as Lisa may well be disappointed, since they want more than just the image. Much younger K-12 students doing the same research might not share this disappointment, because their interest in the search would be pictorial rather than textual.
Another student, JoAnne Allen, also taking IS 246 at UCB, was somewhat upset not to find more information about the photographs and artifacts. Referring to the African American collection and the Native American collection, she observed that the "captions posted with the pictures were the captions written by the photographers themselves at a particular period." This suggests that no introductory information was merged with the caption fields when these collections were created. JoAnne felt that those captions really did "reflect the mentality of the time"; however, she did not think that "they provided enough information for the pictures. An individual who knows nothing about social conditions in which the pictures were taken would not come away with any more insight or knowledge," JoAnne explained. A suggestion for improvement here, therefore, is to go back to the word-processed sections of California Heritage and include a description of the circumstances in which the photographs were taken. With this improvement university students would benefit more and would not leave the site frustrated and lacking information.
Another concern regarding California Heritage is its navigation. Navigation is an extremely significant aspect of any online program, so its simplicity is very important for users. Young K-12 students using the California Heritage Collection would have a hard time with some of its labels. For example, if they conducted a search and wanted to display some photographs, they would have no idea that they needed to go under Container Listing to display them. In fact, one of the questions in the Frequently Asked Questions section asks "Why can’t I see the images?" Clearly this is a problem for users. These students need a simple search engine that provides the information they want without requiring research into how to use the program in the first place. It is true that California Heritage contains a guide that illustrates how to use the program, yet many users will overlook this section. Even university students, as first-time users, do not want to waste their time figuring out how an online program works; it should be self-explanatory and should not require reading instructions. The improvements to be considered for California Heritage concern its user interface design: users should be able to determine their location within the site, the hyperlinks need to be simplified, and some graphic changes should be taken into account.
Navigation within the California Heritage site is a little awkward, and for continued use one needs to be able to identify where one is in the site. For example, when one selects the Container Listing section, its title appears only at the very top of that section; as one scrolls down, the title moves off the screen with the document, and the user soon forgets which part of the program is being viewed. It would therefore be beneficial to add a bar at the bottom of the screen that stays in place permanently and identifies the user’s current location. The bar should also update its label as the document is scrolled. For instance, under Container Listing the bar should identify that section; since the section contains several groups, the bar should also show specifically where the user is as the document is scrolled from group 1 to group 2. Lisa Parks makes a similar point, saying that the "outline is useful, but little topical information is given. The basic structure is useful to manage the data, but it is not ideally suited for an outside user. The frames option can be helpful, but the transitions between collections and items in a collection are still rough. The organization of the information is suited to information storage, not information retrieval."
Camille LeBlanc, who is also a student in IS 246, is mainly concerned with the use of hyperlinks in the program. She believes that they are overdone in some areas of the program and underused in others. Her example states that "the table of contents (TOC) page requires that the user hyperlink to several pages each of which has just one or two paragraphs of information. While the amount of information provided for each topic heading seems appropriate, the fact that the user has to link to several pages to read these few paragraphs of information seems unnecessary. Therefore, grouping some of these topics into fewer, total number of pages makes it less cumbersome for the user." Camille is suggesting that the TOC page be grouped in the following manner: the Descriptive Summary and the Administrative Information would form one group, and the Abstract, Pictorial Collection Overview, and Scope and Content would form a second. With this in mind it is understandable why it would be better to have fewer hyperlinks in some areas of the program. With this change California Heritage could attract more users, since they would not have to follow so many links to get the full information. Keeping in mind that users want simplicity, Camille also has some thoughts on graphic changes for the program. She explains: "It would seem helpful to have a graphical snapshot of the collection in addition to a conventional table of contents. A graphical representation of the collections could enable visitors to the collection to immediately grasp the size and proportions of the collection’s contents. This could be achieved by rendering a graphical representation that gives some sort of spatial relationship between categories of artifacts within the collection. 
For instance, one type of artifact might be represented with a larger shape (or icon) if it is the artifact that is largest in number, and a small shape (or icon) that might be used to represent the type of artifact that is smallest in number. There are many creative ways in which the spatial representation can be achieved through the use of shapes or images to represent categories of artifacts. Regardless of what method is selected, it is likely that giving the user a sense of scope will be helpful to their search."
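One way to picture the proportional sizing Camille describes is the following Python sketch. The category names and counts are invented for illustration, and the square-root scaling (so that the area of an icon, rather than its width, tracks the number of items) is an assumption of this sketch, not something specified in her comments.

```python
import math


def icon_sizes(counts, max_size=100.0):
    """Scale each category's icon so its area is proportional to its count.

    The largest category gets a side length of `max_size`; every other
    category shrinks with the square root of its share of that count.
    """
    largest = max(counts.values())
    if largest <= 0:
        raise ValueError("at least one category must contain items")
    return {
        name: max_size * math.sqrt(count / largest)
        for name, count in counts.items()
    }


# Invented counts, for illustration only.
example_counts = {"photographs": 400, "manuscripts": 100, "drawings": 25}
sizes = icon_sizes(example_counts)
```

A page generator could then draw each category at its computed size, giving a visitor an immediate sense of which kinds of artifacts dominate a collection.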
Perhaps the most important aspect of a search is being able to pull up the relevant information one is interested in. California Heritage could improve the structure of its database design by implementing some kind of cross-referencing index that associates older terms with the terms more commonly used today. This index should also include words that have similar meanings, such as "buildings" and "architecture". Depending on the audience conducting the search, this will make a big difference. For example, a more sophisticated user might choose the word "architecture", whereas a much younger student might use the word "buildings" instead. The California Heritage Collection should not produce different results for those two searches, and a cross-referencing index would eliminate that problem.
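A minimal sketch of such a cross-referencing index, in Python, is given here. The term groups, the sample captions and the function names are hypothetical illustrations and are not part of the California Heritage system. The idea is only that a query is expanded to every term in its group before matching, so that equivalent terms return the same list while the original captions stay untouched.

```python
import re

# Hypothetical groups of terms that should retrieve the same records.
TERM_GROUPS = [
    {"buildings", "building", "architecture"},
]

# Map each term to the full group it belongs to.
CROSS_REFERENCE = {term: group for group in TERM_GROUPS for term in group}


def expand_query(term):
    """Return the term together with every cross-referenced equivalent."""
    term = term.lower()
    return CROSS_REFERENCE.get(term, {term})


def search(captions, term):
    """Return the captions containing the term or any of its equivalents."""
    wanted = expand_query(term)
    results = []
    for caption in captions:
        words = set(re.findall(r"[a-z]+", caption.lower()))
        if words & wanted:
            results.append(caption)
    return results


# Made-up captions standing in for the original, unaltered caption text.
SAMPLE_CAPTIONS = [
    "Public buildings on Market Street",
    "Residential architecture, Berkeley hills",
    "Portrait of a miner",
]
```

With this arrangement, `search(SAMPLE_CAPTIONS, "buildings")` and `search(SAMPLE_CAPTIONS, "architecture")` return the same two captions. The same mechanism could pair an archaic term with its modern equivalent without changing the stored text.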
It is understandable that, while the project was being created, a simple database was introduced to keep track of over 25,000 photographs. All the project members needed access to this database at all times, so it was created in Microsoft Access and placed on the network. Microsoft Access simplified the project, since it kept track of the barcodes, call numbers, and associated metadata, including caption, filming and file information. In the beginning the database structure was kept as simple as possible so that the project could be completed efficiently, but now that the collection is in use the database design requires closer attention. Adding a cross-referencing index to the database design of California Heritage would be a considerable improvement to the system.
With this new concept Cal Heritage would no longer return different results for equivalent search terms, nor risk offending its users by seeming to favor derogatory terms. Jeffrey Ow, another graduate student at Berkeley, agrees with this concept and states the following: The sheer amount of information that has been transferred into digital format is amazing, yet some of the collections would benefit from the creation of some sort of "content filters" to assist the text searches. The exact texts of the hard-copy finding aids as well as the photograph captions were inputted, yet some sort of "smart text search engine" could be implemented also that would associate older, "archaic" terms with terms more commonly used. For example, searching for historical photographs of the Japanese American experience, the search resulted with different information with the derogatory phrase "Jap" than the phrase "Japanese". Jeffrey further explains that he is not "advocating any censorship of the original documentation, rather suggesting that an intelligent cross-referencing index be created to assist the search." In support of this idea, Jeffrey also believes that a search conducted with the proper word "Japanese" should produce the same listing as a search with the derogatory term "Jap". With such a cross-referencing index, users would have less reason to take offense. As time moves on social changes occur, and projects such as Cal Heritage should not overlook these changes when it comes to the terms in current use.
For example, if someone who is not Japanese American wanted to find out more about Japanese Americans, searched with "Jap", and got more results than with the proper term, that person would be misinformed about the right way to identify that ethnic group today. It is not fair to give wrong information to someone who knows little or nothing about a certain group of people.
The California Heritage Collection gives users the access they need to discover more about the heritage of California, yet navigating the program showed that it lacks descriptions of the photographs, that the navigation itself needs simplification, and that some type of cross-referencing index is necessary. Under the section Digitizing the Collection it is stated that "it was important to the success of the Project that the image database in the prototype systems provide scholars with a body of inherently interesting material, that could serve as a rich and coherent intellectual resource of enduring scholarly interest." Yet the program lacks the information about the images that the students quoted here need. California Heritage presents many great images related to a given search, but unfortunately it needs to provide more information about them. This can be done by devoting more time to the program: the creators need to go back to the word-processed sections that were merged with the captions and add more information about the pictures.
The user interface is also in need of redesign, since it is not very simple to work with. Many students are disappointed not to know exactly where they stand in the system. In other words, knowing one’s position is important during research, so that the user does not have to scroll up and down constantly to figure out where he or she is in the program. Another concern is the hyperlinks of the system. It does not make sense to follow so many hyperlinks for a simple search within one section of a topic. To improve this, the grouping of the program’s pages needs to be analyzed and kept as simple as possible. Finally, the need to search with derogatory terms should be eliminated by implementing some kind of cross-referencing index in the structure of the database. With this index in place users would no longer be given misleading information, and searches would be more efficient.
Merrille Proffitt asked a question about the enhancement of the program in the Developer reactions to students’ observations section of the IS 246 web site: "If we had more funding, should we go back and make the descriptions better and more uniform, or should we do more digitizing from the collection?" This is an extremely important question, and the answer that seems obvious is that, with more funding, they should go back and enhance the program. With the suggested improvements users would be more satisfied, dealing with a more uniform program rather than one that contains more digitized material yet lacks what is more significant. This answer depends on the audience of the program. For instance, young K-12 users might disagree with it, since their research does not require as much information as that of students at higher educational institutions. At this point, however, it would be better to give more information to the users who need to complete lengthy research projects than to provide more images for young students.