
APPENDIX 3D—STRUCTURED DATA PREPARATION

American

Procedures: They wrote scripts to standardize the data, then scanned it using utilities such as FGREP and EGREP, which made the data easier to handle in Perl scripts. After all the delimiters had been standardized, the vi editor was used to correct any remaining errors in the data. This was a very lengthy process, but one that was only done in the first year. Perl scripts were then written to query the data. At this point the team consulted with the Art History professors in order to divide the collection into groups similar to those found in the museums. HTML pages were created with Perl scripts: the scripts extracted the image names and descriptions from the data files, then constructed URLs for the dynamic Web pages.
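
A minimal sketch of the kind of page-building script described above: it reads a delimited data file, pulls out the image name and description for each record, and writes a simple HTML page per object. The pipe delimiter, the field order, and the file name objects.txt are assumptions for illustration, not the actual MESL record layout.

    #!/usr/bin/perl
    # Illustrative sketch only: field order, delimiter, and file name are assumed.
    use strict;
    use warnings;

    my $datafile = 'objects.txt';
    open my $in, '<', $datafile or die "Cannot open $datafile: $!";

    while (my $line = <$in>) {
        chomp $line;
        my ($image, $title, $description) = split /\|/, $line;
        next unless defined $image && length $image;
        $title       = '' unless defined $title;
        $description = '' unless defined $description;

        (my $page = $image) =~ s/\.\w+$/.html/;   # derive page name from image name
        open my $out, '>', $page or die "Cannot write $page: $!";
        print $out "<html><head><title>$title</title></head><body>\n";
        print $out "<h1>$title</h1>\n";
        print $out "<img src=\"$image\" alt=\"$title\">\n";
        print $out "<p>$description</p>\n";
        print $out "</body></html>\n";
        close $out;
    }
    close $in;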

Columbia

Procedures: Each data set was processed using Perl scripts, which had to be customized because of the irregularity of the data sets. The database was populated using a custom intake program. Custom reports were then written to generate menus and object descriptions. Both the database-population and query programs were written in SQL embedded in C.
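
Columbia's intake and query programs themselves were written in SQL embedded in C; the sketch below only illustrates the kind of per-dataset Perl clean-up pass described above, normalizing an irregular export into one consistent delimited form for the intake program. The accepted input delimiters (tab or semicolon) and the pipe-delimited output are assumptions for illustration.

    #!/usr/bin/perl
    # Illustrative filter: normalize an irregular export into pipe-delimited rows.
    use strict;
    use warnings;

    while (my $line = <STDIN>) {
        chomp $line;
        $line =~ s/\r$//;                     # strip stray carriage returns
        my @fields = split /[\t;]/, $line;    # accept either assumed delimiter
        s/^\s+|\s+$//g for @fields;           # trim whitespace in each field
        print join('|', @fields), "\n";
    }

Such a filter would be run once per museum data set, for example: perl normalize.pl < museum_export.txt > cleaned.txt.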

Cornell

Procedures: Data was imported into an Excel spreadsheet and cleaned up there. Once cleaned, the data was imported into FileMaker Pro for Web access. During distribution one (D1), the images were also imported into Kodak's Shoebox. One of the Cornell team's regrets was that they did not fully exploit the relational features of FileMaker Pro.

Illinois

Procedures: They first analyzed the data for delimiters, character sets, and formatting. The data was then imported into MS Access, although some updates were still needed at that point to fully clean it up. MS Access reports were created using embedded HTML, and some HTML formatting was added both before and after the reports were processed. The final stage was uploading the data to the MS SQL Server, a fairly straightforward process that took approximately two to three hours. Overall, the Illinois team was very satisfied with the tools they used.
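
The report does not say what tool Illinois used for the initial analysis, so the filter below is only an assumed illustration of such a check, written in Perl: it tallies candidate delimiter characters and counts non-ASCII bytes in a raw export read from standard input, the kind of information needed before setting up the MS Access import.

    #!/usr/bin/perl
    # Illustrative check: count likely delimiters and non-ASCII bytes in a raw file.
    use strict;
    use warnings;

    my %delim_count;
    my $non_ascii = 0;

    while (my $line = <STDIN>) {
        for my $d ("\t", '|', ';', ',') {
            my $hits = () = $line =~ /\Q$d\E/g;   # occurrences on this line
            $delim_count{$d} += $hits;
        }
        $non_ascii++ while $line =~ /[^\x00-\x7F]/g;
    }

    for my $d (sort { $delim_count{$b} <=> $delim_count{$a} } keys %delim_count) {
        printf "%-6s %d\n", ($d eq "\t" ? 'TAB' : "'$d'"), $delim_count{$d};
    }
    print "non-ASCII bytes: $non_ascii\n";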

Maryland

Procedures: Programmers wrote Visual Basic code to parse the text into a standard delimited format. The second distribution was easier because they were able to reuse the same parsing program. The data was then imported into MS Access, exported as a flat text file, manipulated only slightly, and uploaded to the Web server. In the spring of 1995, they implemented MiniSQL, which provided search capabilities on the Web page.
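
Maryland's parser was written in Visual Basic; purely as an illustration of the same parse-to-delimited step, here is a minimal sketch in Perl. The "Label: value" input shape, the blank-line record separator, and the field names are all assumptions, not the actual museum export format.

    #!/usr/bin/perl
    # Illustrative parser: turn labelled records into one tab-delimited row each.
    use strict;
    use warnings;

    my @order = ('ID', 'Title', 'Artist', 'Date', 'Image');   # assumed output columns
    my %record;

    sub flush_record {
        print join("\t", map { $record{$_} // '' } @order), "\n" if %record;
        %record = ();
    }

    while (my $line = <STDIN>) {
        chomp $line;
        if ($line =~ /^\s*$/) {                    # blank line ends a record
            flush_record();
        }
        elsif ($line =~ /^([^:]+):\s*(.*)$/) {     # "Label: value" pairs
            $record{$1} = $2;
        }
    }
    flush_record();                                # last record may lack a trailing blank line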

Michigan

Procedures: They used Perl scripts (all written from scratch) throughout the entire process to perform the following tasks: parsing the text, checking for errors, linking the images to their records, sorting the data, and building HTML files. In addition to Perl, they used GET (Generic Exploration Tool, an all-purpose interface for exploring databases and viewing the multimedia resources associated with them) to build a searchable database. Overall they were satisfied with the tools they used.
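
A minimal sketch of the image-linking and error-checking steps described above: it matches each data record to an image file by object ID and warns about records whose image is missing. The tab-delimited records file with the ID in the first field, the image directory name, and image filenames built from the ID are assumptions for illustration.

    #!/usr/bin/perl
    # Illustrative sketch: attach an image filename to each record, matched by ID.
    use strict;
    use warnings;

    my ($records_file, $image_dir) = ('records.txt', 'images');   # assumed names

    # Index the available images by the ID embedded in the filename.
    my %image_for;
    opendir my $dh, $image_dir or die "Cannot open $image_dir: $!";
    for my $file (readdir $dh) {
        $image_for{$1} = $file if $file =~ /^(.+)\.(?:jpg|gif)$/i;
    }
    closedir $dh;

    open my $in, '<', $records_file or die "Cannot open $records_file: $!";
    while (my $line = <$in>) {
        chomp $line;
        my ($id, @rest) = split /\t/, $line;
        if (defined $id && exists $image_for{$id}) {
            print join("\t", $id, @rest, $image_for{$id}), "\n";   # record now carries its image
        }
        else {
            warn "No image found for record: $line\n";             # error check
        }
    }
    close $in;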

Virginia

Procedures: They developed a Perl script that added SGML-style tags to each field in a record. Because of inconsistencies in the data from the museums, the script had to be modified for each data set. As at other universities, less editing was needed in Year 2. The text was then loaded into Open Text, where the data was indexed according to the MESL data dictionary. Finally, they developed Perl scripts to generate Web pages on the fly. They were extremely satisfied with Perl, but would have preferred an SQL database application to Open Text.
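
A minimal sketch of the tagging step described above: it wraps each field of a delimited record in SGML-style tags. The element names, which in practice would come from the MESL data dictionary, and the pipe delimiter are assumptions for illustration; Virginia's actual scripts were adjusted for each data set.

    #!/usr/bin/perl
    # Illustrative sketch: wrap each delimited field in an SGML-style element.
    use strict;
    use warnings;

    my @elements = ('objectid', 'creator', 'title', 'date', 'medium');   # assumed names

    while (my $line = <STDIN>) {
        chomp $line;
        my @fields = split /\|/, $line;
        print "<record>\n";
        for my $i (0 .. $#elements) {
            my $value = defined $fields[$i] ? $fields[$i] : '';
            print "  <$elements[$i]>$value</$elements[$i]>\n";
        }
        print "</record>\n";
    }

The tagged output could then be loaded into Open Text and indexed on those element names.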
