3D. STRUCTURED DATA PREPARATION
One team wrote scripts to standardize the data, then scanned the data with
utilities such as fgrep and egrep, which made it easier to handle in Perl.
After standardizing all the delimiters, the team used the vi editor to
correct any errors in the data. This was a very lengthy process, but one
that was needed only in the first year. Perl scripts were then written to
query the data. At this point the team consulted with the Art History
professors in order to divide the collection into groups similar to those
found in the museums. HTML pages were created with Perl scripts that
extracted the image names and descriptions from the data files and
constructed URLs for the actual dynamic Web pages.
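As an illustration of that final step, a minimal Perl sketch of generating HTML fragments from a delimited data file is shown below. The file name, the pipe-delimited layout, the field order, and the base URL are assumptions made for the example, not the project's actual formats.

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Assumed layout: one record per line, fields separated by "|",
    # with the image file name first and the description second.
    my $base_url = 'http://images.example.edu/mesl';   # hypothetical base URL

    open my $in, '<', 'records.dat' or die "Cannot open records.dat: $!";
    while (my $line = <$in>) {
        chomp $line;
        my ($image, $description) = split /\|/, $line;
        next unless defined $image && defined $description;

        # Construct the image URL and emit an HTML fragment for the page.
        my $url = "$base_url/$image";
        print qq{<p><a href="$url"><img src="$url" alt="$description"></a><br>$description</p>\n};
    }
    close $in;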
Another team processed the data set with Perl scripts, which had to be
customized because of the irregularity of the data. The database was
populated using a custom intake program, and custom reports were then
written to generate menus and object descriptions. Both the
database-populating and querying programs were written in SQL embedded in C.
The Cornell team imported the data into an Excel spreadsheet and cleaned it
up there. Once cleaned, the data was imported into FileMaker Pro for Web
access. During the first distribution (D1), the images were also imported
into Kodak's Shoebox. One of the Cornell team's regrets was that they did
not fully exploit the relational features of FileMaker Pro.
The Illinois team first analyzed the data for delimiters, character sets,
and formatting. The data was then imported into MS Access, where some
further updates were needed to fully clean it up. MS Access reports were
created using embedded HTML, and some HTML formatting was added both before
and after the reports were processed. The final stage was uploading the
data to the MS SQL server, a fairly straightforward process that took
approximately two to three hours. Overall, the Illinois team was very
satisfied with the tools they used.
wrote code in Visual Basic in order to parse the text to a standard
delimited format. The second distribution was easier because they were
able to use the same parsing program. The data was then imported into
MS Access. It was exported as a flat text file, manipulated only slightly,
then uploaded to the Web server. In the spring of 1995, they implemented
MiniSQL, which provided search capabilities on the Web page.
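Purely for illustration, and in Perl rather than the Visual Basic this team actually used (to match the other sketches in this section), the following shows one way such a conversion to a standard tab-delimited format might look. The "Label: value" input layout and the field names are assumptions.

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Assumed input: records separated by blank lines, each line of the form
    # "Label: value" (e.g. "Title: ...").  Output: one tab-delimited line per
    # record, in a fixed field order.
    my @fields = qw(Title Artist Date Medium);   # hypothetical field order

    local $/ = '';                               # paragraph mode: read one record at a time
    while (my $record = <STDIN>) {
        my %value;
        for my $line (split /\n/, $record) {
            if ($line =~ /^(\w+):\s*(.*)$/) {
                $value{$1} = $2;
            }
        }
        print join("\t", map { defined $value{$_} ? $value{$_} : '' } @fields), "\n";
    }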
One team used Perl scripts, all written from scratch, throughout the entire
process to perform the following tasks: parsing the text, checking for
errors, linking the images to their records, sorting the data, and building
HTML files. In addition to Perl, they used GET (Generic Exploration Tool,
an all-purpose interface for exploring databases and viewing the multimedia
resources associated with them) to build a searchable database. Overall
they were satisfied with the tools they used.
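A minimal Perl sketch of the error-checking and image-linking steps might look like the following. The file names, the tab-delimited layout, and the convention that each image file is named after its record ID are assumptions made for the example.

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Link each record to its image file and flag simple errors.
    # Assumptions: tab-delimited records with the object ID in the first field,
    # and image files named "<id>.jpg" in the images/ directory.
    open my $in,  '<', 'records.txt' or die "Cannot open records.txt: $!";
    open my $err, '>', 'errors.log'  or die "Cannot open errors.log: $!";

    while (my $line = <$in>) {
        chomp $line;
        my @field = split /\t/, $line;
        my $id = $field[0];

        if (!defined $id || $id eq '') {
            print $err "Record with missing ID at line $.\n";
            next;
        }
        my $image = "images/$id.jpg";
        if (-e $image) {
            print "$id\t$image\n";                  # record successfully linked
        } else {
            print $err "No image found for record $id\n";
        }
    }
    close $in;
    close $err;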
Another team developed a Perl script that added SGML-type tags to each
field in a record. Because of the inconsistencies in the data from the
museums, the script had to be modified for each university's data set. As
was the case with other universities, there was less editing in Year 2. The
text was then uploaded to Open Text, where the data was indexed according
to the MESL data dictionary. Finally, the team developed Perl scripts to
generate Web pages on the fly. They were extremely satisfied with Perl, but
would have preferred to use an SQL database application instead of Open Text.
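A minimal Perl sketch of the tagging step might look like the following. The tag names and the tab-delimited input are assumptions; the actual tags would have followed the MESL data dictionary.

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Wrap each field of a tab-delimited record in SGML-type tags.
    # The tag names and field order here are illustrative only.
    my @tags = qw(ID TITLE ARTIST DATE DESCRIPTION);

    while (my $line = <STDIN>) {
        chomp $line;
        my @field = split /\t/, $line;

        print "<RECORD>\n";
        for my $i (0 .. $#tags) {
            my $value = defined $field[$i] ? $field[$i] : '';
            print "  <$tags[$i]>$value</$tags[$i]>\n";
        }
        print "</RECORD>\n";
    }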