Mark Handel
ILS 603: Visual Databases
Issues of Scale and Size in Visual Databases
Introduction

Many visual databases are made up of images of actual objects, ranging from archival objects and artworks to satellite and medical images. In many instances, it is useful to know how large the imaged object actually is, that is, to be able to judge the scale of the image accurately. This paper looks at the various ways this problem is (and is not) handled, along with some of the issues it raises.
Uses of Scale Data

First, of course, is there a need to be able to find scale information? I think there is. In certain applications, this information is "first-order" data; without it, the image is essentially useless. For instance, in medical imaging, a tumor that is about 100 pixels wide means very different things when a pixel corresponds to .01mm and when a pixel corresponds to .5mm. Similarly, in satellite images, knowing the scale of the image is critical to being able to use the images for mapping, planning, or intelligence.

In other applications, scale information is not so critical, but it is still needed to make full use of the image. In an art setting, the size of a piece of art is important to know. When every work of art seems more or less the same size (whether because it is projected from a slide or seen on screen), certain aspects of the impact of the work are lost. La Grande Jatte, by Seurat, is not the size of a poster suitable for framing, but actually about 10 feet long. By understanding the size of the work, a student is better able to understand why the work was so confrontational: not only is it an image of the lower class's leisure time, but its subjects are presented as heroic, larger-than-life figures. In a different example, Van Eyck's Arnolfini Marriage seems extremely well detailed; when the size of the work is taken into account (roughly 60cm x 80cm), the work becomes obsessively detailed, which furthers the argument that it is a visual marriage contract.
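The arithmetic behind the tumor example can be made concrete with a minimal sketch. The function name is hypothetical, and the two scale values are simply the ones used in the text above.

```python
def physical_size_mm(pixels: float, mm_per_pixel: float) -> float:
    """Convert a measurement taken on the image (in pixels) to millimetres."""
    return pixels * mm_per_pixel


# The same 100-pixel-wide tumor at the two scales mentioned in the text.
fine_scale = physical_size_mm(100, 0.01)   # one pixel covers .01mm
coarse_scale = physical_size_mm(100, 0.5)  # one pixel covers .5mm

print(f"At .01mm per pixel: {fine_scale:.1f} mm across")
print(f"At .5mm per pixel:  {coarse_scale:.1f} mm across")
```

The two results differ by a factor of fifty, which is the difference between a lesion of about a millimetre and one of about five centimetres.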
Definitions and some Issues

Here, I use "scale" in the map sense of the word: some way to tell the relationship between an on-screen distance and the "real-world" distance. "Size" is used to talk about the actual size of a digital image (in pixels), and "resolution" is primarily used to talk about the capture and display of images in the digital realm. These definitions are needed because the distinctions between these aspects of image meta-data are occasionally confused. For instance, an image scanned in at 300dpi carries no information about scale -- the 300dpi is only resolution information. Unless the actual object itself (rather than a slide or photograph of it) is scanned at 300dpi, the resolution only describes the surrogate image.

Discussion of resolution (and scale) is also complicated by the resolution of the display or of the printer. Most images for on-screen display are designed with 72dpi screens in mind. However, as high-resolution graphic modes become more common, the displayed image's apparent resolution increases (as the image itself gets smaller). I'm currently working at a computer that has a screen resolution of around 100dpi, which means that many images are shown smaller than originally intended. This creates a "rear-view mirror effect," where images are larger than they appear, and again, the scale of the actual object is hard to recover without additional information.

One of the other problems with scale is that good digitization practice tends to eliminate the basic hints of scale. A bigger digital image does not reliably indicate a bigger actual object. To get the maximum detail, an object is usually captured at the highest level of magnification that can be obtained. This means that the scale across a given set of digital images is not always consistent, and the images cannot be compared directly. For instance, if an organization were to use the PhotoCD technology to digitize its image collection, each image would end up being some multiple of 2x3 in size, but the images would differ widely in scale. This particular approach is not bad -- it makes sure that each digital surrogate has the most detail possible for a given technique. However, it does eliminate any "inherent" sense of scale in a digital image, making the user look for other sources of information. In the end, all of the digital images, no matter the size of the actual object, end up looking more or less the same size: usually about the size of the user's screen.
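The difference between resolution and scale, and the "rear-view mirror effect," can be illustrated with a short sketch. The function names are hypothetical, and the 720-pixel image, the 768-pixel capture width, and the two object sizes are assumptions chosen only to make the arithmetic easy to follow.

```python
def displayed_inches(image_pixels: int, screen_dpi: float) -> float:
    """Width on screen of an image shown at one image pixel per screen pixel."""
    return image_pixels / screen_dpi


def scale_mm_per_pixel(object_mm: float, image_pixels: int) -> float:
    """Real-world distance covered by one pixel of the digital surrogate."""
    return object_mm / image_pixels


# Display resolution: the same 720-pixel image shrinks on a denser screen.
print(displayed_inches(720, 72))    # designed-for 72dpi screen
print(displayed_inches(720, 100))   # roughly 100dpi screen

# Scale: two objects of very different size, each captured to fill
# the same 768-pixel width (an assumed capture size).
small_object = scale_mm_per_pixel(200, 768)    # a 20cm-wide object
large_object = scale_mm_per_pixel(3000, 768)   # a 3m-wide object
print(f"{small_object:.2f} mm/pixel vs {large_object:.2f} mm/pixel")
```

Both surrogates have identical pixel dimensions, yet a pixel in one covers fifteen times the distance of a pixel in the other. Nothing in the pixel size or the capture resolution reveals that.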
Example Web Sites & Programs

Of the three sample web sites I looked at, all dealing with art historical information, only one addresses the issue of scale; the other two ignore it completely. At the University of Michigan's MESL implementation, scale is shown textually in the image information. This is a simple and straightforward way of doing it, stating, for example, "6'7" high, 4'9" wide." It does, however, require the user to take an image seen on a 14" screen and imagine it at a different size. In comparison, the two other sites, the Library of Congress and the Vatican Museum, completely ignore the issue of scale. The images are presented as-is, without any indication of their actual size. Of course, in both instances, the images are presented without much meta-information on the whole, so the lack of scale information is not particularly surprising. To the Vatican's credit, however, many of the images are of frescoes, which usually include additional visual clues, such as lintels and other architectural elements, that help the viewer judge scale.

However, there are some programs that do treat scale as an important attribute. EmbARK, DCI's image collection management software, has a rather elegant display of the scale of an image. When a user calls up an image, a small drawing is also shown, showing the size of the work of art in relation to a door, a human figure, and an on-screen ruler. Through the comparison, the user is able to quickly judge how big the work is, rather than trying to figure out how big a set of text measurements actually is. EmbARK is primarily designed for the museum market, where issues of connoisseurship are very important. To display all works of art at roughly the same size without any indication of their scale would be a serious problem in a program designed in part to educate the public about art appreciation and history. EmbARK, though, is a large and complex program, and one that requires an organization-level commitment. Other, lower-end image management programs tend to ignore issues of scale. A program such as Shoebox, written by Kodak to complement their PhotoCD product, does not support storing information about scale in the default database. Although users can define their own fields to store scale information, Shoebox itself cannot use this information to provide feedback in a display.
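A comparison display of the kind described above comes down to drawing everything at one common factor. The sketch below is not EmbARK's actual method, which I have not seen. The door and figure dimensions and the 120-pixel drawing box are assumptions, and the function name is hypothetical.

```python
# Assumed reference objects as (width_cm, height_cm); illustrative values only.
REFERENCES_CM = {
    "door": (91.0, 203.0),
    "person": (45.0, 170.0),
}


def comparison_layout(work_w_cm: float, work_h_cm: float, box_px: int = 120):
    """Return (pixels per cm, {name: (width_px, height_px)}) so that the
    work and every reference object fit the box at one shared scale."""
    items = dict(REFERENCES_CM, work=(work_w_cm, work_h_cm))
    largest = max(max(size) for size in items.values())
    px_per_cm = box_px / largest
    layout = {
        name: (round(w * px_per_cm), round(h * px_per_cm))
        for name, (w, h) in items.items()
    }
    # An on-screen ruler: the drawn length of one metre.
    layout["ruler_1m"] = (round(100 * px_per_cm), 1)
    return px_per_cm, layout


# The MESL example of 6'7" high by 4'9" wide, converted to centimetres.
factor, layout = comparison_layout(work_w_cm=144.78, work_h_cm=200.66)
for name, (w, h) in layout.items():
    print(f"{name:9s} {w:4d} x {h:4d} px")
```

Because every item shares the same factor, the drawn work stands almost as tall as the door, which conveys its size more directly than the text measurements do.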

File Format Support

As seen, a few programs can display scale information, while the majority cannot. In the case of web and other on-line access, where scale information is available, it is usually delivered through text rather than through visual cues. In looking at the popular file formats for image storage, it quickly becomes apparent that this seeming disregard of scale information continues. None of the major file formats supports storing scale information, and I could find only one special-purpose format that does. Of the three most common file formats (GIF, JPEG/JFIF, and TIFF), none has an explicit field or tag for the scale of an image. TIFF comes the closest with a set of tags called ResolutionUnit, XResolution and YResolution, but these store resolution, not scale. It is possible that the full JPEG standard has a way to store scale information; however, the full JPEG standard (including, for instance, the lossless JPEG mode) is rarely implemented in encoders/decoders. The only format I could find that definitely contains scale information is a special-purpose format called FITS. FITS (Flexible Image Transport System) is designed to transport astronomical data, and has not only scale information, but also unambiguous location information fields. It is not too surprising that in a specialized field, an image format carries with it the information its users see as important.
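To show what scale information inside a file format can look like, here is a minimal sketch that reads a FITS-style header. As I understand the FITS conventions, a header is a series of 80-character "cards" of the form keyword, "= ", value, and an optional comment after "/", and the CDELT1 keyword gives the coordinate increment per pixel along the first axis. The sample header is invented for illustration, and the parser is deliberately naive (it would mishandle a quoted string containing "/").

```python
CARD_LEN = 80  # FITS header cards are fixed-width, 80 characters each


def parse_header(header: str) -> dict:
    """Collect keyword/value pairs from a FITS-style header string."""
    values = {}
    for start in range(0, len(header), CARD_LEN):
        card = header[start:start + CARD_LEN]
        keyword = card[:8].strip()
        if keyword == "END":
            break
        if card[8:10] != "= ":
            continue  # comment or blank card, no value
        raw = card[10:].split("/", 1)[0].strip()
        try:
            values[keyword] = float(raw)
        except ValueError:
            values[keyword] = raw.strip("'").strip()
    return values


# Invented sample cards, padded to the fixed card width.
SAMPLE_CARDS = [
    "NAXIS1  =                  512 / pixels along axis 1",
    "CTYPE1  = 'RA---TAN'           / axis 1 is right ascension",
    "CDELT1  =         -0.000277778 / degrees per pixel",
    "END",
]
header = "".join(card.ljust(CARD_LEN) for card in SAMPLE_CARDS)

fields = parse_header(header)
arcsec_per_pixel = abs(fields["CDELT1"]) * 3600
print(f"{arcsec_per_pixel:.2f} arcseconds of sky per pixel")
```

The point is not the parser but the contrast with the TIFF resolution tags: here the file states how much of the real world each pixel covers, not just how densely the surrogate was sampled.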

How to provide this information

Given that scale information is useful in understanding digital surrogates of visual information, how is this information best displayed to the user, and how should it be stored? Both of these are complex questions, the former much more difficult than the latter. In terms of how to display the scale information, there seem to be two major schools of thought, best typified by UM's MESL project and by EmbARK. In the MESL site, the scale information is purely textual, listed with other forms of meta-data on the image. This is perhaps the easiest way to display the information, since it's just one more textual field. It also has a precedent in existing cataloging practice; in AACR2, the size of an object is simply listed as "28 cm." with no other details. EmbARK typifies the other approach that I have found: giving some sort of visual feedback on the scale of the object, either against a preset standard (like a ruler) or against familiar objects (like a person). EmbARK uses both kinds of reference; however, this method requires more processing than just printing out a line or two of text.

In terms of storing the information, the information can either be encapsulated in the image data itself (an object-oriented approach) or stored separately as part of other meta-data stored about the image. In the examples above, both the MESL site and EmbARK choose to store the scale information separately from the image data. At display time, the information is re-integrated into the display for the end-user. However, there is no reason that the information cannot be stored in the image data itself. All three of the major image formats examined here contain "user-defined" data blocks (either for primarily textual information, or for true user-data) which could be used to store scale information. By defining a specific format for scale information, and then storing this information into the JPEG APP0, the TIFF UserInformation, or the GIF TextInformation fields, the scale information would always travel with the image itself. This has the advantage of being particularly backwards compatible: image viewers that do not support showing the fields will simply ignore them, image viewers that show the field uninterpreted will show the user the scale, while "smart" viewers will be able to use the information to create a display like EmbARK's. The problem with this approach is that it would break up the information on an object: some information would be stored in separate records, while some would travel with the image itself. It would also require the cataloger to enter information into two different places. (Of course, this could be eliminated by designing an image format that carried all of its cataloging information with it, much like the attempt in SGML to embed cataloging data into encoded documents.)
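As a sketch of how scale information might travel with the image, the code below stores a scale string in a JPEG comment (COM) segment. This is a substitution on my part: it uses the COM marker rather than the APP0 field named above, because COM is the segment JPEG sets aside for free text. The "SCALE mm/pixel=" convention is hypothetical, not part of any standard.

```python
import struct

PREFIX = b"SCALE mm/pixel="  # hypothetical convention, not a standard


def _segments(data: bytes):
    """Yield (offset, marker, payload) for header segments before the scan."""
    pos = 2  # skip the two-byte start-of-image marker
    while pos + 4 <= len(data) and data[pos] == 0xFF:
        marker = data[pos + 1]
        if marker in (0xDA, 0xD9):  # start of scan or end of image
            break
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        yield pos, marker, data[pos + 4:pos + 2 + length]
        pos += 2 + length  # the length field counts its own two bytes


def embed_scale(data: bytes, mm_per_pixel: float) -> bytes:
    """Insert a COM segment carrying the scale after any leading APPn segments."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG stream")
    insert_at = 2
    for pos, marker, payload in _segments(data):
        if 0xE0 <= marker <= 0xEF:  # keep APPn segments (e.g. JFIF) first
            insert_at = pos + 4 + len(payload)
        else:
            break
    text = PREFIX + repr(float(mm_per_pixel)).encode("ascii")
    segment = b"\xff\xfe" + struct.pack(">H", len(text) + 2) + text
    return data[:insert_at] + segment + data[insert_at:]


def read_scale(data: bytes):
    """Return the embedded mm-per-pixel value, or None if there is none."""
    for _, marker, payload in _segments(data):
        if marker == 0xFE and payload.startswith(PREFIX):
            return float(payload[len(PREFIX):].decode("ascii"))
    return None
```

A viewer that knows nothing of the convention skips the comment segment, one that prints comments shows the raw text, and a "smart" viewer can call something like read_scale to drive a comparison display.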

It seems obvious that scale information would be an important aspect of an image to store. However, few, if any, collections of images attempt to integrate this information into their displays. Future projects in this area would include programs that are able to quickly modify images to include scale information. Once this is accomplished, it would not be too hard to add other features, such as a "ruler" mode that would allow a user to measure a distance on an image. This would be quite useful in an educational and research setting. As part of my project, I hope to work on some prototypes that attempt to provide some of this functionality.
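The "ruler" mode mentioned above needs very little once a scale is known. The following is only a sketch of the idea, not one of the prototypes. It assumes a single uniform scale across the whole image (which would not hold for, say, an obliquely photographed object), and the function name and click coordinates are hypothetical.

```python
import math


def measure_mm(point_a, point_b, mm_per_pixel: float) -> float:
    """Real-world distance between two pixel positions clicked on an image,
    assuming one uniform scale for the whole image."""
    return math.dist(point_a, point_b) * mm_per_pixel


# Two clicks that are 30 pixels apart horizontally and 40 vertically.
distance = measure_mm((0, 0), (30, 40), mm_per_pixel=0.5)
print(f"{distance:.1f} mm")
```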

Bibliography

Hamilton, Eric. JPEG File Interchange Format, Version 1.0.2. Milpitas, CA: C-Cube Microsystems, 1 September 1995.

Aldus Corporation, Microsoft Corporation. TIFF 5.0 Specification. Redmond, WA: Microsoft Corporation, 8 August 1988.

CompuServe Incorporated. Graphics Interchange Format, Version 89a. Columbus, OH: CompuServe Incorporated, 1990.

NASA Goddard Space Flight Center. FITS (Flexible Image Transport System) Frequently Asked Questions (FAQ). NASA, 1995.