Bibliography of Moving Image Indexing

for Howard Besser's Winter 2000 Digital Collections of Still and Moving Images

Goodrum, Abby and Amanda Spink. Visual Information Seeking: A study of image queries on the World Wide Web, in Proceedings of the 1999 Meeting of the American Society for Information Science. Consists largely of an analysis of over one million Excite searches and a determined subset of about 33,000 image queries. Data on number of queries/search terms per user, search sessions, and terms used in the Excite searches are contrasted with image request data from a non-digital environment. Among other findings, the authors determined that unique terms (ones that do not reappear in other image searches) are typical in the image queries they studied.
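
As a rough illustration of the "unique terms" measure mentioned in this annotation, the following Python sketch finds terms that occur in only one query of a small query list. The sample queries and the function name unique_terms are hypothetical, invented for illustration; they are not taken from the Goodrum and Spink study, whose exact counting method is not described here.

```python
from collections import Counter

def unique_terms(queries):
    """Return the set of terms that appear in exactly one query."""
    # Count each term once per query so repeats inside one query do not matter.
    query_counts = Counter()
    for query in queries:
        query_counts.update(set(query.lower().split()))
    return {term for term, n in query_counts.items() if n == 1}

# Hypothetical sample queries, invented for illustration only.
sample_queries = ["sunset beach", "beach volleyball", "red barn"]
```

In this toy list, "beach" occurs in two queries and so would not count as unique, while every other term would.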

Patrick, Timothy B., Mary Ellen C. Sievert, and Mihail Popescu. Text Indexing of Images Based on Graphical Image Content, in Proceedings of the 1999 Meeting of the American Society for Information Science. Describes testing process and results for a hybrid system of image indexing, incorporating assigned index terms and retrieval by associated graphical content. The authors discuss in particular detail some of the factors they believe may have influenced their results, and suggest topics for further research.

Goodrum, Abby. Representing moving images: Implications for developers of digital video collections. Proceedings of the 1998 Meeting of the American Society for Information Science. Given the shortcomings of text-based video information retrieval methods, the author examines the congruence of both text- and image-based representations with the documents for which they stand. Moving images, keywords, document titles, keyframes, and salient stills are all evaluated for their effectiveness for at least two kinds of searches for video information. Somewhat technical, but substantiates the idea that while text is not appropriate for sole representation and access, it is still a reasonably effective preliminary access point, especially in combination with image-based search capability.

Turner, James M. 1999. A typology for visual collections. Bulletin of the American Society for Information Science 25, no. 6 (August/September), 14-16. Describes some of the factors and considerations involved in creating a descriptive typology for all visual collections. Includes sources and examples of the major facets of the final typology ("The World of Visual Collections"): Personal or institutional entities involved with the collection, Users, Collection activities, Image characteristics, and Responsibility for collections. Does not include visual representation of final typology, but a poster is available from the author.

Turner, James M. 1996. Issues in shot-level indexing of moving images: what constitutes a shot? ASIS SIG VIS Newsletter 1, no. 2 (Spring). Summarizes key issues involved in compiling a shot-level index for a collection. Turner also outlines a shot-level data structuring issue from the National Film Board of Canada that he predicts will be shared by other institutions beginning to automate moving image collections: finding a middle ground between a card file and digital index.

Turner, James M. 1997. Deriving shot-level indexing from audio description texts (Association of Moving Image Archivists, 1997 Annual Conference, Bethesda MD, 1997 11 17-22). Presentation summary briefly describes the practice of audio description and its potential use as a source for shot-level indexing of moving images. Provides analysis of some preliminary work comparing effectiveness of professional indexing vs. indexes derived from audio description, plus preliminary shot-by-shot breakdown of indexed material in one case, and some conclusions for each set of results. Further research is warranted and under way, but the author deems the method promising.

Turner, James M. 1997. Explorations in using audio description as a tool for indexing moving image documents (ASIS 1997, Washington). Largely similar to preceding article by Turner ("Deriving shot-level indexing from audio description texts"), but with additional observations from exploratory work. Presents some more anomalies and considerations (such as cultural bias in descriptions, lack of audio description for considerable portions of a moving image document, etc.), but cautions that results are still largely inconclusive/exploratory and that further investigation is called for.

Turner, James M. 1997. Indexing pictures: some considerations (Annual Meeting Council on Botanical and Horticultural Libraries, Jardin botanique de Montréal, 1997 06 04). Enumerates some concerns/considerations for image indexing, including the validity of text-based indexing, use of controlled vocabularies, subject access issues peculiar to image collections, and information structure considerations. Provides lengthy references and resource lists on most points.

Turner, James M. 1996. Cross-Language Transfer of Indexing Concepts for Storage and Retrieval of Moving Images: Preliminary Results. ASIS 1996 Annual Conference Proceedings. October 19-24, 1996. An extension of previous work involving indexes derived from terms assigned to images and moving pictures by groups of viewers. Turner presents results from a study comparing the terms assigned by native French and English speakers for the same sample images, wherein the results were evaluated for equivalency. The results recorded strongly suggest that multi-lingual approaches to automated indexing may indeed be viable.

Turner, James M. 1996. Storage and retrieval of moving images: a research agenda (ASCRT, Saint Catharines, ON, 1996 05 28). Summary of recent, ongoing, and needed research, specifically in the area of shot-level indexing for moving image collections. Turner reviews his own recent findings and describes upcoming projects in indexing automation, practical applications with multiple searching modes, multilingual implementation of existing or automatically generated indexes, and online classifications for concepts and multilingual access.

O'Connor, Brian C. and Mary Keeney O'Connor. 1999. Categories, Photographs & Predicaments: Exploratory Research on Representing Pictures for Access. Bulletin of the American Society for Information Science 25, no. 6 (August/September). The authors discuss their findings based on users' vastly different assertions about/descriptions of test images, and posit that iconological interpretations may be as key to successful image retrieval as pre-iconographical or iconographical descriptors.

Boreczky, J., A. Girgensohn, G. Golovchinsky, and S. Uchihashi. An Interactive Comic Book Presentation for Exploring Video. To appear in Proceedings of CHI 2000, April 1-6, 2000, The Hague, The Netherlands. (Abstract) (PDF). The authors present examples of a summary format for video. The format is distinctive in that it uses algorithms not only to isolate keyframes for individual shots, but also to eliminate redundant keyframes and visually distinguish important shots in the final presentation format, which may also include captions and related text information. A comparative evaluation of the effectiveness of several summary versions is presented, as well as analysis of input and output problems influencing results. (See also Uchihashi, et al., "Video Manga: Generating Semantically Meaningful Video Summaries" (1999) for comparative studies and Uchihashi, S., and J. Foote, "Summarizing Video Using a Shot Importance Measure and a Frame-Packing Algorithm" (1999) for technical information.)

Smoliar, S.W. and J. D. Baker. Storytelling, Jamming, and All That Jazz: Knowledge Creation in the World of New Media. In Proceedings of the Thirty-Second Annual Hawaii International Conference on System Sciences (HICSS-32), R. Sprague, Jr., editor, 1999. (Abstract) (PDF). Draws parallels between controlled musical improvisation and the collaborative creation of knowledge. The authors suggest that the tenets behind successful jazz music-making, culture, and teaching method--technical mastery combined with flexible application--may be strongly and effectively related to knowledge management in the digital environment.

Uchihashi, S., and J. Foote. Summarizing Video Using a Shot Importance Measure and a Frame-Packing Algorithm. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (Phoenix, AZ), vol. 6, pp. 3041-3044, 1999. (Abstract) (PDF). Technical breakdown of algorithms used to create a visually significant summary of keyframes for a video document (see Boreczky, et al., "An Interactive Comic Book Presentation for Exploring Video" (1999) and Uchihashi, et al., "Video Manga: Generating Semantically Meaningful Video Summaries" (1999) for methodology and comparative studies).

Uchihashi, S., Jonathan Foote, Andreas Girgensohn, and John Boreczky. Video Manga: Generating Semantically Meaningful Video Summaries, in Proceedings of ACM MULTIMEDIA '99, 1999. Describes the authors' methods of isolating keyframes from a video sequence and assigning value to them prior to incorporation in an interactive summary. Qualitative assessment of general outcomes is provided, as well as discussion on incorporating other elements (related text, etc.) into the video summary. (See also Boreczky, et al., "An Interactive Comic Book Presentation for Exploring Video" (1999) and Uchihashi, S., and J. Foote, "Summarizing Video Using a Shot Importance Measure and a Frame-Packing Algorithm" (1999) for technical information.)

Boreczky, J. S. and L. D. Wilcox. A Hidden Markov Model Framework for Video Segmentation Using Audio and Image Features. Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (Seattle, WA), vol. 6, 1998, pp. 3741-3744. (Abstract) (PDF). Outlines use of hidden Markov model framework to enhance typical image-based (histogram) comparisons with audio and motion information. Use of the technique produced exceptional successes in identifying camera motion within shots (pans and zooms) and transition types between shots, changes that reduce the accuracy of other video segmentation algorithms.
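
For readers unfamiliar with the "typical image-based (histogram) comparisons" that this paper sets out to enhance, the following Python sketch shows the baseline idea only: compare gray-level histograms of consecutive frames and flag a shot boundary when the difference exceeds a threshold. It is not the hidden Markov model framework of Boreczky and Wilcox. The function names, the bin count, the threshold value, and the toy frame data are all assumptions chosen for illustration.

```python
def histogram(frame, bins=4, max_value=256):
    """Build a normalized gray-level histogram for one frame (a flat list of pixel values)."""
    counts = [0] * bins
    for pixel in frame:
        counts[pixel * bins // max_value] += 1
    total = len(frame)
    return [c / total for c in counts]

def shot_boundaries(frames, threshold=0.5):
    """Return each index i where frame i appears to start a new shot."""
    boundaries = []
    previous = histogram(frames[0])
    for i in range(1, len(frames)):
        current = histogram(frames[i])
        # Sum of absolute bin differences: 0 for identical histograms, 2 for disjoint ones.
        difference = sum(abs(a - b) for a, b in zip(previous, current))
        if difference > threshold:
            boundaries.append(i)
        previous = current
    return boundaries

# Hypothetical toy "video": two dark frames followed by two bright frames.
toy_frames = [
    [10, 20, 30, 40],
    [12, 22, 28, 41],
    [200, 210, 220, 230],
    [201, 212, 219, 231],
]
```

A fixed threshold of this kind reacts to any large histogram change, which is one reason gradual transitions and camera motion can mislead simple histogram methods, the weakness the annotation above refers to.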

Lee, A., K. Schlueter, and A. Girgensohn. Sensing Activity in Video Images. In CHI 97 Extended Abstracts, ACM Press, 1997, pp. 319-320. (Abstract) (PDF). Discusses activity sensing tools in use as part of NYNEX's internal video monitoring system ("awareness tool"). Coworkers have access to information on their team members' level of activity, but they also have increased control over their personal privacy. Little to do with indexing, but presents a somewhat novel application of image activity analysis.

Smoliar, S.W., J.D. Baker, T. Nakayama, and L. Wilcox. Multimedia Search: An Authoring Perspective. Image Databases and Multi-Media Search, A. W. M. Smeulders and R. Jain, editors, 1997, pp. 3-10. Inspired by Louis Hjelmslev's semiological conception of planes of expression and planes of content within images. Expands on Hjelmslev's ideas within the context of multimedia searching, with particular emphasis on the problems inherent in current indexing and searching solutions. Maintains that the existing searching/indexing methods cannot effectively address queries, and that any reasonable queries are themselves difficult to effectively formulate.

Smoliar, S.W. and L.D. Wilcox. Indexing the Content of Multimedia Documents. In Proceedings: VISual'97; Second International Conference on Visual Information Systems (San Diego, CA), 1997, pp. 53-60. (Abstract) (PDF). Consists largely of a review of several multimedia document retrieval systems (including Virage and WebSEEk's image searching approaches and the Musclefish audio retrieval process) and analysis of how their methods may be better applied. The idea of using the interwoven threads of context, form, and content as search criteria (methods that are central to the traditional library retrieval process) is highlighted as an equally serviceable solution to the problem of multimedia indexing and retrieval.

Minneman, S. L. and S. W. Smoliar. Representing the Content of Video: Artifact or Process? Proceedings Knowledge Representation for Interactive Multimedia Systems: Research and Experience (Budapest, Hungary, August 1996), ECAI, pp. 57-65. (Abstract) (PDF). The authors propose that a less structurally-dependent approach to moving image analysis may lead to more effective knowledge representation in multimedia. The impact of adaptation, perception, and narrative structure (or lack of it) on how video content is modeled and represented plays a part in the research agenda they propose. Attention is also drawn to the changing ideas of what kind of specific queries will be made by a more process-oriented searcher.

Smoliar, S. W. and J. D. Baker. Extended Media Research at the FX Palo Alto Laboratory. IEEE Computer Society Multimedia Newsletter, 4, 1 (August 1996), pp. 45-48. (Abstract) (PDF). Outlines research agenda for the Extended Media group at FXPAL. Concerns the group hopes to address or resolve through current and future projects include cultural (not just literal) translation for readable media documents and representations, and the authoring of truly multimedia documents which fully utilize the potential of the web to convey content and conform to the changing nature of readable media.

Besser, Howard and Rosalie Lack. Image and Metadata Distribution at Seven University Campuses: Reports from a Study of the Museum Educational Site Licensing Project, in Proceedings of the Third Annual Conference on Research and Advanced Technology for Digital Libraries, Paris, 1999.

Sable, Carl L. and Vasileios Hatzivassiloglou. Text-Based Approaches for the Categorization of Images, in Proceedings of the Third Annual Conference on Research and Advanced Technology for Digital Libraries, Paris, 1999. (Postscript file). Outlines in detail several variations on an approach to image classification, based on pieces of text associated with the sample images. Classification was narrow--images were assigned "indoor/outdoor" features only, and the authors discuss secondary features having an effect on results--but the accuracy of the results achieved with their text-based method is an improvement on some image-based methods, and even approaches that of humans performing the same task.

Vercoustre, Anne-Marie and François Paradis. Metadata for Photographs: From Digital Library to Multimedia Application, in Proceedings of the Third Annual Conference on Research and Advanced Technology for Digital Libraries, Paris, 1999. Working with a relatively small collection of images related to a specific subject, the authors produced an educational CD-ROM. Their interoperable XML and Dublin Core metadata for the images may be applied, through HTML prescriptions which embed queries to the XML metadata, toward the generation of complex virtual documents. Appendices include full XML metadata descriptions (with English and French descriptive content) for sample images.

Auffret, Gwendal and Bruno Bachimont. Audiovisual Cultural Heritage: From TV and Radio Archiving to Hypermedia Publishing, in Proceedings of the Third Annual Conference on Research and Advanced Technology for Digital Libraries, Paris, 1999. Bachimont and Auffret revisit the concept of the document in the context of audiovisual materials, pointing out the key differences between AV storage units, the AV stream, and the AV document. They also define the essential functions of the digital AV library as follows: that it should allow users to search for, browse, navigate in, annotate, and interpret AV documents in context. The metadata composed by a digital library, in the authors' opinion, recasts the AV digital library in a role more like that of a hypermedia publisher than mere repository. Structures building on existing indexing techniques and incorporating standardized language (XML/SGML/SMIL) are described in prototype form, as are possible areas of future work.

Hunter, Jane and Jan Newmarch. An Indexing, Browsing, and Search Retrieval System for Audiovisual Libraries, in Proceedings of the Third Annual Conference on Research and Advanced Technology for Digital Libraries, Paris, 1999. The authors present an overview of the application developed for the State Library of Queensland's Audiovisual unit, in which Dublin Core metadata elements are incorporated into an RDF schema. The resulting Video Metadata Generator is designed to allow for efficient, easy construction of value-added summaries of video holdings, ones which may easily be output as HTML or in other accessible multimedia formats.

Takasu, Atsuhiro, Takashi Yanase, Teruhito Kanazawa, and Jun Adachi. Music Structure Analysis and Its Application to Theme Phrase Extraction, in Proceedings of the Third Annual Conference on Research and Advanced Technology for Digital Libraries, Paris, 1999. In a manner similar to shot-level indexing of moving image documents, the division of musical pieces into phrases and themes is investigated as one approach to audio information retrieval issues. Although the reported results are encouraging, limitations of the experiment performed are numerous and daunting. For instance, only 100 Japanese pop songs were used as a sample set for analysis, and other genres or types of audio material may require completely different methods of analysis. However, the work does suggest and substantiate that enough similarities exist between audio information retrieval and video/other multimedia information retrieval for similar solutions to be applied in several media.