The Cost of Deploying Digital Images by Universities
This chapter presents the costs incurred and time commitment made by the seven university participants in the Museum Educational Site Licensing Project (MESL). It is important to note that MESL was a pilot project so, at times, expediency rather than "best practices" drove choices. For this reason the costs may not be truly representative of an ongoing delivery system. However, the figures reported in this study provide insight into resource allocation; that is, they indicate areas where the most and the least time and effort were spent. In addition, the study revealed that certain start-up costs incurred in the first year are not likely to continue in subsequent years.
After first providing background information (including the range of universities participating and the cost centers examined), this chapter looks at where effort in the MESL project was focused. It examines overall time spent both by university and by cost center, as well as overall costs. 1 Then it elaborates on each individual cost center, discussing the findings, explaining the processes involved, and discussing how efforts changed over time. Finally, the chapter summarizes how the findings are likely to impact future projects. The data used here comes from the MESL University Technical Reports, surveys administered to all participating universities asking them questions about their implementation processes (including decisions made, costs, and hours).
For the most part the Technical Reports did not specifically address infrastructure costs. 2 It was difficult to properly allocate highly shared costs such as networking, labs of workstations with high-resolution monitors, and electronic classrooms. Although we might be able to identify the share of these costs actually coming from the MESL project, this level of use was more a reflection of time period, chance, and the fact that free resources tend to see increased use in any case. The usage numbers would not truly reflect what MESL required, nor would they show what the MESL Project participants would actually be willing to pay for.
Seven universities were chosen to participate in MESL: American University, 3 Columbia University, Cornell University, the University of Illinois, the University of Maryland, the University of Michigan, and the University of Virginia. One goal in selecting the MESL participant institutions was to represent the variety of educational institutions that might be interested in making images available across a network. Three of the universities were private and four were public. Their enrollments ranged from as small as 10,000 students to as large as 50,000. There was also a wide range of previous experience with digital image projects: of the seven universities, four had extensive experience with digital projects, and three of those with digital image projects specifically. Columbia, Michigan, Maryland, and Virginia had already established extensive electronic classrooms. Even at these universities, however, staff felt at the end of the MESL Project that their computers were not robust enough to fully support users accessing large image files, particularly in the humanities departments (see Appendix 3A, University Profiles, for a detailed description of each university).
The primary tasks facing the universities were to process the images and accompanying metadata received from the museums and to build a deployment mechanism. The universities also implemented security mechanisms in order to restrict access to approved users. In addition, they engaged in public relations and training to support and encourage usage of the MESL images. The university portion of the delivery of images involved ten cost centers, listed below:

1. Image preparation
2. Structured data preparation
3. Unstructured data preparation
4. Functionality
5. Security
6. Log files
7. User training
8. Technical development
9. Outreach
10. Other (meetings, management, and advisory roles)
At the end of the project, each university reported the time taken and the processes chosen to accomplish the tasks associated with each cost center. A variety of factors influenced the decisions made during these processes, such as computer infrastructure, prior experience with digital information, the amount of available resources, and the location of the project management team within the organization (whether it was centered in the library or in the Information Technology department).
In examining the cost centers, we will concentrate on image preparation, structured data preparation, and functionality. These cost centers represent the core costs of creating and deploying the images and their metadata. Additional data on all of the cost centers is available in Appendices 3B-3I.
Overall Time Spent
The "overall time spent" figures represent the total staffing hours each university spent on the MESL project (Chart 1, Overall Time Spent). The number of total hours for each university declined from Year 1 to Year 2 (from an average of 2,540 to an average of 1,795, a decline of roughly 30%). The larger Year-1 totals are likely an indication of the start-up and learning costs of engaging in a new project. Our educated guess is that, given a stable environment with no new activities, Year-3 costs would be slightly lower than Year-2 costs, and that costs in subsequent years would remain fairly stable.
The total hours spent on the MESL project varied from 2,268 at Virginia to 7,199 at Maryland. We expect that an ongoing deployment system that replicated MESL would require approximately 2,500-3,000 hours per year, but in reality we think that many parts of the process would be streamlined and that actual deployment systems would be quite different from MESL. Therefore we think that most universities seeking to engage in a MESL-type project will actually experience much lower levels of time commitment.
Overall Time Spent by Cost Center
The summary of the total hours by cost center for all the universities clearly indicates a large disparity between the functionality and "other" cost centers on the one hand and all the remaining cost centers on the other (see Chart 2, Overall Time Spent by Cost Center). Functionality appears to be the largest cost center because it represented the time spent constructing a local application for the delivery of data to users on each campus. This involved not only the design of a delivery system, but also working with application tools, such as search engines, that act on the data.
An interesting observation is that, overall, the hours worked for database creation (image, structured and unstructured data preparation) decreased from Year 1 to Year 2 by over 35%, probably due to greater familiarity with data handling and the development of automated tools for processing the data. Also of interest is the fact that the only cost center whose hours increased considerably was the security cost center, reflecting the fact that several universities developed vastly more sophisticated security systems in the second year.
The hours spent on the "other" cost center were by far the highest with a total of 12,701 hours. This represented time spent on activities such as meetings and managerial tasks, and hours for personnel who played an advisory role. Unstructured data preparation and log files were the two cost centers with the lowest hours: 581 and 331 respectively. This is related to the fact that only five of the universities made unstructured data available and two of them only did so in Year 2. Likewise, only three universities kept logs and only two of them regularly analyzed those logs.
Overall Costs
For most of the universities we compiled salary information for the personnel involved with the project. Using the salary information and hours per task, we calculated the personnel costs for each cost center. Although there are at times large disparities between the universities regarding costs, these figures clearly indicate which cost centers were the largest and which were the smallest. For this reason these numbers can serve as a guideline for resource allocation.
In this section, we first look at the average personnel costs for each cost center for Year 1 and Year 2 (in order to compare start-up and ongoing costs). We then look closely at two of the cost centers to illustrate the range of costs and hours reported by the universities. Each university estimated the personnel costs needed to support continued MESL use; we discuss these numbers in order to provide another measure for ongoing costs.
Table 1 presents the average personnel costs incurred by each university for each of the cost centers for Year 1 and Year 2. 4 Two important observations to be made from these figures are that total overall costs decreased considerably from Year 1 to Year 2, and that functionality is clearly the most expensive cost center.
It is not surprising that Year 1 costs should be higher than those of Year 2. Year 1 costs represent start-up costs, which are customarily higher than ongoing costs. Year 2 costs are closer to what ongoing costs would be; however, we suggest that if we had the figures for Year 3, they would be slightly lower than those for Year 2, and would be more representative of actual ongoing costs.
The two exceptions to the reduction in costs from Year 1 to Year 2 were security and log files. The reason for higher security costs in Year 2 (see the Security section below for more details) was that after Year 1, several universities realized that IP address checking was not a flexible enough form of network protection, and they enhanced their systems to include log-in and password controls. Log files, unstructured data, and technical development numbers are inflated because not all of the universities reported costs in these three cost centers, so the one or two that had higher costs caused the averages to go up (the averages for these cost centers were based on the number of universities that reported hours not on the total number of universities).
Functionality costs, just like functionality hours, are the highest because this cost center not only required many hours of work time, but also required higher-level personnel.
Using a matrix that presents a range of costs and hours, we will examine more closely two of the cost centers. 5 It is important to look at the range of costs and hours in addition to averages because averages on their own can be misleading. This matrix illustrates that hours and costs are not always directly proportional to each other. That is, since the universities used a variety of people, from senior level programmers to graduate students, to complete the tasks associated with the cost centers, it is possible to have high costs and low hours and vice versa. For example, Table 2 shows that in Year 1, the highest personnel cost, $5,950.00, did not correspond to the highest number of work hours, 166.
Functionality is another cost center that is worth looking at more closely (see Table 3). The wide disparity evident in Year 1 is due to many variables, as explained in the Functionality section later in this chapter. The most important factor was that some universities chose to use pre-existing systems while others chose to design their systems from scratch. Also, the university with the highest personnel costs ($107,520) employed two of their senior-level programmers throughout the entire design process. At most other universities the typical combination was a senior level manager and several graduate students or lower-level programmers.
Another observation regarding functionality is that in Year 2 the lowest cost and hours are actually a little higher than those of Year 1. This can be explained by the fact that two of the universities deployed pre-existing technology in Year 1; in Year 2, however, they discovered that they in fact needed to redesign the system in order to fully meet user needs. This illustrates that, because technology is changing so rapidly, it is not a given that utilizing a pre-existing system will necessarily lead to lower costs. In order to fully meet changing user needs, systems might be periodically enhanced, redesigned, or even scrapped.
In their technical reports, each university was asked to estimate the ongoing personnel costs required to support MESL use. The cost ranged from as low as $12,600/year to as high as $47,500/year, and the time commitment ranged from 10% to 100%. On average the number of personnel was 2-3, each working at 10-20% (or the equivalent of one multi-skilled FTE at 20%-60% time).
An examination of the estimated ongoing cost figures revealed that the universities interpreted ongoing support in two ways. One interpretation was user-driven; that is, they concentrated on supporting faculty and students in their use of the MESL images. The other interpretation of ongoing support was continuous technical development, such as making changes and upgrades to the functionality of the system itself.
When planning a budget it is important to note that supporting a system is not identical to supporting the users of that system. Ideally ongoing support should represent a combination of these two factors. Training is truly an ongoing cost because it involves consistent support over time. Ongoing technical development, on the other hand, is a periodic and sporadic cost; it involves "tweaking" the system as needed. A final observation is that the higher range of costs was found to be at the universities that were concentrating their efforts on offering ongoing technical development. This is not surprising as the personnel needed to provide technical development are usually highly paid IT employees.
Image Preparation
Image preparation refers to the time spent converting the images received from the museums into derivative images for the universities' deployment mechanisms. This cost center, along with structured data preparation and, optionally, unstructured data preparation, constituted the database development costs.
Most universities spent less than 100 hours each year on image preparation (Chart 3, Image Preparation). From Year 1 to Year 2 the number of hours worked decreased at all of the universities except two: at one, the hours remained the same, and at the other the hours increased slightly. The range of hours worked in Year 1 was 80 to 166; the range for Year 2 was as low as 18 and as high as 208 hours.
At all of the universities the images were converted into at least two other sizes: a thumbnail and a medium (or screen-size) image (see Appendix 3C, Image Preparation, for a detailed description of each university's conversion process). Michigan and Columbia made additional image sizes available on their Web sites: four at Michigan and three at Columbia.
The skills needed to perform the batch processing of the images differed only slightly from university to university. Most involved some knowledge of writing scripts to do the batch conversion and of the particular software package used (see Table 4).
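The batch conversion itself varied by software package, but the core calculation every conversion script performed, deriving thumbnail and screen-size dimensions from a master image while preserving its aspect ratio, can be sketched as follows. This is a minimal illustration, not any university's actual script; the 120- and 640-pixel bounding sides are hypothetical values.

```python
def derivative_size(width, height, max_side):
    """Scale (width, height) so the longer side equals max_side,
    preserving the aspect ratio of the master image."""
    scale = max_side / max(width, height)
    return (round(width * scale), round(height * scale))

def plan_derivatives(master, sizes=(("thumbnail", 120), ("screen", 640))):
    """Return the derivative dimensions a batch-conversion script
    would produce for one master image."""
    w, h = master
    return {name: derivative_size(w, h, side) for name, side in sizes}
```

A real batch job would loop over the master files and hand these dimensions to the chosen image-processing package.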
Effort Change Over Time
Aside from the exceptions noted above, the universities experienced a reduction in hours worked from Year 1 to Year 2. Overall, the most important factor accounting for this reduction was the flattening of the learning curve. In Year 1, the universities spent time writing batch file programs and deciding on the resolution and size specifications for the images. These issues were resolved during that first year, so further hours were not required in Year 2.
Another factor that accounted for the reduction in time was cleaner data. In Year 1, there were extensive problems with image formats, and missing and corrupt images. For example, PhotoCDs caused problems at most universities due to both quality and formatting issues. In Year 2, other file formats replaced that of PhotoCDs. In Year 1, after the universities identified the corrupted files, the museums redistributed them. The museums renamed some of the image files when redistributing them so the universities had to then update their indices with the new names.
Software was also a factor in reducing the hours worked in Year 2. Due to the rapid rate of innovation in the software industry, better image-processing software was available by Year 2. Another important software issue is vendor stability: one university invested a fair amount of time in a vendor who went out of business by Year 2. One suggested solution for future implementations would be to use technology that conforms to a widely accepted standard rather than one that is vendor-specific.
Structured and Unstructured Data Preparation
Structured and unstructured data preparation refers to the hours associated with preparing structured and unstructured data to be mounted on the campus network. Structured and unstructured data are essentially the descriptive metadata associated with each image. Structured data refers to the standard cataloging information that accompanies an image (e.g., title, author/creator, medium, etc.). Unstructured data is additional information about the image, such as curatorial notes, that does not fall within the formal boundaries of standard image cataloging. The structured data sets were made available at all the universities, but only four universities chose to make unstructured data available.
Most of the universities had a reduction in hours worked processing structured data from Year 1 to Year 2 (Chart 4, Structured Data Preparation). The exceptions were Columbia, which had a slight increase, and Maryland, where the hours remained the same. 6 The range of hours worked during Year 1 was from 69 to 416. The same wide variance is evident in Year 2, from 30 hours to 353.2 hours.
The process that the universities used to make the structured data available varied little across institutions (see Appendices 3D, Structured Data, and 3E, Unstructured Data, for more details). Most of the universities wrote scripts or code to parse the text, which was then uploaded to a database that allowed queries. The unstructured data was simpler because the universities did not do any additional processing (in most cases, it was simply made available as a text file hyperlinked from the structured data page). Table 5 presents the software that each university used and the skills needed to perform the tasks associated with the structured data preparation cost center.
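The parsing step can be sketched as follows, assuming tab-delimited records with a header row; the actual MESL delimiter conventions and field names varied (as discussed below), so this is an illustration rather than any university's production code.

```python
import csv
import io

def parse_structured_data(text, delimiter="\t"):
    """Parse delimited metadata records into one dict per image,
    keyed by field name, ready to load into a searchable database."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    records = []
    for row in reader:
        # Strip stray whitespace; drop empty fields rather than storing blanks.
        records.append({k.strip(): v.strip()
                        for k, v in row.items() if v and v.strip()})
    return records
```

The resulting dictionaries would then be bulk-loaded into whatever database or search engine each campus had chosen.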
Effort Change Over Time
For the most part, the reduction in work hours from Year 1 to Year 2 was attributed to cleaner data and to the decreased learning curve, rather than to the software itself or to differences in processing methods. Some of the data problems that all universities experienced in Year 1 were lack of standard delimiters, missing fields, character set problems, and typographical errors in image files names. By Year 2 a standard delimiter was agreed upon, and the data received from the museums was in better shape. In addition, by Year 2 the universities had the code for parsing data written, and they were experienced in working with their chosen software packages.
An important issue for MESL, which needs to be considered for any future distributions of images and metadata, was how to seamlessly integrate data from one repository into a variety of customized deployment systems. During both years of MESL the museums used the MESL data fields inconsistently. At times, the same information appeared in different fields from museum to museum. A significant amount of custom processing was needed to resolve this problem because cleaning it up required not only a syntactic, but also a semantic, understanding of the data.
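Resolving those field inconsistencies amounted to building a per-museum crosswalk onto a shared schema. A hedged sketch of the idea, with entirely hypothetical museum names and field labels:

```python
# Hypothetical per-museum field mappings: the same information arrived
# under different field names, so each source needs its own crosswalk.
FIELD_MAPS = {
    "museum_a": {"OBJECT TITLE": "title", "ARTIST": "creator"},
    "museum_b": {"TITLE": "title", "MAKER": "creator"},
}

def normalize_record(museum, record):
    """Map a museum's native field names onto a shared schema,
    keeping unmapped fields under their original names."""
    crosswalk = FIELD_MAPS.get(museum, {})
    return {crosswalk.get(field, field): value
            for field, value in record.items()}
```

The syntactic mapping is mechanical; the hard part, as the text notes, was deciding semantically which native field corresponded to which shared one.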
Functionality
Functionality refers to the development of an interface and processes for the user to access and manipulate an image and its descriptive metadata. At all of the universities except Maryland, browsing, searching, and displaying results were the extent of the functionality offered. Maryland developed a proprietary system (Borkowski and Hays, 1998) in addition to a Web-based system 7 that afforded additional instructional tools, such as the capability to sort and project slides on left and right screens in an "electronic carousel" format. The remaining six universities eventually chose the World Wide Web as the primary access mechanism for their users.
There was a wide variance in hours worked for this cost center (Chart 5, Functionality). For Year 1, the two highest were American at 1,280 and Michigan at 1,792, while the lowest was 42 hours at Columbia. In Year 2, the largest number of hours worked was 920, and the lowest was 0 hours. In Year 2, three of the universities experienced an increase in hours worked.
Table 6 presents a comparison of the functionalities that each university's interface offered. Browsing capability refers to the ability to look through the various museum collections; six of the seven universities offered browsing in some form. The searching capability (fielded search, simple search, and complex Boolean search) varied across the universities. The major distinction between them was whether they offered fielded searches and whether they offered both complex Boolean searches as well as simple searches. 8 Most universities provided a view of a thumbnail image and its metadata with the option of clicking on the image to retrieve a larger size or clicking on a link to retrieve more data. A final feature offered at two universities was the ability to sort the results.
In comparing the seven delivery systems, it is evident that Michigan offered all of the functionalities examined and the greatest variety of image sizes. American and Cornell offered most of the functionalities, but less variety in image sizes.
All of the Web-based delivery systems provided searching via HTML forms that generated CGI (Common Gateway Interface) scripted calls to a back-end database or search engine. Back-end databases and search engines included products such as Filemaker Pro, Microsoft SQL Server, and Glimpse, and locally designed systems such as Full Text Lexicographer.
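The CGI scripts themselves differed by back end, but the heart of a fielded search handler, parsing the submitted form fields and matching each against the record metadata, can be sketched like this. It is an illustrative stand-in (in-memory records, case-insensitive substring matching), not any university's actual code.

```python
from urllib.parse import parse_qs

def handle_search(query_string, records):
    """Turn a CGI QUERY_STRING into a fielded search over metadata
    records: every submitted field must match its record field
    (case-insensitive substring match)."""
    form = {k: v[0] for k, v in parse_qs(query_string).items()}
    hits = []
    for rec in records:
        if all(val.lower() in rec.get(field, "").lower()
               for field, val in form.items()):
            hits.append(rec)
    return hits
```

A production system would hand the parsed fields to the back-end database or search engine rather than scan records in memory.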
A variety of skills were utilized in this stage of the process, from general database development to knowledge of various programming languages and tools (C, Perl, shell scripts, Visual Basic, Unix programming, CGI scripting, HTML, and MiniSQL). See Appendix 3F, Functionality, for a more detailed description of the processes behind each university's delivery system.
Effort Change Over Time
University implementations of the MESL data varied dramatically. The differences resulted primarily from the fact that institutional situations (e.g., the local information delivery architecture, encoding and searching systems, and staff expertise) had a major influence on the choices that were made at each site. In addition, a few of the project staff at MESL sites had been involved with digital imaging projects and drew on these experiences when making interface design and other related decisions. The degree of institutional support for MESL implementation (manpower, equipment, classroom facilities, and available expertise) constituted another significant variable from one university to another.
For the most part, the MESL project staff at American had little or no prior experience with digital image projects, so we can speculate that lack of expertise was the reason for the high number of hours worked in both Year 1 and Year 2. However, due to rapidly changing technology it is not always possible to capitalize on earlier system designs. For example, three of the universities initially decided to adopt functionality from existing systems, but two of them discovered that the older systems did not meet their needs and ended up redesigning them in Year 2. Likewise, two of the universities with prior experience in digital projects chose to design the MESL interface completely from scratch, resulting in a high number of hours worked.
Security
Security costs reflect the costs required to control access to the MESL images. At three of the universities the hours worked increased from Year 1 to Year 2 (see Appendix 3G, Security, for more details). In the first year all of the universities used access control via IP address checking. In most cases, IP control was a standard feature of the server software that the universities were already using, and for this reason it required minimal additional time or cost to implement. However, by Year 2 several of the universities were not satisfied with the inflexibility of this type of control (IP addresses are based on workstations, not users, so they can be both overly restrictive and insufficiently secure). In Year 2, these universities experimented with more robust security, such as login/password systems.
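IP address checking of this kind is straightforward to express; here is a minimal sketch using Python's standard ipaddress module, with documentation-reserved example networks standing in for real campus address ranges.

```python
import ipaddress

# Hypothetical campus networks; a real deployment would list its own ranges.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]

def ip_allowed(client_ip):
    """IP-based access control: admit a request only if the client
    address falls within one of the approved campus networks."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in ALLOWED_NETWORKS)
```

The sketch also makes the limitation visible: it authorizes a workstation's address, not a person, which is exactly why the universities moved toward login/password controls.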
Effort Change Over Time
Going against the trend, overall hours worked in the security cost center actually increased in Year 2 (over 150 hours for those universities that upgraded their systems). As explained earlier, in the first year all the universities used IP address control, but by Year 2 several of the universities experimented with more robust security systems, which resulted in an increase in hours worked in Year 2.
End-User Support
The last four cost centers (log files, outreach, user training, and technical development) can be grouped under an end-user category, since they involved tasks performed both to inform end-users about the existence of the MESL images and to support those users. 9 These four cost centers represent ongoing costs: there will always be new tools that should be developed, updates that need to be made in order to keep up with user needs, and users who need to be trained.
Log files were used to track access for security reasons, as discussed above, and to track usage of the MESL data. Usage tracking is important because knowing who the users are and how they are using the images helps to ensure that their needs are met. Most server software includes the capability to maintain log files; however, additional time is needed to fully analyze the logs. Of the three universities that reported hours for the log file cost center, two spent under 50 hours total for both years, and the third spent 120 hours each year. 10
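Analyzing the logs means reducing raw server entries to usage counts. A sketch for the Common Log Format that most Web servers of the period wrote by default (the sample entries in the usage note are fabricated):

```python
import re
from collections import Counter

# host ident authuser [date] "method path protocol" status ...
LOG_LINE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3})'
)

def usage_summary(lines):
    """Tally successful requests per path from Common Log Format
    lines, skipping malformed entries and error responses."""
    counts = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m and m.group("status") == "200":
            counts[m.group("path")] += 1
    return counts
```

Feeding it entries such as `192.0.2.1 - - [01/Jan/1997:10:00:00 -0500] "GET /mesl/img1.jpg HTTP/1.0" 200 1234` yields per-image hit counts, the kind of figure the two universities that regularly analyzed their logs would have reported.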
The other three cost centers (user training, technical development, and outreach) represent significant start-up and ongoing costs. User training includes the time spent on activities such as classroom setup and access, and educating instructors and students on usage. At most of the universities time spent was under 100 hours each year.
Technical development encompasses the time spent on providing technical support to the MESL users. Some of the activities included the development of MESL-related tools, designing and developing MESL-inspired course-related Web sites, customizing the MESL data for faculty, and producing templates for use by faculty and teaching assistants. At the five universities that reported technical development time, on average the hours were considerably higher than those for usage training. At four of the universities the hours worked ranged from 200 to 400 hours.
The outreach cost center includes any publicity or educational activities that the university might have conducted in order to encourage the use of MESL images. Activities included publishing newsletters and articles, giving demonstrations, and holding open-house sessions. In one case (Virginia), an electronic mailing list was established. A few of the universities held both on- and off-campus presentations (Maryland and Illinois). The average time was under 125 hours each year.
Effort Change Over Time
Unlike the hours for the other cost centers, the hours worked for these four cost centers on average stayed the same or decreased only slightly from Year 1 to Year 2. This is not surprising since end-user support represents both start-up and ongoing costs. Of the four cost centers, technical development and user training are the two that are not optional. They represent support of users through the development of tools for manipulating data and training to use these tools. On the other hand, each institution will have to decide the level of outreach that they would like to support and to what extent they want to maintain and analyze log files.
Impact on Future Projects
It is important to keep in mind that this study has clear limitations. For example, the figures do not include the costs of a licensing agreement to obtain the images, or of the equipment needed to browse and project the images. However, there are certain conclusions that can be drawn from this data that will be helpful for those considering similar projects.
In this final section, we examine how the cost centers will shift depending on the distribution model that is adopted. Next we present three characteristics of an ideal image library. These characteristics serve as criteria by which to evaluate digital image distribution systems. Finally we briefly discuss the issue of the cost effectiveness of increased use of technology in universities.
Even when the distribution model changes, most cost centers will not simply go away; rather, the cost centers will shift to other organizations and some may be reduced or increased. For example, the MESL model was a multi-delivery system in which each university developed its own deployment system. If that model is adopted, the cost centers will remain the same, with each university responsible for creation, deployment, security and end-user support. Functionality, which encompasses design and development of a deployment system, will remain the largest cost. After that, the database creation cost centers (image, structured data and unstructured data preparation) will be the next largest, and then the ongoing support of user training and technical development cost centers will follow.
On the other hand, the Association of Art Museum Directors' AMICO project has proposed a central distribution model. In that model, database creation (the image, structured, and unstructured data preparation cost centers) and the creation of a deployment system (the functionality cost center) will clearly reside with the central distribution institution. The creation of a security system will also most likely be the responsibility of the central distribution institution; however, the universities will have the responsibility of supplying the names of eligible users or distributing passwords, depending on the security system adopted. Finally, the end-user cost centers (log files, user training, technical development, and outreach) will likely be shared between the universities and the central distribution institution. For example, training materials will most likely be developed by the central distribution institution, but the actual training and "hand-holding" will be conducted locally. Likewise, the end users will communicate their needs regarding improvements or additional tools to the central distribution institution, which in turn will make the changes and updates.
Whichever distribution model is chosen, it is important that it meet the needs of its users. There are many ways to judge the effectiveness of a system; in this case, we will look at the three criteria of a great image library as defined by Helene Roberts (Roberts, 1985). The first criterion is that the collection should be comprehensive. The collection should not only contain the "great" works of art, but it should have works by lesser-known artists who may have influenced the better-known artists. In addition, a collection should ideally contain popular images in order to place works of art in their contexts. One of the main criticisms from the university faculty regarding the MESL project was the number of images available was insufficient. Clearly some critical mass of images must be reached in order to meet the teaching and research needs of university faculty and students.
The second characteristic of a great image library according to Roberts is that it should allow for integration with the local art library. This also was an issue raised during MESL. At some of the universities, faculty members were able to supplement MESL images with locally digitized images to meet user needs. Clearly in the central distribution model the redundancies of the multi-delivery system are removed, but the ability to integrate the system with local resources becomes problematic.
The final characteristic cited by Roberts is the existence of a robust indexing system that allows for alternative methods of access. As the number of system users expands it will become increasingly important that the system accommodate non-specialists. That is, the system needs to allow for searching and browsing of the images in such a way as to accommodate users who are unfamiliar with art history terminology. For example, at one MESL university a faculty member from the Communications Department was able to integrate the images into her curriculum. Clearly her class accessed and used the images in a different manner than an art history class would have.
In a recent article, Dr. A. W. Bates addresses the economics of increased use of technology on university campuses (Bates, 1997). Bates states that increased use of technology most likely will not reduce the cost of education; however, it can improve the cost-effectiveness of education. Clearly, providing networked access to images and their descriptive metadata will improve students' access to these works. In addition, the opportunity to work closely with new technologies will help students increase their computer literacy. These are factors that cannot be measured quantitatively but are nonetheless important benefits.
1. In this study we look only briefly at personnel costs because they tend to be misleading. There is little salary consistency across universities, due to factors common to university experimental projects. Some universities had students perform job functions at vastly reduced salaries compared to similar positions at other universities; in many cases, tasks more appropriate for a different salary level were assigned to a staff member solely because that person was on the project team (such as librarians performing computer programming). For this reason, instead of personnel costs we concentrate on analyzing the data by hours worked on each task.
2. One question asked was about acquiring additional storage space. Three of the universities reported that they invested in additional disk space to accommodate the requirements of MESL. The average cost was $3,000 for 8 gigabytes of hard disk space.
3. For further information about American University's MESL experience, see Albrecht (1998).
4. It is important to note that these figures do not incorporate the "other" category, which included staff time above and beyond the time reported for the cost centers. Since the technical report did not ask specific questions about the "other" category, we would merely be speculating as to how that time was spent; for this reason we have not included it here. However, it is important to keep in mind that a MESL-like project would most certainly incur administrative costs, such as meeting time and consultants' hours, that should be taken into consideration.
5. See Appendix 3B, Range of Costs and Hours by Cost Center, for all the cost centers.
6. It is important to note that Maryland logged the most hours both for this cost center and for others because it supported two delivery systems (a Web-based system and a proprietary system).
7. The examples from Maryland cited herein were gathered from its Web implementation, which was never intended as the primary means of access for Maryland users. Consequently, they are not indicative of the access most Maryland users experienced via their campus network system.
8. Utilizing fielded searches can increase the accuracy of search results. For example, if a search for "Monet" is done in a system that does not offer fielded searches, the search engine will search through all of the indexed descriptive metadata. Due to truncation features, the result set can include images that contain the word "monetary" in their metadata.
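The difference between an unfielded (truncation-style) search and a fielded search described in note 8 can be sketched as follows. This is a minimal illustration, not any system used in MESL; the records, field names, and functions are hypothetical.

```python
# Hypothetical image-metadata records for illustration only.
records = [
    {"id": 1, "creator": "Claude Monet", "description": "Water lilies"},
    {"id": 2, "creator": "Unknown", "description": "Print on monetary policy"},
]

def unfielded_search(term, records):
    # Match the term anywhere in any metadata field, so "monet"
    # also matches "monetary" (truncation-style behavior).
    term = term.lower()
    return [r["id"] for r in records
            if any(term in str(v).lower() for v in r.values())]

def fielded_search(term, field, records):
    # Match the term only within the named field.
    term = term.lower()
    return [r["id"] for r in records if term in str(r.get(field, "")).lower()]

print(unfielded_search("monet", records))           # → [1, 2]
print(fielded_search("monet", "creator", records))  # → [1]
```

The unfielded search returns both records because "monetary" contains the string "monet"; restricting the search to the creator field returns only the Monet record.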
10. In both years, this university had a person dedicated to regularly analyzing the log files and generating reports.
Cost of Digital Image Distribution:
Howard Besser & Robert Yamashita