Yale Study of Imaging Costs: Some Early Findings

Preservation microfilming is a technically viable and cost-effective source for digital image conversion. This sentence sums up but does not begin to do justice to the rich findings of Yale University Library's Project Open Book. With many facets and several phases stretching over the past four years, Project Open Book has concentrated for the past year, with the help of funding from the National Endowment for the Humanities, on production-level digital image conversion of the printed text and accompanying materials contained in the brittle books previously preserved on microfilm. A key focus of the present phase is a complex cost study of the conversion process. Paul Conway, head of Yale's preservation department and the principal investigator on the project, prepared the following overview of frequently asked questions about the implications of the study for libraries and archives. More information is available on the World Wide Web at URL: http://www.library.yale.edu/pres/presyale.html.

Key Findings

Q: What are the three most important things you have learned about costs in the past year?
A: First, high quality results are obtainable at a reasonable cost per volume. Second, we now have a meaningful method for examining conversion process costs at a level of detail needed to compare findings and, eventually, reduce costs. Third, I am very excited about what we have learned about the role of people in the process, especially learning curves.

Q: What is the bottom line for libraries?
A: The total per-volume cost of equipment and processing is less important than understanding a model for getting this figure, but I'll tell you about the dollars anyway. Table 1 summarizes the overall costs per volume and per image for the four major components of the technology system and the four major steps of the complex image conversion process. Not included in the $55.03 figure is Yale's administrative and physical overhead.
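As an illustration of how the $55.03 figure is assembled, the following minimal sketch adds up the per-volume costs reported in Table 1. The dictionary names and the helper functions `subtotal` and `share` are hypothetical and chosen only for this example; the dollar amounts are copied from Table 1, and the percentage split is derived arithmetic, not a figure reported by the study.

```python
from decimal import Decimal

# Per-volume costs in dollars, copied from Table 1.
# The variable and function names are illustrative, not from the study.
EQUIPMENT = {
    "Hardware": Decimal("22.86"),
    "Software": Decimal("5.20"),
    "Integration Support": Decimal("1.16"),
    "Optical Media": Decimal("2.10"),
}
PROCESSING = {
    "Inspection": Decimal("1.36"),
    "Scanning": Decimal("9.77"),
    "Indexing": Decimal("7.66"),
    "Acceptance": Decimal("4.92"),
}


def subtotal(costs):
    """Add up the per-volume costs of one group of line items."""
    return sum(costs.values(), Decimal("0"))


def share(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return (part / whole * 100).quantize(Decimal("0.1"))


equipment_total = subtotal(EQUIPMENT)
processing_total = subtotal(PROCESSING)
total = equipment_total + processing_total

print(f"Equipment:  ${equipment_total} ({share(equipment_total, total)}% of total)")
print(f"Processing: ${processing_total} ({share(processing_total, total)}% of total)")
print(f"Total:      ${total} per volume, excluding overhead")
```

Decimal arithmetic is used here so that the sums match the published dollar figures exactly instead of picking up binary floating-point rounding. A library adapting the model would replace the amounts with its own costs and add a line for administrative and physical overhead, which Table 1 leaves out.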

Q: How confident are you of these figures?
A: There is at least one assumption underlying each of these numbers and all of them will be described in the report on the project. The equipment figures are based upon Yale's actual costs for the project and are probably high. The process costs are quite solid, both statistically and intuitively.

Data Analysis

Q: How did you get your process data and then analyze it?
A: Project staff recorded the time it took, in minutes, to complete each of ten processing steps for all 2,000 volumes converted in the past year. Corroborative data from daily work logs validated the accuracy of the volume processing data. Beyond these key numbers, staff also collected information on film and book characteristics for each volume - some 25 variables in all. I applied the usual descriptive statistics to these data. I also used multiple regression analysis to find the most important factors that influence processing time, and two-step multiple analysis of covariance to discover the impact of the learning curve on processing time.

Q: What were your goals in interpreting this mountain of data?
A: As in so many areas of modern life, time equals money. The really interesting issues have to do with why processing times vary and whether there is anything we can do about it. On one level, I am interested in identifying the most important characteristics of microfilm, and of the books on the film, that influence processing time and how important these factors are in the overall cost scheme. On another level, I want to be able to separate the characteristics of the input source - microfilm - from the technology and people variables that combine to give us the bottom line.

Future Trends

Q: Last year's ARL preservation statistics show the continuing commitment of most academic research libraries to reformat deteriorating library materials on preservation film. What does your research tell us about books on film?
A: That is a really big question.
In essence, the findings on film characteristics, for example reduction ratio, density, clarity, and what I call "technical rigor," have relatively little impact on conversion costs but can make or break digital image quality. The good news in this conclusion is that we can obtain or exceed quality conversion from "poor film" with only a marginal increase over the conversion costs of "good film." More significantly, the findings suggest that substantial investment in improving the quality of new film will probably not pay off in terms of reduced conversion costs.

Quality Standards

Q: When you speak of "obtaining or exceeding quality conversion," what is your standard for measuring quality?
A: We have followed the lead established by Cornell University's pioneering research on conversion from paper. Anne Kenney and her colleagues have developed a simple but sophisticated "Quality Index" for measuring digital resolution quality. Conversion from preservation microfilm produces acceptable to outstanding quality images for printed books without illustrations or graphics that are essential to understanding the text. Half-tones present significant quality challenges when scanned in binary mode from microfilm. Gray-scale technology or special enhancement routines produce better results.

Q: If technical film quality itself has relatively little influence on overall conversion costs, what role does the character of the original books on the film play in the cost equation?
A: Quite a big one. Book characteristics like tight gutters, yellowed or faded paper and inks, and similar factors associated with deterioration, damage, or heavy use tend to increase the costs of most of the processing steps. There is very little we can or should do about this fact, however, because our preservation imperative should not control our digital image selection processes.
The findings will allow us to predict the incremental increases in cost required to digitize "difficult books" in comparison to "easy books."

Cost Projections

Q: So far you have suggested that there is very little we can do to contain or reduce conversion costs by changing the nature of books or film. Are there other areas that hold greater promise?
A: Most definitely. Technology costs are declining and there is significant "folk knowledge" in the field that helps us predict the rate of decline. Another source of my optimism about costs is the tremendous importance that people have in mastering and then simplifying the process. Table 2 shows just how dramatic the impact of training and practice on processing costs is. This table compares the average processing times (and costs) of a 600-volume sample with the costs of the process for the first and last 50 volumes in the sample. The important thing to know about these findings is that they control, statistically, for all of the film and book characteristics noted in the study, as well as for the varying sizes of the volumes converted. What is left are the improvements in staff efficiency, including simplifying the conversion process itself.

Q: Setting aside all the numbers and statistics, what is the key message of Project Open Book for library administrators?
A: Microfilm is an excellent, but by no means universally appropriate, source for digital conversion. The findings of Project Open Book should be replicated and also placed side-by-side with similar studies of the costs of converting paper to digital images. I anticipate that the findings emerging from a companion research project at Cornell will help us sort out the "film first/scan first" debate.

Library/Institutional Responsibility

Q: Should libraries assume responsibility for digital image conversion?
A: This is an impossible question to answer without knowing more about the level of institutional commitment to maintain access to the digital files for as long as they have use and value for scholarship and learning. I remain quite optimistic that the advantages of digital conversion will help us find a political and economic consensus on our responsibilities for digital preservation. Project Open Book has demonstrated for a single input source, microfilm, that digital conversion may be relatively affordable as a part of a comprehensive preservation and access strategy. Much work needs to be done to verify and extend the findings to other media and other contexts.

Table 1
Cost of Microfilm to Digital Conversion

                         Per Book    Per Image
Equipment
---------
Hardware                   $22.86       $0.105
Software                    $5.20       $0.024
Integration Support         $1.16       $0.006
Optical Media               $2.10       $0.010
Subtotal Equipment         $31.32       $0.145

Processing Costs
----------------
Inspection                  $1.36       $0.006
Scanning                    $9.77       $0.045
Indexing                    $7.66       $0.035
Acceptance                  $4.92       $0.023
Subtotal Processing        $23.71       $0.109

Total Costs                $55.03       $0.254

Table 2
Practice Effect for Digital Image Conversion Processes
(processing time in minutes and cost per volume)

             Sample Mean        Least Square Mean    Least Square Mean
Process      Total Sample       First 50             Last 50
-------      ------------       -----------------    -----------------
Inspect       5.3    $1.36       5.2    $1.33         4.6    $1.17
Scan         38.1    $9.77      37.8    $9.68        21.1    $5.41
Index        29.9    $7.66      32.2    $8.25        16.1    $4.13
Accept       19.2    $4.92      18.0    $4.61        15.5    $3.97
Total        92.5   $23.71      93.2   $23.88        57.3   $14.68

-------
ARL 182
A Bimonthly Newsletter of Research Library Issues and Actions
Association of Research Libraries
October 1995