
The Librarians’ Code, Orphan Works, and Mass Digitization

Last Updated on April 11, 2012, 4:34 pm ET

In preparation for the Berkeley symposium on orphan works and mass digitization, I thought it might be helpful to sketch some of the ways that the Code of Best Practices in Fair Use for Academic and Research Libraries might assist libraries in devising strategies for addressing these related questions. While neither issue is treated explicitly or comprehensively in the Code, there are several principles that should come in handy.

Principle Three: Digitizing to Preserve At-Risk Items

Principle Three addresses a family of situations that is likely to include many orphan works, and where projects may be done at a scale that could be called “mass digitization.” As we explain in the Code,

Preservation is a core function of academic and research libraries. It involves not only rescuing items from physical decay, but also coping with the rapid pace of change in media formats and reading technologies. Even when libraries retain the originals of preserved items, digital surrogates can spare the original items the wear and tear that access necessarily inflicts.

It is quite likely that when works in library collections qualify for digitization as a preservation strategy, they will do so in large groups. That is, all works fixed in certain formats, or published in certain date ranges, may be known to share a common frailty or flaw that makes circulating digital surrogates a sensible preservation strategy. Or they may be stored in formats so outdated or inaccessible that mass migration is the practical way to ensure a reasonable level of access for library patrons. In such circumstances, mass digitization may be a reasonable preservation strategy.

It is also highly likely, given the relative obscurity of rightsholders for older materials (especially archival materials and special collections), that a library could not identify rightsholders for these materials, even with a diligent search.

The academic and research library community has declared in Principle Three that “It is fair use to make digital copies of collection items that are likely to deteriorate, or that exist only in difficult-to-access formats, for purposes of preservation, and to make those copies available as surrogates for fragile or otherwise inaccessible materials.” This includes mass digitization. And because the essence of fair use is use without authorization, libraries engaged in fair uses need not worry about the location or identity of rightsholders, which removes any need to fret over a work’s orphan status.

Like every principle in the Code, Principle Three is subject to a series of Limitations and Enhancements. The first Limitation is directly relevant to orphaned materials, as it bars digitization of works where a “fully equivalent digital copy is commercially available at a reasonable cost.” Thus, works whose rightsholders are active in the market for new digital versions will fall outside this principle. Additional Limitations dealing with circulation and off-premises access will help preserve the market for new copies in case a rightsholder should decide to resume exploitation of a preserved work. The requirement of full attribution will also help facilitate the reunion of older works with long-lost authors and publishers. Librarians also suggested, by way of “Enhancements” to the principle, that libraries consider using technological measures to further limit redistribution of digital surrogates, and that they make themselves readily available to putative rightsholders who would like to challenge their use.

Taken together, Principle Three and its Limitations and Enhancements describe a policy for mass digitization for purposes of preservation, including the digitization of orphaned works, that should be very helpful to academic and research libraries.

Principle Four: Creating Digital Collections of Archival and Special Collections Materials

Principle Four addresses another situation where digitization at the level of entire collections may make sense (though, depending on the size of the collection, such digitization may not always be at a scale that qualifies as “mass”) and the likelihood of absent rightsholders will be quite high. The Code describes the core family of situations as follows:

Many libraries hold special collections and archives of rare or unusual text and nontext materials (published and unpublished) that do not circulate on the same terms as the general collection. The copyright status of materials in these collections is often unclear. Despite the investments that have been made in acquiring and preserving such collections, they frequently are of limited general utility because they typically can be consulted only on-site, and in some cases using only limited analog research aids. The research value of these collections typically resides not only in the individual items they contain (although such items are often unique in themselves), but also in the unique assemblage or aggregation they represent. Special collections can have a shared provenance or be organized around a key topic, era, or theme. Libraries and their patrons would benefit significantly from digitization and off-site availability of these valuable collections. 

Here the core concern is not fragility or obsolescence per se, but the rationale is closely related to the preservation rationale in Principle Three, in that the defining characteristics of qualifying collections include rarity and inaccessibility. Academic and research librarians expressed a consensus that fostering increased access to carefully curated collections of rare and unique items was a legitimate fair use. The likely prevalence of letters, personal photographs, and other primary materials and ephemera in these collections is specifically invoked as a fact favoring a finding of fair use. Such works are likely to be orphans due to uncertain provenance and the like, but, more importantly, they were typically created with no intention of market exploitation. Thus the fourth fair use factor, which weighs the effect of a proposed use on a likely or traditional market for the used work, should strongly favor libraries. The aggregation of such works into a digital research corpus also presents a powerful argument for transformativeness, which can be strongly persuasive for courts as they consider the first fair use factor, and which colors the rest of the fair use determination. Accordingly, Principle Four declares that “It is fair use to create digital versions of a library’s special collections and archives and to make these versions electronically accessible in appropriate contexts.”

Here, again, the Limitations and Enhancements to the principle provide a helpful roadmap for designing a policy that carefully balances the interests of the public (especially the scholarly community) with the interests of rightsholders. The first Limitation cautions strongly against applying Principle Four to works that are commercially available, again showing deference to rightsholders who are actively exploiting their copyrights. The remaining Limitations protect absent authors (and the subjects of their writings) against invasions of privacy and ensure proper attribution, which, again, can help reunite works with lost rightsholders. The first Enhancement reflects the reasoning just described, providing that a collection made up of likely orphans presents an especially strong case for fair use. Further Enhancements suggest the use of technological measures to prevent unreasonable redistribution of digitized works, provision of an easy way for putative rightsholders to make their concerns known, and the utility of additional measures to add value, context, and coherence to collections.

So, for libraries that are considering (mass) digitization of archives and special collections but may be daunted by the very likely presence of orphaned works in those collections, Principle Four, together with its Limitations and Enhancements, shows a way forward.

Principle Seven: Creating Databases to Facilitate Nonconsumptive Uses, Including Search

Finally, Principle Seven may provide the most powerful justification for mass digitization of library collections, as it applies regardless of the nature of the ingested works, and it relies on settled legal principles holding that copying for purposes of creating search and data mining tools is fair use. The Code describes the core covered uses in this way:

In addition to making specific collection items available to users for intensive study, librarians have always played an important role in conducting and supporting scholarship in disciplines which examine trends and changes across broad swaths of information, e.g., information science, linguistics, bibliography, and history of science. Developing indexing systems and finding aids is also a core part of the library mission. Digital technology offers new possibilities where both of these traditional functions are concerned. Libraries can offer scholars digital databases of collection items on which to perform computerized analyses, and they themselves can employ such databases to develop new and powerful reference tools. Because they do not involve ordinary reading or viewing of the processed works, these uses are often referred to as non-consumptive.

It should be obvious that such projects necessarily involve mass digitization, and that the presence of orphan works is likely when digitization is conducted at such a scale.

The fair use pedigree of such non-consumptive uses is very strong. Federal appellate courts in several circuits have found copying for non-consumptive purposes (such as helping Internet users find relevant websites and images, and helping teachers determine whether a student’s paper has plagiarized an earlier document) to be transformative fair uses. The real subjects and outputs of nonconsumptive uses are not the copyright-protected expressions in individual works, but rather the unprotected facts (the frequency with which authors of US fiction named protagonists “Adolf” before and after 1939, the species of mouse favored by cancer researchers between 1980 and 2005) that can be discovered by crawling across a massive corpus. 

Perhaps more indicative of the strength of the fair use case in this context is the fact that Google, Yahoo, and a host of other household-name companies have based their billion-dollar business models on the belief that fair use covers their massive-scale copying of copyrighted material on the Internet to create databases of this kind without express permission from any rightsholders. Principle Seven reflects the strong consensus in the academic and research library community that libraries, which would create such tools in the context of an explicitly nonprofit, public service mission of facilitating research and increasing the general store of knowledge, would have at least as strong a claim to fair use as these private businesses.

Accordingly, Principle Seven states that, “It is fair use for libraries to develop and to facilitate the development of digital databases of collection items to enable non-consumptive analysis across the collection for both scholarly and reference purposes.” 

Limitations and Enhancements provide essential guidance for designing such a project within the bounds of library consensus. Because the fair use argument relies very strongly on the transformative nature of nonconsumptive use, any consumptive exploitation of digitized works (i.e., uses that involve full text access to ingested works for individual study) will need a separate justification. So, the only Limitation is a strong one: it requires that access to works in a non-consumptive database be limited to what is appropriate to the non-consumptive purpose (e.g., display of “snippets” to verify validity or utility of a search result). Enhancements to the principle are designed to maximize the “value added” in the creation of a database by favoring databases that include additional data added by curators, and further favoring efforts to create collective databases that leverage even further the power of “big data” across multiple collections.

As you can see, while the Code of Best Practices in Fair Use for Academic and Research Libraries may not treat the subjects of mass digitization and orphan works explicitly, or in a single principle, the library community has articulated in the Code a series of Principles that can guide efforts to accommodate both of these phenomena within the bounds of fair use.