
Text and Data Mining Exemption to Digital Millennium Copyright Act Would Advance Knowledge of Diverse Works

Image: two women working at a desktop computer (CC0 by PxHere)

This week, the Association of Research Libraries (ARL), as a member of the Library Copyright Alliance, joined the Authors Alliance and the American Association of University Professors in petitioning the US Copyright Office for an exemption that would allow researchers and the scholarly community to conduct text and data mining (TDM) on literary works and motion pictures.

TDM involves extracting data from original materials and is used by researchers to generate new knowledge by identifying patterns and drawing comparisons across works. At present, however, Section 1201 of the Digital Millennium Copyright Act (DMCA) prohibits researchers from circumventing technological protection measures (TPMs) to access copyrighted works, even for noninfringing uses. In effect, this prevents scholars from using TDM with “in-copyright motion pictures and literary works distributed electronically.”

Restrictions on TDM are restrictions on knowledge creation. In a letter of support for the TDM exemption to Section 1201, scholarship and copyright experts Rachael Samberg and Timothy Vollmer note that “[i]n response to confusion over copyright, website terms of use, and other perceived legal roadblocks, some digital humanities researchers have gravitated to low-friction research questions and texts to avoid making decisions about rights-protected data.” When scholars are limited to analyzing works in the public domain, research questions are shaped by the availability of materials rather than by interest or relevance, and critical questions may go unanswered, or even unasked, as Samberg and Vollmer point out. Critically, because works published before 1925 are in the public domain as of January 1, 2020, due to copyright expiration, reliance on this material precludes a deeper understanding of works by and about women and people of color, which proliferated during the late 20th century and throughout the 21st century.

Many researchers wrote letters in response to a request from the Samuelson Law, Technology & Public Policy Clinic to demonstrate how the scholarly community could benefit from a TDM exemption to Section 1201. Examples of how researchers would expand their projects include the following:

  • examine “differences between canonical authors and those who have been traditionally marginalized”;
  • examine the “range and diversity of twentieth-century literary culture”;
  • “study trends in contemporary literature across the globe, including works in English, Japanese, Chinese, German, Spanish, Russian, and Portuguese”;
  • “expand this work beyond four novels to a collection of about 25 post-1994 works with autistic characters. This would allow me to look for similarities and differences in the use of sentiment across genres and characters”; and
  • “compare collections of novels published by authors who graduated from different MFA writing programs.”

The letters of support compiled by the Samuelson Clinic are available on the Authors Alliance website.

The petitions were submitted as part of a rulemaking process that occurs every three years, which the Copyright Office uses to assess whether the prohibition on circumventing technological measures in Section 1201 prevents users from making noninfringing use of copyrighted material.

This year’s rulemaking takes place in the context of a larger conversation about the future of the DMCA. The Senate Judiciary Committee held a series of hearings on the DMCA in 2020 and will issue a discussion draft on DMCA reform this month. ARL is closely monitoring potential changes to the DMCA; check back here for updates.

For more information, please see this Authors Alliance blog post on 1201 rulemaking.
