Librarians Create Conditions for Researchers to Tackle Grand Challenges with Data Science

The COVID-19 pandemic, and the global scientific effort to develop treatments and vaccines, is the latest large-scale event to show the power and urgency of collaboration and data-sharing to solve society’s greatest challenges. Research libraries and librarians play a critical role in data management, education, and policy, empowering researchers to use data more effectively.

Many of the biggest challenges researchers are addressing require large data sets, aggregations of data sets and published literature, and machine access to all of that information for artificial intelligence (AI) applications. To advance discovery, researchers need access to data and publications, data-science tools and education, and a favorable intellectual-property environment that allows them to use potentially copyrighted material for AI analysis.

The Academic Data Science Alliance (ADSA)—a community of leaders, practitioners, educators, and librarians—came together to expand the cumulative experience of the cross-disciplinary Moore-Sloan Data Science Environments to other institutions. ADSA holds virtual events on scaling data-science capacity. Libraries and librarians are involved in data science as data curators, trainers, tool builders, and more. To meet this moment, ADSA has also amassed COVID-19 data-science resources and is crowdsourcing expansion of those resources.

[Image: A woman and a man working with code at a computer. Image CC0 by NESA by Makers.]

Research libraries are also influential in shaping the intellectual-property environment that enables machine access to data. In January 2020, the Library Copyright Alliance (LCA) filed public comments with the US Patent and Trademark Office on “Intellectual Property Protection for Artificial Intelligence Innovation.” The LCA explained how the right of fair use in US copyright law clears the way for much of the data processing—often involving large volumes of copyrighted material—that makes machine learning possible. However, license terms imposed by website operators and database providers could interfere. A statutory solution, similar to what is found in various European Union directives, may be necessary to resolve this problem.

Text and data mining are also critical tools in the digital humanities, and they require “legal literacy,” meaning the knowledge and confidence to find and use sources for this work. Funded by the US National Endowment for the Humanities, a team of librarians, legal experts, and scholars is building an open educational curriculum called “Building Legal Literacies for Text Data Mining.”

Librarians are creating the conditions for researchers to solve the thorniest problems of the day by obtaining and curating data sets, developing data-science tools and methods and teaching workshops on them, advising on terms of service and copyright issues, and advocating for favorable statutory protections for AI research.