On February 19, 2020, the US Office of Science and Technology Policy (OSTP) issued a “Request for Information: Public Access to Peer-Reviewed Scholarly Publications, Data and Code Resulting from Federally Funded Research.” The Association of Research Libraries (ARL) welcomes this opportunity to suggest actions US federal agencies can take to expand access to publicly funded research.
Association of Research Libraries Response to Request for Information: Public Access to Peer-Reviewed Scholarly Publications, Data, and Code Resulting from Federally Funded Research
May 6, 2020
Thank you for the opportunity to provide input on the actions that US federal agencies can take to expand access to publicly funded research. I submit the following views on behalf of the Association of Research Libraries (ARL), a nonprofit collective of libraries in 124 leading research institutions in the United States and Canada. ARL member libraries are collaborative partners supporting the full life cycle of scientific inquiry and creation, and ARL’s mission is to create an equitable, enduring, and barrier-free research information environment to advance research and learning. Our 100 US academic member libraries alone, including many public and land-grant institutions, directly serve 3.5 million students and faculty. Since mid-March 2020, when US universities moved to virtual operations to protect their communities from the COVID-19 pandemic, libraries have focused even more on maximizing barrier-free access to digital content to sustain academic and research continuity.
Many recent articles and editorials, from national newspapers to prestigious scientific journals, have recognized this historic moment as an inflection point for the practice and efficacy of open science, including the rapid sharing and evaluation of data and research and an emphasis on machine readability and computability to handle the volume and speed of new research. The rise of preprint submissions, widespread data sharing, and accelerated, innovative approaches to peer review are not just emergency responses to COVID-19. They are a harbinger of the global scientific enterprise that citizens will expect in the future.
Research libraries are uniquely responsible for the past, the present, and the future of scholarship. They curate and steward locally produced research assets, provide computational access to digitized materials, partner in the delivery of data science and digital scholarship education, and build strong networks of inter-library collaboration. The development of these future-facing services has been constrained by the share of library budgets devoted to scholarly journal subscriptions and annual license fees, and by the staff time required to negotiate licenses and maintain access restrictions. The social and scientific cost of protecting publisher revenue through embargoes that delay access is too high. It is time for a new, multi-stakeholder model that includes rapid dissemination and experimentation with faster and more efficient forms of peer review, including post-publication review, open peer review, and more.
In this age of innovative digital technologies, ARL libraries work with many partners—teaching and research faculty, administrators, funding agencies, and publishers—to improve the research communications ecosystem. As well as leveraging new open infrastructures, libraries are working to change existing publishing models to improve access to information. Our community is ready to partner on new business models that sustain scholarly communities and promote equitable, open access to scholarship. Libraries are committed to working collectively and collaboratively with scholarly societies and domain communities to develop actionable transition strategies to achieve immediate open access to federally funded research. We want to develop and support solutions that equally serve the interests of large research institutions, smaller institutions, independent scholars, and the public.
ARL is pleased to offer its perspectives on the four topics outlined in the request for information.
1. What current limitations exist to the effective communication of research outputs (publications, data, and code) and how might communications evolve to accelerate public access while advancing the quality of scientific research? What are the barriers to and opportunities for change?
Subscription journals still dominate the marketplace for scholarly research, and consequently approximately 85% of the world’s scholarly output is still behind a paywall.¹ Journal prices, often bundled in so-called “big deals,” have consistently outpaced inflation as measured by the Consumer Price Index, so that even the most highly resourced university libraries cannot keep up with journal cost increases (including for very high-impact journals) without sacrificing other areas of their collections budgets. Outside well-resourced academic institutions, most people cannot access current scientific literature, including the broad taxpayer base that collectively funded its creation. Researchers publish papers in high-impact journals (many of which are owned or published by a group of three to five commercial entities) to advance their careers, compete for recognition, and obtain grant funding for their research. Authors are often compelled to sign over their copyrights to these journals, which limits how digital copies can be shared or used.
Opportunities for change:
- Science is a cumulative process of discovery in which the insights of one study build on earlier findings and point the way to the next advance. Embargoes on publicly funded research add delays (on top of lengthy review periods) to the widespread distribution of scientific articles and data, slowing down this relay. Conversely, immediate public access to federally funded research publications and data would expand participation in research not only by individuals but also by machines, which can use AI techniques to mine the literature for additional insights and for non-obvious connections to other research.
- Since 2016, and particularly in the first quarter of 2020, preprint services and deposits have grown significantly, as has interest in developing post-publication peer review and other overlay services. US federal agencies could accelerate these innovations by recognizing all research outputs, including preprints, in grant reviews.
There is a range of critical reasons to accelerate public access to research data, including (1) reducing redundancy in the system by making data available for reuse; (2) enabling evaluation of research outputs for rigor and reproducibility within a discipline, which strengthens findings; and (3) expanding the potential of open data to contribute to new nonprofit and commercial innovation. Limitations on data publication and access include:
- Relegation of data to supplemental PDF files rather than release in machine-accessible formats
- Nonexistent or inconsistent application of persistent identifiers (PIDs) for data sets
- Variation in the capacity and requirements of data repositories
- Resource-intensive curation required to make data reusable and interoperable
- Challenge of moving data from institutionally based computing environments to data repositories
- Inadequate infrastructure for making sensitive data public and lack of common metadata standards for sensitive data
Opportunities for change:
In FY 2019, US federal agencies obligated an estimated $101.9 billion for extramural R&D,² much of which went to academic research institutions. As researchers face funding and travel restrictions due to COVID-19, data reuse will be more important than ever, and removing embargoes will benefit research across a constrained system. Similarly, delays in sharing data, code, or publications hinder accountability to the scientific community and reduce opportunities for error correction and replication. Given how much academic research the federal government funds, a cross-agency requirement to make research outputs immediately available is also likely to accelerate the cultural adoption of open science practices across the research enterprise.
How easily data can be reused depends on good documentation, curation, and metadata, including PIDs, and the distributed landscape of digital repositories demands agreed-upon, open standards and protocols to automate workflows and interlink related scholarly works. Complex as the landscape is, all stakeholders in the research enterprise have a responsibility to reduce friction where they can. In noting the following opportunities, the Association commits to working within and across our member institutions to implement them. ARL recommends that:
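- US federal agencies require PIDs for data, people, and organizations (for example, DOIs for data sets, ORCID iDs for researchers, and ROR IDs for research organizations)
- US federal agencies provide stable funding for domain data repositories and other key elements of open research infrastructure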
With near-term budget shortfalls, collaboration and shared services and infrastructure will be even more important. ARL welcomes the opportunity to continue working with agencies on standards, requirements, and their implementation and workflows.
The key limitation on the scientific code associated with research data is proprietary restrictions on sharing and reuse. In 2018, the Association published the freely available Code of Best Practices in Fair Use for Software Preservation. ARL recommends that federal agencies require open source licensing, when feasible, for software associated with federally funded research data.
2. What more can Federal agencies do to make tax-payer funded research results, including peer-reviewed author manuscripts, data, and code funded by the Federal Government, freely and publicly accessible in a way that minimizes delay, maximizes access, and enhances usability? How can the Federal Government engage with other sectors to achieve these goals?
Data, publications, and code associated with a particular award are typically pieces of a larger and longer-term research agenda. Given the US government’s size and influence, federal requirements for immediate data sharing will go a long way toward making that practice the norm, so that the scientific community builds data sharing into training, labs, tools, and more. ARL recommends that US federal agencies:
- Reward quality over quantity in reviewing funding proposals, and recognize all types of research outputs (including data and code) by asking applicants for their “top [number of] research outputs”
- Make competitive funding available for building and sustaining open infrastructure for data sharing
- Offer competitive funding to universities and libraries to strengthen the partnerships between academic institutions and agencies with respect to data curation and long-term data stewardship
Research libraries are interested in redirecting subscription dollars to support a sustainable public access environment by investing in open infrastructure and open content, particularly in partnership with scholarly communities.
3. How would American science leadership and American competitiveness benefit from immediate access to these resources? What are potential challenges and effective approaches for overcoming them? Analyses that weigh the trade-offs of different approaches and models, especially those that provide data, will be particularly helpful.
The federal government funds the majority of basic science in the United States (relative to industry), and making research outputs as open as possible, as early as possible, increases the rate of innovation across all sectors of the economy. Open publications, data, and code that are available for replication are also more trustworthy. There is no more salient example of the benefits of open access than the preprints, rapid evaluation, and data sharing that scientists around the world are engaged in right now to develop treatments, cures, and vaccines for COVID-19. Unprecedented speed of data sharing is needed during emergencies, but vaccines and pandemic preparedness take years of sustained investment, not just emergency action. Once this pandemic stabilizes, agencies, universities, national labs, and others in the scientific community will draw many lessons from this experience about what worked and what was lacking in data-sharing infrastructure, rigor and review, and the machine accessibility of both publications and data.
In the near term, recent lab closures and interruptions to degree completion threaten the scientific workforce if affected US students cannot complete their work. Immediate access to federally funded publications, and especially data, could help mitigate that damage now and in the near future while students remain socially distanced. In less extraordinary circumstances, immediate access expands the potential pool of researchers and the data available for training, augmenting the capacity of the US scientific community, together with the private sector, to respond to grand challenges.
There is a growing global consensus around open access for all the reasons enumerated in these comments. One principal challenge is that the commercialization of scholarly literature, and its consolidation among a few companies, has become the source of financial sustainability for many of our scholarly and professional societies, including for their non-publishing activities. It is time for a new paradigm for scholarly publishing in which the content of scientific outputs is freely and immediately accessible, multiple stakeholders contribute to the sustainability of open infrastructure elements (such as PIDs), and publishers charge for specialized services. The Association is committed to working with the scholarly community to advance this vision. By working together, libraries and societies could articulate their distinct contributions to advancing scholarship and envision a sustainable way to support both the dissemination of scholarship and the essential, ancillary services of promoting the discipline. The growing enthusiasm for “subscribe to open” models and for transformative agreements based on article processing charges or “green” deposits demonstrates our community’s willingness to experiment and engage.
4. Any additional information that might be considered for Federal policies related to public access to peer-reviewed author manuscripts, data, and code resulting from federally supported research.
Misconceptions persist among scholars about the extent to which embargo-dependent subscription revenue is essential to the functioning of scientific peer review. In fact, a range of models coexist with immediate access, including “gold” open access (with or without a transformative agreement), post-publication peer review, overlay journals, and more. Peer reviewers do not typically receive compensation from publishers. Like authors, they contribute their time and expertise to advance scholarship and gain recognition as experts in their fields. ARL supports open science initiatives that would elevate the role of peer reviewers within the academic reward system.
Thank you for your consideration of these comments.
Mary Lee Kennedy
Association of Research Libraries
1. “2020 EBSCO Serials Price Projection Report,” EBSCOpost, October 2, 2019, https://www.ebsco.com/blog/article/2020-ebsco-serials-price-projection-report.
2. Daniel Morgan and John F. Sargent Jr., “Effects of COVID-19 on the Federal Research and Development Enterprise,” Congressional Research Service, April 10, 2020, https://crsreports.congress.gov/product/pdf/R/R46309.
About the Association of Research Libraries
The Association of Research Libraries (ARL) is a nonprofit organization of 124 research libraries in Canada and the US whose mission is to advance research, learning, and scholarly communication. The Association fosters the open exchange of ideas and expertise, promotes equity and diversity, and pursues advocacy and public policy efforts that reflect the values of the library, scholarly, and higher education communities. ARL forges partnerships and catalyzes the collective efforts of research libraries to enable knowledge creation and to achieve enduring and barrier-free access to information. ARL is on the web at ARL.org.