On January 17, 2020, the US Office of Science and Technology Policy (OSTP) issued a “Request for Public Comment on Draft Desirable Characteristics of Repositories for Managing and Sharing Data Resulting From Federally Funded Research.” The Association of Research Libraries (ARL) welcomes this opportunity to offer recommendations in this area.
Association of Research Libraries Comments on Draft Desirable Characteristics of Repositories for Managing and Sharing Data Resulting from Federally Funded Research
March 17, 2020
The Association of Research Libraries (ARL) thanks the US Office of Science and Technology Policy (OSTP) for the opportunity to submit comments on desirable characteristics of repositories for managing and sharing data resulting from federally funded research. ARL is a nonprofit membership organization of 124 research libraries in the United States and Canada whose mission is to advance research, learning, and scholarly communication.
Our member libraries, which include academic libraries along with federal and large public libraries, manage data repositories and consult with researchers on deposit into disciplinary and/or agency repositories. Librarians also work with researchers to curate data for deposit. Research data stewardship—including curation, preservation, and development of tools for reuse—involves many different stakeholders, and OSTP’s guidelines to advance our shared understanding of repository characteristics are welcome. ARL recognizes the excellent response of our colleagues in the Confederation of Open Access Repositories (COAR) and SPARC to this request for information.
Just as OSTP recommends a common set of characteristics for data repositories, knowing there will be disciplinary and domain variation, ARL asks that OSTP consider harmonization of federal policies with respect to the definition of research data for sharing, as well as support for the cost of data curation and long-term preservation.
I. Desirable Characteristics for All Data Repositories
ARL supports “Desirable Characteristics for All Data Repositories,” I-A through I-K, with the following additional recommendations and suggestions:
A. Persistent Unique Identifiers
In order to deploy persistent unique identifiers (PUIDs) as a critical piece of infrastructure for provenance and replicability, ARL recommends that repositories:
- Embed digital asset versioning in PUIDs
- Include identifiers for people, organizations, data, and funding
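One way to read these two recommendations is sketched below in Python. It is an illustration under stated assumptions, not a prescribed design: the `DatasetVersion` record, its field names, and the `publish_revision` and `version_history` helpers are all hypothetical, and the sketch assumes each version of a data set receives its own identifier that points back to the version it supersedes.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class DatasetVersion:
    """One immutable version of a data set and the identifiers attached to it.

    All field names are hypothetical and chosen for illustration only.
    """
    puid: str                            # identifier for this exact version
    version: str                         # human-readable version label
    creator_ids: Tuple[str, ...]         # identifiers for people
    organization_ids: Tuple[str, ...]    # identifiers for organizations
    funding_ids: Tuple[str, ...]         # identifiers for funders or awards
    previous_puid: Optional[str] = None  # version that this one supersedes


def publish_revision(previous: DatasetVersion, puid: str, version: str) -> DatasetVersion:
    """Create a new version that carries the same linked identifiers forward."""
    return DatasetVersion(
        puid=puid,
        version=version,
        creator_ids=previous.creator_ids,
        organization_ids=previous.organization_ids,
        funding_ids=previous.funding_ids,
        previous_puid=previous.puid,
    )


def version_history(latest: DatasetVersion,
                    registry: Dict[str, DatasetVersion]) -> List[str]:
    """Walk the chain of superseded versions, newest first."""
    history: List[str] = []
    seen = set()
    current: Optional[DatasetVersion] = latest
    while current is not None and current.puid not in seen:
        history.append(current.puid)
        seen.add(current.puid)  # guard against accidental cycles
        if current.previous_puid is None:
            break
        current = registry.get(current.previous_puid)
    return history
```

Because every version keeps its own identifier and a pointer to its predecessor, a citation to an older version continues to resolve to exactly the data that was used, which is the property that matters for provenance and replicability.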
B. Long-term Sustainability
Research libraries seek accountability for both sustainability of the software or repository platform and the long-term sustainability of the individual assets or data sets within the repository. ARL recommends that data repositories:
- Develop long-term plans for funding and sustaining their infrastructures, and for documenting individual assets in accordance with public-data retention policies
C. Metadata
In order to convey knowledge of data use terms, and to standardize where possible, ARL recommends that repositories include licensing and reuse terms in any metadata schema, and that OSTP:
- Direct generalist repositories that serve multiple disciplines to general-purpose metadata standards, such as the DataCite Metadata Schema
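As a minimal sketch of what including licensing and reuse terms in a metadata record could look like in practice, the Python function below checks a record for them. The function name `missing_reuse_terms` is hypothetical, and the keys `rightsList`, `rights`, and `rightsUri` are assumptions modeled loosely on the rights properties of general-purpose schemas such as DataCite's; a repository would substitute the field names of the schema it actually uses.

```python
from typing import Dict, List


def missing_reuse_terms(record: Dict) -> List[str]:
    """Return problems found in a record's licensing metadata.

    An empty list means no problems were found. The key names are
    illustrative assumptions, not a definitive rendering of any schema.
    """
    rights = record.get("rightsList") or []
    if not rights:
        return ["no licensing or reuse terms recorded"]

    problems: List[str] = []
    for index, entry in enumerate(rights):
        if not entry.get("rights"):
            problems.append(f"rights entry {index} has no human-readable statement")
        if not entry.get("rightsUri"):
            problems.append(f"rights entry {index} has no machine-resolvable URI")
    return problems
```

A check of this kind could run at deposit time, so that a data set without stated reuse terms is flagged before it is published rather than after.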
D. Curation & Quality Assurance
Data curation and quality assurance are critical for discoverability, long-term sustainability, and interoperability of assets in data repositories. These activities are also resource intensive. Research libraries expect the following:
- Curation is a partnership among data creators, curators, and repository managers, and libraries are recognized as a source of broad expertise in this area.
- With targeted federal investment in university capacity, librarians and other experts can work with data creators to improve the quality of data sets before stewardship is transferred to a data repository, especially a federal repository.
- By partnering with national groups like the Data Curation Network, which provide expertise not available locally and set standards for levels of curation, federal agencies can leverage distributed networks of knowledge.
E. Access
In order to facilitate the broadest possible access to data, data repositories should:
- Ensure that they are maximally open to machines as well as people, through user-friendly interfaces and open APIs
- Document access restrictions with reference to specific legal guidelines or ethical frameworks
F. Free & Easy Access and Reuse
In order to ensure access and reuse, repositories should:
- Integrate and implement Creative Commons license terms for published data sets, and include clear disclosure of licensing terms in the metadata
G. Reuse
In order to enhance discovery for reuse, repositories should:
- Include PUIDs, and machine-readable, standardized licenses, in citation metadata
H. Secure
(Nothing to add.)
I. Privacy
In recognition that some repositories exclusively collect data that will be made openly available, we ask OSTP to:
- Clarify that “In cases where the repository is collecting sensitive data, it will provide documentation related to the safeguards in place to protect data from access breaches.”
J. Common Format
Providing access to data in a common format is dependent on the type of data that is provided to the repository. ARL recommends that:
- The transformation of content that may be obsolete, or that may lack an open standard, be excluded from this requirement
K. Provenance
To further ensure clarity on provenance, ARL recommends that repositories:
- Implement versioned, machine-readable provenance tracking
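Versioned, machine-readable provenance tracking can take many forms. The Python sketch below shows one simple possibility, offered as an assumption rather than a recommended format: an append-only log in which each entry records a version number, the activity performed, the responsible agent, a timestamp, and a checksum of the data, with each entry pointing to the checksum of the version it was derived from. The function names and entry fields are hypothetical.

```python
import hashlib
import json
from typing import Dict, List


def record_event(log: List[Dict], data: bytes, activity: str,
                 agent: str, timestamp: str) -> List[Dict]:
    """Return a new log with one more provenance entry appended.

    The existing log is not modified, so earlier versions of the
    provenance record stay intact.
    """
    entry = {
        "version": len(log) + 1,
        "activity": activity,          # e.g. "deposit" or "correction"
        "agent": agent,                # identifier for the person or system
        "timestamp": timestamp,        # supplied by the caller, ISO 8601 text
        "sha256": hashlib.sha256(data).hexdigest(),
        "derived_from": log[-1]["sha256"] if log else None,
    }
    return log + [entry]


def to_json(log: List[Dict]) -> str:
    """Serialize the log so that other systems can read it."""
    return json.dumps(log, indent=2, sort_keys=True)
```

Storing a checksum with each entry lets a data user confirm that the file in hand matches a specific recorded version, and the `derived_from` link makes the sequence of changes traceable by software as well as by people.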
Additional Characteristics Requested for All Repositories
ARL recommends that repositories:
- Clearly indicate to potential data users if a data set is subject to a retraction
M. Open Source Platforms
ARL recommends that repositories:
- Use open source tools and frameworks for repository development whenever possible
- Provide source code for the repository platform in a publicly auditable venue, preferably under an open source license
II. Additional Considerations for Repositories Storing Human Data (Even if De-Identified)
A. Fidelity to Consent
- Ensure that appropriate systems are in place to confirm that data use is consistent with the original permission provided by the participants, even when the data is shared in a repository.
- For data sets with privacy concerns, require a full data package for consistency, including a copy of the original consent forms, protocols, institutional review board (IRB) requirements, etc.
- Outline what security techniques to look for when evaluating a repository for storing human data.
III. Additional Characteristics for Sharing of Human Subjects’ Data
- Include documentation of the utility of the repository under various international privacy policies.
- Include documentation of the infrastructure in place to support the sharing of human data. Without such information, it is impossible for researchers to assess the appropriateness of the repository for their research.
Thank you for your consideration of these comments.
Mary Lee Kennedy
Association of Research Libraries
About the Association of Research Libraries
The Association of Research Libraries (ARL) is a nonprofit organization of 124 research libraries in Canada and the US whose mission is to advance research, learning, and scholarly communication. The Association fosters the open exchange of ideas and expertise, promotes equity and diversity, and pursues advocacy and public policy efforts that reflect the values of the library, scholarly, and higher education communities. ARL forges partnerships and catalyzes the collective efforts of research libraries to enable knowledge creation and to achieve enduring and barrier-free access to information. ARL is on the web at ARL.org.