In December 2019 the National Science Foundation sponsored an invitational conference on implementing effective data practices, convened by the Association of Research Libraries (ARL), the California Digital Library, the Association of American Universities (AAU), and the Association of Public and Land-grant Universities (APLU). Forty experts representing libraries, research offices, scientific communities, tool builders, and funding agencies spent 1.5 days in a workshop environment designing guidelines for institutions to implement persistent identifiers (PIDs) for data sets and machine-actionable data management plans (DMPs). The program agenda with links to presentations can be found on the ARL website.
The project will result in a set of guidelines for the broad adoption and implementation of these effective practices within research institutions. The conference coordinating committee is meeting in mid-April to draft the first set of guidelines, which will be made available for wide community input over the spring and summer and finalized in the fall of 2020. The committee’s intent is to contribute the guidelines to the forthcoming AAU-APLU Institutional Guide to Accelerating Public Access to Research Data (fall 2020).
Five key takeaways emerged from this conference:
1. Center the researcher
With only 40 people in attendance at an event designed for multiple stakeholder groups, active researchers were a small fraction of attendees. However, there was a shared understanding that researchers need to see our recommended data practices as worth their effort. For our project’s guidelines to have legitimacy, researchers need to be at the center of our thinking and of our models, which means addressing scientific merit first and administrative burden a very close second. Tools, education, and services therefore need to be built around data management planning in a way that accommodates the scholarly workflow, and not the other way around. Conference participants agreed that active data management plans would serve as a communication and collaboration vehicle, helping multiple units across an institution form a more coherent research support environment.
The project’s implementation guidelines will endeavor to disambiguate researcher needs and contributions to the DMP process from the needs and responsibilities of collaborative research support services.
2. Closer integration of library and scientific communities
Like offices of research and research computing, academic and research libraries serve all disciplines within an institution. Conference participants focused on the need for greater alignment between disciplinary specialists (researchers and domain repository managers) and the library (stewardship) community.
The project’s implementation guidelines will encourage more conversation between repositories in libraries and domain repositories, particularly at two points: when data management plans are finalized, and when stewardship responsibility transfers from the researcher to the repository and identifiers are assigned during data curation.
3. Sustaining PID infrastructure means sustaining community and organizations
Persistent identifiers for people, organizations, and data (or instruments, code, and more) are essential to interlinking research across disciplines and domains. Some identifiers are domain-specific, while others (such as ORCID, which uniquely identifies people) are universally recognized as valuable across all stakeholder groups.
The project’s implementation guidelines will encourage support and/or advocacy for organizations that sustain identifier registries as essential pieces of scholarly infrastructure, as well as open licensing of metadata that enables interoperability across systems. The guidelines will also encourage institutionally developed or vended research support systems to include fields for PIDs in order to accelerate their adoption.
4. Unbundle the DMP (compliance, business process, scientific merit)
While the value proposition for active or machine-readable DMPs is well demonstrated, there was a sense at the conference that we may be overloading the DMP as currently understood with too many expectations: that it would simultaneously be a tool within the lab, among campus resource units, and with repositories and funding agencies.
Dina Paltoo, a participant and presenter at the meeting, summarized NIH’s draft data management and sharing policy, including the concept of a “just in time” DMP. Others echoed this logic of unbundling award compliance, business and planning processes within the institution, and questions of scientific merit that concern the funding review panel. In fact, some observed, the point at which the award is made to the institution might be when there is the greatest incentive to convene the multiple units affected by the DMP and finalize the internal budget allocation.
The project’s implementation guidelines will support versioned, updatable, living DMPs by encouraging multiple stages of DMP creation, sharing, iterating, and eventually integrating into the grant progress report.
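As a rough illustration of what a versioned, living DMP might look like in machine-actionable form, the Python sketch below stores each revision as a separate snapshot tagged with a lifecycle stage. The stage names, the field names, and the `DMPVersion` and `LivingDMP` classes are hypothetical; they are not drawn from the conference guidelines or from any existing DMP standard.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List

# Hypothetical lifecycle stages, loosely following the stages named in the text.
STAGES = ("proposal", "award", "active", "progress_report")


@dataclass(frozen=True)
class DMPVersion:
    """One snapshot of the plan; earlier snapshots are never overwritten."""
    number: int
    stage: str
    modified: date
    content: Dict[str, str]


@dataclass
class LivingDMP:
    """A plan that accumulates versions instead of being edited in place."""
    title: str
    versions: List[DMPVersion] = field(default_factory=list)

    def revise(self, stage: str, modified: date, changes: Dict[str, str]) -> DMPVersion:
        if stage not in STAGES:
            raise ValueError(f"unknown stage: {stage}")
        # Start from a copy of the latest content so history stays intact.
        content = dict(self.versions[-1].content) if self.versions else {}
        content.update(changes)
        version = DMPVersion(len(self.versions) + 1, stage, modified, content)
        self.versions.append(version)
        return version

    @property
    def current(self) -> DMPVersion:
        return self.versions[-1]


# Illustrative usage with made-up values.
dmp = LivingDMP("Example project")
dmp.revise("proposal", date(2020, 1, 15), {"repository": "to be determined"})
dmp.revise("award", date(2020, 6, 1), {
    "repository": "institutional repository",
    "budget": "curation costs allocated",
})
```

Keeping every version, rather than editing one document, is one way to let the proposal-stage plan and the award-stage plan coexist, so that compliance review and internal planning can each refer to the snapshot relevant to them.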
5. PIDs will unlock discovery
Objects of scientific inquiry (like the human body, or natural phenomena) don’t always fit into the categories people have devised to understand them, let alone the systems we’ve established to house separate slices of data about them, explained Maryann Martone, professor emeritus of neuroscience at the University of California San Diego. PIDs are part of the infrastructure necessary to connect metadata across systems and assemble diverse data to answer new questions.
The project’s implementation guidelines will provide tangible examples of data integration across repositories through PIDs.
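To make the idea of integration through PIDs a little more concrete, here is a minimal Python sketch that links records held by two repositories by normalizing the DOI each one stores. The repository names, record fields, and DOI values are placeholders invented for illustration, and `normalize_doi` and `link_by_pid` are hypothetical helpers, not part of any repository's API.

```python
from collections import defaultdict


def normalize_doi(value: str) -> str:
    """Reduce common DOI spellings to a bare, lowercase DOI.

    DOIs are case-insensitive, so lowercasing is safe for matching.
    """
    doi = value.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi


def link_by_pid(*sources):
    """Group records from several (name, records) sources by normalized DOI."""
    linked = defaultdict(dict)
    for name, records in sources:
        for record in records:
            linked[normalize_doi(record["doi"])][name] = record
    return dict(linked)


# Placeholder records: the same data set described differently in two systems.
library_repo = [
    {"doi": "https://doi.org/10.5555/EXAMPLE.1", "title": "Imaging data set"},
]
domain_repo = [
    {"doi": "doi:10.5555/example.1", "modality": "MRI"},
    {"doi": "10.5555/example.2", "modality": "EEG"},
]

linked = link_by_pid(("library", library_repo), ("domain", domain_repo))

# Keep only the data sets that both repositories describe.
shared = {doi: views for doi, views in linked.items() if len(views) > 1}
```

Here `shared` contains only the data set that both repositories know about, with the library's descriptive metadata and the domain repository's scientific metadata side by side. The match depends entirely on both systems recording the same identifier, which is the point of the recommendation above.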