There is a growing perception that science can progress more quickly, more innovatively, and more rigorously when researchers share data with each other. Policies and supports for data sharing within the STEM (science, technology, engineering, and mathematics) academic community are being put in place by stakeholders such as research funders, publishers, and universities, with overlapping effects. Additionally, many data sharing advocates have embraced the FAIR data principles – holding that data must be findable, accessible, interoperable, and reusable, by both humans and machines – as the standard benchmark for data sharing success. There is also an emerging scholarly literature evaluating the efficacy of some of these policies, although this literature tends to focus either on discrete disciplines or on particular journal or funder initiatives.
By contrast, many scientists are not engaging in data sharing and remain skeptical of its relevance to their work. Through a series of studies on scholarly research practices at Ithaka S+R, we have found that scientists in a variety of fields, including chemists, agricultural scientists, and civil and environmental engineers, tend not to make their data widely available. This reticence stands in contrast to the fact that over 40 percent of scientists reported that analyzing pre-existing quantitative data was highly important to their research in the Ithaka S+R US Faculty Survey 2018. Barriers to sharing include the fear of being “scooped,” wariness of data being misused, and the belief that the benefits of sharing data do not outweigh the effort required to format, contextualize, and upload research data in a way that is suitable for reuse. There is growing awareness of these challenges in academic support communities, and much has been written about possible solutions, ranging in scale from domain-specific technical solutions to systemic interventions like facilitating data citation and publication.
As organizations and initiatives designed to promote STEM data sharing multiply – within, across, and outside academic institutions – there is a pressing need to decide strategically on the best ways to move forward. Central to this decision is the issue of scale. Is data sharing best assessed and supported on an international or national scale? By broad academic sector (engineering, biomedical)? By discipline? On a university-by-university basis? Or using another unit of analysis altogether? To the extent that there are existing initiatives on each of these scales, how should they relate to one another? How do we design support for data sharing that aligns as closely as possible with the practices and interests of scholars, so as to maximize buy-in?
In this issue brief, we build on our ongoing research into scholarly practices to propose a new mechanism for conceptualizing and supporting STEM research data sharing. Successful data sharing happens within data communities: formal or informal groups of scholars who share a certain type of data with each other, regardless of disciplinary boundaries. Drawing on Ithaka S+R findings and the scholarly literature, we identify what constitutes a data community and outline its most important features by studying three success stories, investigating the circumstances under which intensive data sharing is already happening. We contend that stakeholders who wish to promote data sharing – librarians, information technologists, scholarly communications professionals, and research funders, to name a few – should work to identify and support emergent data communities. These are groups of scholars for whom a relatively straightforward technological intervention, usually the establishment of a data repository, could kick-start the growth of a more active data sharing culture. We conclude by responding to some potential counterarguments to this call for bottom-up intervention and offering recommendations for ways forward.