Report on integration of data and publications

22 Nov 2011

From a researcher's perspective, data has the value of a first-class research object that forms the basis of their research. Researchers discover and use data and analyses from others to formulate new and testable hypotheses before extending the evidence base with empirical data. Treating data as first-class research objects implies that they require preservation, recognition, validation, curation and dissemination, which in turn improve their availability, findability, interpretability and re-usability.

Researchers perceive and enforce their creator rights over the data, choose when and with whom they share it, and wish to maintain this control. This need for control stems from perceived legal barriers and fear of misuse, from the absence of the trust network common in other forms of scholarly communication, or from a mixture of both. Researchers want somewhere safe to put their data while maintaining control, in order to avoid legal redress and professional misuse, but expect some central organisational structure to pay for this infrastructure. They recognise that many lack sufficient skills to manage their data appropriately but, importantly, are enthusiastic to change this situation. Researchers see the benefit of joining publications with data under a more formal and agreed convention, but there must be a recognition and credit mechanism for this. They accept this joining as good professional practice and agree that data supporting a traditional publication should be available with the publication. Technology can reduce the latency in joining data to publications, but there is a lack of common best-practice conventions for scholarly publications. Distilled into statements, our desk research has revealed five abstract researcher requirements for integrating data and publications:

1. Researchers need somewhere to put data and make it safe for reuse
2. Researchers need to control its sharing and access
3. Researchers need the ability to integrate data and publication
4. Researchers need to get credit for data as a first class research object
5. Researchers need someone to pay for the costs of data availability and re-use

This report sets out to identify examples of integration between datasets and publications. Findings from existing studies carried out by PARSE.Insight, RIN and SURF, and from various recent publications, are synthesised and examined in relation to three distinct stakeholder groups in order to identify opportunities in the integration of data. These groups are Researchers, Publishers and Libraries/Data centres. Opportunities identified for each group have been scoped against seven criteria:

1. Availability
2. Findability
3. Interpretability
4. Reusability
5. Citability
6. Curation
7. Preservation

Opportunities to improve the linking of data and publications have been identified for each stakeholder group and mapped against each of the criteria in tables at the end of this summary.

Based on an examination of the available research and literature, incentives and barriers relating to data exchange are identified for each stakeholder group. A draft of this report formed the basis of a workshop in June 2011 with professionals from research libraries. The workshop served to validate the opportunities and issues identified in this report.

