Data Citation

Dataverse - Data Citation

Citations should support the finest-grained description necessary to identify the data. List datasets at the finest level of granularity provided by the repository or institution, and clearly identify the subset of the data that underlies each figure and analysis. This is analogous to using the author, title, publisher, and date of publication to refer to a book while giving page numbers in the in-text reference.

Why cite data?

  • If the authors of a scientific publication properly cite the data that underlies it, it is much easier for the reader to locate that data. This in turn makes it easier for the reader to validate and build on the publication’s findings.
  • Data citations ensure that data contributors receive proper credit when their work is reused by other researchers.
  • If a dataset links back to the paper that describes its collection, a reader coming to the dataset directly can use that link to put it in context and understand the methodology used.
  • If a dataset links to other papers that make use of it, these links can be used by the contributors and data publishers to demonstrate the impact of the data. Potential reusers might use these links to discover critiques of the data or to find inspiration for how to use them.
  • The publishing infrastructure that makes the data citable will also help to ensure they are available for reference and reuse long into the future.
  • There will be less danger of rival researchers ‘stealing’ results from those who publish their data openly, as failure to give due credit would amount to plagiarism and thus be punishable.
  • Services built around data citation will make it easier for researchers to discover relevant datasets.
  • Data citations could be used to measure the impact of both individual datasets and their contributors.
  • Researchers could gain professional recognition and rewards for published data in the same way as for more traditional publications.

From Data Citation and Linking | DCC

Elements of a citation

From Citing Data | ICPSR:

Citing data is straightforward. Each citation must include the basic elements that allow a unique dataset to be identified over time:

  • Author
  • Title
  • Distributor
  • Date
  • Version
  • Persistent identifier (such as a Digital Object Identifier (DOI), Uniform Resource Name (URN), or Handle)
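
As a rough illustration of how the six elements above fit together, the sketch below holds them in a small Python record and joins them into one citation string. The class name, field names, and element ordering are assumptions made for this example, and the values are placeholders rather than a real dataset; actual citation styles order and punctuate the elements differently.

```python
from dataclasses import dataclass


@dataclass
class DataCitation:
    """The six basic elements listed above (field names are illustrative)."""
    author: str
    title: str
    distributor: str
    date: str          # publication year or full date
    version: str
    identifier: str    # persistent identifier, e.g. a DOI URL

    def format(self) -> str:
        # One possible ordering; real styles (APA, Chicago, ...) differ.
        return (
            f"{self.author} ({self.date}). {self.title} "
            f"(Version {self.version}) [Data set]. "
            f"{self.distributor}. {self.identifier}"
        )


# Placeholder values only; this is not a real dataset.
example = DataCitation(
    author="Doe, Jane",
    title="Example Survey Data",
    distributor="Example Data Archive",
    date="2020",
    version="2",
    identifier="https://doi.org/10.1234/example",
)
print(example.format())
```

Keeping the version and persistent identifier as required fields reflects the point above: without them, the citation may not identify a unique dataset over time.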

DOI Citation Formatter

The Citation Formatter interface is very simple. Fill in the form with:

  • The DOI of the resource you want to cite
  • The citation style you want to use (choose from the drop-down menu or type and auto-complete)
  • The language, if applicable
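
The same three inputs (DOI, style, language) can also be supplied programmatically. The sketch below assumes the DOI supports content negotiation through doi.org using the `text/x-bibliography` media type, which Crossref and DataCite DOIs generally do. The function names are made up for this example, and the DOI shown is a placeholder, not a real dataset.

```python
from urllib.parse import quote
from urllib.request import Request, urlopen


def build_citation_request(doi: str, style: str = "apa", locale: str = "en-US") -> Request:
    """Build (but do not send) a request for a formatted citation of `doi`."""
    url = "https://doi.org/" + quote(doi, safe="/")
    # The style and language travel as parameters of the Accept header.
    accept = f"text/x-bibliography; style={style}; locale={locale}"
    return Request(url, headers={"Accept": accept})


def fetch_citation(doi: str, style: str = "apa", locale: str = "en-US") -> str:
    """Send the request and return the citation text (needs network access)."""
    with urlopen(build_citation_request(doi, style, locale), timeout=30) as resp:
        return resp.read().decode("utf-8")


# Placeholder DOI for illustration; substitute the DOI of a real dataset.
req = build_citation_request("10.1234/example", style="apa", locale="en-US")
print(req.full_url)
print(req.get_header("Accept"))
```

Calling `fetch_citation` with a real DOI should return a single formatted citation string, provided the registration agency for that DOI supports this media type.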

Joint Declaration of Data Citation Principles by FORCE11

  • Importance: Data should be considered legitimate, citable products of research
  • Credit and attribution: Data citations should facilitate giving scholarly credit and normative and legal attribution to all contributors to the data
  • Evidence: Whenever and wherever a claim relies upon data, the corresponding data should be cited
  • Unique identification: A data citation should include a persistent method for identification that is machine actionable, globally unique, and widely used by a community
  • Access: Data citations should facilitate access to the data themselves and to such associated metadata, documentation, code, and other materials as are necessary for both humans and machines to make informed use of the referenced data
  • Persistence: Unique identifiers, and metadata describing the data, and its disposition, should persist — even beyond the lifespan of the data they describe
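  • Specificity and verifiability: Citations or citation metadata should include information about provenance and fixity sufficient to facilitate verifying that the specific timeslice, version, and/or granular portion of data retrieved subsequently is the same as was originally cited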
  • Interoperability: Data citation methods should be sufficiently flexible to accommodate the variant practices among communities, but should not differ so much that they compromise interoperability of data citation practices across communities
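
The principles mention fixity without saying how it is recorded. One common approach, assumed here for illustration only, is to store a checksum with the citation metadata and compare it against the data retrieved later. The sketch below uses SHA-256 from the Python standard library; the function names and sample bytes are hypothetical.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a hex string."""
    return hashlib.sha256(data).hexdigest()


def matches_cited(data: bytes, cited_checksum: str) -> bool:
    """True if the retrieved bytes hash to the checksum recorded at citation time."""
    return sha256_of(data) == cited_checksum.lower()


# Illustrative bytes standing in for a downloaded data file.
original = b"id,value\n1,3.14\n"
recorded = sha256_of(original)  # would be stored with the citation metadata

print(matches_cited(original, recorded))               # True
print(matches_cited(b"id,value\n1,3.15\n", recorded))  # False
```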

Example citation based on the principles from Dataverse

Selected example citation guidance from repositories