What is a dataset?

For the purposes of DOI registration at TIB, a dataset is a structured, digital collection of data produced during research. Every dataset includes descriptive metadata. While the nature of datasets varies across disciplines, researchers within each discipline typically agree on what constitutes a dataset for them. Below you’ll find some examples of different datasets.

Despite the differing nature of datasets, many of the services required by researchers are shared, such as methods of citation, discovery, and long-term preservation.

Publication of Data

To make primary scientific data citable as publications, several organisational and technical preconditions have to be met:

  • Quality control of the primary data set by the author and by the data publishing agency
  • Quality control of the descriptive metadata set by the author and by the data publishing agency
  • Long-term availability of the published data in online repositories
  • Search function for data publications in library catalogues (e.g. GetInfo)
  • Access to the primary data with assignment of a persistent identifier (e.g. DOI)

In this system, a data set is attributed to its investigators as authors, just as a work in the conventional scientific literature would be. Scientific primary data should therefore not be understood exclusively as part of a scientific publication; it may have its own identity.

Citation of Data

The size of the data sets used in a scientific publication often prohibits their publication as data tables and, as a result, data used as the basis of a publication are rarely published anymore. The lack of access to scientific data is an obstacle to interdisciplinary and international research.
Persistent identifiers together with their bibliographical information provide the opportunity to find and to cite primary data in scientific publications.
A citation of a data set follows the classical citation rules in scientific literature, e.g. creator(s) (publication year): data set name. publisher. persistent identifier.
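
The citation pattern above can be sketched as a small formatting function. This is only an illustration: the `DataCitation` class, its field names, and `format_citation` are hypothetical names chosen for this example and are not part of any TIB service. The sample values are taken from the "Classical Dataset" entry listed further down this page.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataCitation:
    """Hypothetical container for the elements of a data citation."""
    creators: List[str]   # e.g. ["Irino, T", "Tada, R"]
    year: int             # publication year
    title: str            # data set name
    publisher: str        # data publishing agency or institution
    doi: str              # persistent identifier, without the "doi:" prefix


def format_citation(c: DataCitation) -> str:
    """Build 'creator(s) (year): name. publisher. identifier'."""
    creators = "; ".join(c.creators)
    return f"{creators} ({c.year}): {c.title}. {c.publisher}. doi:{c.doi}"


example = DataCitation(
    creators=["Irino, T", "Tada, R"],
    year=2009,
    title="Chemical and mineral compositions of sediments from ODP Site 127–797",
    publisher="Geological Institute, University of Tokyo",
    doi="10.1594/PANGAEA.726855",
)
print(format_citation(example))
```

Keeping the elements in separate fields, rather than storing the finished string, makes it easy to render the same record in a different citation style later.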

As an example, the dataset:

Kuhlmann, H et al. (2009): Age models, iron intensity, magnetic susceptibility records and dry bulk density of sediment cores from around the Canary Islands. doi:10.1594/PANGAEA.727522
http://dx.doi.org/10.1594/PANGAEA.727522

is a supplement to the following article:

Kuhlmann, Holger; Freudenthal, Tim; Helmke, Peer; Meggers, Helge (2004): Reconstruction of paleoceanography off NW Africa during the last 40,000 years: influence of local and regional factors on sediment accumulation.
Marine Geology, 207(1–4), 209–224,
doi:10.1016/j.margeo.2004.03.017
http://dx.doi.org/10.1016/j.margeo.2004.03.017
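
In the examples above, each DOI is paired with a link formed by appending the DOI to the resolver address http://dx.doi.org/. A minimal sketch of that conversion follows. The function name `doi_to_url` and the simple validity check (the DOI starts with "10." and contains a "/") are assumptions made for this illustration, not an official validation rule.

```python
from urllib.parse import quote

# Resolver address as used in the examples on this page.
RESOLVER = "http://dx.doi.org/"


def doi_to_url(doi: str) -> str:
    """Turn 'doi:10.1594/PANGAEA.727522' (or the bare DOI) into a resolver URL."""
    name = doi.strip()
    # Drop an optional 'doi:' label in front of the identifier.
    if name.lower().startswith("doi:"):
        name = name[4:].strip()
    # Rough plausibility check only (assumption): prefix '10.' plus a suffix.
    if not name.startswith("10.") or "/" not in name:
        raise ValueError(f"not a plausible DOI: {doi!r}")
    # Percent-encode characters that are unsafe in a URL path; '/' is kept.
    return RESOLVER + quote(name, safe="/")


print(doi_to_url("doi:10.1594/PANGAEA.727522"))
print(doi_to_url("10.1016/j.margeo.2004.03.017"))
```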

Examples of registered datasets and other scientific content

Classical Dataset
Irino, T; Tada, R (2009): Chemical and mineral compositions of sediments from ODP Site 127–797. Geological Institute, University of Tokyo.
doi:10.1594/PANGAEA.726855
http://dx.doi.org/10.1594/PANGAEA.726855

Earthquake event
Geofon operator (2009): GEOFON event gfz2009kciu (NW Balkan Region). GeoForschungsZentrum Potsdam (GFZ).
doi:10.1594/GFZ.GEOFON.gfz2009kciu
http://dx.doi.org/10.1594/GFZ.GEOFON.gfz2009kciu

Climate Model
Denhard, Michael (2009): dphase_mpeps: MicroPEPS LAF-Ensemble run by DWD for the MAP D-PHASE project.
World Data Center for Climate.
doi:10.1594/WDCC/dphase_mpeps
http://dx.doi.org/10.1594/WDCC/dphase_mpeps

Drilling core
SAFOD (2008): SAFOD Main Hole downhole logging data phase 2 (2005), 3387–3799m. Scientific Drilling Database.
doi:10.1594/GFZ.SDDB.1128
http://dx.doi.org/10.1594/GFZ.SDDB.1128

Map
Kraus, Stefan; del Valle, Rodolfo (2008): Geological map of Potter Peninsula (King George Island, South Shetland Islands, Antarctic Peninsula). Instituto Antártico Chileno, Punta Arenas, Chile & Instituto Antártico Argentino, Buenos Aires, Argentina.
doi:10.1594/PANGAEA.667386
http://doi.pangaea.de/10.1594/PANGAEA.667386

Conference Proceeding
L. Zipp et al. (2009): In vitro evaluation of radiochemotherapy using carbon ions in glioblastoma cell lines, PTCOG 48. Meeting of the Particle Therapy Co-Operative Group. Heidelberg, Düsseldorf: German Medical Science GMS Publishing House.
doi:10.3205/09ptcog233
http://dx.doi.org/10.3205/09ptcog233

Medical Case study
Nejat Isik et al. (2009): Chiari Malformation Type I and Surgical Departments of Neurosurgery, Neurology, and Anaesthesiology and Reanimation, SB. Istanbul Göztepe Education and Research Hospital. (2009, Nov 12).
doi:10.1594/EURORAD/CASE.6634
http://dx.doi.org/10.1594/EURORAD/CASE.6634

Expert Opinions
R. Boyle (2008): Are probiotics useful for treating or preventing eczema? Department of Paediatrics, Imperial College, London, United Kingdom.
doi:10.1594/eaacinet2008/EO/9-250808
http://dx.doi.org/10.1594/eaacinet2008/EO/9-250808

Video
B. Kirchhof (2009): Silicone oil bubbles entrapped in the vitreous base during silicone oil removal, Video Journal of Vitreoretinal Surgery.
doi:10.3207/2959859860
http://dx.doi.org/10.3207/2959859860