Data Repositories

Depositing your data in an archive or repository helps ensure its long-term preservation and storage. UNTHSC maintains its own repository, DataSafe@UNT Health, which is available to host campus data. Some journals provide data storage for small files, while others have pre-determined agreements with subject-specific or general science public data repositories. In some instances, researchers can meet data sharing requirements by making their data available to interested parties upon request.

Selecting a Data Repository

Not all repositories are created equal. Before depositing your data, consider how trustworthy the repository is. The NIH has published a list of desirable characteristics for all data repositories, summarized below:

Unique Persistent Identifiers: Assigns datasets a citable, unique persistent identifier, such as a digital object identifier (DOI) or accession number, to support data discovery, reporting, and research assessment. The identifier points to a persistent landing page that remains accessible even if the dataset is de-accessioned or no longer available.
Long-Term Sustainability: Has a plan for long-term management of data, including maintaining integrity, authenticity, and availability of datasets; building on a stable technical infrastructure and funding plans; and having contingency plans to ensure data are available and maintained during and after unforeseen events.
Metadata: Ensures datasets are accompanied by metadata to enable discovery, reuse, and citation of datasets, using schema that are appropriate to, and ideally widely used across, the community(ies) the repository serves. Domain-specific repositories would generally have more detailed metadata than generalist repositories.
Curation and Quality Assurance: Provides, or has a mechanism for others to provide, expert curation and quality assurance to improve the accuracy and integrity of datasets and metadata.
Free and Easy Access: Provides broad, equitable, and maximally open access to datasets and their metadata free of charge in a timely manner after submission, consistent with legal and ethical limits required to maintain privacy and confidentiality, Tribal sovereignty, and protection of other sensitive data.
Broad and Measured Reuse: Makes datasets and their metadata available with broadest possible terms of reuse; and provides the ability to measure attribution, citation, and reuse of data (i.e., through assignment of adequate metadata and unique PIDs).
Clear Use Guidance: Provides accompanying documentation describing terms of dataset access and use (e.g., particular licenses, need for approval by a data use committee).
Security and Integrity: Has documented measures in place to meet generally accepted criteria for preventing unauthorized access to, modification of, or release of data, with levels of security that are appropriate to the sensitivity of data.
Confidentiality: Has documented capabilities for ensuring that administrative, technical, and physical safeguards are employed to comply with applicable confidentiality, risk management, and continuous monitoring requirements for sensitive data.
Common Format: Allows datasets and metadata downloaded, accessed, or exported from the repository to be in widely used, preferably non-proprietary, formats consistent with those used in the community(ies) the repository serves.
Provenance: Has mechanisms in place to record the origin, chain of custody, and any modifications to submitted datasets and metadata.
Retention Policy: Provides documentation on policies for data retention within the repository.

Reminder

It is important to carefully review user agreements and FAQ pages before you deposit your data into any online repository. Lewis Library can help you review data repositories. Contact us or your Library Liaison to schedule an appointment.