
Research Data Repositories: Finding and Storing Data

This guide will help researchers find a repository where they can deposit the data they have collected, or find raw data to analyze on a topic. It is guide 2 of a 3-part series.

We have three guides about data: Which one do you need?

Overview

Often, the places to find data and to store data are one and the same. Researchers deposit the data they collect in general or disciplinary repositories, and other researchers search those repositories for data and datasets on their topic. Some repositories charge fees, while others are considered "open" and offer data freely for anyone to download.

Data and Statistics Are Not Equivalent 

Although the two terms are often used interchangeably, they mean very different things. Before you start searching for either, think about which one best fits your needs.

  • Data: raw numbers or bits of information that have been collected but not yet analyzed or organized.
  • Statistics: the product of data that has been analyzed or organized to help derive meaning from it.
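To make the distinction concrete, here is a minimal Python sketch. The income figures are made up purely for illustration and do not come from any real dataset. The list is the data; the summary values computed from it are statistics.

```python
from statistics import mean, median

# DATA: raw, unanalyzed values.
# These household incomes are hypothetical numbers invented for this example.
incomes = [32000, 41000, 28500, 56000, 47500]

# STATISTICS: values produced by analyzing or organizing the data.
summary = {
    "count": len(incomes),
    "mean": mean(incomes),
    "median": median(incomes),
}

print("Data:", incomes)
print("Statistics:", summary)
```

If you need to run your own analysis, look for the raw data. If you only need a figure to cite, a published statistic may be enough.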

The National Library of Medicine has a great resource full of other data-related definitions. 

What to Consider When Choosing a Data Repository

A data repository is a storage space where researchers deposit the datasets associated with their research. If you are an author seeking to comply with a journal or funder data sharing policy, you will need to identify a suitable repository for your data.

An open access data repository stores data so that anyone can access it immediately, with no restrictions on access to the repository.

When choosing a repository for your data, keep in mind the following:

  1. There are many discipline-specific repositories that may be ideal. Talk to your librarian, journal, funder, or colleagues.
    • It is likely that your funder or journal will have specific guidelines for sharing your data.
  2. Ensure the repository issues a persistent identifier (like a DOI) or lets you link deposits to your ORCID account.
  3. Check that the repository has a plan to preserve your data in perpetuity.
  4. Does the repository charge a fee to store your data? There may also be a cost to access datasets.
  5. Is the repository certified or indexed?
  6. Is the repository completely open or are there restrictions to access?
  7. Consider the FAIR Data Principles: data should be Findable, Accessible, Interoperable, and Reusable.

NIH guidelines for selecting a data repository

3 Ways to Use Google to Find Data

  1. Google has a Dataset Search! Here is a video tutorial on how to use this search tool.

  2. You can search for specific file types in Google, for example CSV files for datasets. Typing filetype:csv into the Google search bar tells Google to return only results with that file type. For example, (poverty AND ohio) filetype:xls returns XLS (Excel) files mentioning poverty in Ohio.

  3. Limit search results by web domain by typing into Google: site:.gov (YOUR TOPIC HERE). This restricts results (datasets, files, etc.) to websites on that domain. You could also use site:.org for professional organizations.
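If you run many of these searches, a short script can assemble the query strings for you. The Python sketch below is only an illustration: the helper names build_query and search_url are hypothetical, and the https://www.google.com/search?q= address pattern is an assumption about how Google accepts queries in a URL. It combines a topic with the filetype: and site: operators described above.

```python
from urllib.parse import urlencode


def build_query(topic, filetype=None, site=None):
    """Combine a topic with optional filetype: and site: operators.

    Hypothetical helper written for this guide; not part of any Google tool.
    """
    parts = [topic]
    if filetype:
        parts.append(f"filetype:{filetype}")
    if site:
        parts.append(f"site:{site}")
    return " ".join(parts)


def search_url(query):
    """Turn a query string into a search link you can paste into a browser.

    Assumes the common https://www.google.com/search?q=... URL pattern.
    """
    return "https://www.google.com/search?" + urlencode({"q": query})


# The example from step 2: Excel files mentioning poverty in Ohio.
excel_query = build_query("(poverty AND ohio)", filetype="xls")
print(excel_query)

# Steps 2 and 3 combined: CSV files on .gov sites only.
gov_query = build_query("(poverty AND ohio)", filetype="csv", site=".gov")
print(gov_query)
print(search_url(gov_query))
```

You can also type the printed query straight into the Google search bar; the script just saves retyping the operators when you try several topics or file types.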

Subjects: Interdisciplinary