
Open Science Online Course

Storing data

How should research data be stored?

Research data needs to be stored somewhere during and after the research project. The level of openness may range from strictly confidential, closed access to fully open access. At LUT, it is recommended that metadata always be made openly accessible, even when the stored data itself cannot be fully opened.

The nature of the data that has been gathered and handled affects the choice of storage service as well as how the data needs to be processed before storage.

Storing your data during the research

Planning the storage of your data carefully and choosing suitable storage solutions can save your resources and prevent the loss or destruction of the data. It is important to ensure responsible data management throughout the life cycle of the data.

During your research, it is important to use infrastructure that meets your requirements for data security, accessibility, scalability, and compliance with legal and ethical guidelines. The usability of the storage system throughout the research process should also be considered. For example, storing sensitive or confidential data on commercial cloud services (such as OneDrive or Google Drive) is usually not allowed unless the data has been anonymized or pseudonymized. The agreements made in your project can also limit how and where the data can be stored.
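As an illustration of pseudonymization, the following minimal Python sketch replaces a direct identifier with a keyed hash (HMAC-SHA256), so that the same participant always receives the same pseudonym. The field names, example rows, and key are hypothetical, and keyed hashing is only one possible technique; whether it is sufficient for a given dataset depends on the data and on your organization's guidelines. Pseudonymized data is still personal data under the GDPR, and the key must be kept separate from the dataset.

```python
import hashlib
import hmac


def pseudonymize(identifier: str, secret_key: bytes) -> str:
    """Return a stable pseudonym for a direct identifier using HMAC-SHA256."""
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    # The hex digest is shortened to 16 characters for readability only.
    return digest.hexdigest()[:16]


def pseudonymize_rows(rows, id_field, secret_key):
    """Return copies of the rows with the identifier field replaced."""
    return [
        {**row, id_field: pseudonymize(row[id_field], secret_key)}
        for row in rows
    ]


# Hypothetical example data: the field names and values are assumptions.
rows = [
    {"participant": "alice@example.org", "score": 12},
    {"participant": "bob@example.org", "score": 17},
    {"participant": "alice@example.org", "score": 15},
]

# Hypothetical key: in practice, store it separately from the dataset.
key = b"replace-with-a-secret-kept-outside-the-dataset"

pseudonymized = pseudonymize_rows(rows, "participant", key)
```

Removing direct identifiers in this way does not address indirect identifiers (for example, rare combinations of age, location, and occupation), which need to be assessed separately.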

At LUT, datasets can be stored during the research project on LUT Universities' workstations, network shares or cloud services purchased by the university. You can find more information on data handling principles, data classification levels and the use of data storage services in the LUT Universities Data Handling Guide (LUT Universities Intranet, requires login).

Source:

University of Eastern Finland (2024) Responsible data management and sharing. Available at https://blogs.uef.fi/ueflibrary-bors/open-research-data/responsible-data-management-and-sharing/ (Accessed: 25 Sept 2024)

Sharing data and digital preservation

Sharing the data

Data is usually opened and shared through data repositories, which may be subject-specific, institutional or general. Not all types of data can be fully opened, so plans for data sharing should be formulated as early as possible in the data management plan. Only the owner of the data can make decisions about sharing it. To comply with the FAIR principles, consider the following points when sharing data.

1. Data Privacy, Confidentiality, and Legal and Ethical Compliance

All confidential data has to be anonymized or pseudonymized beforehand, and the research participants must be informed about the plans for data sharing. Data sharing must also comply with the GDPR.

2. Data Security

It is recommended to use storage solutions that are secured and encrypted. The repository should manage access to the data it holds.

3. Documentation and Metadata

Provide comprehensive metadata with the data, or use data dictionaries and/or README files to describe the dataset, its variables, and the collection methods you used. Use a storage space that allows proper description of the data.

4. Licensing

Use commonly known licences, e.g. Creative Commons, to manage how the data can be used, shared, and distributed.

5. Data Quality and Integrity

Check the data for accuracy, completeness, and consistency, and choose a repository that supports version control so that changes to the dataset remain traceable.

6. Accessibility and identifiers

Share data in standard formats (e.g., CSV, JSON) through reliable repositories that offer long-term preservation and persistent identifiers for datasets.

7. Long-Term Preservation

Use repositories that ensure data backup and long-term access.
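Points 3 and 6 above can be sketched in a few lines of Python: the data is written as CSV, a small data dictionary is written as JSON alongside it, and a check confirms that every column is documented. This is a minimal illustration only. The column names, descriptions, licence value, and dictionary layout are hypothetical and do not follow any particular metadata standard; a repository will usually prescribe its own metadata schema.

```python
import csv
import io
import json


def to_csv_text(rows, fieldnames):
    """Serialize a list of dict rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def undocumented_columns(fieldnames, dictionary):
    """Return the columns that have no entry in the data dictionary."""
    documented = {variable["name"] for variable in dictionary["variables"]}
    return [name for name in fieldnames if name not in documented]


# Hypothetical example dataset.
fieldnames = ["site", "temperature_c"]
rows = [
    {"site": "A", "temperature_c": 4.5},
    {"site": "B", "temperature_c": 6.1},
]

# Hypothetical data dictionary: the layout is an assumption, not a standard.
data_dictionary = {
    "title": "Example temperature measurements",
    "licence": "CC BY 4.0",
    "collection_method": "Manual readings, one per site",
    "variables": [
        {"name": "site", "description": "Measurement site code", "unit": None},
        {"name": "temperature_c", "description": "Air temperature",
         "unit": "degrees Celsius"},
    ],
}

csv_text = to_csv_text(rows, fieldnames)
dictionary_text = json.dumps(data_dictionary, indent=2)
missing = undocumented_columns(fieldnames, data_dictionary)
```

Here `csv_text` and `dictionary_text` would be saved as two files deposited together, and a non-empty `missing` list signals that the documentation is incomplete.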

Data journals

It is also possible to share datasets in journals. Data journals are publications whose primary purpose is to share datasets instead of sharing other results of the research. Publishing in a data journal may be of interest to researchers and data producers for whom data is a primary research output. In some cases, the publication cycle may be quicker than that of traditional journals, and where there is a requirement to deposit data in an approved repository, long-term curation and access to the data is assured. (UEF Data Management Course Material - Data Sharing, CC BY)

LUT recommendations

Further data repositories, searchable by e.g. field of science or subject, can be found in the Registry of Research Data Repositories Re3Data.org. The OpenDOAR service also lists repositories that contain open datasets.

Digital preservation for research data

Digital preservation refers to the reliable preservation of digital information for several decades or even centuries to come. Hardware, software, and file formats will become outdated, while the information must be preserved. Reliable digital preservation requires active monitoring of information integrity and anticipation of various risks. Metadata, which describes, for example, the information content, provenance information, and how the content can be used, has a key role in this.
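One common building block of integrity monitoring is a checksum manifest: a checksum is recorded for every file, and the files are later re-checked against it. The Python sketch below illustrates the idea under simple assumptions (SHA-256 checksums, a single directory, a temporary folder with a hypothetical file for the demonstration). It is not a description of how any particular preservation service works.

```python
import hashlib
import tempfile
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def build_manifest(directory: Path) -> dict:
    """Map each file's path (relative to the directory) to its checksum."""
    return {
        str(path.relative_to(directory)): file_sha256(path)
        for path in sorted(directory.rglob("*"))
        if path.is_file()
    }


def find_problems(directory: Path, manifest: dict) -> list:
    """List files that are missing or whose content no longer matches."""
    problems = []
    for name, expected in manifest.items():
        path = directory / name
        if not path.is_file():
            problems.append((name, "missing"))
        elif file_sha256(path) != expected:
            problems.append((name, "changed"))
    return problems


# Demonstration in a temporary folder with a hypothetical data file.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "data.csv").write_text("id,value\n1,3.5\n", encoding="utf-8")
    manifest = build_manifest(root)
    problems_before = find_problems(root, manifest)

    # Simulate an unintended change to the stored file.
    (root / "data.csv").write_text("id,value\n1,9.9\n", encoding="utf-8")
    problems_after = find_problems(root, manifest)
```

In the demonstration, `problems_before` is empty and `problems_after` reports `data.csv` as changed. A checksum only detects that a file has changed; it does not help with outdated file formats, which need separate monitoring.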

In Finland, CSC produces centralized Digital Preservation Services (DPS), operating under the Ministry of Education and Culture. Each organization, such as LUT, decides which of its content is significant enough to be preserved in the DPS.

The decision may take the following into account:

  • Significant potential for continued use of the data.
  • Resources invested in producing the data.
  • The difficulty, expense, or impossibility of replicating the research.
  • The importance of the data, either nationally or in terms of the organization's profile or expertise.

Data re-use

Re-using the data

There are many aspects to consider when contemplating the re-use of research data, both from the viewpoint of the re-user and the original data creator or owner. These aspects include, for example, data citations and the licences used to open the data.

Data citations

When using archived research data, you should always cite the original researcher according to the same principles as when citing publications. The right to be acknowledged as the creator is an important aspect of copyright and also contributes to the researcher’s merit. Researchers can get credit not only for their publications but also for their research data. You should cite any data used as a primary or secondary source.

Elements included in the data citation

Below are the key elements that should be included when citing a dataset. The format of the citation depends on the citation style used.

  • Author(s), creator(s) or contributor(s)
  • Publication date
  • Title
  • Publisher: The organisation owning or hosting the data
  • Persistent identifier and location: DOI, Handle, URN etc. or URL of the dataset
  • Version or edition
  • Date accessed, when appropriate
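The elements listed above can be assembled into a citation string, for example with the small Python sketch below. The ordering and punctuation are illustrative assumptions and do not reproduce any specific citation style; the example values, including the DOI, are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetCitation:
    """Holds the key elements of a dataset citation."""
    creators: list
    year: int
    title: str
    publisher: str
    identifier: str
    version: Optional[str] = None
    accessed: Optional[str] = None

    def format(self) -> str:
        """Join the elements in an illustrative order (not an official style)."""
        parts = [f"{'; '.join(self.creators)} ({self.year})", self.title]
        if self.version:
            parts.append(f"Version {self.version}")
        parts.append(self.publisher)
        parts.append(self.identifier)
        text = ". ".join(parts)
        if self.accessed:
            text += f" (Accessed: {self.accessed})"
        return text


# Hypothetical example values: none of these refer to a real dataset.
citation = DatasetCitation(
    creators=["Example, A.", "Sample, B."],
    year=2024,
    title="Hypothetical survey dataset",
    publisher="Example Repository",
    identifier="https://doi.org/10.1234/example",
    version="1.0",
    accessed="26 Sept 2024",
)
citation_text = citation.format()
```

Optional elements (version and date accessed) are left out of the string when they are not provided. In practice, many repositories and reference managers generate dataset citations in the required style automatically.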

 

Using licences

When opening a dataset, it is crucial to know who owns the data and what rights apply to it. Usually, these are planned beforehand and recorded in the data management plan at the beginning of the research project. You need to establish who owns the data before you can license it. You also need to be sure that the dataset does not contain any confidential or third-party data which would prevent opening the dataset. Licensing a dataset is essential for managing how the data can be used, shared, and distributed. Choosing the appropriate licence and making the terms clear will help ensure that your dataset is used in ways that align with your goals and legal requirements.

At LUT, it is recommended to use the Creative Commons licences. Creative Commons has a licence selector that can help you choose the right licence for your data. Other possible licence systems are, for example, Open Data Commons (ODC) Licenses and the GNU General Public License (GPL), which is mainly suitable for software.

When using the Creative Commons licence system for datasets, CC BY (attribution) and CC0 (public domain dedication) are popular options. Usually, the CC0 licence is also used for metadata sharing.

The Creative Commons licence selection process is based on an image by Tarmo Toikkanen, 2014, CC0.

 

Sources:

University of Maryland (2024) Guide to Data Citation. Available at https://www.lib.umd.edu/guide-data-citation (Accessed: 26 Sept 2024)

University of British Columbia (2024) How to Cite - Data: Citation Elements. Available at https://guides.library.ubc.ca/c.php?g=707463&p=5035502 (Accessed: 26 Sept 2024)