Collect, organise, and store data

Last changed: 12 August 2024

Round bottom flask, atom and DNA molecule. Illustration.

Here you will find information on how to manage data during the active stage of a project. There are also tips on how to organise and describe data, and you can read about the considerations that need to be made in terms of information security, data protection and the principle of public access.

Organising, documenting, and describing data in a systematic manner right from the start of a project saves both time and energy, and, more importantly, improves the data quality. In addition, storing the data properly and securely will improve its integrity and further maximise its value.

Read more about planning data management: Plan data management.

Collecting new data and reusing existing data

When designing and planning a research project, there are a number of things that you need to take into consideration with regard to acquiring data. Addressing these issues early on in your project can save you time and effort later.

Depending on your research question, you may be working with primary data, reusing secondary data or a combination of both. Primary data is data that is collected or generated as part of your own research project. Secondary or third-party data is data that already exists and that has been collected or generated as part of someone else’s research project or activity. When it comes to reusing already existing data, you need to make sure you have permission to do so. Make certain you address all ethical, copyright, and rights management issues (e.g., intellectual property rights) that may apply to the data you will be working with.

Go to Discover, reuse, and cite data for advice on how and where to find and reuse already existing data.

It is important to consider both legal and ethical aspects of the data you intend to use. For instance, should you, as part of your research project, be collecting data containing personal or sensitive information, you need to ensure that you comply with data protection legislation and ethical guidelines (see section ‘Information security and data protection’ and Collecting persona data’ below for more information in this respect).

Organising and structuring data

When you collect, generate, acquire and, eventually, start processing data, it can quickly become disorganised. To save time and prevent errors later on, you should decide on how you will organise and name files. Choosing a logical and consistent system allows you and others to locate, identify, and retrieve files quickly and accurately. Ideally, the best time to think about this is at the start of your project. Thus, file management is fundamental to good data management.

Guidance on how to structure files and folders can be found in DMS' Guide to file organisation (pdf).

Researchdata.se has several pages and guides about organising and structuring data:

Version control

Datasets and data files often go through a number of versions. Also, data may be processed by more than one person and over a longer period of time. To ensure the integrity of the data, and make sure everyone involved is working on the right version of the data, it is important to keep track of and document the changes that are made. Version control is a way to track changes and revisions of a dataset.

A good idea – especially when working in collaboration with others – is to document your file management system (i.e., folder structure, file and folder naming convention, file versioning, and choice of file format) in a supporting ReadMe file. It is recommended to place such a ReadMe file within the top-level folder of your project, where it can be found easily by everyone involved. Note: make sure to update the ReadMe file upon changes to the file management system.

Good version control prevents errors, increases data quality and efficiency within a project, as well as facilitates reuse of the data at a later stage.There are a number of ways that version control can be managed, e.g. using file naming, version control tables or version control systems, as described below.

File naming

Using a revision numbering system (e.g., v01 for the first version, v02 for the second, etc.) to track changes to a file (example: 2020-07-28_ProjA_DMP_v01.docx, 2020-08-28_ProjA_DMP_v02.docx) and/or using initials (e.g., John Doe, JD) to identify who has made the changes (example: 2020-08-28_ProjA_DMP_v02_JD.docx).

More detailed information about file versioning can be found in:

Version control tables

Version control tables are included within the document itself and contain information such as version, date of change, name of person who made the change, and the nature and purpose of the change.

Version control systems

Version control systems are automated systems that monitor access and log changes made to a file. Examples of version control systems are git or Subversion.

Documenting data (creating metadata)

Data that has been correctly documented and annotated will have significant ongoing value and can continue to have an impact long after your research project has been completed. Thoroughly and accurately describing and attributing the data will help you, your future you, your collaborators, and others to find, understand, validate, and reuse your data. Start describing and annotating the data early on while the required information is readily available).

Metadata

Metadata is information about the data and one of the most important aspects of research data management. Metadata explains the context behind the structure and content of the data. It is used to describe and characterise elements of the data itself and should answer questions such as:

Why was the data acquired?
Who acquired the data?
Where was the data acquired from and according to what license?
Where, when, and how was the data acquired?
What is the content of the data?
How was the data assessed and evaluated?

Metadata at different levels

Ideally, you should provide at different levels: project level, file/dataset level and data item/variable level.

Project level

Include information regarding the aim and research question of a research project. Document what data is collected, generated, and/or acquired and how it was done. Provide details on the type (e.g., observation, experiment) and nature (e.g., numerical, textual, audio, video) of the data. Explain any abbreviations used.

File or dataset level

Include content, title, creator, date, format, version, data structure, and file relations in your documentation. Describe tools and abbreviations used.

Data item or variable level

Document variables and values, field names, units of measurements, classifications as well as any code written and abbreviations used.

Where to document data

There are different ways in which you can document your data. Certain metadata about a file or data may be embedded within the data or document itself, while some may be recorded in a separate, supporting document, such as a “ReadMe” text file. In any case, make sure such metadata or files are updated if the file or data is changed at any stage of your project, including the active stages of data processing and analysis.

See the Document data at Researchdata.se for more information about documenting research data.

Metadata standards

Applying a metadata standard when describing and annotating the data makes it easier to identify the data. It also makes comparing and combining data from different research projects easier. Making use of a metadata standard allows for a more structured description and annotation of the data and, as such, enables the metadata to be machine-readable.

Many research communities have agreed upon vocabularies and models for how to describe data. Make sure to use a community metadata standard where such is in place.

More information on discipline- or community-specific metadata standards as well as links to such, are available at the web site of the Digital Curation Centre (DCC).
Standard vocabularies and ontologies can be used when describing data. Linked Open Vocabularies, Agroportal ontologies, and org are examples of where you may be able to find suitable terms including definitions.

FAIR data and metadata

Structured and standardised (and thus machine-readable) metadata is important for making your data FAIR (Findable, Accessible, Interoperable, and Reusable).

Reproducibility

In addition to creating metadata for the data itself is it also important to document and record your workflow, i.e. all the actions and steps taken while creating, processing, and analysing the data (see below).

Doing so ensures that the data can be verified, and that your research results can be validated and reproduced. If you develop computer code as part of your research activity, the code is an essential part of the research process and you should make sure it is documented, preserved, and shared.

Documenting processes and workflows

Documenting experiments, field sampling, laboratory analyses, data processing and statistical analyses during the work is a good basis for well-written metadata and is necessary to make research reproducible. There are many tools and aids that can be useful for documenting processes and workflows, for example:

Electronic laboratory notebooks (ELN), which are widely used in life sciences to record and document workflows.
Literate programming, a programming paradigm that combines source code written in a programming language with rich text elements documenting the computational steps performed. One tool for literate programming is Jupyter Notebook which can be used with e.g. Python and R.
Self-contained computing environments (i.e., docker containers), an approach in which information about the dataset, tool, software (including version and all associated libraries), and operating system used at the time of processing and analysing is captured.

Storing and backing up data

Read more in our guide: How to store and back up scientific data at SLU.

Information security and data protection

Information security is about providing and maintaining appropriate protection of information. SLU’s data management policy stresses the importance of information security and states that data producers must ensure that research and environmental monitoring and assessment data is properly protected. To know what level of protection is appropriate, an information security classification should be performed. This involves assessing the aspects of confidentiality, correctness, and accessibility related to the data, determining risks and ensuring the relevant security measures are in place.

Ensuring that only authorised people have access to read and edit data is essential for keeping it protected.

Collecting personal data

If you are going to collect personal data, you need to make sure you comply with data protection legislation. Before collecting and processing such data, make sure you read the information on the SLU data protection web page. This contains both a Quick guide to data protection in research and a more extensive Data protection manual. Questions can be sent to the SLU data protection function at dataskydd@slu.se.

See also information about anonymisation and pseudonymisation on the page Process and analyse data.

Public access to information during ongoing research

Material produced as part of research as well as environmental monitoring and assessment activities carried out at SLU is the property of SLU. As SLU is a public institution, such material becomes official records/documents to which the public is guaranteed access under the Freedom of the Press Act. This means that anyone can request access to the material, even during the active stages of a project (e.g., data collection).

Public requests may in some cases be denied. Valid reasons to restrict access may be, for instance, that data contains personal or sensitive information or information pertaining to protected species.

Research material that has the status of an official record needs to be managed according to the same principles as other official records. That means such documents have to be accessible and in good order so they can be released to the public upon request. They also have to be archived and preserved for the future.

Note that working material, such as draft documents, notes or preliminary processed material, are generally not considered official documents, as long as they are not shared outside SLU.

See the Legal Affairs unit’s web page on Official records, public access, and secrecy in this respect).
Read more about official records in research in the manual for managing research material
More information about archiving research material can be found at Archive and preserve data as well as on the Archives, Information Governance, and Records unit’s Management and preservation web page
The Public Access to Information and Secrecy Act (SFS 2009:400).
Summary of the principle of public access to official documents
The Freedom of the Press Act (in Swedish)
the Archives Act (in Swedish).

Introduction to data management

This page is part of our introduction to data management. It covers the most common aspects of data management and includes best practice strategies, training resources and tips on data management tools. It is organised according to the data lifecycle (see below), a conceptual model that illustrates the different stages of data management.

Content

Data life cycle

The data life cycle model. CC BY SLU Data Management Support. All icons in the life cycle and on the pages are made by Prosymbols from www.flaticon.com.

Contact

Data Management Support (DMS)
dms@slu.se | www.slu.se/dms

Loading…