2.2 BIDS Structure

BIDS stands for the Brain Imaging Data Structure, a simple and intuitive way to organize and describe your data. The whole specification can be found on the official website; the excerpts below are taken from it.

Before we go into details, let’s clarify several important terms.

Dataset - a set of neuroimaging and behavioral data acquired for the purpose of a particular study.
Subject - a person or animal participating in the study. Used interchangeably with the term Participant.
Session - a logical grouping of neuroimaging and behavioral data consistent across subjects. Generally, subjects remain in the scanner for one session.
Sample - a sample pertaining to a subject, such as tissue, primary cell, or cell-free sample. The sample-<label> key/value pair is used to distinguish between different samples from the same subject.
Data type - a functional group of different types of data. BIDS defines the following data types:
1. func (task based and resting state functional MRI)
2. dwi (diffusion weighted imaging)
3. fmap (field inhomogeneity mapping data such as field maps)
4. anat (structural imaging such as T1, T2, PD, and so on)
5. perf (perfusion)
6. meg (magnetoencephalography)
7. eeg (electroencephalography)
8. ieeg (intracranial electroencephalography)
9. beh (behavioral)
10. pet (positron emission tomography)
11. micr (microscopy)
Data files are contained in a directory named for the data type. In raw datasets, the data type directory is nested inside subject and (optionally) session directories.
Task - a set of structured activities performed by the participant. Tasks are usually accompanied by stimuli and responses, and can greatly vary in complexity.
Event - something that happens or may be perceived by a test subject as happening at a particular instant during the recording. Events are most commonly associated with on- or offset of stimulus presentations, or with the distinct marker of on- or offset of a subject’s response or motor action.
Run - an uninterrupted repetition of data acquisition that has the same acquisition parameters and task. Run is a synonym of a data acquisition.
<index> - a nonnegative integer, possibly prefixed with an arbitrary number of 0s for consistent indentation, for example, it is 01 in run-01 following the run-<index> specification.
<label> - an alphanumeric value, possibly prefixed with an arbitrary number of 0s for consistent indentation, for example, it is rest in task-rest following the task-<label> specification. Note that labels MUST NOT collide when casing is ignored (see Case collision intolerance).
suffix - an alphanumeric value, located after the key-value_ pairs (thus after the final _), right before the File extension, for example, it is eeg in sub-05_task-matchingpennies_eeg.vhdr.

File name structure

A filename consists of a chain of entities, or key-value pairs, a suffix and an extension. Two prominent examples of entities are subject and session.

For a data file that was collected in a given session from a given subject, the filename MUST begin with the string sub-<label>_ses-<label>. If the session level is omitted in the folder structure, the filename MUST begin with the string sub-<label>, without ses-<label>.

Note that sub-<label> corresponds to the subject entity because it has the sub- “key” and the <label> “value”, where <label> would, in a real data file, be a unique identifier of that subject, such as 01. The same holds for the session entity with its ses- key and its <label> value.

The extra session layer (at least one /ses-<label> subfolder) SHOULD be added for all subjects if at least one subject in the dataset has more than one session. If a /ses-<label> subfolder is included as part of the directory hierarchy, then the same ses-<label> key/value pair MUST also be included as part of the filenames themselves. The acquisition time of a session can be defined in the sessions file.

A chain of entities, followed by a suffix, connected by underscores (_) produces a human readable filename, such as sub-01_task-rest_eeg.edf.

Entities within a filename MUST be unique. For example, the following filename is not valid because it uses the acq entity twice: sub-01_acq-laser_acq-uneven_electrodes.tsv

A summary of all entities in BIDS and the order in which they MUST be specified is available in the entity table in the official website appendix.
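The filename rules above can be sketched in a few lines of Python. This is a minimal illustration, not an official parser: the function name parse_bids_filename and the regular expression are assumptions of this sketch, and it does not check the entity order given in the entity table.

```python
import re

# Assumption of this sketch: keys are alphabetic, labels/indices are alphanumeric.
ENTITY_RE = re.compile(r"^([a-zA-Z]+)-([a-zA-Z0-9]+)$")


def parse_bids_filename(filename):
    """Split a BIDS-style filename into (entities, suffix, extension)."""
    # Everything from the first dot onward is treated as the extension (e.g. ".nii.gz").
    stem, dot, ext = filename.partition(".")
    extension = dot + ext
    # The suffix is the last underscore-separated chunk; the rest are entities.
    *entity_parts, suffix = stem.split("_")
    entities = {}
    for part in entity_parts:
        match = ENTITY_RE.match(part)
        if match is None:
            raise ValueError(f"not a key-value entity: {part!r}")
        key, value = match.groups()
        # Entities within a filename MUST be unique.
        if key in entities:
            raise ValueError(f"entity {key!r} appears more than once")
        entities[key] = value
    return entities, suffix, extension


entities, suffix, extension = parse_bids_filename("sub-05_task-matchingpennies_eeg.vhdr")
print(entities, suffix, extension)
```

Calling the same function on sub-01_acq-laser_acq-uneven_electrodes.tsv raises a ValueError, because the acq entity appears twice.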

Entity-linked file collections

An entity-linked file collection is a set of files that are related to each other based on a repetitive acquisition of sequential data by changing acquisition parameters one (or multiple) at a time or by being inherent components of the same data. Entity-linked collections are identified by a common suffix, indicating the group of files that should be considered a logical unit. Within each collection, files MUST be distinguished from each other by at least one entity (for example, echo) that corresponds to an altered acquisition parameter (EchoTime) or that defines a component relationship (for example, part).
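As a rough illustration of this idea, the sketch below groups filenames that differ only in one entity (echo by default) into collections. The function name group_by_collection and the example filenames are hypothetical and chosen only for demonstration; real tools would rely on the list of collections defined in the specification.

```python
from collections import defaultdict


def group_by_collection(filenames, varying_entity="echo"):
    """Group files whose names differ only in the given entity."""
    groups = defaultdict(list)
    prefix = varying_entity + "-"
    for name in filenames:
        stem, dot, ext = name.partition(".")
        # Drop the entity that distinguishes members of the same collection.
        kept = [part for part in stem.split("_") if not part.startswith(prefix)]
        groups["_".join(kept) + dot + ext].append(name)
    return dict(groups)


# Hypothetical filenames used only for illustration.
files = [
    "sub-01_echo-1_MESE.nii.gz",
    "sub-01_echo-2_MESE.nii.gz",
    "sub-01_T1w.nii.gz",
]
print(group_by_collection(files))
```

Files sharing the same key after the varying entity is removed (and therefore the same suffix) end up in one group.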

Source vs. raw vs. derived data

BIDS was originally designed to describe and apply consistent naming conventions to raw (unprocessed or minimally processed due to file format conversion) data. During analysis such data will be transformed and partial as well as final results will be saved. Derivatives of the raw data (other than products of DICOM to NIfTI conversion) MUST be kept separate from the raw data. This way one can protect the raw data from accidental changes by file permissions.

Similar rules apply to source data, which is defined as data before harmonization, reconstruction, and/or file format conversion (for example, E-Prime event logs or DICOM files). Storing actual source files with the data is preferred over links to external source repositories to maximize long-term preservation, which would suffer if an external repository were no longer available. This specification currently does not go into the details of recommending a particular naming scheme for including different types of source data (such as the raw event logs or parameter files, before conversion to BIDS). However, in the case that these data are to be included:

These data MUST be kept in a separate sourcedata folder with a folder structure similar to the one presented below for the BIDS-managed data. For example: sourcedata/sub-01/ses-pre/func/sub-01_ses-pre_task-rest_bold.dicom.tgz or sourcedata/sub-01/ses-pre/func/MyEvent.sce.
A README file SHOULD be found at the root of the sourcedata folder or the derivatives folder, or both. This file should describe the nature of the raw data or the derived data. We RECOMMEND including the PDF print-out with the actual sequence parameters generated by the scanner in the sourcedata folder.

Alternatively, one can organize the data in the following way:

└─ my_dataset-1/
   ├─ sourcedata/
   │  ├─ sub-01/
   │  ├─ sub-02/
   │  └─ ... 
   ├─ ... 
   └─ derivatives/
      ├─ pipeline_1/
      ├─ pipeline_2/
      └─ ...

Storage of derived datasets

Derivatives can be stored/distributed in two ways:

Under a derivatives/ subfolder in the root of the source BIDS dataset folder to make a clear distinction between raw data and results of data processing. A data processing pipeline will typically have a dedicated directory under which it stores all of its outputs. Different components of a pipeline can, however, also be stored under different subfolders. There are few restrictions on the directory names; it is RECOMMENDED to use the format <pipeline>-<variant> in cases where it is anticipated that the same pipeline will output more than one variant (for example, AFNI-blurring and AFNI-noblurring). For the sake of consistency, the subfolder name SHOULD be the GeneratedBy.Name field in dataset_description.json, optionally followed by a hyphen and a suffix (see Derived dataset and pipeline description).

Example of derivatives with one directory per pipeline:
```
<dataset>/derivatives/fmriprep-v1.4.1/sub-0001
<dataset>/derivatives/spm/sub-0001
<dataset>/derivatives/vbm/sub-0001
```
Example of a pipeline with split derivative directories:
```
<dataset>/derivatives/spm-preproc/sub-0001
<dataset>/derivatives/spm-stats/sub-0001
```
Example of a pipeline with nested derivative directories:
```
<dataset>/derivatives/spm-preproc/sub-0001
<dataset>/derivatives/spm-preproc/derivatives/spm-stats/sub-0001
```
As a standalone dataset independent of the source (raw or derived) BIDS dataset. This way of specifying derivatives is particularly useful when the source dataset is provided with read-only access, for publishing derivatives as independent bodies of work, or for describing derivatives that were created from more than one source dataset. The sourcedata/ subdirectory MAY be used to include the source dataset(s) that were used to generate the derivatives. Likewise, any code used to generate the derivatives from the source data MAY be included in the code/ subdirectory.

Example of a derivative dataset including the raw dataset as source:

└─ my_processed_data/
   ├─ code/
   │  ├─ processing_pipeline-1.0.0.img 
   │  ├─ hpc_submitter.sh 
   │  └─ ... 
   ├─ sourcedata/
   │  ├─ sub-01/
   │  ├─ sub-02/
   │  └─ ... 
   ├─ sub-01/
   ├─ sub-02/
   └─ ...

File format specification

Imaging files

All imaging data MUST be stored using the NIfTI file format. We RECOMMEND using compressed NIfTI files (.nii.gz), either version 1.0 or 2.0. Imaging data SHOULD be converted to the NIfTI format using a tool that provides as much of the NIfTI header information (such as orientation and slice timing information) as possible. Since the NIfTI standard offers limited support for the various image acquisition parameters available in DICOM files, we RECOMMEND that users provide additional meta information extracted from DICOM files in a sidecar JSON file (with the same filename as the .nii[.gz] file, but with a .json extension). Extraction of BIDS compatible metadata can be performed using dcm2niix and dicm2nii DICOM to NIfTI converters. The BIDS-validator will check for conflicts between the JSON file and the data recorded in the NIfTI header.
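The sidecar naming rule (same filename as the .nii[.gz] file, but with a .json extension) can be sketched as follows. The helper name sidecar_path is an assumption of this example, not part of any BIDS tool.

```python
from pathlib import Path


def sidecar_path(image_path):
    """Return the JSON sidecar path for a .nii or .nii.gz image."""
    path = Path(image_path)
    # Check the longer extension first so ".nii.gz" is not mistaken for ".gz" only.
    for ext in (".nii.gz", ".nii"):
        if path.name.endswith(ext):
            return path.with_name(path.name[: -len(ext)] + ".json")
    raise ValueError(f"not a NIfTI file: {image_path}")


print(sidecar_path("sub-control01/anat/sub-control01_T1w.nii.gz"))
```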

Tabular files

Tabular data MUST be saved as tab delimited values (.tsv) files, that is, CSV files where commas are replaced by tabs. Tabs MUST be true tab characters and MUST NOT be a series of space characters. Each TSV file MUST start with a header line listing the names of all columns (with the exception of physiological and other continuous recordings). Names MUST be separated with tabs. It is RECOMMENDED that the column names in the header of the TSV file are written in snake_case with the first letter in lower case (for example, variable_name, not Variable_name). String values containing tabs MUST be escaped using double quotes. Missing and non-applicable values MUST be coded as n/a. Numerical values MUST employ the dot (.) as decimal separator and MAY be specified in scientific notation, using e or E to separate the significand from the exponent. TSV files MUST be in UTF-8 encoding.

Example:

onset   duration    response_time   correct stop_trial  go_trial
200 200 0   n/a n/a n/a
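A minimal sketch of writing such a file with Python's standard csv module is shown below. The helper name write_tsv is hypothetical; the sketch relies on the csv module's default quoting to wrap values containing tabs in double quotes, and codes missing values as n/a.

```python
import csv
import io


def write_tsv(rows, columns, stream):
    """Write dictionaries as tab-separated rows, coding missing values as n/a."""
    writer = csv.writer(stream, delimiter="\t", lineterminator="\n")
    writer.writerow(columns)  # header line listing all column names
    for row in rows:
        writer.writerow(["n/a" if row.get(col) is None else row[col] for col in columns])


columns = ["onset", "duration", "response_time", "correct", "stop_trial", "go_trial"]
rows = [{"onset": 200, "duration": 200, "response_time": 0}]

buffer = io.StringIO()
write_tsv(rows, columns, buffer)
print(buffer.getvalue())
```

When writing to disk instead of an in-memory buffer, open the file with open(path, "w", encoding="utf-8", newline="") so that the output is UTF-8 encoded.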

Key/value files (dictionaries)

JavaScript Object Notation (JSON) files MUST be used for storing key/value pairs. JSON files MUST be in UTF-8 encoding. Extensive documentation of the format can be found at https://www.json.org/, and at https://tools.ietf.org/html/std90. Several editors have built-in support for JSON syntax highlighting that aids manual creation of such files. An online editor for JSON with built-in validation is available at https://jsoneditoronline.org. It is RECOMMENDED that keys in a JSON file are written in CamelCase with the first letter in upper case (for example, SamplingFrequency, not samplingFrequency). Note however, when a JSON file is used as an accompanying sidecar file for a TSV file, the keys linking a TSV column with their description in the JSON file need to follow the exact formatting as in the TSV file.

Example of a hypothetical *_bold.json file, accompanying a *_bold.nii file:

{
  "RepetitionTime": 3,
  "Instruction": "Lie still and keep your eyes open"
}

Example of a hypothetical *_events.json file, accompanying an *_events.tsv file. Note that the JSON file contains a key describing an arbitrary column stim_presentation_side in the TSV file it accompanies.

{
  "stim_presentation_side": {
    "Levels": {
      "1": "stimulus presented on LEFT side",
      "2": "stimulus presented on RIGHT side"
    }
  }
}
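Because sidecar keys must match the TSV column names exactly, a quick consistency check can catch formatting mismatches. The sketch below is an illustration under the assumption that every top-level key in the sidecar is meant to describe a column; the function name unmatched_sidecar_keys and the sample data are hypothetical.

```python
import json


def unmatched_sidecar_keys(tsv_text, sidecar_text):
    """Return sidecar keys that do not exactly match any TSV column name."""
    header = tsv_text.splitlines()[0].split("\t")
    sidecar = json.loads(sidecar_text)
    return [key for key in sidecar if key not in header]


# Hypothetical events file and sidecar, for illustration only.
events_tsv = "onset\tduration\tstim_presentation_side\n1.0\t0.5\t1\n"
events_json = '{"stim_presentation_side": {"Levels": {"1": "LEFT", "2": "RIGHT"}}}'
print(unmatched_sidecar_keys(events_tsv, events_json))
```

An empty list means every described key corresponds to a column; a key written as StimPresentationSide, for instance, would be reported as unmatched.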

Directory structure

Single session example

This is an example of the folder and file structure. Because there is only one session, the session level is not required by the format. For details on individual files see descriptions in the next section:

├─ sub-control01/
│  ├─ anat/
│  │  ├─ sub-control01_T1w.nii.gz 
│  │  ├─ sub-control01_T1w.json 
│  │  ├─ sub-control01_T2w.nii.gz 
│  │  └─ sub-control01_T2w.json 
│  ├─ func/
│  │  ├─ sub-control01_task-nback_bold.nii.gz 
│  │  ├─ sub-control01_task-nback_bold.json 
│  │  ├─ sub-control01_task-nback_events.tsv 
│  │  ├─ sub-control01_task-nback_physio.tsv.gz 
│  │  ├─ sub-control01_task-nback_physio.json 
│  │  └─ sub-control01_task-nback_sbref.nii.gz 
│  ├─ dwi/
│  │  ├─ sub-control01_dwi.nii.gz 
│  │  ├─ sub-control01_dwi.bval 
│  │  └─ sub-control01_dwi.bvec 
│  └─ fmap/
│     ├─ sub-control01_phasediff.nii.gz 
│     ├─ sub-control01_phasediff.json 
│     └─ sub-control01_magnitude1.nii.gz 
├─ code/
│  └─ deface.py 
├─ derivatives/
├─ README 
├─ participants.tsv 
├─ dataset_description.json 
└─ CHANGES
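The relationship between the directory hierarchy and the filenames can be sketched as a small path builder. This is a simplified illustration: the function name bids_path is hypothetical and only the sub, ses, and task entities are handled.

```python
from pathlib import PurePosixPath


def bids_path(sub, datatype, suffix, extension, ses=None, task=None):
    """Build a relative path such as sub-01/func/sub-01_task-rest_bold.nii.gz."""
    entities = [f"sub-{sub}"]
    directory = PurePosixPath(f"sub-{sub}")
    if ses is not None:
        # A session directory implies the same ses-<label> in the filename.
        entities.append(f"ses-{ses}")
        directory = directory / f"ses-{ses}"
    if task is not None:
        entities.append(f"task-{task}")
    filename = "_".join(entities + [suffix]) + extension
    return directory / datatype / filename


print(bids_path("control01", "func", "bold", ".nii.gz", task="nback"))
print(bids_path("01", "func", "bold", ".nii.gz", ses="pre", task="rest"))
```

The first call reproduces a path from the single session example above; the second shows how the session level appears both as a directory and in the filename.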

Unspecified data

Additional files and folders containing raw data MAY be added as needed for special cases. All non-standard file entities SHOULD conform to BIDS-style naming conventions, including alphabetic entities and suffixes and alphanumeric labels/indices. Non-standard suffixes SHOULD reflect the nature of the data, and existing entities SHOULD be used when appropriate. For example, an ASSET calibration scan might be named sub-01_acq-ASSET_calibration.nii.gz.

Non-standard files and directories should be named with care. Future BIDS efforts may standardize new entities and suffixes, changing the meaning of filenames and setting requirements on their contents or metadata. Validation and parsing tools MAY treat the presence of non-standard files and directories as an error, so consult the details of these tools for mechanisms to suppress warnings or provide interpretations of your filenames.

BIDS common data types

Each derivative data file SHOULD be described by a JSON file provided as a sidecar or higher up in the hierarchy of the derived dataset (according to the Inheritance Principle) unless a particular derivative includes REQUIRED metadata fields, in which case a JSON file is also REQUIRED. Each derivative type defines their own set of fields, but all of them share the following (non-required) ones:

Examples

Preprocessed bold NIfTI file in the original coordinate space of the original run. The location of the file in the original datasets is encoded in the RawSources metadata, and _desc-<label> is used to prevent clashing with the original filename.

└─ sub-01/
   └─ func/
      ├─ sub-01_task-rest_desc-preproc_bold.nii.gz 
      └─ sub-01_task-rest_desc-preproc_bold.json

{
    "RawSources": ["sub-01/func/sub-01_task-rest_bold.nii.gz"]
}

If this file was generated with prior knowledge from additional sources, such as the same subject’s T1w, then both files MAY be included in RawSources.

{
    "RawSources": [
        "sub-01/func/sub-01_task-rest_bold.nii.gz",
        "sub-01/anat/sub-01_T1w.nii.gz"
    ]
}

On the other hand, if a preprocessed version of the T1w image was used, and it also occurs in the derivatives, Sources and RawSources can both be specified.

{
    "Sources": [
        "sub-01/anat/sub-01_desc-preproc_T1w.nii.gz"
    ],
    "RawSources": [
        "sub-01/func/sub-01_task-rest_bold.nii.gz"
    ]
}
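Sidecars like the ones above could be generated programmatically. The following is a minimal sketch; the helper name provenance_sidecar is an assumption, and only the Sources and RawSources keys shown in the examples are written.

```python
import json


def provenance_sidecar(raw_sources, sources=None):
    """Return JSON text recording where a derivative file came from."""
    metadata = {}
    if sources:
        # Derivative files that served as inputs.
        metadata["Sources"] = list(sources)
    # Files from the raw dataset that served as inputs.
    metadata["RawSources"] = list(raw_sources)
    return json.dumps(metadata, indent=4)


print(provenance_sidecar(
    raw_sources=["sub-01/func/sub-01_task-rest_bold.nii.gz"],
    sources=["sub-01/anat/sub-01_desc-preproc_T1w.nii.gz"],
))
```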

Spatial references

Derivatives are often aligned to a common spatial reference to allow for the comparison of acquired data across runs, sessions, subjects or datasets. A file may indicate the spatial reference to which it has been aligned using the space entity and/or the SpatialReference metadata.

The space entity may take any value in Image-Based Coordinate Systems.

If the space entity is omitted, or the space is not in the Standard template identifiers table, then the SpatialReference metadata is REQUIRED.

SpatialReference key allowed values

In the case of images with multiple references, an object must link the relevant structures to reference files. If a single volumetric reference is used for multiple structures, the VolumeReference key MAY be used to reduce duplication. For CIFTI-2 images, the relevant structures are BrainStructure values defined in the BrainModel elements found in the CIFTI-2 header.

Examples

Preprocessed bold NIfTI file in individual coordinate space. Note that in this case the SpatialReference key is REQUIRED.

└─ sub-01/
   └─ func/
      ├─ sub-01_task-rest_space-individual_bold.nii.gz 
      └─ sub-01_task-rest_space-individual_bold.json

{
    "SpatialReference": "sub-01/anat/sub-01_desc-combined_T1w.nii.gz"
}

Preprocessed bold CIFTI-2 files that have been sampled to the fsLR surface meshes defined in the Conte69 atlas along with the MNI152NLin6Asym template. In this example, because all volumetric structures are sampled to the same reference, the VolumeReference key is used as a default, and only the surface references need to be specified by BrainStructure names.

└─ sub-01/
   └─ func/
      ├─ sub-01_task-rest_space-fsLR_den-91k_bold.dtseries.nii 
      └─ sub-01_task-rest_space-fsLR_den-91k_bold.json

{
    "SpatialReference": {
        "VolumeReference": "https://templateflow.s3.amazonaws.com/tpl-MNI152NLin6Asym_res-02_T1w.nii.gz",
        "CIFTI_STRUCTURE_CORTEX_LEFT": "https://github.com/mgxd/brainplot/raw/master/brainplot/Conte69_Atlas/Conte69.L.midthickness.32k_fs_LR.surf.gii",
        "CIFTI_STRUCTURE_CORTEX_RIGHT": "https://github.com/mgxd/brainplot/raw/master/brainplot/Conte69_Atlas/Conte69.R.midthickness.32k_fs_LR.surf.gii"
    }
}
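One way to read such metadata is sketched below: a structure-specific entry is used when present, and VolumeReference serves as the fallback. The function name reference_for and the short placeholder file names are hypothetical, and the fallback logic is an interpretation of the description above rather than a reference implementation.

```python
def reference_for(spatial_reference, structure):
    """Look up the reference file for a structure in SpatialReference metadata."""
    # A plain string means a single reference applies to the whole image.
    if isinstance(spatial_reference, str):
        return spatial_reference
    # Prefer a structure-specific entry, such as a surface mesh.
    if structure in spatial_reference:
        return spatial_reference[structure]
    # Otherwise fall back to the shared volumetric reference, if any.
    return spatial_reference.get("VolumeReference")


# Placeholder values standing in for real reference files or URLs.
metadata = {
    "VolumeReference": "volume_template.nii.gz",
    "CIFTI_STRUCTURE_CORTEX_LEFT": "left_midthickness.surf.gii",
    "CIFTI_STRUCTURE_CORTEX_RIGHT": "right_midthickness.surf.gii",
}
print(reference_for(metadata, "CIFTI_STRUCTURE_CORTEX_LEFT"))
print(reference_for(metadata, "CIFTI_STRUCTURE_THALAMUS_LEFT"))
```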

Other important parts of the specification can be found on the official website, for example:

Preprocessed or cleaned data

BIDS derivatives
