Columns
=======

The "columns" of a data frame are slices of comparable data items across each row
in a frame (see :ref:`Frame sets` for a description of the rows), e.g.

* a T1-weighted MR acquisition for each imaging session
* a genetic test for each subject
* an fMRI activation map derived for each study group.

.. TODO: visualisation of data frame

A data frame is defined by adding "source" columns to access existing (typically
acquired) data, and "sink" columns to define where derivatives will be stored
within the data tree. The "row frequency" argument of the column (e.g. per
'session', 'subject', etc.) specifies which data frame the column belongs to.
The datatype of a column's member items (see :ref:`Entries`) must be consistent
and is also specified when the column is created.

The data items (e.g. files, scans) within a source column do not need to have
consistent labels throughout the dataset, although consistent labelling makes
things easier where it is possible. To handle the case of inconsistent
labelling, source columns can match single items in each row of the frame based
on several criteria:

* **path** - label for the file-group or field

  * scan type for XNAT stores
  * relative file path from row sub-directory for file-system/BIDS stores
  * is treated as a regular expression if the ``is_regex`` flag is set.

* **quality threshold** - the minimum quality for the item to be included

  * only applicable for XNAT_ stores, where the quality can be set by UI or API

* **header values** - header values are sometimes needed to distinguish files

  * only available for selected item formats such as :class:`.medimage.Dicom`

* **order** - the order in which an item appears in the data row

  * e.g. first T1-weighted scan that meets all other criteria in a session

If no items, or multiple items, are matched, then an error is raised. The *order*
flag can be used to select one of multiple valid options.
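
The matching rules above can be sketched in plain Python. This is only an
illustration of the selection logic and not FrameTree's implementation: the
``Item`` class, the ``match_item`` function and the ``QUALITY_RANK`` ordering
are hypothetical, header-value matching is left out, and *order* is assumed
here to be a zero-based index.

.. code-block:: python

    import re
    from dataclasses import dataclass

    # Hypothetical quality ranking, best first (an assumption for illustration)
    QUALITY_RANK = {"usable": 0, "questionable": 1, "unusable": 2}


    @dataclass
    class Item:
        """Hypothetical stand-in for a data item found in one row of a frame."""

        path: str
        quality: str = "usable"


    def match_item(items, path, is_regex=False, quality_threshold=None, order=None):
        """Return the single item in a row that meets all criteria, or raise."""
        if is_regex:
            # The whole label must match the pattern
            pattern = re.compile(path)
            matches = [i for i in items if pattern.fullmatch(i.path)]
        else:
            matches = [i for i in items if i.path == path]
        if quality_threshold is not None:
            # Keep items that are at least as good as the threshold
            worst = QUALITY_RANK[quality_threshold]
            matches = [i for i in matches if QUALITY_RANK[i.quality] <= worst]
        if order is not None:
            # Pick one of several valid candidates by position (zero-based here)
            if order >= len(matches):
                raise LookupError(
                    f"Only {len(matches)} item(s) match {path!r}, "
                    f"cannot select order={order}"
                )
            return matches[order]
        if len(matches) != 1:
            raise LookupError(
                f"Expected exactly one match for {path!r}, found {len(matches)}"
            )
        return matches[0]


    # One imaging session with inconsistently labelled T1-weighted scans
    session = [
        Item("localizer"),
        Item("t1_mprage_sag", quality="unusable"),
        Item("t1_mprage_sag_repeat"),
        Item("t1_mprage_cor"),
    ]

    selected = match_item(
        session,
        r".*t1_mprage.*",
        is_regex=True,
        quality_threshold="usable",
        order=0,
    )
    print(selected.path)  # t1_mprage_sag_repeat

Without the ``order`` argument the same call raises an error, because two
usable scans match the pattern.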
The ``path`` argument provided to sink columns defines where derived data will
be stored within the dataset:

* the resource name for XNAT stores
* the relative path to the target location for file-system stores

Each column is assigned a name when it is created, which is used when
connecting pipeline inputs and outputs to the dataset and when accessing the
data directly. The column name is used as the default value for the path of
sink columns.

Use the ':ref:`frametree add-source`' and ':ref:`frametree add-sink`' commands
to add columns to a dataset using the CLI.

.. code-block:: console

    $ frametree add-source 'xnat-central//MYXNATPROJECT' T1w \
      medimage/dicom-series --path '.*t1_mprage.*' \
      --order 1 --quality usable --regex

    $ frametree add-sink '/data/imaging/my-project' fmri_activation_map \
      medimage/nifti-gz --row-frequency group

Alternatively via the Python API:

.. toggle:: Show/Hide Python Code Example

    .. code-block:: python

        from frametree.common import Clinical
        from fileformats.medimage import DicomSeries, NiftiGz

        # xnat_dataset and fs_dataset are frame sets that have already been
        # defined or loaded

        # Source column matching the T1-weighted scan in each session
        xnat_dataset.add_source(
            name='T1w',
            path=r'.*t1_mprage.*',
            datatype=DicomSeries,
            order=1,
            quality_threshold='usable',
            is_regex=True
        )

        # Sink column storing one derivative per study group
        fs_dataset.add_sink(
            name='brain_template',
            datatype=NiftiGz,
            row_frequency='group'
        )

Once defined, the column data can be conveniently accessed and manipulated via
the Python API:

.. toggle:: Show/Hide Python Code Example

    .. code-block:: python

        import matplotlib.pyplot as plt
        from frametree.core import FrameSet

        # Get a column containing all T1-weighted MRI images across the dataset
        xnat_dataset = FrameSet.load('xnat-central//MYXNATPROJECT')
        t1w = xnat_dataset['T1w']

        # Plot a slice of the image data from Subject sub01's imaging session
        # at Timepoint TP2. (Note: such data access is only available for
        # selected data formats that have convenient Python readers)
        plt.imshow(t1w['TP2', 'sub01'].data[:, :, 30])

NB: one of the main benefits of using datasets in BIDS_ format is that the names
and file formats of the data are strictly defined. This allows the :class:`.Bids`
data store object to automatically add sources to the dataset when it is
initialised.

.. code-block:: python

    from frametree.bids import Bids

    bids_dataset = Bids().dataset(
        id='/data/openneuro/ds00014')

    # Print dimensions of T1-weighted MRI image for Subject 'sub01'
    print(bids_dataset['T1w']['sub01'].header['dim'])


Entries
-------

Atomic entries within a dataset contain either file-based data or text/numeric
fields. In FrameTree, these data items are represented using fileformats_
classes: :class:`.FileSet` (i.e. single files, files + header/side-cars or
directories) and :class:`.Field` (e.g. integer, decimal, text, boolean, or
arrays thereof), respectively.

Data types/file formats can be specified in the CLI using their MIME type or a
"MIME-like" string, in which the registry and type name correspond directly to
the fileformats sub-package and class name. These take the form
*sub-package/class-name* in "kebab case", e.g. ``medimage/nifti-gz``.

Some frequently used data types are

* ``text/plain`` - a text file
* ``application/zip`` - a zip archive
* ``application/json`` - a JSON file
* ``generic/file`` - a single file of any type
* ``generic/directory`` - a directory containing any files/sub-directories
* ``medimage/nifti-gz-x`` - a gzipped NIfTI file with a BIDS_ JSON side-car (produced by Dcm2Niix_)
* ``medimage/dicom-series`` - a directory containing a series of DICOM files
* ``field/text`` - a text field
* ``field/decimal`` - a decimal field

The corresponding Python classes are:

.. toggle:: Show/Hide Python Code Example

    * :class:`fileformats.text.Plain`
    * :class:`fileformats.application.Zip`
    * :class:`fileformats.application.Json`
    * :class:`fileformats.generic.File`
    * :class:`fileformats.generic.Directory`
    * :class:`fileformats.medimage.DicomSeries`
    * :class:`fileformats.medimage.NiftiGz`
    * :class:`fileformats.field.Text`
    * :class:`fileformats.field.Decimal`

"Extras" packages for some of the file formats may provide converters to
alternative formats (e.g. ``medimage/dicom-series`` to ``medimage/nifti-gz-x``
via Dcm2Niix_). They may also contain methods for accessing the headers and the
contents of files where applicable.

Where a converter from an alternative file format is specified, FrameTree will
automatically run the conversion between the format stored in the data store
and the format required by a pipeline. See FileFormats_ for detailed
instructions on how to specify new file formats and converters between them.

.. _XNAT: https://xnat.org
.. _FileFormats: https://arcanaframework.github.io/fileformats/
.. _BIDS: https://bids.neuroimaging.io
.. _Dcm2Niix: https://github.com/rordenlab/dcm2niix
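
The naming convention for "MIME-like" strings described in this section can be
sketched in a few lines of Python. The helper ``mime_like_to_class_path`` is
hypothetical and only illustrates how a kebab-case string lines up with a
sub-package and class name; it is not how fileformats resolves types, and it
does not cover genuine MIME types that deviate from this pattern.

.. code-block:: python

    def mime_like_to_class_path(mime_like: str) -> str:
        """Map 'sub-package/class-name' (kebab case) to a dotted class path."""
        registry, type_name = mime_like.split("/")
        # Kebab case to PascalCase: 'nifti-gz' -> 'NiftiGz'
        class_name = "".join(part.capitalize() for part in type_name.split("-"))
        return f"fileformats.{registry}.{class_name}"


    for mime_like in ("text/plain", "medimage/nifti-gz", "medimage/dicom-series"):
        print(mime_like, "->", mime_like_to_class_path(mime_like))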