New formats and spaces

Arcana was initially developed for medical-imaging analysis. Therefore, with the notable exceptions of the generic data spaces and file-formats defined in arcana.core.standard, the majority of file-formats and data spaces are specific to medical imaging. However, new formats and data spaces used in other fields can be implemented as required with just a few lines of code.

File formats

File formats are specified using the FileFormats package. Please refer to its documentation for how to add new file formats.

Data spaces

New data spaces (see Spaces) are defined by extending the DataSpace abstract base class. DataSpace subclasses are enums whose members have binary string values of consistent length (i.e. all of length 2 or all of length 3, etc.). The length of the binary string defines the rank of the data space, i.e. the maximum depth of a data tree within the space. The enum must contain a member for every possible value of the bit string (e.g. for 2 dimensions, there must be members corresponding to the values 0b00, 0b01, 0b10 and 0b11).

For example, in imaging studies, scanning sessions are typically organised by analysis group (e.g. test and control), membership within the group (i.e. matched subject ID) and time-point for longitudinal studies. In this case, we can visualise the imaging sessions arranged in a 3-D grid along the group, member and timepoint axes. Note that datasets that contain only one group or time-point can still be represented in this space; they are simply singleton along the corresponding axis.

All axes should be included as members of a DataSpace subclass enum with orthogonal binary vector values, e.g.:

member = 0b001
group = 0b010
timepoint = 0b100

The axis that is most often non-singleton should be given the smallest bit as this will be assumed to be the default when there is only one layer in the data tree, e.g. imaging datasets will not always have different groups or time-points but will always have different members (which are equivalent to subjects when there is only one group).

The “leaf rows” of a data tree, imaging sessions in this example, will be the bitwise-or of the dimension vectors, i.e. an imaging session is uniquely defined by its member, group and timepoint IDs:

session = 0b111
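As a minimal sketch of how the leaf value follows from the axes, the snippet below combines the three axis values from the text using plain Python integers (no Arcana classes are involved):

```python
# Axis values taken from the text above
member = 0b001
group = 0b010
timepoint = 0b100

# OR-ing the orthogonal axis vectors sets every bit, giving the leaf frequency
session = member | group | timepoint

print(format(session, "03b"))  # prints "111"
```

Because each axis occupies its own bit, the bitwise OR of any subset of axes yields a distinct value, which is what allows every frequency in the space to be named unambiguously.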

In addition to the data items stored in leaf rows, some data, particularly derivatives, may be stored in the dataset along a particular dimension, at a lower “row_frequency” than ‘per session’. For example, brain templates are sometimes calculated ‘per group’. Data can also be stored in aggregated rows that span a plane of the grid. These frequencies should also be added to the enum, i.e. all combinations of the base dimensions must be included and given intuitive names if possible:

subject = 0b011 - uniquely identified subject within the dataset
batch = 0b110 - separate group + timepoint combinations
matchedpoint = 0b101 - matched members and time-points aggregated across groups

Finally, for items that are singular across the whole dataset there should also be a dataset-wide member with value=0:

dataset = 0b000
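Putting the eight values together, the complete space might look like the sketch below. It uses the standard-library Enum as a stand-in for DataSpace so that it runs without Arcana installed; the class name ImagingSpace and the helper axes_of are hypothetical and exist only for illustration.

```python
from enum import Enum


class ImagingSpace(Enum):
    """Stand-in for a DataSpace subclass, built on the standard-library Enum."""

    # Axes (orthogonal one-bit values)
    member = 0b001
    group = 0b010
    timepoint = 0b100

    # Combinations of two or more axes
    subject = 0b011
    matchedpoint = 0b101
    batch = 0b110
    session = 0b111

    # Dataset-wide frequency (no axis bits set)
    dataset = 0b000


def axes_of(freq):
    """Return the names of the axes whose bits are set in `freq` (hypothetical helper)."""
    axes = (ImagingSpace.member, ImagingSpace.group, ImagingSpace.timepoint)
    return [axis.name for axis in axes if freq.value & axis.value]


print(axes_of(ImagingSpace.subject))  # ['member', 'group']
print(axes_of(ImagingSpace.dataset))  # []
```

Reading a frequency back into its axes this way is a quick check that the chosen names match the bits they are assigned to.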

For example, if you wanted to analyse daily recordings from various weather stations, you could define a 2-dimensional “Weather” data space with axes for the date and weather station of the recordings, using the following code:

from arcana.core.data.space import DataSpace

class Weather(DataSpace):

    # Define the axes of the dataspace
    timepoint = 0b01
    station = 0b10

    # Name the leaf and root frequencies of the data space
    recording = 0b11
    dataset = 0b00

Note

All 2^N values of an N-bit binary string need to be named within the enum.
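One way to check this requirement before using a new space is sketched below. The Weather class here is a stand-in built on the standard-library Enum rather than DataSpace, and missing_values is a hypothetical helper, not part of Arcana.

```python
from enum import Enum


class Weather(Enum):  # stand-in: the real class would extend DataSpace
    timepoint = 0b01
    station = 0b10
    recording = 0b11
    dataset = 0b00


def missing_values(space, rank):
    """Return the values in range(2**rank) that have no member in `space`."""
    defined = {member.value for member in space}
    return sorted(set(range(2 ** rank)) - defined)


print(missing_values(Weather, rank=2))  # []
```

An empty list means every value of the 2-bit string has a named member; any integers returned are values that still need a name.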