Command-line interface¶
Arcana’s command-line interface is grouped into five categories: store, dataset, apply, derive, and deploy. Under these categories are the commands that interact with Arcana’s data model, processing, and deployment streams.
Store¶
Commands used to access remote data stores and save them for further use
arcana store add¶
Saves the details for a new data store in the configuration file (‘~/.arcana/stores.yaml’).
Arguments¶
- name
The name given to the store for reference in other commands
- type
The storage class and the module it is defined in, e.g. arcana.data.store.xnat:Xnat
- location
The location of the store, e.g. server address
- *varargs
Parameters that are specific to the ‘type’ of storage class to be added
arcana store add [OPTIONS] NAME TYPE
Options
- -s, --server <server>¶
The URI of the server to connect to (if applicable)
- -u, --user <user>¶
The username to use to connect to the store
- -p, --password <password>¶
The password to use to connect to the store
- -c, --cache <cache>¶
The location of a cache dir to download local copies of remote data
- -d, --race-condition-delay <race_condition_delay>¶
How long to wait for changes on an incomplete download before assuming it has been interrupted, clearing it and starting again
- -o, --option <name> <value>¶
Additional key-word arguments that are passed to the store class
Arguments
- NAME¶
Required argument
- TYPE¶
Required argument
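For example, to save an XNAT repository under the nickname ‘central-xnat’ (the server URL, username and cache path here are purely illustrative):
arcana store add central-xnat arcana.data.store.xnat:Xnat --server https://central.xnat.org --user myuser --cache ~/.arcana/cache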
arcana store rename¶
Gives a data store saved in the config file (‘~/.arcana/stores.yaml’) a new nickname.
Arguments
- old_nickname
The current name of the store
- new_nickname
The new name for the store
arcana store rename [OPTIONS] OLD_NICKNAME NEW_NICKNAME
Arguments
- OLD_NICKNAME¶
Required argument
- NEW_NICKNAME¶
Required argument
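For example, to rename the illustrative store saved above:
arcana store rename central-xnat xnat-central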
arcana store remove¶
Remove a saved data store from the config file
- nickname
The nickname the store was given when its details were saved
arcana store remove [OPTIONS] NICKNAME
Arguments
- NICKNAME¶
Required argument
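For example, using the illustrative nickname from above:
arcana store remove xnat-central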
arcana store refresh¶
Refreshes credentials saved for the given store (typically a token that expires)
- nickname
Nickname given to the store to refresh the credentials of
arcana store refresh [OPTIONS] NICKNAME
Options
- -u, --user <user>¶
The username to use to connect to the store
- -p, --password <password>¶
The password to use to connect to the store
Arguments
- NICKNAME¶
Required argument
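For example, to refresh the saved credentials of an illustrative store nicknamed ‘central-xnat’:
arcana store refresh central-xnat --user myuser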
arcana store ls¶
List available stores that have been saved
arcana store ls [OPTIONS]
Dataset¶
Commands used to define and work with datasets
arcana dataset define¶
Define the tree structure and IDs to include in a dataset. Where possible, the definition file is saved inside the dataset for use by multiple users, if not possible it is stored in the ~/.arcana directory.
DATASET_LOCATOR string containing the nickname of the store, the ID of the dataset (e.g. XNAT project ID or file-system directory) and the dataset’s name in the format <store-nickname>//<dataset-id>[@<dataset-name>]
HIERARCHY the data frequencies that are present in the data tree. For some store types this is fixed (e.g. XNAT -> subject > session), but for more flexible store types (e.g. MockRemote), the dimension that each layer of sub-directories corresponds to can be arbitrarily specified.
arcana dataset define [OPTIONS] DATASET_LOCATOR [HIERARCHY]...
Options
- --space <space>¶
The enum that specifies the data dimensions of the dataset. Defaults to Clinical, which consists of the typical dataset > group > subject > session data tree used in medimage trials/studies
- --include <freq-id>¶
The rows to include in the dataset. First value is the row-frequency of the ID (e.g. ‘group’, ‘subject’, ‘session’) followed by the IDs to be included in the dataset. If the second arg contains ‘/’ then it is interpreted as the path to a text file containing a list of IDs
- --exclude <freq-id>¶
The rows to exclude from the dataset. First value is the row-frequency of the ID (e.g. ‘group’, ‘subject’, ‘session’) followed by the IDs to be excluded from the dataset. If the second arg contains ‘/’ then it is interpreted as the path to a text file containing a list of IDs
- --id-pattern <row-frequency> <pattern>¶
Specifies how IDs of row frequencies that are not explicitly provided are inferred from the IDs that are. For example, given a set of subject IDs that are a combination of the ID of the group that they belong to + their member IDs (i.e. matched test/controls have the same member ID), e.g.
CONTROL01, CONTROL02, CONTROL03, … and TEST01, TEST02, TEST03
the group ID can be extracted by providing the ID to source it from (i.e. subject) and a regular expression (in Python regex syntax: https://docs.python.org/3/library/re.html) with a single group corresponding to the inferred IDs
--id-pattern group 'subject:([A-Z]+)[0-9]+' --id-pattern member 'subject:[A-Z]+([0-9]+)'
Arguments
- DATASET_LOCATOR¶
Required argument
- HIERARCHY¶
Optional argument(s)
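A sketch of a definition for an illustrative XNAT project, restricting the subjects to those listed in a local text file (the project ID, dataset name and file path are assumptions):
arcana dataset define central-xnat//MYXNATPROJECT@training subject session --space common:Clinical --include subject ./subject-ids.txt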
arcana dataset add-source¶
Adds a source column to a dataset. A source column selects comparable items along a dimension of the dataset to serve as an input to pipelines and analyses.
DATASET_LOCATOR The path to the dataset including store and dataset name (where applicable), e.g. central-xnat//MYXNATPROJECT@pass_t1w_qc
NAME: The name the source will be referenced by
DATATYPE: The data type of the column. Can be a field (int|float|str|bool), field array (ty.List[int|float|str|bool]) or “file-set” (file, file+header/side-cars or directory)
arcana dataset add-source [OPTIONS] DATASET_LOCATOR NAME DATATYPE
Options
- -f, --row-frequency <dimension>¶
The row-frequency at which items appear in the dataset (e.g. per ‘session’, ‘subject’, ‘timepoint’, ‘group’ or ‘dataset’ for common:Clinical data dimensions)
- Default:
'highest'
- -p, --path <path>¶
Path to item in the dataset. If ‘regex’ option is provided it will be treated as a regular-expression (in Python syntax)
- --order <order>¶
If multiple items match the remaining criteria within a session, select the <order>th of the matching items
- -q, --quality <quality>¶
For data stores that enable flagging of data quality, this option can filter out poor quality scans
- --regex, --no-regex¶
Whether the ‘path’ option should be treated as a regular expression or not
- -h, --header <key-val>¶
Match on a specific header value. This option is only valid for select formats that implement the ‘header_val()’ method (e.g. medimage/dicom-series).
Arguments
- DATASET_LOCATOR¶
Required argument
- NAME¶
Required argument
- DATATYPE¶
Required argument
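For example, a T1-weighted source column could be added to the illustrative dataset above, matching scans by a regular expression on their path (the column name, pattern and quality threshold are assumptions):
arcana dataset add-source central-xnat//MYXNATPROJECT@training T1w medimage/dicom-series --row-frequency session --path '.*mprage.*' --regex --quality usable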
arcana dataset add-sink¶
Adds a sink column to a dataset. A sink column specifies how data should be written into the dataset.
Arguments¶
- dataset_locator
The path to the dataset including store and dataset name (where applicable), e.g. central-xnat//MYXNATPROJECT@pass_t1w_qc
- name
The name the sink will be referenced by
- datatype
The data type of the column. Can be a field (int|float|str|bool), field array (ty.List[int|float|str|bool]) or “file-set” (file, file+header/side-cars or directory)
arcana dataset add-sink [OPTIONS] DATASET_LOCATOR NAME DATATYPE
Options
- -f, --row-frequency <dimension>¶
The row-frequency at which items appear in the dataset (e.g. per ‘session’, ‘subject’, ‘timepoint’, ‘group’ or ‘dataset’ for Clinical data dimensions)
- Default:
'highest'
- -p, --path <path>¶
Path to item in the dataset. If ‘regex’ option is provided it will be treated as a regular-expression (in Python syntax)
- -s, --salience <salience>¶
The salience of the column, i.e. whether it will show up on ‘arcana derive menu’
Arguments
- DATASET_LOCATOR¶
Required argument
- NAME¶
Required argument
- DATATYPE¶
Required argument
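For example, a sink column for a derived brain mask could be added to the illustrative dataset as follows (the column name and datatype are assumptions):
arcana dataset add-sink central-xnat//MYXNATPROJECT@training brain_mask medimage/nifti-gz --row-frequency session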
arcana dataset missing-items¶
Finds the IDs of rows that are missing a valid entry for an item in the column.
DATASET_LOCATOR The path to the dataset including store and dataset name (where applicable), e.g. central-xnat//MYXNATPROJECT@pass_t1w_qc
COLUMN_NAMES the names of the columns to check; defaults to all source columns
arcana dataset missing-items [OPTIONS] DATASET_LOCATOR [COLUMN_NAMES]...
Arguments
- DATASET_LOCATOR¶
Required argument
- COLUMN_NAMES¶
Optional argument(s)
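For example, to list the rows of the illustrative dataset that lack a valid ‘T1w’ entry:
arcana dataset missing-items central-xnat//MYXNATPROJECT@training T1w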
Apply¶
Commands for applying workflows and analyses to datasets
arcana apply pipeline¶
Apply a Pydra workflow to a dataset as a pipeline between two columns
DATASET_LOCATOR string containing the nickname of the data store, the ID of the dataset (e.g. XNAT project ID or file-system directory) and the dataset’s name in the format <store-nickname>//<dataset-id>[@<dataset-name>]
PIPELINE_NAME is the name of the pipeline
WORKFLOW_LOCATION is the location to a Pydra workflow on the Python system path, <MODULE>:<WORKFLOW>
arcana apply pipeline [OPTIONS] DATASET_LOCATOR PIPELINE_NAME
WORKFLOW_LOCATION
Options
- -i, --input <col-name> <pydra-field> <required-datatype>¶
the link between a column and an input of the workflow. The required datatype is the location (<module-path>:<class>) of the datatype expected by the workflow
- -o, --output <col-name> <pydra-field> <produced-datatype>¶
the link between an output of the workflow and a sink column. The produced datatype is the location (<module-path>:<class>) of the datatype produced by the workflow
- -p, --parameter <name> <value>¶
a fixed parameter of the workflow to set when applying it
- -s, --source <col-name> <pydra-field> <required-datatype>¶
add a source to the dataset and link it to an input of the workflow in a single step. The source column must be specifiable by its path alone and already be in the datatype required by the workflow
- -k, --sink <col-name> <pydra-field> <produced-datatype>¶
add a sink to the dataset and link it to an output of the workflow in a single step. The sink column must be in the same datatype as that produced by the workflow
- -f, --row-frequency <row_frequency>¶
the row-frequency of the rows the pipeline will be executed over, i.e. whether it will be run once per session, per subject or per whole dataset; by default it runs over the highest row-frequency rows (e.g. per session)
- --overwrite, --no-overwrite¶
whether to overwrite previous pipelines
Arguments
- DATASET_LOCATOR¶
Required argument
- PIPELINE_NAME¶
Required argument
- WORKFLOW_LOCATION¶
Required argument
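A sketch of applying a hypothetical Pydra brain-extraction workflow to the illustrative dataset, linking it to the ‘T1w’ source and ‘brain_mask’ sink columns defined above (the module path, field names, datatypes and parameter are all assumptions):
arcana apply pipeline central-xnat//MYXNATPROJECT@training brain_extraction myworkflows.mri:brain_extraction_wf --input T1w in_file medimage/nifti-gz --output brain_mask out_file medimage/nifti-gz --parameter frac 0.5 --row-frequency session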
arcana apply analysis¶
Applies an analysis class to a dataset
arcana apply analysis [OPTIONS]
Derive¶
Commands for calling workflows/analyses to generate derivative data
arcana derive column¶
Derive data for a data sink column and all prerequisite columns.
DATASET_LOCATOR string containing the nickname of the data store, the ID of the dataset (e.g. XNAT project ID or file-system directory) and the dataset’s name in the format <store-nickname>//<dataset-id>[@<dataset-name>]
COLUMNS are the names of the sink columns to derive
arcana derive column [OPTIONS] DATASET_LOCATOR [COLUMNS]...
Options
- -w, --work <work>¶
The location of the directory where the working files created during the pipeline execution will be stored
- --plugin <plugin>¶
The Pydra plugin with which to process the workflow
- --loglevel <loglevel>¶
The level of detail at which logging information is presented
Arguments
- DATASET_LOCATOR¶
Required argument
- COLUMNS¶
Optional argument(s)
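For example, to derive the illustrative ‘brain_mask’ sink column, storing working files in a scratch directory (‘cf’ is Pydra’s concurrent-futures plugin; the paths are assumptions):
arcana derive column central-xnat//MYXNATPROJECT@training brain_mask --work /tmp/arcana-work --plugin cf --loglevel info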
arcana derive output¶
Derive an output
arcana derive output [OPTIONS]
arcana derive ignore-diff¶
Ignore the difference between the provenance of a previously generated derivative and the new parameterisation
arcana derive ignore-diff [OPTIONS]
Deploy¶
Commands for deploying arcana pipelines
arcana deploy docs¶
Build docs for one or more YAML wrappers
SPEC_PATH is the path of a YAML spec file or directory containing one or more such files.
The generated documentation will be saved to OUTPUT.
arcana deploy docs [OPTIONS] SPEC_PATH OUTPUT
Options
- --registry <registry>¶
The Docker registry to deploy the pipeline to
- --flatten, --no-flatten¶
- --loglevel <loglevel>¶
The level to display logs at
- --default-data-space <default_data_space>¶
The default data space to assume if it isn’t explicitly stated in the command
- --spec-root <spec_root>¶
The root path to consider the specs to be relative to, defaults to CWD
Arguments
- SPEC_PATH¶
Required argument
- OUTPUT¶
Required argument
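For example, to build documentation for all spec files under a local directory (the paths are illustrative):
arcana deploy docs ./specs ./docs-build --loglevel info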
arcana deploy inspect-docker¶
Extract the executable from a Docker image
arcana deploy inspect-docker [OPTIONS] IMAGE_TAG
Arguments
- IMAGE_TAG¶
Required argument
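For example, with a hypothetical image tag:
arcana deploy inspect-docker myorg/mypipeline:1.0.0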