API

Functions

Four main functions convert between file-format classes and MIME strings, and detect file formats from file-system paths.

fileformats.core.to_mime(datatype: Type[DataType], official: bool = True) str[source]

Returns the MIME-type or MIME-like string corresponding to the given datatype. A MIME-like string uses the fileformats namespaces instead of putting all non-standard types in the 'application' registry.

Parameters:
  • datatype (type) -- the datatype to get the mime string for

  • official (bool) -- whether to use the official mime-type instead of mime-like

Returns:

mime_str -- the MIME type string if official=True, or the MIME-like string (i.e. using the fileformats namespace scheme instead of putting all non-standard types into the 'application' registry) if not

Return type:

str

fileformats.core.from_mime(mime_str: str) Type[fileformats.core.DataType] | ty.Type[ty.Union][source]

Resolves a MIME type (or MIME-like) string into the corresponding type

Parameters:

mime_str (str) -- the MIME type, or MIME-like (i.e. using the fileformats namespace scheme instead of putting all non-standard types into the 'application' registry), string to resolve

Returns:

datatype -- the resolved datatype

Return type:

type
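
The naming convention behind to_mime and from_mime can be illustrated with a small standalone sketch. It does not call the fileformats package; the function names, and the assumption that a namespace named after an IANA registry always has an official type, are hypothetical simplifications. Only the example strings are taken from this reference.

```python
import re
from typing import Tuple

# A subset of the top-level IANA media-type registries, hard-coded for this sketch
IANA_REGISTRIES = frozenset(
    ["application", "audio", "font", "image", "message", "model", "multipart", "text", "video"]
)


def to_kebab(class_name: str) -> str:
    """Convert a CamelCase class name to kebab-case, e.g. 'TiffFx' to 'tiff-fx'."""
    # Insert a hyphen before every capital letter except the first character
    return re.sub(r"(?<!^)(?=[A-Z])", "-", class_name).lower()


def to_mime_sketch(namespace: str, class_name: str, official: bool = True) -> str:
    """Build an identifier string from a namespace and class name (hypothetical helper)."""
    kebab = to_kebab(class_name)
    if official and namespace not in IANA_REGISTRIES:
        # Assumption: non-standard namespaces fall back to the 'application' registry
        return f"application/x-{kebab}"
    return f"{namespace}/{kebab}"


def from_mime_like_sketch(mime_like: str) -> Tuple[str, str]:
    """Split 'namespace/kebab-case' back into (namespace, CamelCase class name)."""
    namespace, _, kebab = mime_like.partition("/")
    class_name = "".join(part.capitalize() for part in kebab.split("-"))
    return namespace, class_name
```

With official=False the namespace is kept as-is, which is what makes the MIME-like form a superset of the official registries.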

fileformats.core.find_matching(fspaths: Collection[Path], candidates: Collection[Type[FileSet]] | None = None, standard_only: bool = False, include_generic: bool = False, skip_unconstrained: bool = True) List[Type[FileSet]][source]

Detect the corresponding file format from a set of file-system paths

Parameters:
  • fspaths (list[Path]) -- file-system paths to detect the format of

  • candidates (sequence[FileSet], optional) -- the candidates to select from, by default all file formats

  • standard_only (bool, optional) -- whether to only return matches from the "standard" IANA types. Only relevant if candidates is None, by default False

  • skip_unconstrained (bool, optional) -- skip formats that aren't constrained by extension, magic number or another check. Only relevant if candidates is None

Returns:

the file formats that match the given file-system paths

Return type:

list[FileSet]
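
As a rough illustration of detection by extension and magic number, the sketch below filters a list of candidates against a single path. It is a standalone approximation, not the package's implementation: the Candidate tuple and find_matching_sketch are hypothetical names, and real formats may apply further checks.

```python
from pathlib import Path
from typing import List, NamedTuple, Optional, Sequence


class Candidate(NamedTuple):
    """Hypothetical stand-in for a format class with optional constraints."""

    name: str
    ext: Optional[str] = None
    magic: Optional[bytes] = None


def find_matching_sketch(
    path: Path, candidates: Sequence[Candidate], skip_unconstrained: bool = True
) -> List[str]:
    """Return the names of the candidates whose constraints the path satisfies."""
    matches = []
    for cand in candidates:
        if cand.ext is None and cand.magic is None:
            # Unconstrained candidates match anything, so they are optional
            if not skip_unconstrained:
                matches.append(cand.name)
            continue
        if cand.ext is not None and path.suffix != cand.ext:
            continue
        if cand.magic is not None:
            # Compare the leading bytes of the file with the magic number
            with open(path, "rb") as f:
                if f.read(len(cand.magic)) != cand.magic:
                    continue
        matches.append(cand.name)
    return matches
```

Skipping unconstrained candidates by default keeps the result informative, since a format with no constraints would otherwise match every path.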

fileformats.core.from_paths(fspaths: Iterable[Path], *candidates: Type[FileSet], common_ok: bool = False, ignore: str | None = None, **kwargs: Any) List[FileSet][source]

Given a list of candidate classes (defaults to all installed in alphabetical order), instantiates all possible file-set instances from a collection of file-system paths.

Note that the order in which the candidates are provided is important as the first valid match for each path will be returned.

Parameters:
  • fspaths (ty.Iterable[Path]) -- file-system paths to instantiate file-sets from

  • *candidates (tuple[fileformats.core.FileSet]) -- the file-set classes to instantiate. If none are provided, then all installed filesets will be tried in alphabetical order of their "mime-like" representation.

  • common_ok (bool) -- whether file-system paths can be used as secondary files in multiple file-sets

  • ignore (str, optional) -- regular expression pattern for file/directory names to ignore if they aren't used in any of the returned file-sets. Any remaining file-paths that are not matched by this pattern will cause an error to be raised.

  • **kwargs (dict[str, Any]) -- keyword arguments passed on to the underlying call to FileSet.from_paths

Returns:

the instantiated file-sets

Return type:

list[fileformats.core.FileSet]
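
The effect of candidate order and of the ignore pattern can be sketched without the package. In this hypothetical approximation each candidate is a name paired with a predicate, the first accepting candidate claims the path, and any path that is neither claimed nor ignored raises an error. Whether the real function matches the pattern against the start of the name or the whole name is an assumption here.

```python
import re
from pathlib import Path
from typing import Callable, Iterable, List, Optional, Sequence, Tuple

# Hypothetical candidate: a format name plus a predicate deciding whether a path fits
CandidateSpec = Tuple[str, Callable[[Path], bool]]


def from_paths_sketch(
    fspaths: Iterable[Path],
    candidates: Sequence[CandidateSpec],
    ignore: Optional[str] = None,
) -> List[Tuple[str, Path]]:
    """Assign each path to the first candidate that accepts it."""
    matched = []
    unmatched = []
    for path in fspaths:
        for name, accepts in candidates:
            if accepts(path):
                matched.append((name, path))
                break  # the first valid candidate wins, so order matters
        else:
            unmatched.append(path)
    # Paths matching the ignore pattern are dropped, the rest are an error
    leftover = [
        p for p in unmatched if not (ignore is not None and re.match(ignore, p.name))
    ]
    if leftover:
        raise ValueError(f"Paths not matched by any candidate: {leftover}")
    return matched
```

Placing the more specific candidate first ensures it is chosen over a broader one that would also accept the same path.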

Base Classes

Base classes form the foundation of the fileformats package and are not intended to be instantiated directly, but rather subclassed to create new file formats. The methods and properties of these classes are described here.

class fileformats.core.Classifier[source]

Bases: object

Base class for all file-format "classifiers", including datatypes and abstract types

class property namespace: str | None

The "namespace" the format belongs to under the "fileformats" umbrella namespace

class property type_name: str

Name of type to be used in __repr__. Defined here so it can be overridden

class fileformats.core.DataType[source]

Bases: Classifier

Base class for all file formats and fields.

class property all_types: Iterator[Type[DataType]]


classmethod get_converter(source_format: Type[DataType], name: str = 'converter', **kwargs: Any) None[source]
classmethod matches(values: Any) bool[source]

Checks whether the given values (fspaths for file-sets) match the datatype specified by the class

Parameters:

values (ty.Any) -- values to check whether they match the given datatype

Returns:

matches -- whether the datatype matches the provided values

Return type:

bool

class property mime_like: str

Generates a "MIME-like" identifier from a format class. The fileformats package namespace forms a superset of IANA MIME registries. Formats with official MIME types will return their MIME type, while extension formats will return a MIME-like identifier, e.g. "text/plain" for fileformats.text.Plain and "medimage/nifti" for fileformats.medimage.Nifti.

class property mime_type: str

Generates a MIME type identifier from a format class (i.e. an identifier for a non-MIME class in the MIME style).

classmethod subclasses() Generator[Type[Self], None, None][source]

Iterate over all installed subclasses

class fileformats.core.FileSet(*fspaths: Iterable[str | Path] | str | Path | fileformats.core.FileSet, metadata: Dict[str, Any] | None = None, **load_kwargs: Any)[source]

Bases: DataType

The base class for all format types within the fileformats package. A generic representation of a collection of files related to a single data resource. A file-set can be a single file or directory or a collection thereof, such as a primary file with a "side-car" header.

Parameters:
  • *fspaths (Path | str | FileSet | Collection[Path | str | FileSet]) -- a set of file-system paths pointing to all the resources in the file-set

  • metadata (dict[str, Any]) -- metadata associated with the file-set, typically lazily loaded via the read_metadata extra hook but can be provided directly at the time of instantiation

  • **load_kwargs (ty.Any) -- Any keyword arguments to be passed through to read_metadata and load implementations when loading metadata and data to fill the metadata and contents properties respectively.

class property all_formats: Set[Type[FileSet]]

Iterate over all FileSet formats in fileformats.* namespaces

classmethod convert(fileset: FileSet, plugin: str = 'serial', task_name: str | None = None, **kwargs: Any) Self[source]

Convert a given file-set into the format specified by the class

Parameters:
  • fileset (FileSet) -- the file-set object to convert

  • plugin (str) -- the "execution plugin" used to run the conversion task

  • task_name (str) -- the name given to the converter task

  • **kwargs -- args to pass to the conversion process

Returns:

the file-set converted into the type of the current class

Return type:

FileSet

copy(dest_dir: str | Path, mode: CopyMode | str = CopyMode.copy, collation: CopyCollation | str = CopyCollation.any, new_stem: str | None = None, trim: bool = True, make_dirs: bool = False, overwrite: bool = False, supported_modes: CopyMode = CopyMode.any, extension_decomposition: ExtensionDecomposition = ExtensionDecomposition.single) Self[source]

Copies the file-set to a new directory, optionally renaming the files to have consistent name-stems.

Based on the range of options provided, copy determines the "laziest" mode to use, i.e. if we can leave the files where they are and satisfy both the explicit mode requested by the user and the "collation" requirements (see FileSet.CopyCollation), we prefer to do so, otherwise we prefer to symlink, then hardlink, then as a last resort a full copy.

Parameters:
  • dest_dir (str) -- Path to the parent directory to save the file-set

  • mode (FileSet.CopyMode or str, optional) -- designates whether to perform an actual copy or whether a link (symbolic or hard) is okay, CopyMode.copy by default. See FileSet.CopyMode for details

  • collation (FileSet.CopyCollation or str, optional) -- how to treat relative paths within the fileset, i.e. whether to move them to a single directory, rename them to the same file-stem or maintain relative directory structure. See FileSet.CopyCollation for details

  • new_stem (str, optional) -- the file name excluding file extensions, to give the files/dirs in the parent directory, by default the original file name is used

  • trim (bool, optional) -- Only copy the paths in the file-set that are "required" by the format, true by default

  • make_dirs (bool, optional) -- Make the parent destination and all missing ancestors if they are missing, false by default

  • overwrite (bool, optional) -- whether to overwrite existing files/directories if present

  • supported_modes (CopyMode, optional) -- supported modes for the copy operation. Used to mask out the requested copy mode

  • extension_decomposition (FileSet.ExtensionDecomposition, optional) -- whether to consider file extensions to start from the first '.' (multiple) or the last (single) or be empty (none), when the extension of a fspath in the FileSet isn't explicitly defined by the FileSet class. Only relevant when collation mode is set to "adjacent". By default single
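
The "laziest mode" selection described for copy can be sketched as a flag mask followed by a fixed preference order. This is a standalone illustration: the Mode enum and laziest_mode function are hypothetical, and the sketch ignores the collation requirements that the real method also takes into account.

```python
from enum import Flag


class Mode(Flag):
    """Hypothetical stand-in for the copy modes, combinable as bit flags."""

    leave = 1
    symlink = 2
    hardlink = 4
    copy = 8
    any = 15  # all of the above


# Preference order from laziest to most expensive
PREFERENCE = (Mode.leave, Mode.symlink, Mode.hardlink, Mode.copy)


def laziest_mode(requested: Mode, supported: Mode = Mode.any) -> Mode:
    """Pick the cheapest mode permitted by both the request and the supported mask."""
    allowed = requested & supported  # mask out modes that aren't supported
    for mode in PREFERENCE:
        if mode & allowed:
            return mode
    raise ValueError(f"No mode satisfies requested={requested} and supported={supported}")
```

For example, requesting any mode on a file system without symlink support could be expressed by passing a supported mask of hardlink or copy, which yields a hard link.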

decomposed_fspaths(required_only: bool = True, decomposition_mode: ExtensionDecomposition = ExtensionDecomposition.single) List[Tuple[Path, str, str]][source]

Decompose paths into parent directory, filename stem, and extension

Parameters:
  • required_only (bool, optional) -- only include required paths, by default True

  • decomposition_mode (FileSet.ExtensionDecomposition, optional) -- how to interpret paths without an explicitly defined extension (i.e. by either the extension of the FileSet or nested filesets), by default single

Returns:

decomposed_fspath -- a list of tuples, each consisting of the parent directory, file-stem and extension

Return type:

list[tuple[Path, str, str]]
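
The three decomposition modes differ only in where the extension is taken to start. A minimal standalone sketch using pathlib, with the mode passed as a plain string (an assumption made for brevity, the package uses the ExtensionDecomposition enum):

```python
from pathlib import Path
from typing import Tuple


def decompose(path: Path, mode: str = "single") -> Tuple[Path, str, str]:
    """Split a path into (parent directory, stem, extension)."""
    name = path.name
    if mode == "single":
        # Extension starts at the last '.'
        stem, ext = path.stem, path.suffix
    elif mode == "multiple":
        # Extension starts at the first '.' (ignoring a leading dot)
        ext = "".join(path.suffixes)
        stem = name[: len(name) - len(ext)] if ext else name
    elif mode == "none":
        # The whole name is the stem and the extension is empty
        stem, ext = name, ""
    else:
        raise ValueError(f"Unknown decomposition mode: {mode!r}")
    return path.parent, stem, ext
```

For a path such as /data/scan.nii.gz, "single" gives the stem scan.nii, "multiple" gives scan, and "none" keeps scan.nii.gz as the stem.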

classmethod from_mime(mime_string: str) Type[DataType]

Resolves a FileFormat class from a MIME (IANA) or "MIME-like" identifier (i.e. an identifier for a non-MIME class in the MIME style), e.g.

"text/plain" resolves to fileformats.text.Plain

and

"image/tiff-fx" resolves to fileformats.image.TiffFx

Parameters:

mime_string (str) -- MIME identifier

Returns:

the corresponding file format class

Return type:

type

classmethod from_paths(fspaths: Iterable[Path], common_ok: bool = False, **kwargs: Any) Tuple[Set[Self], Set[Path]][source]

Finds all instances of the fileset class that can be constructed from a collection of file-system paths.

Parameters:
  • fspaths (Iterable[Path]) -- file-system paths to instantiate file-sets from

  • common_ok (bool) -- whether secondary file-system paths can be shared between multiple instances of the returned filesets

  • **kwargs (Any) -- additional keyword arguments to pass to the file

Returns:

  • filesets (set[FileSet]) -- file-sets instantiated from the provided paths

  • remaining (set[Path]) -- remaining file-system paths that weren't used in any of the file-sets

classmethod get_converter(source_format: Type[DataType], name: str = 'converter', **kwargs: Any) TaskBase[source]

Get a converter that converts from the source format type into the format specified by the class

Parameters:
  • source_format (type) -- the format to convert from

  • name (str) -- the name given to the converter task

  • **kwargs -- passed on to the task init method to customise the conversion

Returns:

a pydra task or workflow that performs the conversion, or None if no conversion is required

Return type:

pydra.engine.TaskBase or None

Raises:
  • FileFormatConversionError -- no converters found between source and dest format

  • FileFormatConversionError -- ambiguous (i.e. more than one) converters found between source and dest format

hash(crypto: Callable[[], Any] | None = None, mtime: bool = False, chunk_len: int = 8192, relative_to: Path | None = None, ignore_hidden_files: bool = False, ignore_hidden_dirs: bool = False) str[source]

Calculate a unique hash for the file-set based on the relative paths and contents of its constituent files

Parameters:
  • crypto (function, optional) -- the cryptography method used to hash the files, by default hashlib.sha256

  • **kwargs -- keyword args passed directly through to the hash_dir function

Returns:

hash -- unique hash for the file-set

Return type:

str
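
Hashing on relative paths and contents means that two copies of the same file-set in different directories produce the same digest. The sketch below shows the idea with hashlib only; hash_paths is a hypothetical helper, and the way the package orders and combines the individual files is not confirmed by this reference.

```python
import hashlib
from pathlib import Path
from typing import Iterable


def hash_paths(fspaths: Iterable[Path], relative_to: Path, chunk_len: int = 8192) -> str:
    """Hash the relative paths and contents of a collection of files."""
    digest = hashlib.sha256()
    # Sort so that the result doesn't depend on the order the paths were given in
    for path in sorted(fspaths):
        digest.update(path.relative_to(relative_to).as_posix().encode())
        with open(path, "rb") as f:
            # Read in fixed-size chunks to avoid loading large files into memory
            for chunk in iter(lambda: f.read(chunk_len), b""):
                digest.update(chunk)
    return digest.hexdigest()
```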

hash_files(crypto: Callable[[], Any] | None = None, mtime: bool = False, chunk_len: int = 8192, relative_to: Path | None = None, ignore_hidden_files: bool = False, ignore_hidden_dirs: bool = False) Dict[str, str][source]

Calculate hashes for all files in the file-set based on the relative paths and contents of its constituent files

Parameters:
  • crypto (function, optional) -- the cryptography method used to hash the files, by default hashlib.sha256

  • **kwargs -- keyword args passed directly through to the hash_dir function

Returns:

file_hashes -- unique hashes for each file in the file-set

Return type:

dict[str, str]

classmethod matching_exts(fspaths: Collection[Path], exts: List[str | None] | None = None) List[Path][source]

Returns the paths, out of the candidates provided, that match the given extensions (by default the extensions of the class)

Parameters:
  • fspaths (list[Path]) -- The paths to select from

  • exts (list[str], optional) -- the extensions to match, by default the primary and alternate extensions of the class

Returns:

the matching paths

Return type:

list[Path]

Raises:

FileFormatError -- When no paths match or more than one path matches the given extension

metadata

Lazily load metadata from read_metadata extra if implemented, returning an empty metadata array if not

mime_like = 'core/file-set'
class property mime_type: str

Generates a MIME type (IANA) identifier from a format class. If an official IANA MIME type doesn't exist it will create one in the MIME style, e.g.

fileformats.text.Plain to "text/plain"

fileformats.image.TiffFx to "image/tiff-fx"

fileformats.mynamespace.MyFormat to "application/x-my-format"

Returns:

the MIME type corresponding to the class

Return type:

str

classmethod mock(*fspaths: Path | str) Self[source]

Return an instance of a mocked sub-class of the file format for use in test routines such as doctests, which doesn't need to point at actual files

Parameters:

*fspaths (sequence[Path | str]) -- the paths to be provided to the mocked class, by default will be ["mock/<class-name-lower>"]

Returns:

a file-set that will pass type-checking as an instance of the given fileset class but which doesn't actually point to any FS objects.

Return type:

Self

move(dest_dir: str | Path, collation: CopyCollation | str = CopyCollation.any, new_stem: str | None = None, trim: bool = True, make_dirs: bool = False, overwrite: bool = False, extension_decomposition: ExtensionDecomposition = ExtensionDecomposition.single) Self[source]

Moves the file-set to a new directory, optionally renaming the files to have consistent name-stems.

Parameters:
  • dest_dir (str) -- Path to the parent directory to save the file-set

  • collation (FileSet.CopyCollation or str, optional) -- how to treat relative paths within the fileset, i.e. whether to move them to a single directory, rename them to the same file-stem or maintain relative directory structure. See FileSet.CopyCollation for details

  • new_stem (str, optional) -- the file name excluding file extensions, to give the files/dirs in the parent directory, by default the original file name is used

  • trim (bool, optional) -- Only move the paths in the file-set that are "required" by the format, true by default

  • make_dirs (bool, optional) -- Make the parent destination and all missing ancestors if they are missing, false by default

  • overwrite (bool, optional) -- whether to overwrite existing files/directories if present

  • extension_decomposition (FileSet.ExtensionDecomposition, optional) -- whether to consider file extensions to start from the first '.' (multiple) or the last (single) or be empty (none), when the extension of a fspath in the FileSet isn't explicitly defined by the FileSet class. Only relevant when collation mode is set to "adjacent". By default single

class property possible_exts: List[str | None]

All possible extensions of the file format

classmethod register_converter(source_format: Type[FileSet], converter_spec: ConverterSpec) None[source]

Registers a converter task within a class attribute. Called by the @fileformats.core.converter decorator.

Parameters:
  • source_format (type) -- the source format to register a converter from

  • converter_spec -- a tuple consisting of a task_spec callable that resolves to a Pydra task and a dictionary of keyword arguments to be passed to the task spec at initialisation time

Raises:

FormatConversionError -- if there is already a converter registered between the two types

classmethod sample(dest_dir: Path | None = None, seed: int | str = 0, stem: str | None = None) Self[source]

Return a sample instance of the file-set type for classes where the test_data extra has been implemented

Parameters:
  • dest_dir (Path, optional) -- the path in which to create the test data

  • seed (int) -- seed used to generate content. Defaults to 0 (rather than a timestamp), so the default method call produces consistent runs between calls

  • stem (str) -- the filename stem to give the file

Returns:

an instance of the given file-set class

Return type:

FileSet

select_by_ext(fileformat: Type[FileSet] | None = None) Path[source]

Selects a single path from a set of file-system paths based on the file extension

Parameters:

fileformat (type) -- the format class of the path to select

Returns:

the selected file-system path that matches the extension

Return type:

Path

Raises:

FormatMismatchError -- if more than one path matches the extension

class property standard_formats: Iterable[Type[FileSet]]

Iterate over all formats in the standard fileformats.* namespaces

class property strext: str

Return extension that is guaranteed to be a string (i.e. not None)

class property unconstrained: bool

Whether the file-format is unconstrained by extension, magic number or another constraint

class fileformats.core.Field(value: ValueType)[source]

Bases: Generic[ValueType, PrimitiveType], DataType

Base class for all field formats

classmethod from_mime(mime_string: str) Type[DataType]

Resolves a FileFormat class from a MIME (IANA) or "MIME-like" identifier (i.e. an identifier for a non-MIME class in the MIME style), e.g.

"text/plain" resolves to fileformats.text.Plain

and

"image/tiff-fx" resolves to fileformats.image.TiffFx

Parameters:

mime_string (str) -- MIME identifier

Returns:

the corresponding file format class

Return type:

type

classmethod from_primitive(dtype: type) Type[Field[Any, Any]][source]
mime_like = 'core/field'
to_primitive() PrimitiveType[source]

Generic Classes

Generic classes representing files and directories can be used as base classes for specific file formats, as well as in cases where the format of the file is not known and only general properties are required.

FsObject exposes the properties and methods of the pathlib.Path class where applicable, so it and all its subclasses can be duck-typed in place of a pathlib.Path object in most cases.

class fileformats.generic.FsObject(*fspaths: Iterable[str | Path] | str | Path | fileformats.core.FileSet, metadata: Dict[str, Any] | None = None, **load_kwargs: Any)[source]

Bases: FileSet, PathLike

Generic file-system object, can be either a file or a directory

__fspath__() str[source]

Renders the path as a string so that the object can be treated like any other file-system path, e.g. passed to functions such as open

__str__() str[source]

Renders the file path as a string so it can be used in templating e.g. f'cp {fs_object} /tmp'
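
Implementing __fspath__ is what lets an object be passed anywhere the standard library accepts an os.PathLike. The class below is a hypothetical stand-in rather than FsObject itself, and shows only the two methods described above.

```python
import os
from pathlib import Path
from typing import Union


class PathLikeSketch:
    """Hypothetical wrapper around a single file-system path."""

    def __init__(self, fspath: Union[str, Path]):
        self.fspath = Path(fspath)

    def __fspath__(self) -> str:
        # Called by os.fspath(), open(), os.path functions and similar
        return str(self.fspath)

    def __str__(self) -> str:
        # Lets the object be interpolated directly into strings
        return str(self.fspath)
```

An instance can then be handed to open or os.path.basename, or formatted into a shell command, without first extracting the underlying path.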

absolute() Path[source]
property anchor: str
chmod(mode: int, *, follow_symlinks: bool = True) None[source]
property drive: str
exists() bool[source]
property fspath: Path
group() str | None[source]
is_dir() bool[source]
is_file() bool[source]
property name: str
owner() str | None[source]
property parent: Path

A common parent directory for all the top-level paths in the file-set

property parents: Sequence[Path]
property parts: Tuple[str, ...]
property root: str
stat(follow_symlinks: bool = True) stat_result[source]
property stem: str
property suffix: str
property suffixes: List[str]
class property unconstrained: bool

Whether the file-format is unconstrained by extension, magic number or another constraint

class fileformats.generic.File(*fspaths: Iterable[str | Path] | str | Path | fileformats.core.FileSet, metadata: Dict[str, Any] | None = None, **load_kwargs: Any)[source]

Bases: FsObject

Generic file type

property actual_ext: str

The actual file extension (out of the primary and alternate extensions possible)

contents

The contents of the file-set, will be an object of a type that makes sense for the format, as loaded by the load method

open(mode: str = 'r', buffering: int = -1, encoding: str | None = None, errors: str | None = None, newline: str | None = None) IO[str] | IO[bytes][source]

Open an I/O stream to the file

read_bytes() bytes[source]
read_contents(size: int | None = None, offset: int = 0) str | bytes[source]
property stem: str
class fileformats.generic.BinaryFile(*fspaths: Iterable[str | Path] | str | Path | fileformats.core.FileSet, metadata: Dict[str, Any] | None = None, **load_kwargs: Any)[source]

Bases: File

open(mode: str = 'r', buffering: int = -1, encoding: str | None = None, errors: str | None = None, newline: str | None = None) IO[bytes][source]

Open an I/O stream to the file

read_contents(size: int | None = None, offset: int = 0) bytes[source]
class fileformats.generic.UnicodeFile(*fspaths: Iterable[str | Path] | str | Path | fileformats.core.FileSet, metadata: Dict[str, Any] | None = None, **load_kwargs: Any)[source]

Bases: File

open(mode: str = 'r', buffering: int = -1, encoding: str | None = None, errors: str | None = None, newline: str | None = None) IO[str][source]

Open an I/O stream to the file

read_contents(size: int | None = None, offset: int = 0) str[source]
read_text(encoding: str | None = None, errors: str | None = None) str[source]
class fileformats.generic.Directory(*fspaths: Iterable[str | Path] | str | Path | fileformats.core.FileSet, metadata: Dict[str, Any] | None = None, **load_kwargs: Any)[source]

Bases: FsObject

Base directory to be overridden by subtypes that represent directories but don't want to inherit content type "qualifiers" (i.e. most of them)

__div__(other: str | Path) Path[source]
contents
property fspath: Path
glob(pattern: str) Iterator[Path][source]
iterdir() Iterator[Path][source]
joinpath(other: str | Path) Path[source]
rglob(pattern: str) Iterator[Path][source]
class fileformats.generic.TypedSet(*fspaths: Iterable[str | Path] | str | Path | fileformats.core.FileSet, metadata: Dict[str, Any] | None = None, **load_kwargs: Any)[source]

Bases: TypedCollection

List of specific file types (similar to the contents of a directory but not enclosed in one)

contents

DirectoryOf and SetOf allow the dynamic creation of classes that represent directories and sets of files that contain specific file formats.

class fileformats.generic.DirectoryOf(*fspaths: Iterable[str | Path] | str | Path | fileformats.core.FileSet, metadata: Dict[str, Any] | None = None, **load_kwargs: Any)[source]

Bases: WithClassifiers, TypedDirectory

Generic directory classified by the formats of its contents

class fileformats.generic.SetOf(*fspaths: Iterable[str | Path] | str | Path | fileformats.core.FileSet, metadata: Dict[str, Any] | None = None, **load_kwargs: Any)[source]

Bases: WithClassifiers, TypedSet

Fields

Fields are used to define non-file data in a way that can be referred to interchangeably with file formats, in particular by their MIME-like type (see Informal ("MIME-like")), which is under the field namespace, e.g. field/integer or field/decimal+array.

class fileformats.field.Text(value: Any)[source]

Bases: Singular[str, str]

class fileformats.field.Integer(value: Any)[source]

Bases: Singular[int, int], ScalarMixin[int, int]

class fileformats.field.Decimal(value: Any)[source]

Bases: Singular[Decimal, float], ScalarMixin[Decimal, float]

class fileformats.field.Boolean(value: Any)[source]

Bases: Singular[bool, bool], LogicalMixin

class fileformats.field.Array(value: str | Sequence[Any])[source]

Bases: WithClassifier, Field[Tuple[ItemType, ...], Tuple[ItemType, ...]], Sequence[ItemType]