API
Functions
Four main functions resolve file formats from string and path inputs.
- fileformats.core.to_mime(datatype: Type[DataType], official: bool = True) -> str
Returns the MIME-type or MIME-like string corresponding to the given datatype. A MIME-like string uses the fileformats namespaces instead of putting all non-standard types in the 'application' registry.
- Parameters:
- Returns:
mime_str -- the MIME type string if official=True, otherwise a MIME-like string (i.e. one using the fileformats namespace scheme instead of putting all non-standard types into the 'application' registry)
- Return type:
- fileformats.core.from_mime(mime_str: str) -> Type[fileformats.core.DataType] | ty.Type[ty.Union]
Resolves a MIME type (or MIME-like) string into the corresponding type
- fileformats.core.find_matching(fspaths: Collection[Path], candidates: Collection[Type[FileSet]] | None = None, standard_only: bool = False, include_generic: bool = False, skip_unconstrained: bool = True) -> List[Type[FileSet]]
Detect the corresponding file format from a set of file-system paths
- Parameters:
fspaths (list[Path]) -- file-system paths to detect the format of
candidates (sequence[FileSet], optional) -- the candidates to select from, by default all file formats
standard_only (bool, optional) -- whether to return only matches from the "standard" IANA types. Only relevant if candidates is None, by default False
skip_unconstrained (bool, optional) -- skip formats that aren't constrained by extension, magic number or another check. Only relevant if candidates is None
- Returns:
the file formats that match the given file-system paths
- Return type:
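The detection described above can be pictured as filtering a list of candidates by the checks each one defines. The following is a minimal, standard-library-only sketch of that idea and not the library's implementation: `Candidate`, its `ext` and `magic` attributes and `find_matching_sketch` are hypothetical names, and it assumes a candidate matches only when every path passes its checks.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    """Hypothetical stand-in for a format class (not a fileformats type)."""

    name: str
    ext: Optional[str] = None  # required file extension, if any
    magic: Optional[bytes] = None  # required leading bytes, if any

    @property
    def unconstrained(self) -> bool:
        return self.ext is None and self.magic is None

    def matches(self, path: Path) -> bool:
        if self.ext is not None and not path.name.endswith(self.ext):
            return False
        if self.magic is not None:
            with open(path, "rb") as f:
                if f.read(len(self.magic)) != self.magic:
                    return False
        return True


def find_matching_sketch(
    fspaths: Sequence[Path],
    candidates: Sequence[Candidate],
    skip_unconstrained: bool = True,
) -> List[Candidate]:
    """Return the candidates whose checks pass for every given path."""
    matching = []
    for cand in candidates:
        if skip_unconstrained and cand.unconstrained:
            continue  # a format with no checks would match anything
        if len(fspaths) > 0 and all(cand.matches(p) for p in fspaths):
            matching.append(cand)
    return matching


with tempfile.TemporaryDirectory() as tmp:
    png = Path(tmp) / "a.png"
    png.write_bytes(b"\x89PNG\r\n\x1a\n" + b"rest-of-file")
    cands = [
        Candidate("png", ext=".png", magic=b"\x89PNG"),
        Candidate("txt", ext=".txt"),
        Candidate("any"),
    ]
    strict = [c.name for c in find_matching_sketch([png], cands)]
    loose = [c.name for c in find_matching_sketch([png], cands, skip_unconstrained=False)]
```

In this sketch the unconstrained `any` candidate is reported only when `skip_unconstrained=False`, which mirrors the purpose of that flag as documented above.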
- fileformats.core.from_paths(fspaths: Iterable[Path], *candidates: Type[FileSet], common_ok: bool = False, ignore: str | None = None, **kwargs: Any) -> List[FileSet]
Given a list of candidate classes (defaults to all installed in alphabetical order), instantiates all possible file-set instances from a collection of file-system paths.
Note that the order in which the candidates are provided is important as the first valid match for each path will be returned.
- Parameters:
fspaths (ty.Iterable[Path]) -- file-system paths to instantiate file-sets from
*candidates (tuple[fileformats.core.FileSet]) -- the file-set classes to instantiate. If none are provided, then all installed filesets will be tried in alphabetical order of their "mime-like" representation.
common_ok (bool) -- whether file-system paths can be used as secondary files in multiple file-sets
ignore (str, optional) -- regular expression pattern for file/directory names to ignore if they aren't used in any of the returned file-sets. Any remaining file-paths that are not matched by this pattern will cause an error to be raised.
**kwargs (dict[str, Any]) -- keyword arguments passed on to the underlying call to FileSet.from_paths
- Returns:
the instantiated file-sets
- Return type:
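Because the first valid match for each path is returned, the order of the candidates changes the result. The sketch below illustrates that behaviour, together with the role of the ignore pattern, using only the standard library. It is an assumption-laden illustration rather than the library's code: `assign_first_match` and the (label, predicate) candidates are hypothetical, and the pattern is assumed to be matched against the file name.

```python
import re
from pathlib import Path
from typing import Callable, Iterable, List, Optional, Sequence, Tuple

# A candidate here is a (label, predicate) pair; both are hypothetical stand-ins
Candidate = Tuple[str, Callable[[Path], bool]]


def assign_first_match(
    fspaths: Iterable[Path],
    candidates: Sequence[Candidate],
    ignore: Optional[str] = None,
) -> List[Tuple[Path, str]]:
    """Label each path with the first candidate whose predicate accepts it."""
    assigned: List[Tuple[Path, str]] = []
    unmatched: List[Path] = []
    for fspath in fspaths:
        for label, accepts in candidates:
            if accepts(fspath):
                assigned.append((fspath, label))
                break  # first valid match wins, later candidates are not tried
        else:
            unmatched.append(fspath)
    if ignore is not None:
        # Assumption: the pattern is matched against the file or directory name
        pattern = re.compile(ignore)
        unmatched = [p for p in unmatched if not pattern.match(p.name)]
    if unmatched:
        raise ValueError(f"paths not matched by any candidate: {unmatched}")
    return assigned


gzip = ("gzip", lambda p: p.name.endswith(".gz"))
nifti_gz = ("nifti-gz", lambda p: p.name.endswith(".nii.gz"))
paths = [Path("scan.nii.gz"), Path(".DS_Store")]

general_first = assign_first_match(paths, [gzip, nifti_gz], ignore=r"\.DS_Store")
specific_first = assign_first_match(paths, [nifti_gz, gzip], ignore=r"\.DS_Store")
```

Listing the more specific candidate first is what lets `scan.nii.gz` be labelled `nifti-gz` rather than `gzip`.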
Base Classes
Base classes form the foundation of the fileformats package and are not intended to be instantiated directly, but rather subclassed to create new file formats. The methods and properties of these classes are described here.
- class fileformats.core.Classifier
Bases:
object
Base class for all file-format "classifiers", including datatypes and abstract types
- class fileformats.core.DataType
Bases:
Classifier
Base class for all file formats and fields.
- class property all_types: Iterator[Type[DataType]]
Iterator over all datatypes, returned as an itertools.chain object that yields the elements of each underlying iterable in turn.
- classmethod get_converter(source_format: Type[DataType], name: str = 'converter', **kwargs: Any) -> None
- classmethod matches(values: Any) -> bool
Checks whether the given values (fspaths for file-sets) match the datatype specified by the class
- Parameters:
values (ty.Any) -- values to check whether they match the given datatype
- Returns:
matches -- whether the datatype matches the provided values
- Return type:
- class property mime_like: str
Generates a "MIME-like" identifier from a format class. The fileformats package namespace forms a superset of IANA MIME registries. Formats with official MIME types will return their MIME type, while extension formats will return a MIME-like identifier, e.g. "text/plain" for fileformats.text.Plain and "medimage/nifti" for fileformats.medimage.Nifti.
- class fileformats.core.FileSet(*fspaths: Iterable[str | Path] | str | Path | fileformats.core.FileSet, metadata: Dict[str, Any] | None = None, **load_kwargs: Any)
Bases:
DataType
The base class for all format types within the fileformats package. A generic representation of a collection of files related to a single data resource. A file-set can be a single file or directory or a collection thereof, such as a primary file with a "side-car" header.
- Parameters:
*fspaths (Path | str | FileSet | Collection[Path | str | FileSet]) -- a set of file-system paths pointing to all the resources in the file-set
metadata (dict[str, Any]) -- metadata associated with the file-set, typically lazily loaded via read_metadata extra hook but can be provided directly at the time of instantiation
**load_kwargs (ty.Any) -- Any keyword arguments to be passed through to read_metadata and load implementations when loading metadata and data to fill the metadata and contents properties respectively.
- class property all_formats: Set[Type[FileSet]]
Iterate over all FileSet formats in fileformats.* namespaces
- classmethod convert(fileset: FileSet, plugin: str = 'serial', task_name: str | None = None, **kwargs: Any) -> Self
Convert a given file-set into the format specified by the class
- Parameters:
- Returns:
the file-set converted into the type of the current class
- Return type:
- copy(dest_dir: str | Path, mode: CopyMode | str = CopyMode.copy, collation: CopyCollation | str = CopyCollation.any, new_stem: str | None = None, trim: bool = True, make_dirs: bool = False, overwrite: bool = False, supported_modes: CopyMode = CopyMode.any, extension_decomposition: ExtensionDecomposition = ExtensionDecomposition.single) -> Self
Copies the file-set to a new directory, optionally renaming the files to have consistent name-stems.
Based on the range of options provided, copy determines the "laziest" mode to use, i.e. if we can leave the files where they are and satisfy both the explicit mode requested by the user and the "collation" requirements (see FileSet.CopyCollation), we prefer to do so, otherwise we prefer to symlink, then hardlink, then as a last resort a full copy.
- Parameters:
dest_dir (str) -- Path to the parent directory to save the file-set
mode (FileSet.CopyMode or str, optional) -- designates whether to perform an actual copy or whether a link (symbolic or hard) is okay, CopyMode.copy by default. See FileSet.CopyMode for details
collation (FileSet.CopyCollation or str, optional) -- how to treat relative paths within the fileset, i.e. whether to move them to a single directory, rename them to the same file-stem or maintain relative directory structure. See FileSet.CopyCollation for details
new_stem (str, optional) -- the file name excluding file extensions, to give the files/dirs in the parent directory, by default the original file name is used
trim (bool, optional) -- Only copy the paths in the file-set that are "required" by the format, true by default
make_dirs (bool, optional) -- Make the parent destination and all missing ancestors if they are missing, false by default
overwrite (bool, optional) -- whether to overwrite existing files/directories if present
supported_modes (CopyMode, optional) -- supported modes for the copy operation. Used to mask out the requested copy mode
extension_decomposition (FileSet.ExtensionDecomposition, optional) -- whether to consider file extensions to start from the first '.' (multiple) or the last (single) or be empty (none), when the extension of a fspath in the FileSet isn't explicitly defined by the FileSet class. Only relevant when collation mode is set to "adjacent". By default single
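The "laziest mode" selection described above can be sketched as a bit-mask intersection followed by a walk through a fixed preference order. This is an illustration only, under stated assumptions: the `Mode` flag, its member names, the `can_leave` argument and the `laziest` function are hypothetical and are not the library's CopyMode API.

```python
from enum import Flag, auto


class Mode(Flag):
    """Hypothetical copy modes for this sketch (not FileSet.CopyMode)."""

    LEAVE = auto()
    SYMLINK = auto()
    HARDLINK = auto()
    COPY = auto()
    ANY = LEAVE | SYMLINK | HARDLINK | COPY


# Preference order taken from the description: leave, symlink, hardlink, copy
PREFERENCE = (Mode.LEAVE, Mode.SYMLINK, Mode.HARDLINK, Mode.COPY)


def laziest(requested: Mode, supported: Mode = Mode.ANY, can_leave: bool = True) -> Mode:
    """Pick the cheapest mode allowed by both the request and the supported mask."""
    allowed = requested & supported  # mask out modes that are not supported
    for mode in PREFERENCE:
        if mode is Mode.LEAVE and not can_leave:
            # e.g. collation requirements force the files to be relocated
            continue
        if mode in allowed:
            return mode
    raise ValueError(f"no mode satisfies {requested!r} within {supported!r}")
```

For example, `laziest(Mode.ANY, supported=Mode.HARDLINK | Mode.COPY)` falls through to a hard link in this sketch, because leaving the files in place and symlinking are both masked out.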
- decomposed_fspaths(required_only: bool = True, decomposition_mode: ExtensionDecomposition = ExtensionDecomposition.single) -> List[Tuple[Path, str, str]]
Decompose paths into parent directory, filename stem, and extension
- Parameters:
required_only (bool, optional) -- only include required paths, by default True
decomposition_mode (FileSet.ExtensionDecomposition, optional) -- how to interpret paths without an explicitly defined extension (i.e. by either the extension of the FileSet or nested filesets), by default single
- Returns:
decomposed_fspath -- a tuple consisting of the parent directory, file-stem and extension
- Return type:
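To make the three decomposition options concrete, here is a minimal sketch using only pathlib. The `decompose` function and its string modes are hypothetical stand-ins for the ExtensionDecomposition options, and the handling of leading dots in hidden files is an assumption of this sketch.

```python
from pathlib import Path
from typing import Tuple


def decompose(path: Path, mode: str = "single") -> Tuple[Path, str, str]:
    """Split a path into (parent directory, stem, extension).

    mode is one of "multiple", "single" or "none" (hypothetical string
    stand-ins for the ExtensionDecomposition options).
    """
    name = path.name
    if mode == "none":
        return path.parent, name, ""
    if mode == "multiple":
        # Extension starts at the first '.', skipping a leading dot (".bashrc")
        idx = name.find(".", 1)
    elif mode == "single":
        # Extension starts at the last '.'
        idx = name.rfind(".", 1)
    else:
        raise ValueError(f"unknown decomposition mode: {mode!r}")
    if idx == -1:
        return path.parent, name, ""  # no extension present
    return path.parent, name[:idx], name[idx:]
```

With `Path("/data/scan.nii.gz")`, the "multiple" mode yields the stem `scan` and extension `.nii.gz`, whereas "single" yields `scan.nii` and `.gz`.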
- classmethod from_mime(mime_string: str) -> Type[DataType]
Resolves a FileFormat class from a MIME (IANA) or "MIME-like" identifier (i.e. an identifier for a non-MIME class in the MIME style), e.g.
"text/plain" resolves to fileformats.text.Plain
and
"image/tiff-fx" resolves to fileformats.image.TiffFx
- classmethod from_paths(fspaths: Iterable[Path], common_ok: bool = False, **kwargs: Any) -> Tuple[Set[Self], Set[Path]]
Finds all instances of the fileset class that can be constructed from a collection of file-system paths.
- Parameters:
fspaths (Iterable[Path]) -- file-system paths to instantiate file-sets from
common_ok (bool) -- whether secondary file-system paths can be shared between multiple instances of the returned filesets
**kwargs (Any) -- additional keyword arguments to pass to the file-set constructor
- Returns:
filesets (set[FileSet]) -- file-sets instantiated from the provided paths
remaining (set[Path]) -- remaining file-system paths that weren't used in any of the file-sets
- classmethod get_converter(source_format: Type[DataType], name: str = 'converter', **kwargs: Any) -> TaskBase
Get a converter that converts from the source format type into the format specified by the class
- Parameters:
- Returns:
a pydra task or workflow that performs the conversion, or None if no conversion is required
- Return type:
pydra.engine.TaskBase or None
- Raises:
FileFormatConversionError -- no converters found between source and dest format
FileFormatConversionError -- ambiguous (i.e. more than one) converters found between source and dest format
- hash(crypto: Callable[[], Any] | None = None, mtime: bool = False, chunk_len: int = 8192, relative_to: Path | None = None, ignore_hidden_files: bool = False, ignore_hidden_dirs: bool = False) -> str
Calculate a unique hash for the file-set based on the relative paths and contents of its constituent files
- Parameters:
crypto (function, optional) -- the cryptography method used to hash the files, by default hashlib.sha256
**kwargs -- keyword args passed directly through to the hash_dir function
- Returns:
hash -- unique hash for the file-set
- Return type:
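A hash "based on the relative paths and contents" can be sketched with hashlib as follows. This is a simplified illustration, not the library's algorithm: `hash_paths` is a hypothetical helper, it handles plain files only, and it ignores the mtime and hidden-file options listed in the signature.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Iterable


def hash_paths(fspaths: Iterable[Path], relative_to: Path, chunk_len: int = 8192) -> str:
    """Hash relative paths and file contents in a deterministic order."""
    crypto = hashlib.sha256()
    for fspath in sorted(fspaths):
        # Include the relative path so that renaming a file changes the hash
        crypto.update(fspath.relative_to(relative_to).as_posix().encode())
        with open(fspath, "rb") as f:
            # Read in chunks so large files are never loaded into memory at once
            for chunk in iter(lambda: f.read(chunk_len), b""):
                crypto.update(chunk)
    return crypto.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "image.img").write_bytes(b"\x00\x01\x02")
    (root / "image.hdr").write_text("dim=3")
    paths = [root / "image.img", root / "image.hdr"]
    first = hash_paths(paths, relative_to=root)
    same = hash_paths(reversed(paths), relative_to=root)  # order does not matter
    (root / "image.hdr").write_text("dim=4")
    changed = hash_paths(paths, relative_to=root)  # contents changed
```

Hashing paths relative to a common root, rather than absolute paths, keeps the digest stable when the whole file-set is relocated.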
- hash_files(crypto: Callable[[], Any] | None = None, mtime: bool = False, chunk_len: int = 8192, relative_to: Path | None = None, ignore_hidden_files: bool = False, ignore_hidden_dirs: bool = False) -> Dict[str, str]
Calculate hashes for all files in the file-set based on the relative paths and contents of its constituent files
- classmethod matching_exts(fspaths: Collection[Path], exts: List[str | None] | None = None) -> List[Path]
Returns the paths out of the candidates provided that matches the given extension (by default the extension of the class)
- Parameters:
- Returns:
the matching paths
- Return type:
list[Path]
- Raises:
FileFormatError -- When no paths match or more than one path matches the given extension
- metadata
Lazily load metadata from read_metadata extra if implemented, returning an empty metadata array if not
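The lazy-loading behaviour described for metadata can be illustrated with functools.cached_property. This is a generic sketch of the pattern, assuming a loader that may be unimplemented. `LazyMetadata`, `NoReader` and the `reads` counter are hypothetical and do not reflect how the library wires up its read_metadata extra.

```python
from functools import cached_property
from typing import Any, Dict


class LazyMetadata:
    """Hypothetical class showing lazy, cached metadata loading."""

    def __init__(self) -> None:
        self.reads = 0  # counts how often the loader actually runs

    def read_metadata(self) -> Dict[str, Any]:
        # Placeholder for an expensive read, e.g. parsing a file header
        self.reads += 1
        return {"dim": 3}

    @cached_property
    def metadata(self) -> Dict[str, Any]:
        try:
            return self.read_metadata()
        except NotImplementedError:
            # No loader implemented: fall back to empty metadata
            return {}


class NoReader(LazyMetadata):
    def read_metadata(self) -> Dict[str, Any]:
        raise NotImplementedError


obj = LazyMetadata()
reads_before = obj.reads  # nothing has been loaded yet
first = obj.metadata  # triggers the read
second = obj.metadata  # served from the cache
```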
- mime_like = 'core/file-set'
- class property mime_type: str
Generates a MIME type (IANA) identifier from a format class. If an official IANA MIME type doesn't exist, it will create one in the MIME style, e.g.
fileformats.text.Plain to "text/plain"
fileformats.image.TiffFx to "image/tiff-fx"
fileformats.mynamespace.MyFormat to "application/x-my-format"
- Returns:
the MIME type corresponding to the class
- Return type:
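The naming convention in the examples above (CamelCase class name to kebab-case, with an "application/x-" fallback for non-standard namespaces) can be reproduced with a few lines of standard-library code. This sketch is inferred from the documented examples only. The `kebab`, `mime_like` and `mime_type` helpers are hypothetical, and treating every namespace that shares a name with an IANA top-level registry as official is an assumption.

```python
import re

# Top-level media types registered with IANA
IANA_REGISTRIES = {
    "application", "audio", "example", "font", "haptics", "image",
    "message", "model", "multipart", "text", "video",
}


def kebab(class_name: str) -> str:
    """Convert a CamelCase class name to kebab-case, e.g. TiffFx -> tiff-fx."""
    return re.sub(r"(?<!^)(?=[A-Z])", "-", class_name).lower()


def mime_like(namespace: str, class_name: str) -> str:
    """MIME-like identifier: the namespace is used as the registry."""
    return f"{namespace}/{kebab(class_name)}"


def mime_type(namespace: str, class_name: str) -> str:
    """MIME-style identifier that stays within the IANA registries."""
    if namespace in IANA_REGISTRIES:
        return mime_like(namespace, class_name)
    # Assumption: non-standard namespaces fall back to the "x-" convention
    return f"application/x-{kebab(class_name)}"
```

The two helpers differ only for namespaces outside the IANA registries, for example `mime_like("medimage", "Nifti")` keeps the `medimage` namespace while `mime_type` would move it under `application/x-`.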
- classmethod mock(*fspaths: Path | str) -> Self
Return an instance of a mocked sub-class of the file format for use in test routines such as doctests, which doesn't need to point at actual files
- Parameters:
*fspaths (sequence[Path | str]) -- the paths to be provided to the mocked class, by default will be ["mock/<class-name-lower>"]
- Returns:
a file-set that will pass type-checking as an instance of the given fileset class but which doesn't actually point to any FS objects.
- Return type:
Self
- move(dest_dir: str | Path, collation: CopyCollation | str = CopyCollation.any, new_stem: str | None = None, trim: bool = True, make_dirs: bool = False, overwrite: bool = False, extension_decomposition: ExtensionDecomposition = ExtensionDecomposition.single) -> Self
Moves the file-set to a new directory, optionally renaming the files to have consistent name-stems.
- Parameters:
dest_dir (str) -- Path to the parent directory to save the file-set
collation (FileSet.CopyCollation or str, optional) -- how to treat relative paths within the fileset, i.e. whether to move them to a single directory, rename them to the same file-stem or maintain relative directory structure. See FileSet.CopyCollation for details
new_stem (str, optional) -- the file name excluding file extensions, to give the files/dirs in the parent directory, by default the original file name is used
trim (bool, optional) -- Only move the paths in the file-set that are "required" by the format, true by default
make_dirs (bool, optional) -- Make the parent destination and all missing ancestors if they are missing, false by default
overwrite (bool, optional) -- whether to overwrite existing files/directories if present
extension_decomposition (FileSet.ExtensionDecomposition, optional) -- whether to consider file extensions to start from the first '.' (multiple) or the last (single) or be empty (none), when the extension of a fspath in the FileSet isn't explicitly defined by the FileSet class. Only relevant when collation mode is set to "adjacent". By default single
- classmethod register_converter(source_format: Type[FileSet], converter_spec: ConverterSpec) -> None
Registers a converter task within a class attribute. Called by the @fileformats.core.converter decorator.
- Parameters:
source_format (type) -- the source format to register a converter from
converter_spec -- a tuple consisting of a task_spec callable that resolves to a Pydra task and a dictionary of keyword arguments to be passed to the task spec at initialisation time
- Raises:
FormatConversionError -- if there is already a converter registered between the two types
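The registration pattern described here (a decorator that stores a callable plus keyword arguments, and refuses duplicates) can be sketched generically as below. All names in the sketch are hypothetical: `_CONVERTERS`, `ConversionError`, this `converter` decorator and this `get_converter` are simplified stand-ins keyed by strings and are not the library's classes or its Pydra-based tasks.

```python
from typing import Any, Callable, Dict, Optional, Tuple

# (source name, target name) -> (callable, keyword arguments bound at call time)
_CONVERTERS: Dict[Tuple[str, str], Tuple[Callable[..., Any], Dict[str, Any]]] = {}


class ConversionError(Exception):
    """Hypothetical error type for this sketch."""


def converter(source: str, target: str, **kwargs: Any) -> Callable:
    """Decorator that registers the decorated callable as a converter."""

    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        key = (source, target)
        if key in _CONVERTERS:
            raise ConversionError(f"converter already registered for {source} -> {target}")
        _CONVERTERS[key] = (func, kwargs)
        return func

    return decorator


def get_converter(source: str, target: str) -> Optional[Callable[[Any], Any]]:
    """Look up a registered converter, or None if no conversion is required."""
    if source == target:
        return None
    try:
        func, kwargs = _CONVERTERS[(source, target)]
    except KeyError:
        raise ConversionError(f"no converter found for {source} -> {target}") from None
    return lambda value: func(value, **kwargs)


@converter("csv", "tsv", delimiter="\t")
def csv_to_tsv(text: str, delimiter: str) -> str:
    return text.replace(",", delimiter)
```

Storing the keyword arguments alongside the callable, rather than binding them immediately, matches the idea of a converter spec that is only instantiated when a conversion is requested.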
- classmethod sample(dest_dir: Path | None = None, seed: int | str = 0, stem: str | None = None) -> Self
Return a sample instance of the file-set type for classes where the test_data extra has been implemented
- Parameters:
- Returns:
an instance of the given file-set class
- Return type:
- select_by_ext(fileformat: Type[FileSet] | None = None) -> Path
Selects a single path from a set of file-system paths based on the file extension
- Parameters:
fileformat (type) -- the format class of the path to select
- Returns:
the selected file-system path that matches the extension
- Return type:
Path
- Raises:
FormatMismatchError -- if more than one path matches the extension
- class fileformats.core.Field(value: ValueType)
Bases:
Generic[ValueType, PrimitiveType], DataType
Base class for all field formats
- classmethod from_mime(mime_string: str) -> Type[DataType]
Resolves a FileFormat class from a MIME (IANA) or "MIME-like" identifier (i.e. an identifier for a non-MIME class in the MIME style), e.g.
"text/plain" resolves to fileformats.text.Plain
and
"image/tiff-fx" resolves to fileformats.image.TiffFx
- mime_like = 'core/field'
Generic Classes
Generic classes representing files and directories can be used as base classes for specific file formats, as well as in cases where the format of the file is not known and only general properties are required.
FsObject exposes the properties and methods of the pathlib.Path class where applicable, so it and all of its subclasses can be duck-typed in place of a pathlib.Path object in most cases.
- class fileformats.generic.FsObject(*fspaths: Iterable[str | Path] | str | Path | fileformats.core.FileSet, metadata: Dict[str, Any] | None = None, **load_kwargs: Any)
Generic file-system object, can be either a file or a directory
- __fspath__() -> str
Render to string, so can be treated as any other file-system path, i.e. passed to functions like file 'open'
- __str__() -> str
Renders the file path as a string so it can be used in templating e.g.
f'cp {fs_object} /tmp'
- stat(follow_symlinks: bool = True) -> stat_result
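The duck-typing that `__fspath__` and `__str__` enable can be demonstrated with a small standalone class. This is a sketch of the general os.PathLike protocol rather than of FsObject itself: the `PathLike` class below and its attribute delegation are hypothetical.

```python
import os
import tempfile
from pathlib import Path


class PathLike:
    """Hypothetical minimal object that can stand in for a pathlib.Path."""

    def __init__(self, fspath) -> None:
        self.fspath = Path(fspath)

    def __fspath__(self) -> str:
        # Called by os.fspath(), open(), os.stat() and similar functions
        return str(self.fspath)

    def __str__(self) -> str:
        return str(self.fspath)

    def __getattr__(self, name: str):
        if name == "fspath":
            # Avoid infinite recursion if the attribute has not been set yet
            raise AttributeError(name)
        # Delegate anything not defined here (name, suffix, ...) to the Path
        return getattr(self.fspath, name)


with tempfile.TemporaryDirectory() as tmp:
    obj = PathLike(Path(tmp) / "notes.txt")
    with open(obj, "w") as f:  # open() accepts any os.PathLike object
        f.write("hello")
    size = os.stat(obj).st_size
    suffix = obj.suffix  # delegated to pathlib.Path
    command = f"cp {obj} /tmp"  # __str__ makes templating straightforward
```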
- class fileformats.generic.File(*fspaths: Iterable[str | Path] | str | Path | fileformats.core.FileSet, metadata: Dict[str, Any] | None = None, **load_kwargs: Any)
Bases:
FsObject
Generic file type
- property actual_ext: str
The actual file extension (out of the primary and alternate extensions possible)
- contents
The contents of the file-set, will be an object of a type that makes sense for the format, as loaded by the load method
- class fileformats.generic.BinaryFile(*fspaths: Iterable[str | Path] | str | Path | fileformats.core.FileSet, metadata: Dict[str, Any] | None = None, **load_kwargs: Any)
Bases:
File
- class fileformats.generic.UnicodeFile(*fspaths: Iterable[str | Path] | str | Path | fileformats.core.FileSet, metadata: Dict[str, Any] | None = None, **load_kwargs: Any)
Bases:
File
- class fileformats.generic.Directory(*fspaths: Iterable[str | Path] | str | Path | fileformats.core.FileSet, metadata: Dict[str, Any] | None = None, **load_kwargs: Any)
Bases:
FsObject
Base directory to be overridden by subtypes that represent directories but don't want to inherit content type "qualifiers" (i.e. most of them)
- contents
- class fileformats.generic.TypedSet(*fspaths: Iterable[str | Path] | str | Path | fileformats.core.FileSet, metadata: Dict[str, Any] | None = None, **load_kwargs: Any)
Bases:
TypedCollection
List of specific file types (similar to the contents of a directory but not enclosed in one)
- contents
DirectoryOf and SetOf allow the dynamic creation of classes that represent directories and sets of files that contain specific file formats.
- class fileformats.generic.DirectoryOf(*fspaths: Iterable[str | Path] | str | Path | fileformats.core.FileSet, metadata: Dict[str, Any] | None = None, **load_kwargs: Any)
Bases:
WithClassifiers, TypedDirectory
Generic directory classified by the formats of its contents
- class fileformats.generic.SetOf(*fspaths: Iterable[str | Path] | str | Path | fileformats.core.FileSet, metadata: Dict[str, Any] | None = None, **load_kwargs: Any)
Bases:
WithClassifiers, TypedSet
Fields
Fields are used to define non-file data in a way that can be referred to interchangeably with file formats, in particular by their MIME-like type (see Informal ("MIME-like")), which is under the field namespace, e.g. field/integer or field/decimal+array.