Typing

As formats are represented by classes, FileFormats provides a way to specify fine-grained type annotations for Python functions and workflows that operate on files. To augment this functionality, FileFormats also provides a number of special datatype classes that can be used interchangeably with file formats, and the ability to "classify" file format types to specify the data they are expected to contain. In some cases, classifiers are used as part of the validation process; in others, they are just annotations.

Classifiers

Classifiable FileFormats classes (i.e. those that inherit from the WithClassifiers mixin) can be classified using the [] class operator. For example, a zip file can be classified as containing a PNG image

>>> from fileformats.application import Zip
>>> from fileformats.image import Png, Jpeg

>>> zipped_png_fspath = str(Zip.convert(Png("/path/to/an-image.png")))
>>> Zip[Png].matches(zipped_png_fspath)
True
>>> Zip[Jpeg].matches(zipped_png_fspath)
False

Warning

Classifiers are currently not supported by Mypy and other static type checkers (only by dynamic type-checking in Pydra) because they use a custom __subclasshook__ method to implement the subclassing behaviour and overload the __class_getitem__ method. It is hoped that a custom Mypy plugin can be implemented in the future to support this feature.

The types of classes that can be used as classifiers vary from type to type. Archive types such as Zip and Gzip take another file format type, whereas other types take specific classifier types. Some classifiable types can take multiple classifiers, whereas others can take only one, and multiple classifiers can be either ordered or unordered. In the case of MedicalImage subclasses, multiple unordered classifiers based on the RadLex radiology lexicon can be used to annotate the contents of the images

# Mask is used in the return annotation; it is assumed here to be exported alongside Brain
from fileformats.medimage import NiftiGz, T1Weighted, Brain, Mask

def brain_mask(image: NiftiGz[Brain, T1Weighted]) -> NiftiGz[Brain, Mask]:
    ...

Note

At the time of writing, only a subset of the RadLex lexicon has been implemented. It is being expanded as needed.

When classifiable types are converted into MIME-like strings, the classifiers are prepended to the type name with a '+' separator, e.g.

to_mime(Zip[Png]) == "image/png+zip"

If there are multiple classifiers, they are arranged in alphabetical order (unless the classifiers are ordered) and separated by '.', ahead of the '+' separator

to_mime(NiftiGz[T1Weighted, Brain]) == "medimage/brain.t1-weighted+nifti-gz"
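The naming rule described above can be sketched in plain Python. This is an illustrative stand-in, not the FileFormats implementation: mime_like and to_kebab are hypothetical helpers, and the registry name is passed in explicitly rather than derived from the classes.

```python
import re


def to_kebab(name: str) -> str:
    """Convert a CamelCase class name to kebab-case, e.g. NiftiGz -> nifti-gz."""
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "-", name).lower()


def mime_like(registry: str, type_name: str, classifiers=(), ordered: bool = False) -> str:
    """Build a MIME-like string with the classifiers prepended to the type name."""
    parts = [to_kebab(c) for c in classifiers]
    if not ordered:
        parts.sort()  # unordered classifiers are arranged alphabetically
    prefix = ".".join(parts) + "+" if parts else ""
    return f"{registry}/{prefix}{to_kebab(type_name)}"


print(mime_like("image", "Zip", ["Png"]))  # image/png+zip
print(mime_like("medimage", "NiftiGz", ["T1Weighted", "Brain"]))  # medimage/brain.t1-weighted+nifti-gz
```

Sorting only when ordered is False keeps the string identical however unordered classifiers are written, while preserving the sequence for ordered ones.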

Typically, the classifier types need to belong to the same subpackage/registry as the main type, but special classes such as application.Zip and application.Gzip can be classified by any file format type. Other special classifiable types are the generic.DirectoryOf and generic.SetOf collection types. These can be used to specify that a "file format" contains a collection of file formats, either within a directory or as independent files, respectively.

from fileformats.image import Jpeg, Png
from fileformats.generic import DirectoryOf, SetOf

def jpegs_to_pngs(directory: DirectoryOf[Jpeg]) -> SetOf[Png]:
    # Convert each JPEG in the directory and collect the results as a set of PNGs
    return SetOf[Png](Png.convert(j) for j in directory.contents)

Non-file "fields"

There are some use cases where input data can contain a mix of file-based and field data. Therefore, although they are not file formats, FileFormats also provides some field datatypes for convenience. These can be used interchangeably with file format types in some use cases, particularly when working with MIME types.

A common feature they share is that they can be converted to and from MIME-like strings (see Informal ("MIME-like")), e.g. to_mime(Integer) == "field/integer".

They can be converted to and from their corresponding "primitive types", i.e. int, float, bool, str and list, either via the object constructors

>>> from fileformats.field import Integer
>>> my_integer = Integer(1)
>>> int(my_integer)
1

or the Field.to_primitive() and Field.from_primitive() methods

>>> from fileformats.field import Field
>>> field = Field.from_primitive(1)
>>> field
Integer(1)
>>> field.to_primitive()
1
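The round trip between primitives and fields can be pictured as a type dispatch. The sketch below is a toy model under stated assumptions, not the FileFormats code: ToyField and its subclasses are hypothetical names, and the lookup order is a guess at one reasonable design.

```python
from dataclasses import dataclass


@dataclass
class ToyField:
    """Toy stand-in for a field datatype that wraps one primitive value."""

    value: object

    def to_primitive(self):
        return self.value

    @staticmethod
    def from_primitive(value):
        # bool is tested before int because bool is a subclass of int in Python
        dispatch = (
            (bool, ToyBoolean),
            (int, ToyInteger),
            (float, ToyFloat),
            (str, ToyText),
            (list, ToyArray),
        )
        for primitive_type, field_type in dispatch:
            if isinstance(value, primitive_type):
                return field_type(value)
        raise TypeError(f"no field type for {type(value).__name__}")


class ToyBoolean(ToyField): ...
class ToyInteger(ToyField): ...
class ToyFloat(ToyField): ...
class ToyText(ToyField): ...
class ToyArray(ToyField): ...


print(type(ToyField.from_primitive(True)).__name__)  # ToyBoolean
print(ToyField.from_primitive(1).to_primitive())  # 1
```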

The items contained within an Array class can be specified using the square brackets operator

from fileformats.field import Array, Integer, Text, Boolean

def my_func(int_array: Array[Integer], text_array: Array[Text]) -> Array[Boolean]:
    ...

This validates that the data contained within the array can be converted into the specified item type

from fileformats.field import Array, Integer

int_array = Array[Integer]([1, 2, 3])  # PASSES
bad_int_array = Array[Integer]([1, 2, 3.5])  # FAILS!
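How such an item check might behave can be illustrated without FileFormats. The sketch below assumes one plausible rule, namely that an item is accepted only if converting it to the item type loses no information. coerce_items is a hypothetical helper, and the library's actual validation may differ.

```python
def coerce_items(items, item_type):
    """Convert every item to item_type, rejecting lossy conversions."""
    converted = []
    for item in items:
        value = item_type(item)
        if value != item:
            raise ValueError(
                f"{item!r} cannot be converted to {item_type.__name__} without loss"
            )
        converted.append(value)
    return converted


print(coerce_items([1, 2, 3], int))  # [1, 2, 3]

try:
    coerce_items([1, 2, 3.5], int)
except ValueError as err:
    print(f"rejected: {err}")
```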

Subclass hooks

Classified types implement the WithClassifiers.__subclasshook__() method to control the behaviour of the isinstance() and issubclass() functions when classified types are passed to them as arguments. Classified types are considered to be subclasses of the classifiable type.

from fileformats.application import Zip
from fileformats.image import Png

assert issubclass(Zip[Png], Zip)
assert isinstance(Zip[Png]("/path/to/zip.zip"), Zip)

Similarly, for types with multiple unordered classifiers, a type with a superset of the classifiers of another type is a subclass

from fileformats.medimage import NiftiGz, T1Weighted, Brain

assert issubclass(NiftiGz[T1Weighted, Brain], NiftiGz[T1Weighted])

This is also the case if the classifiers of the superset type are subclasses of the classifiers in the subset

from fileformats.medimage import NiftiGz, T1Weighted, Brain, Mri

assert issubclass(T1Weighted, Mri)
assert issubclass(NiftiGz[T1Weighted, Brain], NiftiGz[Mri])

or if the classifiable type itself is a subclass

from fileformats.medimage import NiftiGz, NiftiGzX, T1Weighted, Brain

assert issubclass(NiftiGzX, NiftiGz)
assert issubclass(NiftiGzX[T1Weighted, Brain], NiftiGz[T1Weighted, Brain])
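The rules for unordered classifiers can be reproduced with a small standard-library model. This is a sketch of the general technique (a __class_getitem__ that builds classes at runtime plus an ABC __subclasshook__), not the WithClassifiers source. ToyClassifiable, ToyNifti and ToyNiftiX are hypothetical names, and the classifier classes are plain stand-ins.

```python
from abc import ABCMeta


class ToyClassifiable(metaclass=ABCMeta):
    """Toy stand-in for a classifiable format with unordered classifiers."""

    classifiers = frozenset()  # empty on unclassified types
    unclassified = None  # on classified types, the type they were built from
    _cache = {}

    def __class_getitem__(cls, items):
        if not isinstance(items, tuple):
            items = (items,)
        key = (cls, frozenset(items))
        if key not in ToyClassifiable._cache:
            names = ", ".join(sorted(c.__name__ for c in items))
            namespace = {"classifiers": frozenset(items), "unclassified": cls}
            # Classified types are created at runtime as real subclasses of cls
            ToyClassifiable._cache[key] = type(cls)(f"{cls.__name__}[{names}]", (cls,), namespace)
        return ToyClassifiable._cache[key]

    @classmethod
    def __subclasshook__(cls, other):
        if not cls.classifiers:
            return NotImplemented  # unclassified: fall back to normal inheritance
        other_base = getattr(other, "unclassified", None)
        # Compare via __mro__ rather than issubclass() to avoid re-entering this hook
        if other_base is None or cls.unclassified not in other_base.__mro__:
            return False
        # Every classifier required by cls must be matched by one on other
        return all(
            any(issubclass(have, want) for have in other.classifiers)
            for want in cls.classifiers
        )


class Mri: ...
class T1Weighted(Mri): ...
class Brain: ...
class ToyNifti(ToyClassifiable): ...
class ToyNiftiX(ToyNifti): ...


assert issubclass(ToyNifti[T1Weighted, Brain], ToyNifti)
assert issubclass(ToyNifti[T1Weighted, Brain], ToyNifti[T1Weighted])
assert issubclass(ToyNifti[T1Weighted, Brain], ToyNifti[Mri])
assert issubclass(ToyNiftiX[T1Weighted, Brain], ToyNifti[T1Weighted, Brain])
assert not issubclass(ToyNifti[T1Weighted], ToyNifti[T1Weighted, Brain])
```

Because the classified classes only come into existence when the subscript is evaluated, a static checker has nothing to inspect ahead of time, which is consistent with the limitation noted in the warning earlier in this section.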

For types with ordered classifiers, the classifiers must appear in the same order for one type to be considered a subclass of the other

from fileformats.testing import R, A, B, C, E

assert issubclass(E, C)
assert issubclass(R[A, B, E], R[A, B, C])
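The ordered case reduces to a position-by-position comparison. The helper below is a hypothetical illustration of that rule; it assumes both types carry the same number of classifiers, which the text above does not state explicitly.

```python
def ordered_classifiers_match(sub, sup):
    """Return True if each classifier in sub subclasses the one at the same position in sup."""
    # Assumption: the two classifier sequences must have equal length
    return len(sub) == len(sup) and all(issubclass(s, p) for s, p in zip(sub, sup))


class A: ...
class B: ...
class C: ...
class E(C): ...


assert ordered_classifiers_match((A, B, E), (A, B, C))
assert not ordered_classifiers_match((B, A, E), (A, B, C))  # same classifiers, wrong order
assert not ordered_classifiers_match((A, B, C), (A, B, E))  # C is not a subclass of E
```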