Extras¶
The functionality in addition to the core validation and detection typically requires external dependencies and should be put into the separate extras project within the apa Please use an "extra" hook and implement the method in an "extras" package in the extension template, to be called fileformats-<yournamespace>-extras.
Hooks and implementations¶
FileFormats Extras enable the creation of hooks in FileSet classes using the @extra decorator that can be implemented in separate modules using the @extra_implementation decorator. The "extra methods" typically add additional functionality for accessing and maninpulating the data within the fileset, i.e. not required for format detection and validation, and should be implemented in a separate package if they have external dependencies to keep the main and extension packages dependency free. The standard place to put these extras-implementations is in the sister "extras" package named fileformats-<yournamespace>-extras, located in the extras directory in the extension package root (see https://github.com/ArcanaFramework/fileformats-extension-template for further instructions). It is possible to implement extra methods in other modules, however, the extras package associated with formats namespace will be loaded by default when a hooked method is accessed.
Use the @extra decorator on a method in the to define an extras method,
from typing import Self class MyFormat(File): ext = ".my" @extra def my_extra_method(self, index: int, scale: float, save_path: Path) -> Self: ...
and then reference that method in the extras package using the @extra_implementation
from some_external_package import load_my_format, save_my_format
from fileformats.core import extra_implementation
from fileformats.mypackage import MyFormat
@extra_implementation(MyFormat.my_extra_method)
def my_extra_method(
my_format: MyFormat, index: int scale: float, save_path: Path
) -> MyFormat:
data_array = load_my_format(my_format.fspath)
data_array[:index] *= scale
save_my_format(save_path, data_array)
return MyFormat(save_path)
The first argument to the implementation functions is the instance the method is executed on, and the types of the remaining arguments and return need to match the hooked method exactly.
It is possible to provide multiple overloads for subclasses of the format that defines the hook. Like functools.singledispacth (which is used under the hood), the type of the first argument (not the type of the class the method is referenced from in the decorated) determines which of the overloaded methods is called
class MyFormatX(MyFormat):
ext = ".myx"
@extra_implementation(MyFormat.my_extra_method)
def my_extra_method(
my_format: MyFormat, index: int scale: float, save_path: Path
) -> MyFormat:
...
@extra_implementation(MyFormat.my_extra_method)
def my_extra_method(
my_format: MyFormatX, index: int scale: float, save_path: Path
) -> MyFormatX:
...
Registering converters¶
Converters between two equivalent formats are defined using Pydra dataflow engine
tasks. There are two types
of Pydra tasks, function tasks, Python functions decorated by @pydra.mark.task
, and
shell-command tasks, which wrap command-line tools in Python classes. To register a
Pydra task as a converter between two file formats it needs to be decorated with the
@fileformats.core.converter
decorator. Like the implementation of extra methods,
converters should be implemented in the sister extras package.
Pydra uses type annotations to define the input and outputs of the tasks. It there is
a input to the task named in_file
, and either a single anonymous output or an output
named out_file
, and both are format classes, then no arguments need to be passed
to the converter decorator and the conversion source and target formats are determined
automatically. For example,
from pathlib import Path
import tempfile
import pydra.mark
from fileformats.core import converter
from .mypackage import MyFormat, MyOtherFormat
@converter
@pydra.mark.task
def convert_my_format(in_file: MyFormat, conversion_argument: int = 2) -> MyOtherFormat:
data = in_file.load()
output_path = Path(tempfile.mkdtemp()) / ("out" + MyOtherFormat.ext)
... do conversion ...
return MyOtherFormat.save_new(output_path, data)
defines a converter between MyFormat
and MyOtherFormat
, with the converter
argument conversion_argument
.
The @converter
decorator registers the class in a class attribute of the target class,
therefore only if module containing the converter methods is imported will the converters
be available. Converter arguments can be passed as keyword-arguments to the
get_converter
and convert
methods if required.
Sometimes the source and target formats cannot be automatically determined from the
task signature, and need to be provided as arguments to the @converter
decorator
instead. For example, the converter between raster images using the imageio
package
to do a generic conversion between all image types,
from pathlib import Path
import tempfile
import pydra.mark
import pydra.engine.specs
from fileformats.core import converter
from .raster import RasterImage, Bitmap, Gif, Jpeg, Png, Tiff
@converter(target_format=Bitmap, output_format=Bitmap)
@converter(target_format=Gif, output_format=Gif)
@converter(target_format=Jpeg, output_format=Jpeg)
@converter(target_format=Png, output_format=Png)
@converter(target_format=Tiff, output_format=Tiff)
@pydra.mark.task
@pydra.mark.annotate({"return": {"out_file": RasterImage}})
def convert_image(in_file: RasterImage, output_format: type, out_dir: ty.Optional[Path] = None):
data_array = in_file.load()
if out_dir is None:
out_dir = Path(tempfile.mkdtemp())
output_path = out_dir / (in_file.fspath.stem + output_format.ext)
return output_format.save_new(output_path, data_array)
In this case because we can write the converter in a generic way that allows us to convert
between any image type supported by imageio
, we use the RasterImage
base class
for the input and output format, and explicitly set the target_format
of the output
for each of the support output formats. We also pass output_format
as a keyword argument
from the converter decorator to specify the format we want to convert to.
Note that while the source_format
can be a base class of the format to be converted,
the target_format
can't be, since the subclass my have specific characteristics not
captured by transformation to the base class. However, you can attempt to "cast" a
base class to a sub-class simply by providing the base class as an input, since it will
simply iterate over paths in the base class and attempt to validate them.
>>> sub_format = SubFormat(BaseFormat.convert(another_format))
Shell commands are marked as converters in the same way as function tasks, and existing
ShellCommandTask classes can be registered by calling the converter method on the ShellCommandTask
directly. If required, you can also map the input and output files to in_file
and
out_file
via the converter decorator for any converter task and set appropriate
input fields
from fileformats.yourpackage import YourFormat, YourOtherFormat
from pydra.tasks.thirdparty import ThirdPartyShellCmd
converter(
source_format=YourFormat,
target_format=YourOtherFormat,
in_file=your_file,
out_file=other_file,
compression="y",
)(ThirdPartyShellCmd)
If you need to map any of the converter arguments or perform more complex logic, it is
also possible to decorate a generic function that returns an instantiated Pydra task,
such as in the mrconvert
converter in the fileformats-medimage
package.
@converter(source_format=MedicalImage, target_format=Analyze, out_ext=Analyze.ext)
@converter(
source_format=MedicalImage, target_format=MrtrixImage, out_ext=MrtrixImage.ext
)
@converter(
source_format=MedicalImage,
target_format=MrtrixImageHeader,
out_ext=MrtrixImageHeader.ext,
)
def mrconvert(name, out_ext: str):
"""Initiate an MRConvert task with the output file extension set
Parameters
----------
name : str
name of the converter task
out_ext : str
extension of the output file, used by MRConvert to determine the desired format
Returns
-------
pydra.ShellCommandTask
the converter task
"""
return pydra_mrtrix3_utils.MRConvert(name=name, out_file="out" + out_ext)
Since converter tasks rely on Pydra, which should be added as an "extended" dependency,
they are not loaded by default. However, if there is a package at
fileformats.<namespace>.converters
, it will be attempted to be imported and throw
a warning if the import fails, when get_converter is called on a format in that
namespace.
Warning
If the converters aren't imported successfully, then you will receive a
FormatConversionError
error saying there are no converters between FormatA and
FormatB.