FileFormats
FileFormats provides a library of file-format types implemented as Python classes. The format classes are designed to be used in file-type validation during the construction of data workflows (e.g. Pydra, Fastr), and provide a common interface to general methods for manipulating and moving the underlying file-system objects between storage locations.
Unlike other file-type Python packages, FileFormats supports multi-file data formats ("file sets") often found in scientific workflows, e.g. formats with separate header and data files, or directories containing certain file types. It also provides mechanisms to peek at metadata fields in order to define complex data formats or specific sub-types (e.g. a functional MRI DICOM file set).
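The file-set idea can be illustrated with a small stand-alone sketch. It does not use FileFormats itself: the hypothetical HeaderDataPair class below simply treats a header file and a data file sharing the same stem (modelled on Analyze-style .hdr/.img pairs) as a single unit. The class name, its extensions and the pairing rule are assumptions made for illustration only.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Tuple


@dataclass(frozen=True)
class HeaderDataPair:
    """Hypothetical two-file "file set": a header plus a data file with the same stem."""

    header: Path
    data: Path

    # Assumed member extensions (unannotated, so they are class constants, not fields)
    HEADER_EXT = ".hdr"
    DATA_EXT = ".img"

    @classmethod
    def from_header(cls, header_path) -> "HeaderDataPair":
        """Locate the data file that sits next to the header and check both exist."""
        header = Path(header_path)
        if header.suffix != cls.HEADER_EXT:
            raise ValueError(f"{header} does not have the {cls.HEADER_EXT} extension")
        # The data file is assumed to share the header's stem, e.g. scan.hdr -> scan.img
        data = header.with_suffix(cls.DATA_EXT)
        for member in (header, data):
            if not member.is_file():
                raise FileNotFoundError(f"missing file-set member: {member}")
        return cls(header, data)

    @property
    def members(self) -> Tuple[Path, Path]:
        """All members of the set, e.g. so they can be copied or moved together."""
        return (self.header, self.data)
```

Starting from one member (the header) and deriving the others is one simple way to keep a multi-file format behaving like a single object; formats whose members are named inside the header would need the header to be parsed instead.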
File-format types are typically identified by a combination of file extensions and "magic numbers", where applicable. In addition to these generic methods, FileFormats provides a flexible framework to conveniently add custom identification routines for exotic file formats, e.g. formats that require inspection of headers to locate other members of the "file set".
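The extension-plus-magic-number approach, together with a hook for custom identification routines, can be sketched in plain Python. The classes below (FileSketch, PngSketch, Mp4Sketch and FormatMismatch) are hypothetical and are not the FileFormats API or its implementation. The PNG rule uses the standard 8-byte PNG signature; the MP4 rule assumes the file begins with an ISO base media "ftyp" box (4 size bytes followed by the ASCII bytes "ftyp"), as MP4 files normally do.

```python
from pathlib import Path
from typing import Optional


class FormatMismatch(ValueError):
    """Hypothetical error raised when a file does not match the expected format."""


class FileSketch:
    """Illustrative base class: validates a file's format when it is constructed."""

    ext: str = ""
    magic: Optional[bytes] = None
    magic_offset: int = 0

    def __init__(self, path):
        self.path = Path(path)
        if not self.path.is_file():
            raise FileNotFoundError(f"{self.path} does not exist")
        # Cheap check first: the (case-insensitive) file extension
        if self.path.suffix.lower() != self.ext:
            raise FormatMismatch(f"{self.path} does not have the {self.ext} extension")
        # Then compare the bytes found at magic_offset with the expected signature
        if self.magic is not None:
            with open(self.path, "rb") as f:
                f.seek(self.magic_offset)
                if f.read(len(self.magic)) != self.magic:
                    raise FormatMismatch(f"{self.path} lacks the expected magic number")
        self.custom_check()

    def custom_check(self):
        """Hook for format-specific identification; subclasses may override it."""

    def __str__(self):
        return str(self.path)


class PngSketch(FileSketch):
    ext = ".png"
    magic = b"\x89PNG\r\n\x1a\n"  # standard 8-byte PNG signature


class Mp4Sketch(FileSketch):
    ext = ".mp4"
    magic = b"ftyp"  # assumed: first box is "ftyp", preceded by a 4-byte size
    magic_offset = 4
```

A subclass that needs to inspect a header can override custom_check and raise FormatMismatch. Note that Path.suffix only sees the last extension, so a compound extension such as .nii.gz would need extra handling in a real implementation.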
Installation
FileFormats can be installed for Python >=3.7 using pip:
$ python3 -m pip install fileformats
This will perform a basic install with minimal dependencies, which can be used for type validation and detection. To also install the dependencies required to read data from select file formats, and to convert between them, install the package with the "extended" option:
$ python3 -m pip install 'fileformats[extended]'
Quick Example
Validate an mp4 audio file's extension and magic number simply by instantiating the class.
>>> from fileformats.audio import Mp4
>>> mp4_file = Mp4("/path/to/audio.mp4") # checks it exists, its extension and magic number
>>> str(mp4_file)
'/path/to/audio.mp4'
The created FileSet object implements os.PathLike, so it can be used in place of str or pathlib.Path in most cases, e.g. when opening files
>>> with open(mp4_file, "rb") as fp:
...     contents = fp.read()
or in string templates, e.g.
>>> import subprocess
>>> subprocess.run(f"cp {mp4_file} new-dest.mp4", shell=True)
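Why such an object works with open() and in string templates can be seen in a minimal sketch: any class that defines __fspath__ is accepted wherever the standard library expects a path, while __str__ controls what appears in an f-string. PathWrapper below is a hypothetical illustration of that protocol, not the FileSet implementation.

```python
import os
from pathlib import Path


class PathWrapper(os.PathLike):
    """Minimal path-like object (hypothetical illustration)."""

    def __init__(self, path):
        self._path = Path(path)

    def __fspath__(self):
        # Called by open(), os.fspath(), pathlib.Path(), shutil functions, etc.
        return str(self._path)

    def __str__(self):
        # Used by str() and f-string interpolation
        return str(self._path)
```

Because f-string interpolation goes through __str__ rather than __fspath__, a path containing spaces would still need quoting (e.g. with shlex.quote) before being passed to a shell with shell=True.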
License
This work is licensed under a Creative Commons Attribution 4.0 International License.