FileFormats#


FileFormats provides a library of file-format types implemented as Python classes. The format classes are designed for file-type validation during the construction of data workflows (e.g. Pydra, Fastr), and they provide a common interface to general methods for manipulating the underlying file-system objects and moving them between storage locations.

Unlike other file-type Python packages, FileFormats supports multi-file data formats ("file sets") often found in scientific workflows, e.g. formats with separate header/data files or directories containing certain file types. It also provides mechanisms to peek at metadata fields in order to define complex data formats or specific sub-types (e.g. a functional MRI DICOM file set).
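
To make the "file set" idea concrete, here is a minimal sketch in plain Python of a header/data pair being treated as one unit. It does not use the FileFormats API: the function name collect_file_set and the .hdr/.img suffixes are assumptions chosen for illustration only.

```python
import tempfile
from pathlib import Path


def collect_file_set(header_path, data_suffix=".img"):
    """Return (header, data) paths, requiring the data file to sit beside the header.

    Hypothetical helper for illustration; not part of FileFormats.
    """
    header = Path(header_path)
    data = header.with_suffix(data_suffix)
    if not header.is_file() or not data.is_file():
        raise FileNotFoundError(f"Incomplete file set for {header}")
    return header, data


# Build a throwaway header/data pair and collect it as a single set
with tempfile.TemporaryDirectory() as tmp:
    hdr = Path(tmp) / "scan.hdr"
    hdr.write_bytes(b"header")
    hdr.with_suffix(".img").write_bytes(b"voxels")
    header, data = collect_file_set(hdr)
    file_set_names = (header.name, data.name)
```

Treating the pair as a unit means a missing companion file is caught at validation time rather than partway through a workflow.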

File-format types are typically identified by a combination of file extensions and "magic numbers", where applicable. In addition to these generic methods, FileFormats provides a flexible framework to conveniently add custom identification routines for exotic file formats, e.g. formats that require inspection of headers to locate other members of the "file set".
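
As a rough illustration of the generic approach described above, the following sketch checks a file's extension and then compares its leading bytes against a known magic number. It is not how FileFormats is implemented: the MAGIC_NUMBERS table and the matches_format function are hypothetical names, and only a few well-known signatures are listed.

```python
import tempfile
from pathlib import Path

# Hypothetical lookup table for illustration: extension -> (offset, magic bytes)
MAGIC_NUMBERS = {
    ".png": (0, b"\x89PNG\r\n\x1a\n"),
    ".gz": (0, b"\x1f\x8b"),
    ".pdf": (0, b"%PDF-"),
}


def matches_format(path, extension):
    """Return True if `path` has the extension and the expected leading bytes."""
    path = Path(path)
    if path.suffix.lower() != extension:
        return False
    offset, magic = MAGIC_NUMBERS[extension]
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(len(magic)) == magic


# Compare a file with a genuine PNG signature against one that only has the extension
with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "image.png"
    good.write_bytes(b"\x89PNG\r\n\x1a\n" + b"rest of file")
    bad = Path(tmp) / "fake.png"
    bad.write_bytes(b"not a png")
    good_ok = matches_format(good, ".png")
    bad_ok = matches_format(bad, ".png")
```

Checking the extension first avoids opening files that cannot match, and the byte comparison catches files that were merely renamed.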

Installation#

FileFormats can be installed for Python >=3.7 using pip:

$ python3 -m pip install fileformats

This performs a basic install with minimal dependencies, which is sufficient for type validation and detection. To also install the dependencies required to read data from, and convert between, select file formats, install the package with the extended option:

$ python3 -m pip install "fileformats[extended]"

Quick Example#

Validate an mp4 audio file's extension and magic number simply by instantiating the class.

>>> from fileformats.audio import Mp4
>>> mp4_file = Mp4("/path/to/audio.mp4")  # checks it exists, its extension and magic number
>>> str(mp4_file)
'/path/to/audio.mp4'

The created FileSet object implements os.PathLike, so it can be used in place of str or pathlib.Path in most cases, e.g. when opening files:

>>> with open(mp4_file, "rb") as fp:  # the context manager closes the file afterwards
...     contents = fp.read()

or in string templates, e.g.

>>> import subprocess
>>> subprocess.run(f"cp {mp4_file} new-dest.mp4", shell=True)
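
For readers unfamiliar with os.PathLike, the sketch below shows why such an object can stand in for a path. PathWrapper is a hypothetical stand-in written for this example, not the FileFormats FileSet class: defining __fspath__ is what lets open() and os.path functions accept it, while __str__ is what an f-string uses.

```python
import os
import tempfile


class PathWrapper:
    """Hypothetical stand-in for a path-like object; not the FileSet class."""

    def __init__(self, path):
        self._path = os.fspath(path)

    def __fspath__(self):
        # Called by open(), os.path functions, pathlib.Path(), etc.
        return self._path

    def __str__(self):
        # Used when the object is formatted into a string template
        return self._path


with tempfile.TemporaryDirectory() as tmp:
    raw_path = os.path.join(tmp, "audio.mp4")
    with open(raw_path, "wb") as f:
        f.write(b"example bytes")

    wrapped = PathWrapper(raw_path)
    with open(wrapped, "rb") as f:  # accepted because of __fspath__
        contents = f.read()
    size = os.path.getsize(wrapped)

    command = f"cp {wrapped} new-dest.mp4"  # formatted via __str__
    expected_command = f"cp {raw_path} new-dest.mp4"
```

Note that a path interpolated into a shell command string is not quoted, so paths containing spaces or shell metacharacters would need shlex.quote or an argument list instead.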

License#

This work is licensed under a Creative Commons Attribution 4.0 International License.
