FileFormats

FileFormats is a library of Python classes that correspond to different file formats, for use in file-type detection and validation, MIME-type lookup and file handling. The format classes also provide hooks for methods that read and manipulate the data contained in the files, which makes it easier to write duck-typed code. Unlike many other Python packages, FileFormats supports multi-file data formats, e.g. formats with separate header and data files or directories containing specific files, and handles them just like single-file types.
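To make the idea of a multi-file format concrete, here is a minimal sketch of grouping a header file and a data file into one object so that calling code can treat the pair as a single unit. It uses only the standard library. The class name `HeaderDataPair`, its `from_paths` method, its `members` property and the `.hdr`/`.img` extensions are assumptions chosen for illustration; they are not the FileFormats API.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class HeaderDataPair:
    """Hypothetical two-file format: one header file plus one data file."""

    header: Path
    data: Path

    @classmethod
    def from_paths(cls, paths, header_ext=".hdr", data_ext=".img"):
        # Sort the candidate paths into headers and data files by extension.
        paths = [Path(p) for p in paths]
        headers = [p for p in paths if p.suffix == header_ext]
        datas = [p for p in paths if p.suffix == data_ext]
        if len(headers) != 1 or len(datas) != 1:
            raise ValueError("expected exactly one header and one data file")
        # Require a shared stem so unrelated files are not paired up.
        if headers[0].stem != datas[0].stem:
            raise ValueError("header and data file names do not match")
        return cls(header=headers[0], data=datas[0])

    @property
    def members(self):
        # All files in the set, so callers can copy or move them together.
        return (self.header, self.data)


pair = HeaderDataPair.from_paths(["scan.img", "scan.hdr"])
print([str(p) for p in pair.members])
```

Validating the set once at construction means later code can pass the pair around without rechecking that both files are present.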

File-format types are typically identified by a combination of file extensions and, where applicable, "magic numbers". In these cases, new formats can be defined in just a few lines. However, for more exotic file formats such as MRtrix Image Header, which requires inspection of headers to locate other members of the "file set", FileFormats provides a framework for adding custom detection methods.
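The combination of extension and magic-number checks described above can be sketched in plain Python as follows. This is a stand-alone illustration under stated assumptions, not how FileFormats itself is implemented: the `KNOWN_FORMATS` table and the `detect_format` function are hypothetical names, and only the PNG and GIF signatures are well-known published values.

```python
from pathlib import Path

# Hypothetical registry: format name -> (extension, magic bytes at offset 0).
# Illustrative only; this is not part of the FileFormats API.
KNOWN_FORMATS = {
    "png": (".png", b"\x89PNG\r\n\x1a\n"),
    "gif": (".gif", b"GIF8"),
}


def detect_format(path):
    """Return the name of the first format whose extension AND magic
    number both match the file, or None if nothing matches."""
    path = Path(path)
    with path.open("rb") as f:
        head = f.read(16)  # long enough for every signature in the table
    for name, (ext, magic) in KNOWN_FORMATS.items():
        if path.suffix.lower() == ext and head.startswith(magic):
            return name
    return None
```

Checking both conditions catches files that have been renamed with the wrong extension, which an extension-only check would miss. Formats whose members must be located by reading a header need custom logic beyond this table-driven approach.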

Extensions and Extras

The main FileFormats package covers all file types with registered MIME types (see IANA MIME-types). Additional, domain-specific formats can be added via the FileFormats extension framework, such as fileformats-medimage for medical imaging data and fileformats-datascience for formats commonly found in data science. These extension packages are not yet comprehensive, but are expected to grow as new use cases are found and new formats are added (see Extensions).

The main FileFormats package and its extension packages don't have any external dependencies. Extra functionality that requires external dependencies, such as libraries to read and write the file data, is implemented in separate extras packages (see Extras), e.g. fileformats-extras and fileformats-medimage-extras, to keep the base packages for format detection and file handling extremely lightweight.

Installation

FileFormats can be installed for Python >=3.8 using pip:

$ python3 -m pip install fileformats

Extension packages can be installed similarly:

$ python3 -m pip install fileformats-medimage fileformats-datascience

These installations have no dependencies and provide basic format detection and file handling functionality. However, for metadata inspection and format conversion methods that require external dependencies, you will need to install the fileformats-extras package:

$ python3 -m pip install fileformats-extras

Likewise, for the extension packages:

$ python3 -m pip install fileformats-medimage-extras fileformats-datascience-extras

Note

See the Extensions and Extras sections for instructions on how to implement your own extensions and extras, respectively.

License

This work is licensed under a Creative Commons Attribution 4.0 International License