Image zip

Format specification

The image zip format allows to export/import unannotated datasets with images to/from a zip archive. The format doesn’t support any annotations or attributes.

Import Image zip dataset

There are several ways to import unannotated datasets to your Datumaro project:

  • From an existing archive:
datum create
datum import -f image_zip ./images.zip
  • From a directory with zip archives. Datumaro will import images from all zip files in the directory:
datum create
datum import -f image_zip ./foo

The directory with zip archives must have the following structure:

└── foo/
    ├── archive1.zip/
    |   ├── image_1.jpg
    |   ├── image_2.png
    |   ├── subdir/
    |   |   ├── image_3.jpg
    |   |   └── ...
    |   └── ...
    ├── archive2.zip/
    |   ├── image_101.jpg
    |   ├── image_102.jpg
    |   └── ...
    ...

Images in the archives must have a supported extension, follow the user manual to see the supported extensions.

Export to other formats

Datumaro can convert image zip dataset into any other format Datumaro supports. For example:

datum create -o project
datum import -p project -f image_zip ./images.zip
datum export -p project -f coco -o ./new_dir -- --save-media

Or, using Python API:

import datumaro as dm

dataset = dm.Dataset.import_from('<path/to/dataset>', 'image_zip')
dataset.export('save_dir', 'coco', save_media=True)

Export an unannotated dataset to a zip archive

Example: exporting images from a VOC dataset to zip archives:

datum create -o project
datum import -p project -f voc ./VOC2012
datum export -p project -f image_zip -- --name voc_images.zip

Extra options for exporting to image_zip format:

  • --save-media allow to export dataset with saving media files (default: False)
  • --image-ext <IMAGE_EXT> allow to specify image extension for exporting dataset (default: use original or .jpg, if none)
  • --name name of output zipfile (default: default.zip)
  • --compression allow to specify archive compression method. Available methods: ZIP_STORED, ZIP_DEFLATED, ZIP_BZIP2, ZIP_LZMA (default: ZIP_STORED). Follow zip documentation for more information.

Examples

Examples of using this format from the code can be found in the format tests