1 - NYU Depth Dataset V2

Format specification

The original NYU Depth Dataset V2 is available here.

Supported annotation types:

  • DepthAnnotation

Import NYU Depth Dataset V2

The NYU Depth Dataset V2 is available for free download.

A Datumaro project with a NYU Depth Dataset V2 source can be created in the following way:

datum create
datum import --format nyu_depth_v2 <path/to/dataset>

It is also possible to import the dataset using Python API:

import datumaro as dm

dataset = dm.Dataset.import_from('<path/to/dataset>', 'nyu_depth_v2')

NYU Depth Dataset V2 directory should have the following structure:

Dataset/
    ├── 1.h5
    ├── 2.h5
    ├── 3.h5
    └── ...

To make sure that the selected dataset has been added to the project, you can run datum project info, which will display the project information.
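
Once imported, the depth maps are available on each item as DepthAnnotation objects; a quick inspection sketch:

import datumaro as dm

dataset = dm.Dataset.import_from('<path/to/dataset>', 'nyu_depth_v2')
for item in dataset:
    # each item carries its depth map as a DepthAnnotation next to the RGB image
    print(item.id, [type(a).__name__ for a in item.annotations])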

Examples

Examples of using this format from the code can be found in the format tests.

2 - ADE20k (v2017)

Format specification

The original ADE20K 2017 dataset is available here.

The consistency set (for checking the annotation consistency) is available here.

Supported annotation types:

  • Masks

Supported annotation attributes:

  • occluded (boolean): whether the object is occluded by another object
  • other arbitrary boolean attributes, which can be specified in the annotation file <image_name>_atr.txt

Import ADE20K 2017 dataset

A Datumaro project with an ADE20k source can be created in the following way:

datum create
datum import --format ade20k2017 <path/to/dataset>

It is also possible to import the dataset using Python API:

import datumaro as dm

ade20k_dataset = dm.Dataset.import_from('<path/to/dataset>', 'ade20k2017')

ADE20K dataset directory should have the following structure:

dataset/
├── dataset_meta.json # a list of non-format labels (optional)
├── subset1/
│   └── super_label_1/
│       ├── img1.jpg
│       ├── img1_atr.txt
│       ├── img1_parts_1.png
│       ├── img1_seg.png
│       ├── img2.jpg
│       ├── img2_atr.txt
│       └── ...
└── subset2/
    ├── img3.jpg
    ├── img3_atr.txt
    ├── img3_parts_1.png
    ├── img3_parts_2.png
    ├── img4.jpg
    ├── img4_atr.txt
    ├── img4_seg.png
    └── ...

The mask images <image_name>_seg.png contain information about the object class segmentation masks and also separate each class into instances. The R and G channels encode the object class masks, while the B channel encodes the instance masks.
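
A minimal numpy sketch of recovering class and instance maps from a _seg.png, assuming the standard ADE20K encoding (class index = (R // 10) * 256 + G, instance index = B):

import numpy as np
from PIL import Image

seg = np.asarray(Image.open('img1_seg.png')).astype(int)
class_map = (seg[:, :, 0] // 10) * 256 + seg[:, :, 1]
instance_map = seg[:, :, 2]
print(np.unique(class_map), np.unique(instance_map))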

The mask images <image_name>_parts_N.png contain segmentation masks for parts of objects, where N is a number indicating the level in the part hierarchy.

The annotation files <image_name>_atr.txt describe the content of each image. Each line in the text file contains:

  • column 1: instance number,
  • column 2: part level (0 for objects),
  • column 3: occluded (1 for true),
  • column 4: original raw name (might provide a more detailed categorization),
  • column 5: class name (parsed using wordnet),
  • column 6: double-quoted list of attributes, separated by commas. Columns are separated by # (see the parsing sketch below). See an example of the dataset here.
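
A minimal sketch of parsing one such line (the example line is hypothetical):

line = '1 # 0 # 1 # table # table # "wooden, small"'
fields = [f.strip() for f in line.split('#')]
instance_id, part_level = int(fields[0]), int(fields[1])
occluded = fields[2] == '1'
raw_name, class_name = fields[3], fields[4]
attributes = [a.strip() for a in fields[5].strip('"').split(',')]
print(instance_id, part_level, occluded, class_name, attributes)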

To add custom classes, you can use dataset_meta.json.
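
For example, a dataset_meta.json can look like the following (a sketch following Datumaro's dataset meta file convention; segmentation_colors is only used by mask-based formats):

{
  "label_map": {"0": "background", "1": "cat", "2": "dog"},
  "segmentation_colors": [[0, 0, 0], [0, 255, 0], [0, 0, 255]],
  "background_label": "0"
}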

Export to other formats

Datumaro can convert an ADE20K dataset into any other format Datumaro supports. To get the expected result, convert the dataset to a format that supports segmentation masks.

There are several ways to convert an ADE20k 2017 dataset to other dataset formats using CLI:

datum create
datum import -f ade20k2017 <path/to/dataset>
datum export -f coco -o <output/dir> -- --save-media

or

datum convert -if ade20k2017 -i <path/to/dataset> \
    -f coco -o <output/dir> -- --save-media

Or, using Python API:

import datumaro as dm

dataset = dm.Dataset.import_from('<path/to/dataset>', 'ade20k2017')
dataset.export('save_dir', 'coco')

Examples

Examples of using this format from the code can be found in the format tests.

3 - ADE20k (v2020)

Format specification

The original ADE20K 2020 dataset is available here.

The consistency set (for checking the annotation consistency) is available here.

Supported annotation types:

  • Masks

Supported annotation attributes:

  • occluded (boolean): whether the object is occluded by another object
  • other arbitrary boolean attributes, which can be specified in the annotation file <image_name>.json

Import ADE20K dataset

A Datumaro project with an ADE20k source can be created in the following way:

datum create
datum import --format ade20k2020 <path/to/dataset>

It is also possible to import the dataset using Python API:

import datumaro as dm

ade20k_dataset = dm.Dataset.import_from('<path/to/dataset>', 'ade20k2020')

ADE20K dataset directory should have the following structure:

dataset/
├── dataset_meta.json # a list of non-format labels (optional)
├── subset1/
│   ├── img1/  # directory with instance masks for img1
│   |    ├── instance_001_img1.png
│   |    ├── instance_002_img1.png
│   |    └── ...
│   ├── img1.jpg
│   ├── img1.json
│   ├── img1_seg.png
│   ├── img1_parts_1.png
│   |
│   ├── img2/  # directory with instance masks for img2
│   |    ├── instance_001_img2.png
│   |    ├── instance_002_img2.png
│   |    └── ...
│   ├── img2.jpg
│   ├── img2.json
│   └── ...
│
└── subset2/
    ├── super_label_1/
    |   ├── img3/  # directory with instance masks for img3
    |   |    ├── instance_001_img3.png
    |   |    ├── instance_002_img3.png
    |   |    └── ...
    |   ├── img3.jpg
    |   ├── img3.json
    |   ├── img3_seg.png
    |   ├── img3_parts_1.png
    |   └── ...
    |
    ├── img4/  # directory with instance masks for img4
    |   ├── instance_001_img4.png
    |   ├── instance_002_img4.png
    |   └── ...
    ├── img4.jpg
    ├── img4.json
    ├── img4_seg.png
    └── ...

The mask images <image_name>_seg.png contain information about the object class segmentation masks and also separate each class into instances. The R and G channels encode the object class masks, while the B channel encodes the instance masks.

The mask images <image_name>_parts_N.png contain segmentation masks for parts of objects, where N is a number indicating the level in the part hierarchy.

The <image_name> directory contains instance masks for each object in the image. These masks are one-channel images, in which each pixel indicates whether it belongs to a specific object.

The annotation files <image_name>.json describe the content of each image. See our test assets for an example of this file, or check the ADE20K toolkit.

To add custom classes, you can use dataset_meta.json.

Export to other formats

Datumaro can convert an ADE20K dataset into any other format Datumaro supports. To get the expected result, convert the dataset to a format that supports segmentation masks.

There are several ways to convert an ADE20k dataset to other dataset formats using CLI:

datum create
datum import -f ade20k2020 <path/to/dataset>
datum export -f coco -o ./save_dir -- --save-media

or

datum convert -if ade20k2020 -i <path/to/dataset> \
    -f coco -o <output/dir> -- --save-media

Or, using Python API:

import datumaro as dm

dataset = dm.Dataset.import_from('<path/to/dataset>', 'ade20k2020')
dataset.export('save_dir', 'voc')

Examples

Examples of using this format from the code can be found in the format tests.

4 - Align CelebA

Format specification

The original CelebA dataset is available here.

Supported annotation types:

  • Label
  • Points (landmarks)

Supported attributes:

  • 5_o_Clock_Shadow, Arched_Eyebrows, Attractive, Bags_Under_Eyes, Bald, Bangs, Big_Lips, Big_Nose, Black_Hair, Blond_Hair, Blurry, Brown_Hair, Bushy_Eyebrows, Chubby, Double_Chin, Eyeglasses, Goatee, Gray_Hair, Heavy_Makeup, High_Cheekbones, Male, Mouth_Slightly_Open, Mustache, Narrow_Eyes, No_Beard, Oval_Face, Pale_Skin, Pointy_Nose, Receding_Hairline, Rosy_Cheeks, Sideburns, Smiling, Straight_Hair, Wavy_Hair, Wearing_Earrings, Wearing_Hat, Wearing_Lipstick, Wearing_Necklace, Wearing_Necktie, Young (boolean)

Import align CelebA dataset

A Datumaro project with an align CelebA source can be created in the following way:

datum create
datum import --format align_celeba <path/to/dataset>

It is also possible to import the dataset using Python API:

import datumaro as dm

align_celeba_dataset = dm.Dataset.import_from('<path/to/dataset>', 'align_celeba')

Align CelebA dataset directory should have the following structure:

dataset/
├── dataset_meta.json # a list of non-format labels (optional)
├── Anno/
│   ├── identity_CelebA.txt
│   ├── list_attr_celeba.txt
│   └── list_landmarks_align_celeba.txt
├── Eval/
│   └── list_eval_partition.txt
└── Img/
    └── img_align_celeba/
        ├── 000001.jpg
        ├── 000002.jpg
        └── ...

The identity_CelebA.txt file contains labels (required). The list_attr_celeba.txt, list_landmarks_align_celeba.txt and list_eval_partition.txt files contain attributes, landmarks and subsets respectively (optional).
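
Once imported, labels and their boolean attributes can be inspected via the Python API (a quick sketch; which attributes appear depends on the annotation files found):

import datumaro as dm

dataset = dm.Dataset.import_from('<path/to/dataset>', 'align_celeba')
for item in dataset:
    for ann in item.annotations:
        if ann.type == dm.AnnotationType.label:
            # ann.label is the person identity; ann.attributes may hold the
            # boolean attributes from list_attr_celeba.txt
            print(item.id, ann.label, ann.attributes)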

The original CelebA dataset stores images in a .7z archive. The archive needs to be unpacked before importing.

To add custom classes, you can use dataset_meta.json.

Export to other formats

Datumaro can convert an align CelebA dataset into any other format Datumaro supports. To get the expected result, convert the dataset to a format that supports labels or landmarks.

There are several ways to convert an align CelebA dataset to other dataset formats using CLI:

datum create
datum import -f align_celeba <path/to/dataset>
datum export -f imagenet_txt -o ./save_dir -- --save-media

or

datum convert -if align_celeba -i <path/to/dataset> \
    -f imagenet_txt -o <output/dir> -- --save-media

Or, using Python API:

import datumaro as dm

dataset = dm.Dataset.import_from('<path/to/dataset>', 'align_celeba')
dataset.export('save_dir', 'voc')

Examples

Examples of using this format from the code can be found in the format tests.

5 - BraTS

Format specification

The original BraTS dataset is available here. The BraTS data provided since BraTS'17 differs significantly from the data provided during the previous BraTS challenges (i.e., 2016 and backwards). Datumaro supports BraTS'17-20.

Supported annotation types:

  • Mask

Import BraTS dataset

A Datumaro project with a BraTS source can be created in the following way:

datum create
datum import --format brats <path/to/dataset>

It is also possible to import the dataset using Python API:

from datumaro.components.dataset import Dataset

brats_dataset = Dataset.import_from('<path/to/dataset>', 'brats')

BraTS dataset directory should have the following structure:

dataset/
├── imagesTr
│   │── <img1>.nii.gz
│   │── <img2>.nii.gz
│   └── ...
├── imagesTs
│   │── <img3>.nii.gz
│   │── <img4>.nii.gz
│   └── ...
├── labels
└── labelsTr
    │── <img1>.nii.gz
    │── <img2>.nii.gz
    └── ...

The data in Datumaro is stored as multi-frame images (sets of 2D images). Annotations are stored as masks for each 2D image separately, with an image_id attribute.
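
A quick sketch of reading the per-slice masks and their image_id attributes:

import datumaro as dm

dataset = dm.Dataset.import_from('<path/to/dataset>', 'brats')
for item in dataset:
    for ann in item.annotations:
        if ann.type == dm.AnnotationType.mask:
            # image_id points at the 2D frame this mask belongs to
            print(item.id, ann.label, ann.attributes.get('image_id'))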

Export to other formats

Datumaro can convert a BraTS dataset into any other format Datumaro supports. To get the expected result, convert the dataset to a format that supports segmentation masks.

There are several ways to convert a BraTS dataset to other dataset formats using CLI:

datum create
datum import -f brats <path/to/dataset>
datum export -f voc -o <output/dir> -- --save-media

or

datum convert -if brats -i <path/to/dataset> \
    -f voc -o <output/dir> -- --save-media

Or, using Python API:

from datumaro.components.dataset import Dataset

dataset = Dataset.import_from('<path/to/dataset>', 'brats')
dataset.export('save_dir', 'voc')

Examples

Examples of using this format from the code can be found in the format tests.

6 - BraTS Numpy

Format specification

The original BraTS dataset is available here.

Supported annotation types:

  • Mask
  • Cuboid3d

Import BraTS Numpy dataset

A Datumaro project with a BraTS Numpy source can be created in the following way:

datum create
datum import --format brats_numpy <path/to/dataset>

It is also possible to import the dataset using Python API:

from datumaro.components.dataset import Dataset

brats_dataset = Dataset.import_from('<path/to/dataset>', 'brats_numpy')

BraTS Numpy dataset directory should have the following structure:

dataset/
├── <img1>_data_cropped.npy
├── <img1>_label_cropped.npy
├── <img2>_data_cropped.npy
├── <img2>_label_cropped.npy
├── ...
├── labels
├── val_brain_bbox.p
└── val_ids.p

The data in Datumaro is stored as multi-frame images (sets of 2D images). Annotations are stored as masks for each 2D image separately, with an image_id attribute.

Export to other formats

Datumaro can convert a BraTS Numpy dataset into any other format Datumaro supports. To get the expected result, convert the dataset to a format that supports segmentation masks or cuboids.

There are several ways to convert a BraTS Numpy dataset to other dataset formats using CLI:

datum create
datum import -f brats_numpy <path/to/dataset>
datum export -f voc -o <output/dir> -- --save-media

or

datum convert -if brats_numpy -i <path/to/dataset> \
    -f voc -o <output/dir> -- --save-media

Or, using Python API:

from datumaro.components.dataset import Dataset

dataset = Dataset.import_from('<path/to/dataset>', 'brats_numpy')
dataset.export('save_dir', 'voc')

Examples

Examples of using this format from the code can be found in the format tests.

7 - CelebA

Format specification

The original CelebA dataset is available here.

Supported annotation types:

  • Label
  • Bbox
  • Points (landmarks)

Supported attributes:

  • 5_o_Clock_Shadow, Arched_Eyebrows, Attractive, Bags_Under_Eyes, Bald, Bangs, Big_Lips, Big_Nose, Black_Hair, Blond_Hair, Blurry, Brown_Hair, Bushy_Eyebrows, Chubby, Double_Chin, Eyeglasses, Goatee, Gray_Hair, Heavy_Makeup, High_Cheekbones, Male, Mouth_Slightly_Open, Mustache, Narrow_Eyes, No_Beard, Oval_Face, Pale_Skin, Pointy_Nose, Receding_Hairline, Rosy_Cheeks, Sideburns, Smiling, Straight_Hair, Wavy_Hair, Wearing_Earrings, Wearing_Hat, Wearing_Lipstick, Wearing_Necklace, Wearing_Necktie, Young (boolean)

Import CelebA dataset

A Datumaro project with a CelebA source can be created in the following way:

datum create
datum import --format celeba <path/to/dataset>

It is also possible to import the dataset using Python API:

import datumaro as dm

celeba_dataset = dm.Dataset.import_from('<path/to/dataset>', 'celeba')

CelebA dataset directory should have the following structure:

dataset/
├── dataset_meta.json # a list of non-format labels (optional)
├── Anno/
│   ├── identity_CelebA.txt
│   ├── list_attr_celeba.txt
│   ├── list_bbox_celeba.txt
│   └── list_landmarks_celeba.txt
├── Eval/
│   └── list_eval_partition.txt
└── Img/
    └── img_celeba/
        ├── 000001.jpg
        ├── 000002.jpg
        └── ...

The identity_CelebA.txt file contains labels (required). The list_attr_celeba.txt, list_bbox_celeba.txt, list_landmarks_celeba.txt, list_eval_partition.txt files contain attributes, bounding boxes, landmarks and subsets respectively (optional).

The original CelebA dataset stores images in a .7z archive. The archive needs to be unpacked before importing.

To add custom classes, you can use dataset_meta.json.

Export to other formats

Datumaro can convert a CelebA dataset into any other format Datumaro supports. To get the expected result, convert the dataset to a format that supports labels, bounding boxes or landmarks.

There are several ways to convert a CelebA dataset to other dataset formats using CLI:

datum create
datum import -f celeba <path/to/dataset>
datum export -f imagenet_txt -o ./save_dir -- --save-media

or

datum convert -if celeba -i <path/to/dataset> \
    -f imagenet_txt -o <output/dir> -- --save-media

Or, using Python API:

import datumaro as dm

dataset = dm.Dataset.import_from('<path/to/dataset>', 'celeba')
dataset.export('save_dir', 'voc')

Examples

Examples of using this format from the code can be found in the format tests.

8 - CIFAR

Format specification

CIFAR format specification is available here.

Supported annotation types:

  • Label

Datumaro supports the Python version of CIFAR-10/100. The difference between CIFAR-10 and CIFAR-100 is in how labels are stored in the meta files (batches.meta or meta) and in the annotation files. The 100 classes in CIFAR-100 are grouped into 20 superclasses. Each image comes with a “fine” label (the class to which it belongs) and a “coarse” label (the superclass to which it belongs). In CIFAR-10 there are no superclasses.

CIFAR formats contain 32 x 32 images. As an extension, Datumaro supports reading and writing of arbitrary-sized images.

Import CIFAR dataset

The CIFAR dataset is available for free download.

A Datumaro project with a CIFAR source can be created in the following way:

datum create
datum import --format cifar <path/to/dataset>

It is possible to specify project name and project directory. Run datum create --help for more information.

CIFAR-10 dataset directory should have the following structure:

└─ Dataset/
    ├── dataset_meta.json # a list of non-format labels (optional)
    ├── batches.meta
    ├── <subset_name1>
    ├── <subset_name2>
    └── ...

CIFAR-100 dataset directory should have the following structure:

└─ Dataset/
    ├── dataset_meta.json # a list of non-format labels (optional)
    ├── meta
    ├── <subset_name1>
    ├── <subset_name2>
    └── ...

Dataset files use the Pickle data format.

Meta files:

CIFAR-10:
    num_cases_per_batch: 1000
    label_names: list of strings (['airplane', 'automobile', 'bird', ...])
    num_vis: 3072

CIFAR-100:
    fine_label_names: list of strings (['apple', 'aquarium_fish', ...])
    coarse_label_names: list of strings (['aquatic_mammals', 'fish', ...])

Annotation files:

Common:
    'batch_label': 'training batch 1 of <N>'
    'data': numpy.ndarray of uint8, layout N x C x H x W
    'filenames': list of strings

    If images have a non-default size, i.e. other than 32x32 (Datumaro extension):
        'image_sizes': list of (H, W) tuples

CIFAR-10:
    'labels': list of integers

CIFAR-100:
    'fine_labels': list of integers
    'coarse_labels': list of integers
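
For example, a CIFAR-10 batch can be inspected directly with pickle (a sketch; data_batch_1 is the standard CIFAR-10 file name):

import pickle

with open('Dataset/data_batch_1', 'rb') as f:
    batch = pickle.load(f, encoding='latin1')

print(batch['batch_label'])   # e.g. 'training batch 1 of 5'
print(batch['data'].shape)    # one flattened 3 x 32 x 32 image per row
print(batch['labels'][:5], batch['filenames'][:5])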

To add custom classes, you can use dataset_meta.json.

Export to other formats

Datumaro can convert a CIFAR dataset into any other format Datumaro supports. To get the expected result, convert the dataset to a format that supports the classification task (e.g. MNIST, ImageNet, PascalVOC, etc.)

There are several ways to convert a CIFAR dataset to other dataset formats using CLI:

datum create
datum import -f cifar <path/to/cifar>
datum export -f imagenet -o <output/dir>

or

datum convert -if cifar -i <path/to/dataset> \
    -f imagenet -o <output/dir> -- --save-media

Or, using Python API:

import datumaro as dm

dataset = dm.Dataset.import_from('<path/to/dataset>', 'cifar')
dataset.export('save_dir', 'imagenet', save_media=True)

Export to CIFAR

There are several ways to convert a dataset to CIFAR format:

# export dataset into CIFAR format from existing project
datum export -p <path/to/project> -f cifar -o <output/dir> \
    -- --save-media
# converting to CIFAR format from other format
datum convert -if imagenet -i <path/to/dataset> \
    -f cifar -o <output/dir> -- --save-media

Extra options for exporting to CIFAR format:

  • --save-media allows saving media files on export (by default False)
  • --image-ext <IMAGE_EXT> allows specifying the image extension for exported images (by default .png)
  • --save-dataset-meta allows saving the dataset meta file on export (by default False)

The format (CIFAR-10 or CIFAR-100) in which the dataset will be exported depends on the presence of superclasses in the LabelCategories.
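
For example, assigning parent labels (superclasses) in LabelCategories makes the export produce CIFAR-100. A sketch with hypothetical label names:

import numpy as np
import datumaro as dm

# labels with parents (superclasses) trigger the CIFAR-100 layout on export
categories = dm.LabelCategories.from_iterable([
    ('apple', 'fruit_and_vegetables'),
    ('bee', 'insects'),
])

dataset = dm.Dataset.from_iterable([
    dm.DatasetItem(id=0, image=np.ones((32, 32, 3)),
        annotations=[dm.Label(0)]),
], categories={dm.AnnotationType.label: categories})

dataset.export('./dataset', format='cifar')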

Examples

Datumaro supports filtering, transformation, merging etc. for all formats and for the CIFAR format in particular. Follow the user manual to get more information about these operations.

There are several examples of using Datumaro operations to solve particular problems with CIFAR dataset:

Example 1. How to create a custom CIFAR-like dataset

import numpy as np
import datumaro as dm

dataset = dm.Dataset.from_iterable([
    dm.DatasetItem(id=0, image=np.ones((32, 32, 3)),
        annotations=[dm.Label(3)]
    ),
    dm.DatasetItem(id=1, image=np.ones((32, 32, 3)),
        annotations=[dm.Label(8)]
    )
], categories=['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck'])

dataset.export('./dataset', format='cifar')

Example 2. How to filter and convert a CIFAR dataset to ImageNet

Convert a CIFAR dataset to ImageNet format, keeping only images with the dog class present:

# Download CIFAR-10 dataset:
# https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
datum convert --input-format cifar --input-path <path/to/cifar> \
              --output-format imagenet \
              --filter '/item[annotation/label="dog"]'

Examples of using this format from the code can be found in the format tests.

9 - Cityscapes

Format specification

Cityscapes format overview is available here.

Cityscapes format specification is available here.

Supported annotation types:

  • Masks

Supported annotation attributes:

  • is_crowd (boolean). Specifies if the annotation label can distinguish between different instances. If False, the annotation id field encodes the instance id.

Import Cityscapes dataset

The Cityscapes dataset is available for free download.

A Datumaro project with a Cityscapes source can be created in the following way:

datum create
datum import --format cityscapes <path/to/dataset>

Cityscapes dataset directory should have the following structure:

└─ Dataset/
    ├── dataset_meta.json # a list of non-Cityscapes labels (optional)
    ├── label_colors.txt # a list of non-Cityscapes labels in other format (optional)
    ├── imgsFine/
    │   ├── leftImg8bit
    │   │   ├── <split: train,val, ...>
    │   │   |   ├── {city1}
    │   │   │   |   ├── {city1}_{seq:[0...6]}_{frame:[0...6]}_leftImg8bit.png
    │   │   │   │   └── ...
    │   │   |   ├── {city2}
    │   │   │   └── ...
    │   │   └── ...
    └── gtFine/
        ├── <split: train,val, ...>
        │   ├── {city1}
        │   |   ├── {city1}_{seq:[0...6]}_{frame:[0...6]}_gtFine_color.png
        │   |   ├── {city1}_{seq:[0...6]}_{frame:[0...6]}_gtFine_instanceIds.png
        │   |   ├── {city1}_{seq:[0...6]}_{frame:[0...6]}_gtFine_labelIds.png
        │   │   └── ...
        │   ├── {city2}
        │   └── ...
        └── ...

Annotated files description:

  1. *_leftImg8bit.png - left images in 8-bit LDR format
  2. *_color.png - class labels encoded by their color
  3. *_labelIds.png - class labels encoded by their index
  4. *_instanceIds.png - class and instance labels encoded by an instance ID. The pixel values encode both the class and the individual instance: the integer part of each ID divided by 1000 gives the class ID, and the remainder is the instance ID (see the sketch below). If a certain annotation describes multiple instances, then the pixels have the regular ID of that class
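
A minimal numpy sketch of this decoding rule (the file name is hypothetical):

import numpy as np
from PIL import Image

ids = np.asarray(Image.open('city_000000_000000_gtFine_instanceIds.png')).astype(int)

# IDs >= 1000 encode class * 1000 + instance; smaller IDs are plain class IDs
class_map = np.where(ids >= 1000, ids // 1000, ids)
instance_map = np.where(ids >= 1000, ids % 1000, 0)
print(np.unique(class_map), np.unique(instance_map))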

To add custom classes, you can use dataset_meta.json and label_colors.txt. If the dataset_meta.json is not represented in the dataset, then label_colors.txt will be imported if possible.

In label_colors.txt you can define a custom color map and non-Cityscapes labels, for example:

# label_colors [color_rgb name]
0 124 134 elephant

To make sure that the selected dataset has been added to the project, you can run datum project info, which will display the project information.

Export to other formats

Datumaro can convert a Cityscapes dataset into any other format Datumaro supports. To get the expected result, convert the dataset to formats that support the segmentation task (e.g. PascalVOC, CamVid, etc.)

There are several ways to convert a Cityscapes dataset to other dataset formats using CLI:

datum create
datum import -f cityscapes <path/to/cityscapes>
datum export -f voc -o <output/dir>

or

datum convert -if cityscapes -i <path/to/cityscapes> \
    -f voc -o <output/dir> -- --save-media

Or, using Python API:

import datumaro as dm

dataset = dm.Dataset.import_from('<path/to/dataset>', 'cityscapes')
dataset.export('save_dir', 'voc', save_media=True)

Export to Cityscapes

There are several ways to convert a dataset to Cityscapes format:

# export dataset into Cityscapes format from existing project
datum export -p <path/to/project> -f cityscapes -o <output/dir> \
    -- --save-media
# converting to Cityscapes format from other format
datum convert -if voc -i <path/to/dataset> \
    -f cityscapes -o <output/dir> -- --save-media

Extra options for exporting to Cityscapes format:

  • --save-media allows saving media files on export (by default False)
  • --image-ext IMAGE_EXT allows specifying the image extension for exported images (by default the original extension is kept, or .png is used if there is none)
  • --save-dataset-meta allows saving the dataset meta file on export (by default False)
  • --label-map allows defining a custom colormap. Example:
# mycolormap.txt :
# 0 0 255 sky
# 255 0 0 person
#...
datum export -f cityscapes -- --label-map mycolormap.txt

or you can use the original Cityscapes colormap:

datum export -f cityscapes -- --label-map cityscapes

Examples

Datumaro supports filtering, transformation, merging etc. for all formats and for the Cityscapes format in particular. Follow the user manual to get more information about these operations.

There are several examples of using Datumaro operations to solve particular problems with a Cityscapes dataset:

Example 1. Load the original Cityscapes dataset and convert to Pascal VOC

datum create -o project
datum import -p project -f cityscapes ./Cityscapes/
datum stats -p project
datum export -p project -o dataset/ -f voc -- --save-media

Example 2. Create a custom Cityscapes-like dataset

from collections import OrderedDict

import numpy as np
import datumaro as dm
import datumaro.plugins.cityscapes_format as Cityscapes

label_map = OrderedDict()
label_map['background'] = (0, 0, 0)
label_map['label_1'] = (1, 2, 3)
label_map['label_2'] = (3, 2, 1)
categories = Cityscapes.make_cityscapes_categories(label_map)

dataset = dm.Dataset.from_iterable([
    dm.DatasetItem(id=1,
        image=np.ones((1, 5, 3)),
        annotations=[
            dm.Mask(image=np.array([[1, 0, 0, 1, 1]]), label=1),
            dm.Mask(image=np.array([[0, 1, 1, 0, 0]]), label=2, id=2,
                attributes={'is_crowd': False}),
        ]
    ),
], categories=categories)

dataset.export('./dataset', format='cityscapes')

Examples of using this format from the code can be found in the format tests.

10 - COCO

Format specification

COCO format specification is available here.

The dataset has annotations for multiple tasks. Each task has its own format in Datumaro, and there is also a combined coco format, which includes all the available tasks. The sub-formats have the same options as the “main” format and only limit the set of annotation files they work with. To work with multiple formats, use the corresponding option of the coco format.

Supported tasks / formats:

  • captions (coco_captions)
  • image info (coco_image_info)
  • instances (coco_instances)
  • labels (coco_labels, Datumaro extension)
  • panoptic (coco_panoptic)
  • person keypoints (coco_person_keypoints)
  • stuff (coco_stuff)

Supported annotation types (depending on the task):

  • Caption (captions)
  • Label (label, Datumaro extension)
  • Bbox (instances, person keypoints)
  • Polygon (instances, person keypoints)
  • Mask (instances, person keypoints, panoptic, stuff)
  • Points (person keypoints)

Supported annotation attributes:

  • is_crowd (boolean; on bbox, polygon and mask annotations) - Indicates that the annotation covers multiple instances of the same class.
  • score (number; range [0; 1]) - Indicates the confidence in this annotation. Ground truth annotations always have 1.
  • arbitrary attributes (string/number) - A Datumaro extension. Stored in the attributes section of the annotation descriptor.

Import COCO dataset

The COCO dataset is available for free download; images and annotations are provided separately.

A Datumaro project with a COCO source can be created in the following way:

datum create
datum import --format coco <path/to/dataset>

It is possible to specify project name and project directory. Run datum create --help for more information.

Extra options for adding a source in the COCO format:

  • --keep-original-category-ids: Add dummy label categories so that category indexes in the imported data source correspond to the category IDs in the original annotation file.

A COCO dataset directory should have the following structure:

└─ Dataset/
    ├── dataset_meta.json # a list of custom labels (optional)
    ├── images/
    │   ├── train/
    │   │   ├── <image_name1.ext>
    │   │   ├── <image_name2.ext>
    │   │   └── ...
    │   └── val/
    │       ├── <image_name1.ext>
    │       ├── <image_name2.ext>
    │       └── ...
    └── annotations/
        ├── <task>_<subset_name>.json
        └── ...

For the panoptic task, a dataset directory should have the following structure:

└─ Dataset/
    ├── dataset_meta.json # a list of custom labels (optional)
    ├── images/
    │   ├── train/
    │   │   ├── <image_name1.ext>
    │   │   ├── <image_name2.ext>
    │   │   └── ...
    │   ├── val/
    │   │   ├── <image_name1.ext>
    │   │   ├── <image_name2.ext>
    │   │   └── ...
    └── annotations/
        ├── panoptic_train/
        │   ├── <image_name1.ext>
        │   ├── <image_name2.ext>
        │   └── ...
        ├── panoptic_train.json
        ├── panoptic_val/
        │   ├── <image_name1.ext>
        │   ├── <image_name2.ext>
        │   └── ...
        └── panoptic_val.json

Annotation files must have names like <task_name>_<subset_name>.json. The year is treated as a part of the subset name. If the annotation file name doesn't match this pattern, use one of the task-specific formats instead of plain coco: coco_captions, coco_image_info, coco_instances, coco_labels, coco_panoptic, coco_person_keypoints, coco_stuff. In this case, all items of the dataset will be added to the default subset.

To add custom classes, you can use dataset_meta.json.

You can import a dataset for one or several tasks instead of the whole dataset. This also allows importing annotation files with non-default names. For example:

datum create
datum import --format coco_stuff -r <relpath/to/stuff.json> <path/to/dataset>

To make sure that the selected dataset has been added to the project, you can run datum project info, which will display the project information.

Notes:

  • COCO categories can have any integer IDs; however, Datumaro will count annotation category ID 0 as “not specified”. This does not contradict the original annotations, because they have category indices starting from 1.

Export to other formats

Datumaro can convert a COCO dataset into any other format Datumaro supports. To get the expected result, convert the dataset to formats that support the specified task (e.g. for panoptic segmentation: VOC, CamVid)

There are several ways to convert a COCO dataset to other dataset formats using CLI:

datum create
datum import -f coco <path/to/coco>
datum export -f voc -o <output/dir>

or

datum convert -if coco -i <path/to/coco> -f voc -o <output/dir>

Or, using Python API:

import datumaro as dm

dataset = dm.Dataset.import_from('<path/to/dataset>', 'coco')
dataset.export('save_dir', 'voc', save_media=True)

Export to COCO

There are several ways to convert a dataset to COCO format:

# export dataset into COCO format from existing project
datum export -p <path/to/project> -f coco -o <output/dir> \
    -- --save-media
# converting to COCO format from other format
datum convert -if voc -i <path/to/dataset> \
    -f coco -o <output/dir> -- --save-media

Extra options for exporting to COCO format:

  • --save-media allows saving media files on export (by default False)
  • --image-ext IMAGE_EXT allows specifying the image extension for exported images (by default the original extension is kept, or .jpg is used if there is none)
  • --save-dataset-meta allows saving the dataset meta file on export (by default False)
  • --segmentation-mode MODE allows specifying the save mode for instance segmentation (by default guess):
    • ‘guess’: guess the mode for each instance (using the ‘is_crowd’ attribute as a hint)
    • ‘polygons’: save polygons (merge and convert masks, prefer polygons)
    • ‘mask’: save masks (merge and convert polygons, prefer masks)
  • --crop-covered allows cropping covered segments so that the segmentation of background objects is more accurate (by default False)
  • --allow-attributes ALLOW_ATTRIBUTES allows exporting attributes (by default True). The parameter enables or disables writing custom annotation attributes to the “attributes” annotation field. This field is an extension to the original COCO format
  • --reindex REINDEX allows assigning new indices to images and annotations (by default False). Keep it disabled to preserve the original indices in the produced dataset; consider enabling it when converting from other formats or merging datasets, to avoid ID conflicts
  • --merge-images allows saving all images into a single directory (by default False). When enabled, the dataset images are saved into a single directory, otherwise they are saved in separate directories by subsets
  • --tasks TASKS allows specifying the tasks to export; by default Datumaro uses all tasks. Example:
datum create
datum import -f coco <path/to/dataset>
datum export -f coco -- --tasks instances,stuff

Examples

Datumaro supports filtering, transformation, merging etc. for all formats and for the COCO format in particular. Follow the user manual to get more information about these operations.

There are several examples of using Datumaro operations to solve particular problems with a COCO dataset:

Example 1. How to load an original panoptic COCO dataset and convert to Pascal VOC

datum create -o project
datum import -p project -f coco_panoptic ./COCO/annotations/panoptic_val2017.json
datum stats -p project
datum export -p project -f voc -- --save-media

Example 2. How to create custom COCO-like dataset

import numpy as np
import datumaro as dm

dataset = dm.Dataset.from_iterable([
  dm.DatasetItem(id='000000000001',
    image=np.ones((1, 5, 3)),
    subset='val',
    attributes={'id': 40},
    annotations=[
      dm.Mask(image=np.array([[0, 0, 1, 1, 0]]), label=3,
        id=7, group=7, attributes={'is_crowd': False}),
      dm.Mask(image=np.array([[0, 1, 0, 0, 1]]), label=1,
        id=20, group=20, attributes={'is_crowd': True}),
    ]
  ),
], categories=['a', 'b', 'c', 'd'])

dataset.export('./dataset', format='coco_panoptic')

Examples of using this format from the code can be found in the format tests.

11 - Common Semantic Segmentation

Format specification

CSS format specification is available here.

Supported annotation types:

  • Masks

Import Common Semantic Segmentation dataset

A Datumaro project with a CSS source can be created in the following way:

datum create
datum import --format common_semantic_segmentation <path/to/dataset>

Extra import options:

  • --image-prefix IMAGE_PREFIX allows importing a dataset with a custom image prefix (by default '')
  • --mask-prefix MASK_PREFIX allows importing a dataset with a custom mask prefix (by default '')

CSS dataset directory should have the following structure:

└─ Dataset/
    ├── dataset_meta.json # a list of labels
    ├── images/
    │   ├── <img1>.png
    │   ├── <img2>.png
    │   └── ...
    └── masks/
        ├── <img1>.png
        ├── <img2>.png
        └── ...

To describe classes and colors, you should use dataset_meta.json.

To make sure that the selected dataset has been added to the project, you can run datum project info, which will display the project information.

Export to other formats

Datumaro can convert a CSS dataset into any other format Datumaro supports. To get the expected result, convert the dataset to formats that support the segmentation task (e.g. PASCAL VOC, CamVid, Cityscapes, etc.)

There are several ways to convert a CSS dataset to other dataset formats using CLI:

datum create
datum import -f common_semantic_segmentation <path/to/dataset>
datum export -f voc -o <output/dir>

or

datum convert -if common_semantic_segmentation -i <path/to/dataset> \
    -f cityscapes -o <output/dir> -- --save-media

Or, using Python API:

import datumaro as dm

dataset = dm.Dataset.import_from('<path/to/dataset>', 'common_semantic_segmentation')
dataset.export('save_dir', 'camvid', save_media=True)

Examples

Examples of using this format from the code can be found in the format tests.

12 - Common Super Resolution

Format specification

CSR format specification is available here.

Supported annotation types:

  • SuperResolutionAnnotation

Supported attributes:

  • upsampled (Image): upsampled image

Import Common Super Resolution dataset

A Datumaro project with a CSR source can be created in the following way:

datum create
datum import --format common_super_resolution <path/to/dataset>

CSR dataset directory should have the following structure:

└─ Dataset/
    ├── HR/
    │   ├── <img1>.png
    │   ├── <img2>.png
    │   └── ...
    ├── LR/
    │   ├── <img1>.png
    │   ├── <img2>.png
    │   └── ...
    └── upsampled/ # optional
        ├── <img1>.png
        ├── <img2>.png
        └── ...

To make sure that the selected dataset has been added to the project, you can run datum project info, which will display the project information.
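
A quick sketch of inspecting what was imported (the exact annotations and attributes depend on which directories are present):

import datumaro as dm

dataset = dm.Dataset.import_from('<path/to/dataset>', 'common_super_resolution')
for item in dataset:
    print(item.id,
          [type(a).__name__ for a in item.annotations],  # e.g. SuperResolutionAnnotation
          list(item.attributes))                         # e.g. 'upsampled', if present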

Examples

Examples of using this format from the code can be found in the format tests.

13 - ICDAR

Format specification

ICDAR is a dataset for the text recognition task; it is available for download here. The two most popular versions of this dataset are ICDAR13 and ICDAR15, and Datumaro supports both of them.

The original dataset contains the following subformats:

  • ICDAR word recognition;
  • ICDAR text localization;
  • ICDAR text segmentation.

Supported types of annotations:

  • ICDAR word recognition
    • Caption
  • ICDAR text localization
    • Polygon, Bbox
  • ICDAR text segmentation
    • Mask

Supported attributes:

  • ICDAR text localization
    • text: transcription of the text inside a Polygon/Bbox.
  • ICDAR text segmentation
    • index: identifier of the annotation object, which is encoded in the mask and coincides with the line number on which the description of this object is written;
    • text: transcription of the text inside a Mask;
    • color: RGB values of the color corresponding to the text in the mask image (three numbers separated by spaces);
    • center: coordinates of the center of the text (two numbers separated by a space).

Import ICDAR dataset

There are a few ways to import an ICDAR dataset with Datumaro:

  • Through the Datumaro project
datum create
datum import -f icdar_text_localization <text_localization_dataset>
datum import -f icdar_text_segmentation <text_segmentation_dataset>
datum import -f icdar_word_recognition <word_recognition_dataset>
  • With Python API
import datumaro as dm
data1 = dm.Dataset.import_from('text_localization_path', 'icdar_text_localization')
data2 = dm.Dataset.import_from('text_segmentation_path', 'icdar_text_segmentation')
data3 = dm.Dataset.import_from('word_recognition_path', 'icdar_word_recognition')

An ICDAR dataset should have the following structure:

For icdar_word_recognition

<dataset_path>/
├── <subset_name_1>
│   ├── gt.txt
│   └── images
│       ├── word_1.png
│       ├── word_2.png
│       ├── ...
├── <subset_name_2>
├── ...

For icdar_text_localization

<dataset_path>/
├── <subset_name_1>
│   ├── gt_img_1.txt
│   ├── gt_img_2.txt
│   ├── ...
│   └── images
│       ├── img_1.png
│       ├── img_2.png
│       ├── ...
├── <subset_name_2>
│   ├── ...
├── ...

For icdar_text_segmentation

<dataset_path>/
├── <subset_name_1>
│   ├── image_1_GT.bmp # mask for image_1
│   ├── image_1_GT.txt # description of mask objects on the image_1
│   ├── image_2_GT.bmp
│   ├── image_2_GT.txt
│   ├── ...
│   └── images
│       ├── image_1.png
│       ├── image_2.png
│       ├── ...
├── <subset_name_2>
│   ├── ...
├── ...

See more information about adding datasets to the project in the docs.
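
After import, the attributes described in the format specification are available on the annotations. A minimal sketch for the text localization sub-format:

import datumaro as dm

dataset = dm.Dataset.import_from('text_localization_path', 'icdar_text_localization')
for item in dataset:
    for ann in item.annotations:
        if ann.type in (dm.AnnotationType.bbox, dm.AnnotationType.polygon):
            print(item.id, ann.attributes.get('text'))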

Export to other formats

Datumaro can convert an ICDAR dataset into any other format Datumaro supports. Examples:

# converting ICDAR text segmentation dataset into the VOC with `convert` command
datum convert -if icdar_text_segmentation -i source_dataset \
    -f voc -o export_dir -- --save-media
# converting ICDAR text localization into the LabelMe through Datumaro project
datum create
datum import -f icdar_text_localization source_dataset
datum export -f label_me -o ./export_dir -- --save-media

Note: some formats have extra export options. For particular format see the docs to get information about it.

With Datumaro you can also convert your dataset to one of the ICDAR formats, but to get the expected result, the source dataset should contain the required attributes described in the previous section.

Note: in the case of the icdar_text_segmentation format, if your dataset contains masks without the color attribute, it will be generated automatically.

Available extra export options for ICDAR dataset formats:

  • --save-media allows saving media files on export (by default False)
  • --image-ext IMAGE_EXT allows specifying the image extension for exported images (by default the original extension is kept)

14 - Image zip

Format specification

The image zip format allows exporting/importing unannotated image datasets to/from a zip archive. The format doesn't support any annotations or attributes.

Import Image zip dataset

There are several ways to import unannotated datasets to your Datumaro project:

  • From an existing archive:
datum create
datum import -f image_zip ./images.zip
  • From a directory with zip archives. Datumaro will import images from all zip files in the directory:
datum create
datum import -f image_zip ./foo

The directory with zip archives must have the following structure:

└── foo/
    ├── archive1.zip/
    |   ├── image_1.jpg
    |   ├── image_2.png
    |   ├── subdir/
    |   |   ├── image_3.jpg
    |   |   └── ...
    |   └── ...
    ├── archive2.zip/
    |   ├── image_101.jpg
    |   ├── image_102.jpg
    |   └── ...
    ...

Images in the archives must have a supported extension, follow the user manual to see the supported extensions.

Export to other formats

Datumaro can convert an image zip dataset into any other format Datumaro supports. For example:

datum create -o project
datum import -p project -f image_zip ./images.zip
datum export -p project -f coco -o ./new_dir -- --save-media

Or, using Python API:

import datumaro as dm

dataset = dm.Dataset.import_from('<path/to/dataset>', 'image_zip')
dataset.export('save_dir', 'coco', save_media=True)

Export an unannotated dataset to a zip archive

Example: exporting images from a VOC dataset to zip archives:

datum create -o project
datum import -p project -f voc ./VOC2012
datum export -p project -f image_zip -- --name voc_images.zip

Extra options for exporting to image_zip format:

  • --save-media allows saving media files on export (default: False)
  • --image-ext <IMAGE_EXT> allows specifying the image extension for exported images (default: keep original, or .jpg if there is none)
  • --name: the name of the output zip file (default: default.zip)
  • --compression allows specifying the archive compression method. Available methods: ZIP_STORED, ZIP_DEFLATED, ZIP_BZIP2, ZIP_LZMA (default: ZIP_STORED). See the zip documentation for more information.
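
For example, exporting into a DEFLATE-compressed archive with a custom name:

datum export -f image_zip -o <output/dir> -- --save-media \
    --name images.zip --compression ZIP_DEFLATED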

Examples

Examples of using this format from the code can be found in the format tests.

15 - ImageNet

Format specification

ImageNet is one of the most popular datasets for the image classification task; it is available for download here.

Supported types of annotations:

  • Label

The format doesn't support any attributes for annotation objects.

The original ImageNet dataset contains about 1.2M images and information about the class name for each image. Datumaro supports two versions of the ImageNet format: imagenet and imagenet_txt. The imagenet_txt format stores the image class information in *.txt files, while the imagenet format stores it in the name of the directory where the image is located.

Import ImageNet dataset

A Datumaro project with an ImageNet dataset can be created in the following way:

datum create
datum import -f imagenet <path_to_dataset>
# or
datum import -f imagenet_txt <path_to_dataset>

Note: if you use datum import, then <path_to_dataset> should not be a subdirectory of the directory with the Datumaro project; see more information about this in the docs.

Load ImageNet dataset through the Python API:

import datumaro as dm

dataset = dm.Dataset.import_from('<path_to_dataset>', format='imagenet_txt')

For successful importing of an ImageNet dataset, the input directory should have the following structure:

imagenet_dataset/
├── label_0
│   ├── <image_name_0>.jpg
│   ├── <image_name_1>.jpg
│   ├── <image_name_2>.jpg
│   ├── ...
├── label_1
│    ├── <image_name_0>.jpg
│    ├── <image_name_1>.jpg
│    ├── <image_name_2>.jpg
│    ├── ...
├── ...
  
imagenet_txt_dataset/
├── images # directory with images
│   ├── <image_name_0>.jpg
│   ├── <image_name_1>.jpg
│   ├── <image_name_2>.jpg
│   ├── ...
├── synsets.txt # optional, list of labels
└── train.txt   # list of pairs (image_name, label)
  

Note: if you don't have a synsets file, then Datumaro will automatically generate classes with the name pattern class-<i>.

Datumaro has a few import options for the imagenet_txt format. To apply them, use -- after the main command argument.

imagenet_txt import options:

  • --labels {file, generate}: specify where to get label descriptions from (use file to load them from the file specified by --labels-file, or generate to create generic ones)
  • --labels-file allows specifying the path to the file with label descriptions (“synsets.txt”)
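
The same options can be passed through the Python API as keyword arguments. A sketch, assuming the keyword names mirror the CLI flags:

import datumaro as dm

# 'labels' is assumed to mirror the --labels CLI option
dataset = dm.Dataset.import_from('<path_to_dataset>', 'imagenet_txt',
                                 labels='generate')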

Export ImageNet dataset

Datumaro can convert ImageNet into any other format Datumaro supports. To get the expected result, convert the dataset to a format that supports Label annotation objects.

# Using `convert` command
datum convert -if imagenet -i <path_to_imagenet> \
    -f voc -o <output_dir> -- --save-media

# Using Datumaro project
datum create
datum import -f imagenet_txt <path_to_imagenet> -- --labels generate
datum export -f open_images -o <output_dir>

You can also convert your ImageNet dataset using the Python API:

import datumaro as dm

imagenet_dataset = dm.Dataset.import_from('<path_to_dataset>', format='imagenet')

imagenet_dataset.export('<output_dir>', format='vgg_face2', save_media=True)

Note: some formats have extra export options. For particular format see the docs to get information about it.

Export dataset to the ImageNet format

If your dataset contains Label annotations for images and you want to convert it into the ImageNet format, you can use Datumaro:

# Using convert command
datum convert -if open_images -i <path_to_oid> \
    -f imagenet_txt -o <output_dir> -- --save-media --save-dataset-meta

# Using Datumaro project
datum create
datum import -f open_images <path_to_oid>
datum export -f imagenet -o <output_dir>

Extra options for exporting to ImageNet formats:

  • --save-media allows saving media files on export (by default False)
  • --image-ext <IMAGE_EXT> allows specifying the image extension for exported images (by default .png)
  • --save-dataset-meta allows saving the dataset meta file on export (by default False)

16 - Kinetics

Format specification

Kinetics 400/600/700 is a family of video datasets for the action recognition task. The dataset is available for download here.

Supported media type:

  • Video

Supported type of annotations:

  • Label

Supported attributes for labels:

  • time_start (integer) - time (in seconds) of the start of the recognized action
  • time_end (integer) - time (in seconds) of the end of the recognized action

Import Kinetics dataset

A Datumaro project with a Kinetics dataset can be created in the following way using the CLI:

datum create
datum import -f kinetics <path_to_dataset>

Or using Python API:

import datumaro as dm

dataset = dm.Dataset.import_from('<path_to_dataset>', format='kinetics')

A Kinetics dataset directory should have the following structure:

<path_to_dataset>/
├── test.csv
├── train.json
├── train
│   ├── <name_of_video_1_with_yt_id>.avi # extension of video could be other
│   ├── <name_of_video_2_with_yt_id>.avi
│   ├── ...
└── test
    ├── <name_of_video_100_with_yt_id>.avi # extension of video could be other
    ├── <name_of_video_101_with_yt_id>.avi
    ├── ...

The Kinetics dataset has two equivalent annotation file formats: .csv and .json. Datumaro supports both, but if two annotation files have the same name with different extensions, Datumaro will use the .csv file.

Note: the name of each video file must contain the youtube_id of that video, as specified in the annotation file. To speed up the import, you can leave only the youtube_id in the video filename.

See the full list of supported video extensions here.
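
After import, each recognized action is a Label annotation that carries the time_start and time_end attributes described above. A minimal sketch of reading them:

import datumaro as dm

dataset = dm.Dataset.import_from('<path_to_dataset>', format='kinetics')
for item in dataset:
    for ann in item.annotations:
        if ann.type == dm.AnnotationType.label:
            print(item.id, ann.label,
                  ann.attributes.get('time_start'), ann.attributes.get('time_end'))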

17 - KITTI

Format specification

The KITTI dataset has many annotations for different tasks. Datumaro supports only a few of them.

Supported tasks / formats:

  • Object Detection - kitti_detection The format specification is available in README.md here.
  • Segmentation - kitti_segmentation The format specification is available in README.md here.
  • Raw 3D / Velodyne Points - described here

Supported annotation types:

  • Bbox (object detection)
  • Mask (segmentation)

Supported annotation attributes:

  • truncated (boolean) - indicates that the bounding box specified for the object does not correspond to the full extent of the object
  • occluded (boolean) - indicates that a significant portion of the object within the bounding box is occluded by another object
  • score (float) - indicates confidence in detection

Import KITTI dataset

The KITTI left color images for object detection are available here. The KITTI object detection labels are available here. The KITTI segmentation dataset is available here.

A Datumaro project with a KITTI source can be created in the following way:

datum create
datum import --format kitti <path/to/dataset>

It is possible to specify project name and project directory. Run datum create --help for more information.

KITTI detection dataset directory should have the following structure:

└─ Dataset/
    ├── testing/
    │   └── image_2/
    │       ├── <name_1>.<img_ext>
    │       ├── <name_2>.<img_ext>
    │       └── ...
    └── training/
        ├── image_2/ # left color camera images
        │   ├── <name_1>.<img_ext>
        │   ├── <name_2>.<img_ext>
        │   └── ...
        └─── label_2/ # left color camera label files
            ├── <name_1>.txt
            ├── <name_2>.txt
            └── ...

KITTI segmentation dataset directory should have the following structure:

└─ Dataset/
    ├── dataset_meta.json # a list of non-format labels (optional)
    ├── label_colors.txt # optional, color map for non-original segmentation labels
    ├── testing/
    │   └── image_2/
    │       ├── <name_1>.<img_ext>
    │       ├── <name_2>.<img_ext>
    │       └── ...
    └── training/
        ├── image_2/ # left color camera images
        │   ├── <name_1>.<img_ext>
        │   ├── <name_2>.<img_ext>
        │   └── ...
        ├── label_2/ # left color camera label files
        │   ├── <name_1>.txt
        │   ├── <name_2>.txt
        │   └── ...
        ├── instance/ # instance segmentation masks
        │   ├── <name_1>.png
        │   ├── <name_2>.png
        │   └── ...
        ├── semantic/ # semantic segmentation masks (labels are encoded by its id)
        │   ├── <name_1>.png
        │   ├── <name_2>.png
        │   └── ...
        └── semantic_rgb/ # semantic segmentation masks (labels are encoded by its color)
            ├── <name_1>.png
            ├── <name_2>.png
            └── ...

To add custom classes, you can use dataset_meta.json and label_colors.txt. If the dataset_meta.json is not represented in the dataset, then label_colors.txt will be imported if possible.

You can import a dataset for specific KITTI tasks instead of the whole dataset, for example:

datum import --format kitti_detection <path/to/dataset>

To make sure that the selected dataset has been added to the project, you can run datum project info, which will display the project information.
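
Once a detection subset is imported, the attributes listed in the format specification can be read from the Bbox annotations. A quick sketch:

import datumaro as dm

dataset = dm.Dataset.import_from('<path/to/dataset>', 'kitti_detection')
for item in dataset:
    for ann in item.annotations:
        if ann.type == dm.AnnotationType.bbox:
            # e.g. {'truncated': False, 'occluded': False, 'score': 1.0}
            print(item.id, ann.label, ann.attributes)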

Export to other formats

Datumaro can convert a KITTI dataset into any other format Datumaro supports.

Such conversion will only be successful if the output format can represent the type of dataset you want to convert, e.g. segmentation annotations can be saved in Cityscapes format, but not as COCO keypoints.

There are several ways to convert a KITTI dataset to other dataset formats:

datum create
datum import -f kitti <path/to/kitti>
datum export -f cityscapes -o <output/dir>

or

datum convert -if kitti -i <path/to/kitti> -f cityscapes -o <output/dir>

Or, using Python API:

import datumaro as dm

dataset = dm.Dataset.import_from('<path/to/dataset>', 'kitti')
dataset.export('save_dir', 'cityscapes', save_media=True)

Export to KITTI

There are several ways to convert a dataset to KITTI format:

# export dataset into KITTI format from existing project
datum export -p <path/to/project> -f kitti -o <output/dir> \
    -- --save-media
# converting to KITTI format from other format
datum convert -if cityscapes -i <path/to/dataset> \
    -f kitti -o <output/dir> -- --save-media

Extra options for exporting to KITTI format:

  • --save-media allows saving media files on export (by default False)
  • --image-ext IMAGE_EXT allows specifying the image extension for exported images (by default the original extension is kept, or .png is used if there is none)
  • --save-dataset-meta allows saving the dataset meta file on export (by default False)
  • --apply-colormap APPLY_COLORMAP allows using a colormap for class masks (in the semantic_rgb folder, by default True)
  • --label-map allows defining a custom colormap. Example:
# mycolormap.txt :
# 0 0 255 sky
# 255 0 0 person
#...
datum export -f kitti -- --label-map mycolormap.txt

or you can use the original KITTI colormap:

datum export -f kitti -- --label-map kitti
  • --tasks TASKS allows specifying the tasks to export; by default Datumaro uses all tasks. Example:
datum export -f kitti -- --tasks detection
  • --allow-attributes ALLOW_ATTRIBUTES allows exporting attributes (by default True).

Examples

Datumaro supports filtering, transformation, merging etc. for all formats and for the KITTI format in particular. Follow the user manual to get more information about these operations.

There are several examples of using Datumaro operations to solve particular problems with KITTI dataset:

Example 1. How to load an original KITTI dataset and convert to Cityscapes

datum create -o project
datum import -p project -f kitti ./KITTI/
datum stats -p project
datum export -p project -f cityscapes -- --save-media

Example 2. How to create a custom KITTI-like dataset

import numpy as np
import datumaro as dm

import datumaro.plugins.kitti_format as KITTI

label_map = {}
label_map['background'] = (0, 0, 0)
label_map['label_1'] = (1, 2, 3)
label_map['label_2'] = (3, 2, 1)
categories = KITTI.make_kitti_categories(label_map)

dataset = dm.Dataset.from_iterable([
  dm.DatasetItem(id=1,
    image=np.ones((1, 5, 3)),
    annotations=[
      dm.Mask(image=np.array([[1, 0, 0, 1, 1]]), label=1, id=0,
        attributes={'is_crowd': False}),
      dm.Mask(image=np.array([[0, 1, 1, 0, 0]]), label=2, id=0,
        attributes={'is_crowd': False}),
    ]
  ),
], categories=categories)

dataset.export('./dataset', format='kitti')

Examples of using this format from the code can be found in the format tests.

18 - LFW

Format specification

LFW (Labeled Faces in the Wild) is a dataset for the face identification task; the specification for this format is available here. You can also download the original LFW dataset here.

The original dataset contains images of people's faces. Each image comes with the person's name, as well as information about which images are matched and mismatched with this person. LFW also contains additional information about landmark points on the face.

Supported annotation types:

  • Label
  • Points (face landmark points)

Supported attributes:

  • negative_pairs: list with names of mismatched persons;
  • positive_pairs: list with names of matched persons;

Import LFW dataset

Importing LFW dataset into the Datumaro project:

datum create
datum import -f lfw <path_to_lfw_dataset>

See more information about adding datasets to the project in the docs.

Also you can import LFW dataset from Python API:

import datumaro as dm

lfw_dataset = dm.Dataset.import_from('<path_to_lfw_dataset>', 'lfw')

For the LFW dataset to be imported successfully, its directory should have the following structure:

<path_to_lfw_dataset>/
├── subset_1
│    ├── annotations
│    │   ├── landmarks.txt # list with landmark points for each image
│    │   ├── pairs.txt # list of matched and mismatched pairs of persons
│    │   └── people.txt # optional file with a list of persons' names
│    └── images
│        ├── name0
│        │   ├── name0_0001.jpg
│        │   ├── name0_0002.jpg
│        │   ├── ...
│        ├── name1
│        │   ├── name1_0001.jpg
│        │   ├── name1_0002.jpg
│        │   ├── ...
├── subset_2
│    ├── ...
├── ...

A full description of the annotation *.txt files is available here.

Export LFW dataset

With Datumaro you can convert an LFW dataset into any other format Datumaro supports. Note that the output format should also support the Label and/or Points annotation types.

There are a few ways to convert an LFW dataset into another format:


# Converting to ImageNet with `convert` command:
datum convert -if lfw -i ./lfw_dataset \
    -f imagenet -o ./output_dir -- --save-media


# Converting to VggFace2 through the Datumaro project:
datum create
datum add -f lfw ./lfw_dataset
datum export -f vgg_face2 -o ./output_dir2

Note: some formats have extra export options. See the docs for a particular format to get more information.

Export dataset to the LFW format

With Datumaro you can export dataset that has Label or/and Points annotations to the LFW format, example:

# Converting VGG Face2 dataset into the LFW format
datum convert -if vgg_face2 -i ./vgg_face2_dataset \
    -f lfw -o ./output_dir


# Export a dataset to the LFW format through the Datumaro project:
datum create
datum import -f voc_classification ../voc_dataset
datum export -f lfw -o ./output_dir -- --save-media --image-ext png

Available extra export options for LFW dataset format:

  • --save-media - save media files when exporting the dataset (by default False)
  • --image-ext IMAGE_EXT - specify the image extension for the exported dataset (by default, the original extension is kept)

19 - Mapillary Vistas

Format specification

The Mapillary Vistas dataset homepage is available here. After registration, the dataset will be available for download. The specification for this format is contained in the root directory of the original dataset.

Supported annotation types:

  • Mask (class, instance, panoptic)
  • Polygon

Supported attributes:

  • is_crowd (boolean; on panoptic masks): indicates that the annotation covers multiple instances of the same class.

Import Mapillary Vistas dataset

Use these instructions to import a Mapillary Vistas dataset into a Datumaro project:

datum create
datum add -f mapillary_vistas ./dataset

Note: the dataset directory should be a subdirectory of the project directory.

Note: it is not possible to import both instance and panoptic masks for one dataset.

If your dataset contains both panoptic and instance masks, use one of the subformats (mapillary_vistas_instances, mapillary_vistas_panoptic):

datum add -f mapillary_vistas_instances ./dataset

or

datum add -f mapillary_vistas_panoptic ./dataset

Extra options for adding a source in the Mapillary Vistas format:

  • --use-original-config: use the original config_*.json file for your version of the Mapillary Vistas dataset. This option can help to import a dataset when you don't have the config_*.json file but your dataset uses the original Mapillary Vistas categories. The dataset version is detected by the name of the annotation directory in your dataset (v1.2 or v2.0).
  • --keep-original-category-ids: Add dummy label categories so that category indexes in the imported data source correspond to the category IDs in the original annotation file.

Example of using extra options:

datum add -f mapillary_vistas ./dataset -- --use-original-config
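
As with the other formats in this guide, a Mapillary Vistas dataset can also be loaded via the Python API (a minimal sketch, using the same format name as the CLI):

import datumaro as dm

mapillary_dataset = dm.Dataset.import_from('<path/to/dataset>', 'mapillary_vistas')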

The Mapillary Vistas dataset has two versions: v1.2 and v2.0. They differ in the number of classes, the class names, the supported annotation types, and the names of the annotation directories. Accordingly, the dataset directory should have one of the following structures:

dataset
├── dataset_meta.json # a list of custom labels (optional)
├── config_v1.2.json # config file with description of classes (id, color, name)
├── <subset_name1>
│   ├── images
│   │   ├── <image_name1>.jpg
│   │   ├── <image_name2>.jpg
│   │   ├── ...
│   └── v1.2
│       ├── instances # directory with instance masks
│       │   ├── <image_name1>.png
│       │   ├── <image_name2>.png
│       │   ├── ...
│       └── labels # directory with class masks
│           ├── <image_name1>.png
│           ├── <image_name2>.png
│           ├── ...
├── <subset_name2>
│   ├── ...
├── ...
  
dataset
├── config_v2.0.json # config file with description of classes (id, color, name)
├── <subset_name1>
│   ├── images
│   │   ├── <image_name1>.jpg
│   │   ├── <image_name2>.jpg
│   │   ├── ...
│   └── v2.0
│       ├── instances # directory with instance masks
│       │   ├── <image_name1>.png
│       │   ├── <image_name2>.png
│       │   ├── ...
│       ├── labels # directory with class masks
│       │   ├── <image_name1>.png
│       │   ├── <image_name2>.png
│       │   ├── ...
│       ├── panoptic # directory with panoptic masks and panoptic config file
│       │   ├── panoptic_2020.json # description of classes and annotations
│       │   ├── <image_name1>.png
│       │   ├── <image_name2>.png
│       │   ├── ...
│       └── polygons # directory with description of polygons
│           ├── <image_name1>.json
│           ├── <image_name2>.json
│           ├── ...
├── <subset_name2>
    ├── ...
├── ...
  
dataset
├── config_v1.2.json # config file with description of classes (id, color, name)
├── images
│   ├── <image_name1>.jpg
│   ├── <image_name2>.jpg
│   ├── ...
└── v1.2
    ├── instances # directory with instance masks
    │   ├── <image_name1>.png
    │   ├── <image_name2>.png
    │   ├── ...
    └── labels # directory with class masks
        ├── <image_name1>.png
        ├── <image_name2>.png
        ├── ...
  
dataset
├── config_v2.0.json
├── images
│   ├── <image_name1>.jpg
│   ├── <image_name2>.jpg
│   ├── ...
└── v2.0
    ├── instances # directory with instance masks
    │   ├── <image_name1>.png
    │   ├── <image_name2>.png
    │   ├── ...
    ├── labels # directory with class masks
    │   ├── <image_name1>.png
    │   ├── <image_name2>.png
    │   ├── ...
    ├── panoptic # directory with panoptic masks and panoptic config file
    │   ├── panoptic_2020.json # description of classes and annotation objects
    │   ├── <image_name1>.png
    │   ├── <image_name2>.png
    │   ├── ...
    └── polygons # directory with description of polygons
        ├── <image_name1>.json
        ├── <image_name2>.json
        ├── ...
  

To add custom classes, you can use dataset_meta.json.

See examples of annotation files in test assets.

20 - Market-1501

Format specification

Market-1501 is a dataset for the person re-identification task; a download link for this dataset is available here.

Supported item attributes:

  • person_id (str): four-digit number that represents the ID of the pedestrian;
  • camera_id (int): one-digit number that represents the ID of the camera that took the image (the original dataset has 6 cameras in total);
  • track_id (int): one-digit number that represents the ID of the track with the particular pedestrian; this attribute corresponds to sequence_id in the original dataset;
  • frame_id (int): six-digit number that means the number of the frame within this track. Track names are accumulated for each ID, but frames start from “0001” in each track;
  • bbox_id (int): two-digit number that means the number of the bounding box that was selected for that image (see the original docs for more info).

These item attributes are encoded in the image name according to the following convention:

0000_c1s1_000000_00.jpg
  • the first four digits indicate the person_id;
  • the digit after c indicates the camera_id;
  • the digit after s indicates the track_id;
  • the six digits after s1_ indicate the frame_id;
  • the last two digits before .jpg indicate the bbox_id.
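
To make the convention concrete, here is a minimal sketch that parses such a file name into the item attributes described above (the regular expression and the example name are illustrative, not the parser Datumaro itself uses):

import re

# hypothetical example name following the Market-1501 convention
name = '0000_c1s1_000000_00.jpg'

m = re.fullmatch(r'(\d{4})_c(\d)s(\d)_(\d{6})_(\d{2})\.jpg', name)
attributes = {
    'person_id': m.group(1),       # first four digits
    'camera_id': int(m.group(2)),  # digit after 'c'
    'track_id': int(m.group(3)),   # digit after 's'
    'frame_id': int(m.group(4)),   # six digits after 's1_'
    'bbox_id': int(m.group(5)),    # last two digits before '.jpg'
}
print(attributes)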

Import Market-1501 dataset

To import a Market-1501 dataset into a Datumaro project:

datum create
datum import -f market1501 <path_to_market1501>

See more information about adding datasets to the project in the docs.

Or you can import Market-1501 using Python API:

import datumaro as dm
dataset = dm.Dataset.import_from('<path_to_dataset>', 'market1501')

For the Market-1501 dataset to be imported successfully, its directory should have the following structure:

market1501_dataset/
├── query # optional directory with query images
│   ├── 0001_c1s1_001051_00.jpg
│   ├── 0002_c1s1_001051_00.jpg
│   ├── ...
├── bounding_box_<subset_name1>
│   ├── 0003_c1s1_001051_00.jpg
│   ├── 0003_c2s1_001054_01.jpg
│   ├── 0004_c1s1_001051_00.jpg
│   ├── ...
├── bounding_box_<subset_name2>
│   ├── 0005_c1s1_001051_00.jpg
│   ├── 0006_c1s1_001051_00.jpg
│   ├── ...
├── ...

Export dataset to the Market-1501 format

With Datumaro you can export a dataset that has the person_id item attribute to the Market-1501 format, for example:

# Converting MARS dataset into the Market-1501
datum convert -if mars -i ./mars_dataset \
    -f market1501 -o ./output_dir
# Export a dataset to the Market-1501 format through the Datumaro project:
datum create
datum add -f mars ../mars
datum export -f market1501 -o ./output_dir -- --save-media --image-ext png

Note: if your dataset contains only person_id attributes, Datumaro will assign default values for the other attributes (camera_id, track_id, bbox_id) and increment frame_id on collisions.

Available extra export options for Market-1501 dataset format:

  • --save-media - save media files when exporting the dataset (by default False)
  • --image-ext IMAGE_EXT - specify the image extension for the exported dataset (by default, the original extension is kept)

21 - MARS

Format specification

MARS is a dataset for the motion analysis and person identification tasks. The MARS dataset is available for download here.

Supported types of annotations:

  • Bbox

Required attributes:

  • person_id (str): four-digit number that represents the ID of the pedestrian;
  • camera_id (int): one-digit number that represents the ID of the camera that took the image (the original dataset has 6 cameras in total);
  • track_id (int): four-digit number that represents the ID of the track with the particular pedestrian;
  • frame_id (int): three-digit number that means the number of the frame within this track. Track names are accumulated for each ID, but frames start from “0001” in each track.

Import MARS dataset

Use these instructions to import a MARS dataset into a Datumaro project:

datum create
datum add -f mars ./dataset

Note: the dataset directory should be a subdirectory of the project directory.

The MARS dataset directory should have the following structure:

mars_dataset
├── <bbox_subset_name1>
│   ├── 0001 # directory with images of pedestrian with id 0001
│   │   ├── 0001C1T0001F001.jpg
│   │   ├── 0001C1T0001F002.jpg
│   │   ├── ...
│   ├── 0002 # directory with images of pedestrian with id 0002
│   │   ├── 0002C1T0001F001.jpg
│   │   ├── 0002C1T0001F002.jpg
│   │   ├── ...
│   ├── 0000 # distractor images, which negatively affect retrieval accuracy
│   │   ├── 0000C1T0001F001.jpg
│   │   ├── 0000C1T0001F002.jpg
│   │   ├── ...
│   ├── 00-1 # junk images, which do not affect retrieval accuracy
│   │   ├── 00-1C1T0001F001.jpg
│   │   ├── 00-1C1T0001F002.jpg
│   │   ├── ...
├── <bbox_subset_name2>
│   ├── ...
├── ...

All images in the MARS dataset follow a strict naming convention:

xxxxCxTxxxxFxxx.jpg
  • the first four digits indicate the pedestrian's number;
  • the digit after C indicates the camera id;
  • the four digits after T indicate the track id for this pedestrian;
  • the three digits after F indicate the frame id within this track.

Note: there are two special pedestrian IDs, 0000 and 00-1, which indicate distractor images and junk images respectively.
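
A similar sketch for decoding a MARS file name under this convention (illustrative only; the example name is hypothetical):

import re

# hypothetical example name following the MARS convention;
# the pattern also accepts the special '00-1' junk id
name = '0001C1T0001F001.jpg'

m = re.fullmatch(r'(\d{4}|00-1)C(\d)T(\d{4})F(\d{3})\.jpg', name)
person_id, camera_id, track_id, frame_id = m.groups()
print(person_id, camera_id, track_id, frame_id)  # 0001 1 0001 001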

22 - MNIST

Format specification

MNIST format specification is available here.

Fashion MNIST format specification is available here.

MNIST in CSV format specification is available here.

The dataset has several data formats available. Datumaro supports the binary (Python pickle) format and the CSV variant. Each data format is covered by a separate Datumaro format.

Supported formats:

  • Binary (Python pickle) - mnist
  • CSV - mnist_csv

Supported annotation types:

  • Label

The format only supports single-channel 28x28 images.

Import MNIST dataset

The MNIST dataset is available for free download.

The Fashion MNIST dataset is available for free download.

The MNIST in CSV dataset is available for free download.

A Datumaro project with a MNIST source can be created in the following way:

datum create
datum import --format mnist <path/to/dataset>
datum import --format mnist_csv <path/to/dataset>

MNIST dataset directory should have the following structure:

└─ Dataset/
    ├── dataset_meta.json # a list of non-format labels (optional)
    ├── labels.txt # a list of non-digit labels in another format (optional)
    ├── t10k-images-idx3-ubyte.gz
    ├── t10k-labels-idx1-ubyte.gz
    ├── train-images-idx3-ubyte.gz
    └── train-labels-idx1-ubyte.gz

MNIST in CSV dataset directory should have the following structure:

└─ Dataset/
    ├── dataset_meta.json # a list of non-format labels (optional)
    ├── labels.txt # a list of non-digit labels in another format (optional)
    ├── mnist_test.csv
    └── mnist_train.csv

To add custom classes, you can use dataset_meta.json and labels.txt. If dataset_meta.json is not present in the dataset, then labels.txt will be imported if possible.

For example, labels.txt for Fashion MNIST has the following contents:

T-shirt/top
Trouser
Pullover
Dress
Coat
Sandal
Shirt
Sneaker
Bag
Ankle boot

Export to other formats

Datumaro can convert a MNIST dataset into any other format Datumaro supports. To get the expected result, convert the dataset to formats that support the classification task (e.g. CIFAR-10/100, ImageNet, PascalVOC, etc.).

There are several ways to convert a MNIST dataset to other dataset formats:

datum create
datum import -f mnist <path/to/mnist>
datum export -f imagenet -o <output/dir>

or

datum convert -if mnist -i <path/to/mnist> -f imagenet -o <output/dir>

Or, using Python API:

import datumaro as dm

dataset = dm.Dataset.import_from('<path/to/dataset>', 'mnist')
dataset.export('save_dir', 'imagenet', save_media=True)

These steps will also work for MNIST in CSV if you use mnist_csv instead of mnist.

Export to MNIST

There are several ways to convert a dataset to MNIST format:

# export dataset into MNIST format from existing project
datum export -p <path/to/project> -f mnist -o <output/dir> \
    -- --save-media
# converting to MNIST format from other format
datum convert -if imagenet -i <path/to/dataset> \
    -f mnist -o <output/dir> -- --save-media

Extra options for exporting to MNIST format:

  • --save-media - save media files when exporting the dataset (by default False)
  • --image-ext <IMAGE_EXT> - specify the image extension for the exported dataset (by default .png)
  • --save-dataset-meta - save the dataset meta file when exporting the dataset (by default False)

These commands also work for MNIST in CSV if you use mnist_csv instead of mnist.

Examples

Datumaro supports filtering, transformation, merging, etc. for all formats and for the MNIST format in particular. Follow the user manual to get more information about these operations.

There are several examples of using Datumaro operations to solve particular problems with the MNIST dataset:

Example 1. How to create a custom MNIST-like dataset

import numpy as np
import datumaro as dm

dataset = dm.Dataset.from_iterable([
    dm.DatasetItem(id=0, image=np.ones((28, 28)),
        annotations=[dm.Label(2)]
    ),
    dm.DatasetItem(id=1, image=np.ones((28, 28)),
        annotations=[dm.Label(7)]
    )
], categories=[str(label) for label in range(10)])

dataset.export('./dataset', format='mnist')

Example 2. How to filter and convert a MNIST dataset to ImageNet

Convert the MNIST dataset to ImageNet format, keeping only images with class 3 present:

# Download MNIST dataset:
# https://ossci-datasets.s3.amazonaws.com/mnist/train-images-idx3-ubyte.gz
# https://ossci-datasets.s3.amazonaws.com/mnist/train-labels-idx1-ubyte.gz
datum convert --input-format mnist --input-path <path/to/mnist> \
              --output-format imagenet \
              --filter '/item[annotation/label="3"]'

Examples of using this format from the code can be found in the binary format tests and csv format tests.

23 - MPII Human Pose Dataset

Format specification

The original MPII Human Pose Dataset is available here.

Supported annotation types:

  • Bbox
  • Points

Supported attributes:

  • center (a list with two coordinates of the center point of the object)
  • scale (float)

Import MPII Human Pose Dataset

A Datumaro project with an MPII Human Pose Dataset source can be created in the following way:

datum create
datum import --format mpii <path/to/dataset>

It is also possible to import the dataset using Python API:

import datumaro as dm

mpii_dataset = dm.Dataset.import_from('<path/to/dataset>', 'mpii')

MPII Human Pose Dataset directory should have the following structure:

dataset/
├── mpii_human_pose_v1_u12_1.mat
├── 000000001.jpg
├── 000000002.jpg
├── 000000003.jpg
└── ...

Export to other formats

Datumaro can convert an MPII Human Pose Dataset into any other format Datumaro supports. To get the expected result, convert the dataset to a format that supports bounding boxes or points.

There are several ways to convert an MPII Human Pose Dataset to other dataset formats using CLI:

datum create
datum import -f mpii <path/to/dataset>
datum export -f voc -o ./save_dir -- --save-media

or

datum convert -if mpii -i <path/to/dataset> \
    -f voc -o <output/dir> -- --save-media

Or, using Python API:

import datumaro as dm

dataset = dm.Dataset.import_from('<path/to/dataset>', 'mpii')
dataset.export('save_dir', 'voc')

Examples

Examples of using this format from the code can be found in the format tests.

24 - MPII Human Pose Dataset (JSON)

Format specification

The original MPII Human Pose Dataset is available here.

Supported annotation types:

  • Bbox
  • Points

Supported attributes:

  • center (a list with two coordinates of the center point of the object)
  • scale (float)

Import MPII Human Pose Dataset (JSON)

A Datumaro project with an MPII Human Pose Dataset (JSON) source can be created in the following way:

datum create
datum import --format mpii_json <path/to/dataset>

It is also possible to import the dataset using Python API:

import datumaro as dm

mpii_dataset = dm.Dataset.import_from('<path/to/dataset>', 'mpii_json')

MPII Human Pose Dataset (JSON) directory should have the following structure:

dataset/
├── jnt_visible.npy # optional
├── mpii_annotations.json
├── mpii_headboxes.npy # optional
├── mpii_pos_gt.npy # optional
├── 000000001.jpg
├── 000000002.jpg
├── 000000003.jpg
└── ...

Export to other formats

Datumaro can convert an MPII Human Pose Dataset (JSON) into any other format Datumaro supports. To get the expected result, convert the dataset to a format that supports bounding boxes or points.

There are several ways to convert an MPII Human Pose Dataset (JSON) to other dataset formats using CLI:

datum create
datum import -f mpii_json <path/to/dataset>
datum export -f voc -o ./save_dir -- --save-media

or

datum convert -if mpii_json -i <path/to/dataset> \
    -f voc -o <output/dir> -- --save-media

Or, using Python API:

import datumaro as dm

dataset = dm.Dataset.import_from('<path/to/dataset>', 'mpii_json')
dataset.export('save_dir', 'voc')

Examples

Examples of using this format from the code can be found in the format tests.

25 - Open Images

Format specification

A description of the Open Images Dataset (OID) format is available here. Datumaro supports versions 4, 5 and 6.

Supported annotation types:

  • Label (human-verified image-level labels)
  • Bbox (bounding boxes)
  • Mask (segmentation masks)

Supported annotation attributes:

  • Labels

    • score (read/write, float). The confidence level from 0 to 1. A score of 0 indicates that the image does not contain objects of the corresponding class.
  • Bounding boxes

    • score (read/write, float). The confidence level from 0 to 1. In the original dataset this is always equal to 1, but custom datasets may be created with arbitrary values.
    • occluded (read/write, boolean). Whether the object is occluded by another object.
    • truncated (read/write, boolean). Whether the object extends beyond the boundary of the image.
    • is_group_of (read/write, boolean). Whether the object represents a group of objects of the same class.
    • is_depiction (read/write, boolean). Whether the object is a depiction (such as a drawing) rather than a real object.
    • is_inside (read/write, boolean). Whether the object is seen from the inside.
  • Masks

    • box_id (read/write, string). An identifier for the bounding box associated with the mask.
    • predicted_iou (read/write, float). Predicted IoU value with respect to the ground truth.

Import Open Images dataset

The Open Images dataset is available for free download.

See the open-images-dataset GitHub repository for information on how to download the images.

Datumaro also requires the image description files, which can be downloaded separately.

In addition, a metadata file must be present in the annotations directory.

You can optionally download an additional metadata file.

Annotation files can also be downloaded separately.

All annotation files are optional, except that if the mask metadata files for a given subset are downloaded, all corresponding images must be downloaded as well, and vice versa.

A Datumaro project with an OID source can be created in the following way:

datum create
datum import --format open_images <path/to/dataset>

It is possible to specify project name and project directory. Run datum create --help for more information.

Open Images dataset directory should have the following structure:

└─ Dataset/
    ├── dataset_meta.json # a list of custom labels (optional)
    ├── annotations/
    │   ├── bbox_labels_600_hierarchy.json
    │   ├── image_ids_and_rotation.csv  # optional
    │   ├── oidv6-class-descriptions.csv
    │   ├── *-annotations-bbox.csv
    │   ├── *-annotations-human-imagelabels.csv
    │   └── *-annotations-object-segmentation.csv
    ├── images/
    |   ├── test/
    |   │   ├── <image_name1.jpg>
    |   │   ├── <image_name2.jpg>
    |   │   └── ...
    |   ├── train/
    |   │   ├── <image_name1.jpg>
    |   │   ├── <image_name2.jpg>
    |   │   └── ...
    |   └── validation/
    |       ├── <image_name1.jpg>
    |       ├── <image_name2.jpg>
    |       └── ...
    └── masks/
        ├── test/
        │   ├── <mask_name1.png>
        │   ├── <mask_name2.png>
        │   └── ...
        ├── train/
        │   ├── <mask_name1.png>
        │   ├── <mask_name2.png>
        │   └── ...
        └── validation/
            ├── <mask_name1.png>
            ├── <mask_name2.png>
            └── ...

The mask images must be extracted from the ZIP archives linked above.

To use per-subset image description files instead of image_ids_and_rotation.csv, place them in the annotations subdirectory. The annotations directory is optional, and you can store all annotation files in the root of the input path.

To add custom classes, you can use dataset_meta.json.

Creating an image metadata file

To load bounding box and segmentation mask annotations, Datumaro needs to know the sizes of the corresponding images. By default, it will determine these sizes by loading each image from disk, which requires the images to be present and makes the loading process slow.

If you want to load the aforementioned annotations on a machine where the images are not available, or just to speed up the dataset loading process, you can extract the image size information in advance and record it in an image metadata file. This file must be placed at annotations/images.meta, and must contain one line per image, with the following structure:

<ID> <height> <width>

Where <ID> is the file name of the image without the extension, and <height> and <width> are the dimensions of that image. <ID> may be quoted with either single or double quotes.

The image metadata file, if present, will be used to determine the image sizes without loading the images themselves.

Here’s one way to create the images.meta file using ImageMagick, assuming that the images are present on the current machine:

# run this from the dataset directory
find images -name '*.jpg' -exec \
    identify -format '"%[basename]" %[height] %[width]\n' {} + \
    > annotations/images.meta
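
If ImageMagick is not available, a similar file can be produced with a short Python script. This is a minimal sketch using Pillow; it assumes the directory layout shown above with .jpg images:

import glob
import os

from PIL import Image

# run this from the dataset directory
with open('annotations/images.meta', 'w') as f:
    for path in glob.glob('images/*/*.jpg'):
        # <ID> is the file name without the extension, quoted
        image_id = os.path.splitext(os.path.basename(path))[0]
        with Image.open(path) as img:
            width, height = img.size
        f.write('"%s" %s %s\n' % (image_id, height, width))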

Export to other formats

Datumaro can convert OID into any other format Datumaro supports. To get the expected result, convert the dataset to a format that supports image-level labels. There are several ways to convert OID to other dataset formats:

datum create
datum import -f open_images <path/to/open_images>
datum export -f cvat -o <output/dir>

or

datum convert -if open_images -i <path/to/open_images> -f cvat -o <output/dir>

Or, using Python API:

import datumaro as dm

dataset = dm.Dataset.import_from('<path/to/dataset>', 'open_images')
dataset.export('save_dir', 'cvat', save_media=True)

Export to Open Images

There are several ways to convert an existing dataset to the Open Images format:

# export dataset into Open Images format from existing project
datum export -p <path/to/project> -f open_images -o <output/dir> \
  -- --save-media
# convert a dataset in another format to the Open Images format
datum convert -if imagenet -i <path/to/dataset> \
    -f open_images -o <output/dir> \
    -- --save-media

Extra options for exporting to the Open Images format:

  • --save-media - save media files when exporting the dataset (by default, False)
  • --image-ext IMAGE_EXT - save image files with the specified extension when exporting the dataset (by default, uses the original extension or .jpg if there isn’t one)
  • --save-dataset-meta - allow to export dataset with saving dataset meta file (by default False)

Examples

Datumaro supports filtering, transformation, merging, etc. for all formats and for the Open Images format in particular. Follow the user manual to get more information about these operations.

Here are a few examples of using Datumaro operations to solve particular problems with the Open Images dataset:

Example 1. Load the Open Images dataset and convert to the CVAT format

datum create -o project
datum import -p project -f open_images ./open-images-dataset/
datum stats -p project
datum export -p project -f cvat -- --save-media

Example 2. Create a custom OID-like dataset

import numpy as np
import datumaro as dm

dataset = dm.Dataset.from_iterable([
    dm.DatasetItem(
        id='0000000000000001',
        image=np.ones((1, 5, 3)),
        subset='validation',
        annotations=[
            dm.Label(0, attributes={'score': 1}),
            dm.Label(1, attributes={'score': 0}),
        ],
    ),
], categories=['/m/0', '/m/1'])

dataset.export('./dataset', format='open_images')

Examples of using this format from the code can be found in the format tests.

26 - Pascal VOC

Format specification

Pascal VOC format specification is available here.

The dataset has annotations for multiple tasks. Each task has its own format in Datumaro, and there is also a combined voc format, which includes all the available tasks. The sub-formats have the same options as the “main” format and only limit the set of annotation files they work with. To work with multiple formats, use the corresponding option of the voc format.

Supported tasks / formats:

  • The combined format - voc
  • Image classification - voc_classification
  • Object detection - voc_detection
  • Action classification - voc_action
  • Class and instance segmentation - voc_segmentation
  • Person layout detection - voc_layout

Supported annotation types:

  • Label (classification)
  • Bbox (detection, action detection and person layout)
  • Mask (segmentation)

Supported annotation attributes:

  • occluded (boolean) - indicates that a significant portion of the object within the bounding box is occluded by another object
  • truncated (boolean) - indicates that the bounding box specified for the object does not correspond to the full extent of the object
  • difficult (boolean) - indicates that the object is considered difficult to recognize
  • action attributes (boolean) - jumping, reading, and others; indicate that the object performs the corresponding action.
  • arbitrary attributes (string/number) - A Datumaro extension. Stored in the attributes section of the annotation xml file. Available for bbox annotations only.

Import Pascal VOC dataset

The Pascal VOC dataset is available for free download here

A Datumaro project with a Pascal VOC source can be created in the following way:

datum create
datum import --format voc <path/to/dataset>

It is possible to specify project name and project directory. Run datum create --help for more information.

Pascal VOC dataset directory should have the following structure:

└─ Dataset/
   ├── dataset_meta.json # a list of non-Pascal labels (optional)
   ├── labelmap.txt # or a list of non-Pascal labels in other format (optional)
   │
   ├── Annotations/
   │     ├── ann1.xml # Pascal VOC format annotation file
   │     ├── ann2.xml
   │     └── ...
   ├── JPEGImages/
   │    ├── img1.jpg
   │    ├── img2.jpg
   │    └── ...
   ├── SegmentationClass/ # directory with semantic segmentation masks
   │    ├── img1.png
   │    ├── img2.png
   │    └── ...
   ├── SegmentationObject/ # directory with instance segmentation masks
   │    ├── img1.png
   │    ├── img2.png
   │    └── ...
   │
   └── ImageSets/
        ├── Main/ # directory with list of images for detection and classification task
        │   ├── test.txt  # list of image names in test subset  (without extension)
        |   ├── train.txt # list of image names in train subset (without extension)
        |   └── ...
        ├── Layout/ # directory with list of images for person layout task
        │   ├── test.txt
        |   ├── train.txt
        |   └── ...
        ├── Action/ # directory with list of images for action classification task
        │   ├── test.txt
        |   ├── train.txt
        |   └── ...
        └── Segmentation/ # directory with list of images for segmentation task
            ├── test.txt
            ├── train.txt
            └── ...

The ImageSets directory should contain at least one of the directories: Main, Layout, Action, Segmentation. These directories contain .txt files with a list of images in a subset; the subset name is the same as the .txt file name. Subset names can be arbitrary.

To add custom classes, you can use dataset_meta.json and labelmap.txt. If dataset_meta.json is not present in the dataset, then labelmap.txt will be imported if possible.

In labelmap.txt you can define a custom color map and non-Pascal labels, for example:

# label_map [label : color_rgb : parts : actions]
helicopter:::
elephant:0,124,134:head,ear,foot:

It is also possible to import grayscale (1-channel) PNG masks. For grayscale masks, provide a list of labels with the number of lines equal to the maximum color index in the images. The lines must be in the right order, so that the line index is equal to the color index. Lines can have arbitrary, but different, colors. If there are gaps in the used color indices in the annotations, they must be filled with arbitrary dummy labels. Example:

car:0,128,0:: # color index 0
aeroplane:10,10,128:: # color index 1
_dummy2:2,2,2:: # filler for color index 2
_dummy3:3,3,3:: # filler for color index 3
boat:108,0,100:: # color index 4
...
_dummy198:198,198,198:: # filler for color index 198
_dummy199:199,199,199:: # filler for color index 199
the_last_label:12,28,0:: # color index 200

You can import dataset for specific tasks of Pascal VOC dataset instead of the whole dataset, for example:

datum import -f voc_detection -r ImageSets/Main/train.txt <path/to/dataset>

To make sure that the selected dataset has been added to the project, you can run datum project info, which will display the project information.

Export to other formats

Datumaro can convert a Pascal VOC dataset into any other format Datumaro supports.

Such conversion will only be successful if the output format can represent the type of dataset you want to convert, e.g. image classification annotations can be saved in ImageNet format, but not as COCO keypoints.

There are several ways to convert a Pascal VOC dataset to other dataset formats:

datum create
datum import -f voc <path/to/voc>
datum export -f coco -o <output/dir>

or

datum convert -if voc -i <path/to/voc> -f coco -o <output/dir>

Or, using Python API:

import datumaro as dm

dataset = dm.Dataset.import_from('<path/to/dataset>', 'voc')
dataset.export('save_dir', 'coco', save_media=True)

Export to Pascal VOC

There are several ways to convert an existing dataset to Pascal VOC format:

# export dataset into Pascal VOC format (classification) from existing project
datum export -p <path/to/project> -f voc -o <output/dir> -- --tasks classification
# converting to Pascal VOC format from other format
datum convert -if imagenet -i <path/to/dataset> \
    -f voc -o <output/dir> \
    -- --label_map voc --save-media

Extra options for exporting to Pascal VOC format:

  • --save-media - save media files when exporting the dataset (by default False)
  • --image-ext IMAGE_EXT - specify the image extension for the exported dataset (by default, the original extension is used, or .jpg if there is none)
  • --save-dataset-meta - save the dataset meta file when exporting the dataset (by default False)
  • --apply-colormap APPLY_COLORMAP - use a colormap for class and instance masks (by default True)
  • --allow-attributes ALLOW_ATTRIBUTES - allow exporting attributes (by default True)
  • --keep-empty KEEP_EMPTY - write subset lists even if they are empty (by default False)
  • --tasks TASKS - specify the tasks to export; by default, Datumaro exports all tasks. Example:
datum export -f voc -- --tasks detection,classification
  • --label-map PATH - define a custom colormap. Example:
# mycolormap.txt [label : color_rgb : parts : actions]:
# cat:0,0,255::
# person:255,0,0:head:
datum export -f voc_segmentation -- --label-map mycolormap.txt

or you can use the original VOC colormap:

datum export -f voc_segmentation -- --label-map voc

Examples

Datumaro supports filtering, transformation, merging, etc. for all formats and for the Pascal VOC format in particular. Follow the user manual to get more information about these operations.

Here are a few examples of using Datumaro operations to solve particular problems with a Pascal VOC dataset:

Example 1. How to prepare an original dataset for training.

In this example, preparing the original dataset to train a semantic segmentation model includes loading, checking for duplicate images, setting the number of images, splitting into subsets, and exporting the result to Pascal VOC format.

datum create -o project
datum import -p project -f voc_segmentation ./VOC2012/ImageSets/Segmentation/trainval.txt
datum stats -p project # check statistics.json -> repeated images
datum transform -p project -t ndr -- -w trainval -k 2500
datum filter -p project -e '/item[subset="trainval"]'
datum transform -p project -t random_split -- -s train:.8 -s val:.2
datum export -p project -f voc -- --label-map voc --save-media

Example 2. How to create a custom dataset

import datumaro as dm

dataset = dm.Dataset.from_iterable([
    dm.DatasetItem(id='image1', image=dm.Image(path='image1.jpg', size=(10, 20)),
        annotations=[
            dm.Label(3),
            dm.Bbox(1.0, 1.0, 10.0, 8.0, label=0, attributes={'difficult': True, 'running': True}),
            dm.Polygon([1, 2, 3, 2, 4, 4], label=2, attributes={'occluded': True}),
            dm.Polygon([6, 7, 8, 8, 9, 7, 9, 6], label=2),
        ]
    ),
], categories=['person', 'sky', 'water', 'lion'])

dataset.transform('polygons_to_masks')
dataset.export('./mydataset', format='voc', label_map='my_labelmap.txt')

my_labelmap.txt has the following contents:

# label:color_rgb:parts:actions
person:0,0,255:hand,foot:jumping,running
sky:128,0,0::
water:0,128,0::
lion:255,128,0::

Example 3. Load, filter and convert from code

Load a Pascal VOC dataset, and export the train subset with the items that have the jumping attribute:

import datumaro as dm

dataset = dm.Dataset.import_from('./VOC2012', format='voc')

train_dataset = dataset.get_subset('train').as_dataset()

def only_jumping(item):
    for ann in item.annotations:
        if ann.attributes.get('jumping'):
            return True
    return False

train_dataset.select(only_jumping)

train_dataset.export('./jumping_label_me', format='label_me', save_media=True)

Example 4. Get information about items in the Pascal VOC 2012 dataset for the segmentation task:

import datumaro as dm

dataset = dm.Dataset.import_from('./VOC2012', format='voc')

def has_mask(item):
    for ann in item.annotations:
        if ann.type == dm.AnnotationType.mask:
            return True
    return False

dataset.select(has_mask)

print("Pascal VOC 2012 has %s images for segmentation task:" % len(dataset))
for subset_name, subset in dataset.subsets().items():
    for item in subset:
        print(item.id, subset_name, end=";")

After executing this code, we can see that Pascal VOC 2012 has 5826 images for the segmentation task, and this result is the same as in the official documentation.

Examples of using this format from the code can be found in the tests.

27 - Supervisely Point Cloud

Format specification

Specification for the Point Cloud data format is available here.

You can also find examples of working with the dataset here.

Supported annotation types:

  • cuboid_3d

Supported annotation attributes:

  • track_id (read/write, integer), responsible for the object field
  • createdAt (write, string),
  • updatedAt (write, string),
  • labelerLogin (write, string), responsible for the corresponding fields in the annotation file.
  • arbitrary attributes

Supported image attributes:

  • description (read/write, string),
  • createdAt (write, string),
  • updatedAt (write, string),
  • labelerLogin (write, string), responsible for the corresponding fields in the annotation file.
  • frame (read/write, integer). Indicates frame number of the image.
  • arbitrary attributes

Import Supervisely Point Cloud dataset

An example dataset in Supervisely Point Cloud format is available for download:

https://drive.google.com/u/0/uc?id=1BtZyffWtWNR-mk_PHNPMnGgSlAkkQpBl&export=download

Point Cloud dataset directory should have the following structure:

└─ Dataset/
    ├── ds0/
    │   ├── ann/
    │   │   ├── <pcdname1.pcd.json>
    │   │   ├── <pcdname2.pcd.json>
    │   │   └── ...
    │   ├── pointcloud/
    │   │   ├── <pcdname1.pcd>
    │   │   ├── <pcdname2.pcd>
    │   │   └── ...
    │   ├── related_images/
    │   │   ├── <pcdname1_pcd>/
    │   │   |  ├── <image_name.ext>
    │   │   |  ├── <image_name.ext.json>
    │   │   └── ...
    ├── key_id_map.json
    └── meta.json

There are two ways to import a Supervisely Point Cloud dataset:

datum create
datum import --format sly_pointcloud --input-path <path/to/dataset>

or

datum create
datum import -f sly_pointcloud <path/to/dataset>

To make sure that the selected dataset has been added to the project, you can run datum project info, which will display the project and dataset information.

Export to other formats

Datumaro can convert Supervisely Point Cloud dataset into any other format Datumaro supports.

Such conversion will only be successful if the output format can represent the type of dataset you want to convert, e.g. 3D point clouds can be saved in KITTI Raw format, but not in COCO keypoints.

There are several ways to convert a Supervisely Point Cloud dataset to other dataset formats:

datum create
datum import -f sly_pointcloud <path/to/sly_pcd/>
datum export -f kitti_raw -o <output/dir>

or

datum convert -if sly_pointcloud -i <path/to/sly_pcd/> -f kitti_raw

Or, using Python API:

import datumaro as dm

dataset = dm.Dataset.import_from('<path/to/dataset>', 'sly_pointcloud')
dataset.export('save_dir', 'kitti_raw', save_media=True)

Export to Supervisely Point Cloud

There are several ways to convert a dataset to Supervisely Point Cloud format:

# export dataset into Supervisely Point Cloud format from existing project
datum export -p <path/to/project> -f sly_pointcloud -o <output/dir> \
    -- --save-media
# converting to Supervisely Point Cloud format from other format
datum convert -if kitti_raw -i <path/to/dataset> \
    -f sly_pointcloud -o <output/dir> -- --save-media

Extra options for exporting in Supervisely Point Cloud format:

  • --save-media - save media files when exporting the dataset; this includes point clouds and related images (by default False)
  • --image-ext IMAGE_EXT - specify the image extension for the exported dataset (by default, the original extension is kept, or .png is used if there is none)
  • --reindex - assign new indices to frames and annotations
  • --allow-undeclared-attrs - allow writing arbitrary annotation attributes; by default, only attributes specified in the input dataset metainfo are written

Examples

Example 1. Import dataset, compute statistics

datum create -o project
datum import -p project -f sly_pointcloud ../sly_dataset/
datum stats -p project

Example 2. Convert Supervisely Point Clouds to KITTI Raw

datum convert -if sly_pointcloud -i ../sly_pcd/ \
    -f kitti_raw -o my_kitti/ -- --save-media --reindex --allow-attrs

Example 3. Create a custom dataset

import datumaro as dm

dataset = dm.Dataset.from_iterable([
    dm.DatasetItem(id='frame_1',
        annotations=[
            dm.Cuboid3d(id=206, label=0,
                position=[320.86, 979.18, 1.04],
                attributes={'occluded': False, 'track_id': 1, 'x': 1}),

            dm.Cuboid3d(id=207, label=1,
                position=[318.19, 974.65, 1.29],
                attributes={'occluded': True, 'track_id': 2}),
        ],
        pcd='path/to/pcd1.pcd',
        attributes={'frame': 0, 'description': 'zzz'}
    ),

    dm.DatasetItem(id='frm2',
        annotations=[
            dm.Cuboid3d(id=208, label=1,
                position=[23.04, 8.75, -0.78],
                attributes={'occluded': False, 'track_id': 2})
        ],
        pcd='path/to/pcd2.pcd', related_images=['image2.png'],
        attributes={'frame': 1}
    ),
], categories=['cat', 'dog'])

dataset.export('my_dataset/', format='sly_pointcloud', save_media=True,
    allow_undeclared_attrs=True)

Examples of using this format from the code can be found in the format tests.

28 - SYNTHIA

Format specification

The original SYNTHIA dataset is available here.

Datumaro supports all SYNTHIA formats except SYNTHIA-AL.

Supported annotation types:

  • Mask

Supported annotation attributes:

  • dynamic_object (boolean): whether the object is moving

Import SYNTHIA dataset

A Datumaro project with a SYNTHIA source can be created in the following way:

datum create
datum import --format synthia <path/to/dataset>

It is also possible to import the dataset using Python API:

import datumaro as dm

synthia_dataset = dm.Dataset.import_from('<path/to/dataset>', 'synthia')

SYNTHIA dataset directory should have the following structure:

dataset/
├── dataset_meta.json # a list of non-format labels (optional)
├── GT/
│   ├── COLOR/
│   │   ├── Stereo_Left/
│   │   │   ├── Omni_B
│   │   │   │   ├── 000000.png
│   │   │   │   ├── 000001.png
│   │   │   │   └── ...
│   │   │   └── ...
│   │   └── Stereo_Right
│   │       ├── Omni_B
│   │       │   ├── 000000.png
│   │       │   ├── 000001.png
│   │       │   └── ...
│   │       └── ...
│   └── LABELS
│       ├── Stereo_Left
│       │   ├── Omni_B
│       │   │   ├── 000000.png
│       │   │   ├── 000001.png
│       │   │   └── ...
│       │   └── ...
│       └── Stereo_Right
│           ├── Omni_B
│           │   ├── 000000.png
│           │   ├── 000001.png
│           │   └── ...
│           └── ...
└── RGB
    ├── Stereo_Left
    │   ├── Omni_B
    │   │   ├── 000000.png
    │   │   ├── 000001.png
    │   │   └── ...
    │   └── ...
    └── Stereo_Right
        ├── Omni_B
        │   ├── 000000.png
        │   ├── 000001.png
        │   └── ...
        └── ...
  • RGB folder containing standard RGB images used for training.
  • GT/LABELS folder containing PNG files (one per image). Annotations are given in three channels. The red channel contains the class of that pixel. The green channel contains the class only for those objects that are dynamic (cars, pedestrians, etc.); otherwise it contains 0.
  • GT/COLOR folder containing PNG files (one per image). Annotations are given using a color representation.

When importing a dataset, only GT/LABELS folder will be used. If it is missing, GT/COLOR folder will be used.
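
To illustrate the channel layout described above, here is a minimal sketch of decoding one GT/LABELS mask with NumPy and Pillow (the file path is hypothetical):

import numpy as np
from PIL import Image

# hypothetical mask file from GT/LABELS
mask = np.array(Image.open('GT/LABELS/Stereo_Left/Omni_B/000000.png'))

class_ids = mask[:, :, 0]     # red channel: class of each pixel
dynamic = mask[:, :, 1] != 0  # green channel: non-zero only for dynamic objects
print(np.unique(class_ids), dynamic.any())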

The original dataset also contains depth information, but Datumaro does not currently support it.

To add custom classes, you can use dataset_meta.json.

Export to other formats

Datumaro can convert a SYNTHIA dataset into any other format Datumaro supports. To get the expected result, convert the dataset to a format that supports segmentation masks.

There are several ways to convert a SYNTHIA dataset to other dataset formats using CLI:

datum create
datum import -f synthia <path/to/dataset>
datum export -f voc -o <output/dir> -- --save-media

or

datum convert -if synthia -i <path/to/dataset> \
    -f voc -o <output/dir> -- --save-media

Or, using Python API:

import datumaro as dm

dataset = dm.Dataset.import_from('<path/to/dataset>', 'synthia')
dataset.export('save_dir', 'voc')

Examples

Examples of using this format from the code can be found in the format tests.

29 - Velodyne Points / KITTI Raw 3D

Format specification

Velodyne Points / KITTI Raw 3D data format homepage is available here.

Velodyne Points / KITTI Raw 3D data format specification is available here.

Supported annotation types:

  • Cuboid3d (represent tracks)

Supported annotation attributes:

  • truncation (write, string), possible values: truncation_unset, in_image, truncated, out_image, behind_image (case-independent).
  • occlusion (write, string), possible values: occlusion_unset, visible, partly, fully (case-independent). This attribute has priority over occluded.
  • occluded (read/write, boolean)
  • keyframe (read/write, boolean). Responsible for the occlusion_kf field.
  • track_id (read/write, integer). Indicates the grouping of annotations over frames; represents tracks.

Supported image attributes:

  • frame (read/write, integer). Indicates frame number of the image.

Import KITTI Raw dataset

The velodyne points/KITTI Raw dataset is available for download here and here.

KITTI Raw dataset directory should have the following structure:

└─ Dataset/
    ├── dataset_meta.json # a list of custom labels (optional)
    ├── image_00/ # optional, aligned images from different cameras
    │   └── data/
    │       ├── <name1.ext>
    │       └── <name2.ext>
    ├── image_01/
    │   └── data/
    │       ├── <name1.ext>
    │       └── <name2.ext>
    ...
    │
    ├── velodyne_points/ # optional, 3d point clouds
    │   └── data/
    │       ├── <name1.pcd>
    │       └── <name2.pcd>
    ├── tracklet_labels.xml
    └── frame_list.txt # optional, required for custom image names

The format does not support arbitrary image names and paths, but Datumaro provides an option to use a special index file to allow this.

frame_list.txt contents:

12345 relative/path/to/name1/from/data
46 relative/path/to/name2/from/data
...
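
For example, a minimal sketch that writes such an index file (the frame numbers and relative paths are hypothetical):

# map frame numbers to image paths relative to the data/ directory
frames = {
    12345: 'relative/path/to/name1/from/data',
    46: 'relative/path/to/name2/from/data',
}

with open('frame_list.txt', 'w') as f:
    for frame, path in frames.items():
        f.write('%s %s\n' % (frame, path))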

To add custom classes, you can use dataset_meta.json.

A Datumaro project with a KITTI source can be created in the following way:

datum create
datum import --format kitti_raw <path/to/dataset>

To make sure that the selected dataset has been added to the project, you can run datum project info, which will display the project and dataset information.

Export to other formats

Datumaro can convert a KITTI Raw dataset into any other format Datumaro supports.

Such conversion will only be successful if the output format can represent the type of dataset you want to convert, e.g. 3D point clouds can be saved in Supervisely Point Clouds format, but not in COCO keypoints.

There are several ways to convert a KITTI Raw dataset to other dataset formats:

datum create
datum import -f kitti_raw <path/to/kitti_raw>
datum export -f sly_pointcloud -o <output/dir>

or

datum convert -if kitti_raw -i <path/to/kitti_raw> -f sly_pointcloud

Or, using Python API:

import datumaro as dm

dataset = dm.Dataset.import_from('<path/to/dataset>', 'kitti_raw')
dataset.export('save_dir', 'sly_pointcloud', save_media=True)

Export to KITTI Raw

There are several ways to convert a dataset to KITTI Raw format:

# export dataset into KITTI Raw format from existing project
datum export -p <path/to/project> -f kitti_raw -o <output/dir> \
    -- --save-media
# converting to KITTI Raw format from other format
datum convert -if sly_pointcloud -i <path/to/dataset> \
    -f kitti_raw -o <output/dir> -- --save-media --reindex

Extra options for exporting to KITTI Raw format:

  • --save-media - save media files when exporting the dataset; this includes point clouds and related images (by default False)
  • --image-ext IMAGE_EXT - specify the image extension for the exported dataset (by default, the original extension is kept, or .png is used if there is none)
  • --reindex - assign new indices to frames and tracks; allows annotations without the track_id attribute (they will be exported as single-frame tracks)
  • --allow-attrs - allow writing arbitrary annotation attributes; they will be written in the <annotations> section of <poses><item> (disabled by default)

Examples

Example 1. Import dataset, compute statistics

datum create -o project
datum import -p project -f kitti_raw ../kitti_raw/
datum stats -p project

Example 2. Convert Supervisely Pointclouds to KITTI Raw

datum convert -if sly_pointcloud -i ../sly_pcd/ \
    -f kitti_raw -o my_kitti/ -- --save-media --allow-attrs

Example 3. Create a custom dataset

import numpy as np
import datumaro as dm

dataset = dm.Dataset.from_iterable([
    dm.DatasetItem(id='some/name/qq',
        annotations=[
            dm.Cuboid3d(position=[13.54, -9.41, 0.24], label=0,
                attributes={'occluded': False, 'track_id': 1}),

            dm.Cuboid3d(position=[3.4, -2.11, 4.4], label=1,
                attributes={'occluded': True, 'track_id': 2})
        ],
        pcd='path/to/pcd1.pcd',
        related_images=[np.ones((10, 10)), 'path/to/image2.png', 'image3.jpg'],
        attributes={'frame': 0}
    ),
], categories=['cat', 'dog'])

dataset.export('my_dataset/', format='kitti_raw', save_media=True)

Examples of using this format from the code can be found in the format tests.

30 - Vgg Face2 CSV

Format specification

Vgg Face 2 is a dataset for the face recognition task; a repository with some information and sample data for Vgg Face 2 is available here.

Supported types of annotations:

  • Bbox
  • Points
  • Label

The format doesn’t support any attributes for annotation objects.

Import Vgg Face2 dataset

A Datumaro project with a Vgg Face 2 dataset can be created in the following way:

datum create
datum import -f vgg_face2 <path_to_dataset>

Note: if you use datum import, then <path_to_dataset> should not be a subdirectory of the directory with the Datumaro project; see the docs for more information.

You can also load Vgg Face 2 through the Python API:

import datumaro as dm

dataset = dm.Dataset.import_from('<path_to_dataset>', format='vgg_face2')

For the Vgg Face2 dataset to be imported successfully, the input directory should have the following structure:

vgg_face2_dataset/
├── labels.txt # labels mapping
├── bb_landmark
│   ├── loose_bb_test.csv  # information about bounding boxes for test subset
│   ├── loose_bb_train.csv
│   ├── loose_bb_<any_other_subset_name>.csv
│   ├── loose_landmark_test.csv # landmark points information for test subset
│   ├── loose_landmark_train.csv
│   └── loose_landmark_<any_other_subset_name>.csv
├── test
│   ├── n000001 # directory with images for n000001 label
│   │   ├── 0001_01.jpg
│   │   ├── 0001_02.jpg
│   │   ├── ...
│   ├── n000002 # directory with images for n000002 label
│   │   ├── 0002_01.jpg
│   │   ├── 0003_01.jpg
│   │   ├── ...
│   ├── ...
├── train
│   ├── n000004
│   │   ├── 0004_01.jpg
│   │   ├── 0004_02.jpg
│   │   ├── ...
│   ├── ...
└── <any_other_subset_name>
    ├── ...

Export Vgg Face2 dataset

Datumaro can convert a Vgg Face2 dataset into any other format Datumaro supports. Here are a few examples of how to do it:

# Using `convert` command
datum convert -if vgg_face2 -i <path_to_vgg_face2> \
    -f voc -o <output_dir> -- --save-media

# Using Datumaro project
datum create
datum import -f vgg_face2 <path_to_vgg_face2>
datum export -f yolo -o <output_dir>

Note: to get the expected result from the conversion, the output format should support the same annotation types (one or more) as Vgg Face2 (Bbox, Points, Label).

You can also convert your Vgg Face2 dataset using the Python API:

import datumaro as dm

vgg_face2_dataset = dm.Dataset.import_from('<path_to_dataset>', format='vgg_face2')

vgg_face2_dataset.export('<output_dir>', format='open_images', save_media=True)

Note: some formats have extra export options. For particular format see the docs to get information about it.

Export dataset to the Vgg Face2 format

If you have a dataset in some format and want to convert it to the Vgg Face2 format, ensure that the dataset contains Bbox and/or Points and/or Label annotations, and use Datumaro to perform the conversion. Here are a few examples:

# Using convert command
datum convert -if wider_face -i <path_to_wider> \
    -f vgg_face2 -o <output_dir>

# Using Datumaro project
datum create
datum import -f wider_face <path_to_wider>
datum export -f vgg_face2 -o <output_dir> -- --save-media --image-ext '.png'

Note: the vgg_face2 format supports only one Bbox per image.

Extra options for exporting to Vgg Face2 format:

  • --save-media - save media files when exporting the dataset (by default False)
  • --image-ext <IMAGE_EXT> - specify the image extension for the exported dataset (by default .png)
  • --save-dataset-meta - save the dataset meta file when exporting the dataset (by default False)
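
For completeness, here is a minimal sketch of building a Vgg Face2-like dataset from code, in the spirit of the examples for the other formats; the id, box coordinates, and the five landmark points are hypothetical:

import numpy as np
import datumaro as dm

dataset = dm.Dataset.from_iterable([
    dm.DatasetItem(id='n000001/0001_01', subset='train',
        image=np.ones((10, 10, 3)),
        annotations=[
            # only one Bbox per image is supported by the format
            dm.Bbox(2, 2, 6, 6, label=0),
            # five landmark points as x, y pairs
            dm.Points([3, 4, 5, 4, 4, 5, 3, 7, 5, 7], label=0),
        ]
    ),
], categories=['n000001'])

dataset.export('./vgg_face2_dataset', format='vgg_face2', save_media=True)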

31 - VoTT CSV

Format specification

VoTT (Visual Object Tagging Tool) is an open source annotation tool released by Microsoft. VoTT CSV is the format used by VoTT when the user exports a project and selects “CSV” as the export format.

Supported annotation types:

  • Bbox

Import VoTT dataset

A Datumaro project with a VoTT CSV source can be created in the following way:

datum create
datum import --format vott_csv <path/to/dataset>

It is also possible to import the dataset using Python API:

import datumaro as dm

vott_csv_dataset = dm.Dataset.import_from('<path/to/dataset>', 'vott_csv')

VoTT CSV dataset directory should have the following structure:

dataset/
├── dataset_meta.json # a list of custom labels (optional)
├── img0001.jpg
├── img0002.jpg
├── img0003.jpg
├── img0004.jpg
├── ...
├── test-export.csv
├── train-export.csv
└── ...
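
For reference, each *-export.csv file is a plain CSV listing one bounding box per row. A hypothetical fragment might look like this (the column names follow VoTT's CSV export; the values are made up):

"image","xmin","ymin","xmax","ymax","label"
"img0001.jpg",10.0,10.0,100.0,120.0,"helmet"
"img0001.jpg",50.0,20.0,140.0,150.0,"person"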

To add custom classes, you can use dataset_meta.json.

Export to other formats

Datumaro can convert a VoTT CSV dataset into any other format Datumaro supports. To get the expected result, convert the dataset to a format that supports bounding boxes.

There are several ways to convert a VoTT CSV dataset to other dataset formats using CLI:

datum create
datum import -f vott_csv <path/to/dataset>
datum export -f voc -o ./save_dir -- --save-media

or

datum convert -if vott_csv -i <path/to/dataset> \
    -f voc -o <output/dir> -- --save-media

Or, using Python API:

import datumaro as dm

dataset = dm.Dataset.import_from('<path/to/dataset>', 'vott_csv')
dataset.export('save_dir', 'voc')

Examples

Examples of using this format from the code can be found in VoTT CSV tests.

32 - VoTT JSON

Format specification

VoTT (Visual Object Tagging Tool) is an open source annotation tool released by Microsoft. VoTT JSON is the format used by VoTT when the user exports a project and selects “VoTT JSON” as the export format.

Supported annotation types:

  • Bbox

Import VoTT dataset

A Datumaro project with a VoTT JSON source can be created in the following way:

datum create
datum import --format vott_json <path/to/dataset>

It is also possible to import the dataset using Python API:

import datumaro as dm

vott_json_dataset = dm.Dataset.import_from('<path/to/dataset>', 'vott_json')

VoTT JSON dataset directory should have the following structure:

dataset/
├── dataset_meta.json # a list of custom labels (optional)
├── img0001.jpg
├── img0002.jpg
├── img0003.jpg
├── img0004.jpg
├── ...
├── test-export.json
├── train-export.json
└── ...

To add custom classes, you can use dataset_meta.json.

Export to other formats

Datumaro can convert a VoTT JSON dataset into any other format Datumaro supports. To get the expected result, convert the dataset to a format that supports bounding boxes.

There are several ways to convert a VoTT JSON dataset to other dataset formats using CLI:

datum create
datum import -f vott_json <path/to/dataset>
datum export -f voc -o ./save_dir -- --save-media

or

datum convert -if vott_json -i <path/to/dataset> \
    -f voc -o <output/dir> -- --save-media

Or, using Python API:

import datumaro as dm

dataset = dm.Dataset.import_from('<path/to/dataset>', 'vott_json')
dataset.export('save_dir', 'voc')

Examples

Examples of using this format from the code can be found in VoTT JSON tests.

33 - WIDER Face

Format specification

The WIDER Face dataset is a face detection benchmark dataset, which is available for download here.

Supported annotation types:

  • Bbox
  • Label

Supported attributes for bboxes:

  • blur:
    • 0 face without blur;
    • 1 face with normal blur;
    • 2 face with heavy blur.
  • expression:
    • 0 face with typical expression;
    • 1 face with exaggerated expression.
  • illumination:
    • 0 image contains normal illumination;
    • 1 image contains extreme illumination.
  • pose:
    • 0 pose is typical;
    • 1 pose is atypical.
  • invalid:
    • 0 image is valid;
    • 1 image is invalid.
  • occluded:
    • 0 face without occlusion;
    • 1 face with partial occlusion;
    • 2 face with heavy occlusion.
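
After import, these attributes can be read from the attributes dictionary of each Bbox annotation. A minimal inspection sketch:

import datumaro as dm

dataset = dm.Dataset.import_from('<path_to_wider_face>', 'wider_face')

for item in dataset:
    for ann in item.annotations:
        if ann.type == dm.AnnotationType.bbox:
            # dict.get returns None for attributes absent in the source
            print(item.id, ann.attributes.get('blur'), ann.attributes.get('occluded'))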

Import WIDER Face dataset

To import a WIDER Face dataset into a Datumaro project:

datum create
datum import -f wider_face <path_to_wider_face>

The directory with the WIDER Face dataset should have the following structure:

<path_to_wider_face>
├── labels.txt  # optional file with list of classes
├── wider_face_split # directory with description of bboxes for each image
│   ├── wider_face_subset1_bbx_gt.txt
│   ├── wider_face_subset2_bbx_gt.txt
│   ├── ...
├── WIDER_subset1 # instead of 'subset1' you can use any other subset name
│   └── images
│       ├── 0--label_0 # instead of 'label_<n>' you can use any other class name
│       │   ├──  0_label_0_image_01.jpg
│       │   ├──  0_label_0_image_02.jpg
│       │   ├──  ...
│       ├── 1--label_1
│       │   ├──  1_label_1_image_01.jpg
│       │   ├──  1_label_1_image_02.jpg
│       │   ├──  ...
│       ├── ...
├── WIDER_subset2
│  └── images
│      ├── ...
├── ...

Check the README file of the original WIDER Face dataset to get more information about the structure of the .txt annotation files. An example of a WIDER Face dataset is also available in our test assets.
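
For orientation, each wider_face_<subset>_bbx_gt.txt file lists an image path, the number of faces in that image, and then one line per face with the columns x1 y1 w h blur expression illumination invalid occlusion pose. A sketch, following the layout described in the original readme:

0--label_0/0_label_0_image_01.jpg
2
449 330 122 149 0 0 0 0 0 0
488 135 40 61 1 0 0 0 0 1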

Export WIDER Face dataset

With Datumaro you can convert a WIDER Face dataset into any other format Datumaro supports. Note that the output format should support the Label and/or Bbox annotation types.

A few ways to export a WIDER Face dataset using the CLI:

# Using `convert` command
datum convert -if wider_face -i <path_to_wider_face> \
    -f voc -o <output_dir> -- --save-media

# Through the Datumaro project
datum create
datum import -f wider_face <path_to_wider_face>
datum export -f voc -o <output_dir> -- --save-media

Export WIDER Face dataset using Python API:

import datumaro as dm

dataset = dm.Dataset.import_from('<path_to_wider_face>', 'wider_face')

# Here you can perform some transformation using dataset.transform or
# dataset.filter

dataset.export('output_dir', 'open_images', save_media=True)

Note: some formats have extra export options. See the documentation of the particular format for details.

Export to WIDER Face dataset

Using Datumaro you can convert your dataset into the WIDER Face format, but for a successful export your dataset should contain Label and/or Bbox annotations.

Here is an example of exporting a VOC dataset (object detection task) into the WIDER Face format:

datum create
datum import -f voc_detection <path_to_voc>
datum export -f wider_face -o <output_dir> -- --save-media --image-ext='.png'

Available extra export options for WIDER Face dataset format:

  • --save-media allows exporting the dataset with media files (by default False)
  • --image-ext IMAGE_EXT allows specifying the image extension for the exported dataset (by default, the original extensions are kept)

34 - YOLO

Format specification

The YOLO dataset format is for training and validating object detection models. Specification for this format is available here.

You can also find official examples of working with YOLO dataset here.

Supported annotation types:

  • Bounding boxes

YOLO format doesn’t support attributes for annotations.

The format supports arbitrary subset names, except classes, names and backup.

Note that, by default, the YOLO framework does not expect any subset names except train and valid; Datumaro supports others as an extension. If there is no subset separation in a project, the data will be saved in the train subset.

Import YOLO dataset

A Datumaro project with a YOLO source can be created in the following way:

datum create
datum import --format yolo <path/to/dataset>

YOLO dataset directory should have the following structure:

└─ yolo_dataset/
   │
   ├── dataset_meta.json # a list of non-format labels (optional)
   ├── obj.names  # file with list of classes
   ├── obj.data   # file with dataset information
   ├── train.txt  # list of image paths in train subset
   ├── valid.txt  # list of image paths in valid subset
   │
   ├── obj_train_data/  # directory with annotations and images for train subset
   │    ├── image1.txt  # list of labeled bounding boxes for image1
   │    ├── image1.jpg
   │    ├── image2.txt
   │    ├── image2.jpg
   │    └── ...
   │
   └── obj_valid_data/  # directory with annotations and images for valid subset
        ├── image101.txt
        ├── image101.jpg
        ├── image102.txt
        ├── image102.jpg
        └── ...
  • obj.data should have the following content (it is not necessary to have both subsets, but at least one of them is required):
classes = 5 # optional
names = <path/to/obj.names>
train = <path/to/train.txt>
valid = <path/to/valid.txt>
backup = backup/ # optional
  • obj.names contains a list of classes. A class's index is its zero-based line position in the file:
label1  # label1 has index 0
label2  # label2 has index 1
label3  # label3 has index 2
...
  • Files train.txt and valid.txt should have the following structure:
<path/to/image1.jpg>
<path/to/image2.jpg>
...
  • Files in directories obj_train_data/ and obj_valid_data/ should contain information about labeled bounding boxes for images:
# image1.txt:
# <label_index> <x_center> <y_center> <width> <height>
0 0.250000 0.400000 0.300000 0.400000
3 0.600000 0.400000 0.400000 0.266667

Here x_center, y_center, width, and height are relative to the image's width and height. Note that x_center and y_center specify the center of the rectangle, not its top-left corner.
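
As a sketch of that arithmetic, a pixel-space box with top-left corner (x, y) and size (w, h) can be turned into a YOLO annotation line as follows (the helper below is illustrative, not part of Datumaro):

def to_yolo_line(label_index, x, y, w, h, img_w, img_h):
    # YOLO stores the box center, not the top-left corner,
    # and normalizes all values by the image size
    x_center = (x + w / 2) / img_w
    y_center = (y + h / 2) / img_h
    return '%d %f %f %f %f' % (label_index, x_center, y_center, w / img_w, h / img_h)

# A 6x8 box with its top-left corner at (3, 2) in a 20x20 image:
print(to_yolo_line(0, 3, 2, 6, 8, 20, 20))  # 0 0.300000 0.300000 0.300000 0.400000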

To add custom classes, you can use dataset_meta.json.

Export to other formats

Datumaro can convert a YOLO dataset into any other format Datumaro supports. For a successful conversion, the output format should support the object detection task (e.g. Pascal VOC, COCO, TF Detection API, etc.).

There are several ways to convert a YOLO dataset to other dataset formats:

datum create
datum import -f yolo <path/to/yolo/>
datum export -f voc -o <output/dir>

or

datum convert -if yolo -i <path/to/dataset> \
              -f coco_instances -o <output/dir>

Or, using Python API:

import datumaro as dm

dataset = dm.Dataset.import_from('<path/to/dataset>', 'yolo')
dataset.export('save_dir', 'coco_instances', save_media=True)

Export to YOLO format

Datumaro can convert an existing dataset to the YOLO format if the dataset suits the object detection task, i.e. contains bounding box annotations.

Example:

datum create
datum import -f coco_instances <path/to/dataset>
datum export -f yolo -o <output/dir> -- --save-media

Extra options for exporting to YOLO format:

  • --save-media allows exporting the dataset with media files (default: False)
  • --image-ext <IMAGE_EXT> allows specifying the image extension for the exported dataset (default: keep the original, or .jpg if there is none)
  • --add-path-prefix allows specifying whether to include the data/ path prefix in the annotation files (default: True)
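
The same options can be passed through the Python API. A minimal sketch, assuming the keyword names mirror the CLI flags:

import datumaro as dm

dataset = dm.Dataset.import_from('<path/to/dataset>', 'coco_instances')
dataset.export('<output/dir>', 'yolo',
               save_media=True, add_path_prefix=False)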

Examples

Example 1. Prepare PASCAL VOC dataset for exporting to YOLO format dataset

datum create -o project
datum import -p project -f voc ./VOC2012
datum filter -p project -e '/item[subset="train" or subset="val"]'
datum transform -p project -t map_subsets -- -s train:train -s val:valid
datum export -p project -f yolo -- --save-media

Example 2. Remove a class from YOLO dataset

Delete all items that contain cat objects and remove cat from the list of classes:

datum create -o project
datum import -p project -f yolo ./yolo_dataset
datum filter -p project -m i+a -e '/item/annotation[label!="cat"]'
datum transform -p project -t remap_labels -- -l cat:
datum export -p project -f yolo -o ./yolo_without_cats

Example 3. Create a custom dataset in YOLO format

import numpy as np
import datumaro as dm

dataset = dm.Dataset.from_iterable([
    dm.DatasetItem(id='image_001', subset='train',
        image=np.ones((20, 20, 3)),
        annotations=[
            dm.Bbox(3.0, 1.0, 8.0, 5.0, label=1),
            dm.Bbox(1.0, 1.0, 10.0, 1.0, label=2)
        ]
    ),
    dm.DatasetItem(id='image_002', subset='train',
        image=np.ones((15, 10, 3)),
        annotations=[
            dm.Bbox(4.0, 4.0, 4.0, 4.0, label=3)
        ]
    )
], categories=['house', 'bridge', 'crosswalk', 'traffic_light'])

dataset.export('../yolo_dataset', format='yolo', save_media=True)

Example 4. Get information about objects on each image

If you only want information about the label names for each image, you can get it from code:

import datumaro as dm

dataset = dm.Dataset.import_from('./yolo_dataset', format='yolo')
cats = dataset.categories()[dm.AnnotationType.label]

for item in dataset:
    for ann in item.annotations:
        print(item.id, cats[ann.label].name)

And if you want complete information about each item, you can run:

datum create -o project
datum import -p project -f yolo ./yolo_dataset
datum filter -p project --dry-run -e '/item'