ADE20k (v2020)

Format specification

The original ADE20K 2020 dataset is available here.

The consistency set (for checking the annotation consistency) is available here.

Supported annotation types:

  • Masks

Supported annotation attributes:

  • occluded (boolean): whether the object is occluded by another object
  • other arbitrary boolean attributes, which can be specified in the annotation file <image_name>.json

Import ADE20K dataset

A Datumaro project with an ADE20k source can be created in the following way:

datum create
datum import --format ade20k2020 <path/to/dataset>

It is also possible to import the dataset using Python API:

from datumaro.components.dataset import Dataset

ade20k_dataset = Dataset.import_from('<path/to/dataset>', 'ade20k2020')

ADE20K dataset directory should have the following structure:

dataset/
├── subset1/
│   ├── img1/  # directory with instance masks for img1
│   |    ├── instance_001_img1.png
│   |    ├── instance_002_img1.png
│   |    └── ...
│   ├── img1.jpg
│   ├── img1.json
│   ├── img1_seg.png
│   ├── img1_parts_1.png
│   |
│   ├── img2/  # directory with instance masks for img2
│   |    ├── instance_001_img2.png
│   |    ├── instance_002_img2.png
│   |    └── ...
│   ├── img2.jpg
│   ├── img2.json
│   └── ...
│
└── subset2/
    ├── super_label_1/
    |   ├── img3/  # directory with instance masks for img3
    |   |    ├── instance_001_img3.png
    |   |    ├── instance_002_img3.png
    |   |    └── ...
    |   ├── img3.jpg
    |   ├── img3.json
    |   ├── img3_seg.png
    |   ├── img3_parts_1.png
    |   └── ...
    |
    ├── img4/  # directory with instance masks for img4
    |   ├── instance_001_img4.png
    |   ├── instance_002_img4.png
    |   └── ...
    ├── img4.jpg
    ├── img4.json
    ├── img4_seg.png
    └── ...

The mask images <image_name>_seg.png contain information about the object class segmentation masks and also separate each class into instances. The channels R and G encode the objects class masks. The channel B encodes the instance object masks.

The mask images <image_name>_parts_N.png contain segmentation masks for parts of objects, where N is a number indicating the level in the part hierarchy.

The <image_name> directory contains instance masks for each object in the image, these masks represent one-channel images, each pixel of which indicates an affinity to a specific object.

The annotation files <image_name>.json describe the content of each image. See our tests asset for example of this file, or check ADE20K toolkit for it.

Export to other formats

Datumaro can convert an ADE20K dataset into any other format Datumaro supports. To get the expected result, convert the dataset to a format that supports segmentation masks.

There are several ways to convert an ADE20k dataset to other dataset formats using CLI:

datum create
datum import -f ade20k2020 <path/to/dataset>
datum export -f coco -o ./save_dir -- --save-images
# or
datum convert -if ade20k2020 -i <path/to/dataset> \
    -f coco -o <output/dir> -- --save-images

Or, using Python API:

from datumaro.components.dataset import Dataset

dataset = Dataset.import_from('<path/to/dataset>', 'ade20k2020')
dataset.export('save_dir', 'voc')

Examples

Examples of using this format from the code can be found in the format tests