CelebA
Format specification
The original CelebA dataset is available here.
Supported annotation types:
LabelBboxPoints(landmarks)
Supported attributes:
5_o_Clock_Shadow,Arched_Eyebrows,Attractive,Bags_Under_Eyes,Bald,Bangs,Big_Lips,Big_Nose,Black_Hair,Blond_Hair,Blurry,Brown_Hair,Bushy_Eyebrows,Chubby,Double_Chin,Eyeglasses,Goatee,Gray_Hair,Heavy_Makeup,High_Cheekbones,Male,Mouth_Slightly_Open,Mustache,Narrow_Eyes,No_Beard,Oval_Face,Pale_Skin,Pointy_Nose,Receding_Hairline,Rosy_Cheeks,Sideburns,Smiling,Straight_Hair,Wavy_Hair,Wearing_Earrings,Wearing_Hat,Wearing_Lipstick,Wearing_Necklace,Wearing_Necktie,Young(boolean)
Import CelebA dataset
A Datumaro project with a CelebA source can be created in the following way:
datum create
datum import --format celeba <path/to/dataset>
It is also possible to import the dataset using Python API:
import datumaro as dm
celeba_dataset = dm.Dataset.import_from('<path/to/dataset>', 'celeba')
CelebA dataset directory should have the following structure:
dataset/
├── dataset_meta.json # a list of non-format labels (optional)
├── Anno/
│ ├── identity_CelebA.txt
│ ├── list_attr_celeba.txt
│ ├── list_bbox_celeba.txt
│ └── list_landmarks_celeba.txt
├── Eval/
│ └── list_eval_partition.txt
└── Img/
└── img_celeba/
├── 000001.jpg
├── 000002.jpg
└── ...
The identity_CelebA.txt file contains labels (required).
The list_attr_celeba.txt, list_bbox_celeba.txt,
list_landmarks_celeba.txt, list_eval_partition.txt files contain
attributes, bounding boxes, landmarks and subsets respectively
(optional).
The original CelebA dataset stores images in a .7z archive. The archive needs to be unpacked before importing.
To add custom classes, you can use dataset_meta.json.
Export to other formats
Datumaro can convert a CelebA dataset into any other format Datumaro supports. To get the expected result, convert the dataset to a format that supports labels, bounding boxes or landmarks.
There are several ways to convert a CelebA dataset to other dataset formats using CLI:
datum create
datum import -f celeba <path/to/dataset>
datum export -f imagenet_txt -o ./save_dir -- --save-images
or
datum convert -if celeba -i <path/to/dataset> \
-f imagenet_txt -o <output/dir> -- --save-images
Or, using Python API:
import datumaro as dm
dataset = dm.Dataset.import_from('<path/to/dataset>', 'celeba')
dataset.export('save_dir', 'voc')
Examples
Examples of using this format from the code can be found in the format tests