Supported data formats
Datumaro only works with 2D RGB(A) images.
To create an unlabelled dataset from an arbitrary directory of images, use the ImageDir format:
datum create -o <project/dir>
datum add path -p <project/dir> -f image_dir <directory/path/>
Or, if you work with the Datumaro API:
To use it with a project:
from datumaro.components.project import Project
project = Project()
project.add_source('source1', {
    'format': 'image_dir',
    'url': 'directory/path/'
})
dataset = project.make_dataset()
To use it as a dataset:
from datumaro.components.dataset import Dataset
dataset = Dataset.import_from('directory/path/', 'image_dir')
This will search for images in the directory recursively and add them as dataset entries with names like <subdir1>/<subsubdir1>/<image_name1>.
The list of supported formats matches the list of image formats supported by OpenCV:
.jpg, .jpeg, .jpe, .jp2, .png, .bmp, .dib, .tif, .tiff, .tga, .webp, .pfm,
.sr, .ras, .exr, .hdr, .pic, .pbm, .pgm, .ppm, .pxm, .pnm
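The recursive search and extension filter described above can be illustrated with a short standard-library sketch. It is not Datumaro's own implementation: the helper name `list_image_items` is hypothetical, and both the case-insensitive extension match and dropping the extension from the entry name are assumptions made for illustration.

```python
from pathlib import Path

# Extensions copied from the list above.
IMAGE_EXTENSIONS = {
    ".jpg", ".jpeg", ".jpe", ".jp2", ".png", ".bmp", ".dib", ".tif", ".tiff",
    ".tga", ".webp", ".pfm", ".sr", ".ras", ".exr", ".hdr", ".pic", ".pbm",
    ".pgm", ".ppm", ".pxm", ".pnm",
}


def list_image_items(root):
    """Return (entry_name, path) pairs for every image found under root.

    Hypothetical helper: it only mimics the naming scheme described above.
    """
    root = Path(root)
    items = []
    for path in root.rglob("*"):  # walks all subdirectories recursively
        # Assumption: extensions are compared case-insensitively.
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS:
            # Assumption: the entry name is the path relative to root,
            # '/'-separated, with the file extension removed.
            name = path.relative_to(root).with_suffix("").as_posix()
            items.append((name, path))
    return sorted(items, key=lambda item: item[0])
```

For a directory that contains `a/b/img1.jpg` and `top.PNG`, the sketch yields the entry names `a/b/img1` and `top`, while files with other extensions are skipped.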
Once added to a project, images can be split into subsets and renamed with transformations, filtered, joined with existing annotations, etc.
To use a video as an input, one should either create an Extractor plugin, which splits a video into frames, or split the video manually and import images.
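The manual route (splitting the video into frames yourself) could look roughly like the sketch below. It assumes the `opencv-python` package is installed. The function names `frame_filename` and `split_video`, the `frame_000000.jpg` naming pattern, and the `step` parameter are hypothetical choices for illustration and not part of Datumaro.

```python
from pathlib import Path


def frame_filename(index, ext=".jpg"):
    """Build a zero-padded file name for a frame (hypothetical naming scheme)."""
    return f"frame_{index:06d}{ext}"


def split_video(video_path, output_dir, step=1):
    """Save every `step`-th frame of a video as an image; return the number saved."""
    # Imported here so that the naming helper above works without OpenCV installed.
    import cv2

    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    capture = cv2.VideoCapture(str(video_path))
    if not capture.isOpened():
        raise IOError(f"Cannot open video: {video_path}")

    saved = 0
    index = 0
    try:
        while True:
            ok, frame = capture.read()  # ok is False once the stream ends
            if not ok:
                break
            if index % step == 0:
                cv2.imwrite(str(output_dir / frame_filename(index)), frame)
                saved += 1
            index += 1
    finally:
        capture.release()
    return saved
```

The resulting directory can then be imported like any other image directory, for example with `Dataset.import_from('directory/path/', 'image_dir')`.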