random sampler module
- class datumaro.plugins.sampler.random_sampler.RandomSampler(extractor: datumaro.components.extractor.IExtractor, count: int, *, subset: Optional[str] = None, seed: Optional[int] = None)
Bases: datumaro.components.extractor.Transform, datumaro.components.cli_plugin.CliPlugin
Sampler that keeps no more than the required number of items in the dataset.
- Notes:
Items are selected uniformly at random.
Requesting a sample larger than the total number of images returns all images.
Example: select a random subset of 20 images
random_sampler -k 20
Example: select a random subset of 20 images, modifying only the ‘train’ subset
random_sampler -k 20 -s train
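The behaviour described in the notes above can be illustrated with a small standalone sketch. This is not Datumaro's implementation: `sample_items` and the `(item id, subset name)` tuples are hypothetical stand-ins for the extractor and its items, and only the standard library is used. It shows the cap on the number of items, the "return everything if the request is too large" rule, and the effect of restricting sampling to one subset.

```python
import random
from typing import List, Optional, Tuple

# Hypothetical stand-in for a dataset item: (item id, subset name).
Item = Tuple[str, str]


def sample_items(
    items: List[Item],
    count: int,
    subset: Optional[str] = None,
    seed: Optional[int] = None,
) -> List[Item]:
    """Keep at most `count` items, chosen uniformly at random.

    If `subset` is given, only items of that subset are sampled;
    items of other subsets are passed through unchanged.
    """
    rng = random.Random(seed)

    # Indices of the items that are eligible for sampling.
    candidates = [
        i for i, (_, item_subset) in enumerate(items)
        if subset is None or item_subset == subset
    ]

    # Requesting more than is available returns everything.
    if count >= len(candidates):
        return list(items)

    kept = set(rng.sample(candidates, count))
    candidate_set = set(candidates)

    # Preserve the original order; untouched subsets are kept as-is.
    return [
        item for i, item in enumerate(items)
        if i in kept or i not in candidate_set
    ]


if __name__ == "__main__":
    data = [(str(i), "train" if i % 2 == 0 else "val") for i in range(10)]
    print(sample_items(data, 3, subset="train", seed=0))
```

Passing a fixed `seed` makes the selection repeatable, which mirrors the purpose of the `seed` parameter in the class signature.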
- class datumaro.plugins.sampler.random_sampler.LabelRandomSampler(extractor: datumaro.components.extractor.IExtractor, *, count: Optional[int] = None, label_counts: Optional[Mapping[str, int]] = None, seed: Optional[int] = None)
Bases: datumaro.components.extractor.Transform, datumaro.components.cli_plugin.CliPlugin
Sampler that keeps at least the required number of annotations of each class in the dataset for each subset separately.
Consider using the “stats” command to get class distribution in the dataset.
- Notes:
Items can contain annotations of several selected classes (e.g. 3 bounding boxes per image). The number of annotations in the resulting dataset varies between max(class counts) and sum(class counts).
If the input dataset does not have enough annotations of a class, the result will contain only what is available.
Items are selected uniformly at random.
For the reasons above, the resulting class distribution in the dataset may differ from the requested one.
The resulting dataset keeps annotations only for classes with a specified count > 0.
Example: select at least 5 annotations of each class randomly
label_random_sampler -k 5
Example: select at least 5 “cat” annotations and at least 3 “person” annotations
label_random_sampler -l "cat:5" -l "person:3"
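The notes for this sampler can be made concrete with a rough standalone sketch. It is an assumption-laden illustration, not Datumaro's code: `sample_by_label` is a hypothetical helper, items are modelled as `(item id, list of label names)` tuples, and a single subset is assumed (the real transform handles each subset separately). For every requested class it picks items uniformly among those containing that class, then keeps only annotations of classes with a count above zero.

```python
import random
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for a dataset item: (item id, annotation labels).
LabeledItem = Tuple[str, List[str]]


def sample_by_label(
    items: List[LabeledItem],
    label_counts: Dict[str, int],
    seed: Optional[int] = None,
) -> List[LabeledItem]:
    """Keep items so that each requested label has at least the
    requested number of annotations, when enough are available."""
    rng = random.Random(seed)

    # Only classes with a positive count are kept in the result.
    wanted = sorted(label for label, n in label_counts.items() if n > 0)

    picked = set()
    for label in wanted:
        # Items that carry at least one annotation of this class.
        holders = [i for i, (_, labels) in enumerate(items) if label in labels]
        # If fewer items are available than requested, take them all.
        n = min(label_counts[label], len(holders))
        picked.update(rng.sample(holders, n))

    # Preserve input order and drop annotations of unrequested classes.
    wanted_set = set(wanted)
    return [
        (item_id, [label for label in labels if label in wanted_set])
        for i, (item_id, labels) in enumerate(items)
        if i in picked
    ]


if __name__ == "__main__":
    data = [
        ("a", ["cat", "person"]),
        ("b", ["cat"]),
        ("c", ["dog"]),
        ("d", ["person", "dog"]),
        ("e", ["cat", "cat"]),
    ]
    print(sample_by_label(data, {"cat": 2, "person": 5, "dog": 0}, seed=0))
```

Because a picked item brings along all of its annotations of the requested classes, the per-class totals can exceed the requested counts, which is why the resulting distribution may differ from the request.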