hl_ops module

datumaro.components.hl_ops.transform(dataset: datumaro.components.extractor.IExtractor, method: Union[str, Type[datumaro.components.extractor.Transform]], *, env: Optional[datumaro.components.environment.Environment] = None, **kwargs) → datumaro.components.extractor.IExtractor

Applies a transformation to dataset items.

Results are computed lazily, if the transform supports this.

Parameters

dataset – The dataset to be transformed
method – The transformation to be applied to the dataset. If a string is passed, it is treated as a plugin name, which is searched for in the environment set by the ‘env’ argument
env – A plugin collection. If not set, the built-in plugins are used
**kwargs – Parameters for the transformation

Returns: a wrapper around the input dataset
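
The lazy behaviour described above can be illustrated with a minimal, library-independent sketch. The LazyTransform class, the add_prefix function and the dict-based items are hypothetical stand-ins invented for this example; they are not Datumaro classes, and the real wrapper may work differently.

```python
from typing import Callable, Iterable, Iterator


class LazyTransform:
    """Hypothetical stand-in for a transform wrapper; not a Datumaro class."""

    def __init__(self, source: Iterable[dict], func: Callable[[dict], dict]):
        self._source = source
        self._func = func

    def __iter__(self) -> Iterator[dict]:
        # The function runs only when an item is actually requested.
        for item in self._source:
            yield self._func(item)


calls = []  # records which items were processed, to make laziness visible


def add_prefix(item: dict) -> dict:
    calls.append(item["id"])
    # Return a new item instead of modifying the input one.
    return {**item, "id": "new_" + item["id"]}


items = [{"id": "a"}, {"id": "b"}]
wrapped = LazyTransform(items, add_prefix)
calls_before_iteration = len(calls)  # still 0: wrapping computes nothing
result = list(wrapped)  # iteration triggers the computation
```

In this sketch the input list is left unchanged, which mirrors the idea of a wrapper around the input dataset rather than an in-place modification.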

datumaro.components.hl_ops.filter(dataset: datumaro.components.extractor.IExtractor, expr: str, *, filter_annotations: bool = False, remove_empty: bool = False) → datumaro.components.extractor.IExtractor

Filters out some dataset items or annotations, using a custom filter expression.

Parameters

dataset – The dataset to be filtered
expr – XPath-formatted filter expression (e.g. /item[subset = 'train'], /item/annotation[label = 'cat'])
filter_annotations – If True, the filter is applied to annotations; otherwise, it is applied to items
remove_empty – When filtering annotations, excludes items left without annotations from the resulting dataset

Returns: a wrapper around the input dataset, computed lazily during iteration
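
To show how the two kinds of expressions select data, here is a rough sketch using the XPath subset of Python's standard xml.etree.ElementTree. The XML layout below is a simplified, assumed encoding of dataset items made up for this example, and keep_annotations is a hypothetical helper; the encoding used by the library may contain more fields or differ in structure.

```python
import xml.etree.ElementTree as ET

# Assumed, simplified XML encoding of two items (illustration only).
ITEMS_XML = """
<dataset>
  <item><id>1</id><subset>train</subset>
    <annotation><label>cat</label></annotation>
    <annotation><label>dog</label></annotation>
  </item>
  <item><id>2</id><subset>val</subset>
    <annotation><label>dog</label></annotation>
  </item>
</dataset>
"""

root = ET.fromstring(ITEMS_XML)

# Item-level filter, analogous to /item[subset = 'train'].
train_ids = [item.findtext("id") for item in root.findall("./item[subset='train']")]


def keep_annotations(root, label, remove_empty=False):
    """Annotation-level filter, analogous to /item/annotation[label = '...']."""
    kept = {}
    for item in root.findall("./item"):
        matches = item.findall(f"./annotation[label='{label}']")
        # With remove_empty, items left without annotations are dropped.
        if matches or not remove_empty:
            kept[item.findtext("id")] = [a.findtext("label") for a in matches]
    return kept


with_empty = keep_annotations(root, "cat")
without_empty = keep_annotations(root, "cat", remove_empty=True)
```

Here the item-level expression keeps or drops whole items, while the annotation-level expression keeps every item and prunes its annotations unless remove_empty is set.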

datumaro.components.hl_ops.merge(*datasets: datumaro.components.extractor.IExtractor) → datumaro.components.extractor.IExtractor

Merges several datasets using the “simple” (exact matching) algorithm:

- items are matched by (id, subset) pairs
- matching items share the fields available:
    - nothing + nothing = nothing
    - nothing + something = something
    - something A + something B = conflict
- annotations are matched by value and shared
- in case of conflicts, an error is raised

Returns: a wrapper around the input datasets
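
The matching rules above can be sketched in plain Python. This is an eager illustration over dict-based items, not the library's implementation: merge_exact, merge_field, MergeConflict and the "image" field are hypothetical names, and treating two equal values as non-conflicting is an assumption of this sketch.

```python
class MergeConflict(Exception):
    """Hypothetical error type for this sketch."""


def merge_field(a, b, name):
    # nothing + nothing = nothing; nothing + something = something
    if a is None:
        return b
    # Assumption: two equal values are not treated as a conflict.
    if b is None or a == b:
        return a
    # something A + something B = conflict
    raise MergeConflict(f"conflicting values for {name!r}: {a!r} vs {b!r}")


def merge_exact(*datasets):
    merged = {}
    for dataset in datasets:
        for item in dataset:
            key = (item["id"], item["subset"])  # items are matched by this pair
            if key not in merged:
                merged[key] = {
                    "id": item["id"],
                    "subset": item["subset"],
                    "image": item.get("image"),
                    "annotations": list(item.get("annotations", [])),
                }
                continue
            target = merged[key]
            target["image"] = merge_field(target["image"], item.get("image"), "image")
            # Annotations are matched by value: equal ones are kept once.
            for ann in item.get("annotations", []):
                if ann not in target["annotations"]:
                    target["annotations"].append(ann)
    return list(merged.values())


a = [{"id": "1", "subset": "train", "image": "1.jpg", "annotations": ["cat"]}]
b = [
    {"id": "1", "subset": "train", "annotations": ["cat", "dog"]},
    {"id": "2", "subset": "train", "image": "2.jpg"},
]
merged = merge_exact(a, b)

try:
    merge_exact(a, [{"id": "1", "subset": "train", "image": "other.jpg"}])
    conflict_raised = False
except MergeConflict:
    conflict_raised = True
```

In the example, item "1" takes its image from the first dataset and the union of annotations from both, while a second, different image for the same item triggers the error.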

datumaro.components.hl_ops.run_model(dataset: datumaro.components.extractor.IExtractor, model: Union[datumaro.components.launcher.Launcher, Type[datumaro.components.launcher.ModelTransform]], *, batch_size: int = 1, **kwargs) → datumaro.components.extractor.IExtractor

Applies a model to dataset items’ media and produces a dataset with media and annotations.

Parameters

dataset – The dataset to be transformed
model – The model to be applied to the dataset
batch_size – The number of dataset items processed simultaneously by the model
**kwargs – Parameters for the model

Returns: a wrapper around the input dataset, computed lazily during iteration
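
The role of batch_size can be shown with a small generator-based sketch. The run_in_batches function and toy_model are hypothetical and stand in for the library's launcher machinery; the sketch only assumes that the model accepts a list of inputs and returns one prediction per input.

```python
from itertools import islice


def run_in_batches(items, model, batch_size=1):
    """Lazily yields (item, prediction) pairs; `model` takes a list of inputs."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        # Take up to batch_size items; the last batch may be smaller.
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        predictions = model(batch)
        yield from zip(batch, predictions)


batch_sizes = []  # records how many items each model call received


def toy_model(batch):
    batch_sizes.append(len(batch))
    return [len(name) for name in batch]  # dummy "prediction": name length


results = list(run_in_batches(["a.jpg", "bb.jpg", "ccc.jpg"], toy_model, batch_size=2))
```

Because run_in_batches is a generator, the model is not called until the results are iterated, which matches the lazy evaluation noted in the return value.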

datumaro.components.hl_ops.export(dataset: datumaro.components.extractor.IExtractor, path: str, format: Union[str, Type[datumaro.components.converter.Converter]], *, env: Optional[datumaro.components.environment.Environment] = None, **kwargs) → None

Saves the input dataset in some format.

Parameters

dataset – The dataset to be saved
path – The output directory
format – The desired output format for the dataset. If a string is passed, it is treated as a plugin name, which is searched for in the environment set by the ‘env’ argument
env – A plugin collection. If not set, the built-in plugins are used
**kwargs – Parameters for the export format
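
The way a format given as a string is resolved through a plugin collection, while a class is used directly, can be sketched as follows. Everything here is hypothetical: JsonLinesConverter, BUILTIN_CONVERTERS and export_sketch are invented for the illustration, and a plain dict plays the role of the environment.

```python
import json
import os
import tempfile


class JsonLinesConverter:
    """Hypothetical converter for this sketch; not a Datumaro plugin."""

    @staticmethod
    def convert(dataset, save_dir, **kwargs):
        os.makedirs(save_dir, exist_ok=True)
        path = os.path.join(save_dir, "items.jsonl")
        with open(path, "w", encoding="utf-8") as f:
            for item in dataset:
                # Extra keyword arguments are forwarded to the format writer.
                f.write(json.dumps(item, **kwargs) + "\n")
        return path


# Stand-in for the built-in plugin collection.
BUILTIN_CONVERTERS = {"jsonl": JsonLinesConverter}


def export_sketch(dataset, path, format, *, env=None, **kwargs):
    registry = BUILTIN_CONVERTERS if env is None else env
    # A string is looked up as a plugin name; a class is used as is.
    converter = registry[format] if isinstance(format, str) else format
    converter.convert(dataset, save_dir=path, **kwargs)


with tempfile.TemporaryDirectory() as tmp:
    out_dir = os.path.join(tmp, "exported")
    export_sketch([{"id": "1"}, {"id": "2"}], out_dir, "jsonl", sort_keys=True)
    with open(os.path.join(out_dir, "items.jsonl"), encoding="utf-8") as f:
        lines = f.read().splitlines()
```

Passing JsonLinesConverter itself instead of the string "jsonl" would skip the lookup and give the same output in this sketch.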

datumaro.components.hl_ops.validate(dataset: datumaro.components.extractor.IExtractor, task: Union[str, datumaro.components.validator.TaskType], *, env: Optional[datumaro.components.environment.Environment] = None, **kwargs) → Dict

Checks dataset annotations for correctness with respect to a task type.

Parameters

dataset – The dataset to check
task – Target task type, e.g. classification or detection
env – A plugin collection. If not set, the built-in plugins are used
**kwargs – Parameters for the validator

Returns: a dictionary with validation results