hl_ops module

datumaro.components.hl_ops.transform(dataset: datumaro.components.extractor.IExtractor, method: Union[str, Type[datumaro.components.extractor.Transform]], *, env: Optional[datumaro.components.environment.Environment] = None, **kwargs) → datumaro.components.extractor.IExtractor

Applies some function to dataset items.

Results are computed lazily, if the transform supports this.

Parameters
  • dataset – The dataset to be transformed

  • method – The transformation to be applied to the dataset. If a string is passed, it is treated as a plugin name, which is searched for in the environment set by the ‘env’ argument

  • env – A plugin collection. If not set, the built-in plugins are used

  • **kwargs – Parameters for the transformation

Returns: a wrapper around the input dataset
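The lazy behaviour described above can be illustrated with a minimal, library-independent sketch. It is not Datumaro's implementation: `LazyTransform`, `rename_subset` and the dictionary-based items are hypothetical names chosen for illustration. The point is only that wrapping a dataset does no work until the wrapper is iterated.

```python
from typing import Callable, Dict, Iterable, Iterator


class LazyTransform:
    """Hypothetical wrapper: applies `func` to each item only when iterated."""

    def __init__(self, dataset: Iterable[Dict], func: Callable[[Dict], Dict]):
        self._dataset = dataset
        self._func = func

    def __iter__(self) -> Iterator[Dict]:
        for item in self._dataset:
            yield self._func(item)


calls = []  # records which items were actually processed


def rename_subset(item: Dict) -> Dict:
    """Toy transformation: moves every item to the 'train' subset."""
    calls.append(item["id"])
    return {**item, "subset": "train"}


source = [{"id": "a", "subset": "default"}, {"id": "b", "subset": "default"}]
wrapped = LazyTransform(source, rename_subset)

calls_before_iteration = list(calls)  # still empty: nothing computed yet
result = list(wrapped)                # iteration triggers the transformation
```

The input list is left untouched, which matches the idea of returning a wrapper rather than modifying the dataset in place.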

datumaro.components.hl_ops.filter(dataset: datumaro.components.extractor.IExtractor, expr: str, *, filter_annotations: bool = False, remove_empty: bool = False) → datumaro.components.extractor.IExtractor

Filters out some dataset items or annotations, using a custom filter expression.

Parameters
  • dataset – The dataset to be filtered

  • expr – XPath-formatted filter expression (e.g. /item[subset = 'train'], /item/annotation[label = 'cat'])

  • filter_annotations – If True, the filter is applied to annotations; otherwise, it is applied to items

  • remove_empty – When filtering annotations, excludes items left without annotations from the resulting dataset

Returns: a wrapper around the input dataset, which is computed lazily during iteration
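To show how such expressions select items and annotations, here is a rough sketch built on the standard library's `xml.etree.ElementTree`. It assumes each item is rendered as a small XML record; the element layout, the dictionary-based items and the names `item_to_xml`, `matches` and `filter_sketch` are all hypothetical. Unlike the examples above, the paths are written in relative form (`./item[...]`) because ElementTree does not accept absolute paths on elements.

```python
import xml.etree.ElementTree as ET


def item_to_xml(item):
    """Renders a toy item as XML (hypothetical layout)."""
    root = ET.Element("item")
    ET.SubElement(root, "id").text = item["id"]
    ET.SubElement(root, "subset").text = item["subset"]
    for label in item["labels"]:
        ann = ET.SubElement(root, "annotation")
        ET.SubElement(ann, "label").text = label
    return root


def matches(item, path):
    """True if the XPath-like `path` selects something in the item's XML."""
    wrapper = ET.Element("dataset")
    wrapper.append(item_to_xml(item))
    return wrapper.find(path) is not None


def filter_sketch(items, path, filter_annotations=False, remove_empty=False):
    """Generator, so results are computed lazily during iteration."""
    for item in items:
        if not filter_annotations:
            if matches(item, path):
                yield item
            continue
        # Test each annotation on its own and keep only the matching ones
        kept = [l for l in item["labels"] if matches({**item, "labels": [l]}, path)]
        if kept or not remove_empty:
            yield {**item, "labels": kept}


items = [
    {"id": "a", "subset": "train", "labels": ["cat", "dog"]},
    {"id": "b", "subset": "val", "labels": ["dog"]},
]
train_ids = [i["id"] for i in filter_sketch(items, "./item[subset='train']")]
cats_only = list(filter_sketch(items, "./item/annotation[label='cat']",
                               filter_annotations=True))
non_empty = list(filter_sketch(items, "./item/annotation[label='cat']",
                               filter_annotations=True, remove_empty=True))
```

In this sketch, item "b" survives annotation filtering with an empty label list unless `remove_empty` is set.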

datumaro.components.hl_ops.merge(*datasets: datumaro.components.extractor.IExtractor) → datumaro.components.extractor.IExtractor

Merges several datasets using the “simple” (exact matching) algorithm:

  • items are matched by (id, subset) pairs

  • the fields of matching items are combined:

    • nothing + nothing = nothing

    • nothing + something = something

    • something A + something B = conflict

  • annotations are matched by value and shared

  • in case of conflicts, an error is raised

Returns: a wrapper around the input datasets
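The matching rules listed above can be sketched in plain Python. This is an illustration under stated assumptions, not the library's code: items are plain dictionaries, `None` stands for "nothing", and `MergeConflict`, `merge_field` and `merge_sketch` are hypothetical names. Treating two equal values as compatible is also an assumption, since the rules above only describe differing values A and B.

```python
class MergeConflict(Exception):
    """Hypothetical error for two different values of the same field."""


def merge_field(a, b):
    if a is None:
        return b          # nothing + something = something (or nothing)
    if b is None:
        return a
    if a == b:
        return a          # assumption: equal values do not conflict
    raise MergeConflict(f"{a!r} vs {b!r}")


def merge_sketch(*datasets):
    merged = {}
    for dataset in datasets:
        for item in dataset:
            key = (item["id"], item["subset"])  # items match by (id, subset)
            if key not in merged:
                merged[key] = {**item, "annotations": list(item["annotations"])}
                continue
            target = merged[key]
            target["image"] = merge_field(target["image"], item["image"])
            for ann in item["annotations"]:
                if ann not in target["annotations"]:  # matched by value
                    target["annotations"].append(ann)
    return list(merged.values())


first = [{"id": "1", "subset": "train", "image": "1.jpg",
          "annotations": [{"label": "cat"}]}]
second = [
    {"id": "1", "subset": "train", "image": None,
     "annotations": [{"label": "cat"}, {"label": "dog"}]},
    {"id": "2", "subset": "val", "image": None, "annotations": []},
]
merged = merge_sketch(first, second)

clashing = [{"id": "1", "subset": "train", "image": "other.jpg",
             "annotations": []}]
try:
    merge_sketch(first, clashing)
    conflict_detected = False
except MergeConflict:
    conflict_detected = True
```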

datumaro.components.hl_ops.run_model(dataset: datumaro.components.extractor.IExtractor, model: Union[datumaro.components.launcher.Launcher, Type[datumaro.components.launcher.ModelTransform]], *, batch_size: int = 1, **kwargs) → datumaro.components.extractor.IExtractor

Applies a model to dataset items’ media and produces a dataset with media and annotations.

Parameters
  • dataset – The dataset to be transformed

  • model – The model to be applied to the dataset

  • batch_size – The number of dataset items processed simultaneously by the model

  • **kwargs – Parameters for the model

Returns: a wrapper around the input dataset, which is computed lazily during iteration
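The role of batch_size can be pictured with a small sketch that groups items into batches, passes each batch's media to a model, and yields items with the predicted annotations attached. The names `batched`, `run_model_sketch` and `toy_model`, and the dictionary-based items, are hypothetical; a real launcher would run actual inference.

```python
from itertools import islice


def batched(items, batch_size):
    """Yields consecutive lists of at most `batch_size` items."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def run_model_sketch(dataset, model, batch_size=1):
    """Generator: the model is only called when results are consumed."""
    for batch in batched(dataset, batch_size):
        predictions = model([item["media"] for item in batch])
        for item, annotations in zip(batch, predictions):
            yield {**item, "annotations": annotations}


batch_sizes = []  # records how many items the model saw per call


def toy_model(media_batch):
    """Stand-in model: labels each media string by its length."""
    batch_sizes.append(len(media_batch))
    return [[{"label": "long" if len(m) > 3 else "short"}] for m in media_batch]


dataset = [{"id": str(i), "media": m} for i, m in enumerate(["ab", "abcd", "abcde"])]
outputs = list(run_model_sketch(dataset, toy_model, batch_size=2))
```

With three items and a batch size of 2, the stand-in model is called twice, the second time with a single item.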

datumaro.components.hl_ops.export(dataset: datumaro.components.extractor.IExtractor, path: str, format: Union[str, Type[datumaro.components.converter.Converter]], *, env: Optional[datumaro.components.environment.Environment] = None, **kwargs) → None

Saves the input dataset in some format.

Parameters
  • dataset – The dataset to be saved

  • path – The output directory

  • format – The desired output format for the dataset. If a string is passed, it is treated as a plugin name, which is searched for in the environment set by the ‘env’ argument

  • env – A plugin collection. If not set, the built-in plugins are used

  • **kwargs – Parameters for the export format
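The way a format name is resolved and the extra parameters are forwarded can be sketched as follows. Everything here is hypothetical and only mirrors the parameter descriptions above: the registry `BUILTIN_FORMATS`, the `jsonl` format, the `write_jsonl` converter and its `sort_keys` option are made up for illustration and are not Datumaro plugins.

```python
import json
import os
import tempfile


def write_jsonl(items, path, *, sort_keys=False):
    """Toy converter: writes one JSON object per line into `path`."""
    os.makedirs(path, exist_ok=True)
    with open(os.path.join(path, "items.jsonl"), "w", encoding="utf-8") as f:
        for item in items:
            f.write(json.dumps(item, sort_keys=sort_keys) + "\n")


BUILTIN_FORMATS = {"jsonl": write_jsonl}  # stands in for built-in plugins


def export_sketch(items, path, fmt, *, env=None, **kwargs):
    registry = BUILTIN_FORMATS if env is None else env
    # A string is looked up by name; anything else is used directly
    converter = registry[fmt] if isinstance(fmt, str) else fmt
    converter(items, path, **kwargs)


with tempfile.TemporaryDirectory() as out_dir:
    export_sketch([{"subset": "train", "id": "a"}], out_dir, "jsonl",
                  sort_keys=True)
    with open(os.path.join(out_dir, "items.jsonl"), encoding="utf-8") as f:
        exported_lines = f.read().splitlines()
```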

datumaro.components.hl_ops.validate(dataset: datumaro.components.extractor.IExtractor, task: Union[str, datumaro.components.validator.TaskType], *, env: Optional[datumaro.components.environment.Environment] = None, **kwargs) → Dict

Checks dataset annotations for correctness relative to a task type.

Parameters
  • dataset – The dataset to check

  • task – Target task type: classification, detection, etc.

  • env – A plugin collection. If not set, the built-in plugins are used

  • **kwargs – Parameters for the validator

Returns: a dictionary with validation results
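As a rough idea of what a task-specific check and its result dictionary might look like, the sketch below validates toy classification items. The checks, the function name `validate_classification_sketch` and the keys of the returned dictionary are hypothetical; the actual structure of the validation results is not specified here.

```python
def validate_classification_sketch(items, known_labels):
    """Toy validator: flags items without labels and labels not in the label set."""
    reports = []
    for item in items:
        if not item["labels"]:
            reports.append({"item_id": item["id"], "severity": "warning",
                            "problem": "missing label"})
        for label in item["labels"]:
            if label not in known_labels:
                reports.append({"item_id": item["id"], "severity": "error",
                                "problem": f"undefined label {label!r}"})
    return {
        "reports": reports,
        "summary": {
            "errors": sum(r["severity"] == "error" for r in reports),
            "warnings": sum(r["severity"] == "warning" for r in reports),
        },
    }


items = [
    {"id": "a", "labels": ["cat"]},
    {"id": "b", "labels": []},
    {"id": "c", "labels": ["unicorn"]},
]
results = validate_classification_sketch(items, known_labels={"cat", "dog"})
```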