hl_ops module
- datumaro.components.hl_ops.transform(dataset: datumaro.components.extractor.IExtractor, method: Union[str, Type[datumaro.components.extractor.Transform]], *, env: Optional[datumaro.components.environment.Environment] = None, **kwargs) → datumaro.components.extractor.IExtractor
Applies some function to dataset items.
Results are computed lazily, if the transform supports this.
- Parameters
dataset – The dataset to be transformed
method – The transformation to be applied to the dataset. If a string is passed, it is treated as a plugin name, which is searched for in the environment set by the ‘env’ argument
env – A plugin collection. If not set, the built-in plugins are used
**kwargs – Parameters for the transformation
Returns: a wrapper around the input dataset
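The lazy-wrapper behaviour and the string-to-plugin lookup described above can be sketched in plain Python. This is only an illustration of the idea, not Datumaro's implementation: the `PLUGINS` registry, the `LazyTransform` class, and the dict-based items are hypothetical stand-ins for the environment, the returned wrapper, and real dataset items.

```python
from typing import Callable, Dict, Iterable, Iterator, Union

# Hypothetical plugin registry standing in for an Environment (not Datumaro's API).
PLUGINS: Dict[str, Callable[[dict], dict]] = {
    "upper_id": lambda item: {**item, "id": item["id"].upper()},
}


class LazyTransform:
    """Wraps a dataset and applies a function to each item only during iteration."""

    def __init__(self, dataset: Iterable[dict],
                 method: Union[str, Callable[[dict], dict]]) -> None:
        self._dataset = dataset
        # A string is treated as a plugin name and looked up in the registry.
        self._func = PLUGINS[method] if isinstance(method, str) else method
        self.calls = 0  # counts how many items have actually been transformed

    def __iter__(self) -> Iterator[dict]:
        for item in self._dataset:
            self.calls += 1
            yield self._func(item)


items = [{"id": "a"}, {"id": "b"}]
wrapped = LazyTransform(items, "upper_id")
calls_before = wrapped.calls  # creating the wrapper computes nothing
result = list(wrapped)        # iteration triggers the transformation
```

Creating the wrapper is cheap; the per-item work happens only when the result is iterated, which is the property the "computed lazily" note refers to.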
- datumaro.components.hl_ops.filter(dataset: datumaro.components.extractor.IExtractor, expr: str, *, filter_annotations: bool = False, remove_empty: bool = False) → datumaro.components.extractor.IExtractor
Filters out some dataset items or annotations, using a custom filter expression.
- Parameters
dataset – The dataset to be filtered
expr – XPath-formatted filter expression (e.g. /item[subset = 'train'], /item/annotation[label = 'cat'])
filter_annotations – If True, the filter is applied to annotations; otherwise (the default), it is applied to items
remove_empty – When filtering annotations, removes items that are left without annotations from the resulting dataset
- Returns: a wrapper around the input dataset, which is computed lazily during iteration
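To make the filter expressions more concrete, the sketch below evaluates similar predicates with the standard-library `xml.etree.ElementTree`. It is an assumption-laden illustration, not Datumaro's code: the `encode_item` XML layout and the dict-based items are hypothetical, and because ElementTree does not accept absolute paths on an element, each item is placed under a temporary root and matched with a relative path (`./item[...]` instead of `/item[...]`).

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Iterator


def encode_item(item: dict) -> ET.Element:
    """Hypothetical XML form of an item: <item><id/><subset/><annotation><label/>...</item>."""
    elem = ET.Element("item")
    ET.SubElement(elem, "id").text = item["id"]
    ET.SubElement(elem, "subset").text = item["subset"]
    for label in item["labels"]:
        ann = ET.SubElement(elem, "annotation")
        ET.SubElement(ann, "label").text = label
    return elem


def wrap(item: dict) -> ET.Element:
    # Temporary root so that a relative path can address the item itself.
    root = ET.Element("root")
    root.append(encode_item(item))
    return root


def filter_items(items: Iterable[dict], predicate: str) -> Iterator[dict]:
    """Lazily yields the items whose XML form matches the predicate."""
    for item in items:
        if wrap(item).find("./item" + predicate) is not None:
            yield item


def filter_annotations(items: Iterable[dict], predicate: str,
                       remove_empty: bool = False) -> Iterator[dict]:
    """Lazily keeps only matching annotations; optionally drops emptied items."""
    for item in items:
        matched = [a.findtext("label")
                   for a in wrap(item).findall("./item/annotation" + predicate)]
        if matched or not remove_empty:
            yield {**item, "labels": matched}


items = [
    {"id": "1", "subset": "train", "labels": ["cat"]},
    {"id": "2", "subset": "val", "labels": ["dog"]},
    {"id": "3", "subset": "train", "labels": []},
]
train_ids = [i["id"] for i in filter_items(items, "[subset='train']")]
cats_only = list(filter_annotations(items, "[label='cat']", remove_empty=True))
cats_all = list(filter_annotations(items, "[label='cat']"))
```

With `remove_empty` left at its default, items whose annotations were all filtered out stay in the result with an empty annotation list.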
- datumaro.components.hl_ops.merge(*datasets: datumaro.components.extractor.IExtractor) → datumaro.components.extractor.IExtractor
Merges several datasets using the “simple” (exact matching) algorithm:
  - items are matched by (id, subset) pairs
  - matching items share the fields available:
      - nothing + nothing = nothing
      - nothing + something = something
      - something A + something B = conflict
  - annotations are matched by value and shared
  - in case of conflicts, throws an error
Returns: a wrapper around the input datasets
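The matching rules of the “simple” algorithm can be illustrated with a small stand-alone sketch. Everything here is hypothetical: items are plain dicts with a single `image` field, `MergeConflict` is an invented exception, and two equal values are assumed to be compatible rather than a conflict. It mirrors the rules listed above, not Datumaro's actual merger.

```python
from typing import Dict, Iterable, List, Optional, Tuple


class MergeConflict(Exception):
    """Raised when two matching items carry different values for a field."""


def merge_field(a: Optional[str], b: Optional[str]) -> Optional[str]:
    if a is None:
        return b  # nothing + nothing = nothing, nothing + something = something
    if b is None:
        return a
    if a == b:
        return a  # assumption: identical values do not conflict
    raise MergeConflict(f"conflicting values: {a!r} vs {b!r}")


def merge_items(a: dict, b: dict) -> dict:
    annotations = list(a["annotations"])
    for ann in b["annotations"]:
        if ann not in annotations:  # annotations are matched by value and shared
            annotations.append(ann)
    return {
        "id": a["id"],
        "subset": a["subset"],
        "image": merge_field(a.get("image"), b.get("image")),
        "annotations": annotations,
    }


def simple_merge(*datasets: Iterable[dict]) -> List[dict]:
    merged: Dict[Tuple[str, str], dict] = {}
    for dataset in datasets:
        for item in dataset:
            key = (item["id"], item["subset"])  # items are matched by (id, subset)
            merged[key] = merge_items(merged[key], item) if key in merged else dict(item)
    return list(merged.values())


first = [{"id": "1", "subset": "train", "image": "1.jpg", "annotations": ["cat"]}]
second = [
    {"id": "1", "subset": "train", "image": None, "annotations": ["cat", "dog"]},
    {"id": "2", "subset": "val", "image": "2.jpg", "annotations": []},
]
result = simple_merge(first, second)

clashing = [{"id": "1", "subset": "train", "image": "other.jpg", "annotations": []}]
try:
    simple_merge(first, clashing)
    conflict_raised = False
except MergeConflict:
    conflict_raised = True
```

Unlike the documented function, this sketch merges eagerly and returns a list; it is meant only to show how the field and annotation rules combine.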
- datumaro.components.hl_ops.run_model(dataset: datumaro.components.extractor.IExtractor, model: Union[datumaro.components.launcher.Launcher, Type[datumaro.components.launcher.ModelTransform]], *, batch_size: int = 1, **kwargs) → datumaro.components.extractor.IExtractor
Applies a model to dataset items’ media and produces a dataset with media and annotations.
- Parameters
dataset – The dataset to be transformed
model – The model to be applied to the dataset
batch_size – The number of dataset items processed simultaneously by the model
**kwargs – Parameters for the model
- Returns: a wrapper around the input dataset, which is computed lazily during iteration
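The role of `batch_size` can be shown with a generic batching generator. This is a sketch under stated assumptions, not Datumaro's launcher logic: `run_in_batches`, `toy_model`, and the dict-based items are hypothetical, and the model is assumed to return one list of annotations per input item.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List

batch_sizes: List[int] = []  # records how many items each model call received


def toy_model(batch: List[dict]) -> List[List[str]]:
    """Hypothetical model: returns one list of annotations per input item."""
    batch_sizes.append(len(batch))
    return [[f"label_for_{item['id']}"] for item in batch]


def run_in_batches(dataset: Iterable[dict],
                   model: Callable[[List[dict]], List[List[str]]],
                   batch_size: int = 1) -> Iterator[dict]:
    """Lazily feeds items to the model in groups of at most batch_size."""
    iterator = iter(dataset)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        predictions = model(batch)
        for item, annotations in zip(batch, predictions):
            # Each output item keeps its original fields and gains annotations.
            yield {**item, "annotations": annotations}


items = [{"id": str(i)} for i in range(5)]
outputs = list(run_in_batches(items, toy_model, batch_size=2))
```

Five items with a batch size of 2 lead to three model calls, the last one on a partial batch; nothing is sent to the model until the result is iterated.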
- datumaro.components.hl_ops.export(dataset: datumaro.components.extractor.IExtractor, path: str, format: Union[str, Type[datumaro.components.converter.Converter]], *, env: Optional[datumaro.components.environment.Environment] = None, **kwargs) → None
Saves the input dataset in some format.
- Parameters
dataset – The dataset to be saved
path – The output directory
format – The desired output format for the dataset. If a string is passed, it is treated as a plugin name, which is searched for in the environment set by the ‘env’ argument
env – A plugin collection. If not set, the built-in plugins are used
**kwargs – Parameters for the export format
- datumaro.components.hl_ops.validate(dataset: datumaro.components.extractor.IExtractor, task: Union[str, datumaro.components.validator.TaskType], *, env: Optional[datumaro.components.environment.Environment] = None, **kwargs) → Dict
Checks dataset annotations for correctness relative to a task type.
- Parameters
dataset – The dataset to check
task – The target task type: classification, detection, etc.
env – A plugin collection. If not set, the built-in plugins are used
**kwargs – Parameters for the validator
Returns: a dictionary with validation results