Contribution Guide

Installation

Prerequisites

  • Python (3.7+)

Clone the repository:

git clone https://github.com/cvat-ai/datumaro

Optionally, create and activate a virtual environment (recommended):

python -m pip install virtualenv
python -m virtualenv venv
. venv/bin/activate

Then install all dependencies:

pip install -r requirements.txt

Install Datumaro:

pip install -e /path/to/the/cloned/repo/
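Once installed, a quick sanity check (a suggestion, not part of the official steps) is to make sure the package imports from your environment:

python -c "import datumaro"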

Optional dependencies

These components are only required for plugins and are not installed by default:

  • OpenVINO
  • Accuracy Checker
  • TensorFlow
  • PyTorch
  • MxNet
  • Caffe
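For example, to experiment with the TensorFlow-based plugins you would typically install the framework into the same environment (the exact packages and versions depend on the specific plugin, so treat this as an illustration only):

pip install tensorflow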

Usage

The CLI can be invoked in any of the following ways:

datum --help
python -m datumaro --help
python datumaro/ --help
python datum.py --help

Datumaro can also be used as a Python library:

import datumaro
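As a rough sketch of library usage (the dataset path and format names below are placeholders, not prescribed by this guide):

import datumaro as dm

# Load a dataset in one format and re-export it in another
dataset = dm.Dataset.import_from("path/to/dataset", "voc")
dataset.export("path/to/output", "coco")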

Code style

Try to be readable and consistent with the existing codebase.

The project uses Black for code formatting and isort for sorting import statements; the corresponding configuration is in pyproject.toml in the repository root. Avoid trailing whitespace and keep lines at most 100 characters long.

Datumaro includes a Git pre-commit hook, dev/pre-commit.py, that can help you follow the style requirements. See the comment at the top of that file for more information.
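If you prefer to run the formatters manually (assuming black and isort are installed in your environment), running them from the repository root should pick up the configuration from pyproject.toml:

black .
isort .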

Environment

The recommended editor is VS Code with the Python language plugin.

Testing

It is expected that all Datumaro functionality is covered and checked by unit tests. Tests are placed in the tests/ directory. Additional pre-generated files for tests can be stored in the tests/assets/ directory. CLI tests are separated from the core tests and are stored in the tests/cli/ directory.
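Schematically, the layout described above looks like this (individual file names are illustrative only):

tests/
    assets/          # pre-generated files used by tests
    cli/             # CLI tests
    requirements.py  # requirement and bug identifiers
    conftest.py      # marking definitions
    ...              # core unit tests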

Currently, we use pytest for testing.

To run the tests, use:

pytest -v

or

python -m pytest -v
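Standard pytest selection options can be used to run a subset of tests; for example (the file and keyword below are only examples):

pytest -v tests/test_dataset.py
pytest -v -k "polygon"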

Test cases

Test marking

For better integration with CI and requirements tracking, we use special annotations for tests.

A test needs to be linked with the requirement it relates to. To link a test, use:

from unittest import TestCase
from .requirements import Requirements, mark_requirement

class MyTests(TestCase):
    @mark_requirement(Requirements.DATUM_GENERAL_REQ)
    def test_my_requirement(self):
        ...  # do stuff

This applies the markings associated with the specified requirement. They can be overridden for a specific test:

import pytest

class MyTests(TestCase):
    @pytest.mark.priority_low
    @mark_requirement(Requirements.DATUM_GENERAL_REQ)
    def test_my_requirement(self):
        ...  # do stuff

Requirements

Requirements and other links need to be added to tests/requirements.py:

DATUM_244 = "Add Snyk integration"
DATUM_BUG_219 = "Return format is not uniform"

# Fully defined in GitHub issues:
@pytest.mark.reqids(Requirements.DATUM_244, Requirements.DATUM_333)

# And defined any other way:
@pytest.mark.reqids(Requirements.DATUM_GENERAL_REQ)

Available annotations for tests and requirements

Markings are defined in tests/conftest.py.

A list of requirements and bugs

@pytest.mark.reqids(Requirements.DATUM_123)
@pytest.mark.bugs(Requirements.DATUM_BUG_456)

A priority

@pytest.mark.priority_low
@pytest.mark.priority_medium
@pytest.mark.priority_high

A component

The marking used to indicate different system components

@pytest.mark.components(DatumaroComponent.Datumaro)

Skipping tests

@pytest.mark.skip(SkipMessages.NOT_IMPLEMENTED)

Parametrized runs

Parametrization is used to run the same test with different sets of parameters, e.g.:

import numpy as np

@pytest.mark.parametrize("numpy_array, batch_size", [
    (np.zeros([2]), 0),
    (np.zeros([2]), 1),
    (np.zeros([2]), 2),
    (np.zeros([2]), 5),
    (np.zeros([5]), 2),
])
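Note that pytest's parametrize decorator applies to plain test functions and pytest-style classes; it does not work on unittest.TestCase methods. A hypothetical complete example:

import numpy as np
import pytest

@pytest.mark.parametrize("numpy_array, batch_size", [
    (np.zeros([2]), 0),
    (np.zeros([2]), 1),
])
def test_parameters_are_injected(numpy_array, batch_size):
    # Each tuple in the list above produces a separate test run
    assert len(numpy_array) >= batch_size >= 0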

Test documentation

Tests are documented with docstrings. A test description must contain the following sections: Description, Expected results and Steps.

def test_can_convert_polygons_to_mask(self):
    """
    <b>Description:</b>
    Ensure that the dataset polygon annotation can be properly converted
    into dataset segmentation mask.

    <b>Expected results:</b>
    Dataset segmentation mask converted from dataset polygon annotation
    is equal to an expected mask.

    <b>Steps:</b>
    1. Prepare dataset with polygon annotation
    2. Prepare dataset with expected mask segmentation mode
    3. Convert source dataset to target, with conversion of annotation
      from polygon to mask.
    4. Verify that resulting segmentation mask is equal to the expected mask.
    """