Transform Project

This command modifies images or annotations across a whole project in a single pass.

datum transform --help

datum transform \
    -p <project_dir> \
    -o <output_dir> \
    -t <transform_name> \
    -- [extra transform options]

Example: randomly split a dataset into train and test subsets with a 2:1 ratio

datum transform -t random_split -- --subset train:.67 --subset test:.33
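
The idea behind a ratio-based random split can be sketched in a few lines of Python. This is only an illustration of the concept, not the transform's actual implementation: the function name `random_split`, the `seed` parameter and the rounding rule are assumptions made for this example.

```python
import random

def random_split(item_ids, ratios, seed=0):
    """Shuffle item ids and cut them into named subsets by ratio.

    Hypothetical helper for illustration; not part of the datum CLI.
    """
    total = sum(ratio for _, ratio in ratios)
    if abs(total - 1.0) > 1e-6:
        raise ValueError("subset ratios must sum to 1")

    ids = list(item_ids)
    random.Random(seed).shuffle(ids)

    subsets, start, acc = {}, 0, 0.0
    for index, (name, ratio) in enumerate(ratios):
        acc += ratio
        # The last subset takes the remainder so that no item is dropped.
        is_last = index == len(ratios) - 1
        end = len(ids) if is_last else round(acc * len(ids))
        subsets[name] = ids[start:end]
        start = end
    return subsets

parts = random_split(range(300), [("train", 0.67), ("test", 0.33)])
```

Every item lands in exactly one subset, and the subset sizes follow the requested ratios up to rounding.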

Example: split a dataset in a task-specific manner. The supported tasks are classification, detection, segmentation and re-identification.

datum transform -t split -- \
    -t classification --subset train:.5 --subset val:.2 --subset test:.3

datum transform -t split -- \
    -t detection --subset train:.5 --subset val:.2 --subset test:.3

datum transform -t split -- \
    -t segmentation --subset train:.5 --subset val:.2 --subset test:.3

datum transform -t split -- \
    -t reid --subset train:.5 --subset val:.2 --subset test:.3 \
    --query .5
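
One plausible reading of a task-specific split for classification is a stratified split, where each label is divided by the same ratios so that every subset keeps a similar class balance. The sketch below assumes that reading; the function name `stratified_split` and its parameters are hypothetical and do not describe how the `split` transform works internally.

```python
import random
from collections import defaultdict

def stratified_split(labels_by_item, ratios, seed=0):
    """Split items so that each label is divided by the same ratios.

    Hypothetical helper for illustration; not part of the datum CLI.
    labels_by_item maps an item id to its single class label.
    """
    rng = random.Random(seed)

    # Group item ids by label.
    by_label = defaultdict(list)
    for item_id, label in labels_by_item.items():
        by_label[label].append(item_id)

    subsets = {name: [] for name, _ in ratios}
    for label in sorted(by_label):
        ids = sorted(by_label[label])
        rng.shuffle(ids)
        start, acc = 0, 0.0
        for index, (name, ratio) in enumerate(ratios):
            acc += ratio
            # The last subset takes whatever is left for this label.
            is_last = index == len(ratios) - 1
            end = len(ids) if is_last else round(acc * len(ids))
            subsets[name].extend(ids[start:end])
            start = end
    return subsets

labels = {f"cat_{i}": "cat" for i in range(10)}
labels.update({f"dog_{i}": "dog" for i in range(10)})
split = stratified_split(labels, [("train", 0.5), ("val", 0.2), ("test", 0.3)])
```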

Example: convert annotations between types (boxes to masks, masks to polygons, polygons to masks, shapes to boxes)

datum transform -t boxes_to_masks
datum transform -t masks_to_polygons
datum transform -t polygons_to_masks
datum transform -t shapes_to_boxes
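
To make the shape-to-box conversion concrete, the sketch below computes the axis-aligned bounding box of a polygon. The flat `[x0, y0, x1, y1, ...]` point layout, the `(x, y, width, height)` result and the function name `polygon_to_bbox` are assumptions for this example, not a description of the transform's internals.

```python
def polygon_to_bbox(points):
    """Return (x, y, width, height) enclosing a polygon.

    Hypothetical helper for illustration; not part of the datum CLI.
    points is a flat list of coordinates: [x0, y0, x1, y1, ...].
    """
    if len(points) < 6 or len(points) % 2 != 0:
        raise ValueError("a polygon needs at least 3 (x, y) pairs")
    xs, ys = points[0::2], points[1::2]
    x_min, y_min = min(xs), min(ys)
    return (x_min, y_min, max(xs) - x_min, max(ys) - y_min)

bbox = polygon_to_bbox([1, 1, 5, 2, 3, 6])
```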

Example: remap dataset labels: map person to car and cat to dog, keep bus, and remove all other labels

datum transform -t remap_labels -- \
    -l person:car -l bus:bus -l cat:dog \
    --default delete
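
The effect of this mapping can be illustrated on a plain list of label names. The sketch is a simplified model, not the transform's code: the function name `remap_labels` is hypothetical, and the `"keep"` value for `default` is an assumption (only `delete` appears in the command above).

```python
def remap_labels(labels, mapping, default="keep"):
    """Apply a label mapping to a list of label names.

    Hypothetical helper for illustration; not part of the datum CLI.
    Labels missing from the mapping are kept or dropped per `default`.
    """
    if default not in ("keep", "delete"):
        raise ValueError("default must be 'keep' or 'delete'")
    result = []
    for label in labels:
        if label in mapping:
            result.append(mapping[label])
        elif default == "keep":
            result.append(label)
        # default == "delete": unmapped labels are dropped
    return result

mapping = {"person": "car", "bus": "bus", "cat": "dog"}
remapped = remap_labels(["person", "bus", "cat", "tree"], mapping, default="delete")
```

Mapping `bus` to itself is what preserves it when the default action is to delete unmapped labels.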

Example: rename dataset items using a regular expression

  • Replace pattern with replacement
  • Remove frame_ from item ids
datum transform -t rename -- -e '|pattern|replacement|'
datum transform -t rename -- -e '|frame_(\d+)|\\1|'
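
The substitution itself can be tried out with Python's `re` module. The sketch assumes that the expression is split on its first character into a pattern and a replacement; the helper names `parse_expression` and `rename_ids` are hypothetical. It uses Python's own syntax for the group reference (`\1`) and does not model how the CLI handles escaping.

```python
import re

def parse_expression(expr):
    """Split '|pattern|replacement|' on its first character.

    Hypothetical helper for illustration; not part of the datum CLI.
    """
    delimiter = expr[0]
    parts = expr[1:].split(delimiter)
    if len(parts) != 3 or parts[2] != "":
        raise ValueError("expected <sep>pattern<sep>replacement<sep>")
    return parts[0], parts[1]

def rename_ids(item_ids, expr):
    """Apply the regex substitution to every item id."""
    pattern, replacement = parse_expression(expr)
    return [re.sub(pattern, replacement, item_id) for item_id in item_ids]

renamed = rename_ids(["frame_000001", "frame_000002"], r"|frame_(\d+)|\1|")
```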

Example: sample a target number of dataset items using a user-selected sampling method, and divide the dataset into sampled and unsampled subsets

  • The -m option accepts five sampling methods:
    • topk: return the k items with the highest uncertainty
    • lowk: return the k items with the lowest uncertainty
    • randk: return k randomly chosen items
    • mixk: return half of the items using the topk method and the rest using the lowk method
    • randtopk: first select 3 times k items at random, then return the top k among them
datum transform -t sampler -- \
    -a entropy \
    -i train \
    -o sampled \
    -u unsampled \
    -m topk \
    -k 20
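
The five sampling methods can be sketched over a dictionary of per-item uncertainty scores. This is an illustrative model only: the function name `sample`, the `seed` parameter, and the handling of odd `k` in `mixk` are assumptions, and how the uncertainty scores are computed is outside the scope of the sketch.

```python
import random

def sample(scores, k, method, seed=0):
    """Pick k item ids from {item_id: uncertainty} with the given method.

    Hypothetical helper for illustration; not part of the datum CLI.
    """
    rng = random.Random(seed)
    ids = sorted(scores)
    by_desc = sorted(ids, key=lambda i: scores[i], reverse=True)

    if method == "topk":
        return by_desc[:k]
    if method == "lowk":
        return by_desc[::-1][:k]
    if method == "randk":
        return rng.sample(ids, min(k, len(ids)))
    if method == "mixk":
        # Half from the top; the rest from the bottom (assumed split for odd k).
        top = by_desc[: k // 2]
        low = [i for i in reversed(by_desc) if i not in top]
        return top + low[: k - len(top)]
    if method == "randtopk":
        # Draw 3 * k candidates at random, then keep the k most uncertain.
        pool = rng.sample(ids, min(3 * k, len(ids)))
        return sorted(pool, key=lambda i: scores[i], reverse=True)[:k]
    raise ValueError(f"unknown method: {method}")

scores = {f"img{i}": i / 10 for i in range(10)}
sampled = sample(scores, 3, "topk")
unsampled = [i for i in sorted(scores) if i not in sampled]
```

The last two lines mirror the division into sampled and unsampled subsets: whatever is not selected stays in the unsampled group.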

Example: limit the number of output items to 100 after NDR

  • The NDR -e option accepts two methods:
    • random: sample from the removed data randomly
    • similarity: sample from the removed data in ascending order of similarity
  • The NDR -u option accepts two methods:
    • uniform: sample data with a uniform distribution
    • inverse: sample data with the reciprocal of the number
datum transform -t ndr -- \
    -w train \
    -a gradient \
    -k 100 \
    -e random \
    -u uniform
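
One way to read the combination of `-k`, `-e random` and `-u uniform` is as a rule for reaching a fixed output count: add back randomly chosen removed items when too few remain, and sample uniformly when too many remain. The sketch below assumes that reading; the function name `fit_to_count` and its arguments are hypothetical, and it does not model the `similarity` or `inverse` methods or the NDR step itself.

```python
import random

def fit_to_count(kept, removed, k, seed=0):
    """Adjust the result of a deduplication step to exactly k items if possible.

    Hypothetical helper for illustration; not part of the datum CLI.
    kept: items that survived deduplication; removed: items that were dropped.
    """
    rng = random.Random(seed)
    if len(kept) < k:
        # Too few items: draw the shortfall at random from the removed ones.
        shortfall = min(k - len(kept), len(removed))
        return list(kept) + rng.sample(list(removed), shortfall)
    if len(kept) > k:
        # Too many items: keep a uniform random sample of size k.
        return rng.sample(list(kept), k)
    return list(kept)

kept = [f"kept_{i}" for i in range(80)]
removed = [f"removed_{i}" for i in range(50)]
output = fit_to_count(kept, removed, 100)
```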