Detect dataset format
This command attempts to detect the format of a dataset in a directory. Currently, only local directories are supported.
The detection result may be one of:
- a single format being detected;
- no formats being detected (if the dataset doesn’t match any known format);
- multiple formats being detected (if the dataset is ambiguous).
The command outputs this result in a human-readable form and, optionally, as a machine-readable JSON report (see --json-report).
The format of the machine-readable report is as follows:
{
  "detected_formats": [
    "detected-format-name-1", "detected-format-name-2", ...
  ],
  "rejected_formats": {
    "rejected-format-name-1": {
      "reason": <reason-code>,
      "message": "line 1\nline 2\n...\nline N"
    },
    "rejected-format-name-2": ...,
    ...
  }
}
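As a minimal sketch of how a script might consume such a report, the Python snippet below maps the "detected_formats" list onto the three outcomes described earlier (single, none, ambiguous). The function name classify_detection, the outcome labels, and the format names "format-a" and "format-b" are hypothetical and only serve the illustration; the sample dictionary stands in for the result of json.load on a saved report.

```python
import json


def classify_detection(report: dict) -> str:
    """Map a parsed report onto one of the three documented outcomes."""
    detected = report.get("detected_formats", [])
    if len(detected) == 1:
        return "single"      # exactly one format was detected
    if not detected:
        return "none"        # the dataset matched no known format
    return "ambiguous"       # several formats were detected


# Hypothetical report content; in practice this would come from
# json.load() on the file written via --json-report.
sample_report = json.loads(json.dumps({
    "detected_formats": ["format-a"],
    "rejected_formats": {
        "format-b": {
            "reason": "unmet_requirements",
            "message": "line 1\nline 2",
        },
    },
}))

print(classify_detection(sample_report))
```

A caller would typically proceed automatically only on the "single" outcome and ask the user to choose a format otherwise.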
The <reason-code> can be one of:
- "detection_unsupported": the corresponding format does not support detection.
- "insufficient_confidence": the dataset matched the corresponding format, but it matched at least one other format better.
- "unmet_requirements": the dataset didn’t meet at least one requirement of the corresponding format.
Other reason codes may be defined in the future.
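Because new reason codes may appear later, a consumer of the report should probably not assume the set is closed. The following Python sketch groups rejected formats by reason code and separately lists any codes it does not recognise. The helper names, the format names, and the code "some_future_reason" are hypothetical and used only for illustration.

```python
# Reason codes documented at the time of writing.
KNOWN_REASONS = {
    "detection_unsupported",
    "insufficient_confidence",
    "unmet_requirements",
}


def group_rejections(report: dict) -> dict:
    """Return {reason code: sorted list of rejected format names}."""
    grouped = {}
    for name, info in report.get("rejected_formats", {}).items():
        grouped.setdefault(info["reason"], []).append(name)
    return {reason: sorted(names) for reason, names in grouped.items()}


def unknown_reasons(report: dict) -> set:
    """Return reason codes in the report that are not in KNOWN_REASONS."""
    return set(group_rejections(report)) - KNOWN_REASONS


# Hypothetical parsed report; format names are placeholders.
sample_report = {
    "detected_formats": [],
    "rejected_formats": {
        "format-a": {"reason": "unmet_requirements", "message": "..."},
        "format-b": {"reason": "unmet_requirements", "message": "..."},
        "format-c": {"reason": "detection_unsupported", "message": "..."},
        "format-d": {"reason": "some_future_reason", "message": "..."},
    },
}

for reason, names in sorted(group_rejections(sample_report).items()):
    print(f"{reason}: {', '.join(names)}")
print("unrecognised codes:", sorted(unknown_reasons(sample_report)))
```

Keeping unrecognised codes as-is, rather than raising an error, lets the script keep working if the tool introduces additional codes.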
Usage:
datum detect-format [-h] [-p PROJECT_DIR] [--show-rejections]
                    [--json-report JSON_REPORT]
                    url
Parameters:
- <url> - Path to the dataset to analyse.
- -h, --help - Print the help message and exit.
- -p, --project (string) - Directory of the project to use as the context (default: current directory). The project might contain local plugins with custom formats, which will be used for detection.
- --show-rejections - Describe why each supported format that wasn’t detected was rejected. This only affects the human-readable output; the machine-readable report always includes rejection information.
- --json-report (string) - Path to which to save a JSON report describing detected and rejected formats. By default, no report is saved.
Example: detect the format of a dataset in a given directory, showing rejection information:
datum detect-format --show-rejections path/to/dataset
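When the command is driven from a script, the argument list can be assembled from the options listed under Parameters. The Python sketch below only builds and prints the command line; it does not run datum. The function name build_detect_format_command and the report file name report.json are hypothetical.

```python
import shlex


def build_detect_format_command(url, project_dir=None,
                                show_rejections=False, json_report=None):
    """Assemble an argument list for `datum detect-format`."""
    cmd = ["datum", "detect-format"]
    if project_dir is not None:
        cmd += ["-p", project_dir]              # project used as the context
    if show_rejections:
        cmd.append("--show-rejections")         # affects human-readable output only
    if json_report is not None:
        cmd += ["--json-report", json_report]   # where to save the JSON report
    cmd.append(url)                             # positional dataset path goes last
    return cmd


cmd = build_detect_format_command(
    "path/to/dataset",
    show_rejections=True,
    json_report="report.json",  # hypothetical output path
)
print(shlex.join(cmd))
# prints: datum detect-format --show-rejections --json-report report.json path/to/dataset
```

On a machine where datum is installed, the resulting list could be passed to subprocess.run, after which the saved report can be read with json.load.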