: Minimum set of features required for the underlying filesystem
: Schema interface
: Cron Manager interface
Successively run each task of a job
: job name as defined in the YML file, and SQL parameters to pass to the SQL statements.
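Passing SQL parameters into statements might look like the following sketch; the `{{param}}` placeholder syntax and the `apply_sql_params` helper are illustrative assumptions, not the documented substitution scheme:

```python
import re

def apply_sql_params(sql: str, params: dict) -> str:
    """Replace {{name}} placeholders in a SQL statement (illustrative only)."""
    return re.sub(r"\{\{(\w+)\}\}", lambda m: params[m.group(1)], sql)

sql = "SELECT * FROM sales WHERE day = '{{day}}' AND country = '{{country}}'"
print(apply_sql_params(sql, {"day": "2021-01-01", "country": "FR"}))
# → SELECT * FROM sales WHERE day = '2021-01-01' AND country = 'FR'
```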
Ingest the file (called by the cron manager at ingestion time for a specific dataset).
Move the files from the landing area to the pending area.
Files are loaded one domain at a time; each domain has its own directory, specified in the "directory" key of the domain YML file.
Compressed files are uncompressed if a corresponding ack file exists; they are recognized by their extension, which should be one of .tgz, .zip, .gz.
A raw file should also have a corresponding ack file.
Before the files are moved to the pending area, the ack files are deleted.
To import files without an ack file, specify an empty "ack" key (i.e. ack:"") in the domain YML file.
"ack" is the default ack extension searched for, but you may specify a different one in the domain YML file.
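A domain YML file along these lines would drive the ack behaviour described above; apart from the "directory" and "ack" keys named in the text, the key names and values here are assumptions for illustration:

```yaml
name: sales
directory: /landing/sales   # landing directory for this domain (hypothetical path)
ack: ""                     # empty => import files without requiring an ack file
# Omitting the "ack" key falls back to the default "ack" extension.
```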
Split files into resolved and unresolved datasets.
A file is unresolved if no corresponding schema is found; schema matching is based on the dataset filename pattern.
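Pattern-based schema matching can be sketched as below; the schema table and the `split_resolved` helper are hypothetical, only the resolved/unresolved split by filename pattern comes from the text:

```python
import re

# Hypothetical schemas: each one declares a filename pattern for its datasets.
schemas = {"orders": r"orders_\d{8}\.csv", "clients": r"clients.*\.psv"}

def split_resolved(filenames):
    """Split files into (resolved, unresolved) lists by schema filename pattern."""
    resolved, unresolved = [], []
    for f in filenames:
        match = next((name for name, pat in schemas.items()
                      if re.fullmatch(pat, f)), None)
        (resolved if match else unresolved).append(f)
    return resolved, unresolved

# A file whose name matches no schema pattern ends up unresolved.
print(split_resolved(["orders_20210101.csv", "unknown.txt"]))
# → (['orders_20210101.csv'], ['unknown.txt'])
```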
: includes: Load pending datasets of these domains only. excludes: Do not load datasets of these domains. If both lists are empty, all domains are included.
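The includes/excludes rule can be sketched as a small predicate; the "both empty means all domains" behaviour is from the text, while giving `includes` precedence when both lists are non-empty is an assumption:

```python
def domain_selected(domain: str, includes: list, excludes: list) -> bool:
    """Decide whether a domain's pending datasets should be loaded."""
    if includes:                      # assumed precedence when both are set
        return domain in includes
    if excludes:
        return domain not in excludes
    return True                       # both lists empty => all domains included

assert domain_selected("sales", [], [])           # both empty: included
assert not domain_selected("hr", ["sales"], [])   # not in includes
assert not domain_selected("hr", [], ["hr"])      # explicitly excluded
```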
Runs the metrics job
: Client's configuration for metrics computation
Set nullable property of column.
source DataFrame
is the flag to set, indicating whether the column should be nullable or not
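In Spark this amounts to rebuilding the DataFrame's schema with each field's nullable attribute updated. This minimal Python sketch over a plain schema list is illustrative only; the `Field` type and `set_nullable` helper are assumptions, not the actual API:

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Field:
    """Simplified stand-in for a schema field (name, type, nullable flag)."""
    name: str
    dtype: str
    nullable: bool = True

def set_nullable(schema, columns, nullable):
    """Return a new schema where the listed columns carry the given nullable flag."""
    return [replace(f, nullable=nullable) if f.name in columns else f
            for f in schema]

schema = [Field("id", "long"), Field("name", "string")]
updated = set_nullable(schema, {"id"}, nullable=False)
print(updated)
```

The source schema is left untouched; a new schema is returned, mirroring the immutable-DataFrame style of Spark transformations.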
The whole workflow works as follows: