: Airflow REST API endpoint, e.g. http://127.0.0.1:8080/api/experimental
Datasets in the data pipeline go through several stages and are stored on disk at each of these stages. The settings below allow customizing the folder name of each stage, as illustrated in the sketch further down.
: Name of the pending area
: Name of the unresolved area
: Name of the archive area
: Name of the ingesting area
: Name of the accepted area
: Name of the rejected area
: Name of the business area
: Absolute path, datasets root folder beneath which each area is defined.
: Absolute path, location where all types / domains and auto jobs are defined
: Absolute path, location where all computed metrics are stored
: Absolute path, location where all logs are stored
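To make the layout concrete, here is a minimal sketch of how the area and path settings above could be grouped in a HOCON application.conf. The key names, folder names and paths shown are illustrative assumptions, not the authoritative setting names.

```hocon
# Illustrative sketch only: key names and values are assumptions,
# not the actual setting names shipped with the project.
area {
  pending    = "pending"      # folder name of the pending area
  unresolved = "unresolved"   # folder name of the unresolved area
  archive    = "archive"      # folder name of the archive area
  ingesting  = "ingesting"    # folder name of the ingesting area
  accepted   = "accepted"     # folder name of the accepted area
  rejected   = "rejected"     # folder name of the rejected area
  business   = "business"     # folder name of the business area
}
datasets = "/data/comet/datasets"   # root folder beneath which each area is created
metadata = "/data/comet/metadata"   # types / domains / auto jobs definitions
metrics  = "/data/comet/metrics"    # computed metrics
audit    = "/data/comet/audit"      # logs
```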
: Should we back up the ingested datasets? true by default
: Choose between parquet, orc ... Default is parquet
: Writing format for rejected datasets, choose between parquet, orc ... Default is parquet
: Writing format for audit datasets, choose between parquet, orc ... Default is parquet
: Cron job manager: simple (useful for testing) or airflow? simple by default
: Should we create basic Hive statistics on the generated dataset? true by default
: Should we create a Hive table? true by default
: see Area above
: Airflow endpoint. Must be defined even if the simple launcher is used instead of Airflow.
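A similarly hedged sketch of the ingestion-behaviour settings described above; once more, the key names and values are assumptions chosen for readability rather than the exact configuration keys.

```hocon
# Illustrative sketch only: key names are assumptions.
archive  = true        # back up the ingested datasets (true by default)
launcher = "simple"    # cron job manager: "simple" (testing) or "airflow"
analyze  = true        # compute basic Hive statistics on the generated dataset
hive     = true        # create a Hive table

writeFormat         = "parquet"   # parquet, orc, ...
rejectedWriteFormat = "parquet"   # writing format for rejected datasets
auditWriteFormat    = "parquet"   # writing format for audit datasets

airflow {
  # required even when the simple launcher is used
  endpoint = "http://127.0.0.1:8080/api/experimental"
}
```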
Describes a connection to a JDBC-accessible database engine
source / sink format (jdbc by default). Cf. spark.format for possible values
Spark SaveMode to use. If not present, the save mode will be computed from the write disposition set in the YAML file
any option required by the format used to ingest / transform / compute the data. E.g. for JDBC, uri, user and password are required:
uri : the URI of the database engine. It must start with "jdbc:"
user : the username under which to connect to the database engine
password : the password to use in order to connect to the database engine
the index into the Comet.jdbcEngines map of the underlying database engine, in case one cannot use the engine name from the uri
the use case for engineOverride is when you need an alternate schema definition (e.g. non-standard table names) alongside the regular schema definition, on the same underlying engine.
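The description above maps naturally to a named connection entry in the configuration. The sketch below assumes connections are declared as a map keyed by connection name; the nesting, connection name and key names are illustrative.

```hocon
# Illustrative sketch only: the "connections" map layout and key names are assumed.
connections {
  "my-postgres" {                     # hypothetical connection name
    format = "jdbc"                   # source / sink format, jdbc by default
    mode   = "Append"                 # optional Spark SaveMode; otherwise derived from the YAML write disposition
    options {
      uri      = "jdbc:postgresql://127.0.0.1:5432/mydb"  # must start with "jdbc:"
      user     = "dbuser"
      password = "dbpassword"
    }
    engineOverride = "postgresql"     # optional index into the jdbcEngines map
  }
}
```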
Describes how to use a specific type of JDBC-accessible database engine
for each of the Standard Table Names used by Comet, the specific SQL DDL statements as expected in the engine's own dialect.
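As a sketch of what such an engine description could look like: one DDL statement per standard table name, written in the engine's dialect. The engine key, table name, column layout and DDL key below are assumptions for illustration only.

```hocon
# Illustrative sketch only: engine key, table names and DDL are assumed.
jdbcEngines {
  postgresql {
    tables {
      audit {
        createSql = """
          CREATE TABLE IF NOT EXISTS audit (
            jobid     VARCHAR(255) NOT NULL,
            success   BOOLEAN      NOT NULL,
            timestamp TIMESTAMP    NOT NULL
          )
        """
      }
    }
  }
}
```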
: Max number of unique values allowed in the cardinality computation
: Map of privacy algorithm names -> PrivacyEngine
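Finally, a hedged sketch of how the cardinality cap and the privacy algorithm map could be expressed; the key names and the engine class names are placeholders, not the real ones.

```hocon
# Illustrative sketch only: key names and class names are placeholders.
metrics {
  discreteMaxCardinality = 10   # max number of unique values allowed in the cardinality computation
}
privacy {
  options {
    "md5"  = "com.example.privacy.Md5"   # algorithm name -> PrivacyEngine implementation
    "none" = "com.example.privacy.No"
  }
}
```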