A strategy responsible for naming the files that eel creates when writing out data.
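As an illustration only, a file-naming strategy can be modelled as an object that hands out the next file path for a given output directory. The sketch below is in Python; `FileNamer`, `SequentialFileNamer` and the `part_<n>.parquet` pattern are hypothetical and are not the names or format eel actually uses.

```python
from __future__ import annotations

from typing import Protocol


class FileNamer(Protocol):
    """Hypothetical interface: returns the path of the next file to write."""

    def next_name(self, directory: str) -> str: ...


class SequentialFileNamer:
    """Names files <directory>/<prefix>_<n>.<ext>, counting separately per directory."""

    def __init__(self, prefix: str = "part", ext: str = "parquet") -> None:
        self.prefix = prefix
        self.ext = ext
        self.counters: dict[str, int] = {}

    def next_name(self, directory: str) -> str:
        n = self.counters.get(directory, 0)
        self.counters[directory] = n + 1
        return f"{directory}/{self.prefix}_{n}.{self.ext}"


namer: FileNamer = SequentialFileNamer()
print(namer.next_name("out/year=2020"))  # out/year=2020/part_0.parquet
```

Keeping the counter per directory means each partition directory gets its own `part_0`, `part_1`, ... sequence.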
A Hive Part that can read values from the metastore, rather than reading values from files. This can be used only when the requested fields are all partition keys.
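A minimal sketch of answering a read from metastore partition values alone, assuming each partition is available as a mapping from partition key to value. The function name is hypothetical, and emitting exactly one row per partition is an assumption of this sketch rather than documented behaviour.

```python
from __future__ import annotations


def rows_from_metastore(
    partitions: list[dict[str, str]],
    partition_keys: list[str],
    requested: list[str],
) -> list[tuple[str, ...]]:
    """Answer a read from partition values stored in the metastore.

    No data files are opened, so this is only valid when every requested
    field is a partition key. Emits one row per partition.
    """
    not_keys = [f for f in requested if f not in partition_keys]
    if not_keys:
        raise ValueError(f"{not_keys} are not partition keys; data files must be read")
    return [tuple(p[f] for f in requested) for p in partitions]
```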
Sets which fields are required by the caller.
An optional predicate that filters rows at the read level.
Optional constraints on the partition data, used to narrow which partitions are read.
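The three read options can be pictured with a hypothetical Python sketch over in-memory data (`ReadOptions` and `read` are not eel's API). It applies the options in the order this sketch assumes: partition constraints prune whole partitions first, the predicate then filters individual rows, and the required-field list finally projects each surviving row.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class ReadOptions:
    # Hypothetical names for the three documented options.
    fields: Optional[list[str]] = None                  # required fields; None means all
    predicate: Optional[Callable[[dict], bool]] = None  # row-level filter
    partition_constraints: list[Callable[[dict], bool]] = field(default_factory=list)


def read(partitions: list[tuple[dict, list[dict]]], opts: ReadOptions) -> list[dict]:
    """Read (partition values, rows) pairs, applying the options in turn."""
    out = []
    for part_values, rows in partitions:
        # 1. Partition constraints skip whole partitions before any rows are touched.
        if not all(c(part_values) for c in opts.partition_constraints):
            continue
        for row in rows:
            full = {**row, **part_values}  # partition values appear as ordinary columns
            # 2. The predicate filters individual rows.
            if opts.predicate is not None and not opts.predicate(full):
                continue
            # 3. Only the required fields are kept.
            out.append(full if opts.fields is None else {f: full[f] for f in opts.fields})
    return out
```

Pruning by partition first is the cheap step: a rejected partition costs nothing to skip, whereas the predicate must look at every row it is given.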
A handler that is invoked with the schema of the source and the existing schema in the metastore.
This allows a handler to decide how to handle differences. For instance, an implementation may choose to evolve the metastore schema to add missing fields, while another implementation may throw an exception if the schemas are not aligned.
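One way to model the handler contract, assuming a schema is an ordered mapping from field name to type name. `SchemaDiff` and `diff_schemas` are hypothetical helpers, not part of eel; they compute the three kinds of difference a handler might react to, and `MetastoreSchemaHandler` here is only a Python stand-in for the interface described above.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class SchemaDiff:
    missing_in_metastore: list[str]  # in the source but not in the metastore
    missing_in_source: list[str]     # in the metastore but not in the source
    type_mismatches: list[str]       # in both, with different types


def diff_schemas(source: dict[str, str], metastore: dict[str, str]) -> SchemaDiff:
    return SchemaDiff(
        [f for f in source if f not in metastore],
        [f for f in metastore if f not in source],
        [f for f in source if f in metastore and source[f] != metastore[f]],
    )


class MetastoreSchemaHandler(ABC):
    @abstractmethod
    def handle(self, source: dict[str, str], metastore: dict[str, str]) -> dict[str, str]:
        """Reconcile the two schemas and return the schema the metastore should hold."""
```

An evolving handler would act on `missing_in_metastore`; a strict one would raise if any of the three lists is non-empty.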
Accepts a metastore schema and returns the schema that should actually be persisted to disk. This determines which data, if any, is left out of the files; for example, in parquet files it is common to skip writing out partition data, since that data is already present in the metastore.
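A sketch of the parquet case, assuming partition columns are identified by name: the persisted schema is the metastore schema with the partition columns removed, keeping the remaining fields in their original order. The function name is hypothetical.

```python
from __future__ import annotations


def persisted_schema(metastore_schema: dict[str, str], partition_keys: set[str]) -> dict[str, str]:
    """Schema written into each data file.

    Partition columns are left out: their values already live in the metastore
    (and, for Hive-style layouts, in the directory names), so storing them in
    every file would be redundant.
    """
    return {name: t for name, t in metastore_schema.items() if name not in partition_keys}
```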
An implementation of MetastoreSchemaHandler that will evolve the metastore schema, where possible, to match the incoming data.
It does this by adding missing fields to the end of the current schema. The new fields cannot be added as partition fields, because the table will already have been created.
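A minimal sketch of the evolution rule, assuming schemas are ordered mappings from field name to type. The function name is hypothetical, and treating a changed type as impossible to evolve (and raising) is an assumption standing in for the "where possible" qualifier.

```python
from __future__ import annotations


def evolve_schema(source: dict[str, str], metastore: dict[str, str]) -> dict[str, str]:
    """Return the metastore schema extended with any source fields it lacks.

    New fields are appended after the existing ones and are always regular
    (non-partition) columns; existing fields keep their positions.
    """
    for name, t in source.items():
        if name in metastore and metastore[name] != t:
            # Assumption: a changed type cannot be evolved, so give up.
            raise ValueError(f"cannot evolve {name}: {metastore[name]} -> {t}")
    evolved = dict(metastore)  # copy; the caller's schema is not mutated
    for name, t in source.items():
        if name not in evolved:
            evolved[name] = t
    return evolved
```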
Locates files for a given table.
Connects to the Hive metastore to retrieve the list of partitions (or just the root, if the table has no partitions) and scans those directories.
Returns a Map of each partition to the files in that partition.
If partition constraints are specified, partitions that do not match them are filtered out.
If there are no partitions, the Map will contain a single key, Partition.empty, which acts as the root.
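A hypothetical sketch of the locator logic against a local filesystem. It assumes the partition list has already been fetched from the metastore, that each partition is stored under Hive-style `key=value` directories beneath the table root, and that a partition is a tuple of `(key, value)` pairs with the empty tuple `()` standing in for `Partition.empty`. Skipping names that start with `_` or `.` (such as `_SUCCESS` markers) is also an assumption.

```python
from __future__ import annotations

import os
from typing import Callable, Optional


def locate_files(
    root: str,
    partitions: list[tuple[tuple[str, str], ...]],
    constraint: Optional[Callable[[dict[str, str]], bool]] = None,
) -> dict[tuple[tuple[str, str], ...], list[str]]:
    """Map each partition to the data files in its directory."""
    if not partitions:
        partitions = [()]  # unpartitioned table: scan just the root
    result = {}
    for part in partitions:
        # Drop partitions whose values do not satisfy the constraint.
        if part and constraint is not None and not constraint(dict(part)):
            continue
        directory = os.path.join(root, *(f"{k}={v}" for k, v in part))
        result[part] = sorted(
            os.path.join(directory, name)
            for name in os.listdir(directory)
            if not name.startswith(("_", "."))  # skip markers such as _SUCCESS
            and os.path.isfile(os.path.join(directory, name))
        )
    return result
```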
An implementation of MetastoreSchemaHandler that does nothing. This may result in errors downstream if, for example, the input schema does not include all columns and defaults cannot be applied.
An implementation of MetastoreSchemaHandler that requires the input schema to be compatible with the metastore schema. Compatibility is achieved when all fields in the input schema are already defined in the metastore with compatible types.
With this handler, the input schema is allowed to have extra fields which are not present in the metastore. It is assumed they will be dropped by the alignment strategy.
If the schemas are not compatible then an exception is raised.
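A sketch of the compatibility check, assuming schemas are mappings from field name to type name. Which type changes count as compatible is an assumption here (equal types, plus the hypothetical widenings listed in `WIDENINGS`); extra input fields are tolerated, on the expectation that the alignment strategy drops them.

```python
from __future__ import annotations

# Assumption: type changes treated as compatible, in addition to equal types.
WIDENINGS = {("int", "bigint"), ("float", "double")}


def types_compatible(src: str, dst: str) -> bool:
    return src == dst or (src, dst) in WIDENINGS


def check_compatible(source: dict[str, str], metastore: dict[str, str]) -> dict[str, str]:
    """Raise if any shared field has an incompatible type; otherwise keep the metastore schema."""
    bad = [
        f"{name}: {t} -> {metastore[name]}"
        for name, t in source.items()
        if name in metastore and not types_compatible(t, metastore[name])
    ]
    if bad:
        raise ValueError("incompatible schemas: " + ", ".join(bad))
    # Extra source fields are tolerated; the alignment strategy is expected to drop them.
    return metastore
```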
An AlignmentStrategy that will use default values, or nulls, to pad out rows to match the target schema, dropping any fields that exist in the input schema but not in the output schema.
This strategy will drop partition columns from the schema so that they are not written out to the files.
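The padding-and-dropping behaviour can be sketched in a few lines, assuming rows are mappings from field name to value and the target schema is an ordered list of field names. `align` and its `defaults` parameter are hypothetical.

```python
from __future__ import annotations

from typing import Any, Optional


def align(row: dict[str, Any], target: list[str], defaults: Optional[dict[str, Any]] = None) -> list[Any]:
    """Reshape a row to the target schema's field order.

    Target fields missing from the row take their default, or None when no
    default is known; input fields absent from the target are dropped.
    """
    defaults = defaults or {}
    return [row[name] if name in row else defaults.get(name) for name in target]
```

Because the output is built by iterating over the target, dropping extra input fields needs no separate step.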
An implementation of MetastoreSchemaHandler that requires the input schema to be equal to the metastore schema. Equality is defined as having the same field names with the same types (order is irrelevant).
Any missing fields, or any additional fields not present in the metastore, will cause an exception to be raised.
If the schemas are not equal then an exception is raised.
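A sketch of the equality check using set operations on field names, assuming schemas are mappings from field name to type name. Because fields are compared by name, their order plays no part, matching the definition above. The function name is hypothetical.

```python
from __future__ import annotations


def check_equal(source: dict[str, str], metastore: dict[str, str]) -> dict[str, str]:
    """Raise unless both schemas have the same field names with the same types."""
    missing = sorted(metastore.keys() - source.keys())  # in metastore only
    extra = sorted(source.keys() - metastore.keys())    # in source only
    changed = sorted(n for n in source.keys() & metastore.keys() if source[n] != metastore[n])
    if missing or extra or changed:
        raise ValueError(f"schemas differ: missing={missing} extra={extra} changed={changed}")
    return metastore
```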
An alignment strategy will accept an input Row and return an output Row that is compatible with the target schema. This allows writing to sinks where the output schema is not the same as the input schema.
For example, the input may come from a JDBC table, while the output Hive table defines only a subset of the columns. Each row would need to be aligned so that it matches the subset schema.
Implementations are free to add values, drop values or throw an exception if they wish.
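As a hypothetical example of an implementation that throws rather than pads, the sketch below projects a JDBC-style row onto a Hive table's subset of columns and raises if a target column is missing. Modelling a strategy as a plain function `(row, target_fields) -> values` is an assumption made for brevity; `AlignmentStrategy` here is a Python type alias, not eel's type.

```python
from __future__ import annotations

from typing import Any, Callable, Sequence

# Hypothetical shape of an alignment strategy: (input row, target field names) -> output values.
AlignmentStrategy = Callable[[dict, Sequence[str]], list]


def strict_align(row: dict[str, Any], target: Sequence[str]) -> list[Any]:
    missing = [f for f in target if f not in row]
    if missing:
        # This strategy refuses to invent values for absent columns.
        raise ValueError(f"row is missing target fields {missing}")
    # Columns not in the target (e.g. extra JDBC columns) are dropped.
    return [row[f] for f in target]


strategy: AlignmentStrategy = strict_align
jdbc_row = {"id": 7, "name": "ann", "email": "ann@example.com"}
print(strategy(jdbc_row, ["id", "name"]))  # [7, 'ann']
```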