package io.eels.component.hive

Type Members

  1. trait AlignmentStrategy extends AnyRef

    An alignment strategy accepts an input Row and returns an output Row that is compatible with the target schema. This allows writing to sinks where the output schema is not the same as the input schema.

    For example, the input may come from a JDBC table while the output Hive table defines only a subset of the columns. Each row would then need to be aligned so that it matches the subset schema.

    Implementations are free to add values, drop values or throw an exception if they wish.

  2. trait CommitCallback extends AnyRef

  3. case class Compactor(dbname: String, tablename: String)(implicit fs: FileSystem, conf: Configuration, client: IMetaStoreClient) extends Logging with Product with Serializable

  4. trait FileListener extends AnyRef

  5. trait FilenameStrategy extends AnyRef

    Strategy responsible for the filenames created by eel when writing out data.

  6. class HiveContext extends AnyRef

  7. case class HiveDatabase(dbName: String)(implicit fs: FileSystem, client: IMetaStoreClient) extends Product with Serializable

  8. case class HiveDatasetUri(db: String, table: String) extends Product with Serializable

  9. trait HiveDialect extends Logging

  10. class HiveFilePublisher extends Publisher[Seq[Row]] with Using

  11. class HiveOps extends Logging

  12. trait HiveOutputStream extends AnyRef

  13. class HivePartitionExtractor extends AnyRef

  14. class HivePartitionPublisher extends Publisher[Seq[Row]] with Logging

    A Hive Part that can read values from the metastore, rather than reading values from files. This can be used only when the requested fields are all partition keys.

  15. class HivePartitionScanner extends Logging

  16. case class HiveSink(dbName: String, tableName: String, permission: Option[FsPermission] = None, inheritPermissions: Option[Boolean] = None, principal: Option[String] = None, partitionFields: Seq[String] = Nil, partitionStrategy: PartitionStrategy = new DynamicPartitionStrategy, filenameStrategy: FilenameStrategy = DefaultFilenameStrategy, stagingStrategy: StagingStrategy = DefaultStagingStrategy, metastoreSchemaHandler: MetastoreSchemaHandler = ..., alignStrategy: AlignmentStrategy = RowPaddingAlignmentStrategy, outputSchemaStrategy: OutputSchemaStrategy = SkipPartitionsOutputSchemaStrategy, keytabPath: Option[Path] = None, fileListener: FileListener = FileListener.noop, createTable: Boolean = false, dialect: Option[HiveDialect] = None, callbacks: Seq[CommitCallback] = Nil, roundingMode: RoundingMode = RoundingMode.UNNECESSARY, metadata: Map[String, String] = Map.empty)(implicit fs: FileSystem, client: IMetaStoreClient) extends Sink with Logging with Product with Serializable

  17. class HiveSinkWriter extends SinkWriter with Logging

  18. case class HiveSource(dbName: String, tableName: String, projection: List[String] = Nil, predicate: Option[Predicate] = None, partitionConstraints: Seq[PartitionConstraint] = Nil, principal: Option[String] = None, keytabPath: Option[Path] = None)(implicit fs: FileSystem, client: IMetaStoreClient) extends Source with Logging with Using with Product with Serializable

    projection

    sets which fields are required by the caller.

    predicate

    optional predicate which will filter rows at the read level

    partitionConstraints

    optional constraints on the partition data to narrow which partitions are read

  19. trait HiveStats extends AnyRef

  20. case class HiveTable(dbName: String, tableName: String)(implicit fs: FileSystem, conf: Configuration, client: IMetaStoreClient) extends Logging with Product with Serializable

  21. trait MetastoreSchemaHandler extends AnyRef

    A handler that is invoked with the schema of the source and the existing schema in the metastore.

    This allows a handler to decide how to handle differences. For instance, an implementation may choose to evolve the metastore schema to add missing fields. Another implementation may throw an exception if the schemas are not aligned.

  22. trait OutputSchemaStrategy extends AnyRef

    Accepts a metastore schema and returns the schema that should actually be persisted to disk. This allows some data to be left out of the written files; for example, in Parquet files it is common to skip writing out partition data, since that data is present in the metastore.

  23. class ParquetHiveStats extends HiveStats with Logging

  24. case class PartitionColumn(name: String, dataType: DataType = StringType) extends Product with Serializable

  25. trait RowAligner extends AnyRef

  26. trait StagingStrategy extends AnyRef

  27. trait StagingStrategy2 extends AnyRef

  28. case class TableSpec(tableName: String, tableType: TableType, location: String, cols: Seq[FieldSchema], numBuckets: Int, bucketNames: List[String], params: Map[String, String], inputFormat: String, outputFormat: String, serde: String, retention: Int, createTime: Long, lastAccessTime: Long, owner: String) extends Product with Serializable

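The AlignmentStrategy contract described under Type Members can be illustrated with a minimal sketch. It assumes hypothetical, simplified `Field`, `Schema` and `Row` types and a hypothetical `align` method; the real io.eels types and the trait's actual signature are not shown on this page and may differ. This implementation takes one of the options the documentation permits: it drops input fields that the target schema does not define, and throws when a target field is missing.

```scala
// Hypothetical, simplified stand-ins for eel's schema and row types.
final case class Field(name: String, dataType: String)

final case class Schema(fields: Seq[Field]) {
  def fieldNames: Seq[String] = fields.map(_.name)
}

final case class Row(schema: Schema, values: Seq[Any]) {
  // Looks up a value by field name, if this row's schema defines that field.
  def get(name: String): Option[Any] = {
    val i = schema.fieldNames.indexOf(name)
    if (i >= 0) Some(values(i)) else None
  }
}

// Hypothetical contract: turn an input row into one matching the target schema.
trait SimpleAlignmentStrategy {
  def align(row: Row, target: Schema): Row
}

// Builds the output in target-schema order. Input fields the target does not
// define are never looked up, so they are dropped; a missing target field throws.
object FailOnMissingAlignment extends SimpleAlignmentStrategy {
  def align(row: Row, target: Schema): Row = {
    val values = target.fields.map { f =>
      row.get(f.name).getOrElse {
        throw new IllegalArgumentException(s"input row has no value for '${f.name}'")
      }
    }
    Row(target, values)
  }
}
```

A padding implementation in the style of RowPaddingAlignmentStrategy would fill the missing field with a default or null instead of throwing.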

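OutputSchemaStrategy maps the metastore schema to the schema that is actually written to disk. The sketch below, in the spirit of SkipPartitionsOutputSchemaStrategy (listed under Value Members), removes partition columns from the file schema because their values are held by the metastore. The types and the `resolve(metastoreSchema, partitionKeys)` method are hypothetical; the case-insensitive comparison is an assumption based on Hive treating column names case-insensitively.

```scala
// Hypothetical, simplified stand-ins for eel's schema types.
final case class Field(name: String, dataType: String)
final case class Schema(fields: Seq[Field])

// Hypothetical contract: metastore schema in, on-disk file schema out.
trait SimpleOutputSchemaStrategy {
  def resolve(metastoreSchema: Schema, partitionKeys: Seq[String]): Schema
}

// Drops every column that is also a partition key, keeping the remaining order.
object SkipPartitionColumns extends SimpleOutputSchemaStrategy {
  def resolve(metastoreSchema: Schema, partitionKeys: Seq[String]): Schema = {
    // Compare case-insensitively (assumption: Hive-style column names).
    val keys = partitionKeys.map(_.toLowerCase).toSet
    Schema(metastoreSchema.fields.filterNot(f => keys.contains(f.name.toLowerCase)))
  }
}
```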
Value Members

  1. object DefaultFilenameStrategy extends FilenameStrategy

  2. object DefaultStagingStrategy extends StagingStrategy

  3. object EvolutionMetastoreSchemaHandler extends MetastoreSchemaHandler with Logging

    An implementation of MetastoreSchemaHandler that will evolve the metastore schema where possible to match the incoming data.

    It does this by adding missing fields to the end of the current schema. The new fields cannot be added as partition fields, as the table will already have been created.

  4. object FileListener

  5. object HiveDDL

  6. object HiveDatasetUri extends Serializable

  7. object HiveDialect extends Logging

  8. object HiveFileScanner extends Logging

  9. object HiveSchemaFns extends Logging

  10. object HiveSink extends Serializable

  11. object HiveTableFilesFn extends Logging

    Locates files for a given table.

    Connects to the Hive metastore to get the list of partitions (or just the table root if there are no partitions) and scans those directories.

    Returns a Map of each partition to the files in that partition.

    If partition constraints are specified, partitions that do not satisfy them are filtered out.

    If there are no partitions, the Map will contain a single key, Partition.empty, which acts as the root.

  12. object NoopMetastoreSchemaHandler extends MetastoreSchemaHandler

    An implementation of MetastoreSchemaHandler that does nothing. This may result in errors downstream if, for example, the input schema does not include all columns and defaults cannot be applied.

  13. object RequireCompatibilityMetastoreSchemaHandler extends MetastoreSchemaHandler

    An implementation of MetastoreSchemaHandler that requires the input schema to be compatible with the metastore schema. Compatibility is achieved when all fields in the input schema are already defined in the metastore, with compatible types.

    With this handler, the input schema is allowed to have extra fields which are not present in the metastore. It is assumed they will be dropped by the alignment strategy.

    If the schemas are not compatible then an exception is raised.

  14. object RowPaddingAlignmentStrategy extends AlignmentStrategy

    An AlignmentStrategy that uses default values, or nulls, to pad out rows to match the target schema, dropping any fields that exist in the input schema but not in the output schema.

  15. object SkipPartitionsOutputSchemaStrategy extends OutputSchemaStrategy

    This strategy drops partition columns from the schema so that they are not written out to the files.

  16. object StrictMetastoreSchemaHandler extends MetastoreSchemaHandler

    An implementation of MetastoreSchemaHandler that requires the input schema to be equal to the metastore schema. Equality is defined as having the same field names with the same types (order is irrelevant).

    If the schemas are not equal, for example because fields are missing or additional fields are present, an exception is raised.

  17. package dialect

  18. package partition

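The behaviour described for RowPaddingAlignmentStrategy (pad with defaults or nulls, drop fields the target lacks) can be sketched as follows. The `Field`, `Schema` and `Row` types, the optional `default` on a field, and the `PaddingAlignment.align` function are hypothetical simplifications, not eel's actual API.

```scala
// Hypothetical, simplified types; a field may carry an optional default value.
final case class Field(name: String, dataType: String, default: Option[Any] = None)
final case class Schema(fields: Seq[Field])
final case class Row(schema: Schema, values: Seq[Any])

object PaddingAlignment {
  def align(row: Row, target: Schema): Row = {
    // Index the input values by field name.
    val byName: Map[String, Any] = row.schema.fields.map(_.name).zip(row.values).toMap
    // Walk the target schema: use the input value if present, otherwise the
    // field's default, otherwise null. Input-only fields are never visited.
    val values = target.fields.map { f =>
      byName.getOrElse(f.name, f.default.getOrElse(null))
    }
    Row(target, values)
  }
}
```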

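Two of the MetastoreSchemaHandler behaviours above can be contrasted in a short sketch: EvolutionMetastoreSchemaHandler appends input fields missing from the metastore to the end of the schema, while StrictMetastoreSchemaHandler requires the same field names with the same types, ignoring order. The `evolve` and `requireEqual` functions and the simplified types are hypothetical; the real handlers also deal with the metastore client and partition keys, which this sketch leaves out.

```scala
// Hypothetical, simplified schema types (field names assumed unique).
final case class Field(name: String, dataType: String)
final case class Schema(fields: Seq[Field])

object SchemaHandlers {
  // Evolution: keep existing columns in place and append any input
  // fields the metastore schema does not yet define.
  def evolve(metastore: Schema, input: Schema): Schema = {
    val existing = metastore.fields.map(_.name).toSet
    Schema(metastore.fields ++ input.fields.filterNot(f => existing.contains(f.name)))
  }

  // Strict: compare name -> type maps so that field order is irrelevant;
  // throw on any missing field, extra field or type mismatch.
  def requireEqual(metastore: Schema, input: Schema): Unit = {
    val m = metastore.fields.map(f => f.name -> f.dataType).toMap
    val i = input.fields.map(f => f.name -> f.dataType).toMap
    if (m != i) {
      val missing = m.keySet -- i.keySet
      val extra = i.keySet -- m.keySet
      val mismatched = (m.keySet intersect i.keySet).filter(k => m(k) != i(k))
      throw new IllegalStateException(
        s"schemas differ: missing=$missing, extra=$extra, typeMismatch=$mismatched")
    }
  }
}
```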
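The file-location logic described for HiveTableFilesFn can be sketched without a cluster by replacing the metastore and the filesystem scan with in-memory inputs. `Partition`, `PartitionConstraint`, `TableFiles.filesByPartition` and the `listing` map (directory path to file names, standing in for a Hadoop FileSystem scan) are all hypothetical. The sketch keeps only partitions that satisfy every constraint and falls back to a single `Partition.empty` root key for unpartitioned tables.

```scala
// Hypothetical partition: ordered key=value pairs; empty means the table root.
final case class Partition(entries: Seq[(String, String)]) {
  def path(tableRoot: String): String =
    if (entries.isEmpty) tableRoot
    else tableRoot + "/" + entries.map { case (k, v) => s"$k=$v" }.mkString("/")
}
object Partition { val empty: Partition = Partition(Nil) }

// Hypothetical constraint: the partition's value for `key` must pass `test`.
final case class PartitionConstraint(key: String, test: String => Boolean) {
  def eval(p: Partition): Boolean = p.entries.toMap.get(key).exists(test)
}

object TableFiles {
  // listing: directory path -> file names (stands in for scanning the filesystem).
  def filesByPartition(
      tableRoot: String,
      metastorePartitions: Seq[Partition],
      constraints: Seq[PartitionConstraint],
      listing: Map[String, Seq[String]]): Map[Partition, Seq[String]] = {
    val partitions =
      if (metastorePartitions.isEmpty) Seq(Partition.empty) // unpartitioned: root only
      else metastorePartitions.filter(p => constraints.forall(_.eval(p)))
    partitions.map(p => p -> listing.getOrElse(p.path(tableRoot), Nil)).toMap
  }
}
```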