Class AbstractOrcFileInputFormat<T, BatchT, SplitT extends org.apache.flink.connector.file.src.FileSourceSplit>

  • Type Parameters:
    T - The type of records produced by this reader format.
    All Implemented Interfaces:
    Serializable, org.apache.flink.api.java.typeutils.ResultTypeQueryable<T>, org.apache.flink.connector.file.src.reader.BulkFormat<T, SplitT>
    Direct Known Subclasses:
    OrcColumnarRowInputFormat

    public abstract class AbstractOrcFileInputFormat<T, BatchT, SplitT extends org.apache.flink.connector.file.src.FileSourceSplit>
    extends Object
    implements org.apache.flink.connector.file.src.reader.BulkFormat<T, SplitT>
    The base class for ORC readers used with the FileSource. It implements reader initialization, vectorized reading, and pooling of column vector objects.

    Subclasses implement the conversion to the specific result record(s) that they return by extending AbstractOrcFileInputFormat.OrcReaderBatch.
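
The division of labor described above can be illustrated with stand-in types. This is only a minimal sketch and not Flink's implementation: `BatchFormatSketch`, `ReaderBatch`, and `LengthFormatSketch` are hypothetical names, and a plain `String[]` stands in for an ORC vectorized row batch. The base class owns the read loop and reuses a single batch object, while the subclass supplies only the conversion to result records.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the base format: owns the read loop and reuses one batch. */
abstract class BatchFormatSketch<T> {

    /** Hypothetical stand-in for OrcReaderBatch: wraps reusable storage and converts it. */
    abstract static class ReaderBatch<T> {
        final String[] storage; // stands in for a vectorized row batch

        ReaderBatch(String[] storage) {
            this.storage = storage;
        }

        /** Converts the first rowCount entries of the storage into result records. */
        abstract List<T> convert(int rowCount);
    }

    private final int batchSize;

    BatchFormatSketch(int batchSize) {
        this.batchSize = batchSize;
    }

    /** Subclasses choose the batch type, and with it the record conversion. */
    abstract ReaderBatch<T> createBatch(String[] raw);

    /** Reads the input in chunks of batchSize, reusing a single batch object. */
    final List<T> readAll(List<String> input) {
        ReaderBatch<T> batch = createBatch(new String[batchSize]); // allocated once
        List<T> out = new ArrayList<>();
        for (int start = 0; start < input.size(); start += batchSize) {
            int rows = Math.min(batchSize, input.size() - start);
            for (int i = 0; i < rows; i++) {
                batch.storage[i] = input.get(start + i);
            }
            out.addAll(batch.convert(rows));
        }
        return out;
    }
}

/** Example subclass: converts each raw value to its length. */
class LengthFormatSketch extends BatchFormatSketch<Integer> {

    LengthFormatSketch(int batchSize) {
        super(batchSize);
    }

    @Override
    ReaderBatch<Integer> createBatch(String[] raw) {
        return new ReaderBatch<Integer>(raw) {
            @Override
            List<Integer> convert(int rowCount) {
                List<Integer> records = new ArrayList<>(rowCount);
                for (int i = 0; i < rowCount; i++) {
                    records.add(storage[i].length());
                }
                return records;
            }
        };
    }
}
```

With a batch size of 2, `new LengthFormatSketch(2).readAll(Arrays.asList("a", "bb", "ccc"))` processes two batches and returns `[1, 2, 3]`.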

    See Also:
    Serialized Form
    • Field Detail

      • schema

        protected final org.apache.orc.TypeDescription schema
      • selectedFields

        protected final int[] selectedFields
      • batchSize

        protected final int batchSize
    • Constructor Detail

      • AbstractOrcFileInputFormat

        protected AbstractOrcFileInputFormat(OrcShim<BatchT> shim,
                                             org.apache.hadoop.conf.Configuration hadoopConfig,
                                             org.apache.orc.TypeDescription schema,
                                             int[] selectedFields,
                                             List<OrcFilters.Predicate> conjunctPredicates,
                                             int batchSize)
        Parameters:
        shim - the shim that adapts to the ORC version in use. If you use the latest version, use OrcShim.defaultShim() directly.
        hadoopConfig - the Hadoop configuration for the ORC reader.
        schema - the full schema of the ORC format.
        selectedFields - the indices of the fields in the ORC schema to read.
        conjunctPredicates - the filter predicates that can be evaluated.
        batchSize - the batch size of the ORC reader.
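
The relationship between `schema` and `selectedFields` can be sketched as follows, assuming that `selectedFields` holds zero-based positions of top-level fields in the full schema. The class `SelectedFieldsSketch`, the method `project`, and the field names in the usage note are hypothetical; a list of field names stands in for `org.apache.orc.TypeDescription`.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: resolves selected field indices against a full schema. */
class SelectedFieldsSketch {

    /** Returns the names of the projected fields, in the order given by selectedFields. */
    static List<String> project(List<String> fullSchemaFieldNames, int[] selectedFields) {
        List<String> projected = new ArrayList<>(selectedFields.length);
        for (int index : selectedFields) {
            // Assumption: indices are zero-based positions in the full schema.
            if (index < 0 || index >= fullSchemaFieldNames.size()) {
                throw new IllegalArgumentException("Selected field index out of range: " + index);
            }
            projected.add(fullSchemaFieldNames.get(index));
        }
        return projected;
    }
}
```

For a full schema with the fields `id`, `name`, `price`, calling `SelectedFieldsSketch.project(Arrays.asList("id", "name", "price"), new int[] {2, 0})` returns `[price, id]`.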