Package org.apache.flink.orc
Class AbstractOrcFileInputFormat<T,BatchT,SplitT extends org.apache.flink.connector.file.src.FileSourceSplit>
- java.lang.Object
-
- org.apache.flink.orc.AbstractOrcFileInputFormat<T,BatchT,SplitT>
-
- Type Parameters:
T
- The type of records produced by this reader format.
- All Implemented Interfaces:
Serializable, org.apache.flink.api.java.typeutils.ResultTypeQueryable<T>, org.apache.flink.connector.file.src.reader.BulkFormat<T,SplitT>
- Direct Known Subclasses:
OrcColumnarRowInputFormat
public abstract class AbstractOrcFileInputFormat<T,BatchT,SplitT extends org.apache.flink.connector.file.src.FileSourceSplit> extends Object implements org.apache.flink.connector.file.src.reader.BulkFormat<T,SplitT>
The base for ORC readers for the FileSource. Implements the reader initialization, vectorized reading, and pooling of column vector objects.
Subclasses implement the conversion to the specific result record(s) that they return by extending AbstractOrcFileInputFormat.OrcReaderBatch.
- See Also:
- Serialized Form
-
-
Nested Class Summary
Nested Classes
Modifier and Type Class Description
protected static class
AbstractOrcFileInputFormat.OrcReaderBatch<T,BatchT>
The OrcReaderBatch class holds the data structures containing the batch data (column vectors, row arrays, ...) and performs the batch conversion from the ORC representation to the result format.
protected static class
AbstractOrcFileInputFormat.OrcVectorizedReader<T,BatchT>
A vectorized ORC reader.
-
Field Summary
Fields
Modifier and Type Field Description
protected int
batchSize
protected List<OrcFilters.Predicate>
conjunctPredicates
protected SerializableHadoopConfigWrapper
hadoopConfigWrapper
protected org.apache.orc.TypeDescription
schema
protected int[]
selectedFields
protected OrcShim<BatchT>
shim
-
Constructor Summary
Constructors
Modifier Constructor Description
protected
AbstractOrcFileInputFormat(OrcShim<BatchT> shim, org.apache.hadoop.conf.Configuration hadoopConfig, org.apache.orc.TypeDescription schema, int[] selectedFields, List<OrcFilters.Predicate> conjunctPredicates, int batchSize)
-
Method Summary
Modifier and Type Method Description
AbstractOrcFileInputFormat.OrcVectorizedReader<T,BatchT>
createReader(org.apache.flink.configuration.Configuration config, SplitT split)
abstract AbstractOrcFileInputFormat.OrcReaderBatch<T,BatchT>
createReaderBatch(SplitT split, OrcVectorizedBatchWrapper<BatchT> orcBatch, org.apache.flink.connector.file.src.util.Pool.Recycler<AbstractOrcFileInputFormat.OrcReaderBatch<T,BatchT>> recycler, int batchSize)
Creates the AbstractOrcFileInputFormat.OrcReaderBatch structure, which holds the batch data (column vectors, row arrays, ...) and performs the batch conversion from the ORC representation to the result format.
abstract org.apache.flink.api.common.typeinfo.TypeInformation<T>
getProducedType()
Gets the type produced by this format.
boolean
isSplittable()
AbstractOrcFileInputFormat.OrcVectorizedReader<T,BatchT>
restoreReader(org.apache.flink.configuration.Configuration config, SplitT split)
-
-
-
Field Detail
-
hadoopConfigWrapper
protected final SerializableHadoopConfigWrapper hadoopConfigWrapper
-
schema
protected final org.apache.orc.TypeDescription schema
-
selectedFields
protected final int[] selectedFields
-
conjunctPredicates
protected final List<OrcFilters.Predicate> conjunctPredicates
-
batchSize
protected final int batchSize
-
-
Constructor Detail
-
AbstractOrcFileInputFormat
protected AbstractOrcFileInputFormat(OrcShim<BatchT> shim, org.apache.hadoop.conf.Configuration hadoopConfig, org.apache.orc.TypeDescription schema, int[] selectedFields, List<OrcFilters.Predicate> conjunctPredicates, int batchSize)
- Parameters:
shim - the shim for the different versions of the ORC dependency. If you use the latest version, use OrcShim.defaultShim() directly.
hadoopConfig - the Hadoop configuration for the ORC reader.
schema - the full schema of the ORC format.
selectedFields - the fields of the ORC schema that are selected for reading.
conjunctPredicates - the filter predicates that can be evaluated.
batchSize - the batch size of the ORC reader.
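The relationship between schema and selectedFields can be sketched in plain Java. This is an illustration only, assuming that each entry of selectedFields is a zero-based index into the top-level fields of the full schema; the class FieldProjectionSketch and its project method are hypothetical and are not part of the Flink ORC API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; not part of the Flink ORC API. */
public class FieldProjectionSketch {

    /**
     * Returns the names of the selected fields, in the order given by selectedFields.
     * Assumption: each entry is a zero-based index into the full schema's top-level fields.
     */
    public static List<String> project(String[] schemaFieldNames, int[] selectedFields) {
        List<String> projected = new ArrayList<>(selectedFields.length);
        for (int index : selectedFields) {
            if (index < 0 || index >= schemaFieldNames.length) {
                throw new IllegalArgumentException("Selected field index out of range: " + index);
            }
            projected.add(schemaFieldNames[index]);
        }
        return projected;
    }

    public static void main(String[] args) {
        String[] fullSchema = {"id", "name", "price", "updated_at"};
        int[] selectedFields = {2, 0}; // read "price" first, then "id"
        System.out.println(project(fullSchema, selectedFields)); // prints [price, id]
    }
}
```

Under this assumption, the order of the indices determines the order of the fields in the produced records, and fields that are not listed are not read.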
-
-
Method Detail
-
createReader
public AbstractOrcFileInputFormat.OrcVectorizedReader<T,BatchT> createReader(org.apache.flink.configuration.Configuration config, SplitT split) throws IOException
- Specified by:
createReader
in interface org.apache.flink.connector.file.src.reader.BulkFormat<T,SplitT>
- Throws:
IOException
-
restoreReader
public AbstractOrcFileInputFormat.OrcVectorizedReader<T,BatchT> restoreReader(org.apache.flink.configuration.Configuration config, SplitT split) throws IOException
- Specified by:
restoreReader
in interface org.apache.flink.connector.file.src.reader.BulkFormat<T,SplitT>
- Throws:
IOException
-
isSplittable
public boolean isSplittable()
-
createReaderBatch
public abstract AbstractOrcFileInputFormat.OrcReaderBatch<T,BatchT> createReaderBatch(SplitT split, OrcVectorizedBatchWrapper<BatchT> orcBatch, org.apache.flink.connector.file.src.util.Pool.Recycler<AbstractOrcFileInputFormat.OrcReaderBatch<T,BatchT>> recycler, int batchSize)
Creates the AbstractOrcFileInputFormat.OrcReaderBatch structure, which holds the batch data (column vectors, row arrays, ...) and performs the batch conversion from the ORC representation to the result format.
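The role of the recycler parameter can be illustrated with a simplified model of batch pooling in plain Java. This is a minimal sketch, not Flink's implementation: the classes BatchPoolSketch, ReaderBatch, and Pool are hypothetical stand-ins, the pool here is a plain non-blocking queue, and a long[] takes the place of the ORC column vectors.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.Consumer;

/** Simplified, hypothetical model of batch pooling; these are not Flink's classes. */
public class BatchPoolSketch {

    /** Stand-in for a reader batch: reusable storage plus a callback that returns it to the pool. */
    public static final class ReaderBatch {
        public final long[] columnVector; // allocated once, reused for every batch
        public int rowCount;
        private final Consumer<ReaderBatch> recycler;

        ReaderBatch(int batchSize, Consumer<ReaderBatch> recycler) {
            this.columnVector = new long[batchSize];
            this.recycler = recycler;
        }

        /** Hands this batch back to the pool once the consumer is done with it. */
        public void recycle() {
            rowCount = 0;
            recycler.accept(this);
        }
    }

    /** Fixed-size pool; every batch it creates is wired to return to the same queue. */
    public static final class Pool {
        private final Queue<ReaderBatch> free = new ArrayDeque<>();

        public Pool(int poolSize, int batchSize) {
            for (int i = 0; i < poolSize; i++) {
                free.add(new ReaderBatch(batchSize, free::add));
            }
        }

        /** Returns a free batch, or null if all batches are currently in use. */
        public ReaderBatch poll() {
            return free.poll();
        }

        public int available() {
            return free.size();
        }
    }

    /** Reads data in batches through a pool of one batch, so a single buffer is reused throughout. */
    public static long sumAll(long[] data, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        Pool pool = new Pool(1, batchSize);
        long sum = 0;
        int offset = 0;
        while (offset < data.length) {
            ReaderBatch batch = pool.poll();
            batch.rowCount = Math.min(batchSize, data.length - offset);
            System.arraycopy(data, offset, batch.columnVector, 0, batch.rowCount);
            for (int i = 0; i < batch.rowCount; i++) {
                sum += batch.columnVector[i];
            }
            offset += batch.rowCount;
            batch.recycle(); // without this call, the next poll() would return null
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumAll(new long[] {1, 2, 3, 4, 5, 6, 7}, 3)); // prints 28
    }
}
```

In this model the batch object owns its buffers and the recycler only signals that they may be overwritten, which is why the batch must not be recycled until its rows have been fully consumed.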
-
getProducedType
public abstract org.apache.flink.api.common.typeinfo.TypeInformation<T> getProducedType()
Gets the type produced by this format.
-
-