Package org.apache.parquet.hadoop
Class ParquetWriter.Builder<T,SELF extends ParquetWriter.Builder<T,SELF>>
- java.lang.Object
-
- org.apache.parquet.hadoop.ParquetWriter.Builder<T,SELF>
-
- Type Parameters:
T
- The type of objects written by the constructed ParquetWriter.
SELF
- The type of this builder that is returned by builder methods.
- Direct Known Subclasses:
ExampleParquetWriter.Builder
- Enclosing class:
- ParquetWriter<T>
public abstract static class ParquetWriter.Builder<T,SELF extends ParquetWriter.Builder<T,SELF>> extends Object
An abstract builder class for ParquetWriter instances. Object models should extend this builder to provide writer configuration options.
-
-
Method Summary
Modifier and Type, Method, Description
ParquetWriter<T> build()
Build a ParquetWriter with the accumulated configuration.
SELF config(String property, String value)
Set a property that will be available to the read path.
SELF enableDictionaryEncoding()
Enables dictionary encoding for the constructed writer.
SELF enablePageWriteChecksum()
Enables writing page level checksums for the constructed writer.
SELF enableValidation()
Enables validation for the constructed writer.
protected abstract WriteSupport<T> getWriteSupport(org.apache.hadoop.conf.Configuration conf)
protected abstract SELF self()
SELF withBloomFilterEnabled(boolean enabled)
Sets the bloom filter enabled/disabled.
SELF withBloomFilterEnabled(String columnPath, boolean enabled)
Sets the bloom filter enabled/disabled for the specified column.
SELF withBloomFilterFPP(String columnPath, double fpp)
SELF withBloomFilterNDV(String columnPath, long ndv)
Sets the NDV (number of distinct values) for the specified column.
SELF withByteStreamSplitEncoding(boolean enableByteStreamSplit)
SELF withColumnIndexTruncateLength(int length)
Sets the length to be used for truncating binary values in a binary column index.
SELF withCompressionCodec(org.apache.parquet.hadoop.metadata.CompressionCodecName codecName)
Set the compression codec used by the constructed writer.
SELF withConf(org.apache.hadoop.conf.Configuration conf)
Set the Configuration used by the constructed writer.
SELF withDictionaryEncoding(boolean enableDictionary)
Enable or disable dictionary encoding for the constructed writer.
SELF withDictionaryEncoding(String columnPath, boolean enableDictionary)
Enable or disable dictionary encoding of the specified column for the constructed writer.
SELF withDictionaryPageSize(int dictionaryPageSize)
Set the Parquet format dictionary page size used by the constructed writer.
SELF withEncryption(FileEncryptionProperties encryptionProperties)
Set the file encryption properties used by the constructed writer.
SELF withMaxPaddingSize(int maxPaddingSize)
Set the maximum amount of padding, in bytes, that will be used to align row groups with blocks in the underlying filesystem.
SELF withMaxRowCountForPageSizeCheck(int max)
Sets the maximum number of rows to write before a page size check is done.
SELF withMinRowCountForPageSizeCheck(int min)
Sets the minimum number of rows to write before a page size check is done.
SELF withPageRowCountLimit(int rowCount)
Sets the Parquet format page row count limit used by the constructed writer.
SELF withPageSize(int pageSize)
Set the Parquet format page size used by the constructed writer.
SELF withPageWriteChecksumEnabled(boolean enablePageWriteChecksum)
Enables writing page level checksums for the constructed writer.
SELF withRowGroupSize(int rowGroupSize)
Deprecated. Use withRowGroupSize(long) instead.
SELF withRowGroupSize(long rowGroupSize)
Set the Parquet format row group size used by the constructed writer.
SELF withStatisticsTruncateLength(int length)
Sets the length which the min/max binary values in row groups are truncated to.
SELF withValidation(boolean enableValidation)
Enable or disable validation for the constructed writer.
SELF withWriteMode(ParquetFileWriter.Mode mode)
Set the write mode used when creating the backing file for this writer.
SELF withWriterVersion(org.apache.parquet.column.ParquetProperties.WriterVersion version)
Set the format version used by the constructed writer.
-
-
-
Method Detail
-
self
protected abstract SELF self()
- Returns:
- this as the correct subclass of ParquetWriter.Builder.
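The SELF type parameter and the self() method can be read as the self-typed builder idiom: shared configuration methods live in the abstract base class but return the concrete subclass type, so subclass-specific methods stay reachable in a method chain. The following is a minimal sketch of that idiom only. SketchBuilder, StringSketchBuilder, withPrefix, and the default value are hypothetical and are not part of Parquet.

```java
// Hypothetical stand-in for an abstract, self-typed builder (not a Parquet class).
abstract class SketchBuilder<T, SELF extends SketchBuilder<T, SELF>> {
    // Arbitrary default chosen for this sketch only.
    protected long rowGroupSize = 1024L;

    // Each concrete subclass returns itself, typed as the subclass.
    protected abstract SELF self();

    // Shared option: declared once here, yet returns the subclass type.
    public SELF withRowGroupSize(long rowGroupSize) {
        this.rowGroupSize = rowGroupSize;
        return self();
    }

    public abstract T build();
}

// Hypothetical concrete builder that adds one option of its own.
final class StringSketchBuilder extends SketchBuilder<String, StringSketchBuilder> {
    private String prefix = "";

    @Override
    protected StringSketchBuilder self() {
        return this;
    }

    public StringSketchBuilder withPrefix(String prefix) {
        this.prefix = prefix;
        return this;
    }

    @Override
    public String build() {
        return prefix + rowGroupSize;
    }
}
```

Because withRowGroupSize returns StringSketchBuilder rather than the base type, a call such as new StringSketchBuilder().withRowGroupSize(42L).withPrefix("rg=") compiles without a cast.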
-
getWriteSupport
protected abstract WriteSupport<T> getWriteSupport(org.apache.hadoop.conf.Configuration conf)
- Parameters:
conf
- a configuration
- Returns:
- an appropriate WriteSupport for the object model.
-
withConf
public SELF withConf(org.apache.hadoop.conf.Configuration conf)
Set the Configuration used by the constructed writer.
- Parameters:
conf
- a Configuration
- Returns:
- this builder for method chaining.
-
withWriteMode
public SELF withWriteMode(ParquetFileWriter.Mode mode)
Set the write mode used when creating the backing file for this writer.
- Parameters:
mode
- a ParquetFileWriter.Mode
- Returns:
- this builder for method chaining.
-
withCompressionCodec
public SELF withCompressionCodec(org.apache.parquet.hadoop.metadata.CompressionCodecName codecName)
Set the compression codec used by the constructed writer.
- Parameters:
codecName
- a CompressionCodecName
- Returns:
- this builder for method chaining.
-
withEncryption
public SELF withEncryption(FileEncryptionProperties encryptionProperties)
Set the file encryption properties used by the constructed writer.
- Parameters:
encryptionProperties
- a FileEncryptionProperties
- Returns:
- this builder for method chaining.
-
withRowGroupSize
@Deprecated public SELF withRowGroupSize(int rowGroupSize)
Deprecated. Use withRowGroupSize(long) instead.
Set the Parquet format row group size used by the constructed writer.
- Parameters:
rowGroupSize
- an integer size in bytes
- Returns:
- this builder for method chaining.
-
withRowGroupSize
public SELF withRowGroupSize(long rowGroupSize)
Set the Parquet format row group size used by the constructed writer.
- Parameters:
rowGroupSize
- an integer size in bytes
- Returns:
- this builder for method chaining.
-
withPageSize
public SELF withPageSize(int pageSize)
Set the Parquet format page size used by the constructed writer.
- Parameters:
pageSize
- an integer size in bytes
- Returns:
- this builder for method chaining.
-
withPageRowCountLimit
public SELF withPageRowCountLimit(int rowCount)
Sets the Parquet format page row count limit used by the constructed writer.
- Parameters:
rowCount
- limit for the number of rows stored in a page
- Returns:
- this builder for method chaining
-
withDictionaryPageSize
public SELF withDictionaryPageSize(int dictionaryPageSize)
Set the Parquet format dictionary page size used by the constructed writer.
- Parameters:
dictionaryPageSize
- an integer size in bytes
- Returns:
- this builder for method chaining.
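To illustrate what a dictionary page size limit bounds, here is a rough sketch of dictionary encoding with a byte budget. It is not Parquet's encoder: DictionarySketch, encode, and the choice to return null (so a caller could fall back to another encoding) when the budget is exceeded are assumptions made for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: replace repeated values with small integer ids.
final class DictionarySketch {
    /**
     * Returns one dictionary id per input value and appends the distinct
     * values to dictionaryOut, or returns null if the distinct values would
     * need more than maxDictionaryBytes (assumed fallback signal).
     */
    static List<Integer> encode(List<String> values, int maxDictionaryBytes,
                                List<String> dictionaryOut) {
        Map<String, Integer> ids = new LinkedHashMap<>();
        List<Integer> encoded = new ArrayList<>();
        int dictionaryBytes = 0;
        for (String value : values) {
            Integer id = ids.get(value);
            if (id == null) {
                // A new distinct value grows the dictionary.
                dictionaryBytes += value.getBytes(StandardCharsets.UTF_8).length;
                if (dictionaryBytes > maxDictionaryBytes) {
                    return null; // budget exceeded, caller picks another encoding
                }
                id = ids.size();
                ids.put(value, id);
            }
            encoded.add(id);
        }
        dictionaryOut.addAll(ids.keySet());
        return encoded;
    }
}
```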
-
withMaxPaddingSize
public SELF withMaxPaddingSize(int maxPaddingSize)
Set the maximum amount of padding, in bytes, that will be used to align row groups with blocks in the underlying filesystem. If the underlying filesystem is not a block filesystem like HDFS, this has no effect.
- Parameters:
maxPaddingSize
- an integer size in bytes
- Returns:
- this builder for method chaining.
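The padding rule can be pictured with a small calculation. The sketch below assumes that padding is written only when the bytes remaining in the current filesystem block do not exceed the configured maximum. RowGroupPaddingSketch and paddingBytes are hypothetical names, and the actual writer logic may differ in detail.

```java
// Hypothetical sketch of aligning the next row group to a block boundary.
final class RowGroupPaddingSketch {
    /**
     * Returns how many padding bytes to write at the given file position so
     * that the next row group starts on a block boundary, or 0 if the gap is
     * larger than maxPaddingSize (or the position is already aligned).
     */
    static long paddingBytes(long position, long blockSize, long maxPaddingSize) {
        long offsetInBlock = position % blockSize;
        if (offsetInBlock == 0) {
            return 0; // already at a block boundary
        }
        long remaining = blockSize - offsetInBlock;
        return remaining <= maxPaddingSize ? remaining : 0;
    }
}
```

With a 1024-byte block and a maximum padding of 64 bytes, position 1000 is padded by 24 bytes, while position 500 is left unpadded because the 524-byte gap exceeds the limit.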
-
enableDictionaryEncoding
public SELF enableDictionaryEncoding()
Enables dictionary encoding for the constructed writer.
- Returns:
- this builder for method chaining.
-
withDictionaryEncoding
public SELF withDictionaryEncoding(boolean enableDictionary)
Enable or disable dictionary encoding for the constructed writer.
- Parameters:
enableDictionary
- whether dictionary encoding should be enabled
- Returns:
- this builder for method chaining.
-
withByteStreamSplitEncoding
public SELF withByteStreamSplitEncoding(boolean enableByteStreamSplit)
-
withDictionaryEncoding
public SELF withDictionaryEncoding(String columnPath, boolean enableDictionary)
Enable or disable dictionary encoding of the specified column for the constructed writer.
- Parameters:
columnPath
- the path of the column (dot-string)
enableDictionary
- whether dictionary encoding should be enabled
- Returns:
- this builder for method chaining.
-
enableValidation
public SELF enableValidation()
Enables validation for the constructed writer.
- Returns:
- this builder for method chaining.
-
withValidation
public SELF withValidation(boolean enableValidation)
Enable or disable validation for the constructed writer.
- Parameters:
enableValidation
- whether validation should be enabled
- Returns:
- this builder for method chaining.
-
withWriterVersion
public SELF withWriterVersion(org.apache.parquet.column.ParquetProperties.WriterVersion version)
Set the format version used by the constructed writer.
- Parameters:
version
- a WriterVersion
- Returns:
- this builder for method chaining.
-
enablePageWriteChecksum
public SELF enablePageWriteChecksum()
Enables writing page level checksums for the constructed writer.
- Returns:
- this builder for method chaining.
-
withPageWriteChecksumEnabled
public SELF withPageWriteChecksumEnabled(boolean enablePageWriteChecksum)
Enables or disables writing page level checksums for the constructed writer.
- Parameters:
enablePageWriteChecksum
- whether page checksums should be written out
- Returns:
- this builder for method chaining.
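As an illustration of what a page level checksum provides, the sketch below computes a checksum over a page's bytes at write time and verifies it at read time. It assumes CRC-32 as the checksum function and uses java.util.zip.CRC32 from the JDK. PageChecksumSketch and its methods are hypothetical and do not reflect how the writer lays out or stores the value.

```java
import java.util.zip.CRC32;

// Hypothetical sketch: checksum a page on write, verify it on read.
final class PageChecksumSketch {
    // Computes a CRC-32 over the page bytes (assumed checksum function).
    static int checksum(byte[] pageBytes) {
        CRC32 crc = new CRC32();
        crc.update(pageBytes, 0, pageBytes.length);
        return (int) crc.getValue();
    }

    // Returns true when the bytes still match the stored checksum.
    static boolean verify(byte[] pageBytes, int storedChecksum) {
        return checksum(pageBytes) == storedChecksum;
    }
}
```

A reader that recomputes the checksum can detect a page whose bytes were altered after writing, at the cost of one extra pass over each page on both sides.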
-
withBloomFilterNDV
public SELF withBloomFilterNDV(String columnPath, long ndv)
Sets the NDV (number of distinct values) for the specified column.
- Parameters:
columnPath
- the path of the column (dot-string)
ndv
- the NDV of the column
- Returns:
- this builder for method chaining.
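To give a feel for why the NDV and the false positive probability (the fpp argument of withBloomFilterFPP) matter, the sketch below applies the textbook sizing formula for a classic bloom filter, m = -n ln(p) / (ln 2)^2 bits. This is an assumption for illustration: Parquet's own bloom filter may size itself with a different formula, and BloomSizingSketch and classicBits are hypothetical names.

```java
// Hypothetical sketch: textbook bloom filter sizing, not Parquet's formula.
final class BloomSizingSketch {
    /**
     * Returns the number of bits a classic bloom filter needs to hold ndv
     * distinct values at the target false positive probability fpp.
     */
    static long classicBits(long ndv, double fpp) {
        if (ndv <= 0 || fpp <= 0.0 || fpp >= 1.0) {
            throw new IllegalArgumentException("ndv must be > 0 and 0 < fpp < 1");
        }
        double ln2 = Math.log(2.0);
        return (long) Math.ceil(-ndv * Math.log(fpp) / (ln2 * ln2));
    }
}
```

Under this formula, one million distinct values at a 1% false positive probability need roughly 9.6 million bits (about 1.2 MB), and the size grows linearly with the NDV.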
-
withBloomFilterEnabled
public SELF withBloomFilterEnabled(boolean enabled)
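Enables or disables bloom filters for the constructed writer. This is the default state for columns that have no column-specific setting.
- Parameters:
enabled
- whether to write bloom filters
- Returns: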
- this builder for method chaining
-
withBloomFilterEnabled
public SELF withBloomFilterEnabled(String columnPath, boolean enabled)
Enables or disables the bloom filter for the specified column. If not set for the column specifically, the default enabled/disabled state applies. See withBloomFilterEnabled(boolean).
- Parameters:
columnPath
- the path of the column (dot-string)
enabled
- whether to write bloom filter for the column
- Returns:
- this builder for method chaining
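The interplay between the writer-wide setting and the per-column setting can be sketched as a default value plus a map of overrides keyed by column path. ColumnToggleSketch and its methods are hypothetical and only model the lookup rule described above, not Parquet's internal data structures.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a default flag with per-column overrides.
final class ColumnToggleSketch {
    private boolean defaultEnabled = false;
    private final Map<String, Boolean> perColumn = new HashMap<>();

    // Sets the default used by columns without their own setting.
    ColumnToggleSketch withEnabled(boolean enabled) {
        this.defaultEnabled = enabled;
        return this;
    }

    // Sets the flag for one column, identified by its dot-string path.
    ColumnToggleSketch withEnabled(String columnPath, boolean enabled) {
        perColumn.put(columnPath, enabled);
        return this;
    }

    // A column-specific value wins; otherwise the default applies.
    boolean isEnabled(String columnPath) {
        return perColumn.getOrDefault(columnPath, defaultEnabled);
    }
}
```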
-
withMinRowCountForPageSizeCheck
public SELF withMinRowCountForPageSizeCheck(int min)
Sets the minimum number of rows to write before a page size check is done.
- Parameters:
min
- writes at least `min` rows before invoking a page size check
- Returns:
- this builder for method chaining
-
withMaxRowCountForPageSizeCheck
public SELF withMaxRowCountForPageSizeCheck(int max)
Sets the maximum number of rows to write before a page size check is done.
- Parameters:
max
- makes a page size check after `max` rows have been written
- Returns:
- this builder for method chaining
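One way to picture how the minimum and maximum row counts bound page size checks is a scheduler that estimates when the page will fill and clamps that estimate into the configured range. The sketch below is a simplified assumption, not the writer's actual heuristic. PageSizeCheckSketch, nextCheckRow, and the choice to check halfway to the estimated fill point are all hypothetical.

```java
// Hypothetical sketch: schedule the next page size check between min and max rows.
final class PageSizeCheckSketch {
    /**
     * Returns the row count at which the next size check should happen,
     * given the rows and bytes buffered so far and the target page size.
     */
    static long nextCheckRow(long rowsWritten, long bytesBuffered, long pageSize,
                             int minRows, int maxRows) {
        long step;
        if (rowsWritten == 0 || bytesBuffered == 0) {
            // No size information yet: wait the maximum (sketch choice).
            step = maxRows;
        } else {
            double averageRowBytes = (double) bytesBuffered / rowsWritten;
            long remainingBytes = Math.max(0, pageSize - bytesBuffered);
            long rowsToFillPage = (long) (remainingBytes / averageRowBytes);
            // Check halfway to the estimated fill point to limit overshoot.
            step = rowsToFillPage / 2;
        }
        // Never check sooner than minRows or later than maxRows from now.
        step = Math.max(minRows, Math.min(maxRows, step));
        return rowsWritten + step;
    }
}
```

A lower minimum makes checks more frequent when rows are large, and a lower maximum caps how far the writer can run past the page size when rows suddenly grow.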
-
withColumnIndexTruncateLength
public SELF withColumnIndexTruncateLength(int length)
Sets the length to be used for truncating binary values in a binary column index.
- Parameters:
length
- the length to truncate to
- Returns:
- this builder for method chaining
-
withStatisticsTruncateLength
public SELF withStatisticsTruncateLength(int length)
Sets the length which the min/max binary values in row groups are truncated to.
- Parameters:
length
- the length to truncate to
- Returns:
- this builder for method chaining
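Truncating min/max values saves space only if the truncated values remain valid bounds. The sketch below shows one way to do that for raw bytes compared as unsigned values: a truncated minimum is a plain prefix, while a truncated maximum is a prefix with its last incrementable byte increased. TruncateSketch and its methods are hypothetical, and Parquet's own truncation (for example its handling of UTF-8 strings) may differ.

```java
import java.util.Arrays;

// Hypothetical sketch: truncate binary bounds while keeping them valid.
final class TruncateSketch {
    // A prefix always sorts at or before the full value, so it is a valid lower bound.
    static byte[] truncateMin(byte[] value, int length) {
        if (length < 0 || value.length <= length) {
            return value;
        }
        return Arrays.copyOf(value, length);
    }

    // Increment the last byte of the prefix that is not 0xFF to get an upper bound.
    static byte[] truncateMax(byte[] value, int length) {
        if (length <= 0 || value.length <= length) {
            return value;
        }
        for (int i = length - 1; i >= 0; i--) {
            if (value[i] != (byte) 0xFF) {
                byte[] bound = Arrays.copyOf(value, i + 1);
                bound[i]++;
                return bound;
            }
        }
        return value; // prefix is all 0xFF: no shorter upper bound exists
    }

    // Lexicographic comparison treating bytes as unsigned values.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(a.length, b.length);
    }
}
```

For the ASCII bytes of "parquet" and a length of 3, the truncated minimum is "par" and the truncated maximum is "pas", both of which still bracket the original value.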
-
config
public SELF config(String property, String value)
Set a property that will be available to the read path. For writers that use a Hadoop configuration, this is the recommended way to add configuration values.
- Parameters:
property
- a String property name
value
- a String property value
- Returns:
- this builder for method chaining.
-
build
public ParquetWriter<T> build() throws IOException
Build a ParquetWriter with the accumulated configuration.
- Returns:
- a configured ParquetWriter instance.
- Throws:
IOException
- if there is an error while creating the writer
-
-