Package org.apache.parquet.hadoop
Class ParquetWriter<T>
- java.lang.Object
-
- org.apache.parquet.hadoop.ParquetWriter<T>
-
- All Implemented Interfaces:
Closeable, AutoCloseable
- Direct Known Subclasses:
ExampleParquetWriter
public class ParquetWriter<T> extends Object implements Closeable
Write records to a Parquet file.
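As a rough usage sketch (not taken from this page), records can be written through the builder API of ExampleParquetWriter, the "Direct Known Subclass" listed above. This assumes parquet-hadoop, parquet-column, and hadoop-client are on the classpath; the schema and output path are made-up examples.

```java
import org.apache.hadoop.fs.Path;
import org.apache.parquet.example.data.Group;
import org.apache.parquet.example.data.simple.SimpleGroupFactory;
import org.apache.parquet.hadoop.ParquetWriter;
import org.apache.parquet.hadoop.example.ExampleParquetWriter;
import org.apache.parquet.schema.MessageType;
import org.apache.parquet.schema.MessageTypeParser;

public class WriteRecordsSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical schema and path, for illustration only
        MessageType schema = MessageTypeParser.parseMessageType(
                "message event { required int64 id; required binary name (UTF8); }");
        SimpleGroupFactory groups = new SimpleGroupFactory(schema);

        // ParquetWriter implements Closeable, so try-with-resources
        // guarantees close() runs and the footer is written.
        try (ParquetWriter<Group> writer = ExampleParquetWriter
                .builder(new Path("/tmp/events.parquet"))
                .withType(schema)
                .build()) {
            for (long id = 0; id < 3; id++) {
                writer.write(groups.newGroup()
                        .append("id", id)
                        .append("name", "row-" + id));
            }
        }
    }
}
```

The builder is the non-deprecated replacement for the constructors listed below.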
-
-
Nested Class Summary
Nested Classes
static class  ParquetWriter.Builder<T,SELF extends ParquetWriter.Builder<T,SELF>>
An abstract builder class for ParquetWriter instances.
-
Field Summary
Fields
static int  DEFAULT_BLOCK_SIZE
static org.apache.parquet.hadoop.metadata.CompressionCodecName  DEFAULT_COMPRESSION_CODEC_NAME
static boolean  DEFAULT_IS_DICTIONARY_ENABLED
static boolean  DEFAULT_IS_VALIDATING_ENABLED
static int  DEFAULT_PAGE_SIZE
static org.apache.parquet.column.ParquetProperties.WriterVersion  DEFAULT_WRITER_VERSION
static int  MAX_PADDING_SIZE_DEFAULT
static String  OBJECT_MODEL_NAME_PROP
-
Constructor Summary
Constructors
ParquetWriter(org.apache.hadoop.fs.Path file, org.apache.hadoop.conf.Configuration conf, WriteSupport<T> writeSupport)
Deprecated.
ParquetWriter(org.apache.hadoop.fs.Path file, WriteSupport<T> writeSupport)
Deprecated. Will be removed in 2.0.0.
ParquetWriter(org.apache.hadoop.fs.Path file, WriteSupport<T> writeSupport, org.apache.parquet.hadoop.metadata.CompressionCodecName compressionCodecName, int blockSize, int pageSize)
Deprecated. Will be removed in 2.0.0.
ParquetWriter(org.apache.hadoop.fs.Path file, WriteSupport<T> writeSupport, org.apache.parquet.hadoop.metadata.CompressionCodecName compressionCodecName, int blockSize, int pageSize, boolean enableDictionary, boolean validating)
Deprecated. Will be removed in 2.0.0.
ParquetWriter(org.apache.hadoop.fs.Path file, WriteSupport<T> writeSupport, org.apache.parquet.hadoop.metadata.CompressionCodecName compressionCodecName, int blockSize, int pageSize, int dictionaryPageSize, boolean enableDictionary, boolean validating)
Deprecated. Will be removed in 2.0.0.
ParquetWriter(org.apache.hadoop.fs.Path file, WriteSupport<T> writeSupport, org.apache.parquet.hadoop.metadata.CompressionCodecName compressionCodecName, int blockSize, int pageSize, int dictionaryPageSize, boolean enableDictionary, boolean validating, org.apache.parquet.column.ParquetProperties.WriterVersion writerVersion)
Deprecated. Will be removed in 2.0.0.
ParquetWriter(org.apache.hadoop.fs.Path file, WriteSupport<T> writeSupport, org.apache.parquet.hadoop.metadata.CompressionCodecName compressionCodecName, int blockSize, int pageSize, int dictionaryPageSize, boolean enableDictionary, boolean validating, org.apache.parquet.column.ParquetProperties.WriterVersion writerVersion, org.apache.hadoop.conf.Configuration conf)
Deprecated. Will be removed in 2.0.0.
ParquetWriter(org.apache.hadoop.fs.Path file, ParquetFileWriter.Mode mode, WriteSupport<T> writeSupport, org.apache.parquet.hadoop.metadata.CompressionCodecName compressionCodecName, int blockSize, int pageSize, int dictionaryPageSize, boolean enableDictionary, boolean validating, org.apache.parquet.column.ParquetProperties.WriterVersion writerVersion, org.apache.hadoop.conf.Configuration conf)
Deprecated. Will be removed in 2.0.0.
-
Method Summary
All Methods  Instance Methods  Concrete Methods
void  close()
long  getDataSize()
ParquetMetadata  getFooter()
void  write(T object)
-
-
-
Field Detail
-
DEFAULT_BLOCK_SIZE
public static final int DEFAULT_BLOCK_SIZE
- See Also:
- Constant Field Values
-
DEFAULT_PAGE_SIZE
public static final int DEFAULT_PAGE_SIZE
- See Also:
- Constant Field Values
-
DEFAULT_COMPRESSION_CODEC_NAME
public static final org.apache.parquet.hadoop.metadata.CompressionCodecName DEFAULT_COMPRESSION_CODEC_NAME
-
DEFAULT_IS_DICTIONARY_ENABLED
public static final boolean DEFAULT_IS_DICTIONARY_ENABLED
- See Also:
- Constant Field Values
-
DEFAULT_IS_VALIDATING_ENABLED
public static final boolean DEFAULT_IS_VALIDATING_ENABLED
- See Also:
- Constant Field Values
-
DEFAULT_WRITER_VERSION
public static final org.apache.parquet.column.ParquetProperties.WriterVersion DEFAULT_WRITER_VERSION
-
OBJECT_MODEL_NAME_PROP
public static final String OBJECT_MODEL_NAME_PROP
- See Also:
- Constant Field Values
-
MAX_PADDING_SIZE_DEFAULT
public static final int MAX_PADDING_SIZE_DEFAULT
- See Also:
- Constant Field Values
-
-
Constructor Detail
-
ParquetWriter
@Deprecated public ParquetWriter(org.apache.hadoop.fs.Path file, WriteSupport<T> writeSupport, org.apache.parquet.hadoop.metadata.CompressionCodecName compressionCodecName, int blockSize, int pageSize) throws IOException
Deprecated. Will be removed in 2.0.0.
Create a new ParquetWriter (with dictionary encoding enabled and validation off).
- Parameters:
file - the file to create
writeSupport - the implementation to write a record to a RecordConsumer
compressionCodecName - the compression codec to use
blockSize - the block size threshold
pageSize - the page size threshold
- Throws:
IOException - if there is an error while writing
-
ParquetWriter
@Deprecated public ParquetWriter(org.apache.hadoop.fs.Path file, WriteSupport<T> writeSupport, org.apache.parquet.hadoop.metadata.CompressionCodecName compressionCodecName, int blockSize, int pageSize, boolean enableDictionary, boolean validating) throws IOException
Deprecated. Will be removed in 2.0.0.
Create a new ParquetWriter.
- Parameters:
file - the file to create
writeSupport - the implementation to write a record to a RecordConsumer
compressionCodecName - the compression codec to use
blockSize - the block size threshold
pageSize - the page size threshold (both data and dictionary)
enableDictionary - to turn dictionary encoding on
validating - to turn on validation using the schema
- Throws:
IOException - if there is an error while writing
-
ParquetWriter
@Deprecated public ParquetWriter(org.apache.hadoop.fs.Path file, WriteSupport<T> writeSupport, org.apache.parquet.hadoop.metadata.CompressionCodecName compressionCodecName, int blockSize, int pageSize, int dictionaryPageSize, boolean enableDictionary, boolean validating) throws IOException
Deprecated. Will be removed in 2.0.0.
Create a new ParquetWriter.
- Parameters:
file - the file to create
writeSupport - the implementation to write a record to a RecordConsumer
compressionCodecName - the compression codec to use
blockSize - the block size threshold
pageSize - the page size threshold
dictionaryPageSize - the page size threshold for the dictionary pages
enableDictionary - to turn dictionary encoding on
validating - to turn on validation using the schema
- Throws:
IOException - if there is an error while writing
-
ParquetWriter
@Deprecated public ParquetWriter(org.apache.hadoop.fs.Path file, WriteSupport<T> writeSupport, org.apache.parquet.hadoop.metadata.CompressionCodecName compressionCodecName, int blockSize, int pageSize, int dictionaryPageSize, boolean enableDictionary, boolean validating, org.apache.parquet.column.ParquetProperties.WriterVersion writerVersion) throws IOException
Deprecated. Will be removed in 2.0.0.
Create a new ParquetWriter. Directly instantiates a Hadoop Configuration, which reads configuration from the classpath.
- Parameters:
file - the file to create
writeSupport - the implementation to write a record to a RecordConsumer
compressionCodecName - the compression codec to use
blockSize - the block size threshold
pageSize - the page size threshold
dictionaryPageSize - the page size threshold for the dictionary pages
enableDictionary - to turn dictionary encoding on
validating - to turn on validation using the schema
writerVersion - version of parquetWriter from ParquetProperties.WriterVersion
- Throws:
IOException - if there is an error while writing
-
ParquetWriter
@Deprecated public ParquetWriter(org.apache.hadoop.fs.Path file, WriteSupport<T> writeSupport, org.apache.parquet.hadoop.metadata.CompressionCodecName compressionCodecName, int blockSize, int pageSize, int dictionaryPageSize, boolean enableDictionary, boolean validating, org.apache.parquet.column.ParquetProperties.WriterVersion writerVersion, org.apache.hadoop.conf.Configuration conf) throws IOException
Deprecated. Will be removed in 2.0.0.
Create a new ParquetWriter.
- Parameters:
file - the file to create
writeSupport - the implementation to write a record to a RecordConsumer
compressionCodecName - the compression codec to use
blockSize - the block size threshold
pageSize - the page size threshold
dictionaryPageSize - the page size threshold for the dictionary pages
enableDictionary - to turn dictionary encoding on
validating - to turn on validation using the schema
writerVersion - version of parquetWriter from ParquetProperties.WriterVersion
conf - Hadoop configuration to use while accessing the filesystem
- Throws:
IOException - if there is an error while writing
-
ParquetWriter
@Deprecated public ParquetWriter(org.apache.hadoop.fs.Path file, ParquetFileWriter.Mode mode, WriteSupport<T> writeSupport, org.apache.parquet.hadoop.metadata.CompressionCodecName compressionCodecName, int blockSize, int pageSize, int dictionaryPageSize, boolean enableDictionary, boolean validating, org.apache.parquet.column.ParquetProperties.WriterVersion writerVersion, org.apache.hadoop.conf.Configuration conf) throws IOException
Deprecated. Will be removed in 2.0.0.
Create a new ParquetWriter.
- Parameters:
file - the file to create
mode - file creation mode
writeSupport - the implementation to write a record to a RecordConsumer
compressionCodecName - the compression codec to use
blockSize - the block size threshold
pageSize - the page size threshold
dictionaryPageSize - the page size threshold for the dictionary pages
enableDictionary - to turn dictionary encoding on
validating - to turn on validation using the schema
writerVersion - version of parquetWriter from ParquetProperties.WriterVersion
conf - Hadoop configuration to use while accessing the filesystem
- Throws:
IOException - if there is an error while writing
-
ParquetWriter
@Deprecated public ParquetWriter(org.apache.hadoop.fs.Path file, WriteSupport<T> writeSupport) throws IOException
Deprecated. Will be removed in 2.0.0.
Create a new ParquetWriter. The default block size is 128 MB. The default page size is 1 MB. Default compression is no compression. Dictionary encoding is disabled.
- Parameters:
file - the file to create
writeSupport - the implementation to write a record to a RecordConsumer
- Throws:
IOException - if there is an error while writing
-
ParquetWriter
@Deprecated public ParquetWriter(org.apache.hadoop.fs.Path file, org.apache.hadoop.conf.Configuration conf, WriteSupport<T> writeSupport) throws IOException
Deprecated.
- Throws:
IOException
-
Method Detail
-
write
public void write(T object) throws IOException
- Throws:
IOException
-
close
public void close() throws IOException
- Specified by:
close in interface AutoCloseable
- Specified by:
close in interface Closeable
- Throws:
IOException
-
getFooter
public ParquetMetadata getFooter()
- Returns:
- the ParquetMetadata written to the (closed) file.
-
getDataSize
public long getDataSize()
- Returns:
- the total size of data written to the file and buffered in memory
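A short sketch combining getDataSize() and getFooter(); per the descriptions above, the footer is only meaningful once the file has been closed. The path and schema are made up, and the Parquet jars are assumed on the classpath.

```java
import org.apache.hadoop.fs.Path;
import org.apache.parquet.example.data.Group;
import org.apache.parquet.example.data.simple.SimpleGroupFactory;
import org.apache.parquet.hadoop.ParquetWriter;
import org.apache.parquet.hadoop.example.ExampleParquetWriter;
import org.apache.parquet.hadoop.metadata.ParquetMetadata;
import org.apache.parquet.schema.MessageType;
import org.apache.parquet.schema.MessageTypeParser;

public class FooterSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical schema and path, for illustration only
        MessageType schema = MessageTypeParser.parseMessageType(
                "message m { required int32 x; }");
        ParquetWriter<Group> writer = ExampleParquetWriter
                .builder(new Path("/tmp/footer-demo.parquet"))
                .withType(schema)
                .build();
        writer.write(new SimpleGroupFactory(schema).newGroup().append("x", 1));

        long size = writer.getDataSize(); // bytes on disk plus bytes still buffered
        writer.close();                   // flushes buffers and writes the footer

        ParquetMetadata footer = writer.getFooter(); // valid now that the file is closed
        System.out.println(size + " bytes, " + footer.getBlocks().size() + " row group(s)");
    }
}
```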
-
-