Class ParquetFileWriter


  • public class ParquetFileWriter
    extends Object
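    Internal implementation of the Parquet file writer as a block container. Callers write the file one block (row group) at a time and, within each block, one column chunk at a time, then finish the file with end(...).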
    • Nested Class Summary

      Nested Classes 
      Modifier and Type Class Description
      static class  ParquetFileWriter.Mode  
    • Constructor Summary

      Constructors 
      Constructor Description
      ParquetFileWriter(org.apache.hadoop.conf.Configuration configuration, org.apache.parquet.schema.MessageType schema, org.apache.hadoop.fs.Path file)
      Deprecated.
      will be removed in 2.0.0
      ParquetFileWriter(org.apache.hadoop.conf.Configuration configuration, org.apache.parquet.schema.MessageType schema, org.apache.hadoop.fs.Path file, ParquetFileWriter.Mode mode)
      Deprecated.
      will be removed in 2.0.0
      ParquetFileWriter(org.apache.hadoop.conf.Configuration configuration, org.apache.parquet.schema.MessageType schema, org.apache.hadoop.fs.Path file, ParquetFileWriter.Mode mode, long rowGroupSize, int maxPaddingSize)
      Deprecated.
      will be removed in 2.0.0
      ParquetFileWriter(org.apache.parquet.io.OutputFile file, org.apache.parquet.schema.MessageType schema, ParquetFileWriter.Mode mode, long rowGroupSize, int maxPaddingSize)
      Deprecated.
      will be removed in 2.0.0
      ParquetFileWriter(org.apache.parquet.io.OutputFile file, org.apache.parquet.schema.MessageType schema, ParquetFileWriter.Mode mode, long rowGroupSize, int maxPaddingSize, int columnIndexTruncateLength, int statisticsTruncateLength, boolean pageWriteChecksumEnabled)
      ParquetFileWriter(org.apache.parquet.io.OutputFile file, org.apache.parquet.schema.MessageType schema, ParquetFileWriter.Mode mode, long rowGroupSize, int maxPaddingSize, int columnIndexTruncateLength, int statisticsTruncateLength, boolean pageWriteChecksumEnabled, FileEncryptionProperties encryptionProperties)
      ParquetFileWriter(org.apache.parquet.io.OutputFile file, org.apache.parquet.schema.MessageType schema, ParquetFileWriter.Mode mode, long rowGroupSize, int maxPaddingSize, int columnIndexTruncateLength, int statisticsTruncateLength, boolean pageWriteChecksumEnabled, InternalFileEncryptor encryptor)
    • Method Summary

      All Methods Static Methods Instance Methods Concrete Methods Deprecated Methods 
      Modifier and Type Method Description
      void appendColumnChunk(org.apache.parquet.column.ColumnDescriptor descriptor, org.apache.parquet.io.SeekableInputStream from, ColumnChunkMetaData chunk, org.apache.parquet.column.values.bloomfilter.BloomFilter bloomFilter, org.apache.parquet.internal.column.columnindex.ColumnIndex columnIndex, org.apache.parquet.internal.column.columnindex.OffsetIndex offsetIndex)
      void appendFile(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.Path file)
      Deprecated.
      will be removed in 2.0.0; use appendFile(InputFile) instead
      void appendFile(org.apache.parquet.io.InputFile file)
      void appendRowGroup(org.apache.hadoop.fs.FSDataInputStream from, BlockMetaData rowGroup, boolean dropColumns)
      Deprecated.
      will be removed in 2.0.0; use appendRowGroup(SeekableInputStream,BlockMetaData,boolean) instead
      void appendRowGroup(org.apache.parquet.io.SeekableInputStream from, BlockMetaData rowGroup, boolean dropColumns)
      void appendRowGroups(org.apache.hadoop.fs.FSDataInputStream file, List<BlockMetaData> rowGroups, boolean dropColumns)
      Deprecated.
      will be removed in 2.0.0; use appendRowGroups(SeekableInputStream,List,boolean) instead
      void appendRowGroups(org.apache.parquet.io.SeekableInputStream file, List<BlockMetaData> rowGroups, boolean dropColumns)
      void end(Map<String,String> extraMetaData)
      Ends a file once all blocks have been written.
      void endBlock()
      Ends a block once all column chunks have been written.
      void endColumn()
      Ends a column once all repetition levels, definition levels, and data have been written.
      InternalFileEncryptor getEncryptor()  
      ParquetMetadata getFooter()  
      long getNextRowGroupSize()  
      long getPos()  
      static ParquetMetadata mergeMetadataFiles(List<org.apache.hadoop.fs.Path> files, org.apache.hadoop.conf.Configuration conf)
      Deprecated.
      metadata files are not recommended and will be removed in 2.0.0
      static ParquetMetadata mergeMetadataFiles(List<org.apache.hadoop.fs.Path> files, org.apache.hadoop.conf.Configuration conf, KeyValueMetadataMergeStrategy keyValueMetadataMergeStrategy)
      Deprecated.
      metadata files are not recommended and will be removed in 2.0.0
      void start()
      Starts the file.
      void startBlock(long recordCount)
      Starts a block (row group).
      void startColumn(org.apache.parquet.column.ColumnDescriptor descriptor, long valueCount, org.apache.parquet.hadoop.metadata.CompressionCodecName compressionCodecName)
      Starts a column chunk inside a block.
      void writeDataPage(int valueCount, int uncompressedPageSize, org.apache.parquet.bytes.BytesInput bytes, org.apache.parquet.column.Encoding rlEncoding, org.apache.parquet.column.Encoding dlEncoding, org.apache.parquet.column.Encoding valuesEncoding)
      Deprecated.
      void writeDataPage(int valueCount, int uncompressedPageSize, org.apache.parquet.bytes.BytesInput bytes, org.apache.parquet.column.statistics.Statistics statistics, long rowCount, org.apache.parquet.column.Encoding rlEncoding, org.apache.parquet.column.Encoding dlEncoding, org.apache.parquet.column.Encoding valuesEncoding)
      Writes a single page.
      void writeDataPage(int valueCount, int uncompressedPageSize, org.apache.parquet.bytes.BytesInput bytes, org.apache.parquet.column.statistics.Statistics statistics, long rowCount, org.apache.parquet.column.Encoding rlEncoding, org.apache.parquet.column.Encoding dlEncoding, org.apache.parquet.column.Encoding valuesEncoding, org.apache.parquet.format.BlockCipher.Encryptor metadataBlockEncryptor, byte[] pageHeaderAAD)
      Writes a single page.
      void writeDataPage(int valueCount, int uncompressedPageSize, org.apache.parquet.bytes.BytesInput bytes, org.apache.parquet.column.statistics.Statistics statistics, org.apache.parquet.column.Encoding rlEncoding, org.apache.parquet.column.Encoding dlEncoding, org.apache.parquet.column.Encoding valuesEncoding)
      Deprecated.
      this method does not support writing column indexes; use writeDataPage(int, int, BytesInput, Statistics, long, Encoding, Encoding, Encoding) instead
      void writeDataPage(int valueCount, int uncompressedPageSize, org.apache.parquet.bytes.BytesInput bytes, org.apache.parquet.column.statistics.Statistics statistics, org.apache.parquet.column.Encoding rlEncoding, org.apache.parquet.column.Encoding dlEncoding, org.apache.parquet.column.Encoding valuesEncoding, org.apache.parquet.format.BlockCipher.Encryptor metadataBlockEncryptor, byte[] pageHeaderAAD)
      Writes a single page.
      void writeDataPageV2(int rowCount, int nullCount, int valueCount, org.apache.parquet.bytes.BytesInput repetitionLevels, org.apache.parquet.bytes.BytesInput definitionLevels, org.apache.parquet.column.Encoding dataEncoding, org.apache.parquet.bytes.BytesInput compressedData, int uncompressedDataSize, org.apache.parquet.column.statistics.Statistics<?> statistics)
      Writes a single v2 data page.
      void writeDictionaryPage(org.apache.parquet.column.page.DictionaryPage dictionaryPage)
      Writes a dictionary page.
      void writeDictionaryPage(org.apache.parquet.column.page.DictionaryPage dictionaryPage, org.apache.parquet.format.BlockCipher.Encryptor headerBlockEncryptor, byte[] AAD)
      static void writeMergedMetadataFile(List<org.apache.hadoop.fs.Path> files, org.apache.hadoop.fs.Path outputPath, org.apache.hadoop.conf.Configuration conf)
      Deprecated.
      metadata files are not recommended and will be removed in 2.0.0
      static void writeMetadataFile(org.apache.hadoop.conf.Configuration configuration, org.apache.hadoop.fs.Path outputPath, List<Footer> footers)
      Deprecated.
      metadata files are not recommended and will be removed in 2.0.0
      static void writeMetadataFile(org.apache.hadoop.conf.Configuration configuration, org.apache.hadoop.fs.Path outputPath, List<Footer> footers, ParquetOutputFormat.JobSummaryLevel level)
      Deprecated.
      metadata files are not recommended and will be removed in 2.0.0
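The method summary implies a fixed calling sequence: start(), then for each row group startBlock(...), then for each column startColumn(...), any dictionary and data pages, and endColumn(), followed by endBlock(), and finally end(...). The sketch below is a hypothetical, stdlib-only model of that ordering; WriterLifecycle, its State values, and writePage() are illustrative names, not Parquet API, and the model writes no bytes. The real ParquetFileWriter keeps its own internal state and, as far as we know, rejects out-of-order calls with an IOException rather than the IllegalStateException used here.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the ParquetFileWriter call order; it writes no bytes. */
class WriterLifecycle {
  enum State { NOT_STARTED, STARTED, BLOCK, COLUMN, ENDED }

  private State state = State.NOT_STARTED;
  private final List<String> calls = new ArrayList<>();

  // Records the call if the writer is in the expected state, then moves to the next state.
  private void transition(String call, State expected, State next) {
    if (state != expected) {
      throw new IllegalStateException(call + " called in state " + state + ", expected " + expected);
    }
    calls.add(call);
    state = next;
  }

  void start()       { transition("start", State.NOT_STARTED, State.STARTED); }
  void startBlock()  { transition("startBlock", State.STARTED, State.BLOCK); }
  void startColumn() { transition("startColumn", State.BLOCK, State.COLUMN); }
  // Stands in for writeDictionaryPage / writeDataPage / writeDataPageV2.
  void writePage()   { transition("writePage", State.COLUMN, State.COLUMN); }
  void endColumn()   { transition("endColumn", State.COLUMN, State.BLOCK); }
  void endBlock()    { transition("endBlock", State.BLOCK, State.STARTED); }
  void end()         { transition("end", State.STARTED, State.ENDED); }

  State state() { return state; }
  List<String> calls() { return calls; }
}
```

For example, calling startColumn() right after start() fails because no block is open, while the full sequence start(), startBlock(), startColumn(), writePage(), endColumn(), endBlock(), end() leaves the model in the ENDED state.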
    • Constructor Detail

      • ParquetFileWriter

        @Deprecated
        public ParquetFileWriter(org.apache.hadoop.conf.Configuration configuration,
                                 org.apache.parquet.schema.MessageType schema,
                                 org.apache.hadoop.fs.Path file)
                          throws IOException
        Deprecated.
        will be removed in 2.0.0
        Parameters:
        configuration - Hadoop configuration
        schema - the schema of the data
        file - the file to write to
        Throws:
        IOException - if the file cannot be created
      • ParquetFileWriter

        @Deprecated
        public ParquetFileWriter(org.apache.hadoop.conf.Configuration configuration,
                                 org.apache.parquet.schema.MessageType schema,
                                 org.apache.hadoop.fs.Path file,
                                 ParquetFileWriter.Mode mode)
                          throws IOException
        Deprecated.
        will be removed in 2.0.0
        Parameters:
        configuration - Hadoop configuration
        schema - the schema of the data
        file - the file to write to
        mode - file creation mode
        Throws:
        IOException - if the file cannot be created
      • ParquetFileWriter

        @Deprecated
        public ParquetFileWriter(org.apache.hadoop.conf.Configuration configuration,
                                 org.apache.parquet.schema.MessageType schema,
                                 org.apache.hadoop.fs.Path file,
                                 ParquetFileWriter.Mode mode,
                                 long rowGroupSize,
                                 int maxPaddingSize)
                          throws IOException
        Deprecated.
        will be removed in 2.0.0
        Parameters:
        configuration - Hadoop configuration
        schema - the schema of the data
        file - the file to write to
        mode - file creation mode
        rowGroupSize - the target row group size, in bytes
        maxPaddingSize - the maximum padding, in bytes
        Throws:
        IOException - if the file cannot be created
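rowGroupSize and maxPaddingSize work together when the output has a file-system block size (for example, on HDFS). Our reading of the implementation is that the writer aligns row groups to block boundaries: if the space left before the next boundary is at most maxPaddingSize, it fills that gap with zeros before the next row group; otherwise it shrinks the next row group's size budget so that the row group ends at the boundary. The sketch below models only that decision; PaddingModel, blockSize, paddingBefore, and nextRowGroupSize are hypothetical names, and the library's rules for choosing the block size may differ.

```java
/** Simplified, hypothetical model of row-group alignment with bounded padding. */
final class PaddingModel {
  private final long blockSize;      // file-system block size to align to (assumed)
  private final long rowGroupSize;   // target row group size in bytes
  private final long maxPaddingSize; // largest gap, in bytes, that may be filled with zeros

  PaddingModel(long blockSize, long rowGroupSize, long maxPaddingSize) {
    this.blockSize = blockSize;
    this.rowGroupSize = rowGroupSize;
    this.maxPaddingSize = maxPaddingSize;
  }

  /** Bytes left before the next block boundary at file position pos. */
  private long remaining(long pos) {
    return blockSize - (pos % blockSize);
  }

  /** Zero bytes to write before a row group that would start at pos (0 means no padding). */
  long paddingBefore(long pos) {
    long remaining = remaining(pos);
    return remaining <= maxPaddingSize ? remaining : 0;
  }

  /** Size budget for the next row group if it starts at pos. */
  long nextRowGroupSize(long pos) {
    if (maxPaddingSize <= 0) {
      return rowGroupSize; // padding disabled: always use the full target size
    }
    long remaining = remaining(pos);
    if (remaining <= maxPaddingSize) {
      return rowGroupSize; // the gap will be padded, so a full row group follows it
    }
    return Math.min(remaining, rowGroupSize); // shrink so the row group ends at the boundary
  }
}
```

With blockSize = 100, rowGroupSize = 100, and maxPaddingSize = 10, a row group that would start at position 95 is preceded by 5 bytes of padding, while one starting at position 50 gets a 50-byte budget so that it ends at the boundary.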
      • ParquetFileWriter

        @Deprecated
        public ParquetFileWriter(org.apache.parquet.io.OutputFile file,
                                 org.apache.parquet.schema.MessageType schema,
                                 ParquetFileWriter.Mode mode,
                                 long rowGroupSize,
                                 int maxPaddingSize)
                          throws IOException
        Deprecated.
        will be removed in 2.0.0
        Parameters:
        file - OutputFile to create or overwrite
        schema - the schema of the data
        mode - file creation mode
        rowGroupSize - the target row group size, in bytes
        maxPaddingSize - the maximum padding, in bytes
        Throws:
        IOException - if the file cannot be created
      • ParquetFileWriter

        public ParquetFileWriter(org.apache.parquet.io.OutputFile file,
                                 org.apache.parquet.schema.MessageType schema,
                                 ParquetFileWriter.Mode mode,
                                 long rowGroupSize,
                                 int maxPaddingSize,
                                 int columnIndexTruncateLength,
                                 int statisticsTruncateLength,
                                 boolean pageWriteChecksumEnabled)
                          throws IOException
        Parameters:
        file - OutputFile to create or overwrite
        schema - the schema of the data
        mode - file creation mode
        rowGroupSize - the target row group size, in bytes
        maxPaddingSize - the maximum padding, in bytes
        columnIndexTruncateLength - the length to which the writer tries to truncate the min/max values in column indexes
        statisticsTruncateLength - the length to which the writer tries to truncate the min/max values in row group statistics
        pageWriteChecksumEnabled - whether to write page-level checksums
        Throws:
        IOException - if the file cannot be created
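The truncate lengths are targets the writer tries to reach, because a truncated maximum must remain an upper bound and that is not always possible. The hypothetical sketch below shows one way to truncate raw byte values under unsigned lexicographic ordering: a minimum can simply be cut to a prefix, while a maximum is cut and its last non-0xFF byte incremented; if every byte of the prefix is 0xFF, no shorter upper bound exists and the value is kept whole. BoundTruncation is an illustrative name; Parquet's own truncation code is not reproduced here and also handles cases such as keeping UTF-8 strings valid, which this sketch ignores.

```java
import java.util.Arrays;

/** Hypothetical sketch of min/max truncation for unsigned lexicographic byte order. */
final class BoundTruncation {
  private BoundTruncation() {}

  /** A prefix never compares greater than the original, so it is a valid lower bound. */
  static byte[] truncateMin(byte[] min, int length) {
    return min.length <= length ? min : Arrays.copyOf(min, length);
  }

  /**
   * Returns a value of at most {@code length} bytes that compares >= max, or max itself
   * when no such value exists (every byte in the first {@code length} bytes is 0xFF).
   */
  static byte[] truncateMax(byte[] max, int length) {
    if (max.length <= length) {
      return max;
    }
    for (int i = length - 1; i >= 0; i--) {
      if ((max[i] & 0xFF) != 0xFF) {
        byte[] upper = Arrays.copyOf(max, i + 1);
        upper[i]++; // bump the last non-0xFF byte so the result sorts after max
        return upper;
      }
    }
    return max; // truncation not possible; keep the full value
  }
}
```

For example, with a length of 2, the maximum "abcd" becomes "ac" and the minimum "abcd" becomes "ab".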
      • ParquetFileWriter

        public ParquetFileWriter(org.apache.parquet.io.OutputFile file,
                                 org.apache.parquet.schema.MessageType schema,
                                 ParquetFileWriter.Mode mode,
                                 long rowGroupSize,
                                 int maxPaddingSize,
                                 int columnIndexTruncateLength,
                                 int statisticsTruncateLength,
                                 boolean pageWriteChecksumEnabled,
                                 FileEncryptionProperties encryptionProperties)
                          throws IOException
        Throws:
        IOException
      • ParquetFileWriter

        public ParquetFileWriter(org.apache.parquet.io.OutputFile file,
                                 org.apache.parquet.schema.MessageType schema,
                                 ParquetFileWriter.Mode mode,
                                 long rowGroupSize,
                                 int maxPaddingSize,
                                 int columnIndexTruncateLength,
                                 int statisticsTruncateLength,
                                 boolean pageWriteChecksumEnabled,
                                 InternalFileEncryptor encryptor)
                          throws IOException
        Throws:
        IOException
    • Method Detail

      • start

        public void start()
                   throws IOException
        Starts the file. Must be called once, before the first call to startBlock.
        Throws:
        IOException - if there is an error while writing
      • startBlock

        public void startBlock(long recordCount)
                        throws IOException
        Starts a block (row group).
        Parameters:
        recordCount - the number of records in this block
        Throws:
        IOException - if there is an error while writing
      • startColumn

        public void startColumn(org.apache.parquet.column.ColumnDescriptor descriptor,
                                long valueCount,
                                org.apache.parquet.hadoop.metadata.CompressionCodecName compressionCodecName)
                         throws IOException
        Starts a column chunk inside a block.
        Parameters:
        descriptor - the column descriptor
        valueCount - the number of values in this column chunk
        compressionCodecName - the compression codec used for this column chunk
        Throws:
        IOException - if there is an error while writing
      • writeDictionaryPage

        public void writeDictionaryPage(org.apache.parquet.column.page.DictionaryPage dictionaryPage)
                                 throws IOException
        Writes a dictionary page.
        Parameters:
        dictionaryPage - the dictionary page
        Throws:
        IOException - if there is an error while writing
      • writeDictionaryPage

        public void writeDictionaryPage(org.apache.parquet.column.page.DictionaryPage dictionaryPage,
                                        org.apache.parquet.format.BlockCipher.Encryptor headerBlockEncryptor,
                                        byte[] AAD)
                                 throws IOException
        Throws:
        IOException
      • writeDataPage

        @Deprecated
        public void writeDataPage(int valueCount,
                                  int uncompressedPageSize,
                                  org.apache.parquet.bytes.BytesInput bytes,
                                  org.apache.parquet.column.Encoding rlEncoding,
                                  org.apache.parquet.column.Encoding dlEncoding,
                                  org.apache.parquet.column.Encoding valuesEncoding)
                           throws IOException
        Deprecated.
        Writes a single page
        Parameters:
        valueCount - count of values
        uncompressedPageSize - the size of the data once uncompressed
        bytes - the compressed data for the page without header
        rlEncoding - encoding of the repetition level
        dlEncoding - encoding of the definition level
        valuesEncoding - encoding of values
        Throws:
        IOException - if there is an error while writing
      • writeDataPage

        @Deprecated
        public void writeDataPage(int valueCount,
                                  int uncompressedPageSize,
                                  org.apache.parquet.bytes.BytesInput bytes,
                                  org.apache.parquet.column.statistics.Statistics statistics,
                                  org.apache.parquet.column.Encoding rlEncoding,
                                  org.apache.parquet.column.Encoding dlEncoding,
                                  org.apache.parquet.column.Encoding valuesEncoding)
                           throws IOException
        Deprecated.
        this method does not support writing column indexes; use writeDataPage(int, int, BytesInput, Statistics, long, Encoding, Encoding, Encoding) instead
        Writes a single page
        Parameters:
        valueCount - count of values
        uncompressedPageSize - the size of the data once uncompressed
        bytes - the compressed data for the page without header
        statistics - statistics for the page
        rlEncoding - encoding of the repetition level
        dlEncoding - encoding of the definition level
        valuesEncoding - encoding of values
        Throws:
        IOException - if there is an error while writing
      • writeDataPage

        public void writeDataPage(int valueCount,
                                  int uncompressedPageSize,
                                  org.apache.parquet.bytes.BytesInput bytes,
                                  org.apache.parquet.column.statistics.Statistics statistics,
                                  long rowCount,
                                  org.apache.parquet.column.Encoding rlEncoding,
                                  org.apache.parquet.column.Encoding dlEncoding,
                                  org.apache.parquet.column.Encoding valuesEncoding)
                           throws IOException
        Writes a single page
        Parameters:
        valueCount - count of values
        uncompressedPageSize - the size of the data once uncompressed
        bytes - the compressed data for the page without header
        statistics - the statistics of the page
        rowCount - the number of rows in the page
        rlEncoding - encoding of the repetition level
        dlEncoding - encoding of the definition level
        valuesEncoding - encoding of values
        Throws:
        IOException - if any I/O error occurs while writing the file
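When the writer is constructed with pageWriteChecksumEnabled set to true, pages are written with a checksum in the page header. The Parquet format defines that checksum as a standard CRC32 over the page bytes as written to disk, excluding the page header, and stores it as a 32-bit signed integer. The sketch below computes such a value with java.util.zip.CRC32 for a page body such as the bytes argument above; PageChecksum is a hypothetical helper, not part of Parquet.

```java
import java.util.zip.CRC32;

/** Hypothetical helper: CRC32 of a page body, as stored in a Parquet page header. */
final class PageChecksum {
  private PageChecksum() {}

  static int crcOf(byte[] pageBody) {
    CRC32 crc = new CRC32();
    crc.update(pageBody, 0, pageBody.length);
    // The header field is a signed 32-bit int, so keep only the low 32 bits.
    return (int) crc.getValue();
  }
}
```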
      • writeDataPage

        public void writeDataPage(int valueCount,
                                  int uncompressedPageSize,
                                  org.apache.parquet.bytes.BytesInput bytes,
                                  org.apache.parquet.column.statistics.Statistics statistics,
                                  long rowCount,
                                  org.apache.parquet.column.Encoding rlEncoding,
                                  org.apache.parquet.column.Encoding dlEncoding,
                                  org.apache.parquet.column.Encoding valuesEncoding,
                                  org.apache.parquet.format.BlockCipher.Encryptor metadataBlockEncryptor,
                                  byte[] pageHeaderAAD)
                           throws IOException
        Writes a single page
        Parameters:
        valueCount - count of values
        uncompressedPageSize - the size of the data once uncompressed
        bytes - the compressed data for the page without header
        statistics - the statistics of the page
        rowCount - the number of rows in the page
        rlEncoding - encoding of the repetition level
        dlEncoding - encoding of the definition level
        valuesEncoding - encoding of values
        metadataBlockEncryptor - encryptor for the page header
        pageHeaderAAD - additional authenticated data (AAD) for the page header
        Throws:
        IOException - if any I/O error occurs while writing the file
      • writeDataPage

        public void writeDataPage(int valueCount,
                                  int uncompressedPageSize,
                                  org.apache.parquet.bytes.BytesInput bytes,
                                  org.apache.parquet.column.statistics.Statistics statistics,
                                  org.apache.parquet.column.Encoding rlEncoding,
                                  org.apache.parquet.column.Encoding dlEncoding,
                                  org.apache.parquet.column.Encoding valuesEncoding,
                                  org.apache.parquet.format.BlockCipher.Encryptor metadataBlockEncryptor,
                                  byte[] pageHeaderAAD)
                           throws IOException
        Writes a single page
        Parameters:
        valueCount - count of values
        uncompressedPageSize - the size of the data once uncompressed
        bytes - the compressed data for the page without header
        statistics - statistics for the page
        rlEncoding - encoding of the repetition level
        dlEncoding - encoding of the definition level
        valuesEncoding - encoding of values
        metadataBlockEncryptor - encryptor for the page header
        pageHeaderAAD - additional authenticated data (AAD) for the page header
        Throws:
        IOException - if there is an error while writing
      • writeDataPageV2

        public void writeDataPageV2(int rowCount,
                                    int nullCount,
                                    int valueCount,
                                    org.apache.parquet.bytes.BytesInput repetitionLevels,
                                    org.apache.parquet.bytes.BytesInput definitionLevels,
                                    org.apache.parquet.column.Encoding dataEncoding,
                                    org.apache.parquet.bytes.BytesInput compressedData,
                                    int uncompressedDataSize,
                                    org.apache.parquet.column.statistics.Statistics<?> statistics)
                             throws IOException
        Writes a single v2 data page
        Parameters:
        rowCount - count of rows
        nullCount - count of nulls
        valueCount - count of values
        repetitionLevels - repetition level bytes
        definitionLevels - definition level bytes
        dataEncoding - encoding for data
        compressedData - compressed data bytes
        uncompressedDataSize - the size of uncompressed data
        statistics - the statistics of the page
        Throws:
        IOException - if any I/O error occurs while writing the file
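In a v2 data page, repetition and definition levels are stored uncompressed in front of the (possibly compressed) values, which is why writeDataPageV2 takes them as separate BytesInput arguments alongside compressedData and uncompressedDataSize. Following the Parquet format's definition of the page header, the level bytes therefore count toward both the compressed and the uncompressed page size. The sketch below works through that arithmetic; PageV2Sizes is a hypothetical helper, not a call into the library.

```java
/** Hypothetical helper: header sizes for a v2 data page, per the Parquet format. */
final class PageV2Sizes {
  final int repetitionLevelsByteLength;
  final int definitionLevelsByteLength;
  final int compressedPageSize;
  final int uncompressedPageSize;

  PageV2Sizes(long rlBytes, long dlBytes, long compressedDataBytes, long uncompressedDataSize) {
    // Header fields are 32-bit ints; toIntExact throws ArithmeticException on overflow.
    this.repetitionLevelsByteLength = Math.toIntExact(rlBytes);
    this.definitionLevelsByteLength = Math.toIntExact(dlBytes);
    // Levels are never compressed in v2 pages, so they count toward both sizes.
    this.compressedPageSize = Math.toIntExact(rlBytes + dlBytes + compressedDataBytes);
    this.uncompressedPageSize = Math.toIntExact(rlBytes + dlBytes + uncompressedDataSize);
  }
}
```

For an optional, non-repeated column with no repetition levels, 16 bytes of definition levels, and values that compress from 1000 to 300 bytes, the header sizes come out to 316 compressed and 1016 uncompressed bytes.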
      • endColumn

        public void endColumn()
                       throws IOException
        Ends a column once all repetition levels, definition levels, and data have been written.
        Throws:
        IOException - if there is an error while writing
      • endBlock

        public void endBlock()
                      throws IOException
        Ends a block once all column chunks have been written.
        Throws:
        IOException - if there is an error while writing
      • appendFile

        @Deprecated
        public void appendFile(org.apache.hadoop.conf.Configuration conf,
                               org.apache.hadoop.fs.Path file)
                        throws IOException
        Deprecated.
        will be removed in 2.0.0; use appendFile(InputFile) instead
        Parameters:
        conf - a configuration
        file - path of a file whose contents are appended to this file
        Throws:
        IOException - if there is an error while reading or writing
      • appendFile

        public void appendFile(org.apache.parquet.io.InputFile file)
                        throws IOException
        Throws:
        IOException
      • appendRowGroups

        @Deprecated
        public void appendRowGroups(org.apache.hadoop.fs.FSDataInputStream file,
                                    List<BlockMetaData> rowGroups,
                                    boolean dropColumns)
                             throws IOException
        Deprecated.
        will be removed in 2.0.0; use appendRowGroups(SeekableInputStream,List,boolean) instead
        Parameters:
        file - a file stream to read from
        rowGroups - row groups to copy
        dropColumns - whether to drop columns of the source file that are not in this file's schema
        Throws:
        IOException - if there is an error while reading or writing
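When row groups are copied, this file's schema decides which column chunks are written and in what order. As we understand the implementation, every column of the target schema must be present in the source row group, and source columns that are missing from the target schema are either skipped (dropColumns set to true) or cause the copy to fail. The sketch below models that selection with plain column-path strings; ColumnSelection and selectColumns are hypothetical names, and the exception messages are illustrative.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical model of how copied column chunks are matched to the target schema. */
final class ColumnSelection {
  private ColumnSelection() {}

  /** Returns the source column paths reordered to match the target schema. */
  static List<String> selectColumns(List<String> targetSchema, List<String> sourceColumns,
                                    boolean dropColumns) {
    Set<String> remaining = new LinkedHashSet<>(sourceColumns);
    List<String> inOrder = new ArrayList<>();
    for (String path : targetSchema) {
      // Every target column must exist in the source row group.
      if (!remaining.remove(path)) {
        throw new IllegalArgumentException("Missing column '" + path + "' in source row group");
      }
      inOrder.add(path);
    }
    // Anything left over exists in the source but not in the target schema.
    if (!dropColumns && !remaining.isEmpty()) {
      throw new IllegalArgumentException("Columns not in target schema: " + remaining);
    }
    return inOrder;
  }
}
```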
      • appendRowGroup

        public void appendRowGroup(org.apache.parquet.io.SeekableInputStream from,
                                   BlockMetaData rowGroup,
                                   boolean dropColumns)
                            throws IOException
        Throws:
        IOException
      • appendColumnChunk

        public void appendColumnChunk(org.apache.parquet.column.ColumnDescriptor descriptor,
                                      org.apache.parquet.io.SeekableInputStream from,
                                      ColumnChunkMetaData chunk,
                                      org.apache.parquet.column.values.bloomfilter.BloomFilter bloomFilter,
                                      org.apache.parquet.internal.column.columnindex.ColumnIndex columnIndex,
                                      org.apache.parquet.internal.column.columnindex.OffsetIndex offsetIndex)
                               throws IOException
        Parameters:
        descriptor - the descriptor for the target column
        from - a file stream to read from
        chunk - the column chunk to be copied
        bloomFilter - the bloomFilter for this chunk
        columnIndex - the column index for this chunk
        offsetIndex - the offset index for this chunk
        Throws:
        IOException
      • end

        public void end(Map<String,String> extraMetaData)
                 throws IOException
        Ends a file once all blocks have been written, then closes the file.
        Parameters:
        extraMetaData - extra key-value metadata to write in the footer
        Throws:
        IOException - if there is an error while writing
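start() and end(Map) frame the file. start() writes the 4-byte magic PAR1 (PARE when the footer is encrypted), and end(...) writes the serialized footer, the footer length as a 4-byte little-endian integer, and the magic again; this trailer is what lets a reader find the footer from the end of the file. The stdlib-only sketch below builds and parses that outer frame around opaque placeholder bytes; ParquetFrame is a hypothetical name, the footer bytes stand in for the Thrift-encoded FileMetaData, and structures such as column indexes, offset indexes, and Bloom filters are omitted.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical sketch of the outer Parquet frame: magic, data, footer, length, magic. */
final class ParquetFrame {
  static final byte[] MAGIC = "PAR1".getBytes(StandardCharsets.US_ASCII);

  private ParquetFrame() {}

  /** Lays out magic + row-group bytes + footer + 4-byte little-endian footer length + magic. */
  static byte[] frame(byte[] rowGroups, byte[] footer) {
    ByteBuffer buf = ByteBuffer.allocate(MAGIC.length * 2 + rowGroups.length + footer.length + 4)
        .order(ByteOrder.LITTLE_ENDIAN);
    buf.put(MAGIC).put(rowGroups).put(footer).putInt(footer.length).put(MAGIC);
    return buf.array();
  }

  /** Recovers the footer by reading the length stored just before the trailing magic. */
  static byte[] readFooter(byte[] file) {
    int tail = file.length - MAGIC.length;
    if (!Arrays.equals(Arrays.copyOfRange(file, tail, file.length), MAGIC)) {
      throw new IllegalArgumentException("missing trailing magic");
    }
    int footerLength = ByteBuffer.wrap(file, tail - 4, 4).order(ByteOrder.LITTLE_ENDIAN).getInt();
    int footerStart = tail - 4 - footerLength;
    if (footerStart < MAGIC.length) {
      throw new IllegalArgumentException("invalid footer length: " + footerLength);
    }
    return Arrays.copyOfRange(file, footerStart, tail - 4);
  }
}
```

For a 2-byte data section and a 3-byte footer, the frame is 17 bytes long and the little-endian length field starting at offset 9 reads 3.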
      • mergeMetadataFiles

        @Deprecated
        public static ParquetMetadata mergeMetadataFiles(List<org.apache.hadoop.fs.Path> files,
                                                         org.apache.hadoop.conf.Configuration conf)
                                                  throws IOException
        Deprecated.
        metadata files are not recommended and will be removed in 2.0.0
        Given a list of metadata files, merges them into a single ParquetMetadata. Requires that the schemas be compatible and the extraMetadata be exactly equal.
        Parameters:
        files - a list of files to merge metadata from
        conf - a configuration
        Returns:
        merged parquet metadata for the files
        Throws:
        IOException - if there is an error while writing
      • mergeMetadataFiles

        @Deprecated
        public static ParquetMetadata mergeMetadataFiles(List<org.apache.hadoop.fs.Path> files,
                                                         org.apache.hadoop.conf.Configuration conf,
                                                         KeyValueMetadataMergeStrategy keyValueMetadataMergeStrategy)
                                                  throws IOException
        Deprecated.
        metadata files are not recommended and will be removed in 2.0.0
        Given a list of metadata files, merges them into a single ParquetMetadata. Requires that the schemas be compatible and the extraMetadata be exactly equal.
        Parameters:
        files - a list of files to merge metadata from
        conf - a configuration
        keyValueMetadataMergeStrategy - strategy for merging the values of a key that appears with more than one value
        Returns:
        merged parquet metadata for the files
        Throws:
        IOException - if there is an error while writing
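The keyValueMetadataMergeStrategy parameter decides how to combine footers that carry different values for the same key-value metadata key. The sketch below models two plausible policies over a map from each key to the set of distinct values seen for it: a strict policy that fails on any conflict, which matches the exact-equality requirement described above, and a policy that joins the values. MergePolicy, StrictMergePolicy, and JoiningMergePolicy are hypothetical stand-ins; consult the KeyValueMetadataMergeStrategy implementations in your Parquet version for the actual behavior.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical merge policy: collapse each key's set of values into a single value. */
interface MergePolicy {
  Map<String, String> merge(Map<String, Set<String>> valuesByKey);
}

/** Fails unless every key maps to exactly one distinct value. */
final class StrictMergePolicy implements MergePolicy {
  @Override
  public Map<String, String> merge(Map<String, Set<String>> valuesByKey) {
    Map<String, String> merged = new TreeMap<>();
    for (Map.Entry<String, Set<String>> e : valuesByKey.entrySet()) {
      if (e.getValue().size() != 1) {
        throw new IllegalStateException("conflicting values for key " + e.getKey() + ": " + e.getValue());
      }
      merged.put(e.getKey(), e.getValue().iterator().next());
    }
    return merged;
  }
}

/** Joins all distinct values of a key, in sorted order, with a delimiter. */
final class JoiningMergePolicy implements MergePolicy {
  private final String delimiter;

  JoiningMergePolicy(String delimiter) {
    this.delimiter = delimiter;
  }

  @Override
  public Map<String, String> merge(Map<String, Set<String>> valuesByKey) {
    Map<String, String> merged = new TreeMap<>();
    for (Map.Entry<String, Set<String>> e : valuesByKey.entrySet()) {
      merged.put(e.getKey(), String.join(delimiter, new TreeSet<>(e.getValue())));
    }
    return merged;
  }
}
```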
      • writeMergedMetadataFile

        @Deprecated
        public static void writeMergedMetadataFile(List<org.apache.hadoop.fs.Path> files,
                                                   org.apache.hadoop.fs.Path outputPath,
                                                   org.apache.hadoop.conf.Configuration conf)
                                            throws IOException
        Deprecated.
        metadata files are not recommended and will be removed in 2.0.0
        Given a list of metadata files, merges them into a single metadata file. Requires that the schemas be compatible and the extraMetaData be exactly equal. This is useful when merging two directories of Parquet files into a single directory, as long as both directories were written with compatible schemas and equal extraMetaData.
        Parameters:
        files - a list of files to merge metadata from
        outputPath - path to write merged metadata to
        conf - a configuration
        Throws:
        IOException - if there is an error while reading or writing
      • writeMetadataFile

        @Deprecated
        public static void writeMetadataFile(org.apache.hadoop.conf.Configuration configuration,
                                             org.apache.hadoop.fs.Path outputPath,
                                             List<Footer> footers)
                                      throws IOException
        Deprecated.
        metadata files are not recommended and will be removed in 2.0.0
        Writes the _metadata and _common_metadata files.
        Parameters:
        configuration - the configuration to use to get the FileSystem
        outputPath - the directory to write the _metadata file to
        footers - the list of footers to merge
        Throws:
        IOException - if there is an error while writing
      • writeMetadataFile

        @Deprecated
        public static void writeMetadataFile(org.apache.hadoop.conf.Configuration configuration,
                                             org.apache.hadoop.fs.Path outputPath,
                                             List<Footer> footers,
                                             ParquetOutputFormat.JobSummaryLevel level)
                                      throws IOException
        Deprecated.
        metadata files are not recommended and will be removed in 2.0.0
        Writes the _common_metadata file and, depending on the ParquetOutputFormat.JobSummaryLevel provided, a _metadata file.
        Parameters:
        configuration - the configuration to use to get the FileSystem
        outputPath - the directory to write the _metadata file to
        footers - the list of footers to merge
        level - level of summary to write
        Throws:
        IOException - if there is an error while writing
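The level argument selects which summary files are produced. ParquetOutputFormat.JobSummaryLevel defines ALL, COMMON_ONLY, and NONE; as we read the implementation, ALL writes both _metadata and _common_metadata, COMMON_ONLY writes only _common_metadata (the merged schema and key-value metadata without row-group information), and NONE is rejected by this method. The sketch below encodes that reading with a local Level enum standing in for the real one; SummaryFiles is a hypothetical name, and the behavior should be checked against your Parquet version.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of which summary files each job summary level produces. */
final class SummaryFiles {
  /** Local stand-in for ParquetOutputFormat.JobSummaryLevel. */
  enum Level { ALL, COMMON_ONLY, NONE }

  private SummaryFiles() {}

  static List<String> filesFor(Level level) {
    if (level == Level.NONE) {
      throw new IllegalArgumentException("Unsupported level: " + level);
    }
    List<String> files = new ArrayList<>();
    if (level == Level.ALL) {
      files.add("_metadata"); // merged footers, including row-group (block) metadata
    }
    files.add("_common_metadata"); // merged schema and key-value metadata, no row groups
    return files;
  }
}
```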
      • getPos

        public long getPos()
                    throws IOException
        Returns:
        the current position in the underlying file
        Throws:
        IOException - if there is an error while getting the current stream's position