Interface FileOutputFormatBuilder.OutputOptions<T>

    • Method Detail

      • compression

        FileOutputFormatBuilder.OutputOptions<T> compression(String compressionType)
        Sets the compression type to use for data blocks, overriding the default. Specifying a compression type may require additional libraries to be available to your Job.
        Parameters:
        compressionType - one of "none", "gz", "bzip2", "lzo", "lz4", "snappy", or "zstd"
      • dataBlockSize

        FileOutputFormatBuilder.OutputOptions<T> dataBlockSize(long dataBlockSize)
        Sets the size for data blocks within each file.
        A data block is a span of key/value pairs stored in the file that is compressed and indexed as a group.

        Making this value smaller may increase seek performance, but at the cost of increasing the size of the indexes (which can also affect seek performance).

        Parameters:
        dataBlockSize - the block size, in bytes
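        The trade-off described above can be made concrete with a little arithmetic. The following is only a sketch: it assumes one index entry per data block and that every block is filled to exactly dataBlockSize, and the class name DataBlockEstimate and the sizes used are hypothetical, not part of this API.

```java
// Sketch only: estimates how many index entries a file needs,
// ASSUMING one index entry per data block and fully filled blocks.
final class DataBlockEstimate {

    /** Number of data blocks (and, by assumption, index entries) for totalBytes of key/value data. */
    static long indexEntries(long totalBytes, long dataBlockSize) {
        if (totalBytes < 0 || dataBlockSize <= 0) {
            throw new IllegalArgumentException("totalBytes must be >= 0 and dataBlockSize > 0");
        }
        // Ceiling division: a partially filled final block still needs an entry.
        return (totalBytes + dataBlockSize - 1) / dataBlockSize;
    }

    public static void main(String[] args) {
        long totalBytes = 1L << 30; // hypothetical 1 GiB of key/value data
        for (long blockSize : new long[] {16 * 1024, 128 * 1024, 1024 * 1024}) {
            System.out.println(blockSize + " byte blocks -> "
                    + indexEntries(totalBytes, blockSize) + " index entries");
        }
    }
}
```

        Under these assumptions, shrinking the block size by a factor of 64 multiplies the number of index entries by 64, which is the index growth the description warns about.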
      • fileBlockSize

        FileOutputFormatBuilder.OutputOptions<T> fileBlockSize(long fileBlockSize)
        Sets the size for file blocks in the file system; file blocks are managed, and replicated, by the underlying file system.
        Parameters:
        fileBlockSize - the block size, in bytes
      • indexBlockSize

        FileOutputFormatBuilder.OutputOptions<T> indexBlockSize(long indexBlockSize)
        Sets the size for index blocks within each file. Smaller blocks mean a deeper index hierarchy within the file, while larger blocks mean a shallower one. This can affect the performance of queries.
        Parameters:
        indexBlockSize - the block size, in bytes
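        To illustrate how index block size relates to index depth, here is a rough sketch. It assumes a fixed size per index entry and a simple multi-level index in which each level must fit the pointers to the level below; the real file format may differ. The class name IndexDepthEstimate and all numbers are hypothetical.

```java
// Sketch only: estimates index depth, ASSUMING fixed-size index entries
// and a simple tree in which each index block holds fanOut entries.
final class IndexDepthEstimate {

    private static long ceilDiv(long a, long b) {
        return (a + b - 1) / b;
    }

    /** Number of index levels needed until the top level fits in a single index block. */
    static int levels(long entries, long entryBytes, long indexBlockSize) {
        if (entries < 1 || entryBytes < 1) {
            throw new IllegalArgumentException("entries and entryBytes must be positive");
        }
        long fanOut = indexBlockSize / entryBytes; // entries per index block
        if (fanOut < 2) {
            throw new IllegalArgumentException("an index block must hold at least two entries");
        }
        int levels = 1;
        long blocks = ceilDiv(entries, fanOut);
        while (blocks > 1) {
            // Each extra level indexes the blocks of the level below it.
            blocks = ceilDiv(blocks, fanOut);
            levels++;
        }
        return levels;
    }

    public static void main(String[] args) {
        long entries = 65_536; // hypothetical number of data blocks to index
        long entryBytes = 64;  // hypothetical average index entry size
        System.out.println("128 KiB index blocks: " + levels(entries, entryBytes, 128 * 1024) + " levels");
        System.out.println("8 KiB index blocks:   " + levels(entries, entryBytes, 8 * 1024) + " levels");
    }
}
```

        With these hypothetical numbers, 128 KiB index blocks give two levels and 8 KiB index blocks give three, matching the direction described above: smaller index blocks, deeper hierarchy.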
      • replication

        FileOutputFormatBuilder.OutputOptions<T> replication(int replication)
        Sets the file system replication factor for the resulting file, overriding the file system default.
        Parameters:
        replication - the number of replicas for produced files
      • sampler

        FileOutputFormatBuilder.OutputOptions<T> sampler(SamplerConfiguration samplerConfig)
        Specifies a sampler to use when writing out data. The output file will then contain sample data.
        Parameters:
        samplerConfig - the configuration for creating sample data in the output file
      • store

        void store(T job)
        Finishes configuring, then verifies and serializes the options into the Job or JobConf.
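
        Because every option method returns OutputOptions<T> and store returns void, the options are typically chained and store is called last. The sketch below shows that pattern with a hypothetical stand-in class, OutputOptionsSketch, so it can run without Accumulo or Hadoop on the classpath. The Map standing in for the job, the option keys, and the compression check performed in store are all assumptions made for illustration, not the library's implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in, NOT the Accumulo implementation: it mirrors the
// documented method names so the chaining pattern can be run on its own.
final class OutputOptionsSketch {
    // Names taken from the compressionType parameter description.
    private static final Set<String> COMPRESSION_TYPES =
            Set.of("none", "gz", "bzip2", "lzo", "lz4", "snappy", "zstd");

    // Option keys are invented for this sketch.
    private final Map<String, String> options = new LinkedHashMap<>();

    OutputOptionsSketch compression(String compressionType) {
        options.put("compression", compressionType);
        return this;
    }

    OutputOptionsSketch dataBlockSize(long bytes) {
        options.put("dataBlockSize", Long.toString(bytes));
        return this;
    }

    OutputOptionsSketch indexBlockSize(long bytes) {
        options.put("indexBlockSize", Long.toString(bytes));
        return this;
    }

    OutputOptionsSketch replication(int replicas) {
        options.put("replication", Integer.toString(replicas));
        return this;
    }

    /** Verifies the collected options, then copies them into the stand-in job. */
    void store(Map<String, String> job) {
        String compression = options.get("compression");
        if (compression != null && !COMPRESSION_TYPES.contains(compression)) {
            throw new IllegalArgumentException("unknown compression type: " + compression);
        }
        job.putAll(options);
    }

    public static void main(String[] args) {
        Map<String, String> job = new LinkedHashMap<>(); // stands in for a Job or JobConf
        new OutputOptionsSketch()
                .compression("snappy")
                .dataBlockSize(64 * 1024)
                .replication(3)
                .store(job); // terminal call: nothing reaches the job before this
        System.out.println(job);
    }
}
```

        Deferring all writes to store keeps the job untouched until the whole set of options has been checked, which is one plausible reason for a terminal method of this kind.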