Package

com.indix.utils.spark

parquet

package parquet

Type Members

  1. class DirectParquetOutputCommitter extends ParquetOutputCommitter

    An output committer for writing Parquet files. Instead of writing to the _temporary folder as parquet.hadoop.ParquetOutputCommitter does, this output committer writes data directly to the destination folder. This can be useful for data stored in S3, where directory operations are relatively expensive.

    To enable this output committer, set the "spark.sql.parquet.output.committer.class" property in the Hadoop org.apache.hadoop.conf.Configuration. Note that this property overrides "spark.sql.sources.outputCommitterClass".
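
    As a minimal sketch of the configuration step described above, the property can be set on a Hadoop Configuration as shown here. The object and method names (EnableDirectCommitter, enable) are hypothetical, and the fully qualified class name is an assumption derived from the package shown on this page.

    ```scala
    import org.apache.hadoop.conf.Configuration

    // Hypothetical helper; not part of the documented package.
    object EnableDirectCommitter {
      // Property name quoted in the class description.
      val CommitterKey = "spark.sql.parquet.output.committer.class"

      // Assumed fully qualified name: package com.indix.utils.spark.parquet plus the class name.
      val DirectCommitter = "com.indix.utils.spark.parquet.DirectParquetOutputCommitter"

      // Sets the committer property on the given configuration and returns it.
      def enable(conf: Configuration): Configuration = {
        conf.set(CommitterKey, DirectCommitter)
        conf
      }
    }
    ```

    In a Spark application this would typically be applied to the context's Hadoop configuration before any Parquet write, for example EnableDirectCommitter.enable(sc.hadoopConfiguration), where sc is an existing SparkContext.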

    *NOTE*

    NEVER use DirectParquetOutputCommitter when appending data, because there is currently no safe way to undo a failed append job (which is why both abortTask() and abortJob() are left empty).
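
    One way to respect this restriction is to choose the committer based on the save mode. The sketch below is an illustration only: CommitterGuard and configureCommitter are hypothetical names, and it assumes that removing the property lets Spark fall back to its default Parquet committer.

    ```scala
    import org.apache.hadoop.conf.Configuration
    import org.apache.spark.sql.SaveMode

    // Hypothetical guard; not part of the documented package.
    object CommitterGuard {
      val CommitterKey = "spark.sql.parquet.output.committer.class"

      // Assumed fully qualified name of the direct committer.
      val DirectCommitter = "com.indix.utils.spark.parquet.DirectParquetOutputCommitter"

      // Enables the direct committer for every save mode except Append.
      // For Append the property is removed, so no direct committer is configured.
      def configureCommitter(conf: Configuration, mode: SaveMode): Unit =
        if (mode == SaveMode.Append) conf.unset(CommitterKey)
        else conf.set(CommitterKey, DirectCommitter)
    }
    ```

    Calling CommitterGuard.configureCommitter(conf, SaveMode.Overwrite) sets the property, while passing SaveMode.Append clears it.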

Value Members

  1. package avro
