Class

com.indix.utils.spark.parquet

DirectParquetOutputCommitter

class DirectParquetOutputCommitter extends ParquetOutputCommitter

An output committer for writing Parquet files. Instead of writing to the _temporary folder, as parquet.hadoop.ParquetOutputCommitter does, this output committer writes data directly to the destination folder. This can be useful for data stored in S3, where directory operations are relatively expensive: S3 has no native rename, so moving files out of _temporary means copying and then deleting every object.
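To make the difference between the two strategies concrete, here is a toy model that uses only java.nio.file on the local filesystem. It is a sketch under stated assumptions, not the committer's implementation: it does not touch Hadoop or Parquet, the names CommitStrategies, writeViaTemporary and writeDirect are hypothetical, and the real FileOutputCommitter stages output in per-job and per-task attempt directories under _temporary, which this sketch collapses into a single move.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path, StandardCopyOption}

// Toy, local-filesystem model of the two commit strategies (hypothetical names;
// no Hadoop or Parquet classes are involved).
object CommitStrategies {

  // Staged commit: write under <dest>/_temporary, then move the file into <dest>.
  def writeViaTemporary(dest: Path, fileName: String, data: String): Path = {
    val tmpDir = dest.resolve("_temporary")
    Files.createDirectories(tmpDir)
    val tmpFile = Files.write(tmpDir.resolve(fileName), data.getBytes(StandardCharsets.UTF_8))
    // This move is the step that is a cheap rename on HDFS but a copy + delete on S3.
    val finalFile = Files.move(tmpFile, dest.resolve(fileName), StandardCopyOption.REPLACE_EXISTING)
    Files.delete(tmpDir) // remove the now-empty staging directory
    finalFile
  }

  // Direct commit: write straight into <dest>. There is no move step to pay for,
  // but also no staging area from which a failed write could be discarded.
  def writeDirect(dest: Path, fileName: String, data: String): Path = {
    Files.createDirectories(dest)
    Files.write(dest.resolve(fileName), data.getBytes(StandardCharsets.UTF_8))
  }
}
```

Both functions leave the same file in the destination; the difference is whether a staging directory exists whose contents can be thrown away when a task fails, which is the trade-off behind the NOTE on appending.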

To enable this output committer, users may set the "spark.sql.parquet.output.committer.class" property via the Hadoop org.apache.hadoop.conf.Configuration. Note that this property overrides "spark.sql.sources.outputCommitterClass".
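The configuration step can be sketched as follows. This is a minimal example, assuming Spark 2.x (SparkSession), the jar containing com.indix.utils.spark.parquet on the classpath, and an s3a filesystem already configured; the object name EnableDirectCommitter and the bucket path are hypothetical. It sets the key on the SparkContext's Hadoop Configuration, as described above.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}
import com.indix.utils.spark.parquet.DirectParquetOutputCommitter

// Hypothetical driver showing where the committer key is set.
object EnableDirectCommitter {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("direct-parquet-commit").getOrCreate()

    // Register the committer in the Hadoop Configuration used for writes.
    // Per the description above, this key overrides
    // "spark.sql.sources.outputCommitterClass".
    spark.sparkContext.hadoopConfiguration.set(
      "spark.sql.parquet.output.committer.class",
      classOf[DirectParquetOutputCommitter].getName)

    // Hypothetical S3 destination; overwrite rather than append (see the NOTE).
    spark.range(10).toDF("id")
      .write
      .mode(SaveMode.Overwrite)
      .parquet("s3a://example-bucket/output/")

    spark.stop()
  }
}
```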

*NOTE*

NEVER use DirectParquetOutputCommitter when appending data, because there is currently no safe way to undo a failed append job (which is why both abortTask() and abortJob() are left empty).
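One way to act on this warning is to check the write mode before a job starts. The sketch below is hypothetical and not part of this library: WriteMode is a local stand-in for Spark's SaveMode, and a plain Map stands in for the Hadoop Configuration so the example runs without Spark or Hadoop.

```scala
// Hypothetical pre-write check: refuse appends while the direct committer is configured.
// WriteMode stands in for Spark's SaveMode; conf stands in for the Hadoop Configuration.
object DirectCommitterGuard {
  val CommitterKey = "spark.sql.parquet.output.committer.class"
  val DirectCommitterClass = "com.indix.utils.spark.parquet.DirectParquetOutputCommitter"

  sealed trait WriteMode
  case object Overwrite extends WriteMode
  case object Append extends WriteMode

  // Left(reason) when the write would append through the direct committer, Right(()) otherwise.
  def check(conf: Map[String, String], mode: WriteMode): Either[String, Unit] =
    (conf.get(CommitterKey), mode) match {
      // DirectCommitterClass is a stable identifier, so this compares by equality.
      case (Some(DirectCommitterClass), Append) =>
        Left(s"$DirectCommitterClass cannot safely undo a failed append; use another committer")
      case _ => Right(())
    }
}
```

The design choice here is to fail fast before any data is written, rather than relying on abortTask() and abortJob() to clean up, since the NOTE states they do nothing for this committer.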

Linear Supertypes
ParquetOutputCommitter, FileOutputCommitter, OutputCommitter, AnyRef, Any

Instance Constructors

  1. new DirectParquetOutputCommitter(outputPath: Path, context: TaskAttemptContext)

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. val LOG: Log

  5. def abortJob(arg0: JobContext, arg1: State): Unit

    Definition Classes
    FileOutputCommitter → OutputCommitter
    Annotations
    @throws( classOf[java.io.IOException] )
  6. def abortTask(taskContext: TaskAttemptContext): Unit

    Definition Classes
    DirectParquetOutputCommitter → FileOutputCommitter → OutputCommitter
  7. def abortTask(arg0: TaskAttemptContext, arg1: Path): Unit

    Definition Classes
    FileOutputCommitter
    Annotations
    @Private() @throws( classOf[java.io.IOException] )
  8. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  9. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  10. def commitJob(jobContext: JobContext): Unit

    Definition Classes
    DirectParquetOutputCommitter → ParquetOutputCommitter → FileOutputCommitter → OutputCommitter
  11. def commitTask(taskContext: TaskAttemptContext): Unit

    Definition Classes
    DirectParquetOutputCommitter → FileOutputCommitter → OutputCommitter
  12. def commitTask(arg0: TaskAttemptContext, arg1: Path): Unit

    Definition Classes
    FileOutputCommitter
    Annotations
    @Private() @throws( classOf[java.io.IOException] )
  13. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  14. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  15. def finalize(): Unit

    Permalink
    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  16. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  17. def getCommittedTaskPath(arg0: Int, arg1: TaskAttemptContext): Path

    Attributes
    protected[org.apache.hadoop.mapreduce.lib.output]
    Definition Classes
    FileOutputCommitter
  18. def getCommittedTaskPath(arg0: TaskAttemptContext): Path

    Definition Classes
    FileOutputCommitter
  19. def getJobAttemptPath(arg0: Int): Path

    Attributes
    protected[org.apache.hadoop.mapreduce.lib.output]
    Definition Classes
    FileOutputCommitter
  20. def getJobAttemptPath(arg0: JobContext): Path

    Definition Classes
    FileOutputCommitter
  21. def getTaskAttemptPath(arg0: TaskAttemptContext): Path

    Definition Classes
    FileOutputCommitter
  22. def getWorkPath(): Path

    Definition Classes
    DirectParquetOutputCommitter → FileOutputCommitter
  23. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  24. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  25. def isRecoverySupported(arg0: JobContext): Boolean

    Definition Classes
    OutputCommitter
    Annotations
    @throws( classOf[java.io.IOException] )
  26. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  27. def needsTaskCommit(taskContext: TaskAttemptContext): Boolean

    Definition Classes
    DirectParquetOutputCommitter → FileOutputCommitter → OutputCommitter
  28. def needsTaskCommit(arg0: TaskAttemptContext, arg1: Path): Boolean

    Definition Classes
    FileOutputCommitter
    Annotations
    @Private() @throws( classOf[java.io.IOException] )
  29. final def notify(): Unit

    Definition Classes
    AnyRef
  30. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  31. def recoverTask(arg0: TaskAttemptContext): Unit

    Definition Classes
    FileOutputCommitter → OutputCommitter
    Annotations
    @throws( classOf[java.io.IOException] )
  32. def setupJob(jobContext: JobContext): Unit

    Definition Classes
    DirectParquetOutputCommitter → FileOutputCommitter → OutputCommitter
  33. def setupTask(taskContext: TaskAttemptContext): Unit

    Definition Classes
    DirectParquetOutputCommitter → FileOutputCommitter → OutputCommitter
  34. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  35. def toString(): String

    Definition Classes
    AnyRef → Any
  36. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  37. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  38. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )

Deprecated Value Members

  1. def cleanupJob(arg0: JobContext): Unit

    Definition Classes
    FileOutputCommitter → OutputCommitter
    Annotations
    @Deprecated @deprecated @throws( classOf[java.io.IOException] )
    Deprecated

    (Since version ) See the corresponding Javadoc for more information.

  2. def isRecoverySupported(): Boolean

    Definition Classes
    FileOutputCommitter → OutputCommitter
    Annotations
    @Deprecated @deprecated
    Deprecated

    (Since version ) See the corresponding Javadoc for more information.
