com.indix.utils.spark.parquet

DirectParquetOutputCommitter

class DirectParquetOutputCommitter extends ParquetOutputCommitter

An output committer for writing Parquet files. Instead of writing to the _temporary folder as parquet.hadoop.ParquetOutputCommitter does, this output committer writes data directly to the destination folder. This can be useful for data stored in S3, where directory operations are relatively expensive.

To enable this output committer, set the "spark.sql.parquet.output.committer.class" property via Hadoop's org.apache.hadoop.conf.Configuration. Note that this property overrides "spark.sql.sources.outputCommitterClass".
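
As a minimal sketch of the configuration step described above, the property can be set on a Hadoop Configuration as follows. The object name EnableDirectCommitter and its enable helper are hypothetical names introduced for illustration; the fully qualified class name is assumed from the package and class shown on this page.

```scala
import org.apache.hadoop.conf.Configuration

// Hypothetical helper, not part of the library.
object EnableDirectCommitter {
  // Property name as given in the class documentation.
  val CommitterKey = "spark.sql.parquet.output.committer.class"

  // Assumed fully qualified name: package plus class name from this page.
  val CommitterClass = "com.indix.utils.spark.parquet.DirectParquetOutputCommitter"

  // Sets the committer class on the given configuration and returns it.
  def enable(conf: Configuration): Configuration = {
    conf.set(CommitterKey, CommitterClass)
    conf
  }
}
```

In a Spark application this would typically be applied to the context's Hadoop configuration before writing, for example EnableDirectCommitter.enable(sc.hadoopConfiguration), assuming a SparkContext named sc.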

*NOTE*

NEVER use DirectParquetOutputCommitter when appending data, because there is currently no safe way to undo a failed append job (that's why both abortTask() and abortJob() are left empty).
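
One way to respect this restriction in application code is to refuse to enable the committer for append writes. The sketch below is an assumption about how a caller might guard itself, not something the library provides: CommitterGuard and enableForNonAppend are hypothetical names, and the check relies on Spark's org.apache.spark.sql.SaveMode being how the caller expresses the write mode.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.spark.sql.SaveMode

// Hypothetical guard, not part of the library.
object CommitterGuard {
  val CommitterKey = "spark.sql.parquet.output.committer.class"

  // Assumed fully qualified name: package plus class name from this page.
  val DirectCommitter = "com.indix.utils.spark.parquet.DirectParquetOutputCommitter"

  // Enables the direct committer only for non-append writes.
  // Throws IllegalArgumentException when the mode is SaveMode.Append,
  // since a failed append cannot be safely undone.
  def enableForNonAppend(conf: Configuration, mode: SaveMode): Unit = {
    require(mode != SaveMode.Append,
      "DirectParquetOutputCommitter must not be used when appending data")
    conf.set(CommitterKey, DirectCommitter)
  }
}
```

Failing fast here is a design choice: the configuration is left untouched on an append, so the caller has to pick a different committer deliberately rather than fall through to the direct one.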

Linear Supertypes
ParquetOutputCommitter, FileOutputCommitter, OutputCommitter, AnyRef, Any

Instance Constructors

  1. new DirectParquetOutputCommitter(outputPath: Path, context: TaskAttemptContext)

Value Members

  1. final def !=(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  2. final def !=(arg0: Any): Boolean

    Definition Classes
    Any
  3. final def ##(): Int

    Definition Classes
    AnyRef → Any
  4. final def ==(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  5. final def ==(arg0: Any): Boolean

    Definition Classes
    Any
  6. val LOG: Log

  7. def abortJob(arg0: JobContext, arg1: State): Unit

    Definition Classes
    FileOutputCommitter → OutputCommitter
    Annotations
    @throws( classOf[java.io.IOException] )
  8. def abortTask(taskContext: TaskAttemptContext): Unit

    Definition Classes
    DirectParquetOutputCommitter → FileOutputCommitter → OutputCommitter
  9. def abortTask(arg0: TaskAttemptContext, arg1: Path): Unit

    Definition Classes
    FileOutputCommitter
    Annotations
    @Private() @throws( classOf[java.io.IOException] )
  10. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  11. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  12. def commitJob(jobContext: JobContext): Unit

    Definition Classes
    DirectParquetOutputCommitter → ParquetOutputCommitter → FileOutputCommitter → OutputCommitter
  13. def commitTask(taskContext: TaskAttemptContext): Unit

    Definition Classes
    DirectParquetOutputCommitter → FileOutputCommitter → OutputCommitter
  14. def commitTask(arg0: TaskAttemptContext, arg1: Path): Unit

    Definition Classes
    FileOutputCommitter
    Annotations
    @Private() @throws( classOf[java.io.IOException] )
  15. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  16. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  17. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  18. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  19. def getCommittedTaskPath(arg0: Int, arg1: TaskAttemptContext): Path

    Attributes
    protected[org.apache.hadoop.mapreduce.lib.output]
    Definition Classes
    FileOutputCommitter
  20. def getCommittedTaskPath(arg0: TaskAttemptContext): Path

    Definition Classes
    FileOutputCommitter
  21. def getJobAttemptPath(arg0: Int): Path

    Attributes
    protected[org.apache.hadoop.mapreduce.lib.output]
    Definition Classes
    FileOutputCommitter
  22. def getJobAttemptPath(arg0: JobContext): Path

    Definition Classes
    FileOutputCommitter
  23. def getTaskAttemptPath(arg0: TaskAttemptContext): Path

    Definition Classes
    FileOutputCommitter
  24. def getWorkPath(): Path

    Definition Classes
    DirectParquetOutputCommitter → FileOutputCommitter
  25. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  26. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  27. def isRecoverySupported(arg0: JobContext): Boolean

    Definition Classes
    OutputCommitter
    Annotations
    @throws( classOf[java.io.IOException] )
  28. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  29. def needsTaskCommit(taskContext: TaskAttemptContext): Boolean

    Definition Classes
    DirectParquetOutputCommitter → FileOutputCommitter → OutputCommitter
  30. def needsTaskCommit(arg0: TaskAttemptContext, arg1: Path): Boolean

    Definition Classes
    FileOutputCommitter
    Annotations
    @Private() @throws( classOf[java.io.IOException] )
  31. final def notify(): Unit

    Definition Classes
    AnyRef
  32. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  33. def recoverTask(arg0: TaskAttemptContext): Unit

    Definition Classes
    FileOutputCommitter → OutputCommitter
    Annotations
    @throws( classOf[java.io.IOException] )
  34. def setupJob(jobContext: JobContext): Unit

    Definition Classes
    DirectParquetOutputCommitter → FileOutputCommitter → OutputCommitter
  35. def setupTask(taskContext: TaskAttemptContext): Unit

    Definition Classes
    DirectParquetOutputCommitter → FileOutputCommitter → OutputCommitter
  36. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  37. def toString(): String

    Definition Classes
    AnyRef → Any
  38. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  39. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  40. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )

Deprecated Value Members

  1. def cleanupJob(arg0: JobContext): Unit

    Definition Classes
    FileOutputCommitter → OutputCommitter
    Annotations
    @Deprecated @deprecated @throws( classOf[java.io.IOException] )
    Deprecated

    (Since version ) see corresponding Javadoc for more information.

  2. def isRecoverySupported(): Boolean

    Definition Classes
    FileOutputCommitter → OutputCommitter
    Annotations
    @Deprecated @deprecated
    Deprecated

    (Since version ) see corresponding Javadoc for more information.
