org.apache.spark.sql.execution.datasources

DynamicPartitionDataConcurrentWriter

class DynamicPartitionDataConcurrentWriter extends BaseDynamicPartitionDataWriter with Logging

Dynamic partition writer that keeps multiple output writers open at the same time.

The process has the following steps:

  • Step 1: Maintain a map of output writers, one for each distinct combination of partition and/or bucket column values. Keep all writers open and write rows one by one.
  • Step 2: If the number of concurrent writers exceeds the limit, sort the remaining rows on the partition and/or bucket column(s). Write rows one by one, and eagerly close each writer once its partition and/or bucket is finished.

Callers are expected to call writeWithIterator() instead of write() to write records.
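
The two steps can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions, not Spark's implementation: ConcurrentWriteSketch, Row, Result and the in-memory "files" are hypothetical stand-ins for the real OutputWriter machinery, a row is reduced to a (partition key, payload) pair, and the fallback is triggered as soon as the number of open writers reaches maxWriters, which may differ from the exact threshold used by the class.

```scala
import scala.collection.mutable

// Hypothetical model of the two-step process; none of these names exist in Spark.
object ConcurrentWriteSketch {
  // A "row" is (partition key, payload); a "file" is an in-memory Vector.
  type Row = (String, Int)

  final case class Result(
      files: Map[String, Vector[Int]], // payloads written per partition key
      maxOpenWriters: Int,             // peak number of simultaneously open writers
      fellBackToSort: Boolean)         // whether step 2 was needed

  def writeWithIterator(rows: Iterator[Row], maxWriters: Int): Result = {
    require(maxWriters >= 1, "maxWriters must be positive")
    val files = mutable.Map.empty[String, Vector[Int]]
    val open = mutable.Set.empty[String]
    var peak = 0
    var limitReached = false
    var fellBack = false

    def openWriter(key: String): Unit = {
      open += key
      peak = math.max(peak, open.size)
    }
    def append(key: String, value: Int): Unit =
      files(key) = files.getOrElse(key, Vector.empty[Int]) :+ value

    // Step 1: keep one writer open per key and write rows in arrival order.
    while (rows.hasNext && !limitReached) {
      val (key, value) = rows.next()
      if (!open.contains(key)) openWriter(key)
      append(key, value)
      // Assumption: switch modes once the writer count reaches the limit.
      if (open.size >= maxWriters) limitReached = true
    }

    // Step 2: sort the remaining rows by key, then write one key at a time,
    // closing each writer as soon as its key is finished.
    if (rows.hasNext) {
      fellBack = true
      var current: Option[String] = None
      for ((key, value) <- rows.toVector.sortBy(_._1)) {
        if (!current.contains(key)) {
          current.foreach(open -= _) // eagerly close the finished writer
          openWriter(key)
          current = Some(key)
        }
        append(key, value)
      }
    }

    open.clear() // release every writer that is still open
    Result(files.toMap, peak, fellBack)
  }
}
```

In this model, writing the rows (a,1), (b,2), (a,3), (c,4), (b,5), (a,6) with maxWriters = 2 switches to the sorted mode after the second row, so the number of open writers stays bounded while every row still reaches the file for its key. The sketch materializes the remaining rows in memory to sort them, which is a simplification made for brevity.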

Linear Supertypes
Logging, BaseDynamicPartitionDataWriter, FileFormatDataWriter, DataWriter[InternalRow], Closeable, AutoCloseable, AnyRef, Any

Instance Constructors

  1. new DynamicPartitionDataConcurrentWriter(description: WriteJobDescription, taskAttemptContext: TaskAttemptContext, committer: FileCommitProtocol, concurrentOutputWriterSpec: ConcurrentOutputWriterSpec, customMetrics: Map[String, SQLMetric] = Map.empty)

Type Members

  1. implicit class LogStringContext extends AnyRef
    Definition Classes
    Logging

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##: Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. val MAX_FILE_COUNTER: Int

    Max number of files a single task writes out due to file size. In most cases the number of files written should be very small. This is just a safeguard against some really bad settings, e.g. maxRecordsPerFile = 1.

    Attributes
    protected
    Definition Classes
    FileFormatDataWriter
  5. final def abort(): Unit
    Definition Classes
    FileFormatDataWriter → DataWriter
  6. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  7. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.CloneNotSupportedException]) @IntrinsicCandidate() @native()
  8. final def close(): Unit
    Definition Classes
    FileFormatDataWriter → Closeable → AutoCloseable
  9. final def commit(): WriteTaskResult

    Returns a summary of relevant information, including the list of partition strings written out. The list of partitions is sent back to the driver and used to update the catalog. Other information is also sent back to the driver and used, e.g., to update the metrics in the UI.

    Definition Classes
    FileFormatDataWriter → DataWriter
  10. def currentMetricsValues(): Array[CustomTaskMetric]
    Definition Classes
    DataWriter
  11. var currentWriter: OutputWriter
    Attributes
    protected
    Definition Classes
    FileFormatDataWriter
  12. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  13. def equals(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef → Any
  14. var fileCounter: Int

    File counter for writing the current partition or bucket. The same partition or bucket may have more than one file because of the limit on the number of records per file.

    Attributes
    protected
    Definition Classes
    BaseDynamicPartitionDataWriter
  15. lazy val getBucketId: (InternalRow) => Int

    Given an input row, returns the corresponding bucketId

    Attributes
    protected
    Definition Classes
    BaseDynamicPartitionDataWriter
  16. final def getClass(): Class[_ <: AnyRef]
    Definition Classes
    AnyRef → Any
    Annotations
    @IntrinsicCandidate() @native()
  17. val getOutputRow: UnsafeProjection

    Returns the data columns to be written given an input row

    Attributes
    protected
    Definition Classes
    BaseDynamicPartitionDataWriter
  18. lazy val getPartitionValues: (InternalRow) => UnsafeRow

    Extracts the partition values out of an input row.

    Attributes
    protected
    Definition Classes
    BaseDynamicPartitionDataWriter
  19. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @IntrinsicCandidate() @native()
  20. def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean
    Attributes
    protected
    Definition Classes
    Logging
  21. def initializeLogIfNecessary(isInterpreter: Boolean): Unit
    Attributes
    protected
    Definition Classes
    Logging
  22. val isBucketed: Boolean

    Flag saying whether or not the data to be written out is bucketed.

    Attributes
    protected
    Definition Classes
    BaseDynamicPartitionDataWriter
  23. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  24. val isPartitioned: Boolean

    Flag saying whether or not the data to be written out is partitioned.

    Attributes
    protected
    Definition Classes
    BaseDynamicPartitionDataWriter
  25. def isTraceEnabled(): Boolean
    Attributes
    protected
    Definition Classes
    Logging
  26. def log: Logger
    Attributes
    protected
    Definition Classes
    Logging
  27. def logDebug(msg: => String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  28. def logDebug(entry: LogEntry, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  29. def logDebug(entry: LogEntry): Unit
    Attributes
    protected
    Definition Classes
    Logging
  30. def logDebug(msg: => String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  31. def logError(msg: => String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  32. def logError(entry: LogEntry, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  33. def logError(entry: LogEntry): Unit
    Attributes
    protected
    Definition Classes
    Logging
  34. def logError(msg: => String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  35. def logInfo(msg: => String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  36. def logInfo(entry: LogEntry, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  37. def logInfo(entry: LogEntry): Unit
    Attributes
    protected
    Definition Classes
    Logging
  38. def logInfo(msg: => String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  39. def logName: String
    Attributes
    protected
    Definition Classes
    Logging
  40. def logTrace(msg: => String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  41. def logTrace(entry: LogEntry, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  42. def logTrace(entry: LogEntry): Unit
    Attributes
    protected
    Definition Classes
    Logging
  43. def logTrace(msg: => String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  44. def logWarning(msg: => String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  45. def logWarning(entry: LogEntry, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  46. def logWarning(entry: LogEntry): Unit
    Attributes
    protected
    Definition Classes
    Logging
  47. def logWarning(msg: => String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  48. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  49. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @IntrinsicCandidate() @native()
  50. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @IntrinsicCandidate() @native()
  51. var recordsInFile: Long

    Number of records in current file.

    Attributes
    protected
    Definition Classes
    BaseDynamicPartitionDataWriter
  52. def releaseCurrentWriter(): Unit

    Release resources of currentWriter.

    Attributes
    protected
    Definition Classes
    FileFormatDataWriter
  53. def releaseResources(): Unit

    Release resources for all concurrent output writers.

    Attributes
    protected
    Definition Classes
    DynamicPartitionDataConcurrentWriter → FileFormatDataWriter
  54. def renewCurrentWriter(partitionValues: Option[InternalRow], bucketId: Option[Int], closeCurrentWriter: Boolean): Unit

    Opens a new OutputWriter given a partition key and/or a bucket id. If a bucket id is specified, it is appended to the end of the file name, but before the file extension, e.g. part-r-00009-ea518ad4-455a-4431-b471-d24e03814677-00002.gz.parquet

    partitionValues

    the partition which all tuples being written by this OutputWriter belong to

    bucketId

    the bucket which all tuples being written by this OutputWriter belong to

    closeCurrentWriter

    whether to close and release the resources of the current writer

    Attributes
    protected
    Definition Classes
    BaseDynamicPartitionDataWriter
  55. def renewCurrentWriterIfTooManyRecords(partitionValues: Option[InternalRow], bucketId: Option[Int]): Unit

    Opens a new output writer when the number of records in the current file exceeds the limit.

    partitionValues

    the partition which all tuples being written by this OutputWriter belong to

    bucketId

    the bucket which all tuples being written by this OutputWriter belong to

    Attributes
    protected
    Definition Classes
    BaseDynamicPartitionDataWriter
  56. val statsTrackers: Seq[WriteTaskStatsTracker]

    Trackers for computing various statistics on the data as it's being written out.

    Attributes
    protected
    Definition Classes
    FileFormatDataWriter
  57. final def synchronized[T0](arg0: => T0): T0
    Definition Classes
    AnyRef
  58. def toString(): String
    Definition Classes
    AnyRef → Any
  59. val updatedPartitions: Set[String]
    Attributes
    protected
    Definition Classes
    FileFormatDataWriter
  60. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException])
  61. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException]) @native()
  62. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException])
  63. def withLogContext(context: HashMap[String, String])(body: => Unit): Unit
    Attributes
    protected
    Definition Classes
    Logging
  64. def write(record: InternalRow): Unit

    Writes a record.

    Definition Classes
    DynamicPartitionDataConcurrentWriter → FileFormatDataWriter → DataWriter
  65. def writeAll(arg0: Iterator[InternalRow]): Unit
    Definition Classes
    DataWriter
    Annotations
    @throws(classOf[java.io.IOException])
  66. def writeRecord(record: InternalRow): Unit

    Writes the given record with the current writer.

    record

    The record to write

    Attributes
    protected
    Definition Classes
    BaseDynamicPartitionDataWriter
  67. def writeWithIterator(iterator: Iterator[InternalRow]): Unit

    Writes an iterator of records with concurrent writers.

    Definition Classes
    DynamicPartitionDataConcurrentWriter → FileFormatDataWriter
  68. final def writeWithMetrics(record: InternalRow, count: Long): Unit
    Definition Classes
    FileFormatDataWriter
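
The interplay between fileCounter, recordsInFile, MAX_FILE_COUNTER and renewCurrentWriterIfTooManyRecords described in the list above can be modelled roughly as follows. This is a hedged sketch rather than the class's actual code: FileRolloverSketch and assignFiles are hypothetical names, the limits are passed in as plain parameters, and a non-positive maxRecordsPerFile is assumed to mean "no limit".

```scala
// Hypothetical model of per-file record limits; not part of Spark.
object FileRolloverSketch {
  /** Returns, for each record in order, the index of the file it lands in. */
  def assignFiles(numRecords: Int, maxRecordsPerFile: Long, maxFileCounter: Int): Vector[Int] = {
    var fileCounter = 0     // index of the file currently being written
    var recordsInFile = 0L  // records already written to that file
    val out = Vector.newBuilder[Int]
    for (_ <- 0 until numRecords) {
      // Roll over to a new file once the current one is full.
      if (maxRecordsPerFile > 0 && recordsInFile >= maxRecordsPerFile) {
        fileCounter += 1
        // Safeguard against settings that would create an excessive number of files.
        require(fileCounter < maxFileCounter,
          s"File counter $fileCounter is beyond max value $maxFileCounter")
        recordsInFile = 0L
      }
      out += fileCounter
      recordsInFile += 1
    }
    out.result()
  }
}
```

For example, five records with at most two records per file are spread over three files in this model, while a very small per-file limit combined with a low file cap fails fast instead of producing an unbounded number of files.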

Deprecated Value Members

  1. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.Throwable]) @Deprecated
    Deprecated

    (Since version 9)
