org.apache.spark.sql.execution.datasources

DynamicPartitionDataConcurrentWriter

class DynamicPartitionDataConcurrentWriter extends BaseDynamicPartitionDataWriter with Logging

Dynamic partition writer that keeps multiple output writers open at the same time.

The process has the following steps:

  • Step 1: Maintain a map of output writers, one per combination of partition and/or bucket column values. Keep all writers open and write rows one by one.
  • Step 2: If the number of concurrent writers exceeds the limit, sort the remaining rows on the partition and/or bucket column(s). Write the rows one by one, eagerly closing each writer once its partition and/or bucket is finished.

The caller is expected to call writeWithIterator() instead of write() to write records.
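
The two steps above can be illustrated with a minimal, self-contained sketch. It is not Spark's implementation: `SketchWriter`, `ConcurrentWriteSketch`, the in-memory `(partitionKey, row)` records, and the `maxWriters` parameter are hypothetical stand-ins, and the exact point at which the real writer switches to sorted mode may differ.

```scala
import scala.collection.mutable

// Hypothetical stand-in for an output writer; it only records rows in memory.
final class SketchWriter(val key: String) {
  val rows = mutable.ArrayBuffer.empty[String]
  var closed = false
  def write(row: String): Unit = {
    require(!closed, s"writer for $key is already closed")
    rows += row
  }
  def close(): Unit = { closed = true }
}

object ConcurrentWriteSketch {
  // Writes (partitionKey, row) pairs and returns every writer created, in creation order.
  def writeWithIterator(
      records: Iterator[(String, String)],
      maxWriters: Int): Seq[SketchWriter] = {
    val created = mutable.ArrayBuffer.empty[SketchWriter]
    val open = mutable.LinkedHashMap.empty[String, SketchWriter]

    def newWriter(key: String): SketchWriter = {
      val w = new SketchWriter(key)
      created += w
      w
    }

    // Step 1: keep one open writer per key until the limit is reached.
    var sortedRest: Option[Iterator[(String, String)]] = None
    while (sortedRest.isEmpty && records.hasNext) {
      val (key, row) = records.next()
      open.get(key) match {
        case Some(w) => w.write(row)
        case None if open.size < maxWriters =>
          val w = newWriter(key)
          open(key) = w
          w.write(row)
        case None =>
          // Limit reached: sort this row and all remaining rows by key.
          val rest = ((key, row) +: records.toVector).sortBy(_._1)
          sortedRest = Some(rest.iterator)
      }
    }

    // Step 2: write sorted rows sequentially and close each writer
    // as soon as the key changes.
    var current: Option[SketchWriter] = None
    sortedRest.getOrElse(Iterator.empty).foreach { case (key, row) =>
      if (!current.exists(_.key == key)) {
        current.foreach { w => w.close(); open.remove(w.key) }
        current = Some(open.getOrElseUpdate(key, newWriter(key)))
      }
      current.foreach(_.write(row))
    }

    // Close whatever is still open at the end of the task.
    open.values.foreach(_.close())
    created.toSeq
  }
}
```

In this sketch, a caller passes the whole iterator once, mirroring the guidance to use writeWithIterator() rather than write(): the fallback to sorted mode needs access to the remaining rows, which a single-record call cannot provide.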

Linear Supertypes
Logging, BaseDynamicPartitionDataWriter, FileFormatDataWriter, DataWriter[InternalRow], Closeable, AutoCloseable, AnyRef, Any

Instance Constructors

  1. new DynamicPartitionDataConcurrentWriter(description: WriteJobDescription, taskAttemptContext: TaskAttemptContext, committer: FileCommitProtocol, concurrentOutputWriterSpec: ConcurrentOutputWriterSpec, customMetrics: Map[String, SQLMetric] = Map.empty)

Value Members

  1. def abort(): Unit
    Definition Classes
    FileFormatDataWriter → DataWriter
  2. def close(): Unit
    Definition Classes
    FileFormatDataWriter → Closeable → AutoCloseable
  3. def commit(): WriteTaskResult

    Returns a summary of relevant information, including the list of partition strings written out. The list of partitions is sent back to the driver and used to update the catalog. Other information is also sent back to the driver and used, for example, to update the metrics in the UI.

    Definition Classes
    FileFormatDataWriter → DataWriter
  4. def currentMetricsValues(): Array[CustomTaskMetric]
    Definition Classes
    DataWriter
  5. def write(record: InternalRow): Unit

    Writes a record.

    Definition Classes
    DynamicPartitionDataConcurrentWriter → FileFormatDataWriter → DataWriter
  6. def writeWithIterator(iterator: Iterator[InternalRow]): Unit

    Writes an iterator of records using concurrent writers.

    Definition Classes
    DynamicPartitionDataConcurrentWriter → FileFormatDataWriter
  7. def writeWithMetrics(record: InternalRow, count: Long): Unit
    Definition Classes
    FileFormatDataWriter