class DynamicPartitionDataConcurrentWriter extends BaseDynamicPartitionDataWriter with Logging
A dynamic partition writer that keeps multiple output writers open concurrently.
The process has the following steps:
- Step 1: Maintain a map of output writers, one per partition and/or bucket. Keep all writers open and write rows one by one.
- Step 2: If the number of concurrent writers exceeds the limit, sort the remaining rows on the partition and/or bucket column(s). Write the rows one by one, and eagerly close each writer once its partition and/or bucket is finished.
Callers are expected to call writeWithIterator() instead of write() to write records.
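The two steps above can be sketched without any Spark dependency. This is a simplified illustration, not the class's implementation: `Row`, `Result`, `maxWriters`, and the in-memory "writers" are hypothetical stand-ins, and closing every open writer at the switch to step 2 is an assumption made to keep the sketch short.

```scala
import scala.collection.mutable

object ConcurrentWriterSketch {
  // Hypothetical row type: (partition key, payload). Not Spark's InternalRow.
  type Row = (String, Int)

  // Rows written per partition key, and the peak number of writers open at once.
  final case class Result(written: Map[String, Vector[Int]], peakOpenWriters: Int)

  def writeWithIterator(rows: Iterator[Row], maxWriters: Int): Result = {
    val written = mutable.LinkedHashMap.empty[String, Vector[Int]]
    val open = mutable.LinkedHashSet.empty[String] // keys whose writer is currently open
    var peak = 0

    def append(row: Row): Unit =
      written.update(row._1, written.getOrElse(row._1, Vector.empty[Int]) :+ row._2)

    // Step 1: keep one open writer per key until a new key would exceed the limit.
    val buffered = rows.buffered
    while (buffered.hasNext &&
           (open.contains(buffered.head._1) || open.size < maxWriters)) {
      val row = buffered.next()
      open += row._1
      peak = math.max(peak, open.size)
      append(row)
    }

    // Step 2: sort the remaining rows by key so that only one writer needs to be
    // open at a time (assumption: all step-1 writers are closed first).
    if (buffered.hasNext) {
      open.clear()
      buffered.toVector.sortBy(_._1).foreach(append) // stable sort keeps per-key order
      peak = math.max(peak, 1)
    }
    Result(written.toMap, peak)
  }
}
```

With `maxWriters = 2` and keys arriving as a, b, a, c, b, a, the first three rows go through step 1 and the rest are sorted before being written, so no more than two writers are ever open.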
Instance Constructors
- new DynamicPartitionDataConcurrentWriter(description: WriteJobDescription, taskAttemptContext: TaskAttemptContext, committer: FileCommitProtocol, concurrentOutputWriterSpec: ConcurrentOutputWriterSpec, customMetrics: Map[String, SQLMetric] = Map.empty)
Value Members
- final def !=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def ##: Int
- Definition Classes
- AnyRef → Any
- final def ==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- val MAX_FILE_COUNTER: Int
Maximum number of files a single task writes out due to file size. In most cases the number of files written should be very small. This is just a safeguard against really bad settings, e.g. maxRecordsPerFile = 1.
- Attributes
- protected
- Definition Classes
- FileFormatDataWriter
- def abort(): Unit
- Definition Classes
- FileFormatDataWriter → DataWriter
- final def asInstanceOf[T0]: T0
- Definition Classes
- Any
- def clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.CloneNotSupportedException]) @native()
- def close(): Unit
- Definition Classes
- FileFormatDataWriter → Closeable → AutoCloseable
- def commit(): WriteTaskResult
Returns a summary of related information, including the list of partition strings written out. The list of partitions is sent back to the driver and used to update the catalog. Other information is also sent back to the driver and used, for example, to update the metrics in the UI.
- Definition Classes
- FileFormatDataWriter → DataWriter
- def currentMetricsValues(): Array[CustomTaskMetric]
- Definition Classes
- DataWriter
- var currentWriter: OutputWriter
- Attributes
- protected
- Definition Classes
- FileFormatDataWriter
- final def eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- def equals(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef → Any
- var fileCounter: Int
File counter for writing the current partition or bucket. The same partition or bucket may have more than one file because of the limit on the number of records per file.
- Attributes
- protected
- Definition Classes
- BaseDynamicPartitionDataWriter
- def finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.Throwable])
- lazy val getBucketId: (InternalRow) => Int
Given an input row, returns the corresponding bucketId.
- Attributes
- protected
- Definition Classes
- BaseDynamicPartitionDataWriter
- final def getClass(): Class[_ <: AnyRef]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- val getOutputRow: UnsafeProjection
Returns the data columns to be written, given an input row.
- Attributes
- protected
- Definition Classes
- BaseDynamicPartitionDataWriter
- lazy val getPartitionValues: (InternalRow) => UnsafeRow
Extracts the partition values out of an input row.
- Attributes
- protected
- Definition Classes
- BaseDynamicPartitionDataWriter
- def hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean
- Attributes
- protected
- Definition Classes
- Logging
- def initializeLogIfNecessary(isInterpreter: Boolean): Unit
- Attributes
- protected
- Definition Classes
- Logging
- val isBucketed: Boolean
Flag indicating whether the data to be written out is bucketed.
- Attributes
- protected
- Definition Classes
- BaseDynamicPartitionDataWriter
- final def isInstanceOf[T0]: Boolean
- Definition Classes
- Any
- val isPartitioned: Boolean
Flag indicating whether the data to be written out is partitioned.
- Attributes
- protected
- Definition Classes
- BaseDynamicPartitionDataWriter
- def isTraceEnabled(): Boolean
- Attributes
- protected
- Definition Classes
- Logging
- def log: Logger
- Attributes
- protected
- Definition Classes
- Logging
- def logDebug(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logDebug(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logError(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logError(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logInfo(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logInfo(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logName: String
- Attributes
- protected
- Definition Classes
- Logging
- def logTrace(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logTrace(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logWarning(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logWarning(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- final def ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- final def notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- final def notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- var recordsInFile: Long
Number of records in the current file.
- Attributes
- protected
- Definition Classes
- BaseDynamicPartitionDataWriter
- def releaseCurrentWriter(): Unit
Releases the resources of currentWriter.
- Attributes
- protected
- Definition Classes
- FileFormatDataWriter
- def releaseResources(): Unit
Releases the resources of all concurrent output writers.
- Attributes
- protected
- Definition Classes
- DynamicPartitionDataConcurrentWriter → FileFormatDataWriter
- def renewCurrentWriter(partitionValues: Option[InternalRow], bucketId: Option[Int], closeCurrentWriter: Boolean): Unit
Opens a new OutputWriter given a partition key and/or a bucket id. If a bucket id is specified, it is appended to the end of the file name, but before the file extension, e.g. part-r-00009-ea518ad4-455a-4431-b471-d24e03814677-00002.gz.parquet
- partitionValues
the partition to which all tuples written by this OutputWriter belong
- bucketId
the bucket to which all tuples written by this OutputWriter belong
- closeCurrentWriter
whether to close and release the resources of the current writer
- Attributes
- protected
- Definition Classes
- BaseDynamicPartitionDataWriter
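The file-naming rule described for renewCurrentWriter can be illustrated with a small helper. This is only a sketch: `withBucketId` is a hypothetical function, and the `-` separator and five-digit zero padding are inferred solely from the example file name above, not from the library's code.

```scala
object BucketFileNameSketch {
  // Hypothetical helper: places the bucket id after the base name and before the extension.
  // Assumption: "-" separator and 5-digit zero padding, as seen in the example name only.
  def withBucketId(baseName: String, extension: String, bucketId: Option[Int]): String = {
    val bucketPart = bucketId.map(id => f"-$id%05d").getOrElse("")
    s"$baseName$bucketPart$extension"
  }
}
```

When `bucketId` is `None`, the helper returns the base name followed directly by the extension.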
- def renewCurrentWriterIfTooManyRecords(partitionValues: Option[InternalRow], bucketId: Option[Int]): Unit
Opens a new output writer when the number of records exceeds the limit.
- partitionValues
the partition to which all tuples written by this OutputWriter belong
- bucketId
the bucket to which all tuples written by this OutputWriter belong
- Attributes
- protected
- Definition Classes
- BaseDynamicPartitionDataWriter
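A minimal sketch of how a per-file record limit could drive the fileCounter and recordsInFile members listed on this page. The `Rollover` class and its `maxRecordsPerFile` and `maxFileCounter` parameters are hypothetical; the real writer opens an actual OutputWriter at the rollover point, which this sketch only marks by resetting counters.

```scala
object RecordLimitSketch {
  // Hypothetical state holder; it tracks counters only and opens no files.
  final class Rollover(maxRecordsPerFile: Long, maxFileCounter: Int) {
    var fileCounter: Int = 0     // index of the current file within the partition/bucket
    var recordsInFile: Long = 0L // records written to the current file so far

    def write(): Unit = {
      // Roll over to a new file once the current one has reached the limit.
      if (maxRecordsPerFile > 0 && recordsInFile >= maxRecordsPerFile) {
        fileCounter += 1
        // Safeguard in the spirit of MAX_FILE_COUNTER (the cap value is an assumption).
        require(fileCounter <= maxFileCounter,
          s"File counter $fileCounter is beyond max value $maxFileCounter")
        recordsInFile = 0L
      }
      recordsInFile += 1
    }
  }
}
```

With a limit of two records per file, five writes leave the third file (counter 2) holding one record.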
- val statsTrackers: Seq[WriteTaskStatsTracker]
Trackers for computing various statistics on the data as it's being written out.
- Attributes
- protected
- Definition Classes
- FileFormatDataWriter
- final def synchronized[T0](arg0: => T0): T0
- Definition Classes
- AnyRef
- def toString(): String
- Definition Classes
- AnyRef → Any
- val updatedPartitions: Set[String]
- Attributes
- protected
- Definition Classes
- FileFormatDataWriter
- final def wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException]) @native()
- def write(record: InternalRow): Unit
Writes a record.
- Definition Classes
- DynamicPartitionDataConcurrentWriter → FileFormatDataWriter → DataWriter
- def writeRecord(record: InternalRow): Unit
Writes the given record with the current writer.
- record
The record to write
- Attributes
- protected
- Definition Classes
- BaseDynamicPartitionDataWriter
- def writeWithIterator(iterator: Iterator[InternalRow]): Unit
Writes an iterator of records with concurrent writers.
- Definition Classes
- DynamicPartitionDataConcurrentWriter → FileFormatDataWriter
- def writeWithMetrics(record: InternalRow, count: Long): Unit
- Definition Classes
- FileFormatDataWriter