trait WriteTaskStatsTracker extends AnyRef
A trait for classes that are capable of collecting statistics on data that's being processed by a single write task in FileFormatWriter - i.e. there should be one instance per executor.
This trait is coupled with the way FileFormatWriter works, in the sense that its methods will be called according to how tuples are being written out to disk, namely in sorted order according to partitionValue(s), then bucketId.
As such, a typical call scenario is:
newPartition -> newBucket -> newFile -> newRow, after which the sequence may loop back to another newRow, newFile, newBucket, or newPartition.
newPartition and newBucket events are only triggered if the relation to be written out is partitioned and/or bucketed, respectively.
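The call order described above can be illustrated with a small, Spark-free sketch. Everything in it is a stand-in for illustration only: `Events` mirrors the four event methods but uses `Seq[Any]` instead of `InternalRow`, while `Tagged`, `Recorder`, `replay`, and the `part-N` file names are hypothetical. It assumes the input is already sorted by partition values, then bucket id, and it is not the actual FileFormatWriter logic.

```scala
object CallOrderSketch {
  // Stand-in for the four event methods; rows are plain Seq[Any], not InternalRow.
  trait Events {
    def newPartition(partitionValues: Seq[Any]): Unit
    def newBucket(bucketId: Int): Unit
    def newFile(filePath: String): Unit
    def newRow(row: Seq[Any]): Unit
  }

  // Hypothetical input record: a row tagged with optional partition values and bucket id.
  final case class Tagged(partition: Option[Seq[Any]], bucket: Option[Int], row: Seq[Any])

  // Records every event as a string so the call order can be inspected afterwards.
  final class Recorder extends Events {
    val log = scala.collection.mutable.ArrayBuffer.empty[String]
    def newPartition(partitionValues: Seq[Any]): Unit = {
      val joined = partitionValues.mkString(",")
      log += s"newPartition($joined)"
    }
    def newBucket(bucketId: Int): Unit = log += s"newBucket($bucketId)"
    def newFile(filePath: String): Unit = log += s"newFile($filePath)"
    def newRow(row: Seq[Any]): Unit = log += "newRow"
  }

  // Assumes `rows` is sorted by partition values, then bucket id.
  def replay(rows: Seq[Tagged], tracker: Events): Unit = {
    var current: Option[(Option[Seq[Any]], Option[Int])] = None
    var fileNo = 0
    for (t <- rows) {
      val key = (t.partition, t.bucket)
      if (!current.contains(key)) {
        // newPartition fires only for partitioned input whose partition values changed.
        val partitionChanged = current.forall(_._1 != t.partition)
        if (partitionChanged) t.partition.foreach(p => tracker.newPartition(p))
        // newBucket fires only for bucketed input.
        t.bucket.foreach(b => tracker.newBucket(b))
        // A new (partition, bucket) combination starts a new file (hypothetical name).
        tracker.newFile(s"part-$fileNo")
        fileNo += 1
        current = Some(key)
      }
      tracker.newRow(t.row)
    }
  }
}
```

For input that is neither partitioned nor bucketed (both tags `None`), this sketch emits a single newFile followed by one newRow per row.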
Abstract Value Members
abstract def getFinalStats(): WriteTaskStats
Returns the final statistics computed so far.
- returns
An object of a subtype of WriteTaskStats, to be sent to the driver.
- Note
This may only be called once. Further use of the object may lead to undefined behavior.
abstract def newBucket(bucketId: Int): Unit
Process the fact that a new bucket is about to be written. Only triggered when the relation is bucketed by a (non-empty) sequence of columns.
- bucketId
The bucket number.
abstract def newFile(filePath: String): Unit
Process the fact that a new file is about to be written.
- filePath
Path of the file into which future rows will be written.
abstract def newPartition(partitionValues: InternalRow): Unit
Process the fact that a new partition is about to be written. Only triggered when the relation is partitioned by a (non-empty) sequence of columns.
- partitionValues
The values that define this new partition.
abstract def newRow(row: InternalRow): Unit
Process a new row and update the tracked statistics accordingly. The row will be written to the most recently witnessed file (via newFile).
- row
Current data row to be processed.
- Note
Any overhead here is incurred per row, so implementations should be as lightweight as possible.
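As an illustration of how the abstract members above fit together, here is a minimal counting tracker. It is a sketch under stated assumptions, not Spark's own implementation: `Tracker`, `Stats`, and `Row` are simplified stand-ins for WriteTaskStatsTracker, WriteTaskStats, and InternalRow so that the example runs without Spark on the classpath, and `CountingStats` and `CountingTracker` are hypothetical names.

```scala
object CountingTrackerSketch {
  // Simplified stand-ins so the sketch runs without Spark on the classpath.
  type Row = Seq[Any]

  // The final stats are sent to the driver, so the stand-in is assumed serializable.
  trait Stats extends Serializable

  // Mirrors the five abstract members listed on this page.
  trait Tracker {
    def newPartition(partitionValues: Row): Unit
    def newBucket(bucketId: Int): Unit
    def newFile(filePath: String): Unit
    def newRow(row: Row): Unit
    def getFinalStats(): Stats
  }

  // Hypothetical result type holding one counter per event kind.
  final case class CountingStats(partitions: Int, buckets: Int, files: Int, rows: Long)
    extends Stats

  final class CountingTracker extends Tracker {
    private var partitions = 0
    private var buckets = 0
    private var files = 0
    private var rows = 0L

    def newPartition(partitionValues: Row): Unit = partitions += 1
    def newBucket(bucketId: Int): Unit = buckets += 1
    def newFile(filePath: String): Unit = files += 1

    // Per-row path: a single increment, no allocation and no inspection of the row.
    def newRow(row: Row): Unit = rows += 1

    // Intended to be called once, when the task has finished writing.
    def getFinalStats(): Stats = CountingStats(partitions, buckets, files, rows)
  }
}
```

Keeping `newRow` down to a counter increment reflects the note about per-row overhead. Any heavier work, such as inspecting the file on disk, would more naturally belong in `newFile` or `getFinalStats`.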
Concrete Value Members
final def !=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
final def ##(): Int
- Definition Classes
- AnyRef → Any
final def ==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
final def asInstanceOf[T0]: T0
- Definition Classes
- Any
def clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
final def eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
def equals(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
def finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( classOf[java.lang.Throwable] )
final def getClass(): Class[_]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
def hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
final def isInstanceOf[T0]: Boolean
- Definition Classes
- Any
final def ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
final def notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
final def notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
final def synchronized[T0](arg0: ⇒ T0): T0
- Definition Classes
- AnyRef
def toString(): String
- Definition Classes
- AnyRef → Any
final def wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
final def wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
final def wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()