trait WriteTaskStatsTracker extends AnyRef
A trait for classes that collect statistics on the data processed by a single write task in FileFormatWriter, i.e. there should be one instance per executor.
The newPartition event is only triggered if the relation being written out is partitioned.
Abstract Value Members
- abstract def closeFile(filePath: String): Unit
Process the fact that a file has been fully written and closed.
- filePath
Path of the file.
- abstract def getFinalStats(taskCommitTime: Long): WriteTaskStats
Returns the final statistics computed so far.
- taskCommitTime
Time of committing the task.
- returns
An object of a subtype of WriteTaskStats, to be sent to the driver.
- Note
This may only be called once. Further use of the object may lead to undefined behavior.
- abstract def newFile(filePath: String): Unit
Process the fact that a new file is about to be written.
- filePath
Path of the file into which future rows will be written.
- abstract def newPartition(partitionValues: InternalRow): Unit
Process the fact that a new partition is about to be written. Only triggered when the relation is partitioned by a (non-empty) sequence of columns.
- partitionValues
The values that define this new partition.
- abstract def newRow(filePath: String, row: InternalRow): Unit
Process a new row and update the tracked statistics accordingly.
- filePath
Path of the file to which the row is written.
- row
Current data row to be processed.
- Note
Any overhead here is incurred per row, so implementations should be as lightweight as possible.
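The following is a minimal sketch of how the five abstract members might fit together, assuming the signatures listed above and that the Spark SQL classes are on the classpath. `RowCountStats`, `RowCountTracker`, and `RowCountTrackerDemo` are hypothetical names introduced only for illustration; they are not part of Spark. The tracker keeps plain counters so that the per-row work in `newRow` stays cheap.

```scala
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.execution.datasources.{WriteTaskStats, WriteTaskStatsTracker}

// Hypothetical stats payload returned to the driver (not a Spark class).
case class RowCountStats(
    numFiles: Int,
    numPartitions: Int,
    numRows: Long,
    taskCommitTime: Long) extends WriteTaskStats

// Hypothetical tracker that only increments counters.
class RowCountTracker extends WriteTaskStatsTracker {
  private var numFiles = 0
  private var numPartitions = 0
  private var numRows = 0L

  // Only called when the relation is partitioned.
  override def newPartition(partitionValues: InternalRow): Unit = numPartitions += 1

  override def newFile(filePath: String): Unit = numFiles += 1

  // Nothing to finalize per file in this sketch.
  override def closeFile(filePath: String): Unit = ()

  // Runs once per row, so it does constant-time work only.
  override def newRow(filePath: String, row: InternalRow): Unit = numRows += 1

  // Intended to be called once, after the last row has been processed.
  override def getFinalStats(taskCommitTime: Long): WriteTaskStats =
    RowCountStats(numFiles, numPartitions, numRows, taskCommitTime)
}

// Drives the tracker by hand in the order a write task would plausibly emit events.
object RowCountTrackerDemo {
  def run(): WriteTaskStats = {
    val tracker = new RowCountTracker
    val file = "part-00000" // hypothetical file path
    tracker.newPartition(InternalRow.empty)
    tracker.newFile(file)
    tracker.newRow(file, InternalRow.empty)
    tracker.newRow(file, InternalRow.empty)
    tracker.closeFile(file)
    tracker.getFinalStats(taskCommitTime = 42L)
  }
}
```

In a real write the events come from FileFormatWriter rather than from hand-written calls; the demo object exists only to show the expected call order and the single final call to `getFinalStats`.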