org.apache.spark.sql.execution.datasources

WriteTaskStatsTracker

trait WriteTaskStatsTracker extends AnyRef

A trait for classes that collect statistics on the data processed by a single write task in FileFormatWriter; that is, there should be one instance per executor.

The newPartition event is triggered only if the relation being written out is partitioned.

Linear Supertypes
AnyRef, Any

Abstract Value Members

  1. abstract def closeFile(filePath: String): Unit

    Process the fact that a file has finished being written and has been closed.

    filePath

    Path of the file.

  2. abstract def getFinalStats(taskCommitTime: Long): WriteTaskStats

    Returns the final statistics computed so far.

    taskCommitTime

    Time of committing the task.

    returns

    An object of subtype of WriteTaskStats, to be sent to the driver.

    Note

    This may only be called once. Further use of the object may lead to undefined behavior.

  3. abstract def newFile(filePath: String): Unit

    Process the fact that a new file is about to be written.

    filePath

    Path of the file into which future rows will be written.

  4. abstract def newPartition(partitionValues: InternalRow): Unit

    Process the fact that a new partition is about to be written. Only triggered when the relation is partitioned by a (non-empty) sequence of columns.

    partitionValues

    The values that define this new partition.

  5. abstract def newRow(filePath: String, row: InternalRow): Unit

    Process a new row and update the tracked statistics accordingly.

    filePath

    Path of the file which the row is written to.

    row

    Current data row to be processed.

    Note

    Any overhead here is incurred once per row, so implementations should be as lightweight as possible.
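
Example

The members above can be tied together in a minimal sketch of an implementation. It assumes spark-sql is on the classpath and that the trait has exactly the signatures listed on this page; other Spark versions may differ. RowCountStats and RowCountTaskStatsTracker are hypothetical names introduced for illustration and are not part of Spark.

```scala
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.execution.datasources.{WriteTaskStats, WriteTaskStatsTracker}

// Hypothetical stats payload. WriteTaskStats is serializable, so the
// result can be sent from the task back to the driver.
case class RowCountStats(
    numPartitions: Int,
    numFiles: Int,
    numRows: Long,
    taskCommitTime: Long)
  extends WriteTaskStats

// Hypothetical tracker that only increments counters, so the per-row
// cost of newRow stays as small as possible.
class RowCountTaskStatsTracker extends WriteTaskStatsTracker {
  private var numPartitions = 0
  private var numFiles = 0
  private var numRows = 0L

  // Only called when the relation being written is partitioned.
  override def newPartition(partitionValues: InternalRow): Unit =
    numPartitions += 1

  override def newFile(filePath: String): Unit =
    numFiles += 1

  // Nothing to finalize per file in this sketch.
  override def closeFile(filePath: String): Unit = ()

  override def newRow(filePath: String, row: InternalRow): Unit =
    numRows += 1

  // Intended to be called once; the tracker should not be used afterwards.
  override def getFinalStats(taskCommitTime: Long): WriteTaskStats =
    RowCountStats(numPartitions, numFiles, numRows, taskCommitTime)
}
```

To exercise such a tracker by hand, for instance in a unit test, one plausible sequence is newPartition (partitioned output only), then newFile, then newRow for each row, then closeFile, and finally a single call to getFinalStats. This ordering is inferred from the member descriptions above rather than stated by them.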