
class ApplyInPandasWithStateWriter extends BaseStreamingArrowWriter

This class abstracts away the complexity of constructing Arrow RecordBatches for data and state with bin-packing and chunking. The caller only needs to call the public methods of this class (startNewGroup, writeRow, finalizeGroup, finalizeData); the class writes the data and state into Arrow RecordBatches, performing bin-packing and chunking internally.

This class requires that the parameter root has been initialized with an Arrow schema laid out as follows:

  • data fields
  • state field
    • nested schema (refer to ApplyInPandasWithStateWriter.STATE_METADATA_SCHEMA)

Please refer to the code comments in the implementation to see how data and state are written to Arrow RecordBatches with bin-packing and chunking taken into account.
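The call order described above can be sketched without Spark or Arrow on the classpath. The following is only an illustration: `GroupWriter`, `RecordingWriter`, and `writeAll` are hypothetical names, and keys and rows are simplified to `String` in place of `UnsafeRow`, `GroupStateImpl[Row]`, and `InternalRow`. It shows the sequence a caller is expected to follow, not how the real class writes batches.

```scala
object CallOrderSketch {
  // Hypothetical stand-in for the four public methods named in the class description.
  // Key and row types are simplified to String so the sketch is self-contained.
  trait GroupWriter {
    def startNewGroup(key: String): Unit
    def writeRow(row: String): Unit
    def finalizeGroup(): Unit
    def finalizeData(): Unit
  }

  // Records every call so that the order can be inspected afterwards.
  final class RecordingWriter extends GroupWriter {
    private val log = scala.collection.mutable.ArrayBuffer.empty[String]
    def startNewGroup(key: String): Unit = log += s"startNewGroup($key)"
    def writeRow(row: String): Unit = log += s"writeRow($row)"
    def finalizeGroup(): Unit = log += "finalizeGroup()"
    def finalizeData(): Unit = log += "finalizeData()"
    def calls: List[String] = log.toList
  }

  // Caller-side protocol: one startNewGroup/finalizeGroup pair per group,
  // writeRow for each row in between, and a single finalizeData at the end.
  def writeAll(groups: Seq[(String, Seq[String])], writer: GroupWriter): Unit = {
    groups.foreach { case (key, rows) =>
      writer.startNewGroup(key)
      rows.foreach(writer.writeRow)
      writer.finalizeGroup()
    }
    writer.finalizeData()
  }
}
```

With the real class, the same loop would pass the grouping key row and the group state to startNewGroup and an InternalRow to writeRow.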

Linear Supertypes
BaseStreamingArrowWriter, AnyRef, Any

Instance Constructors

  1. new ApplyInPandasWithStateWriter(root: VectorSchemaRoot, writer: ArrowStreamWriter, arrowMaxRecordsPerBatch: Int)

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##: Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. val arrowWriterForData: ArrowWriter
    Attributes
    protected
    Definition Classes
    ApplyInPandasWithStateWriter → BaseStreamingArrowWriter
  5. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  6. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.CloneNotSupportedException]) @IntrinsicCandidate() @native()
  7. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  8. def equals(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef → Any
  9. def finalizeCurrentArrowBatch(): Unit

    Finalizes the current batch and writes it to the Arrow stream.

    Definition Classes
    ApplyInPandasWithStateWriter → BaseStreamingArrowWriter
  10. def finalizeCurrentChunk(isLastChunkForGroup: Boolean): Unit

    Finalizes the current chunk. We only reset the number of rows for the current chunk here since not all the writers need this step.

    Attributes
    protected
    Definition Classes
    ApplyInPandasWithStateWriter → BaseStreamingArrowWriter
  11. def finalizeData(): Unit

    Signals to the writer that all groups have been processed.

  12. def finalizeGroup(): Unit

    Signals to the writer that the current group is complete and that no further rows will be written for it.

  13. final def getClass(): Class[_ <: AnyRef]
    Definition Classes
    AnyRef → Any
    Annotations
    @IntrinsicCandidate() @native()
  14. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @IntrinsicCandidate() @native()
  15. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  16. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  17. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @IntrinsicCandidate() @native()
  18. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @IntrinsicCandidate() @native()
  19. var numRowsForCurrentChunk: Int
    Attributes
    protected
    Definition Classes
    BaseStreamingArrowWriter
  20. def startNewGroup(keyRow: UnsafeRow, groupState: GroupStateImpl[Row]): Unit

    Tells the writer to start a new group with the given grouping key.

    keyRow

    The grouping key row for the current group.

    groupState

    The instance of GroupStateImpl for the current group.

  21. final def synchronized[T0](arg0: => T0): T0
    Definition Classes
    AnyRef
  22. def toString(): String
    Definition Classes
    AnyRef → Any
  23. var totalNumRowsForBatch: Int
    Attributes
    protected
    Definition Classes
    BaseStreamingArrowWriter
  24. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException])
  25. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException]) @native()
  26. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException])
  27. def writeRow(dataRow: InternalRow): Unit

    Tells the writer to write a row to the current batch.

    dataRow

    The row to write to the current batch.

    Definition Classes
    BaseStreamingArrowWriter
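
The members above (numRowsForCurrentChunk, totalNumRowsForBatch, finalizeCurrentChunk, finalizeCurrentArrowBatch, and the constructor parameter arrowMaxRecordsPerBatch) hint at how bin-packing and chunking are tracked. The following toy model is a sketch under assumptions, not the actual implementation: it assumes a batch is flushed when it already holds arrowMaxRecordsPerBatch rows and another row arrives, and it skips empty non-final chunks for readability. `BinPackSketch`, `Chunk`, and `plan` are hypothetical names, and only row counts are modelled.

```scala
object BinPackSketch {
  // One contiguous run of rows from a single group inside a batch.
  final case class Chunk(groupIndex: Int, numRows: Int, isLastChunkForGroup: Boolean)

  // Toy model: given the row count of each group, returns the chunks held by each batch.
  def plan(groupSizes: Seq[Int], maxRecordsPerBatch: Int): Vector[Vector[Chunk]] = {
    require(maxRecordsPerBatch > 0, "maxRecordsPerBatch must be positive")
    val batches = Vector.newBuilder[Vector[Chunk]]
    var current = Vector.empty[Chunk]
    var totalNumRowsForBatch = 0   // rows in the batch being built
    var numRowsForCurrentChunk = 0 // rows of the current group in that batch

    def flushBatch(): Unit = {
      batches += current
      current = Vector.empty
      totalNumRowsForBatch = 0
    }

    for ((size, groupIndex) <- groupSizes.zipWithIndex) {
      for (_ <- 0 until size) {
        // Assumed rule: the batch is full and the group has more rows, so close
        // the chunk as "not last" and start a new batch before writing the row.
        if (totalNumRowsForBatch >= maxRecordsPerBatch) {
          if (numRowsForCurrentChunk > 0) { // simplification: skip empty chunks
            current :+= Chunk(groupIndex, numRowsForCurrentChunk, isLastChunkForGroup = false)
          }
          numRowsForCurrentChunk = 0
          flushBatch()
        }
        numRowsForCurrentChunk += 1
        totalNumRowsForBatch += 1
      }
      // End of group: close the chunk as the last one for this group.
      current :+= Chunk(groupIndex, numRowsForCurrentChunk, isLastChunkForGroup = true)
      numRowsForCurrentChunk = 0
    }
    // End of data: emit whatever is still pending.
    if (current.nonEmpty) flushBatch()
    batches.result()
  }
}
```

For example, `BinPackSketch.plan(Seq(2, 5, 1), 4)` yields two batches: the first holds all 2 rows of group 0 plus the first 2 rows of group 1 (not the last chunk), and the second holds the remaining 3 rows of group 1 plus the single row of group 2.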

Deprecated Value Members

  1. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.Throwable]) @Deprecated
    Deprecated (Since version 9)
