org.apache.spark.sql.execution.python
ApplyInPandasWithStateWriter
Companion object ApplyInPandasWithStateWriter
class ApplyInPandasWithStateWriter extends AnyRef
This class abstracts away the complexity of constructing Arrow RecordBatches for data and state with
bin-packing and chunking. The caller only needs to call the appropriate public methods of this class
(startNewGroup, writeRow, finalizeGroup, finalizeData), and this class writes the data
and state into Arrow RecordBatches, performing bin-packing and chunking internally.
This class requires that the parameter root has been initialized with an Arrow schema laid out as
follows:
- data fields
- state field
  - nested schema (see ApplyInPandasWithStateWriter.STATE_METADATA_SCHEMA)
See the code comments in the implementation for how writes of data and state to an Arrow RecordBatch work under bin-packing and chunking.
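The call order implied by the description (startNewGroup, then writeRow for each row, then finalizeGroup, and finalizeData once at the end) can be illustrated with a toy model. This is a minimal sketch, not Spark's implementation: ToyGroupedBatchWriter, ToyWriterDemo, and the use of plain strings for keys and rows are all hypothetical, and the sketch leaves out the state field and Arrow vectors entirely. It only shows what bin-packing (several groups sharing one batch) and chunking (one group split across batches) mean when the batch size is capped, as arrowMaxRecordsPerBatch suggests.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical toy stand-in for the real writer. A "batch" here is just a
// vector of (groupKey, row) pairs; no Arrow or Spark classes are involved.
final class ToyGroupedBatchWriter(maxRecordsPerBatch: Int) {
  require(maxRecordsPerBatch > 0, "maxRecordsPerBatch must be positive")

  private val current = ArrayBuffer.empty[(String, String)]
  private val flushed = ArrayBuffer.empty[Vector[(String, String)]]
  private var currentKey: Option[String] = None

  def startNewGroup(key: String): Unit = {
    require(currentKey.isEmpty, "previous group was not finalized")
    currentKey = Some(key)
  }

  def writeRow(row: String): Unit = {
    val key = currentKey.getOrElse(
      throw new IllegalStateException("writeRow called outside a group"))
    // Chunking: a full batch is flushed even in the middle of a group.
    if (current.size == maxRecordsPerBatch) flush()
    current += ((key, row))
  }

  def finalizeGroup(): Unit = {
    require(currentKey.isDefined, "no group is in progress")
    // Bin-packing: the batch is not flushed here, so the next group can share it.
    currentKey = None
  }

  def finalizeData(): Unit = {
    require(currentKey.isEmpty, "last group was not finalized")
    if (current.nonEmpty) flush()
  }

  def batches: Vector[Vector[(String, String)]] = flushed.toVector

  private def flush(): Unit = {
    flushed += current.toVector
    current.clear()
  }
}

object ToyWriterDemo {
  // Two groups (2 rows and 4 rows) written with a cap of 3 records per batch.
  def run(): Vector[Vector[(String, String)]] = {
    val w = new ToyGroupedBatchWriter(maxRecordsPerBatch = 3)
    w.startNewGroup("a")
    Seq("a1", "a2").foreach(w.writeRow)
    w.finalizeGroup()
    w.startNewGroup("b")
    Seq("b1", "b2", "b3", "b4").foreach(w.writeRow)
    w.finalizeGroup()
    w.finalizeData()
    w.batches
  }
}
```

In this toy run, the first batch holds both rows of group "a" plus the first row of group "b" (bin-packing), and group "b" continues into a second batch (chunking). How the real class tracks state metadata across such chunks is described in its implementation comments and is not modeled here.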
Instance Constructors
- new ApplyInPandasWithStateWriter(root: VectorSchemaRoot, writer: ArrowStreamWriter, arrowMaxRecordsPerBatch: Int)
Value Members
- final def !=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def ##: Int
- Definition Classes
- AnyRef → Any
- final def ==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def asInstanceOf[T0]: T0
- Definition Classes
- Any
- def clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.CloneNotSupportedException]) @native()
- final def eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- def equals(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef → Any
- def finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.Throwable])
- def finalizeData(): Unit
Signals to the writer that all groups have been processed.
- def finalizeGroup(): Unit
Signals to the writer that the current group is complete and that no further rows will be written for it.
- final def getClass(): Class[_ <: AnyRef]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- def hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- final def isInstanceOf[T0]: Boolean
- Definition Classes
- Any
- final def ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- final def notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- final def notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- def startNewGroup(keyRow: UnsafeRow, groupState: GroupStateImpl[Row]): Unit
Signals the writer to start a new group with the given grouping key.
- keyRow
The grouping key row for the current group.
- groupState
The instance of GroupStateImpl for the current group.
- final def synchronized[T0](arg0: => T0): T0
- Definition Classes
- AnyRef
- def toString(): String
- Definition Classes
- AnyRef → Any
- final def wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException]) @native()
- def writeRow(dataRow: InternalRow): Unit
Signals the writer to write a row in the current group.
- dataRow
The row to write in the current group.