
class ApplyInPandasWithStatePythonRunner extends BasePythonRunner[InType, OutType] with PythonArrowInput[InType] with PythonArrowOutput[OutType]

A variant of ArrowPythonRunner that serves the applyInPandasWithState operation.

Unlike the regular ArrowPythonRunner, where both input and output (executor <-> Python worker) are InternalRow, applyInPandasWithState carries side data (state information) alongside the data in both input and output. This requires a different struct in the Arrow RecordBatch.
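To make the idea of "side data alongside the data" concrete, here is a minimal, hypothetical sketch in plain Scala with no Spark dependency. It packs one group's (key, state, values) into rows that carry both the data and an optional state struct. StateInfo, InputRow and SideDataLayout are illustrative names, and attaching the state only to the first row of each group is an assumption made for this sketch, not a description of Spark's actual Arrow schema (see ApplyInPandasWithStateWriter for that).

```scala
// Hypothetical side data for one group: its key and its serialized state.
final case class StateInfo(groupKey: String, stateJson: String)

// Hypothetical row: the data columns plus an optional state struct column.
final case class InputRow(data: Seq[Any], state: Option[StateInfo])

object SideDataLayout {
  // Packs one group's values into rows. In this sketch (an assumption),
  // only the first row of the group carries the state; the rest leave it empty.
  def pack(key: String, stateJson: String, values: Seq[Seq[Any]]): Seq[InputRow] =
    values.zipWithIndex.map { case (v, i) =>
      InputRow(v, if (i == 0) Some(StateInfo(key, stateJson)) else None)
    }
}
```

The point of the extra column is that the data columns keep their normal shape, so the state can travel in the same RecordBatch without changing how the data itself is encoded.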

Linear Supertypes
PythonArrowOutput[OutType], PythonArrowInput[InType], BasePythonRunner[InType, OutType], Logging, AnyRef, Any

Instance Constructors

  1. new ApplyInPandasWithStatePythonRunner(funcs: Seq[ChainedPythonFunctions], evalType: Int, argOffsets: Array[Array[Int]], inputSchema: StructType, timeZoneId: String, initialWorkerConf: Map[String, String], stateEncoder: ExpressionEncoder[Row], keySchema: StructType, outputSchema: StructType, stateValueSchema: StructType, pythonMetrics: Map[String, SQLMetric], jobArtifactUUID: Option[String])

Type Members

  1. class MonitorThread extends Thread
    Definition Classes
    BasePythonRunner
  2. abstract class ReaderIterator extends Iterator[OUT]
    Definition Classes
    BasePythonRunner
  3. class WriterMonitorThread extends Thread
    Definition Classes
    BasePythonRunner
  4. abstract class WriterThread extends Thread
    Definition Classes
    BasePythonRunner

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##: Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. val accumulator: PythonAccumulatorV2
    Attributes
    protected
    Definition Classes
    BasePythonRunner
  5. val argOffsets: Array[Array[Int]]
    Attributes
    protected
    Definition Classes
    BasePythonRunner
  6. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  7. val authSocketTimeout: Long
    Attributes
    protected
    Definition Classes
    BasePythonRunner
  8. val bufferSize: Int
    Definition Classes
    ApplyInPandasWithStatePythonRunner → BasePythonRunner
  9. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.CloneNotSupportedException]) @native()
  10. def compute(inputIterator: Iterator[InType], partitionIndex: Int, context: TaskContext): Iterator[OutType]
    Definition Classes
    BasePythonRunner
  11. def deserializeColumnarBatch(batch: ColumnarBatch, schema: StructType): OutType

    Deserializes a ColumnarBatch received from the Python worker to produce the output. The schema of the given ColumnarBatch is also provided.

    Attributes
    protected
    Definition Classes
    ApplyInPandasWithStatePythonRunner → PythonArrowOutput
  12. val envVars: Map[String, String]
    Attributes
    protected
    Definition Classes
    BasePythonRunner
  13. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  14. def equals(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef → Any
  15. val errorOnDuplicatedFieldNames: Boolean
    Definition Classes
    ApplyInPandasWithStatePythonRunner → PythonArrowInput
  16. val evalType: Int
    Attributes
    protected
    Definition Classes
    BasePythonRunner
  17. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.Throwable])
  18. val funcs: Seq[ChainedPythonFunctions]
    Attributes
    protected
    Definition Classes
    BasePythonRunner
  19. final def getClass(): Class[_ <: AnyRef]
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  20. def handleMetadataAfterExec(stream: DataInputStream): Unit
    Attributes
    protected
    Definition Classes
    PythonArrowOutput
  21. def handleMetadataBeforeExec(stream: DataOutputStream): Unit

    Sends additional metadata before the actual data is sent.

    Specifically, this class overrides this method to also write the schema of the state value.

    Attributes
    protected
    Definition Classes
    ApplyInPandasWithStatePythonRunner → PythonArrowInput
  22. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  23. def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean
    Attributes
    protected
    Definition Classes
    Logging
  24. def initializeLogIfNecessary(isInterpreter: Boolean): Unit
    Attributes
    protected
    Definition Classes
    Logging
  25. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  26. def isTraceEnabled(): Boolean
    Attributes
    protected
    Definition Classes
    Logging
  27. val jobArtifactUUID: Option[String]
    Attributes
    protected
    Definition Classes
    BasePythonRunner
  28. val largeVarTypes: Boolean
    Attributes
    protected
    Definition Classes
    ApplyInPandasWithStatePythonRunner → PythonArrowInput
  29. def log: Logger
    Attributes
    protected
    Definition Classes
    Logging
  30. def logDebug(msg: => String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  31. def logDebug(msg: => String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  32. def logError(msg: => String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  33. def logError(msg: => String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  34. def logInfo(msg: => String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  35. def logInfo(msg: => String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  36. def logName: String
    Attributes
    protected
    Definition Classes
    Logging
  37. def logTrace(msg: => String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  38. def logTrace(msg: => String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  39. def logWarning(msg: => String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  40. def logWarning(msg: => String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  41. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  42. def newReaderIterator(stream: DataInputStream, writerThread: WriterThread, startTime: Long, env: SparkEnv, worker: Socket, pid: Option[Int], releasedOrClosed: AtomicBoolean, context: TaskContext): Iterator[OutType]
    Attributes
    protected
    Definition Classes
    PythonArrowOutput
  43. def newWriterThread(env: SparkEnv, worker: Socket, inputIterator: Iterator[InType], partitionIndex: Int, context: TaskContext): WriterThread
    Attributes
    protected
    Definition Classes
    PythonArrowInput
  44. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  45. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  46. val pythonExec: String
    Definition Classes
    ApplyInPandasWithStatePythonRunner → BasePythonRunner
  47. val pythonMetrics: Map[String, SQLMetric]
    Definition Classes
    ApplyInPandasWithStatePythonRunner → PythonArrowOutput → PythonArrowInput
  48. val pythonVer: String
    Attributes
    protected
    Definition Classes
    BasePythonRunner
  49. val schema: StructType
    Attributes
    protected
    Definition Classes
    ApplyInPandasWithStatePythonRunner → PythonArrowInput
  50. val simplifiedTraceback: Boolean
    Definition Classes
    ApplyInPandasWithStatePythonRunner → BasePythonRunner
  51. final def synchronized[T0](arg0: => T0): T0
    Definition Classes
    AnyRef
  52. val timeZoneId: String
    Attributes
    protected
    Definition Classes
    ApplyInPandasWithStatePythonRunner → PythonArrowInput
  53. def toString(): String
    Definition Classes
    AnyRef → Any
  54. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException])
  55. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException])
  56. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException]) @native()
  57. val workerConf: Map[String, String]
    Attributes
    protected
    Definition Classes
    ApplyInPandasWithStatePythonRunner → PythonArrowInput
  58. def writeIteratorToArrowStream(root: VectorSchemaRoot, writer: ArrowStreamWriter, dataOut: DataOutputStream, inputIterator: Iterator[InType]): Unit

    Reads (key, state, values) tuples from the input iterator, constructs Arrow RecordBatches from them, and writes the constructed RecordBatches to the writer.

    See ApplyInPandasWithStateWriter for more details.

    Attributes
    protected
    Definition Classes
    ApplyInPandasWithStatePythonRunner → PythonArrowInput
  59. def writeUDF(dataOut: DataOutputStream, funcs: Seq[ChainedPythonFunctions], argOffsets: Array[Array[Int]]): Unit
    Attributes
    protected
    Definition Classes
    PythonArrowInput
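deserializeColumnarBatch turns what the Python worker sends back into OutType. The following is a minimal sketch of one plausible splitting step. It assumes, for illustration only, that each output row carries either a data payload or a state update, with the other side null. OutRow and OutputSplitter are hypothetical names; this page does not describe the real OutType or the output column layout.

```scala
// Hypothetical output row: either a data row or a state update (the other side is None).
final case class OutRow(data: Option[Seq[Any]], state: Option[String])

object OutputSplitter {
  // Separates state updates from data rows, skipping whichever side is empty in each row.
  def split(batch: Seq[OutRow]): (Seq[String], Seq[Seq[Any]]) =
    (batch.flatMap(_.state), batch.flatMap(_.data))
}
```

Keeping the two streams separate lets the caller apply state updates and emit data rows independently, which matches the page's statement that state information travels alongside the data in the output as well.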
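The override of handleMetadataBeforeExec, which writes the state value schema in addition to the base metadata, can be pictured with the minimal sketch below. It assumes, without confirmation from this page, that strings are framed as a 4-byte length followed by UTF-8 bytes and that the base implementation writes the worker configuration as a count followed by key/value pairs. StateSchemaMetadata and its methods are hypothetical stand-ins, not Spark APIs.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream}
import java.nio.charset.StandardCharsets

// Hypothetical helpers; not Spark's PythonRDD or PythonArrowInput.
object StateSchemaMetadata {
  // Frames a string as a 4-byte big-endian length followed by its UTF-8 bytes.
  def writeUTF(str: String, out: DataOutputStream): Unit = {
    val bytes = str.getBytes(StandardCharsets.UTF_8)
    out.writeInt(bytes.length)
    out.write(bytes)
  }

  // Reads back a string framed by writeUTF.
  def readUTF(in: DataInputStream): String = {
    val bytes = new Array[Byte](in.readInt())
    in.readFully(bytes)
    new String(bytes, StandardCharsets.UTF_8)
  }

  // Assumed base behavior: worker conf as a count followed by key/value pairs.
  // Subclass-specific step: append the state value schema (as JSON) afterwards.
  def handleMetadataBeforeExec(
      out: DataOutputStream,
      workerConf: Map[String, String],
      stateValueSchemaJson: String): Unit = {
    out.writeInt(workerConf.size)
    for ((k, v) <- workerConf) {
      writeUTF(k, out)
      writeUTF(v, out)
    }
    writeUTF(stateValueSchemaJson, out)
  }
}
```

Because the schema is appended after the base metadata, the reading side has to consume the fields in exactly the same order before any Arrow data arrives.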
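writeIteratorToArrowStream has to pack grouped (key, state, values) input into RecordBatches. The sketch below illustrates one way to plan that packing when batches have a row limit. It is a hypothetical illustration of bounded batching, not the actual logic of ApplyInPandasWithStateWriter. The Chunk fields (startOffset, numRows, isLastChunk), the maxRows limit and GroupBatcher are all assumed names. A group that does not fit in the remaining space of a batch is split into several chunks, and the last one is flagged.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical descriptor for one slice of a group inside one batch.
final case class Chunk(key: String, startOffset: Int, numRows: Int, isLastChunk: Boolean)

object GroupBatcher {
  // Plans batches of at most maxRows rows from (groupKey, rowCount) pairs.
  // A group that does not fit in the current batch is split across batches;
  // groups with zero rows produce no chunk in this simplified sketch.
  def plan(groups: Seq[(String, Int)], maxRows: Int): List[List[Chunk]] = {
    require(maxRows > 0, "maxRows must be positive")
    val batches = ArrayBuffer.empty[List[Chunk]]
    var current = ArrayBuffer.empty[Chunk]
    var used = 0
    for ((key, size) <- groups) {
      var remaining = size
      while (remaining > 0) {
        if (used == maxRows) {
          // Current batch is full: seal it and start a new one.
          batches += current.toList
          current = ArrayBuffer.empty[Chunk]
          used = 0
        }
        val take = math.min(remaining, maxRows - used)
        remaining -= take
        current += Chunk(key, used, take, isLastChunk = (remaining == 0))
        used += take
      }
    }
    if (current.nonEmpty) batches += current.toList
    batches.toList
  }
}
```

For example, planning groups of 3 and 4 rows with a limit of 5 puts all of the first group and 2 rows of the second in one batch, and the remaining 2 rows of the second group in the next batch.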
