class ApplyInPandasWithStatePythonRunner extends BasePythonRunner[InType, OutType] with PythonArrowInput[InType] with PythonArrowOutput[OutType]
A variant of ArrowPythonRunner that serves the applyInPandasWithState operation.
Unlike the normal ArrowPythonRunner, whose input and output (executor <-> Python worker) are both InternalRow, applyInPandasWithState carries side data (state information) alongside the data in both input and output, which requires a different structure for the Arrow RecordBatch.
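To make the "different structure" concrete, the following is a minimal sketch of how a batch layout might differ once side data travels with the rows. The types `Field` and `Schema`, the function names, and the column name `stateInfo` are all hypothetical and exist only for this illustration; they are not Spark or Arrow classes, and the layout the runner really uses is not described on this page.

```scala
// Illustrative model only: none of these types are Spark or Arrow classes.
object StateBatchLayoutSketch {
  // A named, typed column in a batch schema.
  final case class Field(name: String, dataType: String)

  // A schema is an ordered list of fields.
  final case class Schema(fields: Vector[Field]) {
    def fieldNames: Vector[String] = fields.map(_.name)
  }

  // A plain runner only needs the data columns.
  def plainLayout(data: Schema): Schema = data

  // A stateful runner needs an extra column that carries the side data.
  // The column name "stateInfo" is an assumption made for this sketch.
  def layoutWithState(data: Schema): Schema =
    Schema(data.fields :+ Field("stateInfo", "struct"))
}
```

With a data schema of `user` and `count`, `layoutWithState` yields three columns, the last one holding the state information, while `plainLayout` leaves the schema unchanged.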
- ApplyInPandasWithStatePythonRunner
- PythonArrowOutput
- PythonArrowInput
- BasePythonRunner
- Logging
- AnyRef
- Any
Instance Constructors
- new ApplyInPandasWithStatePythonRunner(funcs: Seq[(ChainedPythonFunctions, Long)], evalType: Int, argOffsets: Array[Array[Int]], inputSchema: StructType, _timeZoneId: String, initialWorkerConf: Map[String, String], stateEncoder: ExpressionEncoder[Row], keySchema: StructType, outputSchema: StructType, stateValueSchema: StructType, pythonMetrics: Map[String, SQLMetric], jobArtifactUUID: Option[String])
Type Members
- implicit class LogStringContext extends AnyRef
- Definition Classes
- Logging
- class MonitorThread extends Thread
- Definition Classes
- BasePythonRunner
- class ReaderInputStream extends InputStream
- Definition Classes
- BasePythonRunner
- abstract class ReaderIterator extends Iterator[OUT]
- Definition Classes
- BasePythonRunner
- abstract class Writer extends AnyRef
- Definition Classes
- BasePythonRunner
Value Members
- final def !=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def ##: Int
- Definition Classes
- AnyRef → Any
- final def ==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- val accumulator: PythonAccumulator
- Attributes
- protected
- Definition Classes
- BasePythonRunner
- val argOffsets: Array[Array[Int]]
- Attributes
- protected
- Definition Classes
- BasePythonRunner
- final def asInstanceOf[T0]: T0
- Definition Classes
- Any
- val authSocketTimeout: Long
- Attributes
- protected
- Definition Classes
- BasePythonRunner
- val bufferSize: Int
- Definition Classes
- ApplyInPandasWithStatePythonRunner → BasePythonRunner
- def clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.CloneNotSupportedException]) @IntrinsicCandidate() @native()
- def close(): Unit
- Attributes
- protected
- Definition Classes
- PythonArrowInput
- def compute(inputIterator: Iterator[InType], partitionIndex: Int, context: TaskContext): Iterator[OutType]
- Definition Classes
- BasePythonRunner
- def deserializeColumnarBatch(batch: ColumnarBatch, schema: StructType): OutType
Deserialize the ColumnarBatch received from the Python worker to produce the output. The schema of the given ColumnarBatch is also provided.
- Attributes
- protected
- Definition Classes
- ApplyInPandasWithStatePythonRunner → PythonArrowOutput
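As a rough illustration of what deserializing a columnar batch with a known schema involves, the sketch below turns column-oriented data into an iterator of rows. `Batch` is a hypothetical stand-in for ColumnarBatch, the schema is reduced to a list of column names, and rows are modeled as plain maps; none of this reflects the runner's actual OutType or implementation.

```scala
// Illustrative model only: Batch is not Spark's ColumnarBatch.
object ColumnarToRowsSketch {
  // One vector of values per column; all columns are assumed to have equal length.
  final case class Batch(columns: Vector[Vector[Any]]) {
    def numRows: Int = if (columns.isEmpty) 0 else columns.head.length
  }

  // Pair each column with its name from the schema and emit one row per index.
  def deserialize(batch: Batch, schema: Vector[String]): Iterator[Map[String, Any]] = {
    require(schema.length == batch.columns.length,
      "schema and batch must have the same number of columns")
    Iterator.range(0, batch.numRows).map { rowIdx =>
      schema.zip(batch.columns).map { case (name, col) => name -> col(rowIdx) }.toMap
    }
  }
}
```

The schema argument is what lets the reader attach names (and, in a real implementation, types) to otherwise anonymous column vectors.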
- val envVars: Map[String, String]
- Attributes
- protected
- Definition Classes
- BasePythonRunner
- final def eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- def equals(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef → Any
- val errorOnDuplicatedFieldNames: Boolean
- Definition Classes
- ApplyInPandasWithStatePythonRunner → PythonArrowInput
- val evalType: Int
- Attributes
- protected
- Definition Classes
- BasePythonRunner
- val faultHandlerEnabled: Boolean
- Definition Classes
- ApplyInPandasWithStatePythonRunner → BasePythonRunner
- val funcs: Seq[ChainedPythonFunctions]
- Attributes
- protected
- Definition Classes
- BasePythonRunner
- final def getClass(): Class[_ <: AnyRef]
- Definition Classes
- AnyRef → Any
- Annotations
- @IntrinsicCandidate() @native()
- def handleMetadataAfterExec(stream: DataInputStream): Unit
- Attributes
- protected
- Definition Classes
- PythonArrowOutput
- def handleMetadataBeforeExec(stream: DataOutputStream): Unit
Sends the additional metadata before the actual data is sent.
Specifically, this class overrides the method to also write the schema of the state value.
- Attributes
- protected
- Definition Classes
- ApplyInPandasWithStatePythonRunner → PythonArrowInput
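The idea of sending metadata ahead of the data, with an override that appends the state value schema, can be sketched with plain `java.io` streams. The length-prefixed UTF-8 framing, the `timezone=UTC` entry, and every function name below are assumptions made for this sketch; the wire format the runner actually uses is not specified on this page.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream}
import java.nio.charset.StandardCharsets

// Illustrative model only: the framing and names are assumptions, not Spark's protocol.
object MetadataHandshakeSketch {
  // Write a string as a 4-byte length followed by its UTF-8 bytes.
  def writeString(value: String, out: DataOutputStream): Unit = {
    val bytes = value.getBytes(StandardCharsets.UTF_8)
    out.writeInt(bytes.length)
    out.write(bytes)
  }

  // Read back a string written by writeString.
  def readString(in: DataInputStream): String = {
    val bytes = new Array[Byte](in.readInt())
    in.readFully(bytes)
    new String(bytes, StandardCharsets.UTF_8)
  }

  // Base behaviour: send generic metadata (a single hypothetical entry here).
  def writeBaseMetadata(out: DataOutputStream): Unit =
    writeString("timezone=UTC", out)

  // Overriding behaviour: base metadata first, then the state value schema.
  def writeMetadataWithStateSchema(stateValueSchemaJson: String, out: DataOutputStream): Unit = {
    writeBaseMetadata(out)
    writeString(stateValueSchemaJson, out)
  }

  // Write the metadata into a buffer and read both entries back in order.
  def roundTrip(stateValueSchemaJson: String): (String, String) = {
    val buffer = new ByteArrayOutputStream()
    val out = new DataOutputStream(buffer)
    writeMetadataWithStateSchema(stateValueSchemaJson, out)
    out.flush()
    val in = new DataInputStream(new ByteArrayInputStream(buffer.toByteArray))
    val base = readString(in)
    val stateSchema = readString(in)
    (base, stateSchema)
  }
}
```

Because the metadata arrives before any record batch, the receiving side can learn the state value schema before it has to interpret the first batch.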
- def hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @IntrinsicCandidate() @native()
- def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean
- Attributes
- protected
- Definition Classes
- Logging
- def initializeLogIfNecessary(isInterpreter: Boolean): Unit
- Attributes
- protected
- Definition Classes
- Logging
- final def isInstanceOf[T0]: Boolean
- Definition Classes
- Any
- def isTraceEnabled(): Boolean
- Attributes
- protected
- Definition Classes
- Logging
- val jobArtifactUUID: Option[String]
- Attributes
- protected
- Definition Classes
- BasePythonRunner
- val largeVarTypes: Boolean
- Attributes
- protected
- Definition Classes
- ApplyInPandasWithStatePythonRunner → PythonArrowInput
- def log: Logger
- Attributes
- protected
- Definition Classes
- Logging
- def logDebug(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logDebug(entry: LogEntry, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logDebug(entry: LogEntry): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logDebug(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logError(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logError(entry: LogEntry, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logError(entry: LogEntry): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logError(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logInfo(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logInfo(entry: LogEntry, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logInfo(entry: LogEntry): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logInfo(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logName: String
- Attributes
- protected
- Definition Classes
- Logging
- def logTrace(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logTrace(entry: LogEntry, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logTrace(entry: LogEntry): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logTrace(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logWarning(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logWarning(entry: LogEntry, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logWarning(entry: LogEntry): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logWarning(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- final def ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- def newReaderIterator(stream: DataInputStream, writer: Writer, startTime: Long, env: SparkEnv, worker: PythonWorker, pid: Option[Int], releasedOrClosed: AtomicBoolean, context: TaskContext): Iterator[OutType]
- Attributes
- protected
- Definition Classes
- PythonArrowOutput
- def newWriter(env: SparkEnv, worker: PythonWorker, inputIterator: Iterator[InType], partitionIndex: Int, context: TaskContext): Writer
- Attributes
- protected
- Definition Classes
- PythonArrowInput
- final def notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @IntrinsicCandidate() @native()
- final def notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @IntrinsicCandidate() @native()
- val pythonExec: String
- Definition Classes
- ApplyInPandasWithStatePythonRunner → BasePythonRunner
- val pythonMetrics: Map[String, SQLMetric]
- Definition Classes
- ApplyInPandasWithStatePythonRunner → PythonArrowOutput → PythonArrowInput
- val pythonVer: String
- Attributes
- protected
- Definition Classes
- BasePythonRunner
- val root: VectorSchemaRoot
- Attributes
- protected
- Definition Classes
- PythonArrowInput
- lazy val schema: StructType
- Attributes
- protected
- Definition Classes
- ApplyInPandasWithStatePythonRunner → PythonArrowInput
- val simplifiedTraceback: Boolean
- Definition Classes
- ApplyInPandasWithStatePythonRunner → BasePythonRunner
- final def synchronized[T0](arg0: => T0): T0
- Definition Classes
- AnyRef
- lazy val timeZoneId: String
- Attributes
- protected
- Definition Classes
- ApplyInPandasWithStatePythonRunner → PythonArrowInput
- val timelyFlushEnabled: Boolean
- Attributes
- protected
- Definition Classes
- BasePythonRunner
- val timelyFlushTimeoutNanos: Long
- Attributes
- protected
- Definition Classes
- BasePythonRunner
- def toString(): String
- Definition Classes
- AnyRef → Any
- final def wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException]) @native()
- final def wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- def withLogContext(context: HashMap[String, String])(body: => Unit): Unit
- Attributes
- protected
- Definition Classes
- Logging
- val workerConf: Map[String, String]
- Attributes
- protected
- Definition Classes
- ApplyInPandasWithStatePythonRunner → PythonArrowInput
- def writeNextInputToArrowStream(root: VectorSchemaRoot, writer: ArrowStreamWriter, dataOut: DataOutputStream, inputIterator: Iterator[InType]): Boolean
Reads (key, state, values) tuples from the input iterator, constructs Arrow RecordBatches from them, and writes the constructed RecordBatches to the writer.
See ApplyInPandasWithStateWriter for more details.
- Attributes
- protected
- Definition Classes
- ApplyInPandasWithStatePythonRunner → PythonArrowInput
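The pull-one-group-per-call pattern described above can be sketched without Arrow. In this sketch, `Group` is a hypothetical stand-in for a (key, state, values) element, an `ArrayBuffer` of strings stands in for the Arrow stream, and the Boolean is assumed to mean "something was written, call again"; the page itself does not state what the return value signifies.

```scala
import scala.collection.mutable.ArrayBuffer

// Illustrative model only: no Arrow batches are built here.
object GroupedInputWriterSketch {
  // Hypothetical input element: a key, its optional state, and its data rows.
  type Group = (String, Option[Int], Iterator[String])

  // Writes at most one group to `sink` per call.
  // Returns true if a group was written and false once the input is exhausted
  // (this meaning of the Boolean is an assumption of the sketch).
  def writeNextGroup(input: Iterator[Group], sink: ArrayBuffer[String]): Boolean = {
    if (input.hasNext) {
      val (key, state, values) = input.next()
      val stateText = state.map(_.toString).getOrElse("none")
      sink += s"start key=$key state=$stateText"
      values.foreach(v => sink += s"row $v")
      sink += "end"
      true
    } else {
      false
    }
  }

  // Keep calling writeNextGroup until it reports that nothing is left.
  def drain(input: Iterator[Group]): Vector[String] = {
    val sink = ArrayBuffer.empty[String]
    while (writeNextGroup(input, sink)) {}
    sink.toVector
  }
}
```

Writing one group per call keeps the caller in control of pacing, so it can interleave writes with other work such as reading output from the worker.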
- def writeUDF(dataOut: DataOutputStream): Unit
- Attributes
- protected
- Definition Classes
- ApplyInPandasWithStatePythonRunner → PythonArrowInput
- val writer: ArrowStreamWriter
- Attributes
- protected
- Definition Classes
- PythonArrowInput
Deprecated Value Members
- def finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.Throwable]) @Deprecated
- Deprecated
(Since version 9)