Previously ColumnGenerator.
ColumnGenerator - previously Column; it is now the base class for all ColumnGenerators.
ColumnList allows users to specify custom generators for a list of columns inside a StructType column.
:: Experimental :: Base class for testing Spark DataFrames.
This is the base trait for Spark Streaming test suites. It provides basic functionality to run a user-defined set of inputs through user-defined stream operations and verify that the output matches what is expected.
This implementation is designed to work with JUnit for Java users.
Note: this always uses the manual clock to control Spark Streaming's batches.
Manages a local SparkContext variable, sc, correctly stopping it after each test.
Provides a local SparkContext variable, sc, correctly stopping it after each test. The stopping logic is provided in LocalSparkContext.
This listener collects basic execution time information for use in micro-benchmark-style performance tests. Be careful about imposing strict time limits, because execution times vary considerably between runs.
Shares an HDFS MiniCluster-based SparkContext between all tests in a suite and closes it at the end. This requires that the SPARK_HOME environment variable is set. Furthermore, if this is used with Spark versions prior to 1.6.3, all Spark tests must run against the YARN mini cluster (see https://issues.apache.org/jira/browse/SPARK-10812 for details).
Shares a local SparkContext between all tests in a suite and closes it at the end. You can share the context between suites by enabling reuseContextIfPossible.
Methods for testing Spark actions. Because actions don't return a DStream, you will need to verify the results of your test against mocks.
This is the base trait for Spark Streaming test suites. It provides basic functionality to run a user-defined set of inputs through user-defined stream operations and verify the output.
This is an input stream intended only for test suites. It is equivalent to a checkpointable, replayable, reliable message queue like Kafka. It requires a sequence as input and returns the i-th element at the i-th batch under the manual clock.
Based on the TestInputStream class from TestSuiteBase in the Apache Spark project.
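The batch-indexed behaviour described for this input stream can be illustrated without Spark. The following is a minimal sketch in plain Scala, not the library's implementation: ReplayableInput, batchAt and replay are hypothetical names introduced here, batches are indexed from 0, and returning an empty batch once the input is exhausted is an assumption of this sketch.

```scala
// Hypothetical stand-in for a test input stream; not part of any library.
final class ReplayableInput[T](input: Seq[Seq[T]]) {
  // Returns the i-th element of the input sequence for the i-th batch (0-based).
  // Assumption: batches outside the input range are empty.
  def batchAt(i: Int): Seq[T] =
    if (i >= 0 && i < input.length) input(i) else Seq.empty[T]

  // Replaying a range of batches is just asking for the same indices again.
  def replay(start: Int, end: Int): Seq[Seq[T]] =
    (start until end).map(i => batchAt(i))
}

object ReplayableInputDemo {
  def main(args: Array[String]): Unit = {
    val stream = new ReplayableInput(Seq(Seq(1, 2), Seq(3), Seq(4, 5, 6)))

    // The same batch index always yields the same data, which is what
    // makes the input replayable after a simulated failure.
    assert(stream.batchAt(1) == Seq(3))
    assert(stream.batchAt(1) == stream.batchAt(1))

    // Past the end of the input, this sketch produces empty batches.
    assert(stream.batchAt(3).isEmpty)

    assert(stream.replay(0, 2) == Seq(Seq(1, 2), Seq(3)))
  }
}
```

Because the data for a batch depends only on its index, a test driven by a manual clock sees the same input on every run.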
This is an output stream intended only for testing. The buffer contains a sequence of RDDs, each containing a sequence of items.
Shares an HDFS MiniCluster-based SparkContext between all tests in a suite and closes it at the end. This requires that the SPARK_HOME environment variable is set. Furthermore, if this is used with Spark versions prior to 1.6.3, all Spark tests must run against the YARN mini cluster (see https://issues.apache.org/jira/browse/SPARK-10812 for details).
Extractor that matches the UDTs exposed by Spark ML.
Previously ColumnGenerator. Allows the user to specify a generator for a specific column.