Persist this RDD with the default storage level (MEMORY_ONLY).
Return a new RDD that is reduced into numPartitions partitions.
Return a new RDD containing the distinct elements in this RDD.
Return a new RDD containing only the elements that satisfy a predicate.
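For illustration, a minimal sketch of filter and distinct on a SchemaRDD, assuming Spark 1.x; the Person case class and all value names below are invented for the example:

  import org.apache.spark.{SparkConf, SparkContext}
  import org.apache.spark.sql.SQLContext

  case class Person(name: String, age: Int)

  val sc = new SparkContext(new SparkConf().setAppName("demo").setMaster("local"))
  val sqlContext = new SQLContext(sc)
  import sqlContext.createSchemaRDD  // implicitly converts an RDD of case classes to a SchemaRDD

  val people: org.apache.spark.sql.SchemaRDD =
    sc.parallelize(Seq(Person("Alice", 30), Person("Bob", 12), Person("Alice", 30)))

  // Keep only rows satisfying the predicate, then drop duplicate rows.
  val adults = people.filter(row => row.getInt(1) >= 18).distinct()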
:: Experimental :: Appends the rows from this RDD to the specified table.
:: Experimental :: Adds the rows from this RDD to the specified table, optionally overwriting the existing data.
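A hedged sketch of both overloads, assuming a Hive-backed table named events and a results SchemaRDD (both hypothetical):

  results.insertInto("events")                    // append rows to the table
  results.insertInto("events", overwrite = true)  // replace the existing data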
Return the intersection of this RDD and another one. The output will not contain any duplicate elements, even if the input RDDs did. Performs a hash partition across the cluster.
Note that this method performs a shuffle internally.
How many partitions to use in the resulting RDD
Return the intersection of this RDD and another one. The output will not contain any duplicate elements, even if the input RDDs did.
Note that this method performs a shuffle internally.
Partitioner to use for the resulting RDD
Return the intersection of this RDD and another one. The output will not contain any duplicate elements, even if the input RDDs did.
Note that this method performs a shuffle internally.
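A sketch of the three overloads, using the people SchemaRDD from the earlier example and a hypothetical otherPeople SchemaRDD with the same schema:

  import org.apache.spark.HashPartitioner

  val common1 = people.intersection(otherPeople)                          // default partitioning
  val common2 = people.intersection(otherPeople, numPartitions = 8)       // explicit partition count
  val common3 = people.intersection(otherPeople, new HashPartitioner(8))  // explicit partitioner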
Set this RDD's storage level to persist its values across operations after the first time it is computed. This can only be used to assign a new storage level if the RDD does not have a storage level set yet.
Persist this RDD with the default storage level (MEMORY_ONLY).
Prints out the schema in the tree format.
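For the hypothetical Person schema above, printSchema produces output along these lines:

  people.printSchema()
  // root
  //  |-- name: string (nullable = true)
  //  |-- age: integer (nullable = false)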
:: DeveloperApi :: A lazily computed query execution workflow. All other RDD operations are passed through to the RDD that is produced by this workflow. This workflow is produced lazily because invoking the whole query optimization pipeline can be expensive.
The query execution is considered a Developer API as phases may be added or removed in future releases. This execution is only exposed to provide an interface for inspecting the various phases for debugging purposes. Applications should not depend on particular phases existing or producing any specific output, even for exactly the same query.
Additionally, the RDD exposed by this execution is not designed for consumption by end users. In particular, it does not contain any schema information, and it reuses Row objects internally. This object reuse improves performance, but can make programming against the RDD more difficult. Instead end users should perform RDD operations on a SchemaRDD directly.
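A sketch of inspecting the workflow for debugging, continuing the earlier example; the phases printed are not a stable contract:

  println(people.queryExecution)               // logical, optimized, and physical plans
  println(people.queryExecution.executedPlan)  // the physical plan only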
Registers this RDD as a temporary table using the given name. The lifetime of this temporary table is tied to the SQLContext that was used to create this SchemaRDD.
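A minimal sketch, continuing the earlier example; the table name is arbitrary:

  people.registerTempTable("people")
  val teenagers = sqlContext.sql("SELECT name FROM people WHERE age >= 13 AND age <= 19")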
Return a new RDD that has exactly numPartitions partitions.
Can increase or decrease the level of parallelism in this RDD. Internally, this uses a shuffle to redistribute data.
If you are decreasing the number of partitions in this RDD, consider using coalesce, which can avoid performing a shuffle.
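A short sketch contrasting the two, using the people SchemaRDD from the earlier example:

  val wider = people.repartition(16)  // shuffles data into 16 partitions
  val fewer = people.coalesce(4)      // narrows to 4 partitions, avoiding a shuffle by default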
Saves the contents of this SchemaRDD as a parquet file, preserving the schema. Files that are written out using this method can be read back in as a SchemaRDD using the parquetFile function.
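A round-trip sketch, continuing the earlier example; the output path is hypothetical:

  people.saveAsParquetFile("/tmp/people.parquet")
  val restored = sqlContext.parquetFile("/tmp/people.parquet")  // same schema as people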
:: Experimental :: Creates a table from the contents of this SchemaRDD. This will fail if the table already exists.
Note that this currently only works with SchemaRDDs that are created from a HiveContext, as there is no notion of a persisted catalog in a standard SQL context. Instead you can write an RDD out to a parquet file, and then register that file as a table. This "table" can then be the target of an insertInto.
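A sketch of both routes; hiveResults, morePeople, the table names, and the path are all hypothetical:

  // With a SchemaRDD created from a HiveContext: persist directly into the catalog.
  hiveResults.saveAsTable("people_table")

  // With a standard SQLContext: the parquet workaround described above.
  people.saveAsParquetFile("/tmp/people.parquet")
  sqlContext.parquetFile("/tmp/people.parquet").registerTempTable("people_parquet")
  morePeople.insertInto("people_parquet")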
Returns the output schema in the tree format.
Assign a name to this RDD.
Return an RDD with the elements from this that are not in other.
Return an RDD with the elements from this that are not in other. Uses this partitioner/partition size, because even if other is huge, the resulting RDD will be <= us.
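A sketch, assuming a hypothetical banned SchemaRDD with the same schema as people:

  val allowed = people.subtract(banned)                     // keeps this RDD's partitioning
  val allowed8 = people.subtract(banned, numPartitions = 8) // explicit partition count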
Mark the RDD as non-persistent, and remove all blocks for it from memory and disk.
Whether to block until all blocks are deleted.
This RDD.
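A sketch pairing persist and unpersist on the earlier people SchemaRDD:

  people.cache()                     // default storage level, MEMORY_ONLY
  people.count()                     // materializes the cached blocks
  people.unpersist(blocking = true)  // frees memory and disk, waiting until blocks are deleted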
An RDD of Row objects that is returned as the result of a Spark SQL query. In addition to standard RDD operations, a JavaSchemaRDD can also be registered as a table in the JavaSQLContext that was used to create it. Registering a JavaSchemaRDD allows its contents to be queried in future SQL statements.