Persist this TypedDataset with the default storage level (MEMORY_AND_DISK).
apache/spark
Returns a new TypedDataset that has exactly numPartitions partitions.
Similar to coalesce defined on an RDD, this operation results in a narrow dependency. For example, if you go from 1000 partitions to 100 partitions, there will not be a shuffle; instead, each of the 100 new partitions will claim 10 of the current partitions.
apache/spark
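The partition arithmetic in the coalesce note above can be modelled in plain Scala. This is only a sketch of the idea, not Spark's implementation: `CoalesceSketch` and `groupPartitions` are hypothetical names, and the sketch assumes contiguous grouping and ignores data locality.

```scala
object CoalesceSketch {
  // Hypothetical helper: assigns each current partition index to one of
  // `target` new partitions by contiguous grouping. Whole partitions are
  // claimed as-is, so no records move between groups (no shuffle).
  def groupPartitions(current: Int, target: Int): Vector[Vector[Int]] = {
    // The sketch only models shrinking the partition count.
    require(target > 0 && target <= current, "need 0 < target <= current")
    (0 until target).toVector.map { i =>
      val start = (i.toLong * current / target).toInt
      val end   = ((i + 1).toLong * current / target).toInt
      (start until end).toVector
    }
  }
}

val groups = CoalesceSketch.groupPartitions(1000, 100)
// groups.size == 100, and every group holds 10 of the original partitions
```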
Returns a new TypedDataset that contains only the unique elements of this TypedDataset.
Note that equality checking is performed directly on the encoded representation of the data and thus is not affected by a custom equals function defined on T.
apache/spark
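The note on equality checking above can be illustrated without Spark. The following is a plain-collection model under stated assumptions: `Tag` is a hypothetical element type, and `encode` is a hypothetical stand-in for an encoder that stores a `Tag` as its string. It does not use the TypedDataset API.

```scala
object EncodedEqualitySketch {
  // Hypothetical element type with a custom, case-insensitive equals.
  final class Tag(val name: String) {
    override def equals(other: Any): Boolean = other match {
      case t: Tag => name.equalsIgnoreCase(t.name)
      case _      => false
    }
    override def hashCode: Int = name.toLowerCase.hashCode
  }

  // Stand-in for an encoder: the stored form of a Tag is just its string.
  def encode(t: Tag): String = t.name

  val tags: Seq[Tag] = Seq(new Tag("spark"), new Tag("Spark"), new Tag("scala"))

  // Deduplicating the objects uses the custom equals, so "spark" and
  // "Spark" collapse into one element.
  val byEquals: Seq[String] = tags.distinct.map(_.name) // Seq("spark", "scala")

  // Deduplicating the encoded values ignores the custom equals, so both
  // spellings survive.
  val byEncoding: Seq[String] = tags.map(encode).distinct // Seq("spark", "Spark", "scala")
}
```

In this model, `byEncoding` corresponds to the behaviour the note describes: the comparison sees only the encoded data, never the `equals` method of T.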
Prints the plans (logical and physical) to the console for debugging purposes.
apache/spark
Returns a new TypedDataset that only contains elements where func returns true.
apache/spark
Returns a new TypedDataset by first applying a function to all elements of this TypedDataset, and then flattening the results.
apache/spark
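The "apply, then flatten" description above can be checked on an ordinary Scala collection. This is a minimal sketch using `Seq` rather than a TypedDataset; `FlatMapSketch` and `words` are hypothetical names introduced for illustration.

```scala
object FlatMapSketch {
  val lines: Seq[String] = Seq("a b", "c", "")

  // Hypothetical function producing zero or more results per element.
  def words(line: String): Seq[String] =
    line.split(" ").toSeq.filter(_.nonEmpty)

  // Two steps: apply the function to every element, then flatten.
  val mappedThenFlattened: Seq[String] = lines.map(words).flatten

  // One step: flatMap gives the same result.
  val flatMapped: Seq[String] = lines.flatMap(words) // Seq("a", "b", "c")
}
```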
Returns a new TypedDataset that contains only the elements of this TypedDataset that are also present in other.
Note that equality checking is performed directly on the encoded representation of the data and thus is not affected by a custom equals function defined on T.
apache/spark
Returns a new TypedDataset that contains the result of applying func to each element.
apache/spark
Returns a new TypedDataset that contains the result of applying func to each partition.
apache/spark
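To contrast applying func per partition with applying it per element, here is a plain-Scala model. It assumes a dataset can be pictured as a vector of partitions and that func maps an iterator over one partition to an iterator of results; `MapPartitionsSketch` and `sumPartition` are hypothetical names, and nothing here calls Spark.

```scala
object MapPartitionsSketch {
  // A dataset pictured as three partitions, one of them empty.
  val partitions: Vector[Vector[Int]] =
    Vector(Vector(1, 2, 3), Vector(4, 5), Vector.empty)

  // Hypothetical func: consumes a whole partition and emits a single sum.
  def sumPartition(it: Iterator[Int]): Iterator[Int] =
    Iterator.single(it.sum)

  // func runs once per partition, so the output has one element per
  // partition rather than one per input element.
  val result: Vector[Int] =
    partitions.flatMap(p => sumPartition(p.iterator)) // Vector(6, 9, 0)
}
```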
Persist this TypedDataset with the given storage level.
One of: MEMORY_ONLY, MEMORY_AND_DISK, MEMORY_ONLY_SER, MEMORY_AND_DISK_SER, DISK_ONLY, MEMORY_ONLY_2, MEMORY_AND_DISK_2, etc.
apache/spark
Prints the schema of the underlying Dataset to the console in a nice tree format.
apache/spark
Converts this TypedDataset to an RDD.
apache/spark
Returns a new TypedDataset that has exactly numPartitions partitions.
apache/spark
Returns a new TypedDataset by sampling a fraction of records.
apache/spark
Returns a new TypedDataset where any elements present in other have been removed.
Note that equality checking is performed directly on the encoded representation of the data and thus is not affected by a custom equals function defined on T.
apache/spark
Converts this strongly typed collection of data to a generic DataFrame. In contrast to the strongly typed objects that Dataset operations work on, a DataFrame returns generic Row objects that allow fields to be accessed by ordinal or name.
apache/spark
Concise syntax for chaining custom transformations.
apache/spark
Returns a new TypedDataset that contains the elements of both this and the other TypedDataset combined.
Note that this function is not a typical set union operation, in that it does not eliminate duplicate items. As such, it is analogous to UNION ALL in SQL.
apache/spark
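The difference between UNION ALL semantics and a set union, as described in the union note above, can be shown with ordinary Scala sequences. This is a sketch by analogy only: `UnionAllSketch` is a hypothetical name and `++` on `Seq` stands in for the dataset operation.

```scala
object UnionAllSketch {
  val left: Seq[Int]  = Seq(1, 2, 2)
  val right: Seq[Int] = Seq(2, 3)

  // UNION ALL-style: every element of both inputs is kept, duplicates included.
  val unionAll: Seq[Int] = left ++ right // Seq(1, 2, 2, 2, 3)

  // Set-style union: duplicates removed in a separate, explicit step.
  val setUnion: Seq[Int] = (left ++ right).distinct // Seq(1, 2, 3)
}
```

By the same analogy, a set-style result would need an explicit deduplication step after the union, such as the operation documented above that keeps only the unique elements.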
Mark the TypedDataset as non-persistent, and remove all blocks for it from memory and disk.
Whether to block until all blocks are deleted.
apache/spark
This trait implements TypedDataset methods that have the same signature as their Dataset equivalent. Each method simply forwards the call to the underlying Dataset.
Documentation marked "apache/spark" is thanks to apache/spark Contributors at https://github.com/apache/spark, licensed under Apache v2.0 available at http://www.apache.org/licenses/LICENSE-2.0