Returns a new TypedDataset that only contains elements where func returns true.
apache/spark
Returns a new TypedDataset by first applying a function to all elements of this TypedDataset, and then flattening the results.
apache/spark
Returns a new TypedDataset that contains the result of applying func to each element.
apache/spark
Returns a new TypedDataset that contains the result of applying func to each partition.
apache/spark
Optionally reduces the elements of this TypedDataset using the specified binary function. The given func must be commutative and associative, or the result may be non-deterministic.
Differs from Dataset#reduce by wrapping its result into an Option and an effect-suspending F.
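The two requirements above can be illustrated with plain Scala collections rather than Spark. This is a minimal sketch under stated assumptions: `reducePartitioned` is a hypothetical helper, not part of Frameless or Spark, and the nested sequences stand in for one possible way a cluster might split the data into partitions.

```scala
object ReduceOptionSketch {
  // Hypothetical helper: reduce each partition on its own, then combine the
  // partial results. Empty partitions contribute nothing; empty input gives None.
  def reducePartitioned[A](partitions: Seq[Seq[A]])(f: (A, A) => A): Option[A] =
    partitions.flatMap(_.reduceOption(f)).reduceOption(f)

  def main(args: Array[String]): Unit = {
    // The same four elements under two different (assumed) partition layouts.
    val layoutA = Seq(Seq(1, 2), Seq(3, 4))
    val layoutB = Seq(Seq(1), Seq(2, 3, 4))

    // Addition is commutative and associative: the layout does not matter.
    println(reducePartitioned(layoutA)(_ + _))
    println(reducePartitioned(layoutB)(_ + _))

    // Subtraction is neither: the two layouts disagree.
    println(reducePartitioned(layoutA)(_ - _))
    println(reducePartitioned(layoutB)(_ - _))

    // Nothing to reduce, so there is no value to return: hence the Option.
    println(reducePartitioned(Seq.empty[Seq[Int]])(_ + _))
  }
}
```

Because the partition layout is not under the caller's control, a function such as subtraction can produce different results from run to run, which is the non-determinism the description warns about.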
Methods on TypedDataset[T] that go through a full serialization and deserialization of T, and execute outside of the Catalyst runtime.
The correct way to do a projection on a single column is to use the select method. Spark provides an alternative way to obtain the same resulting Dataset, using the map method. This second approach is, however, substantially slower than the first and should be avoided where possible: under the hood, this map deserializes the entire Tuple3 into a full JVM object, calls the apply method of the _._2 closure on it, and serializes the resulting String back to its Catalyst representation.
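The two projection styles described above might look as follows. This is a sketch, not the original example from the documentation: it assumes Scala 2 with Frameless and Spark on the classpath, and the sample rows, the local SparkSession settings, and the names `ProjectionSketch` and `secondColumns` are illustrative only.

```scala
import org.apache.spark.sql.SparkSession
import frameless.TypedDataset

object ProjectionSketch {
  // Returns the second column computed both ways, sorted so the two can be compared.
  def secondColumns(): (Seq[String], Seq[String]) = {
    // TypedDataset.create needs an implicit SparkSession in scope (local mode assumed here).
    implicit val spark: SparkSession =
      SparkSession.builder().master("local[*]").appName("projection-sketch").getOrCreate()

    // Illustrative data: a dataset of Tuple3 rows.
    val ds: TypedDataset[(String, String, String)] =
      TypedDataset.create(Seq(("a", "b", "c"), ("d", "e", "f")))

    // Preferred: a column projection, which stays inside the Catalyst runtime.
    val viaSelect: TypedDataset[String] = ds.select(ds('_2))

    // Slower: every row is deserialized to a Tuple3, the closure runs on the JVM
    // object, and the resulting String is serialized back.
    val viaMap: TypedDataset[String] = ds.deserialized.map(_._2)

    // .dataset exposes the underlying Spark Dataset, so collect() here is plain Spark.
    (viaSelect.dataset.collect().toSeq.sorted, viaMap.dataset.collect().toSeq.sorted)
  }
}
```

Both values hold the same elements; the difference lies only in how much work Spark does per row to produce them.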