Package | Description |
---|---|
org.apache.flink.api.java | |
org.apache.flink.api.java.operators | |
org.apache.flink.api.java.operators.join | |
org.apache.flink.api.java.utils | |
Modifier and Type | Method and Description |
---|---|
<X> DataSet<X> |
DataSet.runOperation(CustomUnaryOperation<T,X> operation)
Runs a CustomUnaryOperation on the data set. |
Modifier and Type | Method and Description |
---|---|
protected static void |
DataSet.checkSameExecutionContext(DataSet<?> set1,
DataSet<?> set2) |
<R> CoGroupOperator.CoGroupOperatorSets<T,R> |
DataSet.coGroup(DataSet<R> other)
Initiates a CoGroup transformation.
A CoGroup transformation combines the elements of two DataSets into one DataSet. |
<R> CrossOperator.DefaultCross<T,R> |
DataSet.cross(DataSet<R> other)
Initiates a Cross transformation.
A Cross transformation combines the elements of two DataSets into one DataSet. |
<R> CrossOperator.DefaultCross<T,R> |
DataSet.crossWithHuge(DataSet<R> other)
Initiates a Cross transformation.
A Cross transformation combines the elements of two DataSets into one DataSet. |
<R> CrossOperator.DefaultCross<T,R> |
DataSet.crossWithTiny(DataSet<R> other)
Initiates a Cross transformation.
A Cross transformation combines the elements of two DataSets into one DataSet. |
<R> JoinOperatorSetsBase<T,R> |
DataSet.fullOuterJoin(DataSet<R> other)
Initiates a Full Outer Join transformation.
An Outer Join transformation joins two elements of two DataSets on key equality and provides multiple ways to combine
joining elements into one DataSet. Elements of both DataSets that do not have a matching element on the opposing side
are joined with null and emitted to the resulting DataSet. |
<R> JoinOperatorSetsBase<T,R> |
DataSet.fullOuterJoin(DataSet<R> other,
JoinOperatorBase.JoinHint strategy)
Initiates a Full Outer Join transformation.
An Outer Join transformation joins two elements of two DataSets on key equality and provides multiple ways to combine
joining elements into one DataSet. Elements of both DataSets that do not have a matching element on the opposing side
are joined with null and emitted to the resulting DataSet. |
<R> DeltaIteration<T,R> |
DataSet.iterateDelta(DataSet<R> workset,
int maxIterations,
int... keyPositions)
Initiates a delta iteration.
|
<R> JoinOperator.JoinOperatorSets<T,R> |
DataSet.join(DataSet<R> other)
Initiates a Join transformation.
|
<R> JoinOperator.JoinOperatorSets<T,R> |
DataSet.join(DataSet<R> other,
JoinOperatorBase.JoinHint strategy)
Initiates a Join transformation.
|
<R> JoinOperator.JoinOperatorSets<T,R> |
DataSet.joinWithHuge(DataSet<R> other)
Initiates a Join transformation.
A Join transformation joins the elements of two DataSets on key equality and provides multiple ways to combine
joining elements into one DataSet. This method also gives the hint to the optimizer that the second DataSet to join is
much larger than the first one. This method returns a JoinOperator.JoinOperatorSets on which one of the where methods
can be called to define the join key of the first joining (i.e., this) DataSet. |
<R> JoinOperator.JoinOperatorSets<T,R> |
DataSet.joinWithTiny(DataSet<R> other)
Initiates a Join transformation.
|
<R> JoinOperatorSetsBase<T,R> |
DataSet.leftOuterJoin(DataSet<R> other)
Initiates a Left Outer Join transformation.
An Outer Join transformation joins two elements of two DataSets on key equality and provides multiple ways to combine
joining elements into one DataSet. Elements of the left DataSet (i.e. |
<R> JoinOperatorSetsBase<T,R> |
DataSet.leftOuterJoin(DataSet<R> other,
JoinOperatorBase.JoinHint strategy)
Initiates a Left Outer Join transformation.
An Outer Join transformation joins two elements of two DataSets on key equality and provides multiple ways to combine
joining elements into one DataSet. Elements of the left DataSet (i.e. |
<R> JoinOperatorSetsBase<T,R> |
DataSet.rightOuterJoin(DataSet<R> other)
Initiates a Right Outer Join transformation.
An Outer Join transformation joins two elements of two DataSets on key equality and provides multiple ways to combine
joining elements into one DataSet. Elements of the right DataSet (i.e. |
<R> JoinOperatorSetsBase<T,R> |
DataSet.rightOuterJoin(DataSet<R> other,
JoinOperatorBase.JoinHint strategy)
Initiates a Right Outer Join transformation.
An Outer Join transformation joins two elements of two DataSets on key equality and provides multiple ways to combine
joining elements into one DataSet. Elements of the right DataSet (i.e. |
UnionOperator<T> |
DataSet.union(DataSet<T> other)
Creates a union of this DataSet with another DataSet.
|
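
The outer join descriptions above say that elements without a matching partner are joined with null. The following is a minimal plain-Java sketch of that behavior on in-memory lists, not Flink code: the class `FullOuterJoinSketch`, its `Pair` type, and the nested-loop strategy are all hypothetical names and choices made for illustration, and a real Flink job would go through `DataSet.fullOuterJoin` and pick its own execution strategy.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.function.Function;

/** Illustrative only: full outer join semantics on plain lists (not the Flink implementation). */
final class FullOuterJoinSketch {

    /** A joined pair; either side is null when no partner with an equal key exists. */
    static final class Pair<L, R> {
        final L left;
        final R right;

        Pair(L left, R right) {
            this.left = left;
            this.right = right;
        }

        @Override
        public String toString() {
            return "(" + left + ", " + right + ")";
        }
    }

    /** Joins two lists on key equality and keeps unmatched elements from both sides. */
    static <L, R, K> List<Pair<L, R>> fullOuterJoin(
            List<L> left, List<R> right,
            Function<L, K> leftKey, Function<R, K> rightKey) {
        List<Pair<L, R>> out = new ArrayList<>();
        boolean[] rightMatched = new boolean[right.size()];

        for (L l : left) {
            boolean matched = false;
            for (int i = 0; i < right.size(); i++) {
                R r = right.get(i);
                if (Objects.equals(leftKey.apply(l), rightKey.apply(r))) {
                    out.add(new Pair<>(l, r));
                    matched = true;
                    rightMatched[i] = true;
                }
            }
            if (!matched) {
                // Left element without a partner: joined with null on the right.
                out.add(new Pair<>(l, null));
            }
        }
        for (int i = 0; i < right.size(); i++) {
            if (!rightMatched[i]) {
                // Right element without a partner: joined with null on the left.
                out.add(new Pair<>(null, right.get(i)));
            }
        }
        return out;
    }
}
```

Under these assumptions, joining `["a1", "b1"]` with `["b2", "c2"]` on the first character yields `(a1, null)`, `(b1, b2)` and `(null, c2)`. A left or right outer join would keep only one of the two null-padding branches.
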
Modifier and Type | Class and Description |
---|---|
class |
AggregateOperator<IN>
This operator represents the application of an "aggregate" operation on a data set, and the
result data set produced by the function.
|
class |
BulkIterationResultSet<T> |
class |
CoGroupOperator<I1,I2,OUT>
A DataSet that is the result of a CoGroup transformation. |
class |
CoGroupRawOperator<I1,I2,OUT>
A DataSet that is the result of a CoGroup transformation. |
class |
CrossOperator<I1,I2,OUT>
A DataSet that is the result of a Cross transformation. |
static class |
CrossOperator.DefaultCross<I1,I2>
|
static class |
CrossOperator.ProjectCross<I1,I2,OUT extends Tuple>
|
class |
DataSource<OUT>
An operation that creates a new data set (data source).
|
static class |
DeltaIteration.SolutionSetPlaceHolder<ST>
A DataSet that acts as a placeholder for the solution set during the iteration. |
static class |
DeltaIteration.WorksetPlaceHolder<WT>
A DataSet that acts as a placeholder for the workset during the iteration. |
class |
DeltaIterationResultSet<ST,WT> |
class |
DistinctOperator<T>
This operator represents the application of a "distinct" function on a data set, and the
result data set produced by the function.
|
class |
FilterOperator<T>
This operator represents the application of a "filter" function on a data set, and the
result data set produced by the function.
|
class |
FlatMapOperator<IN,OUT>
This operator represents the application of a "flatMap" function on a data set, and the
result data set produced by the function.
|
class |
GroupCombineOperator<IN,OUT>
This operator behaves like the GroupReduceOperator with Combine, but only runs the Combine part, which reduces all data
locally within each partition.
|
class |
GroupReduceOperator<IN,OUT>
This operator represents the application of a "reduceGroup" function on a data set, and the
result data set produced by the function.
|
class |
IterativeDataSet<T>
The IterativeDataSet represents the start of an iteration.
|
class |
JoinOperator<I1,I2,OUT>
A DataSet that is the result of a Join transformation. |
static class |
JoinOperator.DefaultJoin<I1,I2>
|
static class |
JoinOperator.EquiJoin<I1,I2,OUT>
A Join transformation that applies a JoinFunction on each pair of joining elements.
It also represents the DataSet that is the result of a Join transformation. |
static class |
JoinOperator.ProjectJoin<I1,I2,OUT extends Tuple>
|
class |
MapOperator<IN,OUT>
This operator represents the application of a "map" function on a data set, and the
result data set produced by the function.
|
class |
MapPartitionOperator<IN,OUT>
This operator represents the application of a "mapPartition" function on a data set, and the
result data set produced by the function.
|
class |
NoOpOperator<IN>
This operator will be ignored during translation.
|
class |
Operator<OUT,O extends Operator<OUT,O>>
Base class of all operators in the Java API.
|
class |
PartitionOperator<T>
This operator represents a partitioning.
|
class |
ProjectOperator<IN,OUT extends Tuple>
This operator represents the application of a projection operation on a data set, and the
result data set produced by the function.
|
class |
ReduceOperator<IN>
This operator represents the application of a "reduce" function on a data set, and the
result data set produced by the function.
|
class |
SingleInputOperator<IN,OUT,O extends SingleInputOperator<IN,OUT,O>>
Base class for operations that operate on a single input data set.
|
class |
SingleInputUdfOperator<IN,OUT,O extends SingleInputUdfOperator<IN,OUT,O>>
The SingleInputUdfOperator is the base class of all unary operators that execute
user-defined functions (UDFs).
|
class |
SortPartitionOperator<T>
This operator represents a DataSet with locally sorted partitions.
|
class |
TwoInputOperator<IN1,IN2,OUT,O extends TwoInputOperator<IN1,IN2,OUT,O>>
Base class for operations that operate on two input data sets.
|
class |
TwoInputUdfOperator<IN1,IN2,OUT,O extends TwoInputUdfOperator<IN1,IN2,OUT,O>>
The TwoInputUdfOperator is the base class of all binary operators that execute
user-defined functions (UDFs).
|
class |
UnionOperator<T>
Java API operator for the union of two data sets.
|
Modifier and Type | Field and Description |
---|---|
protected DataSet<T> |
Grouping.inputDataSet |
Modifier and Type | Method and Description |
---|---|
DataSet<ST> |
DeltaIteration.closeWith(DataSet<ST> solutionSetDelta,
DataSet<WT> newWorkset)
Closes the delta iteration.
|
DataSet<T> |
IterativeDataSet.closeWith(DataSet<T> iterationResult)
Closes the iteration.
|
DataSet<T> |
IterativeDataSet.closeWith(DataSet<T> iterationResult,
DataSet<?> terminationCriterion)
Closes the iteration and specifies a termination criterion.
|
DataSet<OUT> |
CustomUnaryOperation.createResult() |
DataSet<T> |
DataSink.getDataSet() |
DataSet<ST> |
DeltaIteration.getInitialSolutionSet()
Gets the initial solution set.
|
DataSet<WT> |
DeltaIteration.getInitialWorkset()
Gets the initial workset.
|
DataSet<IN> |
SingleInputOperator.getInput()
Gets the data set that this operation uses as its input.
|
DataSet<IN> |
NoOpOperator.getInput() |
DataSet<IN1> |
TwoInputOperator.getInput1()
Gets the data set that this operation uses as its first input.
|
DataSet<IN2> |
TwoInputOperator.getInput2()
Gets the data set that this operation uses as its second input.
|
DataSet<T> |
Grouping.getInputDataSet()
Returns the input DataSet of a grouping operation, that is, the one before the grouping.
|
DataSet<T> |
BulkIterationResultSet.getNextPartialSolution() |
DataSet<ST> |
DeltaIterationResultSet.getNextSolutionSet() |
DataSet<WT> |
DeltaIterationResultSet.getNextWorkset() |
DataSet<?> |
BulkIterationResultSet.getTerminationCriterion() |
Modifier and Type | Method and Description |
---|---|
Map<String,DataSet<?>> |
SingleInputUdfOperator.getBroadcastSets() |
Map<String,DataSet<?>> |
UdfOperator.getBroadcastSets()
Gets the broadcast sets (name and data set) that have been added to the context of the UDF.
|
Map<String,DataSet<?>> |
TwoInputUdfOperator.getBroadcastSets() |
Modifier and Type | Method and Description |
---|---|
DataSet<ST> |
DeltaIteration.closeWith(DataSet<ST> solutionSetDelta,
DataSet<WT> newWorkset)
Closes the delta iteration.
|
DataSet<T> |
IterativeDataSet.closeWith(DataSet<T> iterationResult)
Closes the iteration.
|
DataSet<T> |
IterativeDataSet.closeWith(DataSet<T> iterationResult,
DataSet<?> terminationCriterion)
Closes the iteration and specifies a termination criterion.
|
void |
CustomUnaryOperation.setInput(DataSet<IN> inputData) |
void |
NoOpOperator.setInput(DataSet<IN> input) |
O |
SingleInputUdfOperator.withBroadcastSet(DataSet<?> data,
String name) |
O |
UdfOperator.withBroadcastSet(DataSet<?> data,
String name)
Adds a certain data set as a broadcast set to this operator.
|
O |
TwoInputUdfOperator.withBroadcastSet(DataSet<?> data,
String name) |
Modifier and Type | Field and Description |
---|---|
protected DataSet<I1> |
JoinOperatorSetsBase.input1 |
protected DataSet<I2> |
JoinOperatorSetsBase.input2 |
Constructor and Description |
---|
JoinOperatorSetsBase(DataSet<I1> input1,
DataSet<I2> input2) |
JoinOperatorSetsBase(DataSet<I1> input1,
DataSet<I2> input2,
JoinOperatorBase.JoinHint hint) |
JoinOperatorSetsBase(DataSet<I1> input1,
DataSet<I2> input2,
JoinOperatorBase.JoinHint hint,
JoinType type) |
Modifier and Type | Method and Description |
---|---|
static <T> DataSet<Tuple2<Integer,Long>> |
DataSetUtils.countElementsPerPartition(DataSet<T> input)
Method that goes over all the elements in each partition in order to retrieve
the total number of elements.
|
static <T> DataSet<T> |
DataSetUtils.sampleWithSize(DataSet<T> input,
boolean withReplacement,
int numSamples)
Generates a sample of the DataSet that contains a fixed number of elements.
|
static <T> DataSet<T> |
DataSetUtils.sampleWithSize(DataSet<T> input,
boolean withReplacement,
int numSamples,
long seed)
Generates a sample of the DataSet that contains a fixed number of elements.
|
static <T> DataSet<Tuple2<Long,T>> |
DataSetUtils.zipWithIndex(DataSet<T> input)
Method that assigns a unique Long value to all elements in the input data set. |
static <T> DataSet<Tuple2<Long,T>> |
DataSetUtils.zipWithUniqueId(DataSet<T> input)
Method that assigns a unique Long value to all elements in the input data set in the following way:
a map function is applied to the input data set;
each map task holds a counter c, which is incremented for each record;
c is shifted by n bits, where n = log2(number of parallel tasks);
the task id is added to the counter to create an ID that is unique among all tasks;
for each record, the resulting counter is collected.
|
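
The ID scheme described for `zipWithUniqueId` can be sketched in plain Java as follows. This is an illustration of the arithmetic only, not the Flink source: the class `UniqueIdSketch` and its method names are hypothetical, and rounding n up to the next whole number of bits when the number of parallel tasks is not a power of two is an assumption made here so that task ids never collide.

```java
/** Illustrative only: the counter-shift ID scheme described above (not the Flink implementation). */
final class UniqueIdSketch {

    /** Bits needed to tell the tasks apart: ceil(log2(parallelism)); 0 for a single task. */
    static int shiftBits(int parallelism) {
        if (parallelism < 1) {
            throw new IllegalArgumentException("parallelism must be >= 1");
        }
        return 64 - Long.numberOfLeadingZeros(parallelism - 1L);
    }

    /** Shifts the per-task counter left by n bits and adds the task id in the freed low bits. */
    static long uniqueId(long counter, int taskId, int parallelism) {
        return (counter << shiftBits(parallelism)) + taskId;
    }
}
```

With four parallel tasks the shift is 2 bits, so task 2 at counter 5 produces `(5 << 2) + 2 = 22`, while task 3 at counter 0 produces 3. The IDs are unique but not consecutive, which is the trade-off against `zipWithIndex`.
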
Modifier and Type | Method and Description |
---|---|
static <T> Utils.ChecksumHashCode |
DataSetUtils.checksumHashCode(DataSet<T> input)
Deprecated. Replaced with org.apache.flink.graph.asm.dataset.ChecksumHashCode in Gelly. |
static <T> DataSet<Tuple2<Integer,Long>> |
DataSetUtils.countElementsPerPartition(DataSet<T> input)
Method that goes over all the elements in each partition in order to retrieve
the total number of elements.
|
static <T> PartitionOperator<T> |
DataSetUtils.partitionByRange(DataSet<T> input,
DataDistribution distribution,
int... fields)
Range-partitions a DataSet on the specified tuple field positions.
|
static <T,K extends Comparable<K>> PartitionOperator<T> |
DataSetUtils.partitionByRange(DataSet<T> input,
DataDistribution distribution,
KeySelector<T,K> keyExtractor)
Range-partitions a DataSet using the specified key selector function.
|
static <T> PartitionOperator<T> |
DataSetUtils.partitionByRange(DataSet<T> input,
DataDistribution distribution,
String... fields)
Range-partitions a DataSet on the specified fields.
|
static <T> MapPartitionOperator<T,T> |
DataSetUtils.sample(DataSet<T> input,
boolean withReplacement,
double fraction)
Generates a sample of the DataSet in which each element is selected with probability fraction.
|
static <T> MapPartitionOperator<T,T> |
DataSetUtils.sample(DataSet<T> input,
boolean withReplacement,
double fraction,
long seed)
Generates a sample of the DataSet in which each element is selected with probability fraction.
|
static <T> DataSet<T> |
DataSetUtils.sampleWithSize(DataSet<T> input,
boolean withReplacement,
int numSamples)
Generates a sample of the DataSet that contains a fixed number of elements.
|
static <T> DataSet<T> |
DataSetUtils.sampleWithSize(DataSet<T> input,
boolean withReplacement,
int numSamples,
long seed)
Generates a sample of the DataSet that contains a fixed number of elements.
|
static <R extends Tuple,T extends Tuple> R |
DataSetUtils.summarize(DataSet<T> input)
Summarizes a DataSet of Tuples by collecting single-pass statistics for all columns.
|
static <T> DataSet<Tuple2<Long,T>> |
DataSetUtils.zipWithIndex(DataSet<T> input)
Method that assigns a unique Long value to all elements in the input data set. |
static <T> DataSet<Tuple2<Long,T>> |
DataSetUtils.zipWithUniqueId(DataSet<T> input)
Method that assigns a unique Long value to all elements in the input data set in the following way:
a map function is applied to the input data set;
each map task holds a counter c, which is incremented for each record;
c is shifted by n bits, where n = log2(number of parallel tasks);
the task id is added to the counter to create an ID that is unique among all tasks;
for each record, the resulting counter is collected.
|
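
The `sample` entries above describe sampling by a per-element probability with an optional seed. The following plain-Java sketch shows what that could look like without replacement on an in-memory list. It is not Flink code: `FractionSampleSketch` is a hypothetical name, the use of `java.util.Random` is an assumption, and the with-replacement case is deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Illustrative only: seeded per-element sampling without replacement (not the Flink implementation). */
final class FractionSampleSketch {

    /** Keeps each element independently with probability {@code fraction}. */
    static <T> List<T> sample(List<T> input, double fraction, long seed) {
        if (fraction < 0.0 || fraction > 1.0) {
            throw new IllegalArgumentException("fraction must be in [0, 1]");
        }
        Random random = new Random(seed);
        List<T> out = new ArrayList<>();
        for (T element : input) {
            // nextDouble() lies in [0, 1), so 0.0 keeps nothing and 1.0 keeps everything.
            if (random.nextDouble() < fraction) {
                out.add(element);
            }
        }
        return out;
    }
}
```

The result size varies around `fraction * input.size()` from run to run unless the seed is fixed, which is why a fixed-size variant such as `sampleWithSize` is listed separately.
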
Copyright © 2014–2017 The Apache Software Foundation. All rights reserved.