T - The type of the elements in this stream.
@Public public class DataStream<T> extends Object
Modifier and Type | Field and Description |
---|---|
protected StreamExecutionEnvironment |
environment |
protected org.apache.flink.api.dag.Transformation<T> |
transformation |
Constructor and Description |
---|
DataStream(StreamExecutionEnvironment environment,
org.apache.flink.api.dag.Transformation<T> transformation)
Create a new
DataStream in the given execution environment with
partitioning set to forward by default. |
Modifier and Type | Method and Description |
---|---|
DataStreamSink<T> |
addSink(SinkFunction<T> sinkFunction)
Adds the given sink to this DataStream.
|
SingleOutputStreamOperator<T> |
assignTimestamps(TimestampExtractor<T> extractor)
|
SingleOutputStreamOperator<T> |
assignTimestampsAndWatermarks(AssignerWithPeriodicWatermarks<T> timestampAndWatermarkAssigner)
Assigns timestamps to the elements in the data stream and periodically creates
watermarks to signal event time progress.
|
SingleOutputStreamOperator<T> |
assignTimestampsAndWatermarks(AssignerWithPunctuatedWatermarks<T> timestampAndWatermarkAssigner)
Assigns timestamps to the elements in the data stream and creates watermarks to
signal event time progress based on the elements themselves.
|
DataStream<T> |
broadcast()
Sets the partitioning of the
DataStream so that the output elements
are broadcasted to every parallel instance of the next operation. |
BroadcastStream<T> |
broadcast(org.apache.flink.api.common.state.MapStateDescriptor<?,?>... broadcastStateDescriptors)
Sets the partitioning of the
DataStream so that the output elements
are broadcasted to every parallel instance of the next operation. |
protected <F> F |
clean(F f)
Invokes the
ClosureCleaner
on the given function if closure cleaning is enabled in the ExecutionConfig . |
<T2> CoGroupedStreams<T,T2> |
coGroup(DataStream<T2> otherStream)
Creates a join operation.
|
<R> BroadcastConnectedStream<T,R> |
connect(BroadcastStream<R> broadcastStream)
Creates a new
BroadcastConnectedStream by connecting the current
DataStream or KeyedStream with a BroadcastStream . |
<R> ConnectedStreams<T,R> |
connect(DataStream<R> dataStream)
Creates a new
ConnectedStreams by connecting
DataStream outputs of (possible) different types with each other. |
AllWindowedStream<T,GlobalWindow> |
countWindowAll(long size)
Windows this
DataStream into tumbling count windows. |
AllWindowedStream<T,GlobalWindow> |
countWindowAll(long size,
long slide)
Windows this
DataStream into sliding count windows. |
protected <R> SingleOutputStreamOperator<R> |
doTransform(String operatorName,
org.apache.flink.api.common.typeinfo.TypeInformation<R> outTypeInfo,
StreamOperatorFactory<R> operatorFactory) |
SingleOutputStreamOperator<T> |
filter(org.apache.flink.api.common.functions.FilterFunction<T> filter)
Applies a Filter transformation on a
DataStream . |
<R> SingleOutputStreamOperator<R> |
flatMap(org.apache.flink.api.common.functions.FlatMapFunction<T,R> flatMapper)
Applies a FlatMap transformation on a
DataStream . |
<R> SingleOutputStreamOperator<R> |
flatMap(org.apache.flink.api.common.functions.FlatMapFunction<T,R> flatMapper,
org.apache.flink.api.common.typeinfo.TypeInformation<R> outputType)
Applies a FlatMap transformation on a
DataStream . |
DataStream<T> |
forward()
Sets the partitioning of the
DataStream so that the output elements
are forwarded to the local subtask of the next operation. |
org.apache.flink.api.common.ExecutionConfig |
getExecutionConfig() |
StreamExecutionEnvironment |
getExecutionEnvironment()
Returns the
StreamExecutionEnvironment that was used to create this
DataStream . |
int |
getId()
Returns the ID of the
DataStream in the current StreamExecutionEnvironment . |
org.apache.flink.api.common.operators.ResourceSpec |
getMinResources()
Gets the minimum resources for this operator.
|
int |
getParallelism()
Gets the parallelism for this operator.
|
org.apache.flink.api.common.operators.ResourceSpec |
getPreferredResources()
Gets the preferred resources for this operator.
|
org.apache.flink.api.dag.Transformation<T> |
getTransformation()
Returns the
Transformation that represents the operation that logically creates
this DataStream . |
org.apache.flink.api.common.typeinfo.TypeInformation<T> |
getType()
Gets the type of the stream.
|
DataStream<T> |
global()
Sets the partitioning of the
DataStream so that the output values
all go to the first instance of the next processing operator. |
IterativeStream<T> |
iterate()
Initiates an iterative part of the program that feeds back data streams.
|
IterativeStream<T> |
iterate(long maxWaitTimeMillis)
Initiates an iterative part of the program that feeds back data streams.
|
<T2> JoinedStreams<T,T2> |
join(DataStream<T2> otherStream)
Creates a join operation.
|
KeyedStream<T,org.apache.flink.api.java.tuple.Tuple> |
keyBy(int... fields)
Partitions the operator state of a
DataStream by the given key positions. |
<K> KeyedStream<T,K> |
keyBy(org.apache.flink.api.java.functions.KeySelector<T,K> key)
It creates a new
KeyedStream that uses the provided key for partitioning
its operator states. |
<K> KeyedStream<T,K> |
keyBy(org.apache.flink.api.java.functions.KeySelector<T,K> key,
org.apache.flink.api.common.typeinfo.TypeInformation<K> keyType)
It creates a new
KeyedStream that uses the provided key with explicit type information
for partitioning its operator states. |
KeyedStream<T,org.apache.flink.api.java.tuple.Tuple> |
keyBy(String... fields)
Partitions the operator state of a
DataStream using field expressions. |
<R> SingleOutputStreamOperator<R> |
map(org.apache.flink.api.common.functions.MapFunction<T,R> mapper)
Applies a Map transformation on a
DataStream . |
<R> SingleOutputStreamOperator<R> |
map(org.apache.flink.api.common.functions.MapFunction<T,R> mapper,
org.apache.flink.api.common.typeinfo.TypeInformation<R> outputType)
Applies a Map transformation on a
DataStream . |
<K> DataStream<T> |
partitionCustom(org.apache.flink.api.common.functions.Partitioner<K> partitioner,
int field)
Partitions a tuple DataStream on the specified key fields using a custom partitioner.
|
<K> DataStream<T> |
partitionCustom(org.apache.flink.api.common.functions.Partitioner<K> partitioner,
org.apache.flink.api.java.functions.KeySelector<T,K> keySelector)
Partitions a DataStream on the key returned by the selector, using a custom partitioner.
|
<K> DataStream<T> |
partitionCustom(org.apache.flink.api.common.functions.Partitioner<K> partitioner,
String field)
Partitions a POJO DataStream on the specified key fields using a custom partitioner.
|
DataStreamSink<T> |
print()
Writes a DataStream to the standard output stream (stdout).
|
DataStreamSink<T> |
print(String sinkIdentifier)
Writes a DataStream to the standard output stream (stdout).
|
DataStreamSink<T> |
printToErr()
Writes a DataStream to the standard error stream (stderr).
|
DataStreamSink<T> |
printToErr(String sinkIdentifier)
Writes a DataStream to the standard error stream (stderr).
|
<R> SingleOutputStreamOperator<R> |
process(ProcessFunction<T,R> processFunction)
Applies the given
ProcessFunction on the input stream, thereby
creating a transformed output stream. |
<R> SingleOutputStreamOperator<R> |
process(ProcessFunction<T,R> processFunction,
org.apache.flink.api.common.typeinfo.TypeInformation<R> outputType)
Applies the given
ProcessFunction on the input stream, thereby
creating a transformed output stream. |
<R extends org.apache.flink.api.java.tuple.Tuple> SingleOutputStreamOperator<R> |
project(int... fieldIndexes)
Initiates a Project transformation on a
Tuple DataStream . |
DataStream<T> |
rebalance()
Sets the partitioning of the
DataStream so that the output elements
are distributed evenly to instances of the next operation in a round-robin
fashion. |
DataStream<T> |
rescale()
Sets the partitioning of the
DataStream so that the output elements
are distributed evenly to a subset of instances of the next operation in a round-robin
fashion. |
protected DataStream<T> |
setConnectionType(StreamPartitioner<T> partitioner)
Internal function for setting the partitioner for the DataStream.
|
DataStream<T> |
shuffle()
Sets the partitioning of the
DataStream so that the output elements
are shuffled uniformly randomly to the next operation. |
SplitStream<T> |
split(OutputSelector<T> outputSelector)
Deprecated.
Please use side output instead.
|
AllWindowedStream<T,TimeWindow> |
timeWindowAll(Time size)
Windows this
DataStream into tumbling time windows. |
AllWindowedStream<T,TimeWindow> |
timeWindowAll(Time size,
Time slide)
Windows this
DataStream into sliding time windows. |
<R> SingleOutputStreamOperator<R> |
transform(String operatorName,
org.apache.flink.api.common.typeinfo.TypeInformation<R> outTypeInfo,
OneInputStreamOperator<T,R> operator)
Method for passing user defined operators along with the type
information that will transform the DataStream.
|
<R> SingleOutputStreamOperator<R> |
transform(String operatorName,
org.apache.flink.api.common.typeinfo.TypeInformation<R> outTypeInfo,
OneInputStreamOperatorFactory<T,R> operatorFactory)
Method for passing user defined operators created by the given factory along with the type information that will
transform the DataStream.
|
DataStream<T> |
union(DataStream<T>... streams)
Creates a new
DataStream by merging DataStream outputs of
the same type with each other. |
<W extends Window> AllWindowedStream<T,W> |
windowAll(WindowAssigner<? super T,W> assigner)
Windows this data stream to a
AllWindowedStream , which evaluates windows
over a non key grouped stream. |
DataStreamSink<T> |
writeAsCsv(String path)
Deprecated.
Please use the
StreamingFileSink explicitly using the
addSink(SinkFunction) method. |
DataStreamSink<T> |
writeAsCsv(String path,
org.apache.flink.core.fs.FileSystem.WriteMode writeMode)
Deprecated.
Please use the
StreamingFileSink explicitly using the
addSink(SinkFunction) method. |
<X extends org.apache.flink.api.java.tuple.Tuple> DataStreamSink<T> |
writeAsCsv(String path,
org.apache.flink.core.fs.FileSystem.WriteMode writeMode,
String rowDelimiter,
String fieldDelimiter)
Deprecated.
Please use the
StreamingFileSink explicitly using the
addSink(SinkFunction) method. |
DataStreamSink<T> |
writeAsText(String path)
Deprecated.
Please use the
StreamingFileSink explicitly using the
addSink(SinkFunction) method. |
DataStreamSink<T> |
writeAsText(String path,
org.apache.flink.core.fs.FileSystem.WriteMode writeMode)
Deprecated.
Please use the
StreamingFileSink explicitly using the
addSink(SinkFunction) method. |
DataStreamSink<T> |
writeToSocket(String hostName,
int port,
org.apache.flink.api.common.serialization.SerializationSchema<T> schema)
Writes the DataStream to a socket as a byte array.
|
DataStreamSink<T> |
writeUsingOutputFormat(org.apache.flink.api.common.io.OutputFormat<T> format)
Deprecated.
Please use the
StreamingFileSink explicitly using the
addSink(SinkFunction) method. |
protected final StreamExecutionEnvironment environment
protected final org.apache.flink.api.dag.Transformation<T> transformation
public DataStream(StreamExecutionEnvironment environment, org.apache.flink.api.dag.Transformation<T> transformation)
Create a new DataStream in the given execution environment with partitioning set to forward by default.
Parameters:
environment - The StreamExecutionEnvironment
@Internal public int getId()
Returns the ID of the DataStream in the current StreamExecutionEnvironment.
public int getParallelism()
Gets the parallelism for this operator.
@PublicEvolving public org.apache.flink.api.common.operators.ResourceSpec getMinResources()
Gets the minimum resources for this operator.
@PublicEvolving public org.apache.flink.api.common.operators.ResourceSpec getPreferredResources()
Gets the preferred resources for this operator.
public org.apache.flink.api.common.typeinfo.TypeInformation<T> getType()
Gets the type of the stream.
protected <F> F clean(F f)
Invokes the ClosureCleaner on the given function if closure cleaning is enabled in the ExecutionConfig.
public StreamExecutionEnvironment getExecutionEnvironment()
Returns the StreamExecutionEnvironment that was used to create this DataStream.
public org.apache.flink.api.common.ExecutionConfig getExecutionConfig()
@SafeVarargs public final DataStream<T> union(DataStream<T>... streams)
Creates a new DataStream by merging DataStream outputs of the same type with each other. The DataStreams merged using this operator will be transformed simultaneously.
Parameters:
streams - The DataStreams to union output with.
Returns:
The DataStream.
@Deprecated public SplitStream<T> split(OutputSelector<T> outputSelector)
Deprecated. Please use side output instead.
Operator used for directing tuples to specific named outputs using an OutputSelector. Calling this method on an operator creates a new SplitStream.
Parameters:
outputSelector - The user defined OutputSelector for directing the tuples.
Returns:
The SplitStream
public <R> ConnectedStreams<T,R> connect(DataStream<R> dataStream)
Creates a new ConnectedStreams by connecting DataStream outputs of (possibly) different types with each other. The DataStreams connected using this operator can be used with CoFunctions to apply joint transformations.
Parameters:
dataStream - The DataStream with which this stream will be connected.
Returns:
The ConnectedStreams.
@PublicEvolving public <R> BroadcastConnectedStream<T,R> connect(BroadcastStream<R> broadcastStream)
Creates a new BroadcastConnectedStream by connecting the current DataStream or KeyedStream with a BroadcastStream.
The latter can be created using the broadcast(MapStateDescriptor[]) method.
The resulting stream can be further processed using the BroadcastConnectedStream.process(MyFunction) method, where MyFunction can be either a KeyedBroadcastProcessFunction or a BroadcastProcessFunction, depending on whether the current stream is a KeyedStream or not.
Parameters:
broadcastStream - The broadcast stream with the broadcast state to be connected with this stream.
Returns:
The BroadcastConnectedStream.
public <K> KeyedStream<T,K> keyBy(org.apache.flink.api.java.functions.KeySelector<T,K> key)
It creates a new KeyedStream that uses the provided key for partitioning its operator states.
Parameters:
key - The KeySelector to be used for extracting the key for partitioning
Returns:
The DataStream with partitioned state (i.e. KeyedStream)
public <K> KeyedStream<T,K> keyBy(org.apache.flink.api.java.functions.KeySelector<T,K> key, org.apache.flink.api.common.typeinfo.TypeInformation<K> keyType)
It creates a new KeyedStream that uses the provided key with explicit type information for partitioning its operator states.
Parameters:
key - The KeySelector to be used for extracting the key for partitioning.
keyType - The type information describing the key type.
Returns:
The DataStream with partitioned state (i.e. KeyedStream)
public KeyedStream<T,org.apache.flink.api.java.tuple.Tuple> keyBy(int... fields)
Partitions the operator state of a DataStream by the given key positions.
Parameters:
fields - The position of the fields on which the DataStream will be grouped.
Returns:
The DataStream with partitioned state (i.e. KeyedStream)
public KeyedStream<T,org.apache.flink.api.java.tuple.Tuple> keyBy(String... fields)
Partitions the operator state of a DataStream using field expressions. A field expression is either the name of a public field or a getter method with parentheses of the DataStream's underlying type. A dot can be used to drill down into objects, as in "field1.getInnerField2()".
Parameters:
fields - One or more field expressions on which the state of the DataStream operators will be partitioned.
Returns:
The DataStream with partitioned state (i.e. KeyedStream)
public <K> DataStream<T> partitionCustom(org.apache.flink.api.common.functions.Partitioner<K> partitioner, int field)
Partitions a tuple DataStream on the specified key fields using a custom partitioner.
Note: This method works only on single field keys.
Parameters:
partitioner - The partitioner to assign partitions to keys.
field - The field index on which the DataStream is partitioned.
public <K> DataStream<T> partitionCustom(org.apache.flink.api.common.functions.Partitioner<K> partitioner, String field)
Partitions a POJO DataStream on the specified key fields using a custom partitioner.
Note: This method works only on single field keys.
Parameters:
partitioner - The partitioner to assign partitions to keys.
field - The expression for the field on which the DataStream is partitioned.
public <K> DataStream<T> partitionCustom(org.apache.flink.api.common.functions.Partitioner<K> partitioner, org.apache.flink.api.java.functions.KeySelector<T,K> keySelector)
Partitions a DataStream on the key returned by the selector, using a custom partitioner.
Note: This method works only on single field keys, i.e. the selector cannot return tuples of fields.
Parameters:
partitioner - The partitioner to assign partitions to keys.
keySelector - The KeySelector with which the DataStream is partitioned.
See Also:
KeySelector
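The single-field-key restriction means the partitioner sees exactly one key value per element and maps it to a partition index. The following is a minimal plain-Java sketch of that idea and does not use Flink itself. `KeyPartitioner` is a hypothetical stand-in that only mirrors the shape of a key-to-partition function, and the hash-modulo strategy is an assumption chosen for illustration.

```java
final class PartitionerSketch {
    /** Hypothetical stand-in: maps one key and a partition count to a partition index. */
    interface KeyPartitioner<K> {
        int partition(K key, int numPartitions);
    }

    /** Illustrative strategy (an assumption): non-negative hash modulo the partition count. */
    static final KeyPartitioner<String> BY_HASH =
            (key, numPartitions) -> Math.floorMod(key.hashCode(), numPartitions);

    /** Routes a single key to a partition index in the range [0, numPartitions). */
    static int route(String key, int numPartitions) {
        return BY_HASH.partition(key, numPartitions);
    }
}
```

Because the function is deterministic, equal keys always land on the same partition, which is the property a custom partitioner is usually expected to preserve.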
public DataStream<T> broadcast()
Sets the partitioning of the DataStream so that the output elements are broadcast to every parallel instance of the next operation.
@PublicEvolving public BroadcastStream<T> broadcast(org.apache.flink.api.common.state.MapStateDescriptor<?,?>... broadcastStateDescriptors)
Sets the partitioning of the DataStream so that the output elements are broadcast to every parallel instance of the next operation. In addition, it implicitly creates as many broadcast states as the specified descriptors, which can be used to store the elements of the stream.
Parameters:
broadcastStateDescriptors - the descriptors of the broadcast states to create.
Returns:
A BroadcastStream which can be used in connect(BroadcastStream) to create a BroadcastConnectedStream for further processing of the elements.
@PublicEvolving public DataStream<T> shuffle()
Sets the partitioning of the DataStream so that the output elements are shuffled uniformly randomly to the next operation.
public DataStream<T> forward()
Sets the partitioning of the DataStream so that the output elements are forwarded to the local subtask of the next operation.
public DataStream<T> rebalance()
Sets the partitioning of the DataStream so that the output elements are distributed evenly to instances of the next operation in a round-robin fashion.
@PublicEvolving public DataStream<T> rescale()
Sets the partitioning of the DataStream so that the output elements are distributed evenly to a subset of instances of the next operation in a round-robin fashion.
The subset of downstream operations to which the upstream operation sends elements depends on the degree of parallelism of both the upstream and downstream operation. For example, if the upstream operation has parallelism 2 and the downstream operation has parallelism 4, then one upstream operation would distribute elements to two downstream operations while the other upstream operation would distribute to the other two downstream operations. If, on the other hand, the downstream operation has parallelism 2 while the upstream operation has parallelism 4 then two upstream operations will distribute to one downstream operation while the other two upstream operations will distribute to the other downstream operations.
In cases where the different parallelisms are not multiples of each other one or several downstream operations will have a differing number of inputs from upstream operations.
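The upstream-to-downstream grouping described above can be sketched in plain Java. This is an illustrative model only, assuming a contiguous, proportional assignment of subtask indices. It is not Flink's internal implementation, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class RescaleSketch {
    /**
     * Returns the downstream subtask indices that upstream subtask {@code up}
     * would send to, under an assumed proportional index mapping.
     */
    static List<Integer> targets(int up, int upstreamParallelism, int downstreamParallelism) {
        List<Integer> result = new ArrayList<>();
        if (downstreamParallelism >= upstreamParallelism) {
            // Fan-out: downstream subtask d is fed by upstream subtask d * U / D.
            for (int d = 0; d < downstreamParallelism; d++) {
                if (d * upstreamParallelism / downstreamParallelism == up) {
                    result.add(d);
                }
            }
        } else {
            // Fan-in: several upstream subtasks share one downstream subtask.
            result.add(up * downstreamParallelism / upstreamParallelism);
        }
        return result;
    }
}
```

With upstream parallelism 2 and downstream parallelism 4, subtask 0 maps to downstream subtasks 0 and 1, and subtask 1 maps to 2 and 3. With parallelisms 2 and 3, the groups have different sizes, matching the non-multiple case noted above.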
@PublicEvolving public DataStream<T> global()
Sets the partitioning of the DataStream so that the output values all go to the first instance of the next processing operator. Use this setting with care since it might cause a serious performance bottleneck in the application.
@PublicEvolving public IterativeStream<T> iterate()
Initiates an iterative part of the program that feeds back data streams. The iterative part needs to be closed by calling IterativeStream.closeWith(DataStream). The transformation of this IterativeStream will be the iteration head. The data stream given to the IterativeStream.closeWith(DataStream) method is the data stream that will be fed back and used as the input for the iteration head. The user can also use a different feedback type than the input of the iteration and treat the input and feedback streams as ConnectedStreams by calling IterativeStream.withFeedbackType(TypeInformation).
A common usage pattern for streaming iterations is to use output splitting to send a part of the closing data stream to the head. Refer to split(OutputSelector) for more information.
The iteration edge will be partitioned the same way as the first input of the iteration head unless it is changed in the IterativeStream.closeWith(DataStream) call.
By default a DataStream with iteration will never terminate, but the user can use the maxWaitTime parameter to set a max waiting time for the iteration head. If no data is received within the set time, the stream terminates.
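The feedback and timeout behavior can be modeled without Flink. The sketch below is a simplified, single-threaded simulation under stated assumptions: a queue plays the role of the iteration head, elements are decremented in the loop body, positive results are fed back, and the loop ends once no element arrives within the maximum wait time. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

final class IterationSketch {
    /** Decrements each value until it reaches zero, feeding positive results back to the head. */
    static List<Integer> runLoop(List<Integer> input, long maxWaitMillis) {
        BlockingQueue<Integer> head = new LinkedBlockingQueue<>(input);
        List<Integer> output = new ArrayList<>();
        try {
            Integer value;
            // The head stops once nothing arrives within maxWaitMillis.
            while ((value = head.poll(maxWaitMillis, TimeUnit.MILLISECONDS)) != null) {
                int next = value - 1;
                if (next > 0) {
                    head.add(next);   // feedback edge: goes around the loop again
                } else {
                    output.add(next); // leaves the iteration
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for feedback", e);
        }
        return output;
    }
}
```

Without the timeout on `poll`, the loop would block forever once the queue drains, which mirrors the default non-terminating behavior described above.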
@PublicEvolving public IterativeStream<T> iterate(long maxWaitTimeMillis)
Initiates an iterative part of the program that feeds back data streams. The iterative part needs to be closed by calling IterativeStream.closeWith(DataStream). The transformation of this IterativeStream will be the iteration head. The data stream given to the IterativeStream.closeWith(DataStream) method is the data stream that will be fed back and used as the input for the iteration head. The user can also use a different feedback type than the input of the iteration and treat the input and feedback streams as ConnectedStreams by calling IterativeStream.withFeedbackType(TypeInformation).
A common usage pattern for streaming iterations is to use output splitting to send a part of the closing data stream to the head. Refer to split(OutputSelector) for more information.
The iteration edge will be partitioned the same way as the first input of the iteration head unless it is changed in the IterativeStream.closeWith(DataStream) call.
By default a DataStream with iteration will never terminate, but the user can use the maxWaitTime parameter to set a max waiting time for the iteration head. If no data is received within the set time, the stream terminates.
Parameters:
maxWaitTimeMillis - Number of milliseconds to wait between inputs before shutting down
public <R> SingleOutputStreamOperator<R> map(org.apache.flink.api.common.functions.MapFunction<T,R> mapper)
Applies a Map transformation on a DataStream. The transformation calls a MapFunction for each element of the DataStream. Each MapFunction call returns exactly one element. The user can also extend RichMapFunction to gain access to other features provided by the RichFunction interface.
Type Parameters:
R - output type
Parameters:
mapper - The MapFunction that is called for each element of the DataStream.
Returns:
The transformed DataStream.
public <R> SingleOutputStreamOperator<R> map(org.apache.flink.api.common.functions.MapFunction<T,R> mapper, org.apache.flink.api.common.typeinfo.TypeInformation<R> outputType)
Applies a Map transformation on a DataStream. The transformation calls a MapFunction for each element of the DataStream. Each MapFunction call returns exactly one element. The user can also extend RichMapFunction to gain access to other features provided by the RichFunction interface.
Type Parameters:
R - output type
Parameters:
mapper - The MapFunction that is called for each element of the DataStream.
outputType - TypeInformation for the result type of the function.
Returns:
The transformed DataStream.
public <R> SingleOutputStreamOperator<R> flatMap(org.apache.flink.api.common.functions.FlatMapFunction<T,R> flatMapper)
Applies a FlatMap transformation on a DataStream. The transformation calls a FlatMapFunction for each element of the DataStream. Each FlatMapFunction call can return any number of elements including none. The user can also extend RichFlatMapFunction to gain access to other features provided by the RichFunction interface.
Type Parameters:
R - output type
Parameters:
flatMapper - The FlatMapFunction that is called for each element of the DataStream
Returns:
The transformed DataStream.
public <R> SingleOutputStreamOperator<R> flatMap(org.apache.flink.api.common.functions.FlatMapFunction<T,R> flatMapper, org.apache.flink.api.common.typeinfo.TypeInformation<R> outputType)
Applies a FlatMap transformation on a DataStream. The transformation calls a FlatMapFunction for each element of the DataStream. Each FlatMapFunction call can return any number of elements including none. The user can also extend RichFlatMapFunction to gain access to other features provided by the RichFunction interface.
Type Parameters:
R - output type
Parameters:
flatMapper - The FlatMapFunction that is called for each element of the DataStream
outputType - TypeInformation for the result type of the function.
Returns:
The transformed DataStream.
@PublicEvolving public <R> SingleOutputStreamOperator<R> process(ProcessFunction<T,R> processFunction)
Applies the given ProcessFunction on the input stream, thereby creating a transformed output stream.
The function will be called for every element in the input streams and can produce zero or more output elements.
Type Parameters:
R - The type of elements emitted by the ProcessFunction.
Parameters:
processFunction - The ProcessFunction that is called for each element in the stream.
Returns:
The transformed DataStream.
@Internal public <R> SingleOutputStreamOperator<R> process(ProcessFunction<T,R> processFunction, org.apache.flink.api.common.typeinfo.TypeInformation<R> outputType)
Applies the given ProcessFunction on the input stream, thereby creating a transformed output stream.
The function will be called for every element in the input streams and can produce zero or more output elements.
Type Parameters:
R - The type of elements emitted by the ProcessFunction.
Parameters:
processFunction - The ProcessFunction that is called for each element in the stream.
outputType - TypeInformation for the result type of the function.
Returns:
The transformed DataStream.
public SingleOutputStreamOperator<T> filter(org.apache.flink.api.common.functions.FilterFunction<T> filter)
Applies a Filter transformation on a DataStream. The transformation calls a FilterFunction for each element of the DataStream and retains only those elements for which the function returns true. Elements for which the function returns false are filtered out. The user can also extend RichFilterFunction to gain access to other features provided by the RichFunction interface.
Parameters:
filter - The FilterFunction that is called for each element of the DataStream.
@PublicEvolving public <R extends org.apache.flink.api.java.tuple.Tuple> SingleOutputStreamOperator<R> project(int... fieldIndexes)
Initiates a Project transformation on a Tuple DataStream.
The transformation projects each Tuple of the DataStream onto a (sub)set of fields.
Parameters:
fieldIndexes - The field indexes of the input tuples that are retained. The order of fields in the output tuple corresponds to the order of field indexes.
See Also:
Tuple, DataStream
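The field-selection rule for projection can be illustrated with a small plain-Java sketch. It models a tuple as an `Object[]`, which is an assumption made for illustration and not how Flink represents Tuple types. The class and method names are hypothetical.

```java
final class ProjectSketch {
    /** Builds the output tuple by copying fields in the order the indexes are given. */
    static Object[] project(Object[] tuple, int... fieldIndexes) {
        Object[] projected = new Object[fieldIndexes.length];
        for (int i = 0; i < fieldIndexes.length; i++) {
            projected[i] = tuple[fieldIndexes[i]];
        }
        return projected;
    }
}
```

Calling `project(tuple, 2, 0)` yields a two-field result whose first field is the input's third field, showing that output order follows the index order rather than the input order.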
public <T2> CoGroupedStreams<T,T2> coGroup(DataStream<T2> otherStream)
Creates a join operation. See CoGroupedStreams for an example of how the keys and window can be specified.
public <T2> JoinedStreams<T,T2> join(DataStream<T2> otherStream)
Creates a join operation. See JoinedStreams for an example of how the keys and window can be specified.
public AllWindowedStream<T,TimeWindow> timeWindowAll(Time size)
Windows this DataStream into tumbling time windows.
This is a shortcut for either .windowAll(TumblingEventTimeWindows.of(size)) or .windowAll(TumblingProcessingTimeWindows.of(size)) depending on the time characteristic set using StreamExecutionEnvironment.setStreamTimeCharacteristic(org.apache.flink.streaming.api.TimeCharacteristic).
Note: This operation is inherently non-parallel since all elements have to pass through the same operator instance.
Parameters:
size - The size of the window.
public AllWindowedStream<T,TimeWindow> timeWindowAll(Time size, Time slide)
Windows this DataStream into sliding time windows.
This is a shortcut for either .windowAll(SlidingEventTimeWindows.of(size, slide)) or .windowAll(SlidingProcessingTimeWindows.of(size, slide)) depending on the time characteristic set using StreamExecutionEnvironment.setStreamTimeCharacteristic(org.apache.flink.streaming.api.TimeCharacteristic).
Note: This operation is inherently non-parallel since all elements have to pass through the same operator instance.
Parameters:
size - The size of the window.
public AllWindowedStream<T,GlobalWindow> countWindowAll(long size)
Windows this DataStream into tumbling count windows.
Note: This operation is inherently non-parallel since all elements have to pass through the same operator instance.
Parameters:
size - The size of the windows in number of elements.
public AllWindowedStream<T,GlobalWindow> countWindowAll(long size, long slide)
Windows this DataStream into sliding count windows.
Note: This operation is inherently non-parallel since all elements have to pass through the same operator instance.
Parameters:
size - The size of the windows in number of elements.
slide - The slide interval in number of elements.
@PublicEvolving public <W extends Window> AllWindowedStream<T,W> windowAll(WindowAssigner<? super T,W> assigner)
Windows this data stream to an AllWindowedStream, which evaluates windows over a non key grouped stream. Elements are put into windows by a WindowAssigner. The grouping of elements is done by window.
A Trigger can be defined to specify when windows are evaluated. However, WindowAssigners have a default Trigger that is used if a Trigger is not specified.
Note: This operation is inherently non-parallel since all elements have to pass through the same operator instance.
Parameters:
assigner - The WindowAssigner that assigns elements to windows.
@Deprecated public SingleOutputStreamOperator<T> assignTimestamps(TimestampExtractor<T> extractor)
Deprecated. Please use assignTimestampsAndWatermarks(AssignerWithPeriodicWatermarks) or assignTimestampsAndWatermarks(AssignerWithPunctuatedWatermarks) instead.
If you know that the timestamps are strictly increasing you can use an AscendingTimestampExtractor. Otherwise, you should provide a TimestampExtractor that also implements TimestampExtractor.getCurrentWatermark() to keep track of watermarks.
Parameters:
extractor - The TimestampExtractor that is called for each element of the DataStream.
See Also:
assignTimestampsAndWatermarks(AssignerWithPeriodicWatermarks), assignTimestampsAndWatermarks(AssignerWithPunctuatedWatermarks)
public SingleOutputStreamOperator<T> assignTimestampsAndWatermarks(AssignerWithPeriodicWatermarks<T> timestampAndWatermarkAssigner)
Assigns timestamps to the elements in the data stream and periodically creates watermarks to signal event time progress.
This method creates watermarks periodically (for example every second), based on the watermarks indicated by the given watermark generator. Even when no new elements in the stream arrive, the given watermark generator will be periodically checked for new watermarks. The interval in which watermarks are generated is defined in ExecutionConfig.setAutoWatermarkInterval(long).
Use this method for the common cases, where some characteristic over all elements should generate the watermarks, or where watermarks are simply trailing behind the wall clock time by a certain amount.
For the second case and when the watermarks are required to lag behind the maximum timestamp seen so far in the elements of the stream by a fixed amount of time, and this amount is known in advance, use the BoundedOutOfOrdernessTimestampExtractor.
For cases where watermarks should be created in an irregular fashion, for example based on certain markers that some elements carry, use the AssignerWithPunctuatedWatermarks.
Parameters:
timestampAndWatermarkAssigner - The implementation of the timestamp assigner and watermark generator.
See Also:
AssignerWithPeriodicWatermarks, AssignerWithPunctuatedWatermarks, assignTimestampsAndWatermarks(AssignerWithPunctuatedWatermarks)
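The "lag behind the maximum timestamp by a fixed amount" case can be sketched in plain Java. This is a simplified model that does not implement any Flink interface. The class name, the method names, and the use of Long.MIN_VALUE as "no watermark yet" are assumptions made for illustration.

```java
final class BoundedLatenessSketch {
    private final long maxOutOfOrdernessMillis;
    private long maxTimestampSeen = Long.MIN_VALUE;

    BoundedLatenessSketch(long maxOutOfOrdernessMillis) {
        this.maxOutOfOrdernessMillis = maxOutOfOrdernessMillis;
    }

    /** Called once per element: remembers the largest event timestamp seen so far. */
    long extractTimestamp(long elementTimestamp) {
        maxTimestampSeen = Math.max(maxTimestampSeen, elementTimestamp);
        return elementTimestamp;
    }

    /** Called periodically: the watermark trails the maximum timestamp by the fixed bound. */
    long currentWatermark() {
        if (maxTimestampSeen == Long.MIN_VALUE) {
            return Long.MIN_VALUE; // assumption: no elements yet means no progress
        }
        return maxTimestampSeen - maxOutOfOrdernessMillis;
    }
}
```

Because `currentWatermark()` is queried on a timer rather than per element, out-of-order elements that arrive within the bound do not move the watermark backwards.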
public SingleOutputStreamOperator<T> assignTimestampsAndWatermarks(AssignerWithPunctuatedWatermarks<T> timestampAndWatermarkAssigner)
Assigns timestamps to the elements in the data stream and creates watermarks to signal event time progress based on the elements themselves.
This method creates watermarks based purely on stream elements. For each element that is handled via TimestampAssigner.extractTimestamp(Object, long), the AssignerWithPunctuatedWatermarks.checkAndGetNextWatermark(Object, long) method is called, and a new watermark is emitted if the returned watermark value is non-negative and greater than the previous watermark.
This method is useful when the data stream embeds watermark elements, or certain elements carry a marker that can be used to determine the current event time watermark. This operation gives the programmer full control over the watermark generation. Users should be aware that too aggressive watermark generation (i.e., generating hundreds of watermarks every second) can cost some performance.
For cases where watermarks should be created in a regular fashion, for example every x milliseconds, use the AssignerWithPeriodicWatermarks.
Parameters:
timestampAndWatermarkAssigner - The implementation of the timestamp assigner and watermark generator.
See Also:
AssignerWithPunctuatedWatermarks, AssignerWithPeriodicWatermarks, assignTimestampsAndWatermarks(AssignerWithPeriodicWatermarks)
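The emission rule stated above (emit only if the candidate is non-negative and greater than the previous watermark) can be sketched as a plain-Java filter. This is an illustrative model, not Flink code. The names are hypothetical, and a negative candidate is used here to stand for "no watermark for this element".

```java
import java.util.ArrayList;
import java.util.List;

final class PunctuatedEmissionSketch {
    /** Returns the watermarks that would be emitted for a sequence of per-element candidates. */
    static List<Long> emitted(long[] candidates) {
        List<Long> out = new ArrayList<>();
        long previous = Long.MIN_VALUE;
        for (long candidate : candidates) {
            // Emit only non-negative candidates that advance past the previous watermark.
            if (candidate >= 0 && candidate > previous) {
                out.add(candidate);
                previous = candidate;
            }
        }
        return out;
    }
}
```

For the candidates -1, 10, 5, 10, 20 only 10 and 20 pass the check: the negative value, the smaller value, and the repeated value are all suppressed.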
@PublicEvolving public DataStreamSink<T> print()
Writes a DataStream to the standard output stream (stdout).
For each element of the DataStream the result of Object.toString() is written.
NOTE: This will print to stdout on the machine where the code is executed, i.e. the Flink worker.
@PublicEvolving public DataStreamSink<T> printToErr()
Writes a DataStream to the standard error stream (stderr).
For each element of the DataStream the result of Object.toString() is written.
NOTE: This will print to stderr on the machine where the code is executed, i.e. the Flink worker.
@PublicEvolving public DataStreamSink<T> print(String sinkIdentifier)
Writes a DataStream to the standard output stream (stdout).
For each element of the DataStream the result of Object.toString() is written.
NOTE: This will print to stdout on the machine where the code is executed, i.e. the Flink worker.
Parameters:
sinkIdentifier - The string to prefix the output with.
@PublicEvolving public DataStreamSink<T> printToErr(String sinkIdentifier)
Writes a DataStream to the standard error stream (stderr).
For each element of the DataStream the result of Object.toString() is written.
NOTE: This will print to stderr on the machine where the code is executed, i.e. the Flink worker.
Parameters:
sinkIdentifier - The string to prefix the output with.
@Deprecated @PublicEvolving public DataStreamSink<T> writeAsText(String path)
Deprecated. Please use the StreamingFileSink explicitly using the addSink(SinkFunction) method.
For every element of the DataStream the result of Object.toString() is written.
path
- The path pointing to the location the text file is written to.
@Deprecated @PublicEvolving public DataStreamSink<T> writeAsText(String path, org.apache.flink.core.fs.FileSystem.WriteMode writeMode)
Deprecated. Please use the StreamingFileSink explicitly using the addSink(SinkFunction) method.
For every element of the DataStream the result of Object.toString() is written.
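The two write modes accepted by this method, NO_OVERWRITE and OVERWRITE, can be modeled on a local file with the JDK alone. The sketch below is not Flink code: the WriteModeSketch class and its Mode enum are hypothetical, and it assumes that NO_OVERWRITE refuses to write when the target already exists while OVERWRITE replaces the existing content.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical, JDK-only model of the two write modes.
final class WriteModeSketch {

    enum Mode { NO_OVERWRITE, OVERWRITE }

    /** Writes text to path, honoring the given mode. */
    static void writeText(Path path, String text, Mode mode) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        try {
            if (mode == Mode.NO_OVERWRITE) {
                // CREATE_NEW fails if the file already exists.
                Files.write(path, bytes, StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE);
            } else {
                // Create the file if needed and discard any previous content.
                Files.write(path, bytes, StandardOpenOption.CREATE,
                        StandardOpenOption.TRUNCATE_EXISTING, StandardOpenOption.WRITE);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Exercises both modes in a temporary directory and reports what happened. */
    static String runDemo() {
        try {
            Path dir = Files.createTempDirectory("write-mode-sketch");
            Path file = dir.resolve("out.txt");

            writeText(file, "first", Mode.NO_OVERWRITE);      // file does not exist yet
            boolean rejected = false;
            try {
                writeText(file, "second", Mode.NO_OVERWRITE); // file exists, write is refused
            } catch (UncheckedIOException e) {
                rejected = true;
            }
            writeText(file, "third", Mode.OVERWRITE);         // replaces "first"

            String content = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            Files.delete(file);
            Files.delete(dir);
            return rejected + ":" + content;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(runDemo()); // prints true:third
    }
}
```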
path
- The path pointing to the location the text file is written to.
writeMode
- Controls the behavior for existing files. Options are NO_OVERWRITE and OVERWRITE.
@Deprecated @PublicEvolving public DataStreamSink<T> writeAsCsv(String path)
Deprecated. Please use the StreamingFileSink explicitly using the addSink(SinkFunction) method.
For every field of an element of the DataStream the result of Object.toString() is written. This method can only be used on data streams of tuples.
path
- The path pointing to the location the text file is written to.
@Deprecated @PublicEvolving public DataStreamSink<T> writeAsCsv(String path, org.apache.flink.core.fs.FileSystem.WriteMode writeMode)
Deprecated. Please use the StreamingFileSink explicitly using the addSink(SinkFunction) method.
For every field of an element of the DataStream the result of Object.toString() is written. This method can only be used on data streams of tuples.
path
- The path pointing to the location the text file is written to.
writeMode
- Controls the behavior for existing files. Options are NO_OVERWRITE and OVERWRITE.
@Deprecated @PublicEvolving public <X extends org.apache.flink.api.java.tuple.Tuple> DataStreamSink<T> writeAsCsv(String path, org.apache.flink.core.fs.FileSystem.WriteMode writeMode, String rowDelimiter, String fieldDelimiter)
Deprecated. Please use the StreamingFileSink explicitly using the addSink(SinkFunction) method.
For every field of an element of the DataStream the result of Object.toString() is written. This method can only be used on data streams of tuples.
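The roles of the rowDelimiter and fieldDelimiter parameters can be shown with a short JDK-only sketch. The CsvRowSketch class is hypothetical and models each tuple as an Object array. It assumes that the row delimiter is written after every row and that no quoting or escaping is applied; neither detail is stated in this documentation.

```java
// Hypothetical, JDK-only model of delimiter-based row formatting.
final class CsvRowSketch {

    /**
     * Formats rows by writing the toString() result of every field,
     * separating fields with fieldDelimiter and ending each row with rowDelimiter.
     */
    static String format(Object[][] rows, String rowDelimiter, String fieldDelimiter) {
        StringBuilder out = new StringBuilder();
        for (Object[] row : rows) {
            for (int i = 0; i < row.length; i++) {
                if (i > 0) {
                    out.append(fieldDelimiter);
                }
                out.append(row[i]); // append(Object) uses String.valueOf, i.e. toString()
            }
            out.append(rowDelimiter);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Object[][] rows = {
                {1, "a", 2.5},
                {2, "b", 3.0},
        };
        // Prints two lines: 1|a|2.5 and 2|b|3.0
        System.out.print(format(rows, "\n", "|"));
    }
}
```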
path
- The path pointing to the location the text file is written to.
writeMode
- Controls the behavior for existing files. Options are NO_OVERWRITE and OVERWRITE.
rowDelimiter
- The delimiter between two rows.
fieldDelimiter
- The delimiter between two fields.
@PublicEvolving public DataStreamSink<T> writeToSocket(String hostName, int port, org.apache.flink.api.common.serialization.SerializationSchema<T> schema)
Writes the DataStream to a socket as a byte array. The format of the output is specified by a SerializationSchema.
hostName
- The host of the socket.
port
- The port of the socket.
schema
- The schema for serialization.
@Deprecated @PublicEvolving public DataStreamSink<T> writeUsingOutputFormat(org.apache.flink.api.common.io.OutputFormat<T> format)
Deprecated. Please use the StreamingFileSink explicitly using the addSink(SinkFunction) method.
The output does not participate in Flink's checkpointing!
For writing to a file system periodically, the use of the "flink-connector-filesystem" is recommended.
format
- The output format.
@PublicEvolving public <R> SingleOutputStreamOperator<R> transform(String operatorName, org.apache.flink.api.common.typeinfo.TypeInformation<R> outTypeInfo, OneInputStreamOperator<T,R> operator)
R
- The type of the return stream.
operatorName
- The name of the operator, for logging purposes.
outTypeInfo
- The output type of the operator.
operator
- The object containing the transformation logic.
See also: transform(String, TypeInformation, OneInputStreamOperatorFactory)
@PublicEvolving public <R> SingleOutputStreamOperator<R> transform(String operatorName, org.apache.flink.api.common.typeinfo.TypeInformation<R> outTypeInfo, OneInputStreamOperatorFactory<T,R> operatorFactory)
This method uses the rather new operator factories and should only be used when custom factories are needed.
R
- The type of the return stream.
operatorName
- The name of the operator, for logging purposes.
outTypeInfo
- The output type of the operator.
operatorFactory
- The factory for the operator.
protected <R> SingleOutputStreamOperator<R> doTransform(String operatorName, org.apache.flink.api.common.typeinfo.TypeInformation<R> outTypeInfo, StreamOperatorFactory<R> operatorFactory)
protected DataStream<T> setConnectionType(StreamPartitioner<T> partitioner)
partitioner
- Partitioner to set.
public DataStreamSink<T> addSink(SinkFunction<T> sinkFunction)
Adds the given sink to this DataStream. Only streams with sinks added will be executed once the StreamExecutionEnvironment.execute() method is called.
sinkFunction
- The object containing the sink's invoke function.
@Internal public org.apache.flink.api.dag.Transformation<T> getTransformation()
Returns the Transformation that represents the operation that logically creates this DataStream.
Copyright © 2014–2020 The Apache Software Foundation. All rights reserved.