Primary method to transform the source data stream into the output data stream. The output of this method is passed into sink(). This method must be overridden by subclasses.
input data stream created by source()
implicit flink job config
output data stream
A source data stream for the control events.
implicit flink config
a data stream of control events.
A source data stream for the data events.
implicit flink config
a data stream of data events.
A pipeline for transforming a single stream. Passes the output of source() through transform() and the result into maybeSink(), which forwards it to sink() unless the job is running under test. Returns the output data stream to facilitate testing.
implicit flink job config
data output stream
The output stream is passed to sink() only if FlinkConfig.mockEdges evaluates to false (i.e., you're not testing).
the output data stream to pass into sink()
implicit flink job config
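The single-stream pipeline and the maybeSink() gate described above can be sketched in plain Scala. This is only an illustration under stated assumptions: SketchConfig, SketchPipeline, flow(), sunk, and UpperCasePipeline are hypothetical names, and plain Seq values stand in for Flink data streams. Only source(), transform(), maybeSink(), sink(), and the mockEdges flag come from the documentation.

```scala
// Hypothetical stand-in for the job config; only mockEdges is taken from the docs.
final case class SketchConfig(mockEdges: Boolean)

// Hypothetical pipeline skeleton using Seq in place of a Flink data stream.
abstract class SketchPipeline[IN, OUT] {
  // Records what sink() received so the behaviour can be inspected.
  val sunk = scala.collection.mutable.ListBuffer.empty[OUT]

  def source()(implicit config: SketchConfig): Seq[IN]
  def transform(in: Seq[IN])(implicit config: SketchConfig): Seq[OUT]

  def sink(out: Seq[OUT])(implicit config: SketchConfig): Unit = {
    sunk ++= out
    ()
  }

  // Forward to sink() only when edges are not mocked (i.e. not testing).
  def maybeSink(out: Seq[OUT])(implicit config: SketchConfig): Unit =
    if (!config.mockEdges) sink(out)

  // source() -> transform() -> maybeSink(); the output is returned for tests.
  def flow()(implicit config: SketchConfig): Seq[OUT] = {
    val out = transform(source())
    maybeSink(out)
    out
  }
}

object UpperCasePipeline {
  // A toy subclass that overrides source() and transform().
  def build(): SketchPipeline[String, String] =
    new SketchPipeline[String, String] {
      def source()(implicit config: SketchConfig): Seq[String] = Seq("a", "b")
      def transform(in: Seq[String])(implicit config: SketchConfig): Seq[String] =
        in.map(_.toUpperCase)
    }
}
```

With mockEdges set to true, flow() still returns the transformed output but nothing reaches sink(), which is what lets a test assert on the returned stream without touching real sinks.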
Returns the pattern used by CEP to aggregate the control and data streams. Should not need to be overridden.
the CEP pattern
Writes the transformed data stream to the configured output sinks.
a transformed stream from transform()
implicit flink job config
Generate a data stream of data control periods. This method does not generally need to be overridden in subclasses. It interleaves the data and control streams to produce a single stream of DataOrControl objects and then uses flink's CEP library to match sequences of control-on, data, and control-off events, which aggregates into a stream of DataControlPeriods.
At the moment, Flink's CEP library has a bug in how its greedy operator works, which requires that we manually filter the control stream to remove multiple, sequential controls with the same isActive() value. So multiple on controls are replaced with the earliest on control, and likewise for off controls.
implicit flink configuration
data stream of data control periods
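The control-stream filtering described above (collapsing runs of controls that share an isActive value down to the earliest one) can be sketched on an in-memory sequence. This is a minimal illustration, not the library's implementation: Control, its id field, and ControlDedup.collapseRuns are hypothetical names, and a real Flink job would need a stateful operator rather than a fold over a Seq.

```scala
// Hypothetical control event; only the isActive flag comes from the docs.
final case class Control(id: String, isActive: Boolean)

object ControlDedup {
  // Keep a control only when its isActive value differs from the last kept
  // control, so each run of identical values collapses to its earliest element.
  def collapseRuns(controls: Seq[Control]): Seq[Control] =
    controls.foldLeft(Vector.empty[Control]) { (kept, c) =>
      if (kept.lastOption.exists(_.isActive == c.isActive)) kept
      else kept :+ c
    }
}
```

For example, the sequence on1, on2, off1, off2, on3 reduces to on1, off1, on3, leaving a strictly alternating stream for the pattern to match against.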
A simple Flink job that transforms a data stream and a control stream into an output stream. This uses Flink's CEP library to match sequences of data elements that fall between control elements that are alternately active and inactive. As an example, let on represent an active control, off represent an inactive control, and d represent data elements. Then the stream d1 d2 on d3 d4 d5 off d6 d7 would output one DataControlPeriod object with a start time of the timestamp of on, an end time of the timestamp of off, and the elements d3, d4, d5 as the payload.
the data type
the control type
the output stream element type
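The on/off matching in the example stream can be simulated without Flink to make the expected result concrete. This sketch rests on assumptions: Event, Data, Ctl, Period, and PeriodMatcher are hypothetical stand-ins for the library's types (including DataControlPeriod), integer timestamps are invented for the example, and a fold over a Seq replaces the CEP pattern.

```scala
// Hypothetical event model standing in for the library's data and control types.
sealed trait Event { def ts: Long }
final case class Data(ts: Long, value: String) extends Event
final case class Ctl(ts: Long, isActive: Boolean) extends Event

// Hypothetical stand-in for DataControlPeriod.
final case class Period(start: Long, end: Long, elements: Vector[String])

object PeriodMatcher {
  def periods(events: Seq[Event]): Vector[Period] = {
    // State: completed periods, plus the open period (start time, data so far).
    val init = (Vector.empty[Period], Option.empty[(Long, Vector[String])])
    events.foldLeft(init) {
      // An active control opens a period if none is open.
      case ((done, None), Ctl(ts, true)) =>
        (done, Some((ts, Vector.empty[String])))
      // Data inside an open period is collected.
      case ((done, Some((start, ds))), Data(_, v)) =>
        (done, Some((start, ds :+ v)))
      // An inactive control closes the open period. Emitting a period with no
      // data is an assumption of this sketch, not documented behaviour.
      case ((done, Some((start, ds))), Ctl(ts, false)) =>
        (done :+ Period(start, ts, ds), None)
      // Everything else (data outside a period, repeated controls) is ignored.
      case (state, _) => state
    }._1
  }
}
```

Feeding it d1 d2 on d3 d4 d5 off d6 d7 with timestamps 1 through 9 yields a single period that starts at the timestamp of on, ends at the timestamp of off, and carries d3, d4, d5.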