io.epiphanous.flinkrunner.flink

DataControlJob

abstract class DataControlJob[D <: FlinkEvent, C <: FlinkEvent, OUT <: FlinkEvent] extends FlinkJob[DataControlPeriod[D], OUT]

A simple Flink job that transforms a data stream and a control stream into an output stream. It uses Flink's CEP library to match sequences of data elements that fall between control elements that alternate between active and inactive. For example, let on represent an active control, off an inactive control, and d a data element. Then the stream d1 d2 on d3 d4 d5 off d6 d7 produces one DataControlPeriod object whose start time is the timestamp of on, whose end time is the timestamp of off, and whose payload is the elements d3, d4, d5.
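
The matching described above can be sketched without Flink as a fold over a time-ordered list of events. This is only an illustration of the semantics, not the library's implementation: Event, Data, Control, Period, and PeriodSketch are hypothetical stand-ins for the library's FlinkEvent, DataOrControl, and DataControlPeriod types, and timestamps are assumed to be plain Long values.

```scala
// Hypothetical, simplified stand-ins for the library's event types.
sealed trait Event { def ts: Long }
final case class Data(id: String, ts: Long) extends Event
final case class Control(active: Boolean, ts: Long) extends Event
final case class Period(start: Long, end: Long, elements: List[Data])

object PeriodSketch {
  /** Collects the data elements between each on control and the next off control. */
  def periods(events: List[Event]): List[Period] = {
    // State: the open period, if any, as (start timestamp, data seen so far, newest first),
    // plus the completed periods, newest first.
    val init: (Option[(Long, List[Data])], List[Period]) = (None, Nil)
    val (_, done) = events.foldLeft(init) {
      // An on control opens a period when none is open.
      case ((None, acc), Control(true, ts)) => (Some((ts, Nil)), acc)
      // Data inside an open period is added to its payload.
      case ((Some((start, ds)), acc), d: Data) => (Some((start, d :: ds)), acc)
      // An off control closes the open period.
      case ((Some((start, ds)), acc), Control(false, ts)) =>
        (None, Period(start, ts, ds.reverse) :: acc)
      // Data outside a period and repeated controls are ignored.
      case (state, _) => state
    }
    done.reverse
  }

  def main(args: Array[String]): Unit = {
    // d1 d2 on d3 d4 d5 off d6 d7, with timestamps 1 through 9
    val stream = List(
      Data("d1", 1), Data("d2", 2), Control(true, 3),
      Data("d3", 4), Data("d4", 5), Data("d5", 6),
      Control(false, 7), Data("d6", 8), Data("d7", 9)
    )
    println(periods(stream))
  }
}
```

For the sample stream this yields a single Period with start 3, end 7, and the elements d3, d4, d5. Data elements d1, d2, d6, and d7 fall outside any active period and are dropped.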

D

the data type

C

the control type

OUT

the output stream element type

Linear Supertypes
FlinkJob[DataControlPeriod[D], OUT], LazyLogging, AnyRef, Any

Instance Constructors

  1. new DataControlJob()(implicit arg0: TypeInformation[D], arg1: TypeInformation[C], arg2: TypeInformation[OUT])

Abstract Value Members

  1. abstract def transform(in: DataStream[DataControlPeriod[D]])(implicit config: FlinkConfig, env: SEE): DataStream[OUT]

    Primary method to transform the source data stream into the output data stream. The output of this method is passed into sink(). This method must be overridden by subclasses.

    in

    input data stream created by source()

    config

    implicit flink job config

    returns

    output data stream

    Definition Classes
    FlinkJob

Concrete Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  5. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  6. def control(implicit config: FlinkConfig, env: SEE): DataStream[C]

    A source data stream for the control events.

    config

    implicit flink config

    returns

    a data stream of control events.

  7. def data(implicit config: FlinkConfig, env: SEE): DataStream[D]

    A source data stream for the data events.

    config

    implicit flink config

    returns

    a data stream of data events.

  8. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  9. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  10. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  11. def flow()(implicit config: FlinkConfig, env: SEE): DataStream[OUT]

    A pipeline for transforming a single stream. Passes the output of source() through transform(), then passes that result into maybeSink(), which forwards it to sink() unless we're testing. Returns the output data stream to facilitate testing.
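
    The shape of this pipeline can be sketched with plain collections. The names below are hypothetical stand-ins, not the library's API: Seq replaces DataStream, Config carries only a mockEdges flag in place of FlinkConfig, and the implicit env parameter is omitted. It is a sketch of the source, transform, maybeSink sequence under those assumptions.

    ```scala
    // Hypothetical stand-in for FlinkConfig; only the mockEdges flag is modelled.
    final case class Config(mockEdges: Boolean)

    // Hypothetical stand-in for FlinkJob, with Seq in place of DataStream.
    abstract class JobSketch[IN, OUT] {
      def source()(implicit config: Config): Seq[IN]
      def transform(in: Seq[IN])(implicit config: Config): Seq[OUT]
      def sink(out: Seq[OUT])(implicit config: Config): Unit

      // Write to the sink only when edges are not mocked, i.e. outside of tests.
      def maybeSink(out: Seq[OUT])(implicit config: Config): Unit =
        if (!config.mockEdges) sink(out)

      // source -> transform -> maybeSink, returning the transformed output for inspection.
      def flow()(implicit config: Config): Seq[OUT] = {
        val out = transform(source())
        maybeSink(out)
        out
      }
    }

    object FlowSketch {
      // Records what reached the sink so the behaviour can be observed.
      val sunk = scala.collection.mutable.ListBuffer.empty[String]

      // A subclass supplies transform(); source() and sink() are trivial here.
      object UpperCaseJob extends JobSketch[String, String] {
        def source()(implicit config: Config): Seq[String] = Seq("a", "b")
        def transform(in: Seq[String])(implicit config: Config): Seq[String] =
          in.map(_.toUpperCase)
        def sink(out: Seq[String])(implicit config: Config): Unit = sunk ++= out
      }
    }
    ```

    Calling FlowSketch.UpperCaseJob.flow()(Config(mockEdges = true)) returns the transformed elements and leaves sunk empty. With mockEdges = false the same elements are also appended to sunk.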

    config

    implicit flink job config

    returns

    data output stream

    Definition Classes
    FlinkJob
  12. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  13. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  14. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  15. lazy val logger: Logger

    Attributes
    protected
    Definition Classes
    LazyLogging
  16. def maybeAssignTimestampsAndWatermarks(in: DataStream[DataControlPeriod[D]])(implicit config: FlinkConfig, env: SEE): Unit

    Definition Classes
    FlinkJob
  17. def maybeSink(out: DataStream[OUT])(implicit config: FlinkConfig, env: SEE): Unit

    The output stream is passed to sink() only if FlinkConfig.mockEdges evaluates to false (i.e., you're not testing).

    out

    the output data stream to pass into sink()

    config

    implicit flink job config

    Definition Classes
    FlinkJob
  18. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  19. final def notify(): Unit

    Definition Classes
    AnyRef
  20. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  21. def pattern(implicit config: FlinkConfig): Pattern[DataOrControl[D, C], DataOrControl[D, C]]

    Returns the pattern used by CEP to aggregate the control and data streams. Should not need to be overridden.

    returns

    cep pattern

  22. def run()(implicit config: FlinkConfig, env: SEE): Either[Iterator[OUT], Unit]

    Definition Classes
    FlinkJob
  23. def sink(out: DataStream[OUT])(implicit config: FlinkConfig, env: SEE): Unit

    Writes the transformed data stream to configured output sinks.

    out

    a transformed stream from transform()

    config

    implicit flink job config

    Definition Classes
    FlinkJob
  24. def source()(implicit config: FlinkConfig, env: SEE): DataStream[DataControlPeriod[D]]

    Generate a data stream of data control periods. This method does not generally need to be overridden in subclasses. It interleaves the data and control streams to produce a single stream of DataOrControl objects, then uses Flink's CEP library to match sequences of control-on, data, and control-off events, which are aggregated into a stream of DataControlPeriod objects.

    At the moment, Flink's CEP library has a bug in how its greedy operator works, so we must manually filter the control stream to remove multiple sequential controls with the same isActive() value. A run of consecutive on controls is replaced with the earliest on control, and likewise for off controls.
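
    The control filtering described above can be illustrated on a plain, time-ordered list. This is a minimal sketch, not the library's code: Ctl and ControlDedupSketch are hypothetical names, and a real Flink job would need to keep the last seen state in operator state rather than in a fold.

    ```scala
    // Hypothetical stand-in for a control event; only isActive and a timestamp are modelled.
    final case class Ctl(isActive: Boolean, ts: Long)

    object ControlDedupSketch {
      /** Keeps the earliest control of every run of consecutive controls with the same isActive value. */
      def dedupe(controls: List[Ctl]): List[Ctl] =
        controls.foldLeft(List.empty[Ctl]) {
          // Same state as the most recently kept control: drop the repeat.
          case (kept @ (last :: _), c) if last.isActive == c.isActive => kept
          // First control, or a change of state: keep it.
          case (kept, c) => c :: kept
        }.reverse
    }
    ```

    For the controls on(1) on(2) off(3) off(4) on(5), the result is on(1) off(3) on(5).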

    config

    implicit flink configuration

    returns

    data stream of data control periods

    Definition Classes
    DataControlJob → FlinkJob
  25. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  26. def toString(): String

    Definition Classes
    AnyRef → Any
  27. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  28. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  29. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
