Generating watermarks

Each operator has a single watermark value for all of its output streams. Operators can set the operator watermark that drives watermark generation. Teracloud® Streams automatically uses this value to submit watermarks, but might submit watermarks less frequently than the source operator updates the watermark. A timeInterval window pane that has received late data (late tuples) triggers again when the operator's watermark is updated.

Watermarks for an operator are defined using the following modes:

OutputEventTime

The watermark is based on the event time of the submitted tuples. It is set at runtime.

InputWatermark

The watermark is based on the watermark values of the input streams. It is set at runtime. The watermark is updated when the minimum of the watermark values received on all input ports advances.

Manual

The watermark is set by calling an event-time API method (setWatermark()).
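
The three modes can be compared with a small toy model. This is only a sketch and not the Streams operator API: the class OperatorWatermark, its method names, and the rule that a watermark never moves backwards are assumptions made for illustration.

```python
from enum import Enum


class Mode(Enum):
    OUTPUT_EVENT_TIME = "OutputEventTime"
    INPUT_WATERMARK = "InputWatermark"
    MANUAL = "Manual"


class OperatorWatermark:
    """Toy model of one operator's watermark (hypothetical, not the Streams API)."""

    def __init__(self, mode, num_input_ports=1):
        self.mode = mode
        self.watermark = float("-inf")
        # Latest watermark seen on each input port.
        self.port_watermarks = [float("-inf")] * num_input_ports

    def _advance(self, candidate):
        # Assumption: the operator watermark never moves backwards.
        if candidate > self.watermark:
            self.watermark = candidate

    def on_submit(self, event_time):
        # OutputEventTime: follow the event time of submitted tuples.
        if self.mode is Mode.OUTPUT_EVENT_TIME:
            self._advance(event_time)

    def on_input_watermark(self, port, value):
        # InputWatermark: follow the minimum across all input ports.
        self.port_watermarks[port] = max(self.port_watermarks[port], value)
        if self.mode is Mode.INPUT_WATERMARK:
            self._advance(min(self.port_watermarks))

    def set_watermark(self, value):
        # Manual: the operator logic sets the value explicitly.
        if self.mode is Mode.MANUAL:
            self._advance(value)
```

In this model, an InputWatermark operator with two input ports stays at its initial value until both ports have delivered a watermark, and then tracks the smaller of the two.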

Note:
  • A new watermark is submitted when the operator's watermark is set to a value greater than or equal to the previous value plus the minimumGap defined by the current @eventTime annotation.
  • To define watermark generation mode, operators can call the EventTimeContext setMode() method in the operator constructor.
  • If the watermark generation mode is not set and the operator is annotated with the SPL annotation @eventTime, the runtime sets the mode to OutputEventTime. Otherwise, it sets the mode to InputWatermark.
  • An operator instance is a watermark source if its occurrence in the SPL graph is annotated with @eventTime. The SPL runtime sets its watermark mode to OutputEventTime if the mode is not set in the constructor. Alternatively, the operator sets its watermark mode to Manual and calls setWatermark() when needed.
  • Operators which buffer received tuples before generating an output tuple (such as Delay, Gate, ThreadedSplit) set their watermark mode to OutputEventTime. The input tuple order relative to input watermarks is lost for these operators, because their watermark is set to the event time of their most recent output tuple. If tuples are emitted out of order, for example because they arrived out of order, they may be emitted late with respect to the operator's watermark even though they were not late on input.
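
The notes above can be illustrated with a short sketch. It is a simplified model, not runtime code: default_mode, WatermarkSubmitter, and late_outputs are hypothetical names. It also rests on several assumptions: the "previous value" is the last submitted watermark, the first watermark is always submitted, a tuple is late when its event time is strictly earlier than the watermark, and no lag is applied.

```python
def default_mode(mode_set_in_constructor, has_event_time_annotation):
    """Mode the runtime would pick, following the notes above."""
    if mode_set_in_constructor is not None:
        return mode_set_in_constructor
    return "OutputEventTime" if has_event_time_annotation else "InputWatermark"


class WatermarkSubmitter:
    """Submits a watermark only when it advances by at least minimum_gap."""

    def __init__(self, minimum_gap):
        self.minimum_gap = minimum_gap
        self.last_submitted = None
        self.submitted = []

    def set_watermark(self, value):
        """Return True if this update causes a watermark to be submitted."""
        # Assumption: the first watermark is always submitted.
        if (self.last_submitted is None
                or value >= self.last_submitted + self.minimum_gap):
            self.last_submitted = value
            self.submitted.append(value)
            return True
        return False


def late_outputs(output_event_times):
    """Event times emitted late by a buffering operator in OutputEventTime mode."""
    watermark = float("-inf")
    late = []
    for t in output_event_times:
        # Assumption: late means strictly earlier than the watermark.
        if t < watermark:
            late.append(t)
        watermark = max(watermark, t)
    return late
```

With a minimum_gap of 0.5, setting the watermark to 1.0, 1.2, 1.5, 1.6, and 2.1 in turn submits only 1.0, 1.5, and 2.1. For output event times 1, 3, 2, 4, the tuple with event time 2 is emitted after the watermark has reached 3, so it is late.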

Watermark forwarding operators are:

Non-buffering operators
Process the input one tuple at a time, such as SPL Filter or Functor. They forward watermarks as follows:
  • The event-time attribute of the output tuple is copied from the input attribute with the same name and type.
  • The operator’s watermark is set to the minimum watermark value received on all input ports.
Windowed operators
Forward watermarks as follows:
  • The event-time attribute of the output tuples should be set to the event time of the end of the window, to ensure that tuples are not emitted late with regard to the emitted watermark. If the operator outputs a selection of the input tuples (for example, tuples where attribute x == Max(x) over the window), the input event-time value can be preserved by copying it into another attribute.
  • The operator’s watermark is set to the minimum watermark value received on all input ports.
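
The forwarding rules for both kinds of operators can be sketched as follows. The functions and the attribute names ts, x, and origTs are hypothetical and chosen only for illustration. Tuples are modeled as plain dictionaries.

```python
def forward_watermark(port_watermarks):
    """Operator watermark: minimum of the watermarks received on all input ports."""
    return min(port_watermarks)


def non_buffering_output(input_tuple, event_time_attr="ts"):
    """Functor-like step: the event-time attribute is copied unchanged."""
    output_tuple = {"x": input_tuple["x"] * 2}  # arbitrary per-tuple transformation
    output_tuple[event_time_attr] = input_tuple[event_time_attr]
    return output_tuple


def windowed_max_output(window_tuples, window_end, event_time_attr="ts"):
    """Windowed step: emit the tuple with the largest x, stamped with the window end."""
    best = max(window_tuples, key=lambda t: t["x"])
    return {
        "x": best["x"],
        # Event time is set to the end of the window so that the output
        # is not late with regard to the emitted watermark.
        event_time_attr: window_end,
        # The original event time is preserved in a separate attribute.
        "origTs": best[event_time_attr],
    }
```

For a window ending at event time 20 that contains tuples with x values 1 and 5, the output carries x = 5, an event time of 20, and the selected tuple's own event time in origTs.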

When a timeInterval window pane triggers because of late tuples, the aggregation results are based on the complete pane plus the late data. The complete pane is the contents of the pane at the point where it triggered because the operator's watermark passed the end of the window.
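
A minimal sketch of this re-triggering behavior follows. It assumes a sum aggregation and a hypothetical Pane class. It does not reflect how the runtime stores panes internally.

```python
class Pane:
    """Toy timeInterval pane covering event times in [start, end)."""

    def __init__(self, start, end):
        self.start = start
        self.end = end
        self.values = []
        self.complete = False  # True once the watermark has passed `end`

    def insert(self, event_time, value):
        """Add a tuple; return True if it is late (the pane was already complete)."""
        if not (self.start <= event_time < self.end):
            raise ValueError("event time outside pane interval")
        self.values.append(value)
        return self.complete

    def trigger(self):
        """Aggregate everything in the pane, including any late data."""
        self.complete = True
        return sum(self.values)


pane = Pane(0, 10)
pane.insert(2, 5)
pane.insert(7, 3)
first_result = pane.trigger()     # watermark passed 10: complete pane
was_late = pane.insert(4, 2)      # late tuple arrives afterwards
second_result = pane.trigger()    # next watermark update: complete pane plus late data
```

Here the first trigger aggregates the complete pane (5 + 3), and the second trigger aggregates the same contents plus the late value (5 + 3 + 2).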