Generating watermarks
Each operator has a single watermark value for all of its output streams. Operators can set the operator watermark that drives watermark generation. Teracloud® Streams automatically uses this value to submit watermarks, but might submit watermarks more infrequently than the source operator updates the watermark. A timeInterval window pane which has received late data (late tuples) triggers again when the operator's watermark is updated.
Watermarks for an operator are defined using the following modes:
- OutputEventTime
-
The watermark is based on the event time of the submitted tuples. It is set at runtime.
- InputWatermark
-
The watermark is based on the watermark values of the input streams. It is set at runtime. The watermark is updated when the minimum value of received watermarks on all the input ports gets advanced.
- Manual
-
The watermark is set by calling an event-time API method
(setWatermark())
.
- A new watermark is submitted when the operator's watermark is set to a value greater or equal to
the previous value including the
minimumGap
defined by the current@eventTime
annotation. - To define watermark generation mode, operators can call the
EventTimeContext setMode()
method in the operator constructor. - If the watermark generation mode is not set and the operator is annotated with the SPL
annotation
@eventTime
, runtime sets the mode toOutputEventTime
. Otherwise it is set it toInputWatermark
. - An operator instance is a watermark source if its occurrence in the SPL graph is annotated with
@eventTime
. The SPL runtime sets its watermark mode toOutputEventTime
if the mode is not set in the constructor. Alternatively, the operator sets its watermark mode toManual
and callssetWatermark()
when needed. - Operators which buffer received a tuple before generating an output tuple (such as
Delay
,Gate
,ThreadedSplit
) set their watermark mode toOutputEventTime
. The input tuple order relative to input watermarks is lost for these operators, because their watermark is set to the time of their most recent output tuple. If tuples are emitted out of order, for example if they have arrived out of order, then they may be emitted late with respect to the operator's watermark even though they were not late on input.
Watermark forwarding operators are:
- Non-buffering operators
- Process the input one tuple at a time, such as SPL Filter or Functor. They forward watermarks as follows:
- The event-time attribute of the output tuple is copied from the input attribute with the same name and type.
- The operator’s watermark is set to the minimum watermark value received on all input ports.
- Windowed operators
- Forward watermarks as follows:
- The event-time attribute of the output tuples should be set to the event time of the end of the
window, to ensure that tuples are not emitted late with regard to the emitted watermark. If the
operator outputs a selection of the input tuples (for example, tuples where attribute
x == Max(x)
over the window), the input event-time value can be preserved by copying it into another attribute. - The operator’s watermark is set to the minimum watermark value received on all input ports.
- The event-time attribute of the output tuples should be set to the event time of the end of the
window, to ensure that tuples are not emitted late with regard to the emitted watermark. If the
operator outputs a selection of the input tuples (for example, tuples where attribute
When a timeInterval window pane triggers because of late tuples, the aggregation results are based on the contents of the pane which triggered because the operator's watermark has passed over the end of the window (complete pane) plus the late data.