Introducing watermarks

Watermarks flow in a data stream and carry a time value. They provide a metric of event-time progress in the stream.

The meaning of a watermark depends on the type of stream:

Input stream
A watermark with value timeX indicates that all tuples with event time less than timeX have been received.
Output stream
A watermark with value timeX indicates that all tuples with event time less than timeX have been submitted.
Note:
  • A watermark with value timeX received on an input port indicates that the current event time in that stream is timeX and that no more tuples with an event time earlier than timeX will be received on that port.
  • A watermark is only an estimate of completeness. If the event-time value is set before the data is ingested by Stream applications, tuples with event times earlier than timeX might still be received after a watermark with value timeX, because data sources can be temporarily disconnected. For example, sensors might emit data in bursts to conserve power, or mobile devices might be out of service coverage.
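
The input-stream semantics described above can be illustrated with a small, language-neutral sketch. It is not the product's API: the names Event, Watermark, InputPort, and replay are hypothetical and exist only for this illustration. The sketch assumes integer event times and assumes that a watermark never moves event time backwards. A tuple that arrives with an event time earlier than the latest watermark is classified as late, which is the situation the note describes.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional, Union


@dataclass(frozen=True)
class Event:
    """A tuple carrying an event time (hypothetical type for illustration)."""
    event_time: int
    payload: str


@dataclass(frozen=True)
class Watermark:
    """A watermark carrying the time value timeX (hypothetical type)."""
    value: int


@dataclass
class InputPort:
    """Tracks event-time progress on one input port."""
    current_watermark: Optional[int] = None
    on_time: List[Event] = field(default_factory=list)
    late: List[Event] = field(default_factory=list)

    def receive(self, item: Union[Event, Watermark]) -> None:
        if isinstance(item, Watermark):
            # Assumption: event time only moves forward, so a watermark
            # that is not newer than the current one is ignored.
            if self.current_watermark is None or item.value > self.current_watermark:
                self.current_watermark = item.value
            return
        # A tuple is late if its event time is earlier than the latest
        # watermark already received on this port.
        if self.current_watermark is not None and item.event_time < self.current_watermark:
            self.late.append(item)
        else:
            self.on_time.append(item)


def replay(stream: Iterable[Union[Event, Watermark]]) -> InputPort:
    """Feed a sequence of tuples and watermarks through a fresh port."""
    port = InputPort()
    for item in stream:
        port.receive(item)
    return port


if __name__ == "__main__":
    port = replay([
        Event(1, "a"),
        Event(3, "b"),
        Watermark(5),        # claims all tuples with event time < 5 have arrived
        Event(4, "delayed"), # arrives anyway, for example from a sensor burst
        Event(5, "c"),       # not earlier than the watermark, so on time
    ])
    print("on time:", [e.payload for e in port.on_time])
    print("late:", [e.payload for e in port.late])
```

In this sketch, late tuples are only collected in a separate list. What an application does with them, such as discarding them or updating an earlier result, is a design decision that the sketch leaves open.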