Windowing caveats

Window caveats apply to window implementations and configurations, development checks, memory usage, threading, and punctuation.

Non-determinism

Streams Processing Language (SPL) uses local system time for implementing windows with time-based eviction and trigger policies. This method leads to a non-deterministic behavior of the application. When determinism is important, developers can use windows with an attribute-delta eviction and trigger policy on a time stamp attribute. Using the time stamp attribute makes the operator logic to evict tuples that are based on a logical time instead of physical server time.

Compile-time and runtime checks

When you write primitive operators, you must check whether the window configuration of the operator instance is valid. Checks can be done either during compile time (C++) or run time (Java™). If the configuration is not valid, an error must be reported. One such example is the Join operator from the SPL Standard Toolkit, which reports a compiler error when configured with a tumbling window option. Not checking configuration validity can lead to undefined behavior when a window configuration is not expected.

TimeInterval windows

For panes which have overlapping intervals you should use tuple shared pointers for the template parameter, as in the following code example:

file MyOperator_h.cgt:
. . .
typedef SPL::TimeIntervalWindow<streams_boost::shared_ptr<IPort0Type>, PartitionByType> WindowType;
WindowType window_;

When the pane is destructed, the shared pointer destructor decreases the tuple pointer reference count. The tuple is deleted when there are no more panes referencing it. Simple tuple pointers work fine if the pane event-time intervals are not overlapping, which ensures that each tuple is assigned to a single pane and is deleted only once.

Memory usage

A window with a time-based eviction policy can consume great amounts of memory when the buffered time period is long and the input data rate is high. A window with an attribute-delta eviction policy can also have high memory consumption when a large quantity of tuples must be processed to satisfy the delta condition. It is often possible to lower memory consumption by adding an upstream aggregation operator. This method reduces a group of tuples into a single one, effectively trading off update frequency for reduced state usage in the downstream windowed operator. Partitioned windows can also consume great amounts of memory when the number of partitioned attribute keys is large. Users can alleviate the memory consumption problem by using a partition eviction policy to eliminate windows that are rarely used.

Memory usage caveats for timeInterval windows

A timeInterval window can consume great amounts of memory when the total event-time interval covered by all window panes is long and the input data rate is high. The amount of memory is approximately equal to:

(watermarkLag + intervalDuration + discardAge) * Tuple_size * Avg_tuple_throughput

where:

watermarkLag: Represents the duration between the most advanced event-time of tuples inserted into the window and the operator's watermark.
intervalDuration: Specifies the value of the intervalDuration parameter of the timeInterval window clause.
discardAge: Specifies the value of the discardAge parameter of the timeInterval window clause.
Tuple_size: The size of memory allocated for one tuple.
Avg_tuple_throughput: Is the average throughput which is measured in tuples for each second of event-time.

Watermarks usually lag behind the most advanced tuple timestamp to allow for the normal processing of out-of-order tuples, rather than marking them late. The lag element of @eventTime defines a lower bound for the actual watermark lag, however, there is no built-in upper bound. If the watermark falls too much behind the most advanced tuple timestamp, the timeInterval window accumulates data as more panes wait to be triggered and the consumed memory increases.

Threading

The C++ windowing library uses threads to fire eviction and trigger policy events for time-based windows. The event handling routines automatically lock the access to window contents. Other routines can use the SPL::AutoWindowDataAcquirer class to safely access window data. Windowed operators must always use an SPL::AutoPortMutex to protect the process functions that access windows.

Punctuation

Primitive operator developers can configure input and output ports of windowed operators according to their punctuation semantics. The following table describes how developers can configure the punctuation semantics of an input port that use the SPL window clause. In general, windowed operators send punctuation downstream at window bounds (for example, when a tumbling window gets full). In that case, the operator output port is configured as Generating. More details on punctuation configuration can be found in the Stream punctuation topic.


Port	Configuration	Use case
Input	`Expecting`	A tumbling window with punctuation eviction policy (`punct`) is the only configuration allowed.
Input	`Oblivious`	A tumbling window with punctuation eviction policy configuration is not allowed.
Input	`WindowBound`	A tumbling window with punctuation eviction policy and other window configurations are allowed.