Windowing caveats
Window caveats apply to window implementations and configurations, development checks, memory usage, threading, and punctuation.
Non-determinism
Streams Processing Language (SPL) uses the local system time to implement windows with time-based eviction and trigger policies, which makes application behavior non-deterministic. When determinism is important, developers can use windows with an attribute-delta eviction and trigger policy on a time stamp attribute. The operator then evicts tuples based on the logical time carried in the tuples instead of the physical server time.
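The following standalone C++ sketch illustrates the idea of evicting on a time stamp attribute rather than on the system clock. It does not use the SPL windowing library: `Reading`, `DeltaWindow`, and the millisecond unit are hypothetical names and choices made for illustration, and the sketch assumes that tuples arrive in non-decreasing time stamp order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

// Hypothetical tuple type: only the time stamp attribute matters here.
struct Reading {
    std::int64_t ts;  // logical time carried by the tuple, in milliseconds
    double value;
};

// Minimal sliding buffer with a delta-style eviction rule on `ts`.
// Assumes tuples are inserted in non-decreasing `ts` order.
class DeltaWindow {
public:
    explicit DeltaWindow(std::int64_t delta) : delta_(delta) {
        assert(delta >= 0);
    }

    // Insert a tuple, first evicting every buffered tuple whose time stamp
    // is more than `delta_` behind the new tuple's time stamp.
    void insert(const Reading& r) {
        while (!buf_.empty() && r.ts - buf_.front().ts > delta_) {
            buf_.pop_front();
        }
        buf_.push_back(r);
    }

    std::size_t size() const { return buf_.size(); }

    // Time stamp of the oldest buffered tuple; the buffer must not be empty.
    std::int64_t oldest() const {
        assert(!buf_.empty());
        return buf_.front().ts;
    }

private:
    std::int64_t delta_;
    std::deque<Reading> buf_;
};
```

Because eviction depends only on the `ts` values, replaying the same input produces the same window contents regardless of how fast the tuples are delivered.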
Compile-time and runtime checks
When you write primitive operators, you must check whether the window configuration of the operator instance is valid. Checks can be done either at compile time (C++) or at run time (Java™). If the configuration is not valid, an error must be reported. For example, the Join operator from the SPL Standard Toolkit reports a compiler error when it is configured with a tumbling window. Without such checks, an unexpected window configuration can lead to undefined behavior.
TimeInterval windows
For panes that have overlapping intervals, use shared pointers to tuples as the template parameter, as in the following code example:
file MyOperator_h.cgt:
. . .
typedef SPL::TimeIntervalWindow<streams_boost::shared_ptr<IPort0Type>, PartitionByType> WindowType;
WindowType window_;
When a pane is destroyed, the shared pointer destructor decrements the reference count of the tuple. The tuple is deleted when no pane references it any longer. Plain tuple pointers are sufficient if the event-time intervals of the panes do not overlap, because each tuple is then assigned to a single pane and is deleted only once.
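The reference-counting behavior described above can be sketched in standalone C++. This is an illustration only: it uses `std::shared_ptr` in place of `streams_boost::shared_ptr`, and `Tuple`, `Pane`, and `assignToBoth` are hypothetical stand-ins, not types from the SPL windowing library.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical tuple type that counts live instances so deletion is observable.
struct Tuple {
    static int live;
    int value;
    explicit Tuple(int v) : value(v) { ++live; }
    ~Tuple() {
        assert(live > 0);
        --live;
    }
};
int Tuple::live = 0;

// Stand-in for a pane: holds shared references to the tuples assigned to it.
using Pane = std::vector<std::shared_ptr<Tuple>>;

// Assign one tuple to two overlapping panes and return the reference count
// observed while the local handle is still alive (pane a, pane b, and `t`).
long assignToBoth(Pane& a, Pane& b, int value) {
    auto t = std::make_shared<Tuple>(value);
    a.push_back(t);
    b.push_back(t);
    return t.use_count();
}
```

After `assignToBoth` returns, the tuple is kept alive by both panes. Clearing one pane leaves the tuple intact, and clearing the second pane deletes it exactly once.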
Memory usage
A window with a time-based eviction policy can consume large amounts of memory when the buffered time period is long and the input data rate is high. A window with an attribute-delta eviction policy can also consume a lot of memory when a large number of tuples must be processed to satisfy the delta condition. It is often possible to lower memory consumption by adding an upstream aggregation operator that reduces a group of tuples to a single tuple, which trades update frequency for reduced state in the downstream windowed operator. Partitioned windows can also consume large amounts of memory when the number of partition keys is large. Users can alleviate this problem by using a partition eviction policy to eliminate windows that are rarely used.
Memory usage caveats for timeInterval windows
The memory that a timeInterval window consumes can be estimated with the following formula:
(watermarkLag + intervalDuration + discardAge) * Tuple_size * Avg_tuple_throughput
- watermarkLag: The duration between the most advanced event time of the tuples inserted into the window and the operator's watermark.
- intervalDuration: The value of the intervalDuration parameter of the timeInterval window clause.
- discardAge: The value of the discardAge parameter of the timeInterval window clause.
- Tuple_size: The size of the memory allocated for one tuple.
- Avg_tuple_throughput: The average throughput, measured in tuples per second of event time.
Watermarks usually lag behind the most advanced tuple timestamp so that out-of-order tuples are processed normally rather than marked as late. The lag element of the @eventTime annotation defines a lower bound for the actual watermark lag; however, there is no built-in upper bound. If the watermark falls too far behind the most advanced tuple timestamp, the timeInterval window accumulates data as more panes wait to be triggered, and memory consumption increases.
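As a worked example of the memory estimate for timeInterval windows, the following C++ sketch evaluates the formula for one set of inputs. The function name, the choice of seconds and bytes as units, and all numeric values are assumptions made for illustration; they are not measurements.

```cpp
#include <cassert>

// Estimated memory, in bytes, held by a timeInterval window:
// (watermarkLag + intervalDuration + discardAge) * Tuple_size * Avg_tuple_throughput
// Durations are in seconds of event time, tuple size is in bytes, and
// throughput is in tuples per second of event time.
double estimateWindowBytes(double watermarkLagSec,
                           double intervalDurationSec,
                           double discardAgeSec,
                           double tupleSizeBytes,
                           double tuplesPerSec) {
    assert(watermarkLagSec >= 0 && intervalDurationSec >= 0 && discardAgeSec >= 0);
    assert(tupleSizeBytes >= 0 && tuplesPerSec >= 0);
    const double retainedSec = watermarkLagSec + intervalDurationSec + discardAgeSec;
    return retainedSec * tupleSizeBytes * tuplesPerSec;
}
```

With a hypothetical watermark lag of 5 seconds, an interval duration of 60 seconds, a discard age of 10 seconds, 200-byte tuples, and 10,000 tuples per second, `estimateWindowBytes(5, 60, 10, 200, 10000)` yields 75 * 200 * 10,000 = 150,000,000 bytes, or roughly 150 MB. If the watermark lag grows to 65 seconds with the other inputs unchanged, the estimate rises to 270,000,000 bytes.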
Threading
The C++ windowing library uses threads to fire eviction and trigger policy events for time-based windows. The event handling routines automatically lock access to the window contents. Other routines can use the SPL::AutoWindowDataAcquirer class to safely access window data. Windowed operators must always use an SPL::AutoPortMutex to protect the process functions that access windows.
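The locking discipline described above can be illustrated with a standalone C++ analogy. The sketch does not use SPL::AutoPortMutex or SPL::AutoWindowDataAcquirer; it relies on `std::mutex` and `std::lock_guard` from the standard library, and `GuardedWindow` and its methods are hypothetical names. The point is only that every code path that touches the window data, whether it runs in the process function or in a timer thread, takes the same lock for the duration of the access.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>

// Hypothetical window buffer shared by a process function and a timer thread.
class GuardedWindow {
public:
    // Would be called from the operator's process function for each tuple.
    void process(int tuple) {
        std::lock_guard<std::mutex> lock(mutex_);  // released at scope exit
        data_.push_back(tuple);
    }

    // Would be called from a timer thread; evicts up to `n` oldest tuples
    // and returns how many were actually evicted.
    std::size_t evictOldest(std::size_t n) {
        std::lock_guard<std::mutex> lock(mutex_);
        std::size_t evicted = 0;
        while (evicted < n && !data_.empty()) {
            data_.pop_front();
            ++evicted;
        }
        return evicted;
    }

    // Read access takes the same lock so that it never observes a
    // partially updated buffer.
    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return data_.size();
    }

private:
    mutable std::mutex mutex_;
    std::deque<int> data_;
};
```

Using a scoped lock object rather than explicit lock and unlock calls means that the lock is released on every exit path, including early returns and exceptions.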
Punctuation
Primitive operator developers can configure the input and output ports of windowed operators according to their punctuation semantics. The following table describes how developers can configure the punctuation semantics of an input port that uses the SPL window clause. In general, windowed operators send punctuation downstream at window bounds (for example, when a tumbling window gets full). In that case, the operator output port is configured as Generating. More details on punctuation configuration can be found in the Stream punctuation topic.
Port | Configuration | Use case |
---|---|---|
Input | Expecting | A tumbling window with punctuation eviction policy (punct) is the only configuration allowed. |
Input | Oblivious | A tumbling window with punctuation eviction policy configuration is not allowed. |
Input | WindowBound | A tumbling window with punctuation eviction policy and other window configurations are allowed. |
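The rules in the table can be expressed as a small validity check of the kind a primitive operator might perform on its window configuration. This is a sketch under stated assumptions: `InputPunctMode` and `isWindowAllowed` are hypothetical names, not part of the SPL operator model API, and the window configuration is reduced to a single flag that says whether it is a tumbling window with a punctuation eviction policy.

```cpp
#include <cassert>

// Punctuation configuration of an input port, as listed in the table.
enum class InputPunctMode { Expecting, Oblivious, WindowBound };

// Returns true when the window configuration is allowed for the given
// input port mode. `tumblingPunct` is true when the window is a tumbling
// window with a punctuation eviction policy (punct).
bool isWindowAllowed(InputPunctMode mode, bool tumblingPunct) {
    switch (mode) {
        case InputPunctMode::Expecting:
            return tumblingPunct;   // the only configuration allowed
        case InputPunctMode::Oblivious:
            return !tumblingPunct;  // tumbling punct is not allowed
        case InputPunctMode::WindowBound:
            return true;            // all configurations are allowed
    }
    return false;  // not reached for valid enumerator values
}
```

An operator that finds `isWindowAllowed` returning false for its configuration would report an error instead of continuing with a window setup it does not expect.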