Choosing an operator based on port mutability settings to improve performance

If multiple operators can provide similar function, choosing among them based on their port mutability configurations can yield performance benefits. The size of the benefit depends on the operator fan-out and the input port mutability of downstream operators.

Consider an example of a subgraph that transforms a series of tuple attributes in a linear pipeline. A tuple attribute transformation can be achieved with either a Functor or a Custom operator.

The following example shows a sample stream application that performs linear pipeline transformations with the Functor operator (lines 14-16 and 18-20). The first transformation changes the value of attribute x (line 15), and the second transformation sorts the values in the list attribute y (line 19). This implementation results in two copies, both occurring inside the Functor operator logic. Depending on the size of the processed tuple, copying can increase processing time and decrease performance.
01: namespace sample;
02: 
03: composite MainFunctor {
04:   type 
05:     Number = float64 x, list<float64> y;
06:  
07:   graph
08:     stream<Number> RandNumbers = Beacon() {
09:       output
10:         RandNumbers : x = random(), 
11:                       y = random(10); 
12:     }
13:
14:     stream<Number> NewX = Functor(RandNumbers) {
15:       output NewX : x = x + 10.0;
16:     }
17:
18:     stream<Number> NewY = Functor(NewX) {
19:       output NewY : y = sort(NewX.y);
20:     }
21:
22:     () as Sink = FileSink(NewY) {
23:       param file : "output.dat";
24:     }   
25: }
The Functor operator has a non-mutating input port and a mutating output port. Because the incoming tuple is a constant reference and the outgoing tuple must be a non-constant reference, the operator implementation must always make a tuple copy.

This graphic shows the port mutability configurations for each operator in the previous application, and indicates when a tuple copy occurs in the graph. A non-mutating port is represented by i, and m represents a mutating port. A tuple copy is represented by C, and a C inside the operator indicates that a copy was made by the operator logic. A C in the stream connection indicates that an automatic copy was made by the Teracloud® Streams instance.

Figure 1. Port mutability.

This flow chart uses four circles in a line to show the port mutability configurations of the operators in the previous application.

The flow chart identifies if a tuple copy is made by operator logic, or automatically by the Teracloud Streams instance.

The following example shows the same application with a Custom operator to transform tuples (lines 14-20 and 22-28). This implementation does not result in any copy inside the operators that perform the transformations, because each operator modifies the attributes in place and then submits the tuple directly to the next operator in the pipeline.

01: namespace sample;
02: 
03: composite MainCustom {
04:   type 
05:     Number = float64 x, list<float64> y;
06:  
07:   graph
08:     stream<Number> RandNumbers = Beacon() {
09:       output
10:         RandNumbers : x = random(), 
11:                       y = random(10); 
12:     }
13:
14:     stream<Number> NewX = Custom(RandNumbers) {
15:       logic
16:         onTuple RandNumbers: {
17:           RandNumbers.x += 10.0;
18:           submit(RandNumbers, NewX);
19:         }
20:     }
21:
22:     stream<Number> NewY = Custom(NewX) {
23:       logic
24:         onTuple NewX: {
25:           sortM(NewX.y);
26:           submit(NewX,NewY);
27:         }
28:     }
29:    
30:     () as Sink = FileSink(NewY) {
31:       param file : "output.dat";
32:     }
33: }

The Custom operator has mutating input and output ports. Using static analysis, the Streams Processing Language (SPL) compiler determines whether the generated code for the Custom operator body requires a tuple copy before a submit call. Such a copy ensures that the tuple values remain unchanged when the operator logic uses the submitted tuple after the call. If transformations occur in a linear pipeline, as in this example, no copies are needed to ensure safety. As a result, an implementation with the Custom operator avoids the copies that the Functor implementation makes, which increases application performance.
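
For contrast, the following sketch shows the kind of operator body for which the compiler would be expected to add a copy before the submit call. It is a hypothetical variant of the NewX operator from MainCustom (a replacement for lines 14-20), not part of the original application. The println call is added only to read the tuple after it has been submitted. Where the compiler actually places a copy is its own decision, so treat the comments as the expected behavior rather than a guarantee.

```spl
// Hypothetical variant of the NewX operator in MainCustom.
stream<Number> NewX = Custom(RandNumbers) {
  logic
    onTuple RandNumbers: {
      RandNumbers.x += 10.0;
      submit(RandNumbers, NewX);
      // Assumption for illustration: the tuple is read again after submit.
      // A downstream operator with a mutating input port might already
      // have changed the submitted tuple, so a copy before the submit
      // call is expected to keep the value read here unchanged.
      println(RandNumbers.x);
    }
}
```

Moving the println call above the submit call restores the pattern of the original listing, in which nothing reads the tuple after it is submitted.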

The following figure shows the equivalent flow graph for the application with the Custom operator. This application has only one copy, which is made automatically by the Teracloud® Streams instance. The copy occurs when a tuple transfers from an operator with a non-mutating output port to an operator with a mutating input port (O1 to O2).

This flow chart shows the automatic tuple copy that is made by the Teracloud Streams instance.