Merging or joining the streams after splitting

After you split the computation load within or across processing elements, you must merge the data streams. If you split the load with user-defined parallelism, this step is done for you by Teracloud® Streams.

The most effective way to merge streams after splitting is by feeding them into the downstream operator. However, there is a case for which you cannot do this. Consider a slightly modified version of the SplitParallelizer composite that has an output stream (instead of Sink operators) :

namespace sample;
composite SplitParallelizer(output out; input in) {
  param    
    operator $source; 
    operator $body;   
    type     $typeIn;
    type     $typeOut;
  graph

    stream<$typeIn> Src = $source(in) {} 

    (stream<$typeIn> Splitted0   
     <%for(my $i=1; $i<4; ++$i) {%>
       ,stream<$typeIn> Splitted<%=$i%> 
    <%}%>) = ThreadedSplit(Src) {} 
 
    <%for(my $i=0; $i<4; ++$i) {%>   
      stream<$typeOut> Processed<%=$i%>
      = $body(Splitted<%=$i%>)   
    <%}%>

    stream<$typeOut> Processed
     = Filter(Processed0
    <%for(my $i=1; $i<4; ++$i) {%>   
      , Processed<%=$i%>  
    <%}%>) {}   
} 

The preferred operator to merge or join the split streams is a Filter operator. Unlike the Functor and Union operators, the Filter operator does not copy the input tuples; it forwards them to the output stream.