Implementing generic operators

Generic operators are implemented through code generation.

The code generation logic sits within segments that are delimited by <% and %>. It generates C++ code that augments the C++ code outside those segments. The generator code is written in Perl. Here is a simple example:

cerr << "<% print 'Hello World'; %>" << endl;

This mixed-mode code simply translates into C++ code that prints a message. While this example illustrates the mechanics of code generation, it does not represent a common use case. During development of generic primitive operators, the need for code generation arises because the code must be customized based on the configuration of the operator instance at hand. The specific instance configuration is accessible to the code generator through a $model variable. As a result, in most cases, the generator code involves variables that depend on $model. For instance:

<% my $kind = $model->getContext()->getKind(); %>
...  
cerr << "The operator is of kind <%=$kind%>" << endl;

The code generation segments can be used throughout the code. Furthermore, variables that are defined in one segment are available in later segments as they are visible in the current Perl lexical scope. This example also shows a shorthand notation for printing variables. The following two forms are equivalent: <% print $var; %> and <%= $var %>.
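To make the expansion concrete, the following sketch shows the plain C++ that the $kind template above could reduce to after the generator has run. The kind value my.toolkit::MyOper and the function name reportKind are assumptions made for illustration; the actual value depends on the operator instance.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>

// Plain C++ left after expansion, assuming $kind evaluated to the
// hypothetical value "my.toolkit::MyOper" at code generation time.
// The stream is a parameter only so that the sketch is easy to exercise;
// the template in the text writes to cerr directly.
void reportKind(std::ostream& out) {
    out << "The operator is of kind my.toolkit::MyOper" << std::endl;
}
```

Calling reportKind(std::cerr) mirrors the original statement. No Perl remains in the result: the <%=$kind%> segment has been replaced by its value.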

The model object represents an operator instance configuration, which is based on how the operator instance is configured in the SPL code. Some of the fundamental pieces of information available in the model include, but are not limited to:

  • Windowing configuration, such as window type, eviction policy, trigger policy.
  • Parameter configuration, such as parameter names, types, values.
  • Input and output port information, such as number of ports, port tuple types.
  • Output attribute assignments, such as assignment expressions, output functions.
The model object, represented by the variable $model in the generator code, is of type SPL::Operator::Instance::OperatorInstance. You can find API documentation on Perl classes and modules in the $STREAMS_INSTALL/doc/spl/operator/code-generation-api/perl directory after you install Teracloud® Streams. In addition to the classes that represent the operator instance model and various objects that are contained within it, SPL also provides a module that is called SPL::CodeGen.pm, which provides helper routines for common code generation tasks.
The generated operator code for an operator instance can be found under the application directory, at output/src/operator. The code generation takes place when an application that uses the operator is built. For an operator instance named A.B.C, the code for it is generated into files C.h and C.cpp under the directory output/src/operator/A/B. Consider the following example:
composite MyOp (input In; output Out) {
  ...  
  stream<Out> Out = Filter(In) {...}
}
...
stream<MyType> A = MyOp(B) {...}

The instance of the Filter operator that appears here is named A.Out, and the code for it can be found under the directory output/src/operator/A in the files Out.h and Out.cpp.
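The naming rule can be sketched as a small helper that turns an operator instance name into the two generated file paths. The function generatedFiles is hypothetical and is not part of the SPL toolchain; it only restates the mapping described above.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Maps an operator instance name such as "A.B.C" to the header and
// implementation files described in the text. Illustrative helper only.
std::pair<std::string, std::string> generatedFiles(const std::string& instanceName) {
    assert(!instanceName.empty());
    std::string path = "output/src/operator/";
    for (char c : instanceName) {
        // Each dot-separated name component becomes a path component.
        path += (c == '.') ? '/' : c;
    }
    return {path + ".h", path + ".cpp"};
}
```

For the instance A.Out, this yields output/src/operator/A/Out.h and output/src/operator/A/Out.cpp, which matches the location given in the text.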

Parameter Handling

There are three forms of parameters, depending on the kind of expression that is allowed as a value: attribute-free expressions, custom literals, and expressions that reference input tuple attributes. As an example of an attribute-free parameter, consider the size parameter:
stream<MyType> Out = MyOper(In) {
  param
    size : pow(2,10) - pow(10,2);
  ...  
}

To customize the generated code based on the value of the size parameter, the code generation support might be used as follows:

<%
  my $sizeParam = $model->getParameterByName("size");
  my $size = (not $sizeParam) ? "10" :
    $sizeParam->getValueAt(0)->getCppExpression();
%>
...
int32 size = <%=$size%>;

The generated code is equivalent to int32 size = pow(2,10) - pow(10,2);. However, when the operator model allows the compiler to rewrite expressions, the generated code looks like int32 size = lit$0; instead (see footnote 1). This rewriting saves the compiler from generating repeated code for operator instances that differ only slightly in their parameter configurations, such as size: 10; versus size: pow(2, 3);.
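The benefit of expression rewriting can be pictured with a simplified stand-in for the generated class. In this sketch the member lit_0 plays the role of lit$0, and the value is supplied when the instance is created rather than baked into the source. The class name MyOperSketch and its constructor are assumptions for illustration and do not reflect how the SPL runtime actually loads the literal.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for one generated operator class. Because the parameter value
// is held in lit_0 instead of appearing as a literal in the code, the same
// class can serve instances configured with different size values.
class MyOperSketch {
public:
    explicit MyOperSketch(int32_t lit0) : lit_0(lit0) {}

    // Corresponds to "int32 size = lit$0;" in the generated code.
    int32_t size() const { return lit_0; }

private:
    int32_t lit_0;  // hypothetical counterpart of lit$0
};
```

Instances configured with size: 10; and size: pow(2, 3); would then differ only in the value passed in (10 and 8), not in the generated class.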

As an example of a custom literal parameter, consider the format parameter.

stream<MyType> Out = MyOper(In) {
  param
    format: txt;
  ...
}

In this example, txt is a custom literal that is defined in the operator model. The value of this parameter can be inspected at code generation time to generate different, specialized code. For instance:

<%
  my $formatParam = $model->getParameterByName("format");
  my $format = (not $formatParam) ? "csv" :
    $formatParam->getValueAt(0)->getSPLExpression();
%>
...
<%if ($format eq "csv") {%>
  ... // C++ code for csv
<%} elsif ($format eq "txt") {%>
  ... // C++ code for txt
<%}%>

In this code, $format does not contain an expression to be embedded into the generated code. Instead, it contains a value that the generator code inspects in order to emit customized C++ code.

Inspecting the SPL expression that is returned by getSPLExpression() should only be done if rewriteAllowed is false, or if the parameter has expression type Constant. If rewriteAllowed is true, the SPL compiler can generate a single version of the operator code and use runtime values to instantiate several operator instances from it. In that case, the value that is returned by getSPLExpression() corresponds to only one of the instantiated operators. When rewriteAllowed is true, getSPLExpression() should be used only for reporting errors at compile time, and not for generating code.

As an example of a free-form expression that can reference input tuple attributes, consider the filter parameter.

stream<uint64 id, uint32 cnt> In = ...
stream<MyType> Out = MyOper(In) {
  param
    filter: id !=0 && cnt < 40;
  ...
}

In this example, the filter parameter is configured with the expression id !=0 && cnt < 40. The expression references tuple attributes from the input stream and must be evaluated each time that a new tuple is received. The code generator template for this operator can employ code like the following to achieve this result:

<%
  my $filterParam = $model->getParameterByName("filter");
  my $filter = (not $filterParam) ? "true" :
    $filterParam->getValueAt(0)->getCppExpression();
  my $iport = $model->getInputPortAt(0);
  my $ituple = $iport->getCppTupleName();
%>
void MY_OPERATOR::process(Tuple & tuple, uint32_t port) {
  assert(port==0); // this op should have a single port
  IPort0Type & <%=$ituple%> = static_cast<IPort0Type &>(tuple);
  if (<%=$filter%>) {
    ...
  }
  ...
}

The key point in this example is to make sure that the expression contained in $filter is valid when it is emitted as C++ code. For that purpose, a tuple with the right type and the right variable name must be in scope, which is what the following line of code in this example provides:

IPort0Type & <%=$ituple%> = static_cast<IPort0Type &>(tuple);

The statement performs two important functions. First, it casts the generic Tuple to its actual type of IPort0Type (also available as <%=$iport->getCppTupleType()%>). Second, it creates an alias that is named $ituple, which is the same name that is used in the $filter expression to refer to the input tuple. In effect, it creates a reference variable with the right name and type, so that the $filter expression is valid.
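The effect of the cast and the alias can be seen in a self-contained sketch of what the expanded process method might look like for the filter id != 0 && cnt < 40. The types Tuple and IPort0Type below are simplified stand-ins for the SPL runtime types, the reference name iport0 stands in for whatever getCppTupleName() returns, and the function returns a flag instead of submitting a tuple so that it can be exercised in isolation.

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-ins for the generic tuple and the port-specific tuple type.
struct Tuple {
    virtual ~Tuple() = default;
};

struct IPort0Type : Tuple {
    IPort0Type(uint64_t id, uint32_t cnt) : id_(id), cnt_(cnt) {}
    uint64_t get_id() const { return id_; }
    uint32_t get_cnt() const { return cnt_; }
private:
    uint64_t id_;
    uint32_t cnt_;
};

// Returns true when the tuple passes the filter.
bool process(Tuple& tuple, uint32_t port) {
    assert(port == 0);  // this sketch has a single input port
    // Cast to the concrete type and bind it to the name that the
    // filter expression uses to refer to the input tuple.
    IPort0Type& iport0 = static_cast<IPort0Type&>(tuple);
    // Expanded filter expression: id != 0 && cnt < 40
    return iport0.get_id() != 0 && iport0.get_cnt() < 40;
}
```

If the reference were declared under a different name, the expanded filter expression would not compile, which is why the template takes the name from the model instead of hard-coding it.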

An alternative, hygienic way of creating a C++ expression that references input tuples and is valid within the current context is to use the adaptCppExpression method that is provided by the SPL::CodeGen module. It takes as parameters the C++ expression and the names of the tuple variables that are in the current scope, as illustrated in the following example:

<% # assume that the 'filter' parameter is mandatory
  my $filterParam = $model->getParameterByName("filter");
  my $filterExpr = $filterParam->getValueAt(0)->getCppExpression();
%>
void MY_OPERATOR::process(Tuple & tuple, uint32_t port) {
  if (<%=SPL::CodeGen::adaptCppExpression($filterExpr, "tuple")%>) {
    ...
  }
}

In the example, the local variable name tuple is passed to the adaptCppExpression method, so that the tuple references in the C++ expression are tied to the local variable tuple. If there can be more than one port that is referenced in an expression, then a list of tuple variable names can be passed to the adaptCppExpression method, one for each input port, in order and with no gaps.

Often a default value or expression is used in a code generation template when a parameter is not specified in the operator instance model. This can be achieved by introducing a simple Perl function that checks for the existence of the parameter and returns the right context-dependent expression. This technique is illustrated in the following example:


<%
  sub getFilterExpr {
    my $filterParam = $model->getParameterByName("filter");
    return "true" unless($filterParam); # the default value is 'true'
    my $expr = $filterParam->getValueAt(0)->getCppExpression();
    return SPL::CodeGen::adaptCppExpression($expr, @_);
  }
%>
void MY_OPERATOR::process(Tuple & tuple, uint32_t port) {
  if (<%=getFilterExpr("tuple")%>) {
    ...
  }
}
Parameters with expressions that do not involve stream tuples or attributes, and whose values are only known at run time, are available at run time in a form that is based on the getParameter_name() functions for non-generic operators. The SPL::Operator class contains typedefs and functions to access parameter values at run time:
  • const std::tr1::unordered_set<std::string>& getParameterNames() const returns a set of strings that contain the names of all parameters.
  • const ParameterValueListType& getParameterValues(std::string const & param) const returns expression value of type ParameterValue that can be inspected at run time to extract type information and the value of the expression. The values might have different types if no type was specified for the parameter in the operator model.
  • const std::tr1::unordered_map<std::string, std::vector<ConstValueHandle>* >& getParameters() returns the map from parameter name to a pointer to a list of ConstValueHandle.
These interfaces are also available for non-generic operators.

Output Assignment Handling

There are two forms of output assignments, plain assignments and assignments with output functions.

The assignment for the output stream that is named Out in the following SPL segment is an example of plain output assignment.

stream<uint64 id> In = ...
stream<rstring name, uint64 id> Out = MyOper(In) {
  ...
  output
    Out: id = hashCode(id), name = (rstring)id + "_id";
}

The output assignment can be implemented as follows in the code generator:

<%
  my $iport = $model->getInputPortAt(0);
  my $ituple = $iport->getCppTupleName();
  my $oport = $model->getOutputPortAt(0);
  # Get the output tuple constructor initializer based on the output
  # attribute assignments that appear in the operator instance model
  my $otupleInit = SPL::CodeGen::getOutputTupleCppInitializer($oport);
%>
...
void MY_OPERATOR::process(Tuple & tuple, uint32_t port)
{
  IPort0Type & <%=$ituple%> = static_cast<IPort0Type &>(tuple);
  ...
  OPort0Type otuple(<%=$otupleInit%>);
  submit(otuple, 0);
}

In this example, the SPL::CodeGen::getOutputTupleCppInitializer helper routine creates a tuple initializer when given the output port object. This routine simply uses all of the output assignment expressions to create a tuple initializer. This initializer is later used to create an output tuple of the right type and initialize it with the expressions that appear in the attribute assignment expressions for the output port. The generated code is as follows (some namespace qualifiers are omitted for clarity):

OPort0Type otuple(hashCode(tuple$0.get_id()),
                  spl_cast<rstring>(tuple$0.get_id())+rstring("_id"));

An alternate helper routine is SPL::CodeGen::emitSubmitOutputTuple, which can forward a tuple directly from an input port to an output port, if the input and output port schemas are identical, the port mutabilities are compatible, and the output attribute assignments indicate that the tuple is simply being forwarded from the input port to the output port. If not, SPL::CodeGen::getOutputTupleCppInitializer is used to generate the output tuple. An example use from the Functor operator follows:

void MY_OPERATOR::process(Tuple const & tuple, uint32_t port) {
  IPort0Type const & <%=$ituple%> = static_cast<IPort0Type const&>(tuple);
  if (<%=$filterExpr%>) {
    <%SPL::CodeGen::emitSubmitOutputTuple($oport, $iport);%>
  }
}

While not a recommended practice, the same result can also be achieved without the SPL::CodeGen module by manually traversing the output assignments, as follows:

<%
  foreach my $attribute (@{$oport->getAttributes()}) {          
    my $name = $attribute->getName();
    my $value = $attribute->getAssignmentValue();
    my $valueExpr = $value->getCppExpression();
    ...
  }
%>

Next, consider an output assignment with output functions, that is, output attribute assignments that involve custom output functions. The assignment for the output stream that is named Out in the following SPL code segment is an example:

stream<rstring name, uint64 salary> In = ...
stream<In> Out = Aggregate(In) {
  window
    In : tumbling, count(10);
  output
    Out : salary = Max(salary),
         name = ArgMax(salary, name);
}
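Before turning to the generator code, it can help to see what the generated C++ for this instance ultimately has to compute for each full window. The sketch below is a plain, hand-written stand-in, not generated output: the types InTuple and OutTuple and the function aggregateWindow are hypothetical, and keeping the first tuple on ties is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Stand-ins for the input and output tuple types (same schema).
struct InTuple  { std::string name; uint64_t salary; };
struct OutTuple { std::string name; uint64_t salary; };

// Aggregates one full window:
//   salary = Max(salary)           -> the largest salary in the window
//   name   = ArgMax(salary, name)  -> the name from the tuple with that salary
OutTuple aggregateWindow(const std::vector<InTuple>& window) {
    assert(!window.empty());
    const InTuple* best = &window.front();
    for (const InTuple& t : window) {
        if (t.salary > best->salary) {
            best = &t;  // strict comparison keeps the first tuple on ties
        }
    }
    return OutTuple{best->name, best->salary};
}
```

With a tumbling count(10) window, a function like this would run once for every ten tuples received, after which the window is emptied.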

The following code segment illustrates how the output port object of the operator instance model can be used to retrieve the output function (such as ArgMax) and the assignment value expressions (ituple$0.get_salary() and ituple$0.get_name()) for each output attribute.

<%
  foreach my $attribute (@{$oport->getAttributes()}) {
    my $name = $attribute->getName();
    my $aggregate = $attribute->getAssignmentOutputFunctionName();
    my $paramValues = $attribute->getAssignmentOutputFunctionParameterValues();
    foreach my $value (@{$paramValues}) {
      my $valueExpr = $value->getCppExpression();
      ...
    }
  }
%>

The following code segment verifies whether the attribute has an assignment that uses a custom output function and shows how to extract the arguments to the custom output function.

<%
  foreach my $attribute (@{$oport->getAttributes()}) {
    my $name = $attribute->getName();
    # Does this attribute have an assignment using a custom output function?
    if ($attribute->hasAssignmentWithOutputFunction()) {
      # Which custom output is being referenced?
      my $fcn = $attribute->getAssignmentOutputFunctionName();
      # extract the arguments to the custom output function
      if ($fcn eq "Max") {
        # implementation of the Max function - extract the first (and only) argument
        my $maxArg = $attribute->getAssignmentOutputFunctionParameterValueAt(0);
        # Access the argument as a C++ expression
        my $maxValue = $maxArg->getCppExpression();
        # Use the argument in some manner, such as: %>
        ... // C++ code here using <%=$maxValue%>
      <%} elsif ($fcn eq "ArgMax") {
        # implementation of the ArgMax function - extract the two arguments
        my $maxArg = $attribute->getAssignmentOutputFunctionParameterValueAt(0);
        my $argArg = $attribute->getAssignmentOutputFunctionParameterValueAt(1);
        # Access the arguments as a C++ expression
        my $maxValue = $maxArg->getCppExpression();
        my $argValue = $argArg->getCppExpression();
        # We may need to know the C++ type of max to declare a temporary variable
        my $maxCppType = $maxArg->getCppType();
        # Use the arguments in some manner, such as: %>
        ... // C++ code here using <%=$maxValue%>, <%=$argValue%> and possibly <%=$maxCppType%>
      <%}
    } elsif ($attribute->getAssignmentValue()) {
        # An output expression that doesn't use a custom output function
        # Example code, assuming otuple is already declared:
        my $assign = $attribute->getAssignmentValue()->getCppExpression();%>
        otuple.set_<%=$name%>(<%=$assign%>);
    <%} else {
          # This attribute was not mentioned on the output clause
          # Should we take some default action for this attribute?
    }
  }
%>
1 The value of lit$0 is computed at compile time and loaded at run time.