Implementing generic operators
Generic operators are implemented through code generation. The code generation logic sits within segments that are delimited by <% and %>, and it generates C++ code that augments the C++ code outside those segments. The generator code is written in Perl.
Here is a simple example.
cerr << "<% print 'Hello World'; %>" << endl;
This mixed-mode code simply translates into C++ code that prints a message. While this example is illustrative in terms of the code generation mechanics, it does not represent a common use case. During development of generic primitive operators, the need for code generation arises because the code needs to be customized based on the configuration of the operator instance at hand. The specific instance configuration is accessible to the code generator through a $model variable. As a result, in most cases, the generator code involves variables that depend on the $model. For instance:
<% my $kind = $model->getContext()->getKind(); %>
...
cerr << "The operator is of kind <%=$kind%>" << endl;
The code generation segments can be used throughout the code. Furthermore, variables that are defined in one segment are available in later segments, because they remain visible in the current Perl lexical scope. This example also shows a shorthand notation for printing variables. The following two forms are equivalent: <% print $var; %> and <%=$var%>.
The model object represents an operator instance configuration, which is based on how the operator instance is configured in the SPL code. Some of the fundamental pieces of information available in the model include, but are not limited to:
- Windowing configuration, such as window type, eviction policy, trigger policy.
- Parameter configuration, such as parameter names, types, values.
- Input and output port information, such as number of ports, port tuple types.
- Output attribute assignments, such as assignment expressions, output functions.
The model object is an instance of the Perl class SPL::Operator::Instance::OperatorInstance. You can find API documentation on Perl classes and modules in the $STREAMS_INSTALL/doc/spl/operator/code-generation-api/perl directory after you install Teracloud® Streams. In addition to the classes that represent the operator instance model and the various objects that are contained within it, SPL also provides a module that is called SPL::CodeGen, which provides helper routines for common code generation tasks.
For an operator instance that is named A.B.C, the code for it is generated into files C.h and C.cpp under the directory output/src/operator/A/B. Consider the following example:
composite MyOp (input In; output Out) {
...
stream<Out> Out = Filter(In) {...}
}
...
stream<MyType> A = MyOp(B) {...}
The instance of the Filter operator that appears in the MyOp composite is named A.Out, and the code for it can be found under the directory output/src/operator/A in files Out.h and Out.cpp.
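The naming rule above can be restated as a small C++ sketch. The helper generatedFilesFor and the struct GeneratedFiles are hypothetical names introduced only for illustration; they are not part of the SPL tooling, and the sketch assumes that every dot in the instance name becomes one directory level.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not part of the SPL toolkit: it only restates the
// naming rule described in the text.
struct GeneratedFiles {
    std::string header;  // path of the generated .h file
    std::string source;  // path of the generated .cpp file
};

// Maps a dotted operator instance name such as "A.B.C" to
// output/src/operator/A/B/C.h and output/src/operator/A/B/C.cpp.
GeneratedFiles generatedFilesFor(const std::string& instanceName) {
    assert(!instanceName.empty());
    std::string path = "output/src/operator/";
    for (char c : instanceName) {
        // Each dot-separated name component becomes one directory level.
        path += (c == '.') ? '/' : c;
    }
    return GeneratedFiles{path + ".h", path + ".cpp"};
}
```

For the Filter instance in the example, generatedFilesFor("A.Out") yields output/src/operator/A/Out.h and output/src/operator/A/Out.cpp.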
Parameter Handling
Consider the size parameter:
stream<MyType> Out = MyOper(In) {
param
size : pow(2,10) - pow(10,2);
...
}
To customize the generated code based on the value of the size parameter, the code generation support can be used as follows:
<%
my $sizeParam = $model->getParameterByName("size");
my $size = (not $sizeParam) ? "10" :
$sizeParam->getValueAt(0)->getCppExpression();
%>
...
int32 size = <%=$size%>;
The generated code is equivalent to int32 size = pow(2,10) - pow(10,2);
However, if the operator model allows the compiler to rewrite expressions, the generated code looks like int32 size = lit$0; (see note 1). This rewriting saves the compiler from generating repeated code for operator instances that differ only slightly in their parameter configurations, such as size: 10; versus size: pow(2, 3);
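The benefit of expression rewriting can be pictured with a minimal C++ sketch. SizedOperator and lit_0 are stand-ins invented for this illustration (lit_0 plays the role of lit$0), the non-negative check is an assumption of the sketch, and the code that the SPL compiler actually generates will look different.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for a generated operator class; the real generated code differs.
class SizedOperator {
public:
    // lit_0 plays the role of lit$0: the value is computed by the compiler
    // and handed to the operator when it is loaded at run time.
    explicit SizedOperator(std::int32_t lit_0) : lit_0_(lit_0) {
        assert(lit_0 >= 0);  // assumption of this sketch: a size is never negative
    }

    // Corresponds to the generated statement: int32 size = lit$0;
    std::int32_t size() const { return lit_0_; }

private:
    std::int32_t lit_0_;
};
```

With this shape, an instance configured with size: 10; and another configured with size: pow(2, 3); can share one compiled class, constructed as SizedOperator(10) and SizedOperator(8), instead of requiring two generated classes that differ only in a constant.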
As an example of a custom literal parameter, consider the format parameter:
stream<MyType> Out = MyOper(In) {
param
format: txt;
...
}
In this example, txt is a custom literal that is defined in the operator model. The value of this parameter can be inspected at code generation time to generate different, specialized code. For instance:
<%
my $formatParam = $model->getParameterByName("format");
my $format = (not $formatParam) ? "csv" :
$formatParam->getValueAt(0)->getSPLExpression();
%>
...
<%if ($format eq "csv") {%>
... // C++ code for csv
<%} elsif ($format eq "txt") {%>
... // C++ code for txt
<%}%>
In this code, $format does not contain an expression to be embedded into the generated code. Instead, it contains a value that the generator code inspects in order to emit customized C++ code.
Inspecting the SPL expression that is returned by getSPLExpression() should be done only if rewriteAllowed is false, or if the parameter has expression type Constant. If rewriteAllowed is true, the SPL compiler can generate a single version of the operator code and use runtime values to instantiate several operators from it. In that case, the value that is returned by getSPLExpression() corresponds to only one of the instantiated operators. When rewriteAllowed is true, getSPLExpression() should be used only for reporting errors at compile time, and not for generating code.
As an example of a free-form expression that can reference input tuple attributes, consider the filter parameter.
stream<uint64 id, uint32 cnt> In = ...
stream<MyType> Out = MyOper(In) {
param
filter: id !=0 && cnt < 40;
...
}
In this example, the filter parameter is configured with the expression id != 0 && cnt < 40. The expression references tuple attributes from the input stream and must be evaluated each time that a new tuple is received. The code generator template for this operator can use code like the following to achieve this result:
<%
my $filterParam = $model->getParameterByName("filter");
my $filter = (not $filterParam) ? "true" :
$filterParam->getValueAt(0)->getCppExpression();
my $iport = $model->getInputPortAt(0);
my $ituple = $iport->getCppTupleName();
%>
void MY_OPERATOR::process(Tuple & tuple, uint32_t port) {
assert(port==0); // this op should have a single port
IPort0Type & <%=$ituple%> = static_cast<IPort0Type &>(tuple);
if (<%=$filter%>) {
...
}
...
}
The key point in this example is to make sure that the expression contained in $filter is valid when it gets emitted as C++ code. For that purpose, a tuple with the right type and the right variable name must be in scope, which is what the following line of code in this example provides:
IPort0Type & <%=$ituple%> = static_cast<IPort0Type &>(tuple);
This statement performs two important functions. First, it casts the generic Tuple to its actual type of IPort0Type (also available as <%=$iport->getCppTupleType()%>). Second, it creates an alias whose name is held in $ituple, which is the same name that is used in the $filter expression to refer to the input tuple. In effect, it creates a reference variable with the right name and type, so that the $filter expression is valid.
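To make the cast-and-alias idea concrete, here is a self-contained C++ sketch of roughly what the expanded process() logic amounts to for the filter id != 0 && cnt < 40. The Tuple and IPort0Type definitions are simplified stand-ins for the SPL runtime types, the alias name iport0 is hypothetical (the real name comes from getCppTupleName()), and passesFilter and passesFilterFor are helpers invented for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-ins for the SPL runtime types; the real classes are richer.
struct Tuple {
    virtual ~Tuple() = default;
};

struct IPort0Type : Tuple {
    std::uint64_t id = 0;
    std::uint32_t cnt = 0;
    std::uint64_t get_id() const { return id; }
    std::uint32_t get_cnt() const { return cnt; }
};

// Mirrors the shape of the generated process() body.
bool passesFilter(Tuple& tuple, std::uint32_t port) {
    assert(port == 0);  // single input port, as in the example
    // Cast the generic tuple to its concrete type and bind it to the name
    // that the emitted expression uses.
    IPort0Type& iport0 = static_cast<IPort0Type&>(tuple);
    // The emitted filter expression refers to the alias, not to 'tuple'.
    return iport0.get_id() != 0 && iport0.get_cnt() < 40;
}

// Small helper so the sketch can be exercised without the SPL runtime.
bool passesFilterFor(std::uint64_t id, std::uint32_t cnt) {
    IPort0Type t;
    t.id = id;
    t.cnt = cnt;
    return passesFilter(t, 0);
}
```

If the alias were declared under any other name, the emitted expression would refer to an undeclared variable and the generated C++ would not compile, which is why the template takes the name from the model instead of hard-coding it.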
An alternative, hygienic way of creating a C++ expression that references input tuples and is valid within the current context is to use the adaptCppExpression method that is provided by the SPL::CodeGen module. It takes as parameters the C++ expression and the names of the tuple variables that are in the current scope, as illustrated in the following example:
<% # assume that the 'filter' parameter is mandatory
my $filterParam = $model->getParameterByName("filter");
my $filterExpr = $filterParam->getValueAt(0)->getCppExpression();
%>
void MY_OPERATOR::process(Tuple & tuple, uint32_t port) {
if (<%=SPL::CodeGen::adaptCppExpression($filterExpr, "tuple")%>) {
...
}
}
In the example, the local variable name tuple is passed to the adaptCppExpression method, so that the tuple references in the C++ expression are tied to the local variable tuple. If more than one port can be referenced in an expression, then a list of tuple variable names can be passed to the adaptCppExpression method, one for each input port, in order and with no gaps.
Often a default value or expression is used in a code generation template when a parameter is not specified in the operator instance model. This can be achieved by introducing a simple Perl function that checks for the existence of the parameter and returns the right context-dependent expression. This technique is illustrated in the following example:
<%
sub getFilterExpr {
my $filterParam = $model->getParameterByName("filter");
return "true" unless($filterParam); # the default value is 'true'
my $expr = $filterParam->getValueAt(0)->getCppExpression();
return SPL::CodeGen::adaptCppExpression($expr, @_);
}
%>
void MY_OPERATOR::process(Tuple & tuple, uint32_t port) {
if (<%=getFilterExpr("tuple")%>) {
...
}
}
getParameter_name() functions for non-generic operators. The SPL::Operator class contains typedefs and functions to access parameter values at run time:
- const std::tr1::unordered_set<std::string>& getParameterNames() const returns a set of strings that contain the names of all parameters.
- const ParameterValueListType& getParameterValues(std::string const & param) const returns the expression values, of type ParameterValue, which can be inspected at run time to extract the type information and the value of each expression. The values might have different types if no type was specified for the parameter in the operator model.
- const std::tr1::unordered_map<std::string, std::vector<ConstValueHandle>* >& getParameters() returns the map from parameter name to a pointer to a list of ConstValueHandle.
Output Assignment Handling
There are two forms of output assignments, plain assignments and assignments with output functions.
The assignment for the output stream that is named Out in the following SPL segment is an example of a plain output assignment.
stream<uint64 id> In = ...
stream<rstring name, uint64 id> Out = MyOper(In) {
...
output
Out: id = hashCode(id), name = (rstring)id + "_id";
}
The output assignment can be implemented as follows in the code generator:
<%
my $iport = $model->getInputPortAt(0);
my $ituple = $iport->getCppTupleName();
my $oport = $model->getOutputPortAt(0);
# Get the output tuple constructor initializer based on the output
# attribute assignments that appear in the operator instance model
my $otupleInit = SPL::CodeGen::getOutputTupleCppInitializer($oport);
%>
...
void MY_OPERATOR::process(Tuple & tuple, uint32_t port)
{
IPort0Type & <%=$ituple%> = static_cast<IPort0Type &>(tuple);
...
OPort0Type otuple(<%=$otupleInit%>);
submit(otuple, 0);
}
In this example, the SPL::CodeGen::getOutputTupleCppInitializer helper routine creates a tuple initializer when given the output port object. This routine simply uses all of the output assignment expressions to create a tuple initializer. This initializer is later used to create an output tuple of the right type and initialize it with the expressions that appear in the attribute assignments for the output port. The generated code is as follows (some namespace qualifiers are omitted for clarity):
OPort0Type otuple(hashCode(tuple$0.get_id()),
spl_cast<rstring>(tuple$0.get_id())+rstring("_id"));
An alternative helper routine is SPL::CodeGen::emitSubmitOutputTuple, which can forward a tuple directly from an input port to an output port if the input and output port schemas are identical, the port mutabilities are compatible, and the output attribute assignments indicate that the tuple is simply being forwarded from the input port to the output port. If not, SPL::CodeGen::getOutputTupleCppInitializer is used to generate the output tuple. An example use from the Functor operator follows:
void MY_OPERATOR::process(Tuple const & tuple, uint32_t port) {
IPort0Type const & <%=$ituple%> = static_cast<IPort0Type const&>(tuple);
if (<%=$filterExpr%>) {
<%SPL::CodeGen::emitSubmitOutputTuple($oport, $iport);%>
}
}
While not a recommended practice, the same result can also be achieved without the SPL::CodeGen module, by manually traversing the output assignments, as follows:
<%
foreach my $attribute (@{$oport->getAttributes()}) {
my $name = $attribute->getName();
my $value = $attribute->getAssignmentValue();
my $valueExpr = $value->getCppExpression();
...
}
%>
Next, consider an output assignment with output functions, that is, output attribute assignments that involve custom output functions. The assignment for the output stream that is named Out in the following SPL code segment is an example:
stream<rstring name, uint64 salary> In = ...
stream<In> Out = Aggregate(In) {
window
In : tumbling, count(10);
output
Out : salary = Max(salary),
name = ArgMax(salary, name);
}
The following code segment illustrates how the output port object of the operator instance model can be used to retrieve the output function (such as ArgMax) and the assignment value expressions (ituple$0.get_salary() and ituple$0.get_name()) for each output attribute.
<%
foreach my $attribute (@{$oport->getAttributes()}) {
my $name = $attribute->getName();
my $aggregate = $attribute->getAssignmentOutputFunctionName();
my $paramValues = $attribute->getAssignmentOutputFunctionParameterValues();
foreach my $value (@{$paramValues}) {
my $valueExpr = $value->getCppExpression();
...
}
}
%>
The following code segment verifies whether the attribute has an assignment that uses a custom output function and shows how to extract the arguments to the custom output function.
<%
foreach my $attribute (@{$oport->getAttributes()}) {
my $name = $attribute->getName();
# Does this attribute have an assignment using a custom output function?
if ($attribute->hasAssignmentWithOutputFunction()) {
# Which custom output is being referenced?
my $fcn = $attribute->getAssignmentOutputFunctionName();
# extract the arguments to the custom output function
if ($fcn eq "Max") {
# implementation of the Max function - extract the first (and only) argument
my $maxArg = $attribute->getAssignmentOutputFunctionParameterValueAt(0);
# Access the argument as a C++ expression
my $maxValue = $maxArg->getCppExpression();
# Use the argument in some manner, such as: %>
... // C++ code here using <%=$maxValue%>
<%} elsif ($fcn eq "ArgMax") {
# implementation of the ArgMax function - extract the two arguments
my $maxArg = $attribute->getAssignmentOutputFunctionParameterValueAt(0);
my $argArg = $attribute->getAssignmentOutputFunctionParameterValueAt(1);
# Access the arguments as a C++ expression
my $maxValue = $maxArg->getCppExpression();
my $argValue = $argArg->getCppExpression();
# We may need to know the C++ type of max to declare a temporary variable
my $maxCppType = $maxArg->getCppType();
# Use the arguments in some manner, such as: %>
... // C++ code here using <%=$maxValue%>, <%=$argValue%> and possibly <%=$maxCppType%>
<%}
} elsif ($attribute->getAssignmentValue()) {
# An output expression that doesn't use a custom output function
# Example code, assuming otuple is already declared:
my $assign = $attribute->getAssignmentValue()->getCppExpression();%>
otuple.set_<%=$name%>(<%=$assign%>);
<%} else {
# This attribute was not mentioned on the output clause
# Should we take some default action for this attribute?
}
}
%>
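As an illustration of the kind of C++ that the Max and ArgMax branches of such a template might emit, here is a self-contained sketch of the aggregation over one window. The types InTuple and OutTuple and the function aggregateWindow are hypothetical, and the sketch assumes that ArgMax(salary, name) yields the name from the first tuple that holds the maximum salary; the real Aggregate operator is more general.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Simplified stand-ins for the generated tuple types.
struct InTuple {
    std::string name;
    std::uint64_t salary;
};

struct OutTuple {
    std::string name;
    std::uint64_t salary;
};

// Computes salary = Max(salary) and name = ArgMax(salary, name) for one window.
OutTuple aggregateWindow(const std::vector<InTuple>& window) {
    assert(!window.empty());  // assumption: the window holds at least one tuple
    // Temporary of the argument's C++ type, the role that a value obtained
    // from getCppType() would play in a template.
    std::uint64_t maxSalary = window.front().salary;
    std::string nameAtMax = window.front().name;
    for (const InTuple& t : window) {
        if (t.salary > maxSalary) {  // strict comparison: the first maximum wins ties
            maxSalary = t.salary;
            nameAtMax = t.name;
        }
    }
    return OutTuple{nameAtMax, maxSalary};
}
```

In a real template, the expressions t.salary and t.name would not be written by hand; they would come from getCppExpression() on the output function arguments, which is what keeps the operator generic.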
Note 1: lit$0 is computed at compile time and loaded at run time.