Operators implemented in C++

There are two types of C++ primitive operators: non-generic and generic. Non-generic operators are written entirely in C++, and generic operators use a mixed-mode templating system.

Mixed-mode uses Perl code to generate the correct C++ code. By using the compile-time Perl API available in mixed-mode, SPL programmers can define operators that are fully type parametric (they can accept and manipulate any kind of stream type). Writing operator templates in mixed-mode is the most powerful means to develop operators. All of the operators in the SPL Standard Toolkit are defined as mixed-mode.

Non-generic operators do not use mixed-mode. There are two kinds of non-generic primitive operators: hardcoded operators, which use only the attribute-based API generated by the compiler, and operators that use runtime reflection to inspect the type of tuples. Hardcoded primitives use the attribute-based API to access and manipulate tuples. The SPL compiler generates this API from the tuple type as specified in the SPL code; it provides getter and setter member functions in the form of get_<attribute_name>() and set_<attribute_name>(). (The tuple type is the set of all of the attributes of a tuple, including both attribute names and attribute types.) Because the operator code must name each attribute exactly, developers hardcode the tuple type into the operator definition, which is why such primitives are considered hardcoded.
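
The getter and setter naming convention can be illustrated with a minimal sketch. The SensorTuple class below is a hand-written, hypothetical stand-in for a compiler-generated tuple class; the attribute names sensorId and celsius and the function toFahrenheit are assumptions made for illustration and are not part of any SPL header.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the class that would be generated for a tuple
// type with the attributes sensorId and celsius. It only mirrors the
// get_<attribute_name>() / set_<attribute_name>() naming convention.
class SensorTuple {
public:
    SensorTuple() : sensorId_(), celsius_(0.0) {}

    const std::string& get_sensorId() const { return sensorId_; }
    double get_celsius() const { return celsius_; }

    void set_sensorId(const std::string& value) { sensorId_ = value; }
    void set_celsius(double value) { celsius_ = value; }

private:
    std::string sensorId_;
    double celsius_;
};

// Hardcoded operator logic: it compiles only against a tuple type that has
// an attribute named celsius, which ties the code to one tuple type.
inline double toFahrenheit(const SensorTuple& in) {
    return in.get_celsius() * 9.0 / 5.0 + 32.0;
}

// Usage example that exercises the setters and getters.
inline void attributeApiExample() {
    SensorTuple t;
    t.set_sensorId("s1");
    t.set_celsius(100.0);
    assert(t.get_sensorId() == "s1");
    assert(toFahrenheit(t) == 212.0);
}
```

If the attribute were renamed in the SPL code, the call to get_celsius() would no longer compile, which is the sense in which the tuple type is hardcoded.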

The advantage of hardcoded, non-generic primitive operators is that they are the simplest primitive operators to write. These operators are implemented purely in C++ with no consideration for generality. The main disadvantage is that reuse of these primitive operators is limited, because they can be used only with a fixed set of types. Primitive operators that do not use mixed-mode, including hardcoded operators, are further limited in the kinds of parameters that they can accept. Hardcoded operator parameters are accessed with the getter member function getParameter_<param_name>(). Each parameter must be a single value and must be an attribute-free expression.
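
As a rough illustration of the parameter accessor convention, the sketch below writes the accessor by hand. The ThresholdFilter class, the parameter name threshold, and the accepts function are hypothetical; in a real operator the accessor would be supplied for the operator rather than written by the developer.

```cpp
#include <cassert>

// Hypothetical operator with one parameter named "threshold".
class ThresholdFilter {
public:
    explicit ThresholdFilter(double threshold) : threshold_(threshold) {}

    // Mirrors the getParameter_<param_name>() convention: it returns a
    // single value that does not depend on any tuple attribute.
    double getParameter_threshold() const { return threshold_; }

    // Operator logic that compares an incoming value with the parameter.
    bool accepts(double celsius) const {
        return celsius > getParameter_threshold();
    }

private:
    double threshold_;
};

// Usage example.
inline void thresholdFilterExample() {
    ThresholdFilter filter(30.0);
    assert(filter.accepts(31.5));
    assert(!filter.accepts(30.0));
}
```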

The reflection-based API allows developers to have some generality in their implementation while still using only C++. This generality is limited to tuple types only; the number of input and output ports is still fixed. The reflection-based API enables runtime inspection of tuples. In a primitive operator, developers can inspect tuples at run time to determine the number, name, and type of their attributes. Operator developers can then use this information to implement their operators to handle any arbitrary type.

However, there are more levels of indirection in accessing tuple attributes in this manner. As a consequence, there is a performance penalty when compared to the attribute-based API. Additionally, some errors in using the reflection-based API (such as trying to access an attribute name that does not exist) will only manifest at run time. The reflection-based API has the same parameter constraints as hardcoded primitives. While the reflection-based API is always available, if developers want to truly create a primitive operator that is generic regarding types, they must be careful to avoid hardcoding any assumptions about the number of attributes, their names, or types.
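
The trade-off described above can be sketched with a simplified model of a tuple that is inspected at run time. This is not the SPL reflection API: the Attribute and ReflectiveTuple types, their member functions, and the choice to store values as strings are all assumptions made to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical description of one attribute: its name, type, and value.
struct Attribute {
    std::string name;
    std::string type;   // for example "rstring" or "float64"
    std::string value;  // stored as text to keep the sketch small
};

// Hypothetical tuple whose attributes are discovered at run time.
class ReflectiveTuple {
public:
    void add(const std::string& name, const std::string& type,
             const std::string& value) {
        assert(!name.empty());
        Attribute a;
        a.name = name;
        a.type = type;
        a.value = value;
        attrs_.push_back(a);
    }

    std::size_t numAttributes() const { return attrs_.size(); }

    const Attribute& attributeAt(std::size_t i) const { return attrs_.at(i); }

    // Lookup by name is a search at run time, so a misspelled name is only
    // detected when this call executes, not when the operator is compiled.
    const Attribute& attributeNamed(const std::string& name) const {
        for (std::size_t i = 0; i < attrs_.size(); ++i) {
            if (attrs_[i].name == name) return attrs_[i];
        }
        throw std::out_of_range("no attribute named " + name);
    }

private:
    std::vector<Attribute> attrs_;
};

// Type-generic logic: it assumes nothing about the number of attributes,
// their names, or their types.
inline std::string describe(const ReflectiveTuple& t) {
    std::string out;
    for (std::size_t i = 0; i < t.numAttributes(); ++i) {
        const Attribute& a = t.attributeAt(i);
        if (i > 0) out += ", ";
        out += a.type + " " + a.name + "=" + a.value;
    }
    return out;
}
```

The describe function works for any tuple, but every access goes through a loop or a name search, and a call such as attributeNamed("kelvin") on a tuple without that attribute fails only when it runs.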

Generic operators are implemented in mixed-mode. The specially designated Perl sections act as a template system that tells the SPL compiler how to customize the operator based on the streams that the operator is invoked with. For example, the Perl API has functions that indicate the number of output ports, and developers can use that number in Perl to decide what C++ code is generated. Implementing mixed-mode operators is an example of meta-programming. Meta-programming is the practice of writing code that specifies what code to generate at compile time, based on the particular usage of a component. Other examples of meta-programming are using templates in C++ and macros in Lisp dialects. As with other kinds of meta-programming, implementing operators in mixed-mode is more complicated than writing a single fixed implementation. Implementing a fully generic operator requires developers to think about their operator beyond what is required to implement one version of the operator.

However, this complexity also allows for significantly more capabilities. A fully generic operator that can be used with any arbitrary type, and with an arbitrary number of stream inputs and outputs, is a more useful operator than one that is constrained in any of those dimensions. Additionally, mixed-mode operators are specialized based on their use at compile time; there is no runtime penalty with this level of generality. The Perl compile-time API also has functions that can inspect the number, names, and types of attributes of a tuple. These functions enable developers to write code that generates direct calls to those attributes. Another advantage of implementing a mixed-mode operator is that both operator parameters and output assignments can be arbitrary expressions, and outputs can support custom output functions.
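
Mixed-mode itself relies on Perl, so the sketch below uses C++ templates, the other meta-programming example mentioned earlier, as an analogy for compile-time specialization. It does not show mixed-mode code. The Reading and Trade types and the sumAttribute function are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <vector>

// Two unrelated tuple-like types that stand in for different stream types.
struct Reading {
    explicit Reading(double c) : celsius(c) {}
    double get_celsius() const { return celsius; }
    double celsius;
};

struct Trade {
    explicit Trade(double p) : price(p) {}
    double get_price() const { return price; }
    double price;
};

// Generic logic that is specialized at compile time. The getter is a
// template argument, so each instantiation calls it directly, with no name
// lookup or type inspection at run time.
template <typename Tuple, double (Tuple::*Getter)() const>
double sumAttribute(const std::vector<Tuple>& tuples) {
    double total = 0.0;
    for (typename std::vector<Tuple>::const_iterator it = tuples.begin();
         it != tuples.end(); ++it) {
        total += ((*it).*Getter)();
    }
    return total;
}

// Usage example: one template, two specializations.
inline void specializationExample() {
    std::vector<Reading> readings;
    readings.push_back(Reading(20.0));
    readings.push_back(Reading(22.5));
    double celsiusSum = sumAttribute<Reading, &Reading::get_celsius>(readings);
    assert(celsiusSum == 42.5);

    std::vector<Trade> trades;
    trades.push_back(Trade(1.5));
    double priceSum = sumAttribute<Trade, &Trade::get_price>(trades);
    assert(priceSum == 1.5);
}
```

A misspelled getter name in a template argument is rejected by the compiler, in contrast to the runtime failure that a name-based reflective lookup produces.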