Code generation features

The SPL compiler is an incremental compiler that generates C++ code.

The SPL compiler generates C++ code for the following entities in the fully expanded main composite of an SPL program:

Tuple and enum types
Primitive operator instances (after the composite expansion is done)
SPL functions

It also generates code for partitions, which map to processing elements (PEs), and for the stand-alone application.

The SPL compiler generates C++ code for each unique tuple and enum type in the SPL source. A tuple type is identified by the names and types of its attributes, and by their order. The type alias that is assigned to the tuple type in the SPL program does not factor into its identity from the code generation perspective; instead, the generated types have mangled names. For an enum, the list of enumeration literals and their order identify the enum. For example:

type
  Person = tuple<rstring name, uint32 age>;
  Student = tuple<rstring name, uint32 age>;
  Employee = tuple<rstring name, uint32 age, uint64 salary>;

In these examples, the first two types (Person and Student) share their generated C++ class, whereas separate code is generated for the third type (Employee).

For primitive operator instances, the operator kind (for example, spl.relational::Join) and the operator configuration that is provided in the SPL source (after the expansion) define the identity of an operator instance from the code generation perspective. The SPL compiler generates a C++ class for each operator instance that it considers unique. Several instances of an operator type can share their code, depending on their operator instance configuration. In certain scenarios, sharing code reduces compilation time significantly. To make sharing possible, the SPL compiler folds constants and rewrites them into runtime constant literals. Runtime constant literals are special identifiers that represent variables whose values are loaded once at run time and never changed. In effect, this rewriting normalizes expressions into a common form, so that the generated code can be shared while the behavior of each instance is customized during runtime initialization. For instance, the following example results in only two generated classes:

stream<rstring name> S1 = FileSource() { param file: "a.txt"; }
stream<rstring name> S2 = FileSource() { param file: "b.txt"; }
stream<int32 value> S3 = FileSource() { param file: "c.txt"; }

In this example, operator instances S1 and S2 share their code and are parameterized differently during runtime initialization. S3, however, has a different output tuple type and does not share its code with S1 or S2. This type of code sharing also applies when the involved expressions contain subexpressions that can be folded into constants and converted into runtime constant literals, such as (x+3)*(y-3) versus (x+pow(2,3))*(y-4). From a code-generation perspective, these expressions are equivalent when they appear in an operator configuration: both generate code like (x+lit$0)*(y-lit$1), with different bindings for the runtime constant literals (lit$0 and lit$1) established during runtime initialization.

The Import and Export operators are special: the sc compiler does not generate any C++ code for them.

SPL functions translate into C++ functions, one-to-one. All overloaded functions with the same name in a namespace are generated into the same file.

After an application is compiled for the first time, subsequent compilations that follow modifications to the SPL source code are incremental. In many cases, the SPL compiler generates C++ code only for the limited subset of SPL entities that changed. In other cases, no C++ code is generated and only the application bundle file is updated. In general, these incremental compilations are fast, whereas the initial compilation of the SPL application from scratch takes more time. To speed up the initial compilation, the SPL compiler provides the -r, --num-distcc-remote-hosts option, which enables distributed compilation across multiple hosts. The -r option specifies the number of remote hosts to use; these hosts are selected from the hosts that have the build tag, as reported by the streamtool lsavailablehosts command.