Code generation features
The SPL compiler is an incremental compiler, which generates C++ code.
The SPL compiler generates C++ code for the following entities of an SPL program, as they appear in the fully expanded main composite:
- Tuple and enum types
- Primitive operator instances (after the composite expansion is done)
- SPL functions
It also generates code for partitions (that map to PEs) and the stand-alone application.
type
Person = tuple<rstring name, uint32 age>;
Student = tuple<rstring name, uint32 age>;
Employee = tuple<rstring name, uint32 age, uint64 salary>;
In these examples, the first two types (Person and Student) share their generated C++ class code, and the third type (Employee) has different code that is generated for it.
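The sharing rule for the types above can be illustrated with a small sketch. It assumes that two tuple types share a class whenever their attribute types and names match, regardless of the type name used in the SPL source. The names structuralKey and TypeRegistry are hypothetical and are not part of the SPL compiler or its generated code.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// One attribute of an SPL tuple type: (SPL type name, attribute name).
using Attribute = std::pair<std::string, std::string>;

// Hypothetical helper: builds a key from the attribute list only, so the
// name given to the type in the SPL source does not affect the key.
std::string structuralKey(const std::vector<Attribute>& attrs) {
    std::string key = "tuple<";
    for (std::size_t i = 0; i < attrs.size(); ++i) {
        if (i > 0) key += ",";
        key += attrs[i].first + " " + attrs[i].second;
    }
    return key + ">";
}

// Hypothetical registry: maps each distinct structural key to one class id,
// standing in for "one generated C++ class per unique tuple type".
class TypeRegistry {
public:
    int classIdFor(const std::vector<Attribute>& attrs) {
        // emplace leaves an existing entry untouched, so a repeated
        // structure returns the id that was assigned the first time.
        auto result = ids_.emplace(structuralKey(attrs),
                                   static_cast<int>(ids_.size()));
        return result.first->second;
    }
    std::size_t generatedClassCount() const { return ids_.size(); }

private:
    std::map<std::string, int> ids_;
};
```

Under this assumption, registering the attribute lists of Person, Student, and Employee yields two class ids: one shared by Person and Student, and one for Employee.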
The operator type (for example, spl.relational::Join) and
the operator configuration that is provided in the SPL source (after
the expansion) define the identity of an operator instance from the
code generation perspective. The SPL compiler generates a C++ class
for each operator instance that it considers as unique. Several instances
of an operator type can share their code, depending on their operator
instance configuration. Sharing code reduces compilation time significantly
in certain scenarios. To make it possible, the SPL compiler folds
constants and rewrites them into runtime constant literals. Runtime
constant literals are special identifiers that represent variables
whose values are loaded once at run time and never changed. In effect,
they normalize expressions into a common form, such that the generated
code can be shared, and the behavior can be customized during runtime
initialization. For instance, the following example results in only
two classes generated:
stream<rstring name> S1 = FileSource() { param file: "a.txt"; }
stream<rstring name> S2 = FileSource() { param file: "b.txt"; }
stream<int32 value> S3 = FileSource() { param file: "c.txt"; }
In this example, operator instances S1 and S2 share their code, and are parameterized differently during runtime initialization. S3, however, has a different output tuple type, and does not share its code with S1 or S2. This type of code sharing is also applied when the involved expressions contain subexpressions that can be folded into constants and converted into runtime constant literals, such as (x+3)*(y-3) versus (x+pow(2,3))*(y-4). From a code-generation perspective, these expressions are equivalent when they appear in an operator configuration, and generate code like (x+lit$0)*(y-lit$1), with different bindings for the runtime constant literals (lit$0 and lit$1) established during runtime initialization.
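As a rough illustration of the idea, the following sketch shows one C++ class serving both expressions, with the literal values bound once at construction. It is not the code that the SPL compiler emits: the class name SharedExpression, the factory functions, and the use of a vector to carry the bindings are all assumptions made for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one generated operator class. The two fields
// play the role of lit$0 and lit$1: they are set once in the constructor
// (the "runtime initialization" step) and never modified afterwards.
class SharedExpression {
public:
    explicit SharedExpression(const std::vector<std::int64_t>& literals)
        : lit0_(literals.at(0)), lit1_(literals.at(1)) {}

    // Common form of both source expressions: (x + lit$0) * (y - lit$1).
    std::int64_t evaluate(std::int64_t x, std::int64_t y) const {
        return (x + lit0_) * (y - lit1_);
    }

private:
    const std::int64_t lit0_;
    const std::int64_t lit1_;
};

// Bindings for (x+3)*(y-3).
inline SharedExpression makeFirst() {
    return SharedExpression(std::vector<std::int64_t>{3, 3});
}

// Bindings for (x+pow(2,3))*(y-4); pow(2,3) is assumed to have been
// folded to the constant 8 before the binding is created.
inline SharedExpression makeSecond() {
    return SharedExpression(std::vector<std::int64_t>{8, 4});
}
```

Both instances run the same evaluate code and differ only in the values they were initialized with, which is why a single class can serve both operator configurations.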
The Import and Export operators are special and the sc compiler does not generate any C++ code for them.
SPL functions translate into C++ functions, one-to-one. All overloaded functions with the same name in a namespace are generated into the same file.
After an application is compiled for the first time, subsequent compilations that follow modifications to the SPL source code take place incrementally. In many cases, the SPL compiler generates C++ code only for the limited subset of SPL entities that changed. In other cases, no C++ code is generated and only the application bundle file is updated. In general, these incremental compilations are fast, whereas the initial compilation of the SPL application from scratch takes more time. To speed up the initial compilation, the SPL compiler provides the -r, --num-distcc-remote-hosts option, which enables distributed compilation across multiple hosts. The -r option specifies the number of remote hosts to use; these hosts are selected from the hosts with the build tag that the streamtool lsavailablehosts command reports.