Types and functions
One of the most important goals of programming languages is to enable reuse and improve readability. Languages support this goal by providing the capability to define your own types and functions.
User-defined types and functions foster reuse because they can be defined once and used multiple times. User-defined types and functions foster readability because they give a descriptive name to a concept and unclutter the code that uses it.

The WordCount program example illustrates how to develop a simple streaming application that counts the lines and words in a file. The WordCount program consists of the following stream graph:

The FileSource operator invocation reads the input file and sends each line on the Data stream. The Functor operator invocation counts the lines and words for each individual line of Data, sending the statistics on the OneLine stream. Unlike the invocation of the Functor operator in Stream processing, this invocation of the Functor is stateless; it has no side-effects or dependencies between tuples. Finally, the Counter operator invocation aggregates the statistics for all lines in the file, and prints them at the end.

Before you look at the main composite operator,
define some helpers. Use a type LineStat for the statistics about a line; a function countWords(rstring line) to count the words in a line; and a function addM(mutable LineStat x, LineStat y) to add two LineStat values and store the result in x. Here is the definition of these helpers:

    type LineStat = tuple<int32 lines, int32 words>;
    int32 countWords(rstring line) {
      return size(tokenize(line, " \t", false));
    }
    void addM(mutable LineStat x, LineStat y) {
      x.lines += y.lines;
      x.words += y.words;
    }

You can put this code in a file called WordCount/Helpers.spl. Line 1 defines type LineStat to be a tuple with two attributes for counting lines and words. Lines 2-4 define function countWords by using the SPL standard toolkit function tokenize to split the line on spaces and tabs (" \t"), and then using the SPL standard toolkit function size to count the resulting fragments. The third argument to tokenize, false, tells it not to keep empty tokens, so consecutive delimiters do not inflate the word count.

Lines 5-8 define function addM. SPL variables are immutable by default, so you need to explicitly declare parameter x as mutable to enable the function to add values to its attributes. Having the mutable modifier in the signature of the function makes it clear to the user what kind of side-effects the function might have, and the compiler can also use this information for optimization.
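
For contrast, the same addition could also be written without any mutable parameter. The following is only a sketch: the function name add is hypothetical and not part of the WordCount example, and it assumes the LineStat type defined above.

```spl
// Hypothetical helper, not part of the original WordCount example.
// Neither parameter is declared mutable, so the function cannot modify
// its arguments; it returns the sum as a new LineStat value instead.
LineStat add(LineStat x, LineStat y) {
  return { lines = x.lines + y.lines, words = x.words + y.words };
}
```

A caller would then have to assign the result itself, for example total = add(total, next); for some mutable variable total. The addM function used in this section updates its first argument in place instead, which is why that parameter must be declared mutable.
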
The main composite operator is defined as follows:

    composite WordCount {
      graph
        stream<rstring line> Data = FileSource() {
          param file   : getSubmissionTimeValue("file");
                format : line;
        }
        stream<LineStat> OneLine = Functor(Data) {
          output OneLine : lines = 1, words = countWords(line);
        }
        () as Counter = Custom(OneLine) {
          logic state : mutable LineStat sum = { lines = 0, words = 0 };
                onTuple OneLine : addM(sum, OneLine);
                onPunct OneLine : if (currentPunct() == Sys.FinalMarker)
                                    println(sum);
        }
    }

By this point in the tutorial, you should be able to read and understand much of this code. Note how type LineStat is used both in Line 7 as a schema for stream OneLine, and in Line 11 as a type for variable sum. Line 12 adds the statistics from the newest tuple in stream OneLine into the accumulator variable sum by using the helper function addM defined before.

Lines 13-14 illustrate punctuation-handling, which is a new feature not yet described. A punctuation is a control signal that appears interleaved with the tuples on a stream. The logic onPunct OneLine clause gets triggered each time that a punctuation marker arrives on stream OneLine. If the punctuation is Sys.FinalMarker, that indicates that the end of the stream is reached. In this example, the FileSource operator sends a FinalMarker at the end of the file, and the Functor operator forwards it after it sends statistics for the last line.
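
To make the punctuation check more concrete, here is a hedged sketch of a separate Custom invocation that only reports which kind of punctuation arrives on OneLine. The invocation name PunctLogger and the printed messages are hypothetical and not part of the WordCount example. Sys.WindowMarker is the other punctuation kind in SPL, which this section does not otherwise use.

```spl
// Hypothetical sink, not part of the original WordCount program.
// It has no onTuple clause, so tuples are ignored; it only reports
// each punctuation that arrives on stream OneLine.
() as PunctLogger = Custom(OneLine) {
  logic onPunct OneLine : {
          if (currentPunct() == Sys.WindowMarker) {
            println("window marker on OneLine");
          } else if (currentPunct() == Sys.FinalMarker) {
            println("final marker on OneLine: end of stream");
          }
        }
}
```

An invocation like this would go in the graph clause of WordCount, next to Counter. The Counter invocation itself only needs the Sys.FinalMarker branch, because it prints its result once, at the end of the stream.
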
Compile and run the program as a stand-alone application, as you learned in the previous sections. You need to provide an input file in the data directory, and provide the file name as a submission-time value on the command line of the stand-alone application. The program prints the total statistics to the console.
When you learn a new programming language and start writing programs in it, you are bound to encounter error messages. Some messages can be baffling because you thought your program was fine, yet the compiler objected to something in it. Therefore, a good exercise when you learn a language is to make some intentional errors, and familiarize yourself with the error messages. That way, when you see the same errors again "by accident", you are already somewhat familiar with them.

So inject an error into the example program. Go to file WordCount/Helpers.spl, and remove the mutable modifier from the signature of function addM. In other words, Line 5 now reads void addM(LineStat x, LineStat y). Recompile by doing sc --data-directory data -M WordCount. You get something like the following messages:

    Helpers.spl:6:11: CDISP0378E ERROR: The operand modified by '+=' must be mutable.
    Helpers.spl:7:11: CDISP0378E ERROR: The operand modified by '+=' must be mutable.

The compiler complains because the += operator tries to modify the parameter x, but x is no longer declared as mutable.
In this section, you saw how to define your own types and functions, which enables reuse and improves readability. For more information about defining your own types and functions, see Compiler reference. Types and functions form a sublanguage that you can easily learn without any other materials as prerequisites. In fact, they serve as the foundation for more advanced language features like the ones that are covered in the remaining sections of this tutorial.