Types and functions

Two of the most important goals of programming languages are to enable reuse and to improve readability. Languages support these goals by letting you define your own types and functions.

User-defined types and functions foster reuse because they can be defined once and used many times. They foster readability because they give a descriptive name to a concept and unclutter the code that uses it.

The WordCount example program shows how to develop a simple streaming application that counts the lines and words in a file. The program consists of the following stream graph:

Figure 1. Stream graph of the WordCount program.

The FileSource operator invocation reads a file and sends its lines on the Data stream. The Functor operator invocation counts the lines and words for each individual line of Data and sends the statistics on the OneLine stream. Unlike the invocation of the Functor operator in Stream processing, this invocation of the Functor is stateless; it has no side effects and no dependencies between tuples. Finally, the Counter operator invocation aggregates the statistics for all lines in the file and prints them at the end.

Before you look at the main composite operator, define some helpers. Use a type LineStat for the statistics about a line; a function countWords(rstring line) to count the words in a line; and a function addM(mutable LineStat x, LineStat y) to add two LineStat values and store the result in x. Here is the definition of these helpers:
type LineStat = tuple<int32 lines, int32 words>;
int32 countWords(rstring line) {
  return size(tokenize(line, " \t", false));
}
void addM(mutable LineStat x, LineStat y) {
  x.lines += y.lines;
  x.words += y.words;
}

You can put this code in a file called WordCount/Helpers.spl. Line 1 defines type LineStat to be a tuple with two attributes for counting lines and words. Lines 2-4 define function countWords by using the SPL standard toolkit function tokenize to split the line on spaces and tabs (" \t"), and then using the SPL standard toolkit function size to count the resulting fragments. Lines 5-8 define function addM. SPL variables are immutable by default, so you need to explicitly declare parameter x as mutable to enable the function to add values to its attributes. Having the mutable modifier in the signature of the function makes it clear to the user what kind of side-effects the function might have, and the compiler can also use this information for optimization.
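
As a quick check of these helpers, the following sketch exercises them outside the stream graph. The composite name HelperDemo, the operator alias Demo, and the sample line are hypothetical and not part of the original example. The sketch assumes that it is compiled in the same toolkit as WordCount/Helpers.spl, so that LineStat, countWords, and addM are visible.

```spl
// Hypothetical demo composite; assumes LineStat, countWords, and addM
// from WordCount/Helpers.spl are visible in the same toolkit.
composite HelperDemo {
  graph
    () as Demo = Custom() {
      logic onProcess : {
        // The accumulator must be mutable so that addM can update it.
        mutable LineStat total = { lines = 0, words = 0 };
        // Statistics for one sample line; note the doubled space.
        LineStat one = { lines = 1, words = countWords("to be  or not") };
        addM(total, one);
        println(total);
      }
    }
}
```

Because the third argument of tokenize is false, the doubled space in the sample line should not produce an empty token, so the printed total should report one line and four words.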

Now define the main composite operator. You can put the following code in a file called WordCount/WordCount.spl.
composite WordCount {
  graph
    stream<rstring line> Data = FileSource() {
      param  file             : getSubmissionTimeValue("file");
             format           : line;
    }
    stream<LineStat> OneLine = Functor(Data) {
      output OneLine          : lines = 1, words = countWords(line);
    }
    () as Counter = Custom(OneLine) {
      logic  state            : mutable LineStat sum = { lines = 0, words = 0 };
             onTuple OneLine  : addM(sum, OneLine);
             onPunct OneLine  : if (currentPunct() == Sys.FinalMarker)
                                  println(sum);
    }
}

By this point in the tutorial, you should be able to read and understand much of this code. Note how type LineStat is used both in Line 7 as a schema for stream OneLine, and in Line 11 as a type for variable sum. Line 12 adds the statistics from the newest tuple in stream OneLine into the accumulator variable sum by using the helper function addM defined before.

Lines 13-14 illustrate punctuation-handling, which is a new feature not yet described. A punctuation is a control signal that appears interleaved with the tuples on a stream. The logic onPunct OneLine clause gets triggered each time that a punctuation marker arrives on stream OneLine. If the punctuation is Sys.FinalMarker, that indicates that the end of the stream is reached. In this example, the FileSource operator sends a FinalMarker at the end of the file, and the Functor operator forwards it after it sends statistics for the last line.
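
To watch punctuation arrive, a minimal sketch such as the following can help. It reuses the FileSource invocation from the example and attaches a Custom operator that reports each punctuation it sees. The names PunctDemo and PunctLogger and the printed messages are hypothetical; the sketch assumes that window markers, the other kind of SPL punctuation, can also appear on the stream.

```spl
// Hypothetical demo composite; not part of the WordCount example.
composite PunctDemo {
  graph
    stream<rstring line> Data = FileSource() {
      param  file             : getSubmissionTimeValue("file");
             format           : line;
    }
    () as PunctLogger = Custom(Data) {
      // Runs once for every punctuation that arrives on Data.
      logic  onPunct Data     : {
        if (currentPunct() == Sys.WindowMarker)
          println("window marker");
        else if (currentPunct() == Sys.FinalMarker)
          println("final marker");
      }
    }
}
```

The final marker is the last thing delivered on a stream, which is why the WordCount program can safely print its totals when it sees one.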

Compile and run the program as a stand-alone application, as you learned in the previous sections. You need to provide an input file in the data directory, and provide the file name as a submission-time value on the command line of the stand-alone application. The program prints the total statistics to the console.

When you learn a new programming language and start writing programs in it, you are bound to encounter error messages. Some messages can be baffling because you thought your program was fine, yet the compiler objected to something in it. Therefore, a good exercise when you learn a language is to make some intentional errors and familiarize yourself with the error messages. That way, when you see the same errors again "by accident", you are already somewhat familiar with them.

So inject an error into the example program. Go to file WordCount/Helpers.spl, and remove the mutable modifier from the signature of function addM. In other words, Line 5 now reads void addM(LineStat x, LineStat y). Recompile by running sc --data-directory data -M WordCount. You get something like the following messages:

Helpers.spl:6:11: CDISP0378E ERROR: The operand modified by '+=' must be mutable.
Helpers.spl:7:11: CDISP0378E ERROR: The operand modified by '+=' must be mutable.

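The compiler complains because the += operator tries to modify the attributes of parameter x, which is no longer declared as mutable. The two messages correspond to the two += statements in the body of addM. To fix the error, restore the mutable modifier in the signature.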
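
Another way to satisfy the compiler is to avoid mutation altogether. The following sketch shows a hypothetical function add, which is not part of the original example, that returns a new LineStat value instead of modifying its first parameter. It assumes that it is placed in WordCount/Helpers.spl next to the definition of LineStat.

```spl
// Hypothetical alternative to addM: neither parameter is modified,
// so no mutable modifier is needed.
LineStat add(LineStat x, LineStat y) {
  return { lines = x.lines + y.lines, words = x.words + y.words };
}
```

With this variant, the onTuple clause of the Counter operator would read sum = add(sum, OneLine); instead of addM(sum, OneLine);. The variable sum itself must still be declared as mutable because the assignment changes it.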

In this section, you saw how to define your own types and functions, which enables reuse and improves readability. For more information about defining your own types and functions, see Compiler reference. Types and functions form a sublanguage that you can easily learn without any other materials as prerequisites. On the contrary, they serve as the foundation for more advanced language features, like the ones that are covered in the remaining sections of this tutorial.