Types and functions

Try out user-defined SPL types and functions to improve code reuse and readability.

In this tutorial, you will:

  1. Write an SPL file defining a custom type and two functions
  2. Write a stream application that:
    1. Reads the contents of a file specified at run time
    2. Counts the number of lines and words using the custom type and functions
  3. Compile the application using the Streams Compiler (sc)
  4. Run the application

Before you begin

Procedure

  1. Set up your environment for Streams.

    In your command terminal, source the streamsprofile.sh file in the bin directory of your Streams installation. For example:

    source streams-install-directory/bin/streamsprofile.sh
  2. Create a directory called wordcount and change into it.
    For example, use the following commands:
    mkdir wordcount
    cd wordcount
  3. Using your text editor, create a file named Helpers.spl in the wordcount directory.
  4. Paste the following code into your file and save it:
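        // Line and word counts, used for a single line and for running totals.
        type LineStat = tuple<int32 lines, int32 words>;

        // Returns the number of space- or tab-separated tokens in a line.
        int32 countWords(rstring line) {
          return size(tokenize(line, " \t", false));
        }

        // Adds the counts in y to x; x is mutable so the caller sees the update.
        void addM(mutable LineStat x, LineStat y) {
          x.lines += y.lines;
          x.words += y.words;
        }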

    This code does the following:

    1. Defines a type LineStat to be a tuple with two attributes for counting lines and words.
    2. Defines a function countWords that takes a string as input and returns a 32-bit integer: the number of tokens found after splitting the input on space and tab characters. The third argument to tokenize, false, discards empty tokens, so runs of consecutive delimiters do not inflate the count.

      The size and tokenize functions are provided by the SPL standard toolkit.

    3. Defines a function addM that takes two LineStat arguments, adds their attributes together, and stores the result into the first LineStat argument.
      Note: SPL variables and function parameters are immutable by default. Declaring a parameter with the mutable keyword allows the function to modify it, and makes explicit in the function signature which arguments a call can change.
  5. Using your text editor, create a file named WordCount.spl in the wordcount directory.
  6. Paste the following code into your file and save it:
    composite WordCount {
      graph
        stream<rstring line> Data = FileSource() {
          param  format: line;
                 file:   getSubmissionTimeValue("file");
                 
        }
        stream<LineStat> OneLine = Functor(Data) {
          output 
            OneLine: lines = 1, words = countWords(line);
        }
        () as Counter = Custom(OneLine) {
          logic
            state:           mutable LineStat sum = { lines = 0, words = 0 };
            onTuple OneLine: addM(sum, OneLine);
            onPunct OneLine:
              if (currentPunct() == Sys.FinalMarker) {
                println(sum);
              }
        }
    }

    This SPL code does the following:

    1. Declares a main composite operator called WordCount.
    2. Starts a data flow graph with the graph clause.
    3. Invokes a FileSource operator to produce a stream called Data whose tuples have one attribute, line, of type rstring. The invocation is configured to:
      1. Read one line at a time (format: line)
      2. Read from a file whose path is supplied at submission time through a parameter named file (file: getSubmissionTimeValue("file"))
    4. Invokes a Functor operator which reads from stream Data and produces a stream called OneLine. The OneLine stream has LineStat as its schema (the type you defined in Helpers.spl with a lines and a words attribute). For each input tuple, the invocation outputs a OneLine tuple with lines set to 1 (because each tuple from Data is already a single line) and words set to the result of the countWords function.
    5. Invokes a Custom operator, named Counter, which reads from stream OneLine. The invocation produces no output streams, as indicated by the (). Counter is configured to:
      1. Maintain state in a mutable LineStat variable, sum, with both attributes initialized to 0.
      2. Call the addM function with sum and the tuple from OneLine each time a tuple arrives.
      3. Check each punctuation to see if it is a final marker (end of stream); if it is, print the value of sum.
  7. Compile the code.

    In the wordcount directory, run the following command:

    sc -M WordCount

    The command instructs the SPL compiler to create a stream application from the WordCount main composite operator.

  8. Run the application, specifying a file to count the lines and words:
    ./output/bin/standalone file="$PWD/WordCount.spl"

    The program prints the total number of lines and words in the specified file, which in this example is the WordCount.spl source file itself.

    Note: Because no data directory was specified, you must provide an absolute path to the input file.
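
As an optional sanity check, you can compare the printed totals with the standard wc utility. The following sketch is an illustration under stated assumptions, not part of the Streams tooling: it builds a small hypothetical file, sample.txt, and counts its lines and words with wc. Like countWords, wc -w treats runs of spaces and tabs as a single separator, so the word counts should agree for ordinary text. wc -l counts newline characters, so the line counts can differ by one if the last line of the file has no trailing newline.

```shell
# Build a small hypothetical input file (the name sample.txt is made up for
# this check). The second line mixes repeated spaces and a tab on purpose.
printf 'hello streams world\none  two\tthree\n' > sample.txt

# wc -l counts newline characters; wc -w counts whitespace-separated words.
# tr -d ' ' strips the padding that some wc implementations add.
lines=$(wc -l < sample.txt | tr -d ' ')
words=$(wc -w < sample.txt | tr -d ' ')

echo "lines=$lines words=$words"
```

To apply the same check to the tutorial run, point wc at the file you passed as the file parameter, or run the application with file="$PWD/sample.txt" and compare the totals.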

What to do next

Continue to the Composite operator tutorial to learn how to create composite operators to further improve code reuse and readability.