Types and functions
Try out user-defined SPL types and functions to improve code reuse and readability.
In this tutorial, you will:
- Write an SPL file defining a custom type and two functions
- Write a stream application that:
  - Reads the contents of a file specified at run time
  - Counts the number of lines and words using the custom type and functions
- Compile the application using the Streams Compiler (sc)
- Run the application
Before you begin
- Download and install Teracloud® Streams
- Open a Bash command terminal
- Open a text editor
Procedure
- Set up your environment for Streams.
In your command terminal, source the streamsprofile.sh file under the Streams installation directory. For example:
    source streams-install-directory/bin/streamsprofile.sh
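Before continuing, it can help to confirm that sourcing the profile made the compiler available. The following is a minimal sketch, assuming only that sc is resolved through PATH, as the later compile step implies. The helper name require_tool is hypothetical and not part of Streams.

```shell
# Hypothetical helper (not part of Streams): report whether a command is on PATH.
require_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1" >&2
        return 1
    fi
}

# After sourcing streamsprofile.sh, the Streams compiler should resolve.
require_tool sc || echo "sc not found; source streamsprofile.sh and try again"
```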
- Create a directory called wordcount.
For example, use the following commands:
    mkdir wordcount
    cd wordcount
- Using your text editor, create a file named Helpers.spl in the wordcount directory.
- Paste the following code into your file and save it:
    type LineStat = tuple<int32 lines, int32 words>;

    int32 countWords(rstring line) {
        return size(tokenize(line, " \t", false));
    }

    void addM(mutable LineStat x, LineStat y) {
        x.lines += y.lines;
        x.words += y.words;
    }
This code does the following:
- Defines a type LineStat to be a tuple with two attributes for counting lines and words.
- Defines a function countWords that takes a string as input and returns a 32-bit integer: the number of tokens found after splitting the input on space and tab characters. The size and tokenize functions are provided by the SPL standard toolkit.
- Defines a function addM that takes two LineStat arguments, adds their attributes together, and stores the result in the first LineStat argument.
Note: SPL variables are immutable by default. The mutable keyword makes a variable modifiable and explicitly communicates how a function may affect the arguments passed in.
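To get a feel for what countWords returns, the tokenization rule can be approximated outside SPL. This sketch assumes that awk's default field splitting (runs of spaces and tabs, with no empty fields) behaves like tokenize(line, " \t", false). The function name count_words is hypothetical and only mirrors the SPL helper for illustration.

```shell
# Hypothetical shell stand-in for the SPL countWords helper.
# awk's default field separator splits on runs of blanks and drops
# empty fields, which approximates tokenize(line, " \t", false).
count_words() {
    printf '%s\n' "$1" | awk '{ print NF }'
}

count_words "composite WordCount {"                        # prints 3
count_words "$(printf '  two\ttabs\t\tand  spaces  ')"     # prints 4
count_words ""                                             # prints 0
```

Repeated or leading delimiters do not produce extra words, which corresponds to passing false as the last argument of tokenize in Helpers.spl.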
- Using your text editor, create a file named WordCount.spl in the wordcount directory.
- Paste the following code into your file and save it:
    composite WordCount {
        graph
            stream<rstring line> Data = FileSource() {
                param
                    format: line;
                    file: getSubmissionTimeValue("file");
            }

            stream<LineStat> OneLine = Functor(Data) {
                output OneLine:
                    lines = 1,
                    words = countWords(line);
            }

            () as Counter = Custom(OneLine) {
                logic
                    state: mutable LineStat sum = { lines = 0, words = 0 };
                    onTuple OneLine: addM(sum, OneLine);
                    onPunct OneLine:
                        if (currentPunct() == Sys.FinalMarker) {
                            println(sum);
                        }
            }
    }
This SPL code does the following:
- Declares a main composite operator called WordCount.
- Starts a data flow graph with the graph clause.
- Invokes a FileSource operator to produce a stream called Data whose tuples have one attribute, line, of type rstring. The invocation is configured to:
  - Read one line at a time (format: line)
  - Read from a file specified through the file parameter at submission time (file: getSubmissionTimeValue("file"))
- Invokes a Functor operator, which reads from stream Data and produces a stream called OneLine. The OneLine stream has LineStat as its schema (the type you defined with a lines and a words attribute). The invocation is configured to output tuples to stream OneLine with lines set to 1 (because each tuple from Data is already a single line) and words set to the result of the countWords function.
- Invokes a Custom operator, which reads from stream OneLine. The invocation is named Counter and produces no output streams, as shown by the (). Counter is configured to:
  - Maintain state in a LineStat variable, sum, with both attributes initialized to 0.
  - Call the addM function with sum and each tuple that arrives from OneLine.
  - Check each punctuation to see whether it is a final marker (end of stream) and, if it is, print the value of sum.
- Compile the code.
In the wordcount directory, run the following command:
    sc -M WordCount
The command instructs the SPL compiler to create a stream application from the WordCount main composite operator.
- Run the application, specifying a file to count the lines and words:
    ./output/bin/standalone file="$PWD/WordCount.spl"
The program prints the number of lines and words in the source file.
Note: Because a data directory was not specified, you must provide an absolute path to the file to process.
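One way to sanity-check the totals is to compute them independently with standard tools. The sketch below is not part of Streams: line_stat is a hypothetical helper. It assumes that every line of the file, including blank ones, yields one tuple, and that words are separated only by spaces and tabs.

```shell
# Hypothetical cross-check (not part of Streams): count lines and
# space/tab-separated words in a file, mirroring what the application sums.
line_stat() {
    awk '{ words += NF } END { printf "lines=%d words=%d\n", NR, words }' "$1"
}

# Build a small sample file so the sketch runs on its own.
sample="$(mktemp)"
printf 'one two\n\nthree\tfour five\n' > "$sample"
line_stat "$sample"    # prints lines=3 words=5
rm -f "$sample"
```

To compare against the tutorial run, call line_stat "$PWD/WordCount.spl" from the wordcount directory. The output format here is this sketch's own and is not meant to match what println produces character for character; only the two numbers should agree.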