Reading data from a source file
Suppose that a file named StockTrades.csv contains the following sample of stock transaction data for one day.
"BR","27-DEC-2024","14:30:04.894",86.25
"WHR","27-DEC-2024","14:30:06.400",83.84
"IBM","27-DEC-2024","14:30:07.521",83.48
"OMC","27-DEC-2024","14:30:11.905",86.68
"BTU","27-DEC-2024","14:30:12.523",82.5
"IBM","27-DEC-2024","14:30:13.518",83.45
"BDK","27-DEC-2024","14:30:13.519",88.02
"AMG","27-DEC-2024","14:30:21.779",80.88
"IBM","27-DEC-2024","14:30:24.118",83.44
"BR","27-DEC-2024","14:30:24.120",86.23
"OMC","27-DEC-2024","14:30:24.397",86.68
"UNP","27-DEC-2024","14:31:38.856",80.01
"IBM","27-DEC-2024","14:31:40.564",83.57
"POT","27-DEC-2024","14:31:43.965",80.44
"BR","27-DEC-2024","14:31:48.688",86.22
"IBM","27-DEC-2024","14:31:49.208",83.59
"BR","27-DEC-2024","14:31:49.213",86.22
"ATW","27-DEC-2024","14:31:49.703",80.98
"IBM","27-DEC-2024","14:31:50.221",83.6
"BR","27-DEC-2024","14:31:50.224",86.23
"WHR","27-DEC-2024","14:31:54.281",83.85
"ATW","27-DEC-2024","14:31:57.739",80.92
You can use the FileSource operator to read data from a file and generate a tuple for each record in that data. In this example, the FileSource operator has no input stream; its output stream carries all the transactions.

In SPL terminology, a tuple is a unit of data for an operator. In this example, each transaction record is a tuple. A stream is a sequence of tuples. In this example, "All transactions" is an output stream. An operator transforms an input stream into an output stream. In this example, the operator reads the data from the file and submits a tuple to the output stream for each stock transaction.
Suppose that each transaction record (that is, a tuple) contains the following four fields. Each field in a record corresponds to an attribute in a tuple.
Field | Type | Name | Description |
---|---|---|---|
1 | string | ticker | ticker name |
2 | string | date | transaction date |
3 | string | time | transaction time |
4 | decimal | price | trading price |
In SPL code, the transaction record can be represented by the following TransactionRecord type:
type
    TransactionRecord = rstring ticker,
                        rstring date,
                        rstring time,
                        decimal64 price;
Here, rstring is a sequence of raw bytes that supports string processing when the character encoding is known, and decimal64 is an IEEE 754 64-bit decimal floating-point number.
The following SPL code invokes the FileSource operator. Because the FileSource operator does not ingest an input stream, the parentheses that follow the operator name are left empty.
stream<TransactionRecord> AllTransactions = FileSource() {
    param
        file : "StockTrades.csv";
        format : csv;
}
In general, the FileSource operator reads data from a file that is specified by the file parameter and submits the data to the output stream as individual tuples. The format of the file is specified by the format parameter. For the csv format, the FileSource operator expects the file to contain a series of lines where each line is a list of comma-separated values.
In this example, the FileSource operator performs the following steps:
1. Reads a line of data from the input file (StockTrades.csv).
2. Converts each comma-separated value to the corresponding attribute in the tuple.
3. Submits the line of data as a tuple to the output stream (AllTransactions).
4. Repeats Steps 1 to 3 until all the lines of data are read from the input file.
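To watch these steps happen, the type definition and the FileSource invocation can be placed in a complete application. The following is a minimal sketch, not the tutorial's own listing: the composite name ReadStockTrades and the PrintTransactions operator are assumptions added for illustration. The Custom operator prints each tuple as it arrives, so one line of output is expected for each line of StockTrades.csv.

```spl
// Hypothetical main composite; the name is an assumption.
composite ReadStockTrades {
    type
        TransactionRecord = rstring ticker,
                            rstring date,
                            rstring time,
                            decimal64 price;
    graph
        // Source operator: no input stream, so the parentheses are empty.
        stream<TransactionRecord> AllTransactions = FileSource() {
            param
                file   : "StockTrades.csv";
                format : csv;
        }

        // Assumed addition for inspection: print every tuple that
        // FileSource submits to the AllTransactions stream.
        () as PrintTransactions = Custom(AllTransactions) {
            logic
                onTuple AllTransactions : {
                    println(AllTransactions);
                }
        }
}
```

A Custom operator is used here only because it needs no extra parameters; a sink operator such as FileSink could be substituted to write the tuples to a file instead.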
The FileSource operator requires that the type of the data from the input file is the same as (or can be converted to) the type of the output stream. The type of the output stream is specified by the tupleType in the "stream<tupleType> OutputStream = FileSource()" declaration. In this example, the type of the output stream is TransactionRecord.
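As an illustration of the stream<tupleType> form, the sketch below writes the tuple type inline instead of referring to a named type. It also declares price as float64 rather than decimal64. That substitution is an assumption made for this sketch and is not part of the example above; it relies on values such as 86.25 being convertible to either numeric type. The composite name is hypothetical.

```spl
// Hypothetical composite; shows an inline tuple type in the stream declaration.
composite ReadStockTradesInline {
    graph
        // The attribute order must still match the field order in the file:
        // ticker, date, time, price.
        stream<rstring ticker, rstring date, rstring time, float64 price>
            AllTransactions = FileSource() {
            param
                file   : "StockTrades.csv";
                format : csv;
        }
}
```

A named type such as TransactionRecord is usually easier to reuse when several streams share the same schema, which is presumably why the example defines one.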