Reading data from a source file

Suppose that the stock transactions for a day are stored in a file named StockTrades.csv, which contains the following sample data.

"BR","27-DEC-2024","14:30:04.894",86.25
"WHR","27-DEC-2024","14:30:06.400",83.84
"IBM","27-DEC-2024","14:30:07.521",83.48
"OMC","27-DEC-2024","14:30:11.905",86.68
"BTU","27-DEC-2024","14:30:12.523",82.5
"IBM","27-DEC-2024","14:30:13.518",83.45
"BDK","27-DEC-2024","14:30:13.519",88.02
"AMG","27-DEC-2024","14:30:21.779",80.88
"IBM","27-DEC-2024","14:30:24.118",83.44
"BR","27-DEC-2024","14:30:24.120",86.23
"OMC","27-DEC-2024","14:30:24.397",86.68
"UNP","27-DEC-2024","14:31:38.856",80.01
"IBM","27-DEC-2024","14:31:40.564",83.57
"POT","27-DEC-2024","14:31:43.965",80.44
"BR","27-DEC-2024","14:31:48.688",86.22
"IBM","27-DEC-2024","14:31:49.208",83.59
"BR","27-DEC-2024","14:31:49.213",86.22
"ATW","27-DEC-2024","14:31:49.703",80.98
"IBM","27-DEC-2024","14:31:50.221",83.6
"BR","27-DEC-2024","14:31:50.224",86.23
"WHR","27-DEC-2024","14:31:54.281",83.85
"ATW","27-DEC-2024","14:31:57.739",80.92

You can use the FileSource operator to read data from a file and generate a tuple for each record. In this example, the FileSource operator has no input stream, and its output stream carries all the transactions.


[Figure: the FileSource operator, with no input stream and one output stream labeled "All transactions".]

In SPL terminology, a tuple is a unit of data for an operator. In this example, each transaction record is a tuple. A stream is a sequence of tuples. In this example, "All transactions" is an output stream. An operator transforms an input stream into an output stream. In this example, the operator reads the data from the file and submits a tuple to the output stream for each stock transaction.

Suppose that each transaction record (that is, a tuple) contains the following four fields. Each field in a record corresponds to an attribute in a tuple.

Table 1. Attributes for a transaction record

Field   Type      Name     Description
1       string    ticker   ticker name
2       string    date     transaction date
3       string    time     transaction time
4       decimal   price    trading price

In SPL code, the transaction record can be represented by the following TransactionRecord type:

   type
      TransactionRecord = rstring ticker,
                          rstring date,
                          rstring time,
                          decimal64 price;

In this type, rstring (used for the string fields in Table 1) is a sequence of raw bytes that supports string processing when the character encoding is known, and decimal64 (used for the decimal field) is the IEEE 754 64-bit decimal floating-point type.

The following SPL code invokes the FileSource operator. Because the FileSource operator does not ingest an input stream, the parentheses that follow the operator name are left empty.

      stream<TransactionRecord> AllTransactions = FileSource() {
         param
            file : "StockTrades.csv";
            format : csv;
      }

In general, the FileSource operator reads data from a file that is specified by the file parameter and submits the data to the output stream as individual tuples. The format of the file is specified by the format parameter. For the csv format, the FileSource operator expects the file to contain a series of lines where each line is a list of comma-separated values.

In this example, the FileSource operator performs the following steps:

  1. Reads a line of data from the input file (StockTrades.csv).
  2. Converts each comma-separated value to the corresponding attribute in the tuple.
  3. Submits the line of data as a tuple to the output stream (AllTransactions).
  4. Repeats Steps 1 to 3 until all the lines of data are read from the input file.
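
One way to observe these steps is to attach a sink that prints each tuple as it arrives. The following is a minimal sketch, not part of the original example: the composite name Main, the sink name Printer, and the use of the standard toolkit Custom operator with println are assumptions made for illustration. A relative file name such as StockTrades.csv is typically resolved against the application's data directory.

```spl
// Sketch only: Main and Printer are hypothetical names chosen for this example.
composite Main {
   type
      TransactionRecord = rstring ticker,
                          rstring date,
                          rstring time,
                          decimal64 price;
   graph
      // Source: no input stream, one output stream of TransactionRecord tuples.
      stream<TransactionRecord> AllTransactions = FileSource() {
         param
            file : "StockTrades.csv";
            format : csv;
      }

      // Sink (assumption): prints every tuple that arrives on AllTransactions.
      () as Printer = Custom(AllTransactions) {
         logic
            onTuple AllTransactions : {
               println(AllTransactions);
            }
      }
}
```

With the sample file shown earlier, the sink should receive 22 tuples, one for each line of input.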

The FileSource operator requires that the type of the data from the input file is the same as (or can be converted to) the type of the output stream. The type of the output stream is specified by the tupleType in the "stream<tupleType> OutputStream = FileSource()" declaration. In this example, the type of the output stream is TransactionRecord.
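
To illustrate the conversion rule, the sketch below declares the price attribute as float64 instead of decimal64. The names FloatPrices, PricedTransaction, and PricedTransactions are hypothetical, and the sketch assumes that a field such as 86.25 can be converted to either numeric type. Binary floating point cannot represent every decimal price exactly, which is one reason the original example uses decimal64.

```spl
// Sketch only: FloatPrices, PricedTransaction, and PricedTransactions
// are hypothetical names chosen for this example.
composite FloatPrices {
   type
      PricedTransaction = rstring ticker,
                          rstring date,
                          rstring time,
                          float64 price;   // binary floating point instead of decimal64
   graph
      // The output stream type now requires each price field in the file
      // to be convertible to float64.
      stream<PricedTransaction> PricedTransactions = FileSource() {
         param
            file : "StockTrades.csv";
            format : csv;
      }
}
```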