Filtering large data sets

In this example, the stream application needs to filter the stock transaction data for IBM transaction records. You use the Filter operator to extract relevant information from potentially large volumes of data. As shown, the input for the Filter operator is all the transactions; the output is only the IBM transactions.


Figure: The Filter operator takes all the transactions as input and produces only the IBM transactions as output.

The following SPL code invokes the Filter operator. Read an operator invocation as "the output stream is produced by operating on the input stream." In this case, IBMTransactions is produced by filtering AllTransactions.

      stream<TransactionRecord> IBMTransactions = Filter(AllTransactions) {
         param
            filter : ticker == "IBM";
      }

In general, the Filter operator receives tuples from an input stream and submits a tuple to the output stream only if the tuple satisfies the criteria that are specified by the filter parameter.

In this example, the Filter operator performs the following steps:

  1. Receives a tuple from the input stream (AllTransactions).
  2. Submits the tuple to the output stream (IBMTransactions) if the value of the ticker attribute is "IBM"; otherwise, does not submit it.
  3. Repeats steps 1 and 2 for each tuple that arrives on the input stream.
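
The steps above can be modeled outside of Streams. The following is a minimal sketch in Python, not SPL, and it is not how the Filter operator is implemented. The function name filter_ibm and the price attribute are hypothetical; only the ticker attribute and the stream names come from the example.

```python
from typing import Iterable, Iterator, NamedTuple


class TransactionRecord(NamedTuple):
    # Only `ticker` appears in the SPL example; `price` is a hypothetical attribute.
    ticker: str
    price: float


def filter_ibm(all_transactions: Iterable[TransactionRecord]) -> Iterator[TransactionRecord]:
    """Model of a Filter invocation with `filter : ticker == "IBM"`."""
    for record in all_transactions:   # step 1: receive a tuple from the input stream
        if record.ticker == "IBM":    # step 2: evaluate the filter condition
            yield record              # submit the tuple to the output stream
        # A tuple that fails the condition is not submitted.
        # Step 3: the loop continues with the next tuple.


all_transactions = [
    TransactionRecord("IBM", 101.5),
    TransactionRecord("XYZ", 20.0),
    TransactionRecord("IBM", 102.0),
]
ibm_transactions = list(filter_ibm(all_transactions))
```

In this sketch, ibm_transactions holds the two IBM records in their original order. A generator is used so that each tuple is handled as it arrives, which is closer to stream processing than building the whole result first.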

The Filter operator requires the output stream to have the same type as the input stream. The type of the output stream is specified by the tupleType in the "stream<tupleType> OutputStream = Filter(InputStream)" declaration. In this example, both the input stream and the output stream have the type TransactionRecord.
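
The same-type requirement can be illustrated with a generic function signature. This is a hedged sketch in Python, not SPL: the type variable T stands in for tupleType, and filter_stream is a hypothetical helper, not part of any Streams API. The point is that a filter only selects tuples and never changes their shape, so the output element type is the input element type.

```python
from typing import Callable, Iterable, Iterator, NamedTuple, TypeVar

T = TypeVar("T")  # stands in for tupleType in the stream declaration


def filter_stream(input_stream: Iterable[T], predicate: Callable[[T], bool]) -> Iterator[T]:
    """Pass each tuple through unchanged when the predicate holds.

    Input and output share the element type T, which mirrors the
    requirement that the output stream type equals the input stream type.
    """
    return (tup for tup in input_stream if predicate(tup))


class TransactionRecord(NamedTuple):
    ticker: str  # the only attribute named in the SPL example


all_transactions = [TransactionRecord("IBM"), TransactionRecord("XYZ")]
ibm_transactions = list(filter_stream(all_transactions, lambda t: t.ticker == "IBM"))
```

Every element of ibm_transactions is a TransactionRecord taken from all_transactions without modification. An operation that added or removed attributes would need a different output type, which is outside what a filter does.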