Integrating a new parser

For input formats that cannot be parsed with a built-in parser operator, you must implement a new parser operator (in SPL, Java, or C++) and integrate it into the ITE application. The composite operator that integrates such a parser operator into the ITE application is called a parser composite operator.

Procedure

Perform the following steps for each new input format.

  1. If the new parser operator is in another toolkit, add the toolkit dependency to the ITE application, for example, by editing the info.xml file. Also, add the toolkit location to the STREAMS_SPLPATH environment variable or to the toolkitsList.xml file in the project folder. If the toolkitsList.xml file does not exist, create it. For more information about the file format, see Developing > Compiling streams applications > Compiling SPL applications > Working with toolkits paths.

  2. Rename the <namespace>.chainprocessor.reader.custom::CustomReaderTemplate composite operator. If you want to integrate several new parsers, create a renamed copy of it for each parser instead.

  3. Open the copied or renamed parser composite operator with the SPL editor.

  4. Modify the composite operator. The modifications depend on the programming language that your parser operator is implemented in and on the features of your parser operator. For example, you can edit the code section between the "parser code begin" and "parser code end" markers, replace the enclosing Custom operator, or modify the whole composite operator.

    The parser composite operator must fulfill the following requirements:

    • The parser composite operator has one input port, which receives the file information.

    • The parser composite operator has two output ports. The first output port emits the parsed or invalid tuples, and the second output port emits metrics information at the end of each file. The file information that is received on the input port must be forwarded to the output ports.

    • The parser composite operator must support the groupId and chainId parameters that are typically used for traces or other debug outputs. For example, you might want to add these IDs to the names of debug output files to distinguish between the composite operator instances.

  5. Save the parser composite operator.
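The following SPL sketch illustrates the interface requirements listed in step 4: one input port for the file information, two output ports (parsed or invalid tuples, then per-file metrics), and the groupId and chainId parameters. It is not the code of CustomReaderTemplate. The namespace prefix my.ite, the composite name MyFormatReader, the types ParsedT and MetricsT, and the filename attribute of the incoming file information are all hypothetical; replace them with the names and types that your ITE application uses.

```spl
// Hypothetical namespace; the real one is <namespace>.chainprocessor.reader.custom.
namespace my.ite.chainprocessor.reader.custom;

// One input port (file information) and two output ports
// (parsed or invalid tuples; metrics at the end of each file).
public composite MyFormatReader(input FileIn; output ParsedOut, MetricsOut)
{
    param
        // Required parameters, typically used for traces or debug output names.
        expression<rstring> $groupId;
        expression<rstring> $chainId;

    type
        // Hypothetical schemas; both carry the forwarded file name.
        ParsedT  = rstring record, boolean valid, rstring filename;
        MetricsT = rstring filename, uint64 validCount, uint64 invalidCount;

    graph
        (stream<ParsedT> ParsedOut; stream<MetricsT> MetricsOut) = Custom(FileIn)
        {
            logic
                onTuple FileIn:
                {
                    mutable uint64 validCount = 0ul;
                    mutable uint64 invalidCount = 0ul;

                    // Include the IDs in traces to distinguish operator instances.
                    // Assumes FileIn has an rstring attribute named filename.
                    appTrc(Trace.debug, "parsing " + FileIn.filename
                        + " (group " + $groupId + ", chain " + $chainId + ")");

                    // Placeholder: parse the file here, submit one ParsedT tuple
                    // per parsed or invalid record to ParsedOut, and update the counters.

                    // At the end of the file, emit metrics with the file information.
                    submit({filename = FileIn.filename, validCount = validCount,
                            invalidCount = invalidCount}, MetricsOut);
                }
        }
}
```

Keeping the file name in both output schemas is one way to satisfy the forwarding requirement; the built-in parser composite operators may forward the file information differently.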

Hint: For more details on integrating a new parser, such as how to handle invalid tuples that your parser emits on a separate output port, inspect the code of the following built-in parser composite operators:

  • <namespace>.chainprocessor.reader.FileReaderASN1
  • <namespace>.chainprocessor.reader.FileReaderCSV
  • <namespace>.chainprocessor.reader.FileReaderStructure
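If your parser operator emits invalid tuples on a separate output port, one possible way to meet the requirement that the first output port carries both parsed and invalid tuples is to connect both streams to the same input port of a Functor operator, which SPL allows when the streams share a schema. This is a sketch under stated assumptions, not necessarily how the built-in operators solve it. A Custom operator stands in for your parser operator, and the names MergingReader, Valid, Invalid, ParsedT, MetricsT, and filename are hypothetical.

```spl
namespace my.ite.chainprocessor.reader.custom;

public composite MergingReader(input FileIn; output ParsedOut, MetricsOut)
{
    param
        expression<rstring> $groupId;
        expression<rstring> $chainId;

    type
        // Hypothetical schemas; use the types of your ITE application.
        ParsedT  = rstring record, boolean valid, rstring filename;
        MetricsT = rstring filename, uint64 invalidCount;

    graph
        // Stand-in for a parser operator that emits valid tuples on port 0,
        // invalid tuples on port 1, and per-file metrics on port 2.
        (stream<ParsedT> Valid; stream<ParsedT> Invalid;
         stream<MetricsT> MetricsOut) = Custom(FileIn)
        {
            logic
                onTuple FileIn:
                {
                    submit({record = "example", valid = true,
                            filename = FileIn.filename}, Valid);
                    submit({record = "bad-record", valid = false,
                            filename = FileIn.filename}, Invalid);
                    submit({filename = FileIn.filename, invalidCount = 1ul}, MetricsOut);
                }
        }

        // Both streams feed a single input port (same schema), so the
        // composite's first output port carries parsed and invalid tuples.
        stream<ParsedT> ParsedOut = Functor(Valid, Invalid)
        {
        }
}
```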