Runtime errors in primitive operators

Runtime errors in primitive operators can be catastrophic or recoverable. For unrecoverable errors, throw an exception and log the error. Recoverable errors can be handled with logging, metrics, or error ports.

The FileSource operator serves as a case study for how primitive operators can handle errors. Adapter operators, which must deal with the world outside of an SPL application, tend to have more error conditions because their interactions are less constrained.

Unrecoverable errors

For unrecoverable errors, the only option is to throw an exception from within C++ or Java. In C++, operator developers can throw an exception whose type derives from std::exception and whose what() member function describes the cause of the error. In Java, developers can throw any exception in the java.lang.Exception family with a meaningful message. An uncaught exception terminates the operator and the Processing Element that contains that operator. The unhealthy state of the Processing Element is eventually reported to the Teracloud® Streams instance, and can be inspected with streamtool, Streams Console, or through log messages.
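As an illustration of the C++ case, the following is a minimal sketch of an exception type that derives from std::exception (through std::runtime_error) and carries a meaningful what() message. The names FileOpenError and openForReading are hypothetical and are not part of the FileSource operator or the Streams API.

```cpp
#include <fstream>
#include <stdexcept>
#include <string>

// Hypothetical exception type. std::runtime_error derives from
// std::exception, so what() returns the message built here.
class FileOpenError : public std::runtime_error {
public:
    explicit FileOpenError(const std::string& path)
        : std::runtime_error("FileSource: cannot open '" + path + "' for reading") {}
};

// Hypothetical helper: returns an open stream, or throws when the file
// cannot be opened. The caller is not expected to recover.
std::ifstream openForReading(const std::string& path) {
    std::ifstream in(path);
    if (!in.is_open()) {
        throw FileOpenError(path);
    }
    return in;
}
```

If nothing in the operator catches FileOpenError, the exception propagates out of the operator, which is the termination behavior described above.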

In addition to throwing an exception, developers must write a trace entry. Primitive operators in C++ can use the SPLAPPTRC macro, and primitive operators in Java can use the java.util.logging.Logger interface. Create trace messages that are both descriptive of the problem, and structured so that they are easy to search for using standard utilities such as grep. Tag related errors with the same aspects to make them easier to find inside trace files.
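One way to keep trace messages searchable is to build them as a single line of key=value pairs. The sketch below assumes a hypothetical helper, formatTraceMessage, and a hypothetical aspect name; neither comes from the product. The comment at the end shows how the result might be passed to SPLAPPTRC inside a real operator, assuming the macro takes a level, a message, and an aspect string.

```cpp
#include <sstream>
#include <string>

// Hypothetical aspect shared by all file-related errors, so that related
// entries can be found together in a trace file.
const char* const kFileErrorAspect = "fileSourceError";

// Hypothetical helper: builds a one-line key=value message that is easy
// to match with tools such as grep.
std::string formatTraceMessage(const std::string& errorCode,
                               const std::string& file,
                               const std::string& detail) {
    std::ostringstream msg;
    msg << "error=" << errorCode
        << " file=" << file
        << " detail=\"" << detail << "\"";
    return msg.str();
}

// Inside a C++ primitive operator, the message would be handed to the
// trace macro (argument order assumed), for example:
//   SPLAPPTRC(L_ERROR, formatTraceMessage(code, path, why), kFileErrorAspect);
```

With this layout, a search such as grep "error=FILE_OPEN_FAILED" finds every occurrence of one error type regardless of the file or detail text.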

Examples of unrecoverable errors in the FileSource operator are when a specified file cannot be opened for reading, or the optional parameter that specifies where to move the resulting file is an invalid directory. Both of these errors are fundamental to the purpose of the operator. If these errors occur, there is nothing further for the operator to do. In such circumstances, the only option is to terminate.

Recoverable errors

There are several common practices for reporting recoverable errors, the simplest of which is logging. Logs are useful for reconstructing the events that led to an error. However, logs by themselves do not provide aggregate information about the behavior of an application. Recording recoverable errors as metrics can provide such an overview. Operator developers can define error metrics with the custom metrics API.
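The idea of counting recoverable errors can be sketched as follows. ErrorCounter is a hypothetical stand-in for a counter created through the custom metrics API; it is not the Streams metric class, and parseOrCount is likewise an illustrative helper.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for a custom counter metric.
class ErrorCounter {
public:
    explicit ErrorCounter(std::string name) : name_(std::move(name)), value_(0) {}
    void increment() { value_.fetch_add(1, std::memory_order_relaxed); }
    std::int64_t value() const { return value_.load(std::memory_order_relaxed); }
    const std::string& name() const { return name_; }
private:
    std::string name_;
    std::atomic<std::int64_t> value_;
};

// Hypothetical helper: parses a whole field as an int. On failure it
// increments the counter, leaves 'out' unchanged, and returns false.
bool parseOrCount(const std::string& field, int& out, ErrorCounter& malformed) {
    try {
        std::size_t used = 0;
        int parsed = std::stoi(field, &used);
        if (used != field.size()) {
            throw std::invalid_argument("trailing characters");
        }
        out = parsed;
        return true;
    } catch (const std::exception&) {
        malformed.increment();  // aggregate view: how many fields were bad
        return false;
    }
}
```

The counter answers "how often does this happen?", which individual log entries cannot answer without post-processing.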

If other operators in the application must know about errors at run time, operator developers can define optional output ports that receive only bad tuples. These error output ports must be optional so that users are not forced to deal with bad tuples. Such ports contain only data that is related to errors. The error data can be in the form of status messages that indicate the type of error that occurred, or just the tuple that triggered the error.
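The routing logic for an optional error port can be sketched like this. The Ports struct, the isValidRecord check, and the route function are hypothetical; ports are modeled as callbacks so that the example stands alone, and an empty errors callback represents an error port that the user did not connect.

```cpp
#include <functional>
#include <string>
#include <vector>

using Port = std::function<void(const std::string&)>;

// Hypothetical model of an operator's output ports.
struct Ports {
    Port data;    // required output port for good tuples
    Port errors;  // optional error port; empty when not connected
};

// Hypothetical validity check: a record must contain a comma.
bool isValidRecord(const std::string& line) {
    return !line.empty() && line.find(',') != std::string::npos;
}

// Sends good records to the data port. Bad records go to the error port
// only when it is connected; otherwise they are dropped.
void route(const std::vector<std::string>& lines, const Ports& ports) {
    for (const auto& line : lines) {
        if (isValidRecord(line)) {
            ports.data(line);
        } else if (ports.errors) {
            ports.errors(line);
        }
    }
}
```

Because the error port is optional, users who do not care about bad tuples can omit it without changing how good tuples flow.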

The FileSource operator handles recoverable errors differently depending on its parsing mode. First, whether the FileSource operator considers parsing errors catastrophic or recoverable is configurable. When parsing is set to strict, any parsing error is treated as an unrecoverable error. Although it is possible to recover from parsing errors, this design demonstrates how operator parameters can tailor runtime error handling. If there are multiple valid ways to handle an error, then the best practice is to allow users of the operator to specify the behavior they want through an operator parameter. The default behavior must be the most conservative option.

When the FileSource operator is in permissive mode, it tries to recover from errors in its input by ignoring the current potential tuple and moving to the next. It logs such occurrences and updates its metrics accordingly.
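The difference between strict and permissive handling can be sketched with a parameter that selects the behavior. ParsingMode, ParseResult, and parseLines are hypothetical names for this illustration, and the "one integer per line" format is an assumption; this is not the FileSource implementation.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

enum class ParsingMode { Strict, Permissive };

struct ParseResult {
    std::vector<int> values;  // successfully parsed tuples
    std::size_t skipped = 0;  // lines ignored in permissive mode
};

// Parses one integer per line. Strict is the default because it is the
// most conservative choice: the first bad line is fatal. Permissive
// skips the bad line, counts it, and moves on to the next one.
ParseResult parseLines(const std::vector<std::string>& lines,
                       ParsingMode mode = ParsingMode::Strict) {
    ParseResult result;
    for (std::size_t i = 0; i < lines.size(); ++i) {
        try {
            std::size_t used = 0;
            int value = std::stoi(lines[i], &used);
            if (used != lines[i].size()) {
                throw std::invalid_argument("trailing characters");
            }
            result.values.push_back(value);
        } catch (const std::logic_error&) {
            // std::stoi reports failures with std::invalid_argument or
            // std::out_of_range, both of which derive from std::logic_error.
            if (mode == ParsingMode::Strict) {
                throw std::runtime_error("parse error at line " + std::to_string(i + 1));
            }
            ++result.skipped;
        }
    }
    return result;
}
```

A caller that wants recovery must ask for it explicitly, for example parseLines(lines, ParsingMode::Permissive); calling parseLines(lines) keeps the conservative strict behavior.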

The final parsing behavior in the FileSource operator is fast, which assumes that all input tuples are formatted correctly. This mode performs no error checking. Therefore, the parsing is as fast as possible, but as a consequence, errors in the input result in undefined behavior. The fast mode demonstrates a deliberate lack of error handling. When assurances are provided on the format of the input, and speed is a primary concern, it is reasonable to have no error handling strategy. However, the FileSource operator clearly advertises that this mode is dangerous, and it is not the default. By default, operators try to recover from errors. Providing optimized modes that do not recognize errors is a valid strategy, but as with the FileSource operator, such modes must be clearly marked as dangerous and must be explicitly requested through operator parameters.