Runtime errors in primitive operators
Runtime errors in primitive operators can be catastrophic or recoverable. For unrecoverable errors, throw an exception and log the error. Recoverable errors can be reported through logging, metrics, or error ports.
The FileSource operator serves as a case study for how primitive operators can handle errors. Adapter operators that must deal with the world outside of an SPL application tend to have more error conditions because their interactions are less constrained.
Unrecoverable errors
For unrecoverable errors, the only option is to throw an exception from within C++ or Java™. In C++, operator developers can throw an exception that is a subtype of std::exception, and whose what() member function provides a meaningful cause of the error. In Java™, developers can throw any exception in the java.lang.Exception family with a meaningful message. Uncaught exceptions terminate the operator and the Processing Element that contains that operator. Eventually, its unhealthy state is reported to the Teracloud® Streams instance, and can be inspected with streamtool, Streams Console, or through log messages.
In addition to throwing an exception, developers must write a trace entry. Primitive operators in C++ can use the SPLAPPTRC macro, and primitive operators in Java™ can use the java.util.logging.Logger interface.
Create trace messages that are both descriptive of the problem and structured so that they are easy to search for using standard utilities such as grep. Tag related errors with the same aspects to make them easier to find inside trace files.
Examples of unrecoverable errors in the FileSource operator are when a specified file cannot be opened for reading, or the optional parameter that specifies where to move the resulting file is an invalid directory. Both of these errors are fundamental to the purpose of the operator. If these errors occur, there is nothing further for the operator to do. In such circumstances, the only option is to terminate.
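The pattern described above (write a searchable trace entry, then throw an exception with a meaningful message) can be sketched in plain Java. This is a minimal illustration, not the FileSource implementation: the class FileOpenCheck, the method requireReadable, the logger name example.filesource.open, and the FILE_OPEN_FAILED message prefix are all hypothetical names chosen for this example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical stand-in for a source operator; not the actual FileSource code.
class FileOpenCheck {
    // One shared logger name acts like a common aspect: every related
    // message can be found by searching for this name or the message prefix.
    private static final Logger TRACE = Logger.getLogger("example.filesource.open");

    /** Fails fast when the input file cannot be opened for reading. */
    static void requireReadable(Path file) throws IOException {
        if (!Files.isReadable(file)) {
            // A fixed, upper-case prefix keeps the entry easy to grep for.
            String message = "FILE_OPEN_FAILED path=" + file;
            TRACE.log(Level.SEVERE, message);   // write the trace entry first
            throw new IOException(message);     // then signal the unrecoverable error
        }
    }

    public static void main(String[] args) {
        try {
            requireReadable(Paths.get("no-such-input.csv"));
        } catch (IOException e) {
            // An operator would leave the exception uncaught so that the
            // processing element terminates; it is caught here only for the demo.
            System.out.println("terminating: " + e.getMessage());
        }
    }
}
```

Using the same message text for the trace entry and the exception means that whichever one an administrator sees first leads to the other.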
Recoverable errors
There are several common practices for reporting recoverable errors, such as logging them. Logs are useful for reconstructing the events that led to an error. However, logs by themselves do not provide aggregate information about the behavior of an application. Recording recoverable errors as metrics can provide such an overview. Operator developers can define error metrics with the custom metrics API.
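As a rough sketch of pairing a log entry with a counter, the following Java class logs each malformed field and increments a count. It assumes a plain AtomicLong as a stand-in for a custom metric so that the example stays self-contained; a real operator would register the counter through the custom metrics API instead. The names RecoverableErrorCounter, parseOrDefault, and malformedCount are hypothetical.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.logging.Logger;

// Hypothetical sketch: the AtomicLong stands in for a custom metric.
class RecoverableErrorCounter {
    private static final Logger LOG = Logger.getLogger("example.parse");
    private final AtomicLong nMalformed = new AtomicLong();

    /** Parses an integer field; on failure it logs, counts, and returns the fallback. */
    int parseOrDefault(String field, int fallback) {
        try {
            return Integer.parseInt(field.trim());
        } catch (NumberFormatException e) {
            // The log entry records the individual event ...
            LOG.warning("MALFORMED_FIELD value=\"" + field + "\"");
            // ... and the counter provides the aggregate view.
            nMalformed.incrementAndGet();
            return fallback;
        }
    }

    /** Number of malformed fields seen so far. */
    long malformedCount() {
        return nMalformed.get();
    }
}
```

The log answers "what happened to this input?", while the counter answers "how often does this happen?" without anyone having to read the log.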
If other operators in the application must know about errors at run time, operator developers can define optional output ports that receive only bad tuples. These error output ports must be optional so that users are not forced to deal with bad tuples. Such ports contain only data that is related to errors. The error data can be in the form of status messages that indicate the type of error that occurred, or just the tuple that triggered the error.
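The idea of an optional error port can be illustrated with a small router that sends good tuples to a data consumer and bad tuples to an error consumer only when one is connected. This is a simplified model under stated assumptions: tuples are plain strings, a tuple is valid when it has exactly two comma-separated fields, and the names ErrorPortRouter, process, and isValid are hypothetical rather than part of any operator API.

```java
import java.util.function.Consumer;

// Hypothetical model of an operator with a mandatory data port
// and an optional error port.
class ErrorPortRouter {
    private final Consumer<String> dataPort;
    private final Consumer<String> errorPort; // null when the error port is not connected

    ErrorPortRouter(Consumer<String> dataPort, Consumer<String> errorPort) {
        this.dataPort = dataPort;
        this.errorPort = errorPort;
    }

    /** Routes a tuple to the data port, or to the error port if it is malformed. */
    void process(String tuple) {
        if (isValid(tuple)) {
            dataPort.accept(tuple);
        } else if (errorPort != null) {
            // The error port carries only error-related data: here, a status
            // prefix plus the tuple that triggered the error.
            errorPort.accept("MALFORMED: " + tuple);
        }
        // With no error port connected, the bad tuple is simply dropped.
    }

    /** Assumed validity rule for this sketch: exactly two comma-separated fields. */
    static boolean isValid(String tuple) {
        return tuple.split(",", -1).length == 2;
    }
}
```

Because the error consumer may be absent, users who do not care about bad tuples never have to handle them.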
Depending on the parsing mode, the FileSource operator handles recoverable errors differently. First, whether the FileSource operator considers parsing errors catastrophic or recoverable is configurable. When parsing is set to strict, any parsing error is considered an unrecoverable error. While it is possible to recover from parsing errors, this practice demonstrates using operator parameters to tailor runtime error handling. If there are multiple valid ways to handle an error, then the best practice is to allow users of the operator to specify the behavior they want through an operator parameter. The default behavior must be the most conservative option.
When the FileSource operator is in permissive mode, it tries to recover from errors in its input by ignoring the current potential tuple and moving to the next. It logs such occurrences and updates its metrics accordingly.
The final parsing behavior in the FileSource operator is fast, which assumes that all input tuples are formatted correctly. This mode performs no error checking. Therefore, the parsing is as fast as possible, but as a consequence, errors in the input result in undefined behavior. The fast mode actually demonstrates the lack of error handling. When assurances are provided on the format of the input, and speed is a primary concern, it is reasonable to have no error handling strategy. However, the FileSource operator clearly advertises that this mode is dangerous, and it is not the default. By default, operators try to recover from errors. Providing optimized modes that do not recognize errors is a valid strategy, but like the FileSource operator, they must be clearly marked as being dangerous. Such modes must be explicitly requested through operator parameters.
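The three parsing behaviors can be modeled with a parameter that selects the error-handling strategy. The sketch below is a simplified analogy, not the FileSource parser: it assumes each input line holds a single non-negative integer, and the names ParsingModes, Mode, DEFAULT_MODE, parse, and uncheckedParse are hypothetical. In Java, the closest equivalent to undefined behavior is silently wrong output, which is what the unchecked path produces for malformed input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;

// Hypothetical model of parameter-driven error handling.
class ParsingModes {
    enum Mode { STRICT, PERMISSIVE, FAST }

    // The most conservative option is the default; FAST must be requested explicitly.
    static final Mode DEFAULT_MODE = Mode.STRICT;

    private static final Logger LOG = Logger.getLogger("example.parse.mode");

    static List<Integer> parse(List<String> lines, Mode mode) {
        List<Integer> out = new ArrayList<>();
        for (String line : lines) {
            if (mode == Mode.FAST) {
                // DANGEROUS: no validation at all; malformed input yields garbage.
                out.add(uncheckedParse(line));
                continue;
            }
            try {
                out.add(Integer.parseInt(line.trim()));
            } catch (NumberFormatException e) {
                if (mode == Mode.STRICT) {
                    // Strict: every parsing error is treated as unrecoverable.
                    throw new IllegalArgumentException("PARSE_ERROR line=\"" + line + "\"", e);
                }
                // Permissive: log the bad line, skip it, and move to the next one.
                LOG.warning("PARSE_SKIPPED line=\"" + line + "\"");
            }
        }
        return out;
    }

    /** Accumulates digits without checking that the characters are digits. */
    static int uncheckedParse(String s) {
        int value = 0;
        for (int i = 0; i < s.length(); i++) {
            value = value * 10 + (s.charAt(i) - '0');
        }
        return value;
    }
}
```

Callers that do not pass a mode would use DEFAULT_MODE, so the unchecked path is reachable only through an explicit request.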