Default values for operator parameters
Operators that accept optional parameters at their invocation site must provide default values or require the parameter to be provided. Choose default values that are safe, that do not adversely affect performance, and that follow the principle of least surprise.
The primary concern for defaults is that they are safe.
In this context, safety means that the operator performs all appropriate
error checking and reports all errors. For example, the FileSource operator
in the SPL Standard Toolkit has a parsing parameter.
The default mode is strict, which performs all error
checking on input files and generates a runtime exception if it encounters
an error. The strict mode is the safest because any error in the input
terminates the application, so the error cannot go unnoticed.
Developers must explicitly request the less safe options that either
try to recover from the error (permissive mode) or
avoid error checking completely (fast mode). Typically,
operators follow this pattern: default values are safe, and unsafe
options must be explicitly requested.
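The safe-by-default pattern can be sketched outside of SPL. The following Python model is only an illustration: read_ints and ParseError are hypothetical names, not part of any Streams API, and the fast mode is left out because it has no natural equivalent here. The default mode stops at the first malformed line, and the lenient mode must be requested by name.

```python
class ParseError(Exception):
    """Raised in strict mode when an input line is malformed."""


def read_ints(lines, parsing="strict"):
    """Parse one integer per line (hypothetical helper, not a Streams API).

    parsing="strict" (the default) stops at the first malformed line.
    parsing="permissive" must be requested explicitly and skips bad lines.
    """
    if parsing not in ("strict", "permissive"):
        raise ValueError(f"unknown parsing mode: {parsing!r}")
    values = []
    for number, line in enumerate(lines, start=1):
        try:
            values.append(int(line))
        except ValueError:
            if parsing == "strict":
                raise ParseError(f"line {number}: not an integer: {line!r}") from None
            # permissive: drop the malformed line and keep going
    return values


print(read_ints(["1", "2", "3"]))                            # all lines valid
print(read_ints(["1", "oops", "3"], parsing="permissive"))   # bad line skipped
```

Calling read_ints(["1", "oops", "3"]) without the parsing argument raises ParseError, which is the behavior a caller gets without having asked for anything.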
The only exception to this rule is when the safest option has unacceptable
performance. For example, by default the FileSink operator
in the SPL Standard Toolkit does not explicitly flush its output.
Instead, it relies on buffered I/O as provided by C++. Flushing each
tuple to disk as the operator receives it is the safest behavior,
because it avoids losing data if the operator crashes. However, the FileSink operator
would then run at the speed of the disk, which is unacceptable performance.
If developers want FileSink to explicitly flush
tuples, they must request it with the flush parameter.
Do not use the safest default value if it results in unacceptable
performance. However, when in doubt, err on the side of caution.
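The performance exception can be modeled in the same way. In this Python sketch, write_tuples is a hypothetical function, and treating flush as "flush after every n tuples, with 0 meaning never" is an assumption made for the illustration. The default leaves flushing to the underlying buffer, and the slower but safer behavior is opt-in.

```python
import io


def write_tuples(out, tuples, flush=0):
    """Write tuples as comma-separated lines to the file-like object `out`.

    flush=0 (the default) never flushes explicitly and relies on the
    buffering of `out`. flush=n flushes after every n tuples.
    Returns the number of explicit flushes performed.
    """
    if flush < 0:
        raise ValueError("flush must be >= 0")
    flushes = 0
    for count, tup in enumerate(tuples, start=1):
        out.write(",".join(str(field) for field in tup) + "\n")
        if flush and count % flush == 0:
            out.flush()
            flushes += 1
    return flushes


buffer = io.StringIO()
print(write_tuples(buffer, [(1, "a"), (2, "b")]))            # default: no explicit flush
print(write_tuples(buffer, [(3, "c"), (4, "d")], flush=1))   # opt-in: flush per tuple
```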
When safety and performance concerns do not affect the feature,
then follow the principle of least surprise for default parameters.
For example, the filter parameter in the Functor operator
from the SPL Standard Toolkit is optional; it defaults to the value true.
This value is the least surprising because many invocations of the Functor operator
do not need a filter expression. If the default were false,
then for each of those invocations, the programmer would have to explicitly
set the parameter to true to ensure that no tuples are filtered out.
Another example is the order parameter for the Sort operator
in the SPL Standard Toolkit. The default value is ascending,
which sorts items in increasing order. When people want to sort a
sequence, they typically want it sorted from least to greatest. Defaulting
to sorting in descending order (from greatest to least) is more likely
to surprise users. When you choose among possible default values,
choose the value that results in the least surprising behavior.
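Both least-surprise examples can be mirrored in a short Python sketch. The functions functor and sort_tuples are hypothetical stand-ins for the operators, written only to show how the defaults behave: omitting the filter keeps every tuple, and omitting the order sorts from least to greatest.

```python
def functor(tuples, filter_expr=lambda tup: True):
    """Pass tuples through, keeping those for which filter_expr is true.

    The default filter accepts everything, so callers that do not care
    about filtering get all of their tuples back.
    """
    return [tup for tup in tuples if filter_expr(tup)]


def sort_tuples(tuples, key, order="ascending"):
    """Sort tuples by key; ascending is the default, descending is opt-in."""
    if order not in ("ascending", "descending"):
        raise ValueError(f"unknown order: {order!r}")
    return sorted(tuples, key=key, reverse=(order == "descending"))


people = [("ann", 31), ("bob", 17), ("cyd", 24)]
age = lambda person: person[1]

print(functor(people))                                  # nothing is dropped
print(functor(people, filter_expr=lambda p: p[1] >= 18))
print(sort_tuples(people, key=age))                     # youngest first
print(sort_tuples(people, key=age, order="descending"))
```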
Related to the principle of least surprise is choosing common formats.
If the parameter specifies a particular format, and one format is
more common than the others, make that format the default. An example
of this practice is the format parameter on both
the FileSource and FileSink operators
in the SPL Standard Toolkit. Their default file format is comma-separated
values (csv). The csv format is the most common way of representing
structured data in text form. Also, note that in this instance, it
would not make sense for the FileSource and FileSink operators
to use different default values. If an operator is in a family of
operators, or if it has a logical counterpart (such as sources and
sinks), make the defaults the same.
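One way to keep counterpart defaults aligned is to define the default once and share it. The Python sketch below is illustrative only: write_records, read_records, and the tsv alternative are assumptions, not Streams features. Because both functions default to the same format, data written with defaults can be read back with defaults.

```python
import csv
import io

# Single source of truth for the default, shared by writer and reader.
DEFAULT_FORMAT = "csv"
_DELIMITERS = {"csv": ",", "tsv": "\t"}  # "tsv" is a hypothetical alternative


def _delimiter(fmt):
    try:
        return _DELIMITERS[fmt]
    except KeyError:
        raise ValueError(f"unknown format: {fmt!r}") from None


def write_records(records, fmt=DEFAULT_FORMAT):
    """Serialize rows of fields to text in the given format."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter=_delimiter(fmt), lineterminator="\n")
    writer.writerows(records)
    return out.getvalue()


def read_records(text, fmt=DEFAULT_FORMAT):
    """Parse text in the given format back into rows of string fields."""
    return list(csv.reader(io.StringIO(text), delimiter=_delimiter(fmt)))


rows = [["ann", "31"], ["bob", "17"]]
text = write_records(rows)          # no format given on either side
print(read_records(text) == rows)   # the round trip preserves the rows
```

If the two defaults differed, the same round trip would succeed only when both calls named a format explicitly.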
Finally, if none of these criteria apply and all of the available options are equally valid, then do not have a default value for the parameter. Not having a default value implies that users of the operator must choose a value to give the parameter at its invocation site. An example is the position parameter in the Punctor operator from the SPL Standard Toolkit. The options are to generate punctuation before or after each incoming tuple. Neither option is safer, faster, more common, or less surprising than the other. Hence, users must specify which behavior they want.
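A parameter with no default can be modeled in Python as a keyword-only argument without a default value. In this sketch, punctor and the PUNCT marker are hypothetical names chosen for the illustration; the point is only that the caller cannot omit position.

```python
PUNCT = "<punct>"  # hypothetical marker standing in for a punctuation


def punctor(tuples, *, position):
    """Emit a punctuation marker before or after each tuple.

    `position` has no default: neither choice is safer, faster, or more
    expected than the other, so the caller must pick one.
    """
    if position not in ("before", "after"):
        raise ValueError(f"unknown position: {position!r}")
    out = []
    for tup in tuples:
        if position == "before":
            out.append(PUNCT)
        out.append(tup)
        if position == "after":
            out.append(PUNCT)
    return out


print(punctor(["t1", "t2"], position="before"))
print(punctor(["t1", "t2"], position="after"))
```

Calling punctor(["t1", "t2"]) without position fails with a TypeError, which forces the decision to be made at the call site.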