streamtool submitjob

Usage

submitjob [-d,--domain-id <did>] [-i,--instance-id <instance>] [--override <override-list>] {[-f,--file <file-name>] | [<sab-pathname>]} [-g,--jobConfig <file-name>] [-P,--P <parameter-name>] ... [-C,--config <config-setting>] ... [-J,--jobgroup <jobgroup-name>] [--outfile <file-name>] [--jobname <job-name>] [--parallelRegionWidth <parallelRegionName=width>] [-s,--jobResourceSharing <job-resource-sharing-value>] [-y,--proposedOperatorsPerResource <integer>] [-w,--preview [-q,--out_jobConfig <file-name>]] [-U,--User <user>] [-h,--help] [--trace <level>] [-v,--verbose <level>] [--zkconnect {<host>:<port>},... | --embeddedzk]

The streamtool submitjob command previews or submits one or more jobs.

Authority

You must have add authority for the appropriate jobgroup_name instance object or else submit permission for the job group. By default, the DomainAdministrator, InstanceAdministrator, and InstanceUser roles have this authority.

You must have add authority for the jobs-override instance object to use the --override ResourceLoadProtection option. By default, the DomainAdministrator and InstanceAdministrator roles have this authority.

For more information about access control lists, see the streamtool getacl and streamtool lsjobpermission commands.

Description

A submitted job runs an application that is defined by an application bundle. Application bundles are created by the Stream Processing Language (SPL) compiler. A job consists of one or more processing elements (PEs). The PEs are placed on one or more of the application resources for the instance. The submission fails if the PE placement constraints can't be met.

Jobs remain in the system until they are canceled or the instance is stopped.

Job groups

When you submit the job, it is assigned to the "default" job group unless you specify the -J, --jobgroup option. The job group that you specify with that option must exist before you run this command. For more information about creating job groups, see streamtool man mkjobgroup.

Job identifiers

The command generates a job identifier (ID) for each job that you submit. You can also optionally specify a job name. The job name must satisfy the following requirements:

  • The name must be unique within the instance.
  • The name must contain alphanumeric characters. You cannot use the following characters: ^!#$%&'*+,/;<>=?@[]`{|}~(). You also cannot use the following Unicode and hexadecimal characters: u0000; u0001-u001F; u007F-u009F; ud800-uF8FF; uFFF0-uFFFF; x{10000}-x{10FFFF}.
  • The maximum length of the name is 1 KB.

If you do not specify a job name, Streams creates a unique job name with the following format: applicationName_jobID. You cannot change the job name after you submit the job.
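
The character and length rules above can be checked locally before a job is submitted. The following Python function is a minimal sketch, not part of Streams: the function name is hypothetical, the 1 KB limit is assumed to mean 1024 bytes of UTF-8, and uniqueness within the instance is not checked because it requires access to the instance.

```python
# Punctuation characters that the job name rules disallow.
FORBIDDEN_CHARS = set("^!#$%&'*+,/;<>=?@[]`{|}~()")

# Disallowed code point ranges, inclusive, as listed in the job name rules.
FORBIDDEN_RANGES = [
    (0x0000, 0x001F),
    (0x007F, 0x009F),
    (0xD800, 0xF8FF),
    (0xFFF0, 0xFFFF),
    (0x10000, 0x10FFFF),
]

# Assumption: "1 KB" is read as 1024 bytes of UTF-8 encoded text.
MAX_NAME_BYTES = 1024


def job_name_problems(name):
    """Return a list of rule violations for a proposed job name (hypothetical helper)."""
    problems = []
    if len(name.encode("utf-8", "surrogatepass")) > MAX_NAME_BYTES:
        problems.append("name is longer than 1 KB")
    for ch in name:
        cp = ord(ch)
        in_range = any(lo <= cp <= hi for lo, hi in FORBIDDEN_RANGES)
        if ch in FORBIDDEN_CHARS or in_range:
            problems.append(f"forbidden character U+{cp:04X}")
    return problems
```

An empty list means that the name passes these local checks. The command can still reject the name, for example when it is not unique within the instance.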

Previewing a job

You can preview the set of resources that will be allocated to a job by using the --preview option. In preview mode, you cannot use the --file or --outfile option. The --out_jobConfig option is available only in preview mode.
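
The option restrictions for preview mode can be expressed as a small pre-check. This Python sketch is illustrative only: the function name and the idea of passing the long option names as a set are assumptions, and streamtool performs its own validation.

```python
def preview_option_problems(options):
    """Check a set of long option names against the preview-mode rules.

    `options` is a set such as {"--preview", "--outfile"} (hypothetical input format).
    """
    problems = []
    if "--preview" in options:
        # --file and --outfile are not allowed in preview mode.
        for opt in ("--file", "--outfile"):
            if opt in options:
                problems.append(f"{opt} cannot be used with --preview")
    elif "--out_jobConfig" in options:
        # --out_jobConfig is available only in preview mode.
        problems.append("--out_jobConfig requires --preview")
    return problems
```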

Submission parameters

An application can define optional or required submission parameters that can be used to control its behavior. If you do not provide required submission parameters when you submit the job, the submission fails. If your application requires submission-time values, you must still set those values in the application, even though they will be overridden. If you specify a parameter name that is not known to the application, the command generates a warning and the parameter is ignored.
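
The handling of required and unknown parameters described above can be modeled as follows. This Python sketch only illustrates the documented outcomes (failure, warning, or acceptance). The REQUIRED marker, the function name, and the dictionary layout are hypothetical and do not reflect how Streams stores parameter definitions.

```python
# Hypothetical marker for a parameter that has no usable default.
REQUIRED = object()


def resolve_parameters(declared, supplied):
    """Combine application-declared parameters with values given at submission.

    declared: dict of parameter name -> default value, or REQUIRED
    supplied: dict of parameter name -> value from -P options
    Returns (resolved values, warnings); raises ValueError when a
    required parameter is missing, mirroring a failed submission.
    """
    # A name that the application does not know produces a warning and is ignored.
    warnings = [
        f"unknown parameter ignored: {name}"
        for name in supplied
        if name not in declared
    ]
    resolved = {}
    for name, default in declared.items():
        if name in supplied:
            resolved[name] = supplied[name]
        elif default is REQUIRED:
            raise ValueError(f"missing required submission parameter: {name}")
        else:
            resolved[name] = default
    return resolved, warnings
```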

When you set a value with the streamtool submitjob command, the value overrides any corresponding value that is specified in a job configuration overlay file, any corresponding value that is provided by a submission-time parameter, or any corresponding value that is provided in the application.

Application configuration settings

You can optionally specify an application configuration setting by using the -C, --config option. For example, you can use this option to set the location of the data directory:

streamtool submitjob -C data-directory=/myDataDirectory myappl.sab

Submitting more than one job at a time

To submit more than one job in a single command, use the -f, --file option. Each job submission must be provided on a separate line in the file. Blank lines and lines that have a # character as the first non-white space character are ignored. In the file, job submission specifications consist of job parameter specifications and configuration settings followed by the path name for the application bundle. For example:

-P "myParm=myValue" -C "myConfig=myValue" myappp1.sab

Use only double or single quotation marks within the file to deal with parameter values that contain white space characters. Put quotation marks around the combined name and value specification. For example: --P "myParm=myValue". The same advice applies to this situation when you submit a job from the command line.

Shell expansion processing does not occur for the contents of the file. For example, the command does not expand path name wildcards or shell variables.

If one of the jobs that are listed in a file fails, subsequent jobs are nonetheless parsed and can succeed.
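
To check a submission file before you pass it to the -f option, you can read it with the same line rules that are described above. The following Python sketch is an approximation, not the streamtool parser: it assumes that the quoting rules of Python's shlex module are close enough to the rules that streamtool applies, and the function name is hypothetical.

```python
import shlex


def parse_submission_file(text):
    """Split the text of a job submission file into one entry per job."""
    jobs = []
    for line in text.splitlines():
        stripped = line.strip()
        # Blank lines and lines whose first non-white-space character is # are ignored.
        if not stripped or stripped.startswith("#"):
            continue
        # shlex honors single and double quotation marks but performs
        # no wildcard or shell variable expansion.
        tokens = shlex.split(stripped)
        # The application bundle path name follows the parameter and
        # configuration specifications, so it is the last token.
        jobs.append({"options": tokens[:-1], "bundle": tokens[-1]})
    return jobs
```

Because no expansion takes place, a token such as $HOME/*.sab is kept literally, which matches the documented behavior for the file contents.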

Setting the parallel width

You can use the --parallelRegionWidth parallel-region-name=width option to set a parallel region width in the job. If you want to set the width of multiple parallel regions in the same command, use the --jobConfig file-name option to specify a job configuration overlay file, and then specify the parallel regions in the TargetParallelRegion section of the job configuration overlay file. Alternatively, you can use the Streams Console.

Options

-C,--config <config-setting>
Specifies the name of one of the following application configuration settings and its value:
data-directory
Specifies the location of the data directory. For example: /myDataDirectory.
fusionScheme
Specifies the scheme that is used to determine how to fuse the operators in a stream processing application into partitions. By default, the command uses the instance.fusionScheme instance property value.
Valid values are automatic (default), legacy, and manual. When you use the automatic value, the system determines the most efficient number of PEs to assign to a job.
Using the legacy value prompts operator fusion to behave as it did before Streams Version 4.2. Typically, each operator is fused into a separate PE.
With the manual fusion scheme, you can specify the target number of PEs to use.
fusionTargetPeCount
Specifies the target number of processing elements to use when fusing operators. This parameter can be specified only when fusionScheme is set to manual.
parallelRegionConfig
Specifies how the operators in the channels in parallel regions are fused into PEs.
  • channelIsolation: Operators within a channel are fused into a PE only with operators from the same channel. Operators outside the parallel region or from other channels in the same region are fused into different PEs. One or more PEs for a channel can be created depending on the fusion constraints.

    You can explicitly colocate an operator in a channel with an operator outside the region or from a different channel in the same region. A separate PE is created if all of the explicitly colocated operators are not members of the same channel.

  • channelExlocation: Operators within a channel are fused with operators from the same channel or with operators from outside the region. Operators from other channels in the same region are not fused into the same PEs. One or more PEs for a channel can be created depending on the fusion constraints.

    You can explicitly colocate an operator in a channel with an operator outside the region or from a different channel in the same region. A separate PE is created if all of the explicitly colocated operators are not members of the same channel.

    If you use this value, you cannot change the width of a parallel region while the job is running. You would need to resubmit the job with the parallelRegionConfig parameter set to channelIsolation or to noChannelInfluence.

  • noChannelInfluence: (default) Inclusion in this parallelRegion has no impact on the fusion process.
placementScheme
Specifies a scheme to determine which PEs get placed on which hosts. Valid values are balancedInstance (default), balancedJob, and legacy.
When you use the balancedInstance value, the system distributes PEs across candidate hosts proportionally to the number of cores on the hosts. This method takes into account PE placements from previous jobs.
Using the balancedJob value means that PEs are distributed across candidate hosts proportionally to the number of cores on the hosts, but PE placements from previous jobs are ignored.
When you use the legacy value, the estimated load of PEs to be placed is matched with existing ldavg numbers of hosts that are adjusted for the number of cores in hosts.
preloadApplicationBundles
Specifies whether to preload the application bundle files onto all resources in the instance, even if not currently needed there. Valid values are true and false. By default, the command uses the instance.preloadApplicationBundles instance property value. Preloading the application bundle files can improve the performance in situations where a PE is relocated to a new resource that was not already hosting a PE from that application.
dynamicThreadingElastic
Specifies whether elasticity is on or off for dynamicThreading. Valid values are true for on, and false for off. This setting can be used only if instance.threadingModel is set to dynamic, or the application configuration threadingModel is set to dynamic.
dynamicThreadingThreadCount
Specifies the number of threads for scheduled ports to use. Inherits its value from instance.dynamicThreadingThreadCount, if set. Valid values are integers greater than or equal to 1. This setting can be used only if instance.threadingModel is set to dynamic, or the application configuration threadingModel is set to dynamic.
threadingModel
Specifies the threading model to use as the default for each PE in the job. Inherits its value from instance.threadingModel, if set. This default can be overridden by @threading annotations in the SPL code. Valid values are manual, automatic, scheduledPorts, allThreadedPorts, and default.
tracing
Specifies the PE trace setting. The following valid levels are listed in order of increasing verbosity, which is to say that the first level in the list generates the least amount of information:
  • off
  • error
  • warn
  • info
  • debug
  • trace
By default, the trace setting is error unless you specify otherwise in the stream processing application.
For example: --config tracing=off,preloadApplicationBundles=true.
-d,--domain-id <did>
Specifies the domain identifier.

If you do not specify this option, Streams uses the domain name that is set in the STREAMS_DOMAIN_ID environment variable. By default, that domain name is StreamsDomain. If you are using the interactive streamtool interface, it uses the name of the active domain for the current streamtool session or else it prompts you for the domain name.

The active domain for the current streamtool session is set every time that you successfully run a streamtool command with a -d or --domain-id option. Alternatively, you can run the streamtool domain command in the interactive interface.

--embeddedzk

Specifies to use the embedded copy of ZooKeeper. This option is not supported within the interactive streamtool interface.

If you are not using the interactive streamtool interface and you do not specify either this option or the --zkconnect option, Streams uses the ZooKeeper connection that is associated with the active domain or the domain that is specified in the --domain-id option. Streams determines which connection maps to the domain by using cached information about the domains. In this scenario, if the domain identifier is not unique in the Streams configuration cache, the command fails.

-f,--file <file-name>
Specifies the name of a file that contains a list of job submission specifications. Each job submission must be provided on a separate line in the file. Blank lines and lines that have a # character as the first non-white space character are ignored. In the file, job submission specifications consist of job parameter specifications and configuration settings followed by the path name for the application bundle.
-g,--jobConfig <file-name>
Specifies the name of an external file that defines a job configuration overlay. You can use a job configuration overlay to set the job configuration when the job is submitted or to change the configuration of a running job.
-h,--help
Specifies to show the command syntax.
-i,--instance-id <instance>
Specifies the instance identifier.

If you do not specify this option, Streams uses the instance identifier that is set in the STREAMS_INSTANCE_ID environment variable. By default, that instance identifier is StreamsInstance. If you are using the interactive streamtool interface, it tries to use an instance ID that you specified in a previous command. If no such value is found, the command uses the STREAMS_INSTANCE_ID environment variable. Alternatively, you can run the streamtool instance command in the interactive interface.

-J,--jobgroup <jobgroup-name>
Specifies the job group. If you do not specify this option, the command uses the following job group: default.
--jobname <job-name>
Specifies the name of the job.
--outfile <file-name>
Specifies the path and file name of the output file in which the command writes the list of submitted job IDs. The path can be an absolute or relative path. If you do not specify a path, the file is created in the directory where you run the command.
--override <override-list>
Specifies to override the instance.resourceLoadProtectionEnabled instance property for the job and treat it as false. The resource load protection property prevents Streams from allocating processing elements (PEs) to an overloaded resource. A resource is overloaded when the values for any one of the metrics for CPU utilization, memory utilization, or network bandwidth utilization exceed their thresholds. For example:
streamtool submitjob --override ResourceLoadProtection myJob.sab
-P,--P <parameter-name>
Specifies a submission-time parameter and value for the job. You can specify this option multiple times in the command.
--parallelRegionWidth <parallelRegionName=width>
Specifies a parallel region name and its width. The parallel region name is the name of the logical operator that the @parallel annotation is applied to.
-q,--out_jobConfig <file-name>
Specifies the name of the output file in which the command writes the operator configuration information.
-s,--jobResourceSharing <job-resource-sharing-value>
When instance.applicationResourceAllocationMode=Job, specifies whether a job can run on resources with other jobs. Valid values:
sameJob
Job resources can be shared only within this job.
sameUser
If unused resources aren't available, job resources can be shared by other jobs from the same user that are also specified as sameUser sharing.
sameInstance
If unused resources aren't available, job resources can be shared by other jobs from the same instance that are also specified as sameInstance sharing.

If a value is not specified, the command uses the instance.jobResourceSharing instance property value.

--trace <level>
Specifies the trace setting. The following valid levels are listed in order of increasing verbosity, which is to say that the first level in the list generates the least amount of information:
  • off
  • error
  • warn
  • info
  • debug
  • trace
The default value is off.
-U,--User <user>
Specifies a Streams user ID that has authority to run the command.
-v,--verbose <level>
Specifies to provide more detailed command output. The verbosity level can be 0-3, where 0 disables detailed reporting and each increment provides more detailed output.
-w,--preview
Indicates to preview the changes that would be made. This option assumes that the complete resource requirements are able to be satisfied, without validating those assumptions. A results object is returned in the current working directory as a job configuration overlay file in JSON format. The file name is indicated in the output message. You can change the file name by using the --out_jobConfig option.
-y,--proposedOperatorsPerResource <integer>
Specifies the ratio of operators per resource that is used to compute the number of resources to request for the job. The default value is 8, meaning that an additional resource is requested for every 8 operators in an application. This property applies only if the value of instance.applicationResourceAllocationMode is Job. This value is a suggestion only. Many factors determine the actual number of operators that are placed on a resource.
--zkconnect <{<host>:<port>},...>

The name of one or more host and port pairs that specify the configured ZooKeeper servers. This option is not supported within the interactive streamtool interface.

If you are not using the interactive streamtool interface and you do not specify this option, Streams tries to use:

  1. The --embeddedzk option
  2. The value from the STREAMS_ZKCONNECT environment variable
  3. A ZooKeeper connection string that is derived from cached information about the current domain.

Arguments

sab-pathname
Specifies the path name for the application bundle file. If you do not specify an absolute path, the command seeks the file in the directory where you ran the command. Alternatively, you can specify the path name for the application description language (ADL) file if the application bundle file exists in the same directory.

Examples

The following command submits a job with submission-time values for a distributed application:

[streamtool <bsmith@StreamsDomain.StreamsInstance>] submitjob -P students='[{name="Mary", grade=100},{name="John", grade=90}]' output/Main.sab
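
When a script assembles the command, building the argument list programmatically avoids quoting mistakes in values such as the students list above. The following Python sketch uses only options that are documented in this topic. The function and its keyword arguments are hypothetical, and the domain and instance are assumed to come from the STREAMS_DOMAIN_ID and STREAMS_INSTANCE_ID environment variables.

```python
import shlex


def build_submitjob_argv(bundle, params=None, config=None, jobname=None, jobgroup=None):
    """Build an argument list for streamtool submitjob (hypothetical helper)."""
    argv = ["streamtool", "submitjob"]
    # Each submission-time parameter is passed as one combined name=value argument.
    for name, value in (params or {}).items():
        argv += ["-P", f"{name}={value}"]
    # Each application configuration setting is passed the same way with -C.
    for name, value in (config or {}).items():
        argv += ["-C", f"{name}={value}"]
    if jobgroup:
        argv += ["-J", jobgroup]
    if jobname:
        argv += ["--jobname", jobname]
    # The application bundle path name comes last.
    argv.append(bundle)
    return argv


argv = build_submitjob_argv(
    "output/Main.sab",
    params={"students": '[{name="Mary", grade=100},{name="John", grade=90}]'},
    config={"tracing": "off"},
)
# shlex.join quotes each argument so that the line can be pasted into a POSIX shell.
print(shlex.join(argv))
```

The list form can be passed directly to a process launcher without a shell, in which case no additional quoting is needed.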