Using load distribution to distribute files

To increase the throughput of your application, distribute the detected input files to multiple processing chains that work on the data in parallel. You define the number of chains that your application uses and then distribute the files to them (the wizard's variants A and B use this distribution method). Alternatively, when you can determine the group membership of the data from the file name, you distribute files only to the chains that belong to that group (the wizard's variant C uses this method).
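The group-based alternative can be pictured with a small Python sketch. It is only an illustration: the file naming scheme (`<group>_<rest>`), the group names, and the `chains_for_file` helper are hypothetical and are not part of the toolkit.

```python
def chains_for_file(file_name, chains_by_group):
    """Return the indices of the chains that may process file_name."""
    # Assumption: the group is the part of the file name before the first "_".
    group = file_name.split("_", 1)[0]
    return chains_by_group[group]


# Hypothetical assignment of groups to chain indices.
chains_by_group = {"north": [0, 1], "south": [2]}

candidates = chains_for_file("south_20240101.csv", chains_by_group)
```

Here a file of the `south` group can go only to chain 2, while files of the `north` group are shared between chains 0 and 1.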

The Equal Load distribution tracks the processing state of the chains. When a chain finishes a file, it sends an acknowledgment to the load distribution and immediately receives the next file name. This method should lead to a balanced load on the chains, but it has the disadvantage of additional overhead for processing the acknowledgments. The file name queue is kept in the File Ingestion and is not limited in size, so that it can cope with bursts of incoming data files. Choose this configuration whenever you expect files of significantly different sizes and files that arrive in bursts.
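The acknowledgment-driven behavior described above can be modeled roughly as follows. This is a toy Python sketch under simplifying assumptions (a single process, no real files); the class and method names are hypothetical and do not reflect the toolkit's implementation.

```python
from collections import deque


class EqualLoadDistributor:
    """Toy model of acknowledgment-driven file distribution."""

    def __init__(self, number_of_chains):
        self.pending = deque()                      # unbounded file name queue
        self.idle = deque(range(number_of_chains))  # chains waiting for work
        self.assigned = {}                          # chain index -> file in progress

    def _dispatch(self):
        # Hand out queued names while both a file and an idle chain exist.
        while self.pending and self.idle:
            chain = self.idle.popleft()
            self.assigned[chain] = self.pending.popleft()

    def file_detected(self, file_name):
        self.pending.append(file_name)
        self._dispatch()

    def acknowledge(self, chain):
        # The chain finished its file: mark it idle, then dispatch again.
        finished = self.assigned.pop(chain)
        self.idle.append(chain)
        self._dispatch()
        return finished


distributor = EqualLoadDistributor(number_of_chains=2)
for name in ["big.dat", "small1.dat", "small2.dat", "small3.dat"]:
    distributor.file_detected(name)

# Chain 1 finishes two small files while chain 0 is still busy with the big one.
distributor.acknowledge(1)
distributor.acknowledge(1)
```

In this run chain 1 ends up processing three small files while chain 0 keeps working on the large one, which is the balancing effect that makes this method suitable for files of very different sizes.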

Equal Load distribution

About this task

Distribute input files to the parallel processing chains in an equal-load manner.

Procedure

1. In the file <PathToYourApplication>/config/config.cfg, find the ite.ingest.loadDistribution parameter description.
2. To select the equal load file distribution, set the parameter value as follows: ite.ingest.loadDistribution=equalLoad
3. In the same file, find the ite.ingest.loadDistribution.numberOfParallelChains parameter description.
4. Set the parameter to the desired number of chains, for example: ite.ingest.loadDistribution.numberOfParallelChains=11
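
Taken together, the two settings from this procedure would appear in <PathToYourApplication>/config/config.cfg as shown in this fragment. All other parameters in the file are omitted, and 11 is only the example value used in the procedure.

```
ite.ingest.loadDistribution=equalLoad
ite.ingest.loadDistribution.numberOfParallelChains=11
```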