Using load distribution to distribute files

To increase the throughput of your application, distribute the detected input files to multiple processing chains that work on the data in parallel. You define the number of chains that your application uses, and the files are then distributed across them (the wizard’s variants A and B use this distribution method). Alternatively, when you can determine the group membership of the data from the file name, you distribute files only to the chains that belong to that group (the wizard’s variant C uses this method).

The Equal Load distribution tracks the processing state of the chains. When a chain finishes a file, it sends an acknowledgment to the load distribution and immediately receives the next file name. This method should lead to a balanced load on the chains, but it has the disadvantage of additional effort for processing the acknowledgments. The file name queue is kept in the File Ingestion and is not limited in size, so that it can cope with bursts of incoming data files. Choose this configuration whenever you expect files of significantly different sizes and files that arrive in batches.
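The acknowledgment-driven behavior described above can be illustrated with a small simulation. This is only a sketch of the general idea, not the implementation used by the File Ingestion: the function name `equal_load_distribute`, the per-file processing cost, and the simulated clock are all assumptions made for illustration.

```python
import heapq
from collections import deque


def equal_load_distribute(files, number_of_chains):
    """Simulate acknowledgment-driven file distribution (illustrative only).

    files: list of (file_name, processing_cost) tuples in arrival order.
           The processing cost is a hypothetical stand-in for file size.
    Returns a dict that maps each chain index to the file names it received.
    """
    queue = deque(files)  # file name queue, not limited in size
    assignments = {chain: [] for chain in range(number_of_chains)}
    busy = []  # min-heap of (finish_time, chain) for chains that are working

    # At the start every chain is idle, so each one receives a file at once.
    for chain in range(number_of_chains):
        if not queue:
            break
        name, cost = queue.popleft()
        assignments[chain].append(name)
        heapq.heappush(busy, (cost, chain))

    # Each pop models the acknowledgment of the chain that finishes next.
    # That chain immediately receives the next queued file name, if any.
    while busy:
        finish_time, chain = heapq.heappop(busy)
        if queue:
            name, cost = queue.popleft()
            assignments[chain].append(name)
            heapq.heappush(busy, (finish_time + cost, chain))

    return assignments
```

With two chains and the input `[("big.dat", 10), ("a.dat", 1), ("b.dat", 1), ("c.dat", 1)]`, chain 0 stays busy with `big.dat` while chain 1 acknowledges and picks up the three small files one after another. This is the balancing effect that makes the method suitable for files of very different sizes.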


Equal Load distribution

About this task

Distribute the input files across the parallel processing chains so that the chains are equally loaded.

Procedure

  1. In the file <PathToYourApplication>/config/config.cfg, find the description of the ite.ingest.loadDistribution parameter.
  2. To select the equal load file distribution, set the parameter value as follows: ite.ingest.loadDistribution=equalLoad
  3. In the same file, find the description of the ite.ingest.loadDistribution.numberOfParallelChains parameter.
  4. Set the parameter to the required number of chains, for example: ite.ingest.loadDistribution.numberOfParallelChains=11
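
After both steps, the two settings in config.cfg might look like the following fragment. It assumes that each parameter stands on its own line in key=value form, as in the steps above. The value 11 is only the example value from step 4, and any other content of the file is omitted.

```
ite.ingest.loadDistribution=equalLoad
ite.ingest.loadDistribution.numberOfParallelChains=11
```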