Processing files with file-content-based group processing (Variant B)

When you process files whose tuples carry group information, use a configuration that splits groups based on file contents, so that each tuple determines its own group context. The wizard offers this configuration as Variant B.

About this task

Configure the framework to process the input data with file-content-based grouping.

Procedure

  1. In the file <PathToYourApplication>/config/config.cfg, find the description of the ite.ingest.loadDistribution.numberOfParallelChains parameter.

  2. Set the parameter to the number of parallel chains that you want, for example: ite.ingest.loadDistribution.numberOfParallelChains=8

  3. In the file <PathToYourApplication>/config/config.cfg, find the description of the ite.ingest.fileGroupSplit parameter.

  4. Disable file group splitting by setting the parameter to off: ite.ingest.fileGroupSplit=off

  5. In the file <PathToYourApplication>/config/config.cfg, find the description of the ite.businessLogic.group parameter.

  6. Enable data grouping by setting the parameter to on: ite.businessLogic.group=on

  7. In the file <PathToYourApplication>/config/config.cfg, find the description of the ite.businessLogic.transformation.tupleGroupSplit parameter.

  8. Enable tuple group splitting by setting the parameter to on: ite.businessLogic.transformation.tupleGroupSplit=on

  9. In the file <PathToYourApplication>/config/config.cfg, find the description of the ite.ingest.loadDistribution.groupConfigFile parameter.

  10. Set the parameter to the path of the group configuration file. For example, to use the default file: ite.ingest.loadDistribution.groupConfigFile=<PathToYourApplication>/config/groups.cfg
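The procedure sets five parameters in config.cfg. As a quick cross-check, the following Python sketch parses a configuration text and reports every Variant B setting that differs from the values in the procedure. It assumes that config.cfg holds one key=value assignment per line and that lines starting with # are comments; the function names are hypothetical and are not part of the framework. For the number of chains, the sketch only checks for a positive integer and does not model any upper limit that the framework might impose.

```python
def parse_config(text: str) -> dict[str, str]:
    """Parse key=value lines (assumed syntax), skipping blanks and '#' comments."""
    values = {}
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#") or "=" not in stripped:
            continue
        key, value = stripped.split("=", 1)
        values[key.strip()] = value.strip()
    return values


# Fixed switch values that Variant B requires, taken from the procedure.
VARIANT_B_SWITCHES = {
    "ite.ingest.fileGroupSplit": "off",
    "ite.businessLogic.group": "on",
    "ite.businessLogic.transformation.tupleGroupSplit": "on",
}


def check_variant_b(text: str) -> list[str]:
    """Return a list of problems; an empty list means the settings match."""
    cfg = parse_config(text)
    problems = []
    for key, expected in VARIANT_B_SWITCHES.items():
        actual = cfg.get(key)
        if actual != expected:
            problems.append(f"{key} is {actual!r}, expected {expected!r}")
    # Assumption: any positive integer is accepted as the chain count.
    chains = cfg.get("ite.ingest.loadDistribution.numberOfParallelChains", "")
    if not chains.isdigit() or int(chains) < 1:
        problems.append("numberOfParallelChains must be a positive integer")
    if not cfg.get("ite.ingest.loadDistribution.groupConfigFile"):
        problems.append("groupConfigFile is not set")
    return problems
```

Passing the contents of your config.cfg to check_variant_b returns an empty list when all five parameters match the procedure.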

In this configuration, the group configuration file is required to define the expected number of tuples per day for each group. Each uncommented line in the file represents one group. Internally, these lines are numbered sequentially, starting at 00 (00, 01, and so on). Only the third column is relevant in this use case: enter in that column the number of tuples that you expect per day for the group.
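To show how the group configuration file maps to group IDs, the following Python sketch numbers the uncommented lines from 00 and reads the expected tuples per day from the third column. The comma separator, the treatment of blank lines as non-groups, and the placeholder values in the first two columns of the example are assumptions for illustration only; check the comments in your groups.cfg for the actual format. The limit of 100 groups follows from the 00 to 99 range of the groupID attribute.

```python
def load_group_expectations(text: str, sep: str = ",") -> dict[str, int]:
    """Map two-digit group IDs to expected tuples per day.

    Assumptions: '#' starts a comment line, blank lines are not groups,
    columns are separated by `sep`, and column 3 holds tuples per day.
    """
    expectations = {}
    index = 0
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        if index > 99:
            raise ValueError("more than 100 groups; group IDs are limited to 00-99")
        columns = [c.strip() for c in stripped.split(sep)]
        if len(columns) < 3:
            raise ValueError(f"line {stripped!r} has fewer than three columns")
        group_id = f"{index:02d}"  # 00, 01, ... in order of uncommented lines
        expectations[group_id] = int(columns[2])
        index += 1
    return expectations
```

Commented-out lines are skipped, so they do not shift the numbering of the groups that follow them.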


Processing files with file-content-based group processing

For grouping based on tuple attributes, implement your custom business logic in the <namespace>.chainprocessor.transfomer.custom::DataProcessor composite. The logic must write the destination group ID to the groupID output attribute of the SPL stream. The groupID attribute is a two-digit rstring in the range 00 to 99, and its default value is 00. If you set groupID to an unknown value, the processing element (PE) shuts down.
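The DataProcessor composite is written in SPL; the following Python sketch only models the routing rule described above so that its constraints are easy to see. It formats a group number as a two-digit string, falls back to the default 00, and rejects IDs that are not configured, because an unknown groupID shuts down the PE. The region attribute, the region_to_group mapping, and the assumption that the known IDs are 00 up to the number of groups in the group configuration file are all hypothetical and are not framework API.

```python
def format_group_id(n: int) -> str:
    """Render a group number in the two-digit rstring form (00-99)."""
    if not 0 <= n <= 99:
        raise ValueError(f"group number {n} is outside 00-99")
    return f"{n:02d}"


def route(record: dict[str, str], region_to_group: dict[str, int],
          known_ids: set[str]) -> str:
    """Choose a destination groupID from a tuple attribute.

    Hypothetical: 'region' stands in for whatever tuple attribute carries
    the group information. Unmapped values fall back to the default 00.
    """
    group_id = format_group_id(region_to_group.get(record.get("region", ""), 0))
    # Reject unknown IDs here instead of letting the PE shut down.
    if group_id not in known_ids:
        raise ValueError(f"groupID {group_id} is not a configured group")
    return group_id


# Assumption: three groups are configured, so the known IDs are 00, 01, 02.
KNOWN_IDS = {format_group_id(i) for i in range(3)}
```

Validating before emitting the tuple is a design choice of this sketch; in SPL you would apply an equivalent check before you assign groupID.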