Processing files with file content based group processing (Variant B)

When the group information is part of the tuple contents, use a configuration that splits groups based on file contents, so that each tuple determines its own context. The wizard offers this configuration as Variant B.

About this task

Configure the framework to process the input data with file content based grouping.

Procedure

1. In the file <PathToYourApplication>/config/config.cfg, find the description of the ite.ingest.loadDistribution.numberOfParallelChains parameter.
   Set the parameter to the required number of chains, for example: ite.ingest.loadDistribution.numberOfParallelChains=8
2. In the same file, find the description of the ite.ingest.fileGroupSplit parameter.
   Disable file group splitting by setting the parameter to off: ite.ingest.fileGroupSplit=off
3. In the same file, find the description of the ite.businessLogic.group parameter.
   Enable grouping of data by setting the parameter to on: ite.businessLogic.group=on
4. In the same file, find the description of the ite.businessLogic.transformation.tupleGroupSplit parameter.
   Enable grouping of tuples by setting the parameter to on: ite.businessLogic.transformation.tupleGroupSplit=on
5. In the same file, find the description of the ite.ingest.loadDistribution.groupConfigFile parameter.
   Assign the path of the group configuration file to the parameter, for example (the default is used here): ite.ingest.loadDistribution.groupConfigFile=<PathToYourApplication>/config/groups.cfg
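
Taken together, the settings from the procedure above could look like the following excerpt of config.cfg. This is only a sketch: the values are the examples used in the steps, the comment lines assume that config.cfg uses # as its comment marker, and <PathToYourApplication> remains a placeholder for your application directory.

```cfg
# Assumption: lines that start with # are treated as comments.

# Number of parallel chains (example value from the procedure)
ite.ingest.loadDistribution.numberOfParallelChains=8

# File group splitting is disabled for Variant B
ite.ingest.fileGroupSplit=off

# Grouping of data and tuple-based group splitting are enabled
ite.businessLogic.group=on
ite.businessLogic.transformation.tupleGroupSplit=on

# Path of the group configuration file (default location)
ite.ingest.loadDistribution.groupConfigFile=<PathToYourApplication>/config/groups.cfg
```

The parameters are independent entries in the file, so their order in the excerpt is for readability only.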

In this configuration, the group configuration file is required because it defines the expected number of tuples per day for each group. Each uncommented line in the file represents one group. Internally, these lines are numbered consecutively, starting with 00, 01, and so on. Only the third column is relevant in this use case, so enter there the number of tuples that you expect for each group.


For grouping based on tuple attributes, implement your custom business logic in the <namespace>.chainprocessor.transformer.custom::DataProcessor composite. The logic must write the destination group ID to the groupID SPL output attribute. The groupID attribute is a two-digit rstring that supports the range 00 - 99, and its default value is 00. If you set groupID to an unknown value, the processing element (PE) shuts down.