Processing files with file name based group processing (Variant C)
When the file names contain the information that determines the group, you configure the framework to send the scanned files to the group-specific processing chains. The wizard offers this configuration as Variant C.
About this task
Configure the framework to process the input data with file name based grouping.
Procedure
- In the file <PathToYourApplication>/config/config.cfg, find the ite.ingest.fileGroupSplit parameter description.
- Enable the file group splitting by setting the parameter to on as follows: ite.ingest.fileGroupSplit=on
- In the file <PathToYourApplication>/config/config.cfg, find the ite.ingest.fileGroupSplit.pattern parameter description.
- Assign a regular expression to the parameter that identifies and cuts out the group identifier, for example:
  ite.ingest.fileGroupSplit.pattern=.*_RGN_([1-9])_.*.asn$
- In the file <PathToYourApplication>/config/config.cfg, find the ite.businessLogic.group parameter description.
- Enable grouping of data by setting the parameter to on as follows: ite.businessLogic.group=on
- In the file <PathToYourApplication>/config/config.cfg, find the ite.businessLogic.transformation.tupleGroupSplit parameter description.
- Disable grouping of tuples by setting the parameter to off as follows: ite.businessLogic.transformation.tupleGroupSplit=off
- In the file <PathToYourApplication>/config/config.cfg, find the ite.ingest.loadDistribution.groupConfigFile parameter description.
- Assign the path to the group configuration file to the parameter, for example (by using the default here): ite.ingest.loadDistribution.groupConfigFile=<PathToYourApplication>/config/groups.cfg
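
To check a file group split pattern before you put it into config.cfg, you can try it against sample file names. The following Python sketch uses the example pattern from the procedure. The file names are hypothetical, and it assumes that the framework matches the pattern against the whole file name in a way comparable to Python's re module. The framework's regular expression dialect might differ.

```python
import re

# Pattern copied from the example value of ite.ingest.fileGroupSplit.pattern.
# The dot before "asn" is not escaped, so it matches any single character.
PATTERN = re.compile(r".*_RGN_([1-9])_.*.asn$")


def group_id_from_file_name(file_name):
    """Return the captured group identifier, or None if the name does not match."""
    match = PATTERN.match(file_name)
    return match.group(1) if match else None


# Hypothetical file names, invented for illustration only.
samples = (
    "CDR_RGN_1_20240101.asn",  # group identifier "1"
    "CDR_RGN_7_20240101.asn",  # group identifier "7"
    "CDR_RGN_0_20240101.asn",  # no match: 0 is outside [1-9]
    "CDR_20240101.asn",        # no match: no _RGN_ segment
)
for name in samples:
    print(name, "->", group_id_from_file_name(name))
```

The capture group in parentheses is the part that is cut out as the group identifier. File names that do not match return None in this sketch. How the framework treats such files is not covered here.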

In this configuration, the group configuration file is necessary to define the group identifier, the number of chains per group, and the expected tuples per day for each group. Each uncommented line in this file stands for one group. Internally, these lines are numbered consecutively, starting with 00, 01, and so on.

The groups.cfg file controls the following settings: First, it defines the number of required groups and group identifiers by the uncommented lines. Then, it configures the number of processing chains per group. Finally, the configuration file defines the number of expected tuples per group and the number of days to keep. The latter value is used to dimension the maximum number of BloomFilter entries per group when you use deduplication.
Attention: You must ensure that the <PathToYourApplication>/config/groups.cfg file has at least the “default” group configured.
Example
In the following examples, the lines that start with "default" define the group with the "00" groupID, and the line that starts with "X" defines the group with the "01" groupID.
Content for an ITE application that has a single group only:
"default",1,4000000
Content for an ITE application that configures two groups:
"default",1,4000000
"X",1,4000000
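
The mapping from uncommented lines to internal group IDs can be illustrated with a short Python sketch. It is not part of the framework. It assumes that comment lines start with "#", and that the three fields are the group name, the number of chains, and the expected number of tuples, as the description above suggests. Any further fields are ignored. The function name parse_groups_cfg is hypothetical.

```python
import csv


def parse_groups_cfg(text):
    """Map internal group IDs ("00", "01", ...) to the settings of each group."""
    # Assumption: lines that start with "#" are comments; blank lines are skipped.
    active = [
        line for line in text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    ]
    groups = {}
    for index, row in enumerate(csv.reader(active)):
        groups[f"{index:02d}"] = {
            "name": row[0],
            "chains": int(row[1]),           # assumed: processing chains per group
            "expected_tuples": int(row[2]),  # assumed: expected tuples for the group
        }
    # The documentation requires at least the "default" group.
    if not any(group["name"] == "default" for group in groups.values()):
        raise ValueError('groups.cfg must configure the "default" group')
    return groups


two_groups = '"default",1,4000000\n"X",1,4000000\n'
for group_id, settings in parse_groups_cfg(two_groups).items():
    print(group_id, settings)
```

With the two-group content, the sketch assigns the ID 00 to "default" and the ID 01 to "X", which matches the numbering described in the example.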