Using file group split to distribute files

To increase the throughput of your application, you can distribute the detected input files to multiple processing chains that work on the data in parallel. When the file names contain information that defines groups, for example a region identifier, you can use that information to route the data streams in your application.

About this task

Distribute input files according to their group membership derived from file names.

Procedure

  1. In the file <PathToYourApplication>/config/config.cfg, find the description of the ite.ingest.fileGroupSplit parameter.
  2. To enable the file-name-based group split, set the parameter value as follows: ite.ingest.fileGroupSplit=on
  3. In the same file, find the description of the ite.ingest.fileGroupSplit.pattern parameter.
  4. Assign a regular expression to the parameter that matches the file name and cuts out the group identifier with a parenthesized group, for example: ite.ingest.fileGroupSplit.pattern=.*_RGN_([A-Z]{3}).*.asn$
     In this example, the three uppercase letters that follow _RGN_ in the file name become the group identifier.

Remember that the two parameters serve different purposes. The regular expressions assigned to the ite.ingest.directoryScan.processFilePattern parameter identify the files to load, whereas the ite.ingest.fileGroupSplit.pattern parameter uses part of the file name to derive the group membership. The two patterns therefore look at different parts of the file name.
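
To check a pattern before you put it into config.cfg, it can help to try it against a few sample file names. The following Python sketch is only an illustration of how the example pattern cuts out the group identifier; it is not how the application itself is implemented. The file names, the PROCESS_PATTERN value, and the function names group_of and split_by_group are assumptions made up for this example, and the regular expression dialect of the application might differ from Python in details.

```python
import re
from collections import defaultdict

# Pattern copied from the procedure above; its single parenthesized
# group holds the group identifier.
GROUP_PATTERN = re.compile(r".*_RGN_([A-Z]{3}).*.asn$")

# Hypothetical stand-in for ite.ingest.directoryScan.processFilePattern
# (assumption: load every file whose name ends in ".asn").
PROCESS_PATTERN = re.compile(r".*\.asn$")


def group_of(file_name):
    """Return the captured group identifier, or None if the name does not match."""
    match = GROUP_PATTERN.match(file_name)
    return match.group(1) if match else None


def split_by_group(file_names):
    """Map each group identifier to the list of files that belong to it."""
    groups = defaultdict(list)
    for name in file_names:
        if not PROCESS_PATTERN.match(name):
            continue  # not selected for loading at all
        groups[group_of(name)].append(name)
    return dict(groups)


# Made-up file names, for illustration only.
files = [
    "cdr_RGN_EUR_0001.asn",
    "cdr_RGN_USA_0002.asn",
    "cdr_RGN_EUR_0003.asn",
    "cdr_0004.asn",
    "readme.txt",
]

for group, members in split_by_group(files).items():
    print(group, members)
```

In this sketch, the two EUR files end up in one group and the USA file in another, readme.txt is never considered because the load pattern rejects it, and cdr_0004.asn is collected under None because it carries no region identifier. How the application treats files that match the load pattern but not the group pattern is not covered here. Also note that the dot before asn in the example pattern is unescaped, so it matches any character, not only a literal period.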