Ensuring processing of files

In a running production system, incoming data occasionally cannot be enriched because the enrichment data does not yet contain the latest updates. In your custom logic, you write these data records to a new file and submit that file to the input directory so that the data is reprocessed later. However, deduplication filters out the resubmitted file names. To bypass deduplication and ensure that these data records are reprocessed, the framework provides a parameter that you set to a regular expression describing the naming scheme for reprocessing files.

About this task

Define a regular expression that identifies input files that are to be reprocessed. For this procedure, assume that the process file pattern is set as follows: ite.ingest.directoryScan.processFilePattern=.*_[0-9]{14}.asn$

Procedure

  1. In the file <PathToYourApplication>/config/config.cfg, find the description of the ite.ingest.deduplication.reprocessFilePattern parameter.
  2. To enable reprocessing, set the parameter to a regular expression that matches the names of your reprocessing files: ite.ingest.deduplication.reprocessFilePattern=.*_[0-9]{14}_reproc[0-9]+.asn$

File names that match this pattern bypass the duplicate check of the file ingestion logic and are processed again.
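
Before you deploy the pattern, it can help to check candidate file names against it outside the framework. The following Python sketch is one way to do that. It assumes that the framework matches the pattern against the complete file name and that its regular expression dialect treats this pattern the same way as Python's re module. The function name and the sample file names are hypothetical and are not part of the framework.

```python
import re

# Value of ite.ingest.deduplication.reprocessFilePattern from step 2.
REPROCESS_FILE_PATTERN = r".*_[0-9]{14}_reproc[0-9]+.asn$"


def bypasses_deduplication(file_name: str) -> bool:
    """Return True if file_name matches the reprocess pattern.

    Hypothetical helper: assumes the pattern must match the whole
    file name, which may differ from the framework's own matching.
    """
    return re.fullmatch(REPROCESS_FILE_PATTERN, file_name) is not None


if __name__ == "__main__":
    # Hypothetical sample names used only for illustration.
    samples = [
        "cdr_20240131120000.asn",          # regular input file
        "cdr_20240131120000_reproc1.asn",  # resubmitted file
        "cdr_20240131120000_reproc.asn",   # missing counter after _reproc
    ]
    for name in samples:
        print(f"{name}: {bypasses_deduplication(name)}")
```

In this pattern, the dot before asn is unescaped and therefore matches any single character, not only a literal dot. If the framework's dialect supports it, writing \.asn$ is stricter.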