Data ingestion
- Identifying input files
- Applications read files from the configured input directory or directories.
- Sorting input files by file name
- When your business logic requires the input files to be read in file-name order, you configure directory scanning to report the identified files sorted by file name.
- Sorting input files by file size
- When your business logic requires the input files to be read in order of size, you configure directory scanning to report the identified files sorted by file size.
- Sorting input files by file time
- When your business logic requires the input files to be read in chronological order, you configure directory scanning to report the identified files sorted by file time.
- Sorting input files by special file time
- When your business logic requires the input files to be read in an order based on a special file time, you configure directory scanning to report the identified files sorted by that time.
- Finding file duplicates
- Occasionally, input files arrive more than once in your application's landing zones.
- Using load distribution to distribute files
- To increase the throughput of your application, you distribute the detected input files to many processing chains, which all work on the data in parallel.
- Distributing files to processing chains defined on job submission
- To increase the throughput of your application, you distribute the detected input files to processing chains that are defined on job submission and that all work on the data in parallel.
- Using file group split to distribute files
- To increase the throughput of your application, you use file group split to distribute the detected input files to many processing chains, which all work on the data in parallel.
- Choosing a parser
- To read the data from the input files, you need to configure a parser.
- Using many parsers
- Sometimes a single parser is not sufficient, for example in use cases where the logic needs to read many different file types.
- Activating the compression parameter for file readers
- Sometimes the input files arrive compressed, and you need to decompress them before processing.
- Activating the encoding parameter for file readers
- Sometimes the text input files use an encoding other than the one your reader expects, and you need to configure the reader accordingly.
- Using file preprocessing
- If you cannot easily derive the file type from the file name and additional processing is necessary, the framework provides a preprocessing class that you customize to your needs.
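
The scanning, sorting, duplicate detection, and distribution topics above can be illustrated with a minimal Python sketch. It is not the framework's API: the function names (`scan`, `drop_duplicates`, `distribute`), the sort key names, the use of modification time as the "file time", the content-hash definition of a duplicate, and the round-robin distribution are all assumptions made for illustration only.

```python
import hashlib
from pathlib import Path

# Hypothetical sort keys; the framework's own parameter names may differ.
# The file name is used as a tie-breaker so the order is deterministic.
SORT_KEYS = {
    "name": lambda p: p.name,
    "size": lambda p: (p.stat().st_size, p.name),
    "time": lambda p: (p.stat().st_mtime, p.name),  # assumes modification time
}


def scan(directories, order="name"):
    """Return the regular files of all input directories in sorted order."""
    files = [p for d in directories for p in Path(d).iterdir() if p.is_file()]
    return sorted(files, key=SORT_KEYS[order])


def drop_duplicates(files):
    """Keep only the first file for each distinct content hash (assumed rule)."""
    seen, unique = set(), []
    for path in files:
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(path)
    return unique


def distribute(files, chain_count):
    """Assign files to processing chains round-robin (assumed strategy)."""
    chains = [[] for _ in range(chain_count)]
    for index, path in enumerate(files):
        chains[index % chain_count].append(path)
    return chains
```

A typical call order under these assumptions would be `distribute(drop_duplicates(scan(["/data/in"], "time")), 4)`, where the directory path and the chain count are placeholders. Each chain then reads its own files independently, which is what allows the work to proceed in parallel.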