Bulk Processing with GPT
This short tutorial introduces bulk processing with GPT from the command shell on Windows and Unix systems. The provided scripts are kept generic so that they can serve a range of processing requirements. Not every edge case can be covered; the intention is to cover at least the main use cases. The scripts can probably be improved in several places, but they give you a starting point for writing your own. If you know of improvements or have questions about using the scripts, you are kindly invited to post them in the BEAM Forum.
A general introduction to GPT and graphs can be found at Creating a GPF Graph.
The four files mentioned below are attached for download.
- processDataset.bat (Windows)
- processDataset.bash (Unix)
- case2r.xml
- case2r.properties
The Windows Script
The Unix Script
Description of the Scripts
- Unix: The first line tells the shell which interpreter to use to run the script. Here it is bash.
  Windows: The first thing the script does is enable delayed expansion. This allows variables to be evaluated within a loop and is needed later when iterating over the source products.
- Next, the path to the gpt batch file is specified. This file is called later to process the data products. The path has to be adapted by the user.
- The four parameters which must be given to the script are stored in variables for better readability:
  - The path to the XML graph file which defines the processing graph applied to the source data product.
  - The path to a parameter file. The parameters specified in this file are used to configure the graph file. They are given in the plain properties format.
  - The path to the directory which contains the source products.
  - A file prefix which is used to alter the name of the source product and to indicate the type of processing.
- Unix: Some helper functions are defined. They are used later in the main processing section.
- An output directory is created to hold the processed data.
- Now the iteration over all source products starts. Here only products with the file extension 'N1' are considered.
- The absolute path of the current source file is retrieved.
- The path to the target file is compiled using the target directory, the file prefix and the name of the source product without extension. The file extension for the BEAM-DIMAP format is always appended.
- The command line is assembled from the path to the gpt.bat file, the XML graph file, the parameter file, the source file and the target file. The -e option is added in order to get longer messages in case of an error.
- As the last step, the command line is executed.
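The steps above can be sketched for Unix as follows. This is only an illustration of the described flow, not the attached processDataset.bash. The launcher name gpt.sh, the GPT_PATH environment variable, the function names and the underscore between prefix and product name are assumptions made for this sketch. The options -e, -p and -t are passed to gpt as described above.

```shell
#!/bin/bash
# Sketch of the steps described above; the attached script may differ.

# Assumption: name/location of the gpt launcher; adapt to your installation.
gptPath="${GPT_PATH:-gpt.sh}"

# Helper: file name without directory and without its last extension.
removeExtension() {
  local name
  name="$(basename "$1")"
  echo "${name%.*}"
}

# Helper: absolute path of an existing file.
absolutePath() {
  echo "$(cd "$(dirname "$1")" && pwd)/$(basename "$1")"
}

processDataset() {
  # The four script parameters, stored in named variables.
  local graphXmlPath="$1"
  local parameterFilePath="$2"
  local sourceDirectory="$3"
  local targetFilePrefix="$4"

  # Output directory for the processed products.
  local targetDirectory="${sourceDirectory}/output"
  mkdir -p "${targetDirectory}"

  local file sourceFile targetFile
  # Only products with the extension 'N1' are considered.
  for file in "${sourceDirectory}"/*.N1; do
    [ -e "${file}" ] || continue   # the pattern matched nothing
    sourceFile="$(absolutePath "${file}")"
    # Assumption: prefix and product name are joined with an underscore.
    targetFile="${targetDirectory}/${targetFilePrefix}_$(removeExtension "${file}").dim"
    # -e: longer error messages, -p: parameter file, -t: target file.
    "${gptPath}" "${graphXmlPath}" -e -p "${parameterFilePath}" -t "${targetFile}" "${sourceFile}"
  done
}

# Run only when all four parameters are given.
if [ "$#" -eq 4 ]; then
  processDataset "$@"
fi
```

Setting gptPath to echo before calling processDataset prints the assembled command lines instead of executing them, which is a convenient way to check paths before a long run.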
Known Limitations of the Scripts
- Naming of the target product is limited. Possible use cases are that the name should remain unchanged or that a more complex pattern should be applied.
- Which products of the source directory are used for processing is currently hard coded (only with '.N1' extension). This should be configurable.
- The format of the target product is not configurable.
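One way the hard-coded extension could be made configurable on Unix is an optional argument with a default value. The following is a minimal sketch under that assumption; the function name listSourceProducts and the optional second argument are hypothetical and not part of the attached scripts.

```shell
#!/bin/bash
# Hypothetical helper; not part of the attached scripts.

# Print the source products in a directory that carry the given extension.
# The extension defaults to 'N1', the value currently hard coded.
listSourceProducts() {
  local sourceDirectory="$1"
  local extension="${2:-N1}"
  local file
  for file in "${sourceDirectory}"/*."${extension}"; do
    [ -e "${file}" ] || continue   # the pattern matched nothing
    echo "${file}"
  done
}
```

Calling listSourceProducts "/Eodata/toProcess" lists the '.N1' products as before, while passing a second argument such as dim selects products with that extension instead.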
Example Usage
A set of input products shall be processed with the Case2Regional processor.
For this purpose an XML graph is defined. For the three parameters outputReflecAs, outputNormReflec and cloudIceExpression, variables are used (reflecAs, normReflec, cloudIceExpr) which are set to the values from the parameter file. Only the smile correction is set directly in the graph, to false in this case.
In the parameter file the three variables are defined.
Now that we have all the information, we can call the scripts from the command line.
On Windows it is assumed that the source products are located in the directory 'C:\Eodata\toProcess' and that the XML graph file and the parameter file are in the same directory as the batch file.
>processDataset.bat case2r.xml case2r.properties "C:\Eodata\toProcess" C2R
After executing this command the processed data can be found in 'C:\Eodata\toProcess\output'
On Unix the directory of source products is assumed to be '/Eodata/toProcess' while the XML graph file and the parameter file are in the same directory as the script file.
>processDataset.bash case2r.xml case2r.properties "/Eodata/toProcess" C2R
After executing this command the processed data can be found in '/Eodata/toProcess/output'
Helpful Links
www.microsoft.com - General introduction to batch files
www.microsoft.com - Explains usage of the for loop
www.dostips.com - Gives examples for string operations in batch files.
BEAM in batch mode - A presentation introducing the batch mode capabilities of BEAM. It was given at a CoastColour workshop.