Bulk Processing with GPT

This tutorial introduces bulk processing with GPT from the command shell on Windows and Unix systems. The scripts provided here are kept generic so that they can serve several processing requirements. They do not cover every edge case, only the main use cases, and they can certainly be improved, but they give you a starting point for writing your own scripts. If you know of improvements to the scripts or have questions about their usage, you are kindly invited to the BEAM Forum.
A general introduction to GPT and graphs can be found at Creating a GPF Graph.

The four files mentioned below are attached for download.

The Windows Script

Windows Batch Script (processDataset.bat)
@echo off
:: enable delayed expansion - used to resolve variable in loop
:: variable has to be used with '!' instead of '%'
setlocal ENABLEDELAYEDEXPANSION

::::::::::::::::::::::::::::::::::::::::::::
:: User Configuration
::::::::::::::::::::::::::::::::::::::::::::

:: adapt this path to your needs
set gptPath="C:\Program Files\beam\beam-5.0\bin\gpt.bat"

::::::::::::::::::::::::::::::::::::::::::::
:: Command line handling
::::::::::::::::::::::::::::::::::::::::::::

:: first parameter is a path to the graph xml
set graphXmlPath=%1

:: second parameter is a path to a parameter file
set parameterFilePath=%2

:: use third parameter for path to source products
:: '%~3' strips surrounding quotes; the path is quoted again where it is used
set "sourceDirectory=%~3"
:: if sourceDirectory ends with '\' remove it
if "%sourceDirectory:~-1%"=="\" set "sourceDirectory=%sourceDirectory:~0,-1%"

:: use fourth parameter for path to target products
set "targetDirectory=%~4"
:: if targetDirectory ends with '\' remove it
if "%targetDirectory:~-1%"=="\" set "targetDirectory=%targetDirectory:~0,-1%"

:: the fifth parameter is a file prefix for the target product name,
:: typically indicating the type of processing
set targetFilePrefix=%5

:: Create the target directory if it does not exist yet
if not exist "%targetDirectory%" md "%targetDirectory%"

::::::::::::::::::::::::::::::::::::::::::::
:: Main processing
::::::::::::::::::::::::::::::::::::::::::::

:: double '%' in batch file and only a single '%' on command line
for /R "%sourceDirectory%" %%F in (*.N1) do (
  echo.
  rem '~fF' means absolute path of 'F'
  rem ('rem' is used here because '::' comments can fail inside a parenthesised block)
  set sourceFile=%%~fF
  echo Processing !sourceFile!
  rem '~nF' means filename without extension of 'F'
  set targetFile=%targetDirectory%\%targetFilePrefix%_%%~nF.dim
  set procCmd=%gptPath% %graphXmlPath% -e -p %parameterFilePath% -Ssource="!sourceFile!" -t "!targetFile!"
  call !procCmd!
)

The Unix Script

Unix Bash Script (processDataset.bash)
#!/bin/bash
#

############################################
# User Configuration
############################################

# adapt this path to your needs
gptPath="/opt/beam-4.11/bin/gpt.sh"

############################################
# Command line handling
############################################

# first parameter is a path to the graph xml
graphXmlPath="$1"

# second parameter is a path to a parameter file
parameterFilePath="$2"

# use third parameter for path to source products
sourceDirectory="$3"

# use fourth parameter for path to target products
targetDirectory="$4"

# the fifth parameter is a file prefix for the target product name, typically indicating the type of processing
targetFilePrefix="$5"

############################################
# Helper functions
############################################

# Borrowed from http://www.linuxjournal.com/content/normalizing-path-names-bash
function normalizePath() {
    # Remove all /./ sequences.
    local path="${1//\/.\//\/}"
    # Remove first dir/.. sequence.
    local npath=$(echo "$path" | sed -e 's;[^/][^/]*/\.\./;;')
    # Remove remaining dir/.. sequence.
    while [[ "$npath" != "$path" ]]; do
        path="$npath"
        npath=$(echo "$path" | sed -e 's;[^/][^/]*/\.\./;;')
    done
    echo "$path"
}

getAbsolutePath() {
    file="$1"
    if [ "${file:0:1}" = "/" ]; then
        # already absolute
        echo "$file"
        return
    fi
    absfile="$(pwd)/${file}"
    absfile="$(normalizePath "${absfile}")"
    echo "${absfile}"
}

removeExtension() {
    local file="$1"
    # strip the last '.' and everything after it
    echo "${file%.*}"
}

############################################
# Main processing
############################################

# Create the target directory
mkdir -p "${targetDirectory}"

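# iterate over the glob directly; parsing the output of 'ls' breaks on unusual file names
for F in "${sourceDirectory}"/*.N1; do
  # skip the unexpanded pattern if no product matches
  [ -e "$F" ] || continue
  sourceFile="$(getAbsolutePath "$F")"
  # use only the file name (without directory) for the target product name
  targetFile="${targetDirectory}/${targetFilePrefix}_$(removeExtension "$(basename "$F")").dim"
  # assemble the command as an array so that each argument stays a single word
  procCmd=("${gptPath}" "${graphXmlPath}" -e -p "${parameterFilePath}" "-Ssource=${sourceFile}" -t "${targetFile}")
  "${procCmd[@]}"
done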
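
The script assumes that all five parameters are given and that the source directory exists. A minimal sketch of a guard that could be called before the main processing is shown here; the function name `checkArguments` and the wording of the messages are assumptions and not part of the attached scripts.

```shell
#!/bin/bash
# Hypothetical helper (not part of the attached scripts):
# succeeds only if five parameters are given and the third one is a directory.
checkArguments() {
    if [ "$#" -ne 5 ]; then
        echo "Usage: processDataset.bash <graph.xml> <parameters.properties> <sourceDir> <targetDir> <prefix>" >&2
        return 1
    fi
    if [ ! -d "$3" ]; then
        echo "Source directory not found: $3" >&2
        return 1
    fi
    return 0
}
```

Inside the script it could be used as `checkArguments "$@" || exit 1` right after the user configuration.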

Description of the Scripts

  1. Unix: The first line tells the shell what interpreter to use to run the script. Here it is bash.
    Windows: The first thing done in the script is to enable delayed expansion. This allows the evaluation of variables within a loop and is needed when iterating over the source products later on.
  2. Next the path to the gpt script (gpt.bat on Windows, gpt.sh on Unix) is specified. This script is later called to process the data products. The user has to adapt the path to the local installation.
  3. The five parameters which must be given to the script are stored in variables for easier readability.
    1. The path to the XML graph file which defines the processing graph performed on the source data product.
    2. The path to a parameter file. The parameters specified in this file are used to configure the graph file. They are given in the plain properties format (name=value pairs, one per line).
    3. The path pointing to the directory which contains the source products.
    4. The path pointing to the directory where the processed data shall be placed.
    5. A file prefix in order to alter the name of the source product and indicate the type of processing.
  4. Unix: Some helper functions are defined. They are later used in the main processing section.
  5. An output directory is created to hold the processed data.
  6. Now the iteration over all source products starts. Only products with the file extension 'N1' are considered. Note that the Windows script also descends into sub-directories of the source directory (for /R), while the Unix script only looks at the source directory itself.
    • The absolute path of the current source file is retrieved.
    • The path to the target file is compiled using the target directory, the file prefix and the name of the source product without extension. The file extension for the BEAM-DIMAP format is always appended.
    • The command line is assembled using the path to the gpt script, the XML graph file, the parameter file, the source file and the target file. The -e option is added in order to get longer messages in case of an error.
    • As the last step, the command line is executed.
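
The way the target file path is compiled in step 6 can be sketched in isolation as follows. The function name `buildTargetFile` and the product name in the example call are hypothetical and only serve as illustration.

```shell
#!/bin/bash
# Hypothetical helper: compose the target path from the target directory,
# the file prefix and the source product name without its extension.
buildTargetFile() {
    local sourceFile="$1" targetDirectory="$2" targetFilePrefix="$3"
    local name
    name="$(basename "$sourceFile")"   # strip the directory part
    name="${name%.*}"                  # strip the last extension
    echo "${targetDirectory}/${targetFilePrefix}_${name}.dim"
}

# Example call with a made-up product name
buildTargetFile "/Eodata/toProcess/product_A.N1" "/Eodata/toProcess/output" "C2R"
# prints /Eodata/toProcess/output/C2R_product_A.dim
```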

Known Limitations of the Scripts

  • Naming of the target product is limited: the name is always the prefix followed by the name of the source product. Keeping the original name or applying a more complex pattern is not supported.
  • Which products of the source directory are processed is currently hard coded (only files with the '.N1' extension). This should be configurable.
  • The format of the target product is not configurable; the BEAM-DIMAP extension is always used.
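
As a starting point for lifting the second limitation in the Unix script, the file pattern could be passed in instead of being hard coded. The following is only a sketch under that assumption; `listSourceProducts` is a hypothetical name, and the pattern must not contain spaces because it is expanded unquoted.

```shell
#!/bin/bash
# Hypothetical helper: list the products of a directory that match a glob pattern.
# The pattern defaults to '*.N1', the value hard coded in the scripts.
listSourceProducts() {
    local sourceDirectory="$1"
    local pattern="${2:-*.N1}"
    local f
    # ${pattern} is left unquoted on purpose so that the shell expands the glob
    for f in "${sourceDirectory}"/${pattern}; do
        # skip the unexpanded pattern if nothing matches
        if [ -e "$f" ]; then
            echo "$f"
        fi
    done
}
```

The main loop could then read the file list from `listSourceProducts "${sourceDirectory}" "*.N1"` instead of using the fixed glob.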

Example Usage

A set of input products shall be processed with the Case2Regional processor.
For this purpose an XML graph is defined. The three parameters outputReflecAs, outputNormReflec and cloudIceExpression use variables (reflecAs, normReflec, cloudIceExpr) whose values are taken from the parameter file. Only the smile correction is set directly in the graph, to false.

XML Graph File for Case2R (case2r.xml)
<graph id="Case2RGraph">
  <version>1.0</version>
  <node id="case2r">
    <operator>Meris.Case2Regional</operator>
    <sources>
      <source>${source}</source>
    </sources>
    <parameters>
      <doSmileCorrection>false</doSmileCorrection>
      <outputReflecAs>${reflecAs}</outputReflecAs>
      <outputNormReflec>${normReflec}</outputNormReflec>
      <cloudIceExpression>${cloudIceExpr}</cloudIceExpression>
    </parameters>
  </node>
</graph>

The parameter file defines the three variables used in the graph.

Parameters File (case2r.properties)
reflecAs=IRRADIANCE_REFLECTANCES
normReflec=true
cloudIceExpr=toa_reflec_14 > 0.025
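
Because the graph and the parameter file have to agree on the variable names, a small consistency check before a long bulk run can save time. The sketch below is not part of the attached scripts; `findUndefinedParameters` is a hypothetical name, and it assumes that the parameter file contains simple name=value lines as in the example above.

```shell
#!/bin/bash
# Hypothetical helper: print every ${name} variable used in the graph
# that has no name=value entry in the parameter file.
findUndefinedParameters() {
    local graphXmlPath="$1" parameterFilePath="$2"
    local name
    # collect each ${name} placeholder of the graph once, without the '${' and '}'
    grep -o '\${[A-Za-z0-9_]*}' "$graphXmlPath" | tr -d '${}' | sort -u |
    while read -r name; do
        # 'source' is passed on the command line via -Ssource, not in the file
        if [ "$name" = "source" ]; then
            continue
        fi
        grep -q "^${name}[[:space:]]*=" "$parameterFilePath" || echo "$name"
    done
}
```

Called as `findUndefinedParameters case2r.xml case2r.properties`, it prints nothing when every variable of the graph is defined.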

Now that we have all the information, we can call the scripts from the command line.

On Windows it is assumed that the source products are located in the directory 'C:\Eodata\toProcess' and that the XML graph file and the parameter file are in the same directory as the batch file. The processed files will go to 'C:\Eodata\toProcess\output' and have the prefix C2R.

processDataset.bat case2r.xml case2r.properties "C:\Eodata\toProcess" "C:\Eodata\toProcess\output" C2R

On Unix the directory of source products is assumed to be '/Eodata/toProcess', while the XML graph file and the parameter file are in the same directory as the script file. The processed files will go to '/Eodata/toProcess/output' and have the prefix C2R.

bash processDataset.bash case2r.xml case2r.properties "/Eodata/toProcess" "/Eodata/toProcess/output" C2R

www.microsoft.com - General introduction to batch files
www.microsoft.com - Explains usage of the for loop
www.dostips.com - Gives examples for string operations in batch files.
BEAM in batch mode - A presentation introducing the batch mode capabilities of BEAM. It was given at a CoastColour workshop.
