Stochastic Tools Batch Mode

The SamplerFullSolveMultiApp and SamplerTransientMultiApp are capable of running sub-applications in one of three different modes:

  1. normal: One sub-application is created for each row of data (num_rows) supplied by the Sampler object.

  2. batch-reset: N sub-applications are created, where the sub-applications are destroyed and re-created (on the same existing MPI communicator) for each row of data supplied by the Sampler object.

  3. batch-restore: N sub-applications are created, where each sub-application is backed up after initialization. Then, for each row of data supplied by the Sampler object, the sub-application is restored to its initial state prior to execution.

For the two "batch" options, N indicates the number of applications created. In general, N depends on three quantities: the num_rows parameter on the Sampler object, the number of processors (passed to the mpiexec command), and min_procs_per_app, the minimum number of processors that should be used to run each sub-application. If you launch your Stochastic Tools main application with fewer than min_procs_per_app processors, the simulation will still proceed, but it will use the number of ranks you have provided as the "effective" min_procs_per_app. Table 1 illustrates this with examples, showing the number of applications created (N) for several choices of num_rows, min_procs_per_app, and number of processors.

Table 1: Number of sub-apps launched for the "batch" modes for different example choices of num_rows and min_procs_per_app

num_rows   min_procs_per_app   Processors   Number of Sub-Apps
4          2                   9            4
20         2                   9            4
4          2                   3            1
4          2                   1            1
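
The rows of Table 1 can be reproduced by a simple expression, sketched below. This is a reading inferred from the four rows of the table, not a formula confirmed by the module, and it may not hold for other inputs. The symbols are assumptions introduced here: N is the number of sub-applications, r is num_rows, p is the number of processors, m is min_procs_per_app, and m_eff is the "effective" minimum when fewer than m processors are available.

```latex
N = \min\left(r,\ \left\lfloor \frac{p}{m_{\mathrm{eff}}} \right\rfloor\right),
\qquad
m_{\mathrm{eff}} = \min(m,\ p)

% Checked against the rows of Table 1:
% r = 4,  m = 2, p = 9:  m_eff = 2, floor(9/2) = 4, N = min(4, 4)  = 4
% r = 20, m = 2, p = 9:  m_eff = 2, floor(9/2) = 4, N = min(20, 4) = 4
% r = 4,  m = 2, p = 3:  m_eff = 2, floor(3/2) = 1, N = min(4, 1)  = 1
% r = 4,  m = 2, p = 1:  m_eff = 1, floor(1/1) = 1, N = min(4, 1)  = 1
```

The last row is the case described above in which fewer processors are available than min_procs_per_app, so the single available rank becomes the effective minimum.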

All three modes are available when using SamplerFullSolveMultiApp. The "batch-reset" mode is not available for SamplerTransientMultiApp because the sub-application has state that must be maintained as simulation time progresses.
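
As an illustration of where these settings might live, the fragment below sketches a MultiApps block that selects a batch mode and requests at least two processors per sub-application. It assumes that min_procs_per_app is set directly on the MultiApp object; the value 2 is a hypothetical choice mirroring Table 1, and the sampler name mc refers to a Sampler object defined elsewhere in the input file.

```
[MultiApps]
  [runner]
    type = SamplerFullSolveMultiApp
    sampler = mc                # Sampler object defined elsewhere in the input
    input_files = 'sub.i'
    mode = batch-restore        # one of: normal, batch-reset, batch-restore
    min_procs_per_app = 2       # assumed placement; hypothetical value from Table 1
  []
[]
```

With this fragment and nine processors, the first two rows of Table 1 suggest that four sub-applications would be created.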

The primary benefit of using a batch mode is improved simulation performance through reduced memory use by the running application. The performance gains depend on the type of sub-application being executed as well as the number of samples being evaluated. The following sections highlight the performance improvements that may be expected for full solve and transient sub-applications.

Example 1: Full Solve Sub-Application

The first example demonstrates the performance improvements to expect when using SamplerFullSolveMultiApp with sub-applications. In this case, the sub-application solves steady-state diffusion on a unit cube domain with Dirichlet boundary conditions on the left, u = 0, and right, u = 1, sides of the domain. The complete input file for this problem is given in Listing 1.

Listing 1: Complete input file for steady-state diffusion problem.

[Mesh]
  type = GeneratedMesh
  dim = 3
  nx = 10
  ny = 10
  nz = 10
[]

[Variables]
  [u]
  []
[]

[Kernels]
  [diff]
    type = ADDiffusion
    variable = u
  []
  [time]
    type = ADTimeDerivative
    variable = u
  []
[]

[BCs]
  [left]
    type = DirichletBC
    variable = u
    boundary = left
    value = 0
  []
  [right]
    type = DirichletBC
    variable = u
    boundary = right
    value = 1
  []
[]

[Postprocessors]
  [average]
    type = AverageNodalVariableValue
    variable = u
  []
[]

[Executioner]
  type = Transient
  num_steps = 1
  dt = 0.25
  solve_type = NEWTON
[]

[Controls]
  [receiver]
    type = SamplerReceiver
  []
[]

[Outputs]
[]
(modules/stochastic_tools/examples/batch/sub.i)

The master application does not perform a solve; instead, it performs a stochastic analysis using the MonteCarlo object to perturb the values of the two Dirichlet conditions on the sub-applications according to a uniform distribution. The complete input file for the master application is given in Listing 2.

Listing 2: Complete input file for master application that performs stochastic simulations of the steady-state diffusion problem in Listing 1 using Monte Carlo sampling.

[StochasticTools]
[]

[Distributions]
  [uniform]
    type = Uniform
    lower_bound = 1
    upper_bound = 9
  []
[]

[Samplers]
  [mc]
    type = MonteCarlo
    num_rows = 10
    distributions = 'uniform uniform'
  []
[]

[MultiApps]
  [runner]
    type = SamplerFullSolveMultiApp
    sampler = mc
    input_files = 'sub.i'
    mode = batch-restore
  []
[]

[Transfers]
  [runner]
    type = SamplerParameterTransfer
    to_multi_app = runner
    parameters = 'BCs/left/value BCs/right/value'
    sampler = mc
  []
  [data]
    type = SamplerPostprocessorTransfer
    from_multi_app = runner
    to_vector_postprocessor = storage
    from_postprocessor = average
    sampler = mc
  []
[]

[VectorPostprocessors]
  [storage]
    type = StochasticResults
  []
[]

[Postprocessors]
  [total]
    type = MemoryUsage
    execute_on = 'INITIAL TIMESTEP_END'
  []
  [per_proc]
    type = MemoryUsage
    value_type = "average"
    execute_on = 'INITIAL TIMESTEP_END'
  []
  [max_proc]
    type = MemoryUsage
    value_type = "max_process"
    execute_on = 'INITIAL TIMESTEP_END'
  []
[]

[Outputs]
  csv = true
  perf_graph = true
[]
(modules/stochastic_tools/examples/batch/full_solve.i)

The example is executed to demonstrate the memory performance of the various modes of operation: "normal", "batch-reset", and "batch-restore". Each mode is executed with an increasing number of Monte Carlo samples by setting the num_rows parameter of the MonteCarlo object. Figure 1 and Figure 2 show the resulting memory use at the end of the simulation for each mode of operation with increasing sample numbers in serial and in parallel, respectively.
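
One way to carry out such a sweep is to change two parameters of Listing 2 between runs. The fragment below is a sketch under that assumption, not the procedure used to produce the figures; the sample count of 100 is a hypothetical value.

```
[Samplers]
  [mc]
    type = MonteCarlo
    num_rows = 100              # hypothetical sample count; increase between runs
    distributions = 'uniform uniform'
  []
[]

[MultiApps]
  [runner]
    type = SamplerFullSolveMultiApp
    sampler = mc
    input_files = 'sub.i'
    mode = normal               # repeat the sweep with batch-reset and batch-restore
  []
[]
```

The MemoryUsage postprocessors in Listing 2 are written to CSV output, so each run records the total, average, and maximum per-process memory for the chosen mode and sample count.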

Figure 1: Total memory at the end of the simulation using a SamplerFullSolveMultiApp with increasing number of Monte Carlo samples for the three available modes of operation running on a single processor.

Figure 2: Total memory and maximum memory per processor at the end of the simulation using a SamplerFullSolveMultiApp with increasing number of Monte Carlo samples for the three available modes of operation running on 56 processors.

An important feature of the various modes of operation is that run time is not negatively impacted by changing the mode; in some cases, using a batch mode can actually decrease the total simulation run time.

The total run time results for the full solve problem in serial and parallel are shown in Figure 3 and Figure 4, respectively. The time shown in these plots is the total simulation time, which encompasses both the simulation initialization and solve. The differences in speed are mainly due to the instantiation and destruction of the sub-applications. When running in "batch-reset" mode, each data sample causes the sub-application to be created and destroyed during the solve, resulting in the slowest performance. The "normal" mode creates all sub-applications up front. The "batch-restore" mode uses the backup-restore capability to save the state of the sub-applications, so it does not require as many instantiations and has the lowest run time. For this example, the solve portion is minimal, so the sub-application creation time plays a large role. As the solve time increases, the time gains can be expected to become minimal.

Figure 3: Total execution time of a simulation using SamplerFullSolveMultiApp with increasing number of Monte Carlo samples for the available modes of operation on a single processor.

Figure 4: Total execution time of a simulation using SamplerFullSolveMultiApp with increasing number of Monte Carlo samples for the available modes of operation on 56 processors.

Example 2: Transient Sub-Application

The second example is nearly identical to the first, except that the master application is a transient solve that sets the boundary conditions at the end of each time step. The only differences occur in the master input file, in the Executioner and MultiApps blocks, as shown in Listing 3.

Listing 3: Executioner and MultiApps blocks of the input file for a transient master application that performs stochastic simulations of a diffusion problem with time-varying boundary conditions using Monte Carlo sampling.

[Executioner]
  type = Transient
  num_steps = 10
[]

[MultiApps]
  [runner]
    type = SamplerTransientMultiApp
    sampler = mc
    input_files = 'sub.i'
    execute_on = 'INITIAL TIMESTEP_END'
    mode = batch-restore
  []
[]
(modules/stochastic_tools/examples/batch/transient.i)

The results shown in Figure 5 and Figure 6 include the memory use at the end of the simulation (10 time steps) for each mode of operation with an increasing number of samples in serial and in parallel, respectively. Recall, as mentioned above, that the "batch-reset" mode is not available in the SamplerTransientMultiApp.

Figure 5: Total memory at the end of the simulation using a SamplerTransientMultiApp with increasing number of Monte Carlo samples for the two available modes of operation running on a single processor.

Figure 6: Total memory and maximum memory per processor at the end of the simulation using a SamplerTransientMultiApp with increasing number of Monte Carlo samples for the two available modes of operation running on 56 processors.

Again, an important feature of the various modes of operation is that run time is not negatively impacted by changing the mode, as seen in Figure 7 and Figure 8. The solve portion of this example is significantly longer than that of the steady-state example. As such, the differences in execution time due to the instantiation of objects are diminished, and both modes behave similarly.

Figure 7: Total execution time of a simulation using SamplerTransientMultiApp with increasing number of Monte Carlo samples for the available modes of operation on a single processor.

Figure 8: Total execution time of a simulation using SamplerTransientMultiApp with increasing number of Monte Carlo samples for the available modes of operation on 56 processors.