How to control memory use and parallelism

The diagnostics packages used by the REF all use Dask to process larger-than-memory data in parallel. By default, Dask uses its threaded scheduler, which may not be optimal for more complicated computations, and it configures this scheduler to use as many worker threads as there are CPU cores on the machine. The REF typically runs multiple executors in parallel (see Executors). If Dask is left unconfigured, each executor uses as many threads as there are CPU cores, which can lead to excessive memory use and too much parallelism: the system may run out of memory or slow down because of excessive context switching. Inefficient scheduling by the threaded scheduler can also lead to excessive memory use and/or slow computations. Therefore, it is highly recommended that you take a moment to configure Dask for your system. For an in-depth introduction to these topics, see the Dask documentation on configuration and scheduling.
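To make the oversubscription concrete, the following minimal sketch estimates how many Dask worker threads compete for the CPU when nothing is configured. The function name and the figures of 4 executors and 8 cores are illustrative assumptions, not REF settings.

```python
import os


def default_thread_count(n_executors, cpu_cores=None):
    """Estimate total Dask worker threads when Dask is left unconfigured.

    Each executor's threaded scheduler defaults to one worker thread per
    CPU core, so the total grows with the number of executors.
    """
    if cpu_cores is None:
        # os.cpu_count() can return None on unusual platforms; fall back to 1.
        cpu_cores = os.cpu_count() or 1
    return n_executors * cpu_cores


# Hypothetical example: 4 executors on an 8-core machine.
total = default_thread_count(n_executors=4, cpu_cores=8)
print(f"{total} threads competing for 8 cores")
```

In this hypothetical case each core would be shared by four threads, and every thread may hold its own chunks of data in memory at the same time.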

Configuring ESMValTool

ESMValCore, the framework powering ESMValTool, works best with the Dask Distributed scheduler. It is recommended to set max_parallel_tasks (an ESMValCore setting) to a low number, e.g. 1, 2 or 3. To avoid overloading the workers, only one ESMValCore preprocessing task submits jobs to the Distributed scheduler at a time, so a higher value brings little benefit. Therefore, the following settings are recommended for ESMValCore:

max_parallel_tasks: 2
dask:
  use: local_distributed
  profiles:
    local_distributed:
      cluster:
        type: distributed.LocalCluster
        n_workers: 2
        threads_per_worker: 2
        memory_limit: 4GiB

These settings should be put in a file with the extension .yaml in the directory ~/.config/esmvaltool, for example: ~/.config/esmvaltool/dask.yaml.

With the settings above, the total memory per REF diagnostic execution will be n_workers * memory_limit = 2 * 4GiB = 8GiB. It is recommended to use at least 4GiB of RAM per Dask Distributed worker. Some diagnostics may be able to run with 2GiB per worker, but probably not all of them. You can tune the total memory and CPU use by changing the number of workers. The number of CPU cores used will be n_workers * threads_per_worker, which is 4 with the settings above.

Note that the REF may run multiple executors in parallel, and each executor running an ESMValTool diagnostic will use the resources specified above.
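The arithmetic above can be sketched in a few lines of Python. The helper name and the figure of three concurrent executors are hypothetical; the per-worker values are the ones from the recommended configuration.

```python
def esmvaltool_budget(n_workers, threads_per_worker, memory_limit_gib, n_executors=1):
    """Return (total memory in GiB, total threads) across concurrent executors.

    Assumes every executor runs one ESMValTool diagnostic with its own
    LocalCluster configured as described above.
    """
    memory_per_execution = n_workers * memory_limit_gib
    threads_per_execution = n_workers * threads_per_worker
    return (n_executors * memory_per_execution, n_executors * threads_per_execution)


# One execution with the recommended settings: 2 workers x 4 GiB, 2 threads each.
print(esmvaltool_budget(2, 2, 4))  # (8, 4)

# Hypothetical: three executors running ESMValTool diagnostics at once.
print(esmvaltool_budget(2, 2, 4, n_executors=3))  # (24, 12)
```

Comparing the result with the RAM and core count of your machine gives a quick check on whether n_workers or the number of executors should be reduced.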

More information on how to configure ESMValCore is available in its documentation.

ESMValTool users

If you are using ESMValTool outside of the REF on the same computer, it is highly recommended that you create a separate ESMValTool configuration directory for the version of ESMValTool used by the REF to avoid conflicts, e.g. accidentally using input data that is not managed by the REF. You can do this by setting the ESMVALTOOL_CONFIG_DIR environment variable to a different directory, e.g. ~/.config/esmvaltool-ref, and then creating the Dask configuration file in that directory, e.g. ~/.config/esmvaltool-ref/dask.yaml with the settings described above.
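As a minimal sketch of this setup, the following Python helper creates the separate configuration directory and writes the recommended settings to dask.yaml inside it. The helper name write_ref_dask_config is hypothetical; the directory, file name, and settings are the ones given above.

```python
from pathlib import Path

# The settings recommended for ESMValCore earlier in this section.
DASK_SETTINGS = """\
max_parallel_tasks: 2
dask:
  use: local_distributed
  profiles:
    local_distributed:
      cluster:
        type: distributed.LocalCluster
        n_workers: 2
        threads_per_worker: 2
        memory_limit: 4GiB
"""


def write_ref_dask_config(config_dir):
    """Create the configuration directory and write dask.yaml into it."""
    config_dir = Path(config_dir).expanduser()
    config_dir.mkdir(parents=True, exist_ok=True)
    config_file = config_dir / "dask.yaml"
    config_file.write_text(DASK_SETTINGS, encoding="utf-8")
    return config_file


# Example usage (not executed here, because it writes to your home directory):
#   write_ref_dask_config("~/.config/esmvaltool-ref")
```

The ESMVALTOOL_CONFIG_DIR environment variable still has to point to that directory in the environment from which the REF is started, for example by exporting it in your shell profile.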

Configuring PMP and ILAMB/IOMB

Both ILAMB/IOMB and PMP use Dask through Xarray, but they do not expose their own Dask configuration options. Therefore, you need to configure Dask globally for these packages by creating a Dask configuration file, e.g. at ~/.config/dask/config.yaml.

While testing the REF, we have seen occasional crashes when running ILAMB/IOMB and PMP diagnostics with the threaded scheduler, so we recommend using the synchronous scheduler for these diagnostics by adding the following content to the global Dask configuration file:

scheduler: synchronous

For faster processing, you can use the threaded scheduler with a limited number of worker threads (adjust to your system's resources):

scheduler: threads
num_workers: 4

Note that the REF may run multiple executors in parallel, and each executor running a PMP diagnostic will use the resources specified above.

ILAMB/IOMB diagnostics

The ILAMB/IOMB diagnostics are currently restricted to the synchronous scheduler, so they will not respect the global Dask settings.