climate_ref_core.output_files

Raw-file operations on diagnostic execution outputs.

This module groups every operation that manipulates the files produced by a diagnostic execution. Execution output is written to the scratch directory. When the execution is ingested, a curated subset of the outputs is copied from the scratch directory to the results directory:

  • logs
  • the metric bundle
  • the output bundle
  • the files the output bundle references (plots/data/html)
  • the series bundle

Only files in the results directory are accessible through the API or to the public.

For some tests we must sanitise paths to files as well as the contents of text files (to_placeholders / from_placeholders). This ensures that the regression data is machine independent.

PLACEHOLDER_OUTPUT_DIR = '<OUTPUT_DIR>' module-attribute

Placeholder substituted for the absolute execution output directory.

PLACEHOLDER_TEST_DATA_DIR = '<TEST_DATA_DIR>' module-attribute

Placeholder substituted for the absolute provider test-data directory.

SANITISED_FILE_GLOBS = ('*.json', '*.txt', '*.yaml', '*.yml', '*.html', '*.xml') module-attribute

Text artefacts whose absolute paths are rewritten for portability.

Binary outputs (.nc, .png, ...) are never rewritten.
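
As a rough illustration of which files these globs select, the sketch below matches a few file names against SANITISED_FILE_GLOBS with fnmatchcase. The helper is_sanitised and the sample file names are made up for this example; the module itself walks the tree with Path.rglob (see rewrite_tree), so this only approximates the selection by file name.

```python
from fnmatch import fnmatchcase

# Copied from the module attribute documented above.
SANITISED_FILE_GLOBS = ("*.json", "*.txt", "*.yaml", "*.yml", "*.html", "*.xml")


def is_sanitised(filename: str) -> bool:
    """Hypothetical helper: True if the file name matches one of the globs."""
    return any(fnmatchcase(filename, pattern) for pattern in SANITISED_FILE_GLOBS)


# Hypothetical output names: three text artefacts and two binary outputs.
names = ["output.json", "index.html", "notes.txt", "tas.nc", "map.png"]
rewritten = [name for name in names if is_sanitised(name)]
print(rewritten)
```

Under these assumptions only the three text artefacts are selected; the .nc and .png files match no glob and would be left alone.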

copy_execution_outputs(scratch_directory, results_directory, fragment, result, *, include_log=False)

Copy the curated set of persisted outputs from scratch to results.

This is the canonical definition of what REF persists for a successful execution:

  • the metric bundle
  • the output bundle
  • every file it references (plots/data/html)
  • the series bundle
  • the execution log (if include_log=True)

Parameters:

  • scratch_directory (Path, required): Base scratch directory the diagnostic wrote into.
  • results_directory (Path, required): Base results directory to copy the curated subset into.
  • fragment (Path | str, required): The per-execution fragment under both base directories.
  • result (ExecutionResult, required): The successful execution result (must carry a metric bundle filename).
  • include_log (bool, default False): If True, copy the execution log.

Returns:

  • list[Path]: The copied files, each relative to fragment (the manifest key set).

Source code in packages/climate-ref-core/src/climate_ref_core/output_files.py
def copy_execution_outputs(
    scratch_directory: Path,
    results_directory: Path,
    fragment: Path | str,
    result: ExecutionResult,
    *,
    include_log: bool = False,
) -> list[Path]:
    """
    Copy the curated set of persisted outputs from scratch to results.

    This is the canonical definition of *what REF persists* for a successful execution:

    - the metric bundle
    - the output bundle
    - every file it references (plots/data/html)
    - the series bundle
    - the execution log (if ``include_log=True``)

    Parameters
    ----------
    scratch_directory
        Base scratch directory the diagnostic wrote into.
    results_directory
        Base results directory to copy the curated subset into.
    fragment
        The per-execution fragment under both base directories.
    result
        The successful execution result (must carry a metric bundle filename).
    include_log
        If True, copy the execution log.

    Returns
    -------
    :
        The copied files, each relative to ``fragment`` (the manifest key set).
    """
    if result.metric_bundle_filename is None:
        raise ValueError("Cannot copy outputs for a result without a metric bundle")

    copied: list[Path] = []

    if include_log:
        copied.append(
            copy_output_file(scratch_directory, results_directory, fragment, EXECUTION_LOG_FILENAME)
        )

    copied.append(
        copy_output_file(scratch_directory, results_directory, fragment, result.metric_bundle_filename)
    )

    if result.output_bundle_filename:
        output_bundle_relpath = copy_output_file(
            scratch_directory, results_directory, fragment, result.output_bundle_filename
        )
        copied.append(output_bundle_relpath)
        bundle_path = scratch_directory / fragment / output_bundle_relpath
        copied.extend(_copy_output_bundle_files(scratch_directory, results_directory, fragment, bundle_path))

    if result.series_filename:
        copied.append(
            copy_output_file(scratch_directory, results_directory, fragment, result.series_filename)
        )

    return copied
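
The selection logic above can be tried out with a simplified, self-contained sketch. Everything in it is a stand-in rather than the library's API: StubResult mimics only the three filename fields read from ExecutionResult, LOG_NAME is an assumed value for EXECUTION_LOG_FILENAME, the file names are invented, and the files referenced by the output bundle are skipped because _copy_output_bundle_files is not shown on this page.

```python
from __future__ import annotations

import shutil
import tempfile
from dataclasses import dataclass
from pathlib import Path

LOG_NAME = "out.log"  # assumption: stand-in for EXECUTION_LOG_FILENAME


@dataclass
class StubResult:
    """Hypothetical stand-in for ExecutionResult (filename fields only)."""

    metric_bundle_filename: str | None
    output_bundle_filename: str | None = None
    series_filename: str | None = None


def copy_curated(
    scratch: Path, results: Path, fragment: str, result: StubResult, *, include_log: bool = False
) -> list[Path]:
    """Copy the curated files in the same order as the function above."""
    if result.metric_bundle_filename is None:
        raise ValueError("Cannot copy outputs for a result without a metric bundle")
    names = [LOG_NAME] if include_log else []
    names.append(result.metric_bundle_filename)
    if result.output_bundle_filename:
        names.append(result.output_bundle_filename)
    if result.series_filename:
        names.append(result.series_filename)

    copied = []
    for name in names:
        target = results / fragment / name
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy(scratch / fragment / name, target)
        copied.append(Path(name))
    return copied


with tempfile.TemporaryDirectory() as tmp:
    scratch, results = Path(tmp, "scratch"), Path(tmp, "results")
    fragment = "provider/diagnostic/abc123"
    (scratch / fragment).mkdir(parents=True)
    # The scratch directory holds more than the curated subset.
    for name in ("diagnostic.json", "series.json", "out.log", "intermediate.nc"):
        (scratch / fragment / name).write_text("{}", encoding="utf-8")

    result = StubResult(metric_bundle_filename="diagnostic.json", series_filename="series.json")
    copied = copy_curated(scratch, results, fragment, result)
    persisted = sorted(p.name for p in (results / fragment).iterdir())

print(copied)
print(persisted)
```

In this sketch the log and the stray intermediate.nc stay in scratch: the log because include_log defaults to False, the NetCDF file because nothing in the result names it.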

copy_output_file(scratch_directory, results_directory, fragment, filename)

Copy a single output file from the scratch directory to the results directory.

Parameters:

  • scratch_directory (Path, required): The base directory where the file is currently located.
  • results_directory (Path, required): The base directory where the file should be copied to.
  • fragment (Path | str, required): The per-execution fragment under both base directories.
  • filename (Path | str, required): The file to copy, relative to scratch_directory / fragment (an absolute path under that directory is also accepted).

Returns:

  • Path: The copied file's path, relative to fragment.

Source code in packages/climate-ref-core/src/climate_ref_core/output_files.py
def copy_output_file(
    scratch_directory: Path,
    results_directory: Path,
    fragment: Path | str,
    filename: Path | str,
) -> Path:
    """
    Copy a single output file from the scratch directory to the results directory.

    Parameters
    ----------
    scratch_directory
        The base directory where the file is currently located.
    results_directory
        The base directory where the file should be copied to.
    fragment
        The per-execution fragment under both base directories.
    filename
        The file to copy, relative to ``scratch_directory / fragment`` (an absolute
        path under that directory is also accepted).

    Returns
    -------
    :
        The copied file's path, relative to ``fragment``.
    """
    if results_directory == scratch_directory:
        raise ValueError("results_directory and scratch_directory must differ")

    input_directory = scratch_directory / fragment
    output_directory = results_directory / fragment

    relative_filename = ensure_relative_path(filename, input_directory)

    if not (input_directory / relative_filename).exists():
        raise FileNotFoundError(f"Could not find {relative_filename} in {input_directory}")

    output_filename = output_directory / relative_filename
    output_filename.parent.mkdir(parents=True, exist_ok=True)

    shutil.copy(input_directory / relative_filename, output_filename)
    return relative_filename
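
To make the path handling concrete, here is a minimal standalone sketch of the same steps. The source of ensure_relative_path is not shown on this page, so to_relative below is only an assumption about its behaviour (absolute paths are made relative to the input directory, relative paths pass through). copy_one, the fragment and the file names are likewise illustrative.

```python
import shutil
import tempfile
from pathlib import Path


def to_relative(filename, root: Path) -> Path:
    """Assumed behaviour of ensure_relative_path; not the library's code."""
    path = Path(filename)
    return path.relative_to(root) if path.is_absolute() else path


def copy_one(scratch: Path, results: Path, fragment: str, filename) -> Path:
    """Mirror of the steps above: resolve, check, create parents, copy."""
    source_dir = scratch / fragment
    relative = to_relative(filename, source_dir)
    if not (source_dir / relative).exists():
        raise FileNotFoundError(f"Could not find {relative} in {source_dir}")
    target = results / fragment / relative
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy(source_dir / relative, target)
    return relative


with tempfile.TemporaryDirectory() as tmp:
    scratch, results = Path(tmp, "scratch"), Path(tmp, "results")
    fragment = "example/run-1"
    plot = scratch / fragment / "plots" / "map.png"
    plot.parent.mkdir(parents=True)
    plot.write_bytes(b"\x89PNG")

    # A relative name and an absolute path under scratch / fragment
    # both resolve to the same fragment-relative path.
    from_relative = copy_one(scratch, results, fragment, "plots/map.png")
    from_absolute = copy_one(scratch, results, fragment, plot)
    nested_copy_exists = (results / fragment / "plots" / "map.png").is_file()

    try:
        copy_one(scratch, results, fragment, "missing.json")
        missing_raised = False
    except FileNotFoundError:
        missing_raised = True
```

The sub-directory layout below the fragment is preserved in the results directory, and a file that is absent from scratch fails before anything is written.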

from_placeholders(directory, *, output_dir, test_data_dir, globs=SANITISED_FILE_GLOBS)

Rewrite portable placeholders back to absolute paths ("from").

Inverse of to_placeholders: replaces <OUTPUT_DIR> with the absolute output_dir and <TEST_DATA_DIR> with the absolute test_data_dir in every text artefact under directory. Binary files are never touched.

Parameters:

  • directory (Path, required): The tree of artefacts to hydrate in place.
  • output_dir (Path, required): The absolute execution output directory to substitute in.
  • test_data_dir (Path, required): The absolute provider test-data directory to substitute in.
  • globs (tuple[str, ...], default SANITISED_FILE_GLOBS): File globs whose contents are rewritten.

Source code in packages/climate-ref-core/src/climate_ref_core/output_files.py
def from_placeholders(
    directory: Path,
    *,
    output_dir: Path,
    test_data_dir: Path,
    globs: tuple[str, ...] = SANITISED_FILE_GLOBS,
) -> None:
    """
    Rewrite portable placeholders back to absolute paths ("from").

    Inverse of :func:`to_placeholders`: replaces ``<OUTPUT_DIR>`` with the absolute ``output_dir``
    and ``<TEST_DATA_DIR>`` with the absolute ``test_data_dir`` in every text artefact under ``directory``.
    Binary files are never touched.

    Parameters
    ----------
    directory
        The tree of artefacts to hydrate in place.
    output_dir
        The absolute execution output directory to substitute in.
    test_data_dir
        The absolute provider test-data directory to substitute in.
    globs
        File globs whose contents are rewritten.
    """
    rewrite_tree(
        directory,
        {PLACEHOLDER_OUTPUT_DIR: str(output_dir), PLACEHOLDER_TEST_DATA_DIR: str(test_data_dir)},
        globs,
    )

rewrite_tree(directory, replacements, globs=SANITISED_FILE_GLOBS)

Apply replacements to the text content of every matching file under directory.

Keys are applied longest-first so that an overlapping shorter path cannot partially shadow a longer one. Only files matching globs are rewritten while binary artefacts are never touched.

Parameters:

  • directory (Path, required): The tree of files to rewrite in place.
  • replacements (dict[str, str], required): Mapping of substring to replacement, applied to every matching file.
  • globs (tuple[str, ...], default SANITISED_FILE_GLOBS): File globs whose contents are rewritten.

Source code in packages/climate-ref-core/src/climate_ref_core/output_files.py
def rewrite_tree(
    directory: Path,
    replacements: dict[str, str],
    globs: tuple[str, ...] = SANITISED_FILE_GLOBS,
) -> None:
    """Apply ``replacements`` to the text content of every matching file under ``directory``.

    Keys are applied longest-first so that an overlapping shorter path cannot
    partially shadow a longer one.
    Only files matching ``globs`` are rewritten while binary artefacts are never touched.

    Parameters
    ----------
    directory
        The tree of files to rewrite in place.
    replacements
        Mapping of substring to replacement, applied to every matching file.
    globs
        File globs whose contents are rewritten.
    """
    ordered = sorted(replacements.items(), key=lambda kv: len(kv[0]), reverse=True)
    for glob in globs:
        for file in sorted(directory.rglob(glob)):
            text = file.read_text(encoding="utf-8")
            rewritten = text
            for old, new in ordered:
                rewritten = rewritten.replace(old, new)
            if rewritten != text:
                file.write_text(rewritten, encoding="utf-8")
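
The longest-first ordering matters when one path is a prefix of another, for example when the output directory sits underneath the test-data directory. The sketch below isolates that step on a plain string. The function apply_replacements, its longest_first switch and the /work/ref paths are invented for illustration and are not part of the module.

```python
def apply_replacements(text: str, replacements: dict[str, str], *, longest_first: bool = True) -> str:
    """Apply substring replacements, optionally sorted by key length (longest first)."""
    items = list(replacements.items())
    if longest_first:
        items.sort(key=lambda kv: len(kv[0]), reverse=True)
    for old, new in items:
        text = text.replace(old, new)
    return text


# Hypothetical layout: the output directory is nested under the test-data directory.
replacements = {
    "/work/ref": "<TEST_DATA_DIR>",
    "/work/ref/out": "<OUTPUT_DIR>",
}
text = "plot: /work/ref/out/map.png, input: /work/ref/tas.nc"

good = apply_replacements(text, replacements)
naive = apply_replacements(text, replacements, longest_first=False)

print(good)   # plot: <OUTPUT_DIR>/map.png, input: <TEST_DATA_DIR>/tas.nc
print(naive)  # plot: <TEST_DATA_DIR>/out/map.png, input: <TEST_DATA_DIR>/tas.nc
```

With insertion order, the shorter key consumes the prefix of the longer path first, so the output directory is never recognised. Sorting by key length avoids that.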

to_placeholders(directory, *, output_dir, test_data_dir, globs=SANITISED_FILE_GLOBS)

Rewrite absolute paths in committed artefacts to portable placeholders ("to").

Replaces the absolute output_dir with <OUTPUT_DIR> and the absolute test_data_dir with <TEST_DATA_DIR> in every text artefact under directory. Binary files are never touched.

Parameters:

  • directory (Path, required): The tree of committed artefacts to sanitise in place.
  • output_dir (Path, required): The absolute execution output directory.
  • test_data_dir (Path, required): The absolute provider test-data directory.
  • globs (tuple[str, ...], default SANITISED_FILE_GLOBS): File globs whose contents are rewritten.

Source code in packages/climate-ref-core/src/climate_ref_core/output_files.py
def to_placeholders(
    directory: Path,
    *,
    output_dir: Path,
    test_data_dir: Path,
    globs: tuple[str, ...] = SANITISED_FILE_GLOBS,
) -> None:
    """
    Rewrite absolute paths in committed artefacts to portable placeholders ("to").

    Replaces the absolute ``output_dir`` with ``<OUTPUT_DIR>`` and the absolute
    ``test_data_dir`` with ``<TEST_DATA_DIR>`` in every text artefact under ``directory``.
    Binary files are never touched.

    Parameters
    ----------
    directory
        The tree of committed artefacts to sanitise in place.
    output_dir
        The absolute execution output directory.
    test_data_dir
        The absolute provider test-data directory.
    globs
        File globs whose contents are rewritten.
    """
    rewrite_tree(
        directory,
        {str(output_dir): PLACEHOLDER_OUTPUT_DIR, str(test_data_dir): PLACEHOLDER_TEST_DATA_DIR},
        globs,
    )
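
The round trip between to_placeholders and from_placeholders can be sketched without installing the package. The block below copies the rewrite_tree logic shown earlier so that it runs on its own; the directory names and the artefact file are invented for the example, and the two replacement mappings are the ones the two functions pass to rewrite_tree.

```python
import tempfile
from pathlib import Path

GLOBS = ("*.json", "*.txt", "*.yaml", "*.yml", "*.html", "*.xml")


def rewrite_tree(directory: Path, replacements: dict[str, str], globs: tuple[str, ...] = GLOBS) -> None:
    """Local copy of the rewrite_tree logic above, so the sketch is standalone."""
    ordered = sorted(replacements.items(), key=lambda kv: len(kv[0]), reverse=True)
    for glob in globs:
        for file in sorted(directory.rglob(glob)):
            text = file.read_text(encoding="utf-8")
            rewritten = text
            for old, new in ordered:
                rewritten = rewritten.replace(old, new)
            if rewritten != text:
                file.write_text(rewritten, encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical machine-specific directories.
    output_dir = Path(tmp, "scratch", "run-1")
    test_data_dir = Path(tmp, "test-data")
    artefacts = Path(tmp, "regression")
    artefacts.mkdir()

    artefact = artefacts / "output.txt"
    original = f"plot={output_dir}/map.png\nsource={test_data_dir}/tas.nc\n"
    artefact.write_text(original, encoding="utf-8")

    # "to": absolute paths become placeholders (what to_placeholders passes on).
    rewrite_tree(artefacts, {str(output_dir): "<OUTPUT_DIR>", str(test_data_dir): "<TEST_DATA_DIR>"})
    portable = artefact.read_text(encoding="utf-8")

    # "from": placeholders become absolute paths again (what from_placeholders passes on).
    rewrite_tree(artefacts, {"<OUTPUT_DIR>": str(output_dir), "<TEST_DATA_DIR>": str(test_data_dir)})
    restored = artefact.read_text(encoding="utf-8")

print(portable)
```

The portable form contains no machine-specific prefix, and hydrating it with the same two directories gives back the original text.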