climate_ref_core.regression.compare

Content comparison utilities for regression testing.

Tolerance

Float comparison tolerance for bundle regression checks.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| rtol | float | Relative tolerance: the allowed proportional difference between expected and actual float values. | 1e-6 |
| atol | float | Absolute tolerance: the floor difference always permitted, regardless of magnitude. | 1e-8 |
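
As a rough illustration of how the two fields interact, the sketch below uses a stand-in Tolerance dataclass and a hypothetical within_tolerance helper. It assumes the float check behaves like Python's math.isclose; the formula actually used by the library lives in its private comparison code and is not shown on this page, so treat the rule here as an assumption.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Tolerance:
    """Stand-in that mirrors the documented fields and defaults."""

    rtol: float = 1e-6
    atol: float = 1e-8


def within_tolerance(expected: float, actual: float, tol: Tolerance) -> bool:
    # ASSUMPTION: math.isclose semantics, i.e.
    # |expected - actual| <= max(rtol * max(|expected|, |actual|), atol).
    # The real library may combine rtol and atol differently.
    return math.isclose(expected, actual, rel_tol=tol.rtol, abs_tol=tol.atol)


default = Tolerance()

# Large values: the relative term dominates (allowed difference is about 1e-3).
print(within_tolerance(1000.0, 1000.0005, default))  # True
print(within_tolerance(1000.0, 1000.01, default))    # False

# Values near zero: the absolute floor is what lets tiny noise through.
print(within_tolerance(0.0, 5e-9, default))              # True
print(within_tolerance(0.0, 5e-9, Tolerance(atol=0.0)))  # False
```

The last two lines show why a non-zero atol matters: with a purely relative tolerance, any non-zero value compared against an exact 0.0 would count as a mismatch.
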
Source code in packages/climate-ref-core/src/climate_ref_core/regression/compare.py
@frozen
class Tolerance:
    """
    Float comparison tolerance for bundle regression checks.

    Parameters
    ----------
    rtol
        Relative tolerance — the allowed proportional difference between expected
        and actual float values (default ``1e-6``).
    atol
        Absolute tolerance — the floor difference always permitted,
        regardless of magnitude (default ``1e-8``).
    """

    rtol: float = 1e-6
    atol: float = 1e-8

assert_bundle_regression(expected_path, actual_path, *, slug, tol=Tolerance(), replacements)

Assert that a regenerated committed-bundle JSON file matches the committed copy.

Algorithm:

  1. Byte-equal fast path — if the raw bytes of both files are identical, return immediately.
  2. Parse both files as JSON.
  3. Rewrite both dict keys and leaf string values in the regenerated (actual) document using replacements, applied longest-key-first so that a shorter key cannot pre-empt a longer key that contains it.
  4. Call compare_json_content with the given tolerance.
  5. Raise AssertionError with the full mismatch list and a remediation hint if any mismatches are found.

The replacements map follows the convention used throughout the testing infrastructure: keys are real runtime values (absolute paths, recipe-dir timestamps), and values are the committed-bundle placeholders (e.g. "<OUTPUT_DIR>", "<TEST_DATA_DIR>", "<RECIPE_RUN>"). Only the actual document is rewritten. The committed expected file is assumed to already contain placeholders, which write_committed_bundle (in climate_ref_core.regression.capture) guarantees at capture time. A hand-edited baseline that still contains raw paths will therefore surface as ordinary value mismatches.

Both <OUTPUT_DIR> and <RECIPE_RUN> participate in dict-KEY rewriting because ESMValTool's output.json embeds absolute paths as object keys.
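
The longest-key-first rewriting of keys and leaf strings can be sketched as follows. The helper names (order_longest_first, rewrite_text, rewrite_keys_and_values) and the example paths are hypothetical and chosen only for illustration; the library's own helpers are not shown on this page and may differ in detail.

```python
from typing import Any


def order_longest_first(replacements: dict[str, str]) -> list[tuple[str, str]]:
    # Hypothetical helper: longest real value first, so a short path
    # cannot consume the prefix of a longer path that contains it.
    return sorted(replacements.items(), key=lambda kv: len(kv[0]), reverse=True)


def rewrite_text(text: str, ordered: list[tuple[str, str]]) -> str:
    # Apply every {real_value: placeholder} substitution in the given order.
    for real, placeholder in ordered:
        text = text.replace(real, placeholder)
    return text


def rewrite_keys_and_values(obj: Any, ordered: list[tuple[str, str]]) -> Any:
    # Walk the parsed JSON; rewrite dict keys and leaf strings, leave the rest.
    if isinstance(obj, dict):
        return {
            rewrite_text(key, ordered): rewrite_keys_and_values(value, ordered)
            for key, value in obj.items()
        }
    if isinstance(obj, list):
        return [rewrite_keys_and_values(item, ordered) for item in obj]
    if isinstance(obj, str):
        return rewrite_text(obj, ordered)
    return obj


# Invented runtime values; one key is a prefix of the other.
replacements = {
    "/tmp/out": "<OUTPUT_DIR>",
    "/tmp/out/recipe_20240101_000000": "<RECIPE_RUN>",
}
actual = {
    "/tmp/out/recipe_20240101_000000/plot.png": {
        "caption": "written to /tmp/out/log.txt",
    },
    "count": 3,
}

sanitised = rewrite_keys_and_values(actual, order_longest_first(replacements))
print(sanitised)
```

With this ordering the object key becomes "<RECIPE_RUN>/plot.png". If the shorter "/tmp/out" entry were applied first, the same key would instead become "<OUTPUT_DIR>/recipe_20240101_000000/plot.png" and would no longer match a committed baseline that uses <RECIPE_RUN>.
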

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| expected_path | Path | Path to the committed (on-disk) bundle file containing placeholders. | required |
| actual_path | Path | Path to the regenerated bundle file containing real runtime paths. | required |
| slug | str | Diagnostic slug used in error messages. | required |
| tol | Tolerance | Float comparison tolerance. | Tolerance() |
| replacements | dict[str, str] | Mapping of {real_value: placeholder} applied to the actual document. | required |

Raises:

| Type | Description |
| --- | --- |
| AssertionError | If the bundles differ beyond tolerance after sanitisation. |

Notes

If expected_path does not exist (a legacy regression without a committed bundle), the comparison is skipped silently and the function returns.

Source code in packages/climate-ref-core/src/climate_ref_core/regression/compare.py
def assert_bundle_regression(
    expected_path: Path,
    actual_path: Path,
    *,
    slug: str,
    tol: Tolerance = Tolerance(),
    replacements: dict[str, str],
) -> None:
    """
    Assert that a regenerated committed-bundle JSON file matches the committed copy.

    Algorithm:

    1. **Byte-equal fast path** — if the raw bytes of both files are identical, return immediately.
    2. Parse both files as JSON.
    3. Rewrite both dict *keys* and leaf string *values* in the regenerated
       (``actual``) document using ``replacements``, applied **longest-key-first**
       so that a shorter key cannot pre-empt a longer key that contains it.
    4. Call :func:`compare_json_content` with the given tolerance.
    5. Raise ``AssertionError`` with the full mismatch list and a remediation hint
       if any mismatches are found.

    The replacements map follows the convention used throughout the testing
    infrastructure: keys are real runtime values (absolute paths, recipe-dir
    timestamps), values are the committed-bundle placeholders (e.g.
    ``"<OUTPUT_DIR>"``, ``"<TEST_DATA_DIR>"``, ``"<RECIPE_RUN>"``).
    Only the *actual* document is rewritten: the committed *expected* file is
    assumed to already contain placeholders, which
    :func:`~climate_ref_core.regression.capture.write_committed_bundle` guarantees
    at capture time. A hand-edited baseline with raw paths will surface as ordinary
    value mismatches.

    Both ``<OUTPUT_DIR>`` and ``<RECIPE_RUN>`` participate in dict-KEY rewriting
    because ESMValTool's ``output.json`` embeds absolute paths as object keys.

    Parameters
    ----------
    expected_path
        Path to the committed (on-disk) bundle file containing placeholders.
    actual_path
        Path to the regenerated bundle file containing real runtime paths.
    slug
        Diagnostic slug used in error messages.
    tol
        Float comparison tolerance.
    replacements
        Mapping of ``{real_value: placeholder}`` applied to the actual document.

    Raises
    ------
    AssertionError
        If the bundles differ beyond tolerance after sanitisation.

    Notes
    -----
    If ``expected_path`` does not exist (a legacy regression without a committed bundle),
    the comparison is skipped silently and the function returns.
    """
    if not expected_path.exists():
        # Legacy regression data without this bundle file — skip silently
        return

    expected_bytes = expected_path.read_bytes()
    actual_bytes = actual_path.read_bytes()

    # Fast path: byte-identical means no difference.
    if expected_bytes == actual_bytes:
        return

    expected_obj = json.loads(expected_bytes.decode("utf-8"))
    actual_obj = json.loads(actual_bytes.decode("utf-8"))

    # Rewrite actual — both keys and leaf values — before comparison.
    actual_sanitised = _rewrite_keys_and_values(actual_obj, ordered_replacements(replacements))

    mismatches = compare_json_content(expected_obj, actual_sanitised, tol=tol)
    if mismatches:
        mismatch_detail = "\n  ".join(mismatches)
        msg = (
            f"Diagnostic {slug!r}: committed bundle {expected_path.name!r} "
            f"differs from regenerated output after sanitisation.\n"
            f"Mismatches ({len(mismatches)}):\n  {mismatch_detail}\n\n"
            f"Remediation: if the change is intentional, bump the diagnostic's "
            f"test_case_version and regenerate with `ref test-cases run --force-regen`."
        )
        raise AssertionError(msg)

compare_json_content(expected, actual, *, tol, path='')

Recursively compare two parsed JSON values with float tolerance.

Rules:

  - Floats: compared with relative tolerance tol.rtol and absolute tolerance tol.atol.
  - Ints, strings, bools, None: exact equality.
  - Lists: element-by-element, same length required.
  - Dicts: key sets must match; values compared recursively.
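
A minimal sketch of a comparison that follows these rules is shown below. It is not the library's implementation: compare_sketch is a hypothetical name, the mismatch message wording is invented, the float rule is assumed to match math.isclose, and an int compared against a float is treated as a type mismatch (the page does not say how the library handles that case).

```python
import math
from typing import Any


def compare_sketch(
    expected: Any,
    actual: Any,
    *,
    rtol: float = 1e-6,
    atol: float = 1e-8,
    path: str = "",
) -> list[str]:
    where = path or "<root>"

    # Dicts: key sets must match; shared keys are compared recursively.
    if isinstance(expected, dict) and isinstance(actual, dict):
        out: list[str] = []
        for key in sorted(expected.keys() - actual.keys()):
            out.append(f"{where}: key {key!r} missing from actual")
        for key in sorted(actual.keys() - expected.keys()):
            out.append(f"{where}: unexpected key {key!r} in actual")
        for key in sorted(expected.keys() & actual.keys()):
            child = f"{path}.{key}" if path else key
            out += compare_sketch(expected[key], actual[key], rtol=rtol, atol=atol, path=child)
        return out

    # Lists: same length required, then element-by-element.
    if isinstance(expected, list) and isinstance(actual, list):
        if len(expected) != len(actual):
            return [f"{where}: list length {len(expected)} != {len(actual)}"]
        out = []
        for index, (exp_item, act_item) in enumerate(zip(expected, actual)):
            out += compare_sketch(exp_item, act_item, rtol=rtol, atol=atol, path=f"{path}[{index}]")
        return out

    # Floats: tolerance-based (ASSUMPTION: math.isclose semantics).
    if isinstance(expected, float) and isinstance(actual, float):
        if math.isclose(expected, actual, rel_tol=rtol, abs_tol=atol):
            return []
        return [f"{where}: {expected!r} != {actual!r} (beyond tolerance)"]

    # Everything else (ints, strings, bools, None): exact match.
    # Comparing types first keeps True from being treated as equal to 1.
    if type(expected) is not type(actual) or expected != actual:
        return [f"{where}: {expected!r} != {actual!r}"]
    return []


expected_doc = {"a": [1.0, 2.0], "b": "x", "n": 3}
actual_doc = {"a": [1.0000000001, 2.5], "b": "x", "n": 3}
for line in compare_sketch(expected_doc, actual_doc):
    print(line)
```

Only the second list element is reported here, because the first differs by about 1e-10, well inside the default tolerances. Collecting every mismatch into a list, rather than stopping at the first one, is what lets the caller print the full set of differences in a single error message.
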

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| expected | Any | The reference (committed) parsed JSON value. | required |
| actual | Any | The regenerated parsed JSON value. | required |
| tol | Tolerance | Float comparison tolerance. | required |
| path | str | Dot-/bracket-notation path prefix for error messages (empty at top level). | '' |

Returns:

| Type | Description |
| --- | --- |
| list[str] | A list of human-readable mismatch descriptions. An empty list means the values are equivalent within tolerance. |

Source code in packages/climate-ref-core/src/climate_ref_core/regression/compare.py
def compare_json_content(
    expected: Any,
    actual: Any,
    *,
    tol: Tolerance,
    path: str = "",
) -> list[str]:
    """
    Recursively compare two parsed JSON values with float tolerance.

    Rules:
    - **Floats**: compared with relative tolerance ``tol.rtol``
      and absolute tolerance ``tol.atol``.
    - **Ints, strings, bools, ``None``**: exact equality.
    - **Lists**: element-by-element, same length required.
    - **Dicts**: key sets must match; values compared recursively.

    Parameters
    ----------
    expected
        The reference (committed) parsed JSON value.
    actual
        The regenerated parsed JSON value.
    tol
        Float comparison tolerance.
    path
        Dot-/bracket-notation path prefix for error messages (empty at top level).

    Returns
    -------
    :
        A list of human-readable mismatch descriptions.
        An empty list means the values are equivalent within tolerance.
    """
    mismatches: list[str] = []
    _compare_recursive(expected, actual, tol=tol, path=path, out=mismatches)
    return mismatches