climate_ref_core.regression #

Regression-baseline primitives for diagnostic test cases.

This package holds the building blocks for the two-bundle regression model. For each test case we capture a small, sanitised bundle of artefacts and commit it to git. This committed bundle forms the regression baseline for the test case.

We also snapshot the native files persisted by the execution. These files are typically large and not always portable, so we keep them out of git and refer to them by their sha256 digest in the committed bundle. They will be fetchable from an object store in CI and replayable locally for debugging.

COMMITTED_BUNDLE_FILES = ('series.json', 'diagnostic.json', 'output.json') module-attribute #

The committed CMEC artefacts tracked in git.

Their digests are tracked in Manifest.committed.

SCHEMA_VERSION = 1 module-attribute #

Current manifest schema version.

Action #

Bases: Enum

The verification action the CI gate selects for a single test case.

Source code in packages/climate-ref-core/src/climate_ref_core/regression/gate.py
class Action(enum.Enum):
    """The verification action the CI gate selects for a single test case."""

    SKIP = "skip"
    """Nothing relevant changed, or the case is not under regression management."""

    REPLAY = "replay"
    """Replay the cached native baseline and compare to the committed bundle."""

    EXECUTE = "execute"
    """Re-run the diagnostic end-to-end and compare to the committed bundle."""

    FAIL = "fail"
    """The change is not permissible (unauthorised baseline change or bad version)."""

EXECUTE = 'execute' class-attribute instance-attribute #

Re-run the diagnostic end-to-end and compare to the committed bundle.

FAIL = 'fail' class-attribute instance-attribute #

The change is not permissible (unauthorised baseline change or bad version).

REPLAY = 'replay' class-attribute instance-attribute #

Replay the cached native baseline and compare to the committed bundle.

SKIP = 'skip' class-attribute instance-attribute #

Nothing relevant changed, or the case is not under regression management.

GateDecision #

The gate's decision for one test case: an Action and why.

Source code in packages/climate-ref-core/src/climate_ref_core/regression/gate.py
@frozen
class GateDecision:
    """The gate's decision for one test case: an :class:`Action` and why."""

    action: Action
    """The selected verification action."""

    reason: str
    """A human-readable explanation, surfaced in CI logs."""

action instance-attribute #

The selected verification action.

reason instance-attribute #

A human-readable explanation, surfaced in CI logs.
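
As a rough illustration of how a CI driver might consume a decision, the sketch below maps a decision to an exit code. The Action and GateDecision definitions are local stand-ins that mirror the ones documented above (a standard-library frozen dataclass replaces the source's @frozen decorator so the block is self-contained). The exit_code helper and the example reason string are hypothetical and not part of the package.

```python
import enum
from dataclasses import dataclass


class Action(enum.Enum):
    """Stand-in mirroring the documented enum."""

    SKIP = "skip"
    REPLAY = "replay"
    EXECUTE = "execute"
    FAIL = "fail"


@dataclass(frozen=True)
class GateDecision:
    """Stand-in mirroring the documented class."""

    action: Action
    reason: str


def exit_code(decision: GateDecision) -> int:
    """Hypothetical helper: map a decision to a process exit code."""
    # Surface the reason, as the documented attribute is meant for CI logs.
    print(f"[{decision.action.value}] {decision.reason}")
    # Assumption: only FAIL blocks the pipeline; the other actions select
    # how (or whether) the case is verified afterwards.
    return 1 if decision.action is Action.FAIL else 0


decision = GateDecision(Action.REPLAY, "example reason for illustration")
code = exit_code(decision)
```

Because each member's value is a plain string, a decision serialised elsewhere can be turned back into a member with Action("replay").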

LocalFilesystemStore #

Content-addressed blob store backed by a local filesystem directory.

Intended for tests and development. Supports both read and write without credentials.

Blobs are stored under root using a two-level layout:

<root>/<digest[:2]>/<digest>

This mirrors the git object-store convention, keeping individual subdirectories manageable for large collections.
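
The layout can be sketched in a few lines. The blob_path function below is a hypothetical stand-in for the store's private path helper, and the regular expression is an assumption about how "64-character lowercase hex" might be checked. The store root used here is an arbitrary example path.

```python
import hashlib
import re
from pathlib import Path

# Exactly 64 lowercase hexadecimal characters, i.e. a sha256 hex digest.
_HEX64 = re.compile(r"[0-9a-f]{64}")


def blob_path(root: Path, digest: str) -> Path:
    """Hypothetical stand-in: map a digest to <root>/<digest[:2]>/<digest>."""
    # Validate first, so a value such as "../../etc/passwd" is rejected
    # before it can take part in a path join.
    if _HEX64.fullmatch(digest) is None:
        raise ValueError(f"Invalid sha256 digest: {digest!r}")
    return root / digest[:2] / digest


root = Path("/tmp/native-store")  # arbitrary example root
digest = hashlib.sha256(b"example blob").hexdigest()
path = blob_path(root, digest)
```

With two hex characters as the first level, blobs are spread over at most 256 subdirectories.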

Parameters:

Name Type Description Default
root

Root directory for the content-addressed store. Created on first write if it does not exist.

required
Source code in packages/climate-ref-core/src/climate_ref_core/regression/store.py
@frozen
class LocalFilesystemStore:
    """
    Content-addressed blob store backed by a local filesystem directory.

    Intended for tests and development.
    Supports both read and write without credentials.

    Blobs are stored under ``root`` using a two-level layout::

        <root>/<digest[:2]>/<digest>

    This mirrors the git object-store convention,
    keeping individual subdirectories manageable for large collections.

    Parameters
    ----------
    root
        Root directory for the content-addressed store.
        Created on first write if it does not exist.
    """

    root: Path

    def _blob_path(self, digest: str) -> Path:
        """Return the canonical on-disk path for a blob with the given digest.

        The digest is validated as 64-character lowercase hex first,
        so a malformed or hostile digest cannot be used to construct a path outside the store root.
        """
        _validate_digest(digest)
        return self.root / digest[:2] / digest

    def has(self, digest: str) -> bool:
        """
        Return ``True`` if the blob is present on disk.

        Parameters
        ----------
        digest
            The sha256 hex digest of the blob.

        Returns
        -------
        :
            ``True`` when the blob file exists at its canonical path.
        """
        return self._blob_path(digest).exists()

    def fetch(self, digest: str, dest: Path) -> None:
        """
        Copy the blob to ``dest`` and verify its sha256 matches ``digest``.

        Parameters
        ----------
        digest
            The sha256 hex digest of the blob to fetch.
        dest
            Destination path to write the blob to.
            Parent directories are created if they do not exist.

        Raises
        ------
        FileNotFoundError
            If the blob is not present in the store.
        ValueError
            If the blob's sha256 does not match ``digest``.
        """
        blob = self._blob_path(digest)
        if not blob.exists():
            raise FileNotFoundError(f"Blob {digest!r} not found in local store at {self.root}")
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(str(blob), str(dest))
        _verify_hash_matches(dest, digest)
        logger.debug(f"LocalFilesystemStore.fetch: {digest} -> {dest}")

    def put(self, path: Path) -> str:
        """
        Store the file at ``path`` in the content-addressed store.

        Computes the sha256 digest, copies the file to its canonical location, and returns the digest.
        If a blob with the same digest already exists, the copy is skipped.

        Parameters
        ----------
        path
            Path to the file to store.

        Returns
        -------
        :
            The sha256 hex digest of the stored blob.
        """
        digest = sha256_file(path)
        blob = self._blob_path(digest)
        if not blob.exists():
            blob.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(str(path), str(blob))
            logger.debug(f"LocalFilesystemStore.put: {path} -> {blob}")
        else:
            logger.debug(f"LocalFilesystemStore.put: {digest} already present, skipping copy")
        return digest

    def preflight(self) -> None:
        """
        Verify the store root exists (creating it if needed) and is writable.

        Raises
        ------
        NativeStoreUnavailableError
            If the root cannot be created or is not writable.
        """
        try:
            self.root.mkdir(parents=True, exist_ok=True)
        except OSError as exc:
            raise NativeStoreUnavailableError(
                f"Local native store root {self.root} could not be created: {exc}"
            ) from exc
        if not os.access(self.root, os.W_OK):
            raise NativeStoreUnavailableError(f"Local native store root {self.root} is not writable.")
        logger.debug(f"Local native store ready at {self.root}")

fetch(digest, dest) #

Copy the blob to dest and verify its sha256 matches digest.

Parameters:

Name Type Description Default
digest str

The sha256 hex digest of the blob to fetch.

required
dest Path

Destination path to write the blob to. Parent directories are created if they do not exist.

required

Raises:

Type Description
FileNotFoundError

If the blob is not present in the store.

ValueError

If the blob's sha256 does not match digest.

Source code in packages/climate-ref-core/src/climate_ref_core/regression/store.py
def fetch(self, digest: str, dest: Path) -> None:
    """
    Copy the blob to ``dest`` and verify its sha256 matches ``digest``.

    Parameters
    ----------
    digest
        The sha256 hex digest of the blob to fetch.
    dest
        Destination path to write the blob to.
        Parent directories are created if they do not exist.

    Raises
    ------
    FileNotFoundError
        If the blob is not present in the store.
    ValueError
        If the blob's sha256 does not match ``digest``.
    """
    blob = self._blob_path(digest)
    if not blob.exists():
        raise FileNotFoundError(f"Blob {digest!r} not found in local store at {self.root}")
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(str(blob), str(dest))
    _verify_hash_matches(dest, digest)
    logger.debug(f"LocalFilesystemStore.fetch: {digest} -> {dest}")

has(digest) #

Return True if the blob is present on disk.

Parameters:

Name Type Description Default
digest str

The sha256 hex digest of the blob.

required

Returns:

Type Description
bool

True when the blob file exists at its canonical path.

Source code in packages/climate-ref-core/src/climate_ref_core/regression/store.py
def has(self, digest: str) -> bool:
    """
    Return ``True`` if the blob is present on disk.

    Parameters
    ----------
    digest
        The sha256 hex digest of the blob.

    Returns
    -------
    :
        ``True`` when the blob file exists at its canonical path.
    """
    return self._blob_path(digest).exists()

preflight() #

Verify the store root exists (creating it if needed) and is writable.

Raises:

Type Description
NativeStoreUnavailableError

If the root cannot be created or is not writable.

Source code in packages/climate-ref-core/src/climate_ref_core/regression/store.py
def preflight(self) -> None:
    """
    Verify the store root exists (creating it if needed) and is writable.

    Raises
    ------
    NativeStoreUnavailableError
        If the root cannot be created or is not writable.
    """
    try:
        self.root.mkdir(parents=True, exist_ok=True)
    except OSError as exc:
        raise NativeStoreUnavailableError(
            f"Local native store root {self.root} could not be created: {exc}"
        ) from exc
    if not os.access(self.root, os.W_OK):
        raise NativeStoreUnavailableError(f"Local native store root {self.root} is not writable.")
    logger.debug(f"Local native store ready at {self.root}")

put(path) #

Store the file at path in the content-addressed store.

Computes the sha256 digest, copies the file to its canonical location, and returns the digest. If a blob with the same digest already exists, the copy is skipped.

Parameters:

Name Type Description Default
path Path

Path to the file to store.

required

Returns:

Type Description
str

The sha256 hex digest of the stored blob.

Source code in packages/climate-ref-core/src/climate_ref_core/regression/store.py
def put(self, path: Path) -> str:
    """
    Store the file at ``path`` in the content-addressed store.

    Computes the sha256 digest, copies the file to its canonical location, and returns the digest.
    If a blob with the same digest already exists, the copy is skipped.

    Parameters
    ----------
    path
        Path to the file to store.

    Returns
    -------
    :
        The sha256 hex digest of the stored blob.
    """
    digest = sha256_file(path)
    blob = self._blob_path(digest)
    if not blob.exists():
        blob.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(str(path), str(blob))
        logger.debug(f"LocalFilesystemStore.put: {path} -> {blob}")
    else:
        logger.debug(f"LocalFilesystemStore.put: {digest} already present, skipping copy")
    return digest

Manifest #

The on-disk manifest for a test case regression bundle.

Serialised as manifest.json with stable key ordering and a trailing newline, so repeated dumps are byte-identical.
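
A small sketch of why this serialisation is stable: with sort_keys=True the output does not depend on the insertion order of the keys. The render function is a hypothetical stand-in that uses the same json.dumps arguments as the dump method in the source listing, and the payloads are made-up examples.

```python
import json

payload_a = {"schema": 1, "test_case_version": 1, "committed": {}, "native": {}}
# Same content, different key insertion order.
payload_b = {"native": {}, "committed": {}, "test_case_version": 1, "schema": 1}


def render(payload: dict) -> str:
    """Serialise with sorted keys, two-space indent, and a trailing newline."""
    return json.dumps(payload, indent=2, sort_keys=True) + "\n"


text_a = render(payload_a)
text_b = render(payload_b)
# Parsing and re-rendering reproduces the same text.
text_again = render(json.loads(text_a))
```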

Source code in packages/climate-ref-core/src/climate_ref_core/regression/manifest.py
@frozen
class Manifest:
    """
    The on-disk manifest for a test case regression bundle.

    Serialised as ``manifest.json`` with stable key ordering and a trailing newline,
    so repeated dumps are byte-identical.
    """

    schema: int
    """Manifest schema version; equals :data:`SCHEMA_VERSION` for current manifests."""

    test_case_version: int
    """Monotonic, author-bumped version coupling the bundle to its native outputs."""

    committed: dict[str, str]
    """Digests of committed regression JSON artefacts: ``{relpath: sha256}``."""

    native: dict[str, NativeEntry]
    """Digests of curated native output files: ``{relpath: NativeEntry}``."""

    catalog_hash: str | None = None
    """Hash of the test case input ``catalog.yaml`` (its ``_metadata.hash``) that produced this baseline.
    This couples the baseline to its inputs.
    The CI gate fails a case whose live catalog hash no longer matches this value.
    """

    @classmethod
    def load(cls, path: Path) -> Manifest:
        """
        Load a manifest from ``manifest.json``.

        Parameters
        ----------
        path
            Path to the manifest file.

        Returns
        -------
        :
            The parsed manifest.

        Raises
        ------
        ValueError
            If the manifest is missing required keys or has malformed native entries
            (e.g. hand-edited or written by an incompatible version).
        """
        return cls.loads(path.read_text(encoding="utf-8"), source=str(path))

    @classmethod
    def loads(cls, text: str, *, source: str = "<string>") -> Manifest:
        """
        Parse a manifest from its JSON text.

        Used when the manifest does not live on disk at parse time,
        e.g. when reading the base-branch copy via ``git show`` for the CI coupling gate.

        Parameters
        ----------
        text
            The manifest JSON.
        source
            A label for the text's origin, used in error messages.

        Returns
        -------
        :
            The parsed manifest.

        Raises
        ------
        ValueError
            If the manifest is missing required keys or has malformed native entries
            (e.g. hand-edited or written by an incompatible version).
        """
        data = json.loads(text)
        missing = [key for key in ("schema", "test_case_version", "committed", "native") if key not in data]
        if missing:
            raise ValueError(
                f"Invalid manifest {source}: missing required keys {missing}. "
                "The manifest may be corrupted or written by an incompatible version; "
                "regenerate it with `ref test-cases run --force-regen`."
            )
        schema = data["schema"]
        if isinstance(schema, bool) or not isinstance(schema, int) or schema != SCHEMA_VERSION:
            raise ValueError(
                f"Invalid manifest {source}: unsupported schema {schema!r}, "
                f"expected {SCHEMA_VERSION}. The manifest was written by an incompatible "
                "version; regenerate it with `ref test-cases run --force-regen`."
            )
        try:
            native = {
                relpath: NativeEntry(sha256=entry["sha256"], size=entry["size"])
                for relpath, entry in data["native"].items()
            }
        except (KeyError, TypeError, AttributeError) as exc:
            raise ValueError(
                f"Invalid manifest {source}: malformed 'native' entry ({exc!r}). "
                "Each entry must be a mapping with 'sha256' and 'size' keys."
            ) from exc
        # Reject hand-edited or hostile manifests that could escape the
        # destination directory or carry a malformed digest when materialised.
        for relpath, entry in native.items():
            try:
                safe_path(relpath, label="native path")
            except ValueError as exc:
                raise ValueError(f"Invalid manifest {source}: {exc}") from exc
            _validate_digest(entry.sha256)
            if isinstance(entry.size, bool) or not isinstance(entry.size, int) or entry.size < 0:
                raise ValueError(
                    f"Invalid manifest {source}: native entry {relpath!r} has invalid size "
                    f"{entry.size!r}; expected a non-negative integer."
                )
        return cls(
            schema=data["schema"],
            test_case_version=data["test_case_version"],
            committed=dict(data["committed"]),
            native=native,
            # TODO: remove optionality when all manifests have this field.
            catalog_hash=data.get("catalog_hash"),
        )

    def dump(self, path: Path) -> None:
        """
        Write the manifest to ``manifest.json``.

        Keys are stably ordered (``sort_keys=True``) and a trailing newline is added,
        so ``dump`` followed by ``load`` round-trips byte-identically.

        Parameters
        ----------
        path
            Path to write the manifest to.
        """
        payload = {
            "schema": self.schema,
            "test_case_version": self.test_case_version,
            "catalog_hash": self.catalog_hash,
            "committed": self.committed,
            "native": {relpath: asdict(entry) for relpath, entry in self.native.items()},
        }
        text = json.dumps(payload, indent=2, sort_keys=True) + "\n"
        path.write_text(text, encoding="utf-8")

    @classmethod
    def seed_v1(cls, committed_digests: dict[str, str], catalog_hash: str | None = None) -> Manifest:
        """
        Create an initial manifest at ``test_case_version == 1`` with no native outputs.

        Parameters
        ----------
        committed_digests
            Digests of the committed regression JSON artefacts.
        catalog_hash
            Hash of the input ``catalog.yaml`` that produced the baseline, if known.

        Returns
        -------
        :
            A fresh manifest with ``test_case_version=1`` and ``native={}``.
        """
        return cls(
            schema=SCHEMA_VERSION,
            test_case_version=1,
            committed=dict(committed_digests),
            catalog_hash=catalog_hash,
            native={},
        )

catalog_hash = None class-attribute instance-attribute #

Hash of the test case input catalog.yaml (its _metadata.hash) that produced this baseline. This couples the baseline to its inputs. The CI gate fails a case whose live catalog hash no longer matches this value.

committed instance-attribute #

Digests of committed regression JSON artefacts: {relpath: sha256}.

native instance-attribute #

Digests of curated native output files: {relpath: NativeEntry}.

schema instance-attribute #

Manifest schema version; equals SCHEMA_VERSION for current manifests.

test_case_version instance-attribute #

Monotonic, author-bumped version coupling the bundle to its native outputs.

dump(path) #

Write the manifest to manifest.json.

Keys are stably ordered (sort_keys=True) and a trailing newline is added, so dump followed by load round-trips byte-identically.

Parameters:

Name Type Description Default
path Path

Path to write the manifest to.

required
Source code in packages/climate-ref-core/src/climate_ref_core/regression/manifest.py
def dump(self, path: Path) -> None:
    """
    Write the manifest to ``manifest.json``.

    Keys are stably ordered (``sort_keys=True``) and a trailing newline is added,
    so ``dump`` followed by ``load`` round-trips byte-identically.

    Parameters
    ----------
    path
        Path to write the manifest to.
    """
    payload = {
        "schema": self.schema,
        "test_case_version": self.test_case_version,
        "catalog_hash": self.catalog_hash,
        "committed": self.committed,
        "native": {relpath: asdict(entry) for relpath, entry in self.native.items()},
    }
    text = json.dumps(payload, indent=2, sort_keys=True) + "\n"
    path.write_text(text, encoding="utf-8")

load(path) classmethod #

Load a manifest from manifest.json.

Parameters:

Name Type Description Default
path Path

Path to the manifest file.

required

Returns:

Type Description
Manifest

The parsed manifest.

Raises:

Type Description
ValueError

If the manifest is missing required keys or has malformed native entries (e.g. hand-edited or written by an incompatible version).

Source code in packages/climate-ref-core/src/climate_ref_core/regression/manifest.py
@classmethod
def load(cls, path: Path) -> Manifest:
    """
    Load a manifest from ``manifest.json``.

    Parameters
    ----------
    path
        Path to the manifest file.

    Returns
    -------
    :
        The parsed manifest.

    Raises
    ------
    ValueError
        If the manifest is missing required keys or has malformed native entries
        (e.g. hand-edited or written by an incompatible version).
    """
    return cls.loads(path.read_text(encoding="utf-8"), source=str(path))

loads(text, *, source='<string>') classmethod #

Parse a manifest from its JSON text.

Used when the manifest does not live on disk at parse time, e.g. when reading the base-branch copy via git show for the CI coupling gate.

Parameters:

Name Type Description Default
text str

The manifest JSON.

required
source str

A label for the text's origin, used in error messages.

'<string>'

Returns:

Type Description
Manifest

The parsed manifest.

Raises:

Type Description
ValueError

If the manifest is missing required keys or has malformed native entries (e.g. hand-edited or written by an incompatible version).
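
One detail of the validation in the source listing is worth a short illustration: schema and size are tested with isinstance(value, bool) before isinstance(value, int). In Python, bool is a subclass of int and True == 1, so a JSON true would otherwise be accepted as the integer 1. The is_plain_int helper and the JSON document below are hypothetical and only illustrate the check.

```python
import json


def is_plain_int(value: object) -> bool:
    """Hypothetical helper: True for real integers, False for booleans."""
    return isinstance(value, int) and not isinstance(value, bool)


# Made-up input: a boolean, an integer, and a float.
data = json.loads('{"schema": true, "size": 12, "ratio": 1.0}')
checks = {key: is_plain_int(value) for key, value in data.items()}

# Without the bool guard, a JSON true passes both an int check and == 1.
naive_accepts_true = isinstance(data["schema"], int) and data["schema"] == 1
```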

Source code in packages/climate-ref-core/src/climate_ref_core/regression/manifest.py
@classmethod
def loads(cls, text: str, *, source: str = "<string>") -> Manifest:
    """
    Parse a manifest from its JSON text.

    Used when the manifest does not live on disk at parse time,
    e.g. when reading the base-branch copy via ``git show`` for the CI coupling gate.

    Parameters
    ----------
    text
        The manifest JSON.
    source
        A label for the text's origin, used in error messages.

    Returns
    -------
    :
        The parsed manifest.

    Raises
    ------
    ValueError
        If the manifest is missing required keys or has malformed native entries
        (e.g. hand-edited or written by an incompatible version).
    """
    data = json.loads(text)
    missing = [key for key in ("schema", "test_case_version", "committed", "native") if key not in data]
    if missing:
        raise ValueError(
            f"Invalid manifest {source}: missing required keys {missing}. "
            "The manifest may be corrupted or written by an incompatible version; "
            "regenerate it with `ref test-cases run --force-regen`."
        )
    schema = data["schema"]
    if isinstance(schema, bool) or not isinstance(schema, int) or schema != SCHEMA_VERSION:
        raise ValueError(
            f"Invalid manifest {source}: unsupported schema {schema!r}, "
            f"expected {SCHEMA_VERSION}. The manifest was written by an incompatible "
            "version; regenerate it with `ref test-cases run --force-regen`."
        )
    try:
        native = {
            relpath: NativeEntry(sha256=entry["sha256"], size=entry["size"])
            for relpath, entry in data["native"].items()
        }
    except (KeyError, TypeError, AttributeError) as exc:
        raise ValueError(
            f"Invalid manifest {source}: malformed 'native' entry ({exc!r}). "
            "Each entry must be a mapping with 'sha256' and 'size' keys."
        ) from exc
    # Reject hand-edited or hostile manifests that could escape the
    # destination directory or carry a malformed digest when materialised.
    for relpath, entry in native.items():
        try:
            safe_path(relpath, label="native path")
        except ValueError as exc:
            raise ValueError(f"Invalid manifest {source}: {exc}") from exc
        _validate_digest(entry.sha256)
        if isinstance(entry.size, bool) or not isinstance(entry.size, int) or entry.size < 0:
            raise ValueError(
                f"Invalid manifest {source}: native entry {relpath!r} has invalid size "
                f"{entry.size!r}; expected a non-negative integer."
            )
    return cls(
        schema=data["schema"],
        test_case_version=data["test_case_version"],
        committed=dict(data["committed"]),
        native=native,
        # TODO: remove optionality when all manifests have this field.
        catalog_hash=data.get("catalog_hash"),
    )

seed_v1(committed_digests, catalog_hash=None) classmethod #

Create an initial manifest at test_case_version == 1 with no native outputs.

Parameters:

Name Type Description Default
committed_digests dict[str, str]

Digests of the committed regression JSON artefacts.

required
catalog_hash str | None

Hash of the input catalog.yaml that produced the baseline, if known.

None

Returns:

Type Description
Manifest

A fresh manifest with test_case_version=1 and native={}.

Source code in packages/climate-ref-core/src/climate_ref_core/regression/manifest.py
@classmethod
def seed_v1(cls, committed_digests: dict[str, str], catalog_hash: str | None = None) -> Manifest:
    """
    Create an initial manifest at ``test_case_version == 1`` with no native outputs.

    Parameters
    ----------
    committed_digests
        Digests of the committed regression JSON artefacts.
    catalog_hash
        Hash of the input ``catalog.yaml`` that produced the baseline, if known.

    Returns
    -------
    :
        A fresh manifest with ``test_case_version=1`` and ``native={}``.
    """
    return cls(
        schema=SCHEMA_VERSION,
        test_case_version=1,
        committed=dict(committed_digests),
        catalog_hash=catalog_hash,
        native={},
    )

NativeEntry #

A single curated native output file recorded in the manifest.

Source code in packages/climate-ref-core/src/climate_ref_core/regression/manifest.py
@frozen
class NativeEntry:
    """A single curated native output file recorded in the manifest."""

    sha256: str
    """Hex-encoded sha256 digest of the curated file."""

    size: int
    """Size of the curated file in bytes."""

sha256 instance-attribute #

Hex-encoded sha256 digest of the curated file.

size instance-attribute #

Size of the curated file in bytes.
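
The two fields can be derived from a file in a single pass. The sketch below is an assumption about how such an entry might be computed: describe_file is a hypothetical helper (the package's own sha256_file is not shown on this page), and the file name and contents are made up.

```python
import hashlib
import tempfile
from pathlib import Path


def describe_file(path: Path, chunk_size: int = 1 << 20) -> dict[str, object]:
    """Hypothetical helper: return {"sha256": ..., "size": ...} for a file."""
    hasher = hashlib.sha256()
    size = 0
    with path.open("rb") as handle:
        # Read in chunks so large native outputs are not loaded into memory at once.
        while chunk := handle.read(chunk_size):
            hasher.update(chunk)
            size += len(chunk)
    return {"sha256": hasher.hexdigest(), "size": size}


content = b"not really netCDF"  # made-up file contents
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "output.nc"
    sample.write_bytes(content)
    entry = describe_file(sample)
```

The resulting mapping has the same shape as one value under the manifest's native key.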

NativeStore #

Bases: Protocol

Protocol for a content-addressed native-bundle blob store.

Blobs are keyed by their sha256 hex digest. Read operations (has, fetch) are anonymous and credential-free. put requires write credentials and raises NotImplementedError on read-only implementations.
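
To make the contract concrete, here is a minimal sketch of a read-only implementation that keeps blobs in a dictionary. The NativeStore class below is a local stand-in with the same four method names as the documented protocol, and InMemoryReadStore is hypothetical and not part of the package. As a simplification, it verifies the digest before writing rather than re-hashing the written file.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Protocol, runtime_checkable


@runtime_checkable
class NativeStore(Protocol):
    """Stand-in mirroring the documented protocol's method names."""

    def has(self, digest: str) -> bool: ...
    def fetch(self, digest: str, dest: Path) -> None: ...
    def put(self, path: Path) -> str: ...
    def preflight(self) -> None: ...


class InMemoryReadStore:
    """Hypothetical read-only store holding blobs in a dict."""

    def __init__(self, blobs: dict[str, bytes]) -> None:
        self._blobs = blobs

    def has(self, digest: str) -> bool:
        return digest in self._blobs

    def fetch(self, digest: str, dest: Path) -> None:
        if digest not in self._blobs:
            raise FileNotFoundError(f"Blob {digest!r} not found")
        data = self._blobs[digest]
        # Simplification: check the bytes before writing them out.
        if hashlib.sha256(data).hexdigest() != digest:
            raise ValueError(f"sha256 mismatch for {digest!r}")
        dest.parent.mkdir(parents=True, exist_ok=True)
        dest.write_bytes(data)

    def put(self, path: Path) -> str:
        # Read-only implementations refuse writes, as the protocol describes.
        raise NotImplementedError("read-only store")

    def preflight(self) -> None:
        return None  # nothing to check for an in-memory store


payload = b"native output"
digest = hashlib.sha256(payload).hexdigest()
store = InMemoryReadStore({digest: payload})

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "nested" / "blob.bin"
    store.fetch(digest, target)
    fetched = target.read_bytes()
```

Note that isinstance against a runtime_checkable protocol only checks that the methods exist, not their signatures or behaviour.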

Source code in packages/climate-ref-core/src/climate_ref_core/regression/store.py
@runtime_checkable
class NativeStore(Protocol):
    """
    Protocol for a content-addressed native-bundle blob store.

    Blobs are keyed by their sha256 hex digest.
    Read operations (``has``, ``fetch``) are anonymous and credential-free.
    ``put`` requires write credentials and raises :class:`NotImplementedError` on read-only implementations.
    """

    def has(self, digest: str) -> bool:
        """
        Return ``True`` if the blob identified by ``digest`` is available in the store.

        Parameters
        ----------
        digest
            The sha256 hex digest of the blob.

        Returns
        -------
        :
            ``True`` when the blob is present, ``False`` when it is not.
        """
        ...

    def fetch(self, digest: str, dest: Path) -> None:
        """
        Fetch the blob identified by ``digest`` and write it to ``dest``.

        The sha256 of the written file is verified to equal ``digest``.
        Raises :class:`ValueError` on hash mismatch and
        :class:`FileNotFoundError` if the blob is not found.

        Parameters
        ----------
        digest
            The sha256 hex digest of the blob to fetch.
        dest
            Destination path to write the blob to.
            Parent directories are created if they do not exist.
        """
        ...

    def put(self, path: Path) -> str:
        """
        Store the file at ``path`` and return its sha256 hex digest.

        Requires write credentials.
        Read-only implementations raise :class:`NotImplementedError`.

        Parameters
        ----------
        path
            Path to the file to store.

        Returns
        -------
        :
            The sha256 hex digest of the stored blob.
        """
        ...

    def preflight(self) -> None:
        """
        Verify the store is reachable and usable before relying on it.

        For writable stores this checks the credentials and target (bucket, or that the local
        directory is writable); for anonymous read-only stores it may be a no-op. Intended to
        be called once up front (e.g. before a slow ``mint`` run) so a misconfiguration is
        caught early.

        Raises
        ------
        NativeStoreUnavailableError
            If the store cannot be reached or used, with an operator-facing message.
        """
        ...

fetch(digest, dest) #

Fetch the blob identified by digest and write it to dest.

The sha256 of the written file is verified to equal digest. Raises ValueError on hash mismatch and FileNotFoundError if the blob is not found.

Parameters:

Name Type Description Default
digest str

The sha256 hex digest of the blob to fetch.

required
dest Path

Destination path to write the blob to. Parent directories are created if they do not exist.

required
Source code in packages/climate-ref-core/src/climate_ref_core/regression/store.py
def fetch(self, digest: str, dest: Path) -> None:
    """
    Fetch the blob identified by ``digest`` and write it to ``dest``.

    The sha256 of the written file is verified to equal ``digest``.
    Raises :class:`ValueError` on hash mismatch and
    :class:`FileNotFoundError` if the blob is not found.

    Parameters
    ----------
    digest
        The sha256 hex digest of the blob to fetch.
    dest
        Destination path to write the blob to.
        Parent directories are created if they do not exist.
    """
    ...

has(digest) #

Return True if the blob identified by digest is available in the store.

Parameters:

Name Type Description Default
digest str

The sha256 hex digest of the blob.

required

Returns:

Type Description
bool

True when the blob is present, False when it is not.

Source code in packages/climate-ref-core/src/climate_ref_core/regression/store.py
def has(self, digest: str) -> bool:
    """
    Return ``True`` if the blob identified by ``digest`` is available in the store.

    Parameters
    ----------
    digest
        The sha256 hex digest of the blob.

    Returns
    -------
    :
        ``True`` when the blob is present, ``False`` when it is not.
    """
    ...

preflight() #

Verify the store is reachable and usable before relying on it.

For writable stores this checks the credentials and target (bucket, or that the local directory is writable); for anonymous read-only stores it may be a no-op. Intended to be called once up front (e.g. before a slow mint run) so a misconfiguration is caught early.

Raises:

Type Description
NativeStoreUnavailableError

If the store cannot be reached or used, with an operator-facing message.

Source code in packages/climate-ref-core/src/climate_ref_core/regression/store.py
def preflight(self) -> None:
    """
    Verify the store is reachable and usable before relying on it.

    For writable stores this checks the credentials and target (bucket, or that the local
    directory is writable); for anonymous read-only stores it may be a no-op. Intended to
    be called once up front (e.g. before a slow ``mint`` run) so a misconfiguration is
    caught early.

    Raises
    ------
    NativeStoreUnavailableError
        If the store cannot be reached or used, with an operator-facing message.
    """
    ...

put(path) #

Store the file at path and return its sha256 hex digest.

Requires write credentials. Read-only implementations raise NotImplementedError.

Parameters:

Name Type Description Default
path Path

Path to the file to store.

required

Returns:

Type Description
str

The sha256 hex digest of the stored blob.

Source code in packages/climate-ref-core/src/climate_ref_core/regression/store.py
def put(self, path: Path) -> str:
    """
    Store the file at ``path`` and return its sha256 hex digest.

    Requires write credentials.
    Read-only implementations raise :class:`NotImplementedError`.

    Parameters
    ----------
    path
        Path to the file to store.

    Returns
    -------
    :
        The sha256 hex digest of the stored blob.
    """
    ...
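The four protocol methods described above (has, fetch, put, preflight) can be illustrated with a minimal directory-backed store. This is only a sketch: SketchDirStore and round_trip are hypothetical names, not part of climate_ref_core, and the sketch raises a plain RuntimeError where a real store would raise NativeStoreUnavailableError.

```python
import hashlib
import os
import shutil
import tempfile
from pathlib import Path


class SketchDirStore:
    """Hypothetical content-addressed store: each blob is a file named by its sha256."""

    def __init__(self, root: Path) -> None:
        self.root = root

    def preflight(self) -> None:
        # Check once, up front, that the target directory exists and is writable.
        self.root.mkdir(parents=True, exist_ok=True)
        if not os.access(self.root, os.W_OK):
            raise RuntimeError(f"store directory {self.root} is not writable")

    def has(self, digest: str) -> bool:
        return (self.root / digest).is_file()

    def put(self, path: Path) -> str:
        # The digest of the content is the blob's name, so a repeated put is a no-op.
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if not self.has(digest):
            shutil.copy2(path, self.root / digest)
        return digest

    def fetch(self, digest: str, dest: Path) -> None:
        if not self.has(digest):
            raise FileNotFoundError(digest)
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(self.root / digest, dest)


def round_trip(payload: bytes) -> tuple[str, bytes]:
    """Store ``payload``, fetch it back, and return (digest, fetched bytes)."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp_path = Path(tmp)
        store = SketchDirStore(tmp_path / "store")
        store.preflight()
        src = tmp_path / "native.nc"
        src.write_bytes(payload)
        digest = store.put(src)
        dest = tmp_path / "replay" / "native.nc"
        store.fetch(digest, dest)
        return digest, dest.read_bytes()
```

Calling round_trip(b"abc") returns the sha256 hex digest of the payload together with the bytes read back from the fetched copy.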

NativeStoreUnavailableError #

Bases: RuntimeError

Raised when a native store cannot be reached or used.

Covers rejected credentials, a missing bucket, or an unwritable local directory. The message is operator-facing and actionable (it names the env vars / path to check), so callers can surface it directly.
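Because the message is meant to be shown to an operator as-is, a caller can print it and stop before any slow work begins. The sketch below assumes a hypothetical mint function and two stand-in stores; the exception class is redefined locally only to keep the example self-contained, and the error text is illustrative.

```python
class NativeStoreUnavailableError(RuntimeError):
    """Local stand-in for the documented exception, so the sketch runs on its own."""


class BrokenStore:
    """Hypothetical store whose preflight always fails."""

    def preflight(self) -> None:
        raise NativeStoreUnavailableError(
            "bucket 'ref-baselines' rejected the credentials; check REF_NATIVE_STORE_PROFILE"
        )


class HealthyStore:
    """Hypothetical store whose preflight always succeeds."""

    def preflight(self) -> None:
        return None


def mint(store, run_diagnostic) -> int:
    """Run preflight first so the slow step is skipped when the store is unusable."""
    try:
        store.preflight()
    except NativeStoreUnavailableError as exc:
        # The message is operator-facing, so it is surfaced unchanged.
        print(f"error: {exc}")
        return 1
    run_diagnostic()
    return 0
```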

Source code in packages/climate-ref-core/src/climate_ref_core/regression/store.py
class NativeStoreUnavailableError(RuntimeError):
    """
    Raised when a native store cannot be reached or used.

    Covers rejected credentials, a missing bucket, or an unwritable local directory.
    The message is operator-facing and actionable (it names the env vars / path to check),
    so callers can surface it directly.
    """

PoochReadStore #

Anonymous public-read blob store backed by a remote URL.

Uses pooch for caching, retry, and hash verification, mirroring the pattern in climate_ref_core.dataset_registry.

Blobs are fetched from {base_url}/{digest} and cached under cache_dir. put is intentionally unsupported; minting uses the write backend.

Parameters:

Name Type Description Default
base_url str

Base URL from which blobs are served (no trailing slash). Example: https://baselines.climate-ref.org.

required
cache_dir Path

Local directory used by pooch to cache downloaded blobs.

required

Source code in packages/climate-ref-core/src/climate_ref_core/regression/store.py
@frozen
class PoochReadStore:
    """
    Anonymous public-read blob store backed by a remote URL.

    Uses :mod:`pooch` for caching, retry, and hash verification,
    mirroring the pattern in :mod:`climate_ref_core.dataset_registry`.

    Blobs are fetched from ``{base_url}/{digest}`` and cached under ``cache_dir``.
    ``put`` is intentionally unsupported; minting uses the write backend.

    Parameters
    ----------
    base_url
        Base URL from which blobs are served (no trailing slash).
        Example: ``https://baselines.climate-ref.org``.
    cache_dir
        Local directory used by pooch to cache downloaded blobs.
    """

    base_url: str
    cache_dir: Path

    def _cached_blob_path(self, digest: str) -> Path:
        """Return the pooch cache path for a blob."""
        return self.cache_dir / digest

    def has(self, digest: str) -> bool:
        """
        Return ``True`` if the blob is already in the local pooch cache.

        Parameters
        ----------
        digest
            The sha256 hex digest of the blob.

        Returns
        -------
        :
            ``True`` when the cached file exists.
        """
        return self._cached_blob_path(digest).exists()

    def fetch(self, digest: str, dest: Path) -> None:
        """
        Download the blob from ``{base_url}/{digest}`` and write it to ``dest``.

        Uses pooch for caching and retry.
        Verifies the sha256 of the downloaded file against ``digest``.

        Parameters
        ----------
        digest
            The sha256 hex digest of the blob to fetch.
        dest
            Destination path to write the blob to.
            Parent directories are created if they do not exist.

        Raises
        ------
        ValueError
            If the downloaded blob's sha256 does not match ``digest``.
        """
        registry = _pooch_manager(self.base_url, str(self.cache_dir))
        registry.registry[digest] = digest  # content-addressed: hash == name

        cached = registry.fetch(digest)
        # pooch should already verify the hash against the registry entry
        _verify_hash_matches(cached, digest)

        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(cached, str(dest))
        logger.debug(f"PoochReadStore.fetch: {digest} -> {dest}")

    def put(self, path: Path) -> str:
        """
        Not supported on this read-only store.

        Raises
        ------
        NotImplementedError
            Always; minting uses the write backend, not the read store.
        """
        raise NotImplementedError(
            "PoochReadStore is a public-read store; put() is not supported. "
            "Use a writable store (LocalFilesystemStore or R2WriteStore) for minting."
        )

    def preflight(self) -> None:
        """
        No-op: an anonymous public-read store has nothing to verify up front.

        It has no credentials, and every read is hash-checked per blob; this exists only to
        satisfy the :class:`NativeStore` protocol.
        """
        return None

fetch(digest, dest) #

Download the blob from {base_url}/{digest} and write it to dest.

Uses pooch for caching and retry. Verifies the sha256 of the downloaded file against digest.

Parameters:

Name Type Description Default
digest str

The sha256 hex digest of the blob to fetch.

required
dest Path

Destination path to write the blob to. Parent directories are created if they do not exist.

required

Raises:

Type Description
ValueError

If the downloaded blob's sha256 does not match digest.

Source code in packages/climate-ref-core/src/climate_ref_core/regression/store.py
def fetch(self, digest: str, dest: Path) -> None:
    """
    Download the blob from ``{base_url}/{digest}`` and write it to ``dest``.

    Uses pooch for caching and retry.
    Verifies the sha256 of the downloaded file against ``digest``.

    Parameters
    ----------
    digest
        The sha256 hex digest of the blob to fetch.
    dest
        Destination path to write the blob to.
        Parent directories are created if they do not exist.

    Raises
    ------
    ValueError
        If the downloaded blob's sha256 does not match ``digest``.
    """
    registry = _pooch_manager(self.base_url, str(self.cache_dir))
    registry.registry[digest] = digest  # content-addressed: hash == name

    cached = registry.fetch(digest)
    # pooch should already verify the hash against the registry entry
    _verify_hash_matches(cached, digest)

    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(cached, str(dest))
    logger.debug(f"PoochReadStore.fetch: {digest} -> {dest}")
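The source of the private helper _verify_hash_matches is not shown on this page. A check of this kind could look roughly like the sketch below, where sha256_of, verify_sha256 and blob_url are hypothetical names and the chunked read is an assumption rather than the package's confirmed behaviour.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so a large native blob is never fully in memory."""
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def verify_sha256(path: Path, digest: str) -> None:
    """Raise ValueError when the file's sha256 differs from ``digest``."""
    actual = sha256_of(path)
    if actual != digest:
        raise ValueError(f"sha256 mismatch for {path}: expected {digest}, got {actual}")


def blob_url(base_url: str, digest: str) -> str:
    """Blobs are addressed as {base_url}/{digest}; base_url has no trailing slash."""
    return f"{base_url}/{digest}"
```

Because the digest doubles as the file name, the same string selects the URL to download and the hash to verify against.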

has(digest) #

Return True if the blob is already in the local pooch cache.

Parameters:

Name Type Description Default
digest str

The sha256 hex digest of the blob.

required

Returns:

Type Description
bool

True when the cached file exists.

Source code in packages/climate-ref-core/src/climate_ref_core/regression/store.py
def has(self, digest: str) -> bool:
    """
    Return ``True`` if the blob is already in the local pooch cache.

    Parameters
    ----------
    digest
        The sha256 hex digest of the blob.

    Returns
    -------
    :
        ``True`` when the cached file exists.
    """
    return self._cached_blob_path(digest).exists()

preflight() #

No-op: an anonymous public-read store has nothing to verify up front.

It has no credentials, and every read is hash-checked per blob; this exists only to satisfy the NativeStore protocol.

Source code in packages/climate-ref-core/src/climate_ref_core/regression/store.py
def preflight(self) -> None:
    """
    No-op: an anonymous public-read store has nothing to verify up front.

    It has no credentials, and every read is hash-checked per blob; this exists only to
    satisfy the :class:`NativeStore` protocol.
    """
    return None

put(path) #

Not supported on this read-only store.

Raises:

Type Description
NotImplementedError

Always; minting uses the write backend, not the read store.

Source code in packages/climate-ref-core/src/climate_ref_core/regression/store.py
def put(self, path: Path) -> str:
    """
    Not supported on this read-only store.

    Raises
    ------
    NotImplementedError
        Always; minting uses the write backend, not the read store.
    """
    raise NotImplementedError(
        "PoochReadStore is a public-read store; put() is not supported. "
        "Use a writable store (LocalFilesystemStore or R2WriteStore) for minting."
    )

R2WriteStore #

Credentialed S3-compatible write backend for a Cloudflare R2 bucket.

Used by the mint verb to upload native blobs. Reads in CI and for local replay go through PoochReadStore against the public read URL, so this store's fetch / has exist mainly for mint-time idempotence and verification.

Blobs are content-addressed with a flat key layout (object key == key_prefix + digest), so a blob is served at {public_url}/{key_prefix}{digest}. The public read domain is expected to map to the bucket root, so key_prefix defaults to "".

boto3 is imported lazily (see _s3_client); constructing this store does not require boto3, only the endpoint and bucket. Credentials are passed in explicitly and are never sourced from the persisted config.

Parameters:

Name Type Description Default
endpoint_url str

S3 API endpoint for the bucket's account, without the bucket (e.g. https://<account>.eu.r2.cloudflarestorage.com).

required
bucket str

Name of the R2 bucket (e.g. ref-baselines).

required
access_key_id str

R2 access-key id, or "" to fall through to profile / boto3's default chain.

""
secret_access_key str

R2 secret-access-key, or "" to fall through to profile / boto3's default chain.

""
profile str

Named AWS/R2 profile to authenticate with, or "" for the default session. Ignored when explicit access_key_id / secret_access_key are supplied.

""
key_prefix str

Optional object-key prefix. Defaults to "" (flat, bucket-root layout).

""

Source code in packages/climate-ref-core/src/climate_ref_core/regression/store.py
@frozen
class R2WriteStore:
    """
    Credentialed S3-compatible write backend for a Cloudflare R2 bucket.

    Used by the ``mint`` verb to upload native blobs. Reads in CI and for local replay go
    through :class:`PoochReadStore` against the public read URL, so this store's ``fetch`` /
    ``has`` exist mainly for mint-time idempotence and verification.

    Blobs are content-addressed with a **flat** key layout (object key == ``key_prefix`` +
    digest), so a blob is served at ``{public_url}/{key_prefix}{digest}``. The public read
    domain is expected to map to the bucket root, so ``key_prefix`` defaults to ``""``.

    boto3 is imported lazily (see :func:`_s3_client`); constructing this store does not
    require boto3, only the endpoint and bucket. Credentials are passed in explicitly and
    are never sourced from the persisted config.

    Parameters
    ----------
    endpoint_url
        S3 API endpoint for the bucket's account, without the bucket
        (e.g. ``https://<account>.eu.r2.cloudflarestorage.com``).
    bucket
        Name of the R2 bucket (e.g. ``ref-baselines``).
    access_key_id
        R2 access-key id, or ``""`` to fall through to ``profile`` / boto3's default chain.
    secret_access_key
        R2 secret-access-key, or ``""`` to fall through to ``profile`` / boto3's default chain.
    profile
        Named AWS/R2 profile to authenticate with, or ``""`` for the default session.
        Ignored when explicit ``access_key_id`` / ``secret_access_key`` are supplied.
    key_prefix
        Optional object-key prefix. Defaults to ``""`` (flat, bucket-root layout).
    """

    endpoint_url: str
    bucket: str
    access_key_id: str = ""
    secret_access_key: str = ""
    profile: str = ""
    key_prefix: str = ""

    def __attrs_post_init__(self) -> None:
        """Fail fast at construction (mint startup) when routing config is missing."""
        if not self.endpoint_url:
            raise ValueError(
                "R2 native store requires an S3 endpoint URL; set REF_NATIVE_STORE_S3_ENDPOINT_URL "
                "(e.g. https://<account>.eu.r2.cloudflarestorage.com)."
            )
        if not self.bucket:
            raise ValueError(
                "R2 native store requires a bucket name; set REF_NATIVE_STORE_BUCKET (e.g. ref-baselines)."
            )

    def _key(self, digest: str) -> str:
        """Return the object key for a blob, validating the digest first.

        The digest is validated as 64-character lowercase hex, so a malformed or hostile
        digest cannot inject an unexpected object key.
        """
        _validate_digest(digest)
        return f"{self.key_prefix}{digest}"

    def _client(self) -> Any:
        """Return the cached boto3 S3 client for this store's endpoint and credentials."""
        return _s3_client(self.endpoint_url, self.access_key_id, self.secret_access_key, self.profile)

    @staticmethod
    def _is_missing(exc: Exception) -> bool:
        """Return ``True`` when a botocore ``ClientError`` denotes a missing object (404)."""
        response = getattr(exc, "response", None)
        if not isinstance(response, dict):
            return False
        code = response.get("Error", {}).get("Code")
        status = response.get("ResponseMetadata", {}).get("HTTPStatusCode")
        return code in _MISSING_OBJECT_CODES or status == _HTTP_NOT_FOUND

    def has(self, digest: str) -> bool:
        """
        Return ``True`` if the blob is present in the bucket.

        Parameters
        ----------
        digest
            The sha256 hex digest of the blob.

        Returns
        -------
        :
            ``True`` when a ``HEAD`` on the object succeeds, ``False`` on a 404.
        """
        from botocore.exceptions import ClientError  # noqa: PLC0415 - optional dependency

        try:
            self._client().head_object(Bucket=self.bucket, Key=self._key(digest))
        except ClientError as exc:
            if self._is_missing(exc):
                return False
            raise
        return True

    def fetch(self, digest: str, dest: Path) -> None:
        """
        Download the blob to ``dest`` and verify its sha256 matches ``digest``.

        Parameters
        ----------
        digest
            The sha256 hex digest of the blob to fetch.
        dest
            Destination path to write the blob to.
            Parent directories are created if they do not exist.

        Raises
        ------
        FileNotFoundError
            If the blob is not present in the bucket.
        ValueError
            If the downloaded blob's sha256 does not match ``digest``.
        """
        from botocore.exceptions import ClientError  # noqa: PLC0415 - optional dependency

        dest.parent.mkdir(parents=True, exist_ok=True)
        try:
            self._client().download_file(self.bucket, self._key(digest), str(dest))
        except ClientError as exc:
            if self._is_missing(exc):
                raise FileNotFoundError(f"Blob {digest!r} not found in R2 bucket {self.bucket!r}") from exc
            raise
        _verify_hash_matches(dest, digest)
        logger.debug(f"R2WriteStore.fetch: {digest} -> {dest}")

    def put(self, path: Path) -> str:
        """
        Upload the file at ``path`` to the bucket and return its sha256 hex digest.

        The blob is content-addressed: the upload is skipped when an object with the same
        digest already exists, so minting is idempotent and re-mints are cheap.

        Parameters
        ----------
        path
            Path to the file to store.

        Returns
        -------
        :
            The sha256 hex digest of the stored blob.
        """
        digest = sha256_file(path)
        if self.has(digest):
            logger.debug(f"R2WriteStore.put: {digest} already present, skipping upload")
            return digest
        self._client().upload_file(str(path), self.bucket, self._key(digest))
        logger.debug(f"R2WriteStore.put: {path} -> s3://{self.bucket}/{self._key(digest)}")
        return digest

    @staticmethod
    def _http_status(exc: Exception) -> int | None:
        """Return the HTTP status code from a botocore ``ClientError``, if present."""
        response = getattr(exc, "response", None)
        if isinstance(response, dict):
            status = response.get("ResponseMetadata", {}).get("HTTPStatusCode")
            if isinstance(status, int):
                return status
        return None

    def preflight(self) -> None:
        """
        Verify the bucket is reachable and the credentials are accepted, before any upload.

        Performs a cheap authenticated ``HEAD`` on a sentinel key (expected to be absent). A
        ``404`` means the request authenticated and the store is usable; ``401`` / ``403`` are
        translated into actionable :class:`NativeStoreUnavailableError` messages so a
        misconfigured credential is caught before the (slow) diagnostic run rather than after.

        ``head_object`` is used rather than ``head_bucket`` so the check works with
        least-privilege, object-scoped tokens (which cannot perform bucket-level operations).

        Raises
        ------
        NativeStoreUnavailableError
            If the credentials are rejected (401), access is denied (403), or the probe
            otherwise fails.
        """
        from botocore.exceptions import ClientError  # noqa: PLC0415 - optional dependency

        probe_key = f"{self.key_prefix}.ref-preflight-probe"
        try:
            self._client().head_object(Bucket=self.bucket, Key=probe_key)
        except ClientError as exc:
            status = self._http_status(exc)
            if status == _HTTP_NOT_FOUND:
                pass  # authenticated; the probe object is simply absent — store is usable
            elif status in _AUTH_REJECTED_STATUSES:
                raise NativeStoreUnavailableError(
                    f"Native store authentication failed (HTTP {status}) for bucket {self.bucket!r} at "
                    f"{self.endpoint_url}: the credentials were rejected or malformed. Check "
                    f"REF_NATIVE_STORE_PROFILE, or REF_NATIVE_STORE_ACCESS_KEY_ID / "
                    f"REF_NATIVE_STORE_SECRET_ACCESS_KEY."
                ) from exc
            elif status == _HTTP_FORBIDDEN:
                raise NativeStoreUnavailableError(
                    f"Native store access denied (HTTP 403) for bucket {self.bucket!r} at "
                    f"{self.endpoint_url}: the request was forbidden — the secret key may be wrong, or "
                    f"the token may lack object read & write on this bucket. Check the credentials and "
                    f"the token's permissions."
                ) from exc
            else:
                raise NativeStoreUnavailableError(
                    f"Native store preflight failed (HTTP {status}) for bucket {self.bucket!r} at "
                    f"{self.endpoint_url}: {exc}"
                ) from exc
        logger.info(f"Native store authenticated: bucket {self.bucket!r} at {self.endpoint_url}")
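The flat key layout and the digest check performed in _key can be sketched as follows. The helper _validate_digest is not shown on this page, so the regular expression here is an assumption based on the "64-character lowercase hex" wording, and all function names are hypothetical.

```python
import re

# Assumption: a valid digest is exactly 64 lowercase hexadecimal characters.
_DIGEST_RE = re.compile(r"[0-9a-f]{64}")


def validate_digest(digest: str) -> None:
    """Reject anything that is not a plain sha256 hex digest."""
    if not _DIGEST_RE.fullmatch(digest):
        raise ValueError(f"not a sha256 hex digest: {digest!r}")


def object_key(key_prefix: str, digest: str) -> str:
    """Flat layout: the object key is the prefix followed directly by the digest."""
    validate_digest(digest)
    return f"{key_prefix}{digest}"


def public_blob_url(public_url: str, key_prefix: str, digest: str) -> str:
    """Where the blob is served once the public domain maps to the bucket root."""
    return f"{public_url}/{object_key(key_prefix, digest)}"
```

Validating before building the key means a value such as "../etc/passwd" is rejected instead of becoming part of an object key.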

__attrs_post_init__() #

Fail fast at construction (mint startup) when routing config is missing.

Source code in packages/climate-ref-core/src/climate_ref_core/regression/store.py
def __attrs_post_init__(self) -> None:
    """Fail fast at construction (mint startup) when routing config is missing."""
    if not self.endpoint_url:
        raise ValueError(
            "R2 native store requires an S3 endpoint URL; set REF_NATIVE_STORE_S3_ENDPOINT_URL "
            "(e.g. https://<account>.eu.r2.cloudflarestorage.com)."
        )
    if not self.bucket:
        raise ValueError(
            "R2 native store requires a bucket name; set REF_NATIVE_STORE_BUCKET (e.g. ref-baselines)."
        )

fetch(digest, dest) #

Download the blob to dest and verify its sha256 matches digest.

Parameters:

Name Type Description Default
digest str

The sha256 hex digest of the blob to fetch.

required
dest Path

Destination path to write the blob to. Parent directories are created if they do not exist.

required

Raises:

Type Description
FileNotFoundError

If the blob is not present in the bucket.

ValueError

If the downloaded blob's sha256 does not match digest.

Source code in packages/climate-ref-core/src/climate_ref_core/regression/store.py
def fetch(self, digest: str, dest: Path) -> None:
    """
    Download the blob to ``dest`` and verify its sha256 matches ``digest``.

    Parameters
    ----------
    digest
        The sha256 hex digest of the blob to fetch.
    dest
        Destination path to write the blob to.
        Parent directories are created if they do not exist.

    Raises
    ------
    FileNotFoundError
        If the blob is not present in the bucket.
    ValueError
        If the downloaded blob's sha256 does not match ``digest``.
    """
    from botocore.exceptions import ClientError  # noqa: PLC0415 - optional dependency

    dest.parent.mkdir(parents=True, exist_ok=True)
    try:
        self._client().download_file(self.bucket, self._key(digest), str(dest))
    except ClientError as exc:
        if self._is_missing(exc):
            raise FileNotFoundError(f"Blob {digest!r} not found in R2 bucket {self.bucket!r}") from exc
        raise
    _verify_hash_matches(dest, digest)
    logger.debug(f"R2WriteStore.fetch: {digest} -> {dest}")

has(digest) #

Return True if the blob is present in the bucket.

Parameters:

Name Type Description Default
digest str

The sha256 hex digest of the blob.

required

Returns:

Type Description
bool

True when a HEAD on the object succeeds, False on a 404.

Source code in packages/climate-ref-core/src/climate_ref_core/regression/store.py
def has(self, digest: str) -> bool:
    """
    Return ``True`` if the blob is present in the bucket.

    Parameters
    ----------
    digest
        The sha256 hex digest of the blob.

    Returns
    -------
    :
        ``True`` when a ``HEAD`` on the object succeeds, ``False`` on a 404.
    """
    from botocore.exceptions import ClientError  # noqa: PLC0415 - optional dependency

    try:
        self._client().head_object(Bucket=self.bucket, Key=self._key(digest))
    except ClientError as exc:
        if self._is_missing(exc):
            return False
        raise
    return True
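The 404 handling depends on the shape of the error response attached to a botocore ClientError. The sketch below reproduces that check against a stand-in exception so it runs without boto3. FakeClientError is hypothetical, and the set of error codes is an assumption, since the real _MISSING_OBJECT_CODES constant is not shown on this page.

```python
HTTP_NOT_FOUND = 404
# Assumption: codes treated as "object missing"; the package's constant may differ.
MISSING_OBJECT_CODES = frozenset({"404", "NoSuchKey", "NotFound"})


class FakeClientError(Exception):
    """Stand-in carrying a botocore-style ``response`` dict."""

    def __init__(self, response: dict) -> None:
        super().__init__(str(response))
        self.response = response


def is_missing(exc: Exception) -> bool:
    """Return True when the error denotes a missing object rather than a real failure."""
    response = getattr(exc, "response", None)
    if not isinstance(response, dict):
        return False
    code = response.get("Error", {}).get("Code")
    status = response.get("ResponseMetadata", {}).get("HTTPStatusCode")
    return code in MISSING_OBJECT_CODES or status == HTTP_NOT_FOUND
```

Any error that is not recognised as "missing" (for example a 403) is re-raised by has, so a permissions problem is not mistaken for an absent blob.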

preflight() #

Verify the bucket is reachable and the credentials are accepted, before any upload.

Performs a cheap authenticated HEAD on a sentinel key (expected to be absent). A 404 means the request authenticated and the store is usable; 401 / 403 are translated into actionable NativeStoreUnavailableError messages so a misconfigured credential is caught before the (slow) diagnostic run rather than after.

head_object is used rather than head_bucket so the check works with least-privilege, object-scoped tokens (which cannot perform bucket-level operations).

Raises:

Type Description
NativeStoreUnavailableError

If the credentials are rejected (401), access is denied (403), or the probe otherwise fails.

Source code in packages/climate-ref-core/src/climate_ref_core/regression/store.py
def preflight(self) -> None:
    """
    Verify the bucket is reachable and the credentials are accepted, before any upload.

    Performs a cheap authenticated ``HEAD`` on a sentinel key (expected to be absent). A
    ``404`` means the request authenticated and the store is usable; ``401`` / ``403`` are
    translated into actionable :class:`NativeStoreUnavailableError` messages so a
    misconfigured credential is caught before the (slow) diagnostic run rather than after.

    ``head_object`` is used rather than ``head_bucket`` so the check works with
    least-privilege, object-scoped tokens (which cannot perform bucket-level operations).

    Raises
    ------
    NativeStoreUnavailableError
        If the credentials are rejected (401), access is denied (403), or the probe
        otherwise fails.
    """
    from botocore.exceptions import ClientError  # noqa: PLC0415 - optional dependency

    probe_key = f"{self.key_prefix}.ref-preflight-probe"
    try:
        self._client().head_object(Bucket=self.bucket, Key=probe_key)
    except ClientError as exc:
        status = self._http_status(exc)
        if status == _HTTP_NOT_FOUND:
            pass  # authenticated; the probe object is simply absent — store is usable
        elif status in _AUTH_REJECTED_STATUSES:
            raise NativeStoreUnavailableError(
                f"Native store authentication failed (HTTP {status}) for bucket {self.bucket!r} at "
                f"{self.endpoint_url}: the credentials were rejected or malformed. Check "
                f"REF_NATIVE_STORE_PROFILE, or REF_NATIVE_STORE_ACCESS_KEY_ID / "
                f"REF_NATIVE_STORE_SECRET_ACCESS_KEY."
            ) from exc
        elif status == _HTTP_FORBIDDEN:
            raise NativeStoreUnavailableError(
                f"Native store access denied (HTTP 403) for bucket {self.bucket!r} at "
                f"{self.endpoint_url}: the request was forbidden — the secret key may be wrong, or "
                f"the token may lack object read & write on this bucket. Check the credentials and "
                f"the token's permissions."
            ) from exc
        else:
            raise NativeStoreUnavailableError(
                f"Native store preflight failed (HTTP {status}) for bucket {self.bucket!r} at "
                f"{self.endpoint_url}: {exc}"
            ) from exc
    logger.info(f"Native store authenticated: bucket {self.bucket!r} at {self.endpoint_url}")
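The decision made on the probe's HTTP status can be summarised as a small mapping. This is a simplified sketch: classify_probe is a hypothetical function, it treats only 401 as "credentials rejected" (the contents of _AUTH_REJECTED_STATUSES are not shown on this page), and it returns labels where the real method raises NativeStoreUnavailableError.

```python
from typing import Optional


def classify_probe(status: Optional[int]) -> str:
    """Map the HTTP status of the sentinel HEAD request to an outcome label."""
    if status in (200, 404):
        # 404: authenticated, and the sentinel object is simply absent.
        # 200: the HEAD succeeded outright, so no error was raised at all.
        return "usable"
    if status == 401:
        return "credentials rejected"
    if status == 403:
        return "access denied"
    # Anything else, including a response without a status, is a generic failure.
    return "preflight failed"
```

Probing an object rather than the bucket keeps the check compatible with tokens scoped to object operations only.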

put(path) #

Upload the file at path to the bucket and return its sha256 hex digest.

The blob is content-addressed: the upload is skipped when an object with the same digest already exists, so minting is idempotent and re-mints are cheap.

Parameters:

Name Type Description Default
path Path

Path to the file to store.

required

Returns:

Type Description
str

The sha256 hex digest of the stored blob.

Source code in packages/climate-ref-core/src/climate_ref_core/regression/store.py
def put(self, path: Path) -> str:
    """
    Upload the file at ``path`` to the bucket and return its sha256 hex digest.

    The blob is content-addressed: the upload is skipped when an object with the same
    digest already exists, so minting is idempotent and re-mints are cheap.

    Parameters
    ----------
    path
        Path to the file to store.

    Returns
    -------
    :
        The sha256 hex digest of the stored blob.
    """
    digest = sha256_file(path)
    if self.has(digest):
        logger.debug(f"R2WriteStore.put: {digest} already present, skipping upload")
        return digest
    self._client().upload_file(str(path), self.bucket, self._key(digest))
    logger.debug(f"R2WriteStore.put: {path} -> s3://{self.bucket}/{self._key(digest)}")
    return digest
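The idempotence of put follows directly from content addressing, which the sketch below demonstrates with an in-memory stand-in for the bucket. CountingBucket and put_blob are hypothetical names used only for illustration; no S3 client is involved.

```python
import hashlib


class CountingBucket:
    """Hypothetical in-memory bucket that records how many uploads happened."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}
        self.uploads = 0

    def head(self, key: str) -> bool:
        return key in self.objects

    def upload(self, key: str, data: bytes) -> None:
        self.objects[key] = data
        self.uploads += 1


def put_blob(bucket: CountingBucket, data: bytes) -> str:
    """Upload ``data`` under its sha256 digest, skipping the upload if it already exists."""
    digest = hashlib.sha256(data).hexdigest()
    if bucket.head(digest):
        return digest  # same content, same key: nothing to do
    bucket.upload(digest, data)
    return digest
```

Storing the same bytes twice yields the same digest and a single upload, which is why re-minting an unchanged baseline is cheap.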

Tolerance #

Float comparison tolerance for bundle regression checks.

Parameters:

Name Type Description Default
rtol float

Relative tolerance: the allowed proportional difference between expected and actual float values (default 1e-6).

1e-6
atol float

Absolute tolerance: the floor difference always permitted, regardless of magnitude (default 1e-8).

1e-8

Source code in packages/climate-ref-core/src/climate_ref_core/regression/compare.py
@frozen
class Tolerance:
    """
    Float comparison tolerance for bundle regression checks.

    Parameters
    ----------
    rtol
        Relative tolerance — the allowed proportional difference between expected
        and actual float values (default ``1e-6``).
    atol
        Absolute tolerance — the floor difference always permitted,
        regardless of magnitude (default ``1e-8``).
    """

    rtol: float = 1e-6
    atol: float = 1e-8
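How rtol and atol combine is decided inside compare_json_content, which is not shown on this page. A common convention, assumed here and not confirmed for this package, is the numpy.isclose-style rule |actual - expected| <= atol + rtol * |expected|. SketchTolerance and floats_match are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchTolerance:
    """Stand-in with the same defaults as the documented Tolerance."""

    rtol: float = 1e-6
    atol: float = 1e-8


def floats_match(expected: float, actual: float, tol: SketchTolerance = SketchTolerance()) -> bool:
    # Assumed rule: the absolute floor plus a margin proportional to the expected value.
    return abs(actual - expected) <= tol.atol + tol.rtol * abs(expected)
```

Under this rule atol dominates for values near zero, where a purely relative margin would shrink to nothing, and rtol dominates for large magnitudes.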

assert_bundle_regression(expected_path, actual_path, *, slug, tol=Tolerance(), replacements) #

Assert that a regenerated committed-bundle JSON file matches the committed copy.

Algorithm:

  1. Byte-equal fast path: if the raw bytes of both files are identical, return immediately.
  2. Parse both files as JSON.
  3. Rewrite both dict keys and leaf string values in the regenerated (actual) document using replacements, applied longest-key-first so that a shorter real value cannot pre-empt a longer overlapping one.
  4. Call compare_json_content with the given tolerance.
  5. Raise AssertionError with the full mismatch list and a remediation hint if any mismatches are found.

The replacements map follows the convention used throughout the testing infrastructure: keys are real runtime values (absolute paths, recipe-dir timestamps), values are the committed-bundle placeholders (e.g. "<OUTPUT_DIR>", "<TEST_DATA_DIR>", "<RECIPE_RUN>"). Only the actual document is rewritten: the committed expected file is assumed to already contain placeholders, which climate_ref_core.regression.capture.write_committed_bundle guarantees at capture time. A hand-edited baseline with raw paths will surface as ordinary value mismatches.

Both <OUTPUT_DIR> and <RECIPE_RUN> participate in dict-KEY rewriting because ESMValTool's output.json embeds absolute paths as object keys.

Parameters:

Name Type Description Default
expected_path Path

Path to the committed (on-disk) bundle file containing placeholders.

required
actual_path Path

Path to the regenerated bundle file containing real runtime paths.

required
slug str

Diagnostic slug used in error messages.

required
tol Tolerance

Float comparison tolerance.

Tolerance()
replacements dict[str, str]

Mapping of {real_value: placeholder} applied to the actual document.

required

Raises:

Type Description
AssertionError

If the bundles differ beyond tolerance after sanitisation.

Notes

If expected_path does not exist (a legacy regression without a committed bundle), the comparison is skipped silently and the function returns.

Source code in packages/climate-ref-core/src/climate_ref_core/regression/compare.py
def assert_bundle_regression(
    expected_path: Path,
    actual_path: Path,
    *,
    slug: str,
    tol: Tolerance = Tolerance(),
    replacements: dict[str, str],
) -> None:
    """
    Assert that a regenerated committed-bundle JSON file matches the committed copy.

    Algorithm:

    1. **Byte-equal fast path** — if the raw bytes of both files are identical, return immediately.
    2. Parse both files as JSON.
    3. Rewrite both dict *keys* and leaf string *values* in the regenerated
       (``actual``) document using ``replacements``, applied **longest-key-first**
       so that a shorter real value cannot pre-empt a longer overlapping one.
    4. Call :func:`compare_json_content` with the given tolerance.
    5. Raise ``AssertionError`` with the full mismatch list and a remediation hint
       if any mismatches are found.

    The replacements map follows the convention used throughout the testing
    infrastructure: keys are real runtime values (absolute paths, recipe-dir
    timestamps), values are the committed-bundle placeholders (e.g.
    ``"<OUTPUT_DIR>"``, ``"<TEST_DATA_DIR>"``, ``"<RECIPE_RUN>"``).
    Only the *actual* document is rewritten: the committed *expected* file is
    assumed to already contain placeholders, which
    :func:`~climate_ref_core.regression.capture.write_committed_bundle` guarantees
    at capture time. A hand-edited baseline with raw paths will surface as ordinary
    value mismatches.

    Both ``<OUTPUT_DIR>`` and ``<RECIPE_RUN>`` participate in dict-KEY rewriting
    because ESMValTool's ``output.json`` embeds absolute paths as object keys.

    Parameters
    ----------
    expected_path
        Path to the committed (on-disk) bundle file containing placeholders.
    actual_path
        Path to the regenerated bundle file containing real runtime paths.
    slug
        Diagnostic slug used in error messages.
    tol
        Float comparison tolerance.
    replacements
        Mapping of ``{real_value: placeholder}`` applied to the actual document.

    Raises
    ------
    AssertionError
        If the bundles differ beyond tolerance after sanitisation.

    Notes
    -----
    If ``expected_path`` does not exist (a legacy regression without a committed bundle),
    the comparison is skipped silently and the function returns.
    """
    if not expected_path.exists():
        # Legacy regression data without this bundle file — skip silently
        return

    expected_bytes = expected_path.read_bytes()
    actual_bytes = actual_path.read_bytes()

    # Fast path: byte-identical means no difference.
    if expected_bytes == actual_bytes:
        return

    expected_obj = json.loads(expected_bytes.decode("utf-8"))
    actual_obj = json.loads(actual_bytes.decode("utf-8"))

    # Rewrite actual — both keys and leaf values — before comparison.
    actual_sanitised = _rewrite_keys_and_values(actual_obj, ordered_replacements(replacements))

    mismatches = compare_json_content(expected_obj, actual_sanitised, tol=tol)
    if mismatches:
        mismatch_detail = "\n  ".join(mismatches)
        msg = (
            f"Diagnostic {slug!r}: committed bundle {expected_path.name!r} "
            f"differs from regenerated output after sanitisation.\n"
            f"Mismatches ({len(mismatches)}):\n  {mismatch_detail}\n\n"
            f"Remediation: if the change is intentional, bump the diagnostic's "
            f"test_case_version and regenerate with `ref test-cases run --force-regen`."
        )
        raise AssertionError(msg)
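
The longest-key-first rule in step 3 matters when one real path is a prefix of another. The sketch below is not the library's `_rewrite_keys_and_values` or `ordered_replacements` (neither is shown on this page); `ordered`, `rewrite_text` and `rewrite` are hypothetical stand-ins, and the paths are made up for illustration.

```python
from typing import Any


def ordered(replacements: dict[str, str]) -> list[tuple[str, str]]:
    """Sort replacement pairs so that longer real values are applied first."""
    return sorted(replacements.items(), key=lambda kv: len(kv[0]), reverse=True)


def rewrite_text(text: str, pairs: list[tuple[str, str]]) -> str:
    """Apply each (real, placeholder) substitution in the given order."""
    for real, placeholder in pairs:
        text = text.replace(real, placeholder)
    return text


def rewrite(obj: Any, pairs: list[tuple[str, str]]) -> Any:
    """Rewrite dict keys and leaf string values; other leaves pass through."""
    if isinstance(obj, dict):
        return {rewrite_text(k, pairs): rewrite(v, pairs) for k, v in obj.items()}
    if isinstance(obj, list):
        return [rewrite(v, pairs) for v in obj]
    if isinstance(obj, str):
        return rewrite_text(obj, pairs)
    return obj


# Illustrative values only: the recipe-run directory sits under the output directory.
replacements = {
    "/tmp/out": "<OUTPUT_DIR>",
    "/tmp/out/run/recipe_20240101_000000": "<RECIPE_RUN>",
}
actual = {
    "/tmp/out/run/recipe_20240101_000000/plot.png": {"caption": "saved to /tmp/out/plots"},
}

sanitised = rewrite(actual, ordered(replacements))
```

Applying the shorter key first would turn the dict key into `<OUTPUT_DIR>/run/recipe_20240101_000000/plot.png`, leaving a run-specific timestamp in the document and producing a spurious mismatch against the committed file.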

build_native_snapshot(base_dir, relpaths) #

Record a sha256 + size snapshot of each persisted native file.

Parameters:

Name Type Description Default
base_dir Path

The per-execution results directory the relpaths are resolved against.

required
relpaths list[Path]

The persisted files (relative to base_dir), e.g. the return value of :func:~climate_ref_core.output_files.copy_execution_outputs.

required

Returns:

Type Description
dict[str, NativeEntry]

Mapping of POSIX relpath -> :class:NativeEntry for every persisted file.

Source code in packages/climate-ref-core/src/climate_ref_core/regression/capture.py
def build_native_snapshot(base_dir: Path, relpaths: list[Path]) -> dict[str, NativeEntry]:
    """
    Record a sha256 + size snapshot of each persisted native file.

    Parameters
    ----------
    base_dir
        The per-execution results directory the relpaths are resolved against.
    relpaths
        The persisted files (relative to ``base_dir``), e.g. the return value of
        :func:`~climate_ref_core.output_files.copy_execution_outputs`.

    Returns
    -------
    :
        Mapping of POSIX relpath -> :class:`NativeEntry` for every persisted file.
    """
    entries: dict[str, NativeEntry] = {}
    for relpath in relpaths:
        path = base_dir / relpath
        entries[relpath.as_posix()] = NativeEntry(sha256=sha256_file(path), size=path.stat().st_size)
    return entries
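
As a rough illustration of what a snapshot contains, the following sketch builds one for a temporary directory. It uses `hashlib` directly instead of `sha256_file`, and `Entry` is a hypothetical stand-in for `NativeEntry` that carries only the two fields used above.

```python
import hashlib
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Entry:
    """Hypothetical stand-in for NativeEntry: a digest plus a byte count."""

    sha256: str
    size: int


def snapshot(base_dir: Path, relpaths: list[Path]) -> dict[str, Entry]:
    """Record digest and size for each file, keyed by its POSIX relative path."""
    entries: dict[str, Entry] = {}
    for relpath in relpaths:
        data = (base_dir / relpath).read_bytes()
        entries[relpath.as_posix()] = Entry(hashlib.sha256(data).hexdigest(), len(data))
    return entries


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "plots").mkdir()
    (base / "plots" / "map.png").write_bytes(b"not really a png")
    snap = snapshot(base, [Path("plots/map.png")])
```

Keys are POSIX strings rather than `Path` objects, so the mapping serialises to JSON unchanged and reads the same on any platform.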

build_native_store(config, *, writable) #

Build an appropriate :class:NativeStore from a native-store config object.

Accepts any object that exposes url: str and cache_dir: Path (satisfying :class:_NativeStoreConfigProtocol), so callers pass config.native_store rather than the full :class:~climate_ref.config.Config. A writable remote store additionally reads s3_endpoint_url and bucket from the same object.

With writable=False the returned store is always anonymous and credential-free (suitable for CI read/replay paths). With writable=True and a local URL/path a :class:LocalFilesystemStore is returned; with a remote (http(s)) URL a credentialed :class:R2WriteStore is returned. The S3 endpoint and bucket come from the config; authentication is read from the environment (REF_NATIVE_STORE_ACCESS_KEY_ID / REF_NATIVE_STORE_SECRET_ACCESS_KEY, else REF_NATIVE_STORE_PROFILE, else boto3's default chain), so secrets never live in the persisted config.

Parameters:

Name Type Description Default
config _NativeStoreConfigProtocol

A config object providing url, cache_dir, s3_endpoint_url and bucket. Typically app_config.native_store.

required
writable bool

When False, return a read-only store (no credentials required). When True, return a writable store (LocalFilesystemStore for local paths, or a :class:R2WriteStore for remote URLs).

required

Returns:

Type Description
NativeStore

A :class:NativeStore implementation appropriate for the configuration.

Raises:

Type Description
ValueError

If the URL scheme is unrecognised, or a writable remote store is requested without an S3 endpoint / bucket configured.

Source code in packages/climate-ref-core/src/climate_ref_core/regression/store.py
def build_native_store(config: _NativeStoreConfigProtocol, *, writable: bool) -> NativeStore:
    """
    Build an appropriate :class:`NativeStore` from a native-store config object.

    Accepts any object that exposes ``url: str`` and ``cache_dir: Path``
    (satisfying :class:`_NativeStoreConfigProtocol`), so callers pass
    ``config.native_store`` rather than the full :class:`~climate_ref.config.Config`.
    A writable remote store additionally reads ``s3_endpoint_url`` and ``bucket``
    from the same object.

    With ``writable=False`` the returned store is always anonymous and
    credential-free (suitable for CI read/replay paths).
    With ``writable=True`` and a local URL/path a :class:`LocalFilesystemStore` is returned;
    with a remote (``http(s)``) URL a credentialed :class:`R2WriteStore` is returned. The S3
    endpoint and bucket come from the config; authentication is read from the environment
    (``REF_NATIVE_STORE_ACCESS_KEY_ID`` / ``REF_NATIVE_STORE_SECRET_ACCESS_KEY``, else
    ``REF_NATIVE_STORE_PROFILE``, else boto3's default chain), so secrets never live in the
    persisted config.

    Parameters
    ----------
    config
        A config object providing ``url``, ``cache_dir``, ``s3_endpoint_url`` and ``bucket``.
        Typically ``app_config.native_store``.
    writable
        When ``False``, return a read-only store (no credentials required).
        When ``True``, return a writable store (``LocalFilesystemStore`` for local
        paths, or a :class:`R2WriteStore` for remote URLs).

    Returns
    -------
    :
        A :class:`NativeStore` implementation appropriate for the configuration.

    Raises
    ------
    ValueError
        If the URL scheme is unrecognised, or a writable remote store is requested
        without an S3 endpoint / bucket configured.
    """
    url: str = config.url
    cache_dir: Path = config.cache_dir

    parts = urlsplit(url)
    scheme = parts.scheme

    if scheme in ("http", "https"):
        if writable:
            return R2WriteStore(
                endpoint_url=config.s3_endpoint_url,
                bucket=config.bucket,
                access_key_id=os.environ.get("REF_NATIVE_STORE_ACCESS_KEY_ID", ""),
                secret_access_key=os.environ.get("REF_NATIVE_STORE_SECRET_ACCESS_KEY", ""),
                profile=os.environ.get("REF_NATIVE_STORE_PROFILE", ""),
            )
        return PoochReadStore(base_url=url.rstrip("/"), cache_dir=cache_dir)

    if scheme == "file":
        # Parse properly so malformed variants fail loudly instead of
        # silently producing a wrong (e.g. relative) path.
        if parts.netloc not in ("", "localhost"):
            raise ValueError(
                f"Unsupported file URL {url!r}: a host component ({parts.netloc!r}) is not "
                "supported. Use the file:///absolute/path form (three slashes)."
            )
        return LocalFilesystemStore(root=Path(unquote(parts.path)))

    if scheme == "":
        return LocalFilesystemStore(root=Path(url))

    raise ValueError(
        f"Unsupported native store URL {url!r}: scheme {scheme!r} is not recognised. "
        "Use http(s):// for a remote store, or file:///absolute/path or a bare filesystem "
        "path for a local store."
    )
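
The scheme dispatch can be tried out without constructing any store. The sketch below mirrors the branches above but only classifies the URL; `classify` is a hypothetical helper, not part of the package, and the URLs are examples.

```python
from urllib.parse import unquote, urlsplit


def classify(url: str) -> tuple[str, str]:
    """Return (kind, location) using the same scheme rules, without building a store."""
    parts = urlsplit(url)
    if parts.scheme in ("http", "https"):
        return "remote", url.rstrip("/")
    if parts.scheme == "file":
        # Only file:///absolute/path (or a localhost host) is accepted.
        if parts.netloc not in ("", "localhost"):
            raise ValueError(f"file URL with host {parts.netloc!r} is not supported")
        return "local", unquote(parts.path)
    if parts.scheme == "":
        return "local", url
    raise ValueError(f"unrecognised scheme {parts.scheme!r}")


remote = classify("https://example.org/native/")
encoded = classify("file:///data/native%20store")
bare = classify("/data/native")
```

A two-slash form such as `file://data/native` parses with `data` as the host, which is why that branch raises instead of quietly resolving to `/native`.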

capture_execution(scratch_directory, results_directory, fragment, result, *, regression_dir, output_dir, test_data_dir, include_log=False) #

Persist a successful execution and capture its committed bundle + native snapshot.

Copies the curated output set from scratch to results via :func:~climate_ref_core.output_files.copy_execution_outputs (the production persistence path), then writes the committed bundle and snapshots every persisted native file.

Parameters:

Name Type Description Default
scratch_directory Path

Base scratch directory the diagnostic wrote into.

required
results_directory Path

Base results directory to persist the curated subset into.

required
fragment Path | str

The per-execution fragment under both base directories.

required
result ExecutionResult

The successful execution result (must carry a metric bundle filename).

required
regression_dir Path

The test case regression/ directory for the committed bundle.

required
output_dir Path

The absolute execution output directory, for path substitution.

required
test_data_dir Path

The absolute provider test-data directory, for path substitution.

required
include_log bool

If True, the execution log is included in the persisted/native set.

Defaults to False, matching the behaviour of :func:~climate_ref_core.output_files.copy_execution_outputs.

False

Returns:

Type Description
tuple[dict[str, str], dict[str, NativeEntry]]

A (committed_digests, native_snapshot) tuple.

Source code in packages/climate-ref-core/src/climate_ref_core/regression/capture.py
def capture_execution(  # noqa: PLR0913
    scratch_directory: Path,
    results_directory: Path,
    fragment: Path | str,
    result: ExecutionResult,
    *,
    regression_dir: Path,
    output_dir: Path,
    test_data_dir: Path,
    # TODO: Unify the log handling
    include_log: bool = False,
) -> tuple[dict[str, str], dict[str, NativeEntry]]:
    """
    Persist a successful execution and capture its committed bundle + native snapshot.

    Copies the curated output set from scratch to results via
    :func:`~climate_ref_core.output_files.copy_execution_outputs`
    (the production persistence path),
    then writes the committed bundle and snapshots every persisted native file.

    Parameters
    ----------
    scratch_directory
        Base scratch directory the diagnostic wrote into.
    results_directory
        Base results directory to persist the curated subset into.
    fragment
        The per-execution fragment under both base directories.
    result
        The successful execution result (must carry a metric bundle filename).
    regression_dir
        The test case ``regression/`` directory for the committed bundle.
    output_dir
        The absolute execution output directory, for path substitution.
    test_data_dir
        The absolute provider test-data directory, for path substitution.
    include_log
        If True, the execution log is included in the persisted/native set.

        Defaults to False, matching the behaviour of
        :func:`~climate_ref_core.output_files.copy_execution_outputs`.

    Returns
    -------
    :
        A ``(committed_digests, native_snapshot)`` tuple.
    """
    relpaths = copy_execution_outputs(
        scratch_directory,
        results_directory,
        fragment,
        result,
        include_log=include_log,
    )
    base_dir = results_directory / fragment
    committed = write_committed_bundle(
        base_dir,
        regression_dir,
        output_dir=output_dir,
        test_data_dir=test_data_dir,
    )
    native = build_native_snapshot(base_dir, relpaths)
    return committed, native

compare_json_content(expected, actual, *, tol, path='') #

Recursively compare two parsed JSON values with float tolerance.

Rules:

- Floats: compared with relative tolerance tol.rtol and absolute tolerance tol.atol.
- Ints, strings, bools, None: exact equality.
- Lists: element-by-element, same length required.
- Dicts: key sets must match; values compared recursively.

Parameters:

Name Type Description Default
expected Any

The reference (committed) parsed JSON value.

required
actual Any

The regenerated parsed JSON value.

required
tol Tolerance

Float comparison tolerance.

required
path str

Dot-/bracket-notation path prefix for error messages (empty at top level).

''

Returns:

Type Description
list[str]

A list of human-readable mismatch descriptions. An empty list means the values are equivalent within tolerance.

Source code in packages/climate-ref-core/src/climate_ref_core/regression/compare.py
def compare_json_content(
    expected: Any,
    actual: Any,
    *,
    tol: Tolerance,
    path: str = "",
) -> list[str]:
    """
    Recursively compare two parsed JSON values with float tolerance.

    Rules:

    - **Floats**: compared with relative tolerance ``tol.rtol``
      and absolute tolerance ``tol.atol``.
    - **Ints, strings, bools, ``None``**: exact equality.
    - **Lists**: element-by-element, same length required.
    - **Dicts**: key sets must match; values compared recursively.

    Parameters
    ----------
    expected
        The reference (committed) parsed JSON value.
    actual
        The regenerated parsed JSON value.
    tol
        Float comparison tolerance.
    path
        Dot-/bracket-notation path prefix for error messages (empty at top level).

    Returns
    -------
    :
        A list of human-readable mismatch descriptions.
        An empty list means the values are equivalent within tolerance.
    """
    mismatches: list[str] = []
    _compare_recursive(expected, actual, tol=tol, path=path, out=mismatches)
    return mismatches
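
The private `_compare_recursive` helper is not shown on this page, so the following is only a sketch of how the four rules could be implemented. Assumptions: `Tol` is a hypothetical stand-in for `Tolerance`, floats are compared with `math.isclose` (the real formula may differ), and a type difference such as `1` versus `1.0` is reported as a mismatch.

```python
import math
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Tol:
    """Hypothetical stand-in for Tolerance."""

    rtol: float = 1e-6
    atol: float = 1e-9


def compare(expected: Any, actual: Any, *, tol: Tol, path: str = "") -> list[str]:
    """Return mismatch descriptions; an empty list means equivalent within tolerance."""
    where = path or "<root>"
    if isinstance(expected, dict) and isinstance(actual, dict):
        out: list[str] = []
        for key in sorted(expected.keys() | actual.keys()):
            if key not in expected or key not in actual:
                out.append(f"{where}: key {key!r} present on one side only")
            else:
                sub = f"{path}.{key}" if path else key
                out += compare(expected[key], actual[key], tol=tol, path=sub)
        return out
    if isinstance(expected, list) and isinstance(actual, list):
        if len(expected) != len(actual):
            return [f"{where}: length {len(expected)} != {len(actual)}"]
        out = []
        for i, (e, a) in enumerate(zip(expected, actual)):
            out += compare(e, a, tol=tol, path=f"{path}[{i}]")
        return out
    if isinstance(expected, float) and isinstance(actual, float):
        if math.isclose(expected, actual, rel_tol=tol.rtol, abs_tol=tol.atol):
            return []
        return [f"{where}: {expected!r} != {actual!r} beyond tolerance"]
    # Everything else (ints, strings, bools, None, or mixed types) must match exactly.
    if type(expected) is not type(actual) or expected != actual:
        return [f"{where}: {expected!r} != {actual!r}"]
    return []


within = compare({"a": [1.0, 2.0], "n": 3}, {"a": [1.0, 2.0000001], "n": 3}, tol=Tol())
beyond = compare({"a": 1.0}, {"a": 1.1}, tol=Tol())
keys = compare({"a": 1}, {"b": 1}, tol=Tol())
```

Collecting every mismatch instead of stopping at the first one is what lets `assert_bundle_regression` report the full list in a single failure message.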

compute_committed_digests(regression_dir) #

Compute sha256 digests of the committed regression JSON artefacts.

The digests are taken over the bytes exactly as they sit on disk (placeholder text included), so a CI recompute is deterministic. Only files that exist are included.

Parameters:

Name Type Description Default
regression_dir Path

The test case regression/ directory.

required

Returns:

Type Description
dict[str, str]

Mapping of {relpath: sha256} for each present committed artefact.

Source code in packages/climate-ref-core/src/climate_ref_core/regression/manifest.py
def compute_committed_digests(regression_dir: Path) -> dict[str, str]:
    """
    Compute sha256 digests of the committed regression JSON artefacts.

    The digests are taken over the bytes exactly as they sit on disk (placeholder text included),
    so a CI recompute is deterministic. Only files that exist are included.

    Parameters
    ----------
    regression_dir
        The test case ``regression/`` directory.

    Returns
    -------
    :
        Mapping of ``{relpath: sha256}`` for each present committed artefact.
    """
    digests: dict[str, str] = {}
    for relpath in COMMITTED_BUNDLE_FILES:
        candidate = regression_dir / relpath
        if candidate.exists():
            digests[relpath] = sha256_file(candidate)
    return digests
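
A small sketch of the "only files that exist are included" behaviour, using `hashlib` in place of `sha256_file`. The file names come from `COMMITTED_BUNDLE_FILES`; `digests` and the file content are illustrative.

```python
import hashlib
import tempfile
from pathlib import Path

# Same names as COMMITTED_BUNDLE_FILES.
BUNDLE_FILES = ("series.json", "diagnostic.json", "output.json")


def digests(regression_dir: Path) -> dict[str, str]:
    """Hash each committed artefact that is present; silently skip absent ones."""
    found: dict[str, str] = {}
    for name in BUNDLE_FILES:
        candidate = regression_dir / name
        if candidate.exists():
            found[name] = hashlib.sha256(candidate.read_bytes()).hexdigest()
    return found


content = b'{"path": "<OUTPUT_DIR>/x.nc"}'
with tempfile.TemporaryDirectory() as tmp:
    regression_dir = Path(tmp)
    (regression_dir / "diagnostic.json").write_bytes(content)
    result = digests(regression_dir)
```

Because the placeholder text is part of the hashed bytes, the digest does not depend on where the execution originally ran.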

decide_coupling(manifest, base_manifest, *, extraction_changed=False, committed_integrity_ok=True, catalog_integrity_ok=True) #

Decide how CI should verify a single test case's regression baseline.

The decision is pure: it performs no I/O. All on-disk reality is summarised by its arguments: the two manifests, whether the diagnostic's extraction code changed, and whether the committed bundle and input catalog on disk still match the current manifest.

The gate fails if any state cannot be positively verified. A deleted managed manifest, a committed bundle that drifted from its manifest, or an input catalog that drifted from its manifest is a failure rather than a silent skip.

Changes to the native baseline are not failures. Minting native blobs requires credentials, so fork contributors cannot author or edit them. replay is therefore selected only when native blobs actually exist to replay; an absent or removed native baseline downgrades to skip (with a warning in the reason), or to execute when test_case_version was bumped, never fail.

See the module docstring for the meaning of each :class:Action.

Parameters:

Name Type Description Default
manifest Manifest | None

The current manifest.json for the test case, or None if the case has no manifest on this branch (never managed, or deleted in this change).

required
base_manifest Manifest | None

The manifest.json as it exists on the base branch, or None if the manifest is newly added in this change (seeding).

required
extraction_changed bool

Whether code that influences build_execution_result for this test case changed in this pull request. Only consulted when the committed bundle is unchanged and the version was not bumped.

False
committed_integrity_ok bool

Whether the committed bundle on disk matches the current manifest's committed digests exactly (no edited, added, or removed committed file). The caller computes this against the working tree. False means the manifest no longer describes the bundle it is supposed to gate, which is a hard failure.

Ignored when manifest is None (nothing to verify).

True
catalog_integrity_ok bool

Whether the test case's input catalog.yaml still matches the current manifest's catalog_hash. The caller computes this against the working tree. False means the inputs changed without the baseline being regenerated, so the committed bundle no longer reflects its inputs — a hard failure.

Always True when the manifest carries no catalog_hash.

True

Returns:

Type Description
GateDecision

The gate's decision, pairing an :class:Action with a reason.

Source code in packages/climate-ref-core/src/climate_ref_core/regression/gate.py
def decide_coupling(  # noqa: PLR0911, PLR0912
    manifest: Manifest | None,
    base_manifest: Manifest | None,
    *,
    extraction_changed: bool = False,
    committed_integrity_ok: bool = True,
    catalog_integrity_ok: bool = True,
) -> GateDecision:
    """
    Decide how CI should verify a single test case's regression baseline.

    The decision is pure: it performs no I/O.
    All on-disk reality is summarised by its arguments:
    the two manifests, whether the diagnostic's extraction code changed,
    and whether the committed bundle and input catalog on disk still match
    the current manifest.

    The gate fails if any state cannot be positively verified.
    A deleted managed manifest, a committed bundle that drifted from its manifest,
    or an input catalog that drifted from its manifest
    is a failure rather than a silent skip.

    Changes to the native baseline are not failures.
    Minting native blobs requires credentials,
    so fork contributors cannot author or edit them.
    ``replay`` is therefore selected only when native blobs actually exist to replay;
    an absent or removed native baseline downgrades to ``skip`` (with a warning in the reason),
    or to ``execute`` when ``test_case_version`` was bumped, never ``fail``.

    See the module docstring for the meaning of each :class:`Action`.

    Parameters
    ----------
    manifest
        The current ``manifest.json`` for the test case,
        or ``None`` if the case has no manifest on this branch (never managed, or deleted in this change).
    base_manifest
        The ``manifest.json`` as it exists on the base branch,
        or ``None`` if the manifest is newly added in this change (seeding).
    extraction_changed
        Whether code that influences ``build_execution_result`` for this test case
        changed in this pull request.
        Only consulted when the committed bundle is unchanged and the version was not bumped.
    committed_integrity_ok
        Whether the committed bundle on disk matches the current manifest's ``committed`` digests exactly
        (no edited, added, or removed committed file).
        The caller computes this against the working tree.
        ``False`` means the manifest no longer describes the bundle it is supposed to gate,
        which is a hard failure.

        Ignored when ``manifest`` is ``None`` (nothing to verify).
    catalog_integrity_ok
        Whether the test case's input ``catalog.yaml`` still matches the current
        manifest's ``catalog_hash``.
        The caller computes this against the working tree.
        ``False`` means the inputs changed without the baseline being regenerated,
        so the committed bundle no longer reflects its inputs — a hard failure.

        Always ``True`` when the manifest carries no ``catalog_hash``.

    Returns
    -------
    :
        The gate's decision, pairing an :class:`Action` with a reason.
    """
    if manifest is None:
        # Distinguish a never-managed case from the deletion of a managed baseline.
        # Deleting manifest.json must not be a silent way to disable the gate.
        if base_manifest is not None:
            return GateDecision(
                Action.FAIL,
                "manifest.json is absent but exists on the base branch; a managed "
                "regression baseline cannot be removed without review",
            )
        return GateDecision(
            Action.SKIP,
            "no manifest.json; test case not under regression-baseline management",
        )

    # The manifest must faithfully describe the committed bundle it gates.
    # If the bundle on disk drifted from the manifest digests
    # (an edit/add/remove without regenerating the manifest), the metadata comparisons below are meaningless.
    if not committed_integrity_ok:
        return GateDecision(
            Action.FAIL,
            "committed bundle on disk does not match manifest.json digests; "
            "regenerate the manifest with `ref test-cases run` after changing the bundle",
        )

    # The input catalog must still describe the baseline it produced.
    if not catalog_integrity_ok:
        return GateDecision(
            Action.FAIL,
            "input catalog.yaml does not match manifest.json catalog_hash; "
            "regenerate the baseline with `ref test-cases run` after changing the inputs",
        )

    if base_manifest is None:
        # Seeding a newly added manifest.
        # Replay only verifies something when native blobs exist
        if manifest.native:
            return GateDecision(
                Action.REPLAY,
                "manifest newly added (seeding); replaying native baseline against committed bundle",
            )
        return GateDecision(
            Action.SKIP,
            "manifest newly added (seeding) with no native baseline; "
            "the committed bundle is the only signal and is reviewed in the diff",
        )

    if manifest.test_case_version < base_manifest.test_case_version:
        return GateDecision(
            Action.FAIL,
            f"test_case_version decreased ({base_manifest.test_case_version} -> "
            f"{manifest.test_case_version}); version must be monotonic",
        )

    version_bumped = manifest.test_case_version > base_manifest.test_case_version
    committed_changed = manifest.committed != base_manifest.committed
    native_changed = manifest.native != base_manifest.native

    if version_bumped:
        version_change = f"{base_manifest.test_case_version} -> {manifest.test_case_version}"
        if manifest.native:
            # A native baseline ships with the bump, so the replay can prove the new committed bundle
            # actually reproduces from those blobs.
            return GateDecision(
                Action.REPLAY,
                f"test_case_version bumped ({version_change}) with a native baseline present; "
                "replaying to confirm the native baseline reproduces the new committed bundle",
            )
        return GateDecision(
            Action.EXECUTE,
            f"test_case_version bumped ({version_change}) with no native baseline to replay; "
            "full end-to-end re-run required to verify the new committed bundle",
        )

    if committed_changed:
        return GateDecision(
            Action.FAIL,
            "committed bundle changed without a test_case_version bump; "
            "bump test_case_version to authorise the new baseline",
        )

    if native_changed:
        if manifest.native:
            # Native blobs were re-authored (re-minted) without a version bump.
            # The committed bundle is unchanged,
            # so verify the new native snapshot still reproduces it rather than skipping unverified.
            return GateDecision(
                Action.REPLAY,
                "native baseline changed with committed bundle unchanged; "
                "replaying to confirm the new native snapshot reproduces the committed bundle",
            )
        # De-mint: the native baseline was removed while the committed bundle stayed.
        return GateDecision(
            Action.SKIP,
            "WARNING: native baseline removed (de-mint) with committed bundle unchanged; "
            "the committed bundle still gates this case but native replay is no longer possible",
        )

    if extraction_changed:
        if manifest.native:
            return GateDecision(
                Action.REPLAY,
                "extraction code changed with committed bundle unchanged; "
                "replaying cached native baseline to verify",
            )
        return GateDecision(
            Action.SKIP,
            "extraction code changed but no native baseline exists to replay; "
            "the committed bundle is unchanged and remains the only signal",
        )

    return GateDecision(
        Action.SKIP,
        "no committed-bundle, native, version, or extraction-code change; nothing to verify",
    )
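
The branch order above is easier to follow with a few concrete scenarios. The sketch below condenses the same decision sequence into a function that returns only the action name; `M` is a hypothetical, trimmed-down stand-in for `Manifest`, and the digests are placeholders.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class M:
    """Hypothetical stand-in for Manifest with only the fields the gate reads."""

    test_case_version: int
    committed: dict[str, str] = field(default_factory=dict)
    native: dict[str, str] = field(default_factory=dict)


def decide(cur: M | None, base: M | None, *, extraction_changed: bool = False,
           committed_ok: bool = True, catalog_ok: bool = True) -> str:
    """Condensed restatement of the gate: same branch order, action name only."""
    if cur is None:
        return "fail" if base is not None else "skip"
    if not committed_ok or not catalog_ok:
        return "fail"
    has_native = bool(cur.native)
    if base is None:  # seeding
        return "replay" if has_native else "skip"
    if cur.test_case_version < base.test_case_version:
        return "fail"
    if cur.test_case_version > base.test_case_version:
        return "replay" if has_native else "execute"
    if cur.committed != base.committed:
        return "fail"
    if cur.native != base.native or extraction_changed:
        return "replay" if has_native else "skip"
    return "skip"


base = M(1, {"output.json": "aaa"}, {"out.nc": "111"})
scenarios = {
    "unchanged": decide(base, base),
    "bump with native": decide(M(2, {"output.json": "bbb"}, {"out.nc": "222"}), base),
    "bump without native": decide(M(2, {"output.json": "bbb"}), base),
    "edit without bump": decide(M(1, {"output.json": "bbb"}, base.native), base),
    "de-mint": decide(M(1, base.committed, {}), base),
    "manifest deleted": decide(None, base),
    "extraction changed": decide(base, base, extraction_changed=True),
}
```

The integrity checks run before any manifest-to-manifest comparison, so a drifted bundle or catalog fails even when the version was bumped.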

materialise_native(native, store, dest) #

Materialise a native snapshot from a store into a destination directory.

For each (relpath, entry) the blob is fetched from store (keyed by its sha256 digest) to dest / relpath, creating parent directories as needed.

Parameters:

Name Type Description Default
native dict[str, NativeEntry]

Mapping of relpath -> :class:NativeEntry (from a manifest).

required
store NativeStore

A content-addressed :class:~climate_ref_core.regression.store.NativeStore.

required
dest Path

The destination directory the snapshot is materialised into.

required

Source code in packages/climate-ref-core/src/climate_ref_core/regression/capture.py
def materialise_native(native: dict[str, NativeEntry], store: NativeStore, dest: Path) -> None:
    """
    Materialise a native snapshot from a store into a destination directory.

    For each ``(relpath, entry)`` the blob is fetched from ``store`` (keyed by its
    sha256 digest) to ``dest / relpath``, creating parent directories as needed.

    Parameters
    ----------
    native
        Mapping of relpath -> :class:`NativeEntry` (from a manifest).
    store
        A content-addressed :class:`~climate_ref_core.regression.store.NativeStore`.
    dest
        The destination directory the snapshot is materialised into.
    """
    for relpath, entry in native.items():
        # Defend against path traversal: a hand-edited or hostile manifest could
        # carry an absolute path or one with '..' components that escapes dest.
        target = safe_path(relpath, dest, label="native path")
        target.parent.mkdir(parents=True, exist_ok=True)
        store.fetch(entry.sha256, target)

paths_under(changed_files, roots) #

Return whether any changed file lies within one of the given directory roots.

A small helper for deriving the extraction_changed signal from a pull request's changed-file list. Paths are compared textually as POSIX-style, repo-relative strings, so callers must normalise both changed_files and roots to the same convention (e.g. git diff --name-only output and a package source directory relative to the repo root).

Parameters:

Name Type Description Default
changed_files Iterable[str]

Repo-relative paths changed in the pull request.

required
roots Iterable[str]

Repo-relative directory prefixes to test against. A trailing slash is optional; an empty root never matches.

required

Returns:

Type Description
bool

True if any changed file equals or sits beneath any root.

Source code in packages/climate-ref-core/src/climate_ref_core/regression/gate.py
def paths_under(changed_files: Iterable[str], roots: Iterable[str]) -> bool:
    """
    Return whether any changed file lies within one of the given directory roots.

    A small helper for deriving the ``extraction_changed`` signal from a pull request's changed-file list.
    Paths are compared textually as POSIX-style, repo-relative strings,
    so callers must normalise both ``changed_files`` and ``roots`` to the same convention
    (e.g. ``git diff --name-only`` output and a package source directory relative to the repo root).

    Parameters
    ----------
    changed_files
        Repo-relative paths changed in the pull request.
    roots
        Repo-relative directory prefixes to test against. A trailing slash is
        optional; an empty root never matches.

    Returns
    -------
    :
        ``True`` if any changed file equals or sits beneath any root.
    """
    normalised_roots = [root.rstrip("/") for root in roots if root.rstrip("/")]
    for changed in changed_files:
        for root in normalised_roots:
            if changed == root or changed.startswith(root + "/"):
                return True
    return False
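
The function is self-contained, so its boundary behaviour can be checked directly. The example below repeats the body shown above and exercises three cases; `climate-ref-core-extras` is a made-up sibling directory used only to show that matching respects path-component boundaries.

```python
from collections.abc import Iterable


def paths_under(changed_files: Iterable[str], roots: Iterable[str]) -> bool:
    # Drop trailing slashes; roots that become empty are discarded.
    normalised_roots = [root.rstrip("/") for root in roots if root.rstrip("/")]
    for changed in changed_files:
        for root in normalised_roots:
            # Appending "/" stops "a/b" from matching "a/bc/file.py".
            if changed == root or changed.startswith(root + "/"):
                return True
    return False


changed = [
    "packages/climate-ref-core/src/climate_ref_core/regression/gate.py",
    "README.md",
]

inside = paths_under(changed, ["packages/climate-ref-core/src/"])
sibling = paths_under(["packages/climate-ref-core-extras/a.py"], ["packages/climate-ref-core"])
empty_root = paths_under(changed, ["", "/"])
```

Here `inside` is true, while `sibling` and `empty_root` are both false: a shared string prefix is not enough, and empty roots are filtered out before any comparison.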

sha256_bytes(data) #

Compute the sha256 digest of an in-memory byte string.

Parameters:

Name Type Description Default
data bytes

The bytes to hash.

required

Returns:

Type Description
str

The hex-encoded sha256 digest.

Source code in packages/climate-ref-core/src/climate_ref_core/regression/manifest.py
def sha256_bytes(data: bytes) -> str:
    """
    Compute the sha256 digest of an in-memory byte string.

    Parameters
    ----------
    data
        The bytes to hash.

    Returns
    -------
    :
        The hex-encoded sha256 digest.
    """
    return hashlib.sha256(data).hexdigest()

sha256_file(path) #

Compute the sha256 digest of a file.

Reuses :func:pooch.hashes.file_hash so the digest agrees with pooch elsewhere.

Parameters:

Name Type Description Default
path Path

Path to the file to hash.

required

Returns:

Type Description
str

The hex-encoded sha256 digest.

Source code in packages/climate-ref-core/src/climate_ref_core/regression/manifest.py
def sha256_file(path: Path) -> str:
    """
    Compute the sha256 digest of a file.

    Reuses :func:`pooch.hashes.file_hash` so the digest agrees with pooch elsewhere.

    Parameters
    ----------
    path
        Path to the file to hash.

    Returns
    -------
    :
        The hex-encoded sha256 digest.
    """
    return pooch.hashes.file_hash(str(path), alg="sha256")
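
Both helpers should yield the same hex digest for the same content, whether it is hashed in memory or from disk. The sketch below checks that property with `hashlib` alone; `sha256_stream` is a hypothetical chunked reader standing in for the pooch call, so it makes no claim about how pooch reads the file.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_stream(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file in fixed-size chunks so large blobs never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


payload = b"regression baseline" * 10_000
with tempfile.TemporaryDirectory() as tmp:
    blob = Path(tmp) / "blob.bin"
    blob.write_bytes(payload)
    streamed = sha256_stream(blob)

in_memory = hashlib.sha256(payload).hexdigest()
```

The two digests agree, which is what allows a manifest entry written from bytes to be verified later against a file.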

verify_committed_integrity(manifest, regression_dir) #

Check that the committed regression artefacts match the manifest digests.

Used by the CI integrity check. An empty return value means the bundle is intact.

Parameters:

Name Type Description Default
manifest Manifest

The manifest holding the expected committed digests.

required
regression_dir Path

The test case regression/ directory to verify against.

required

Returns:

Type Description
list[str]

A list of human-readable mismatch descriptions; empty when everything matches.

Source code in packages/climate-ref-core/src/climate_ref_core/regression/manifest.py
def verify_committed_integrity(manifest: Manifest, regression_dir: Path) -> list[str]:
    """
    Check that the committed regression artefacts match the manifest digests.

    Used by the CI integrity check. An empty return value means the bundle is intact.

    Parameters
    ----------
    manifest
        The manifest holding the expected committed digests.
    regression_dir
        The test case ``regression/`` directory to verify against.

    Returns
    -------
    :
        A list of human-readable mismatch descriptions; empty when everything matches.
    """
    mismatches: list[str] = []
    for relpath, expected in manifest.committed.items():
        candidate = regression_dir / relpath
        if not candidate.exists():
            mismatches.append(
                f"{relpath}: missing on disk — expected at {candidate} (manifest sha256 {expected})"
            )
            continue
        actual = sha256_file(candidate)
        if actual != expected:
            mismatches.append(
                f"{relpath}: content differs from manifest — {candidate} "
                f"(manifest sha256 {expected}, on-disk sha256 {actual})"
            )
    return mismatches
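
The two failure modes reported above (a missing file and a changed file) can be exercised in isolation. This is a minimal sketch under stated assumptions: `StandInManifest`, `digest`, and `check_bundle` are hypothetical names that only mimic the `committed` mapping and the loop shown above, and they do not import or reproduce the real `Manifest` model.

```python
import hashlib
import tempfile
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class StandInManifest:
    """Hypothetical stand-in exposing only a `committed` mapping."""

    committed: dict[str, str] = field(default_factory=dict)


def digest(path: Path) -> str:
    """Hex-encoded sha256 of a small file (stdlib only)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def check_bundle(manifest: StandInManifest, regression_dir: Path) -> list[str]:
    """Sketch of the same check: one message per missing or altered file."""
    problems: list[str] = []
    for relpath, expected in manifest.committed.items():
        candidate = regression_dir / relpath
        if not candidate.exists():
            problems.append(f"{relpath}: missing on disk")
        elif digest(candidate) != expected:
            problems.append(f"{relpath}: content differs from manifest")
    return problems


with tempfile.TemporaryDirectory() as tmp:
    regression_dir = Path(tmp)
    (regression_dir / "series.json").write_text("[]")
    (regression_dir / "output.json").write_text("{}")
    names = ("series.json", "output.json")
    manifest = StandInManifest(
        committed={name: digest(regression_dir / name) for name in names}
    )
    intact = check_bundle(manifest, regression_dir)

    (regression_dir / "series.json").write_text("[1]")  # edit without re-capturing
    (regression_dir / "output.json").unlink()  # delete a committed artefact
    broken = check_bundle(manifest, regression_dir)
```

Returning a list of messages rather than raising on the first mismatch lets a CI job report every damaged artefact in a single run.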

write_committed_bundle(source_dir, regression_dir, *, output_dir, test_data_dir) #

Write the sanitised committed CMEC bundle into regression_dir.

Copies each committed artefact present in source_dir into regression_dir, rewrites absolute paths to portable placeholders in place (using climate_ref_core.output_files.to_placeholders), and then rounds floats in place so that the recorded digests are those of the stable, rounded bytes. When a committed artefact is absent from source_dir, any stale copy left in regression_dir from a previous capture is removed so it is not re-digested.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| source_dir | Path | Directory holding the freshly persisted CMEC artefacts (the per-execution results directory). | required |
| regression_dir | Path | The destination regression/ directory (created if needed). | required |
| output_dir | Path | The absolute execution output directory, for path substitution. | required |
| test_data_dir | Path | The absolute provider test-data directory, for path substitution. | required |

Returns:

| Type | Description |
|------|-------------|
| dict[str, str] | The committed digests {filename: sha256} of the bytes just written, suitable for Manifest.committed. |

Source code in packages/climate-ref-core/src/climate_ref_core/regression/capture.py
def write_committed_bundle(
    source_dir: Path,
    regression_dir: Path,
    *,
    output_dir: Path,
    test_data_dir: Path,
) -> dict[str, str]:
    """
    Write the sanitised committed CMEC bundle into ``regression_dir``.

    Copies each committed artefact present in ``source_dir`` into ``regression_dir``,
    then rewrites absolute paths to portable placeholders in place
    (:func:`~climate_ref_core.output_files.to_placeholders`).
    When a committed artefact is absent from ``source_dir``,
    any stale copy left in ``regression_dir`` from a previous capture is removed so it is not re-digested.

    Parameters
    ----------
    source_dir
        Directory holding the freshly persisted CMEC artefacts (the per-execution
        results directory).
    regression_dir
        The destination ``regression/`` directory (created if needed).
    output_dir
        The absolute execution output directory, for path substitution.
    test_data_dir
        The absolute provider test-data directory, for path substitution.

    Returns
    -------
    :
        The committed digests ``{filename: sha256}`` of the bytes just written,
        suitable for :attr:`Manifest.committed`.
    """
    regression_dir.mkdir(parents=True, exist_ok=True)

    for filename in COMMITTED_BUNDLE_FILES:
        source = source_dir / filename
        dest = regression_dir / filename
        if source.exists():
            shutil.copy(source, dest)
        else:
            # Drop a stale copy from a previous capture so it is not re-digested.
            dest.unlink(missing_ok=True)

    to_placeholders(regression_dir, output_dir=output_dir, test_data_dir=test_data_dir)
    # Round floats in place before digesting,
    # so the committed bytes (and their recorded digests) are the stable, rounded ones.
    # Placeholder substitution only rewrites path strings, so order relative to it does not matter for floats.
    _round_committed_floats(regression_dir)
    return compute_committed_digests(regression_dir)
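
The copy-or-remove step and the reason for rounding before digesting can be sketched without the library. Everything below is illustrative: `capture_sketch`, `round_floats`, and the six-decimal precision are assumptions, placeholder substitution is omitted, and the real `_round_committed_floats` and `compute_committed_digests` may behave differently.

```python
import hashlib
import json
import shutil
import tempfile
from pathlib import Path

BUNDLE_FILES = ("series.json", "diagnostic.json", "output.json")
DECIMALS = 6  # hypothetical precision, chosen only for this sketch


def round_floats(value):
    """Recursively round every float in a decoded JSON document."""
    if isinstance(value, float):
        return round(value, DECIMALS)
    if isinstance(value, list):
        return [round_floats(item) for item in value]
    if isinstance(value, dict):
        return {key: round_floats(item) for key, item in value.items()}
    return value


def capture_sketch(source_dir: Path, regression_dir: Path) -> dict[str, str]:
    """Copy fresh artefacts, drop stale ones, round, then digest."""
    regression_dir.mkdir(parents=True, exist_ok=True)
    digests: dict[str, str] = {}
    for filename in BUNDLE_FILES:
        source, dest = source_dir / filename, regression_dir / filename
        if not source.exists():
            dest.unlink(missing_ok=True)  # stale copy from an earlier capture
            continue
        shutil.copy(source, dest)
        rounded = round_floats(json.loads(dest.read_text()))
        dest.write_text(json.dumps(rounded, sort_keys=True))
        # Digest the rounded bytes, i.e. exactly what would be committed.
        digests[filename] = hashlib.sha256(dest.read_bytes()).hexdigest()
    return digests


with tempfile.TemporaryDirectory() as tmp:
    source_dir = Path(tmp) / "results"
    regression_dir = Path(tmp) / "regression"
    source_dir.mkdir()
    regression_dir.mkdir()
    (regression_dir / "diagnostic.json").write_text("{}")  # stale artefact

    (source_dir / "series.json").write_text(json.dumps({"value": 0.1 + 0.2}))
    first = capture_sketch(source_dir, regression_dir)
    stale_removed = not (regression_dir / "diagnostic.json").exists()

    (source_dir / "series.json").write_text(json.dumps({"value": 0.3}))
    second = capture_sketch(source_dir, regression_dir)
```

In this sketch the two captures yield identical digests even though the raw inputs differ in the last floating-point digit, which is the stability that rounding before digesting is meant to provide.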

sub-packages#

| Sub-package | Description |
|-------------|-------------|
| _quantise | Float quantisation for committed regression bundles. |
| capture | Capture of regression baselines from a diagnostic execution. |
| compare | Content comparison utilities for regression testing. |
| gate | CI coupling gate for test case regression bundles. |
| manifest | Manifest model and digest utilities for test case regression bundles. |
| store | Data store for native bundles. |