climate_ref.models.dataset
#
CMIP6Dataset
#
Bases: Dataset
Represents a CMIP6 dataset
Fields that are not in the DRS are marked optional.
Source code in packages/climate-ref/src/climate_ref/models/dataset.py
calendar = mapped_column(nullable=True)
class-attribute
instance-attribute
#
CF calendar type (e.g. 'standard', '360_day', 'noleap')
instance_id = mapped_column(index=True)
class-attribute
instance-attribute
#
Unique identifier for the dataset (including the version).
time_units = mapped_column(nullable=True)
class-attribute
instance-attribute
#
Time encoding units (e.g. 'days since 1850-01-01')
CMIP7Dataset
#
Bases: Dataset
Represents a CMIP7 dataset
Based on CMIP7 Global Attributes v1.0 (DOI: 10.5281/zenodo.17250297). Includes core DRS attributes, additional mandatory attributes, and parent info.
Source code in packages/climate-ref/src/climate_ref/models/dataset.py
activity_id = mapped_column()
class-attribute
instance-attribute
#
CV - e.g., "CMIP", "ScenarioMIP"
branch_time_in_child = mapped_column(nullable=True)
class-attribute
instance-attribute
#
Float - when parent exists
branch_time_in_parent = mapped_column(nullable=True)
class-attribute
instance-attribute
#
Float - when parent exists
branding_suffix = mapped_column()
class-attribute
instance-attribute
#
Template - e.g., "tavg-h2m-hxy-u"
calendar = mapped_column(nullable=True)
class-attribute
instance-attribute
#
CF calendar type (e.g. 'standard', '360_day', 'noleap')
experiment_id = mapped_column(index=True)
class-attribute
instance-attribute
#
CV - experiment name
external_variables = mapped_column(nullable=True)
class-attribute
instance-attribute
#
Space-separated list of cell measure variable names (when cell_measures are specified)
frequency = mapped_column()
class-attribute
instance-attribute
#
CV - e.g., "mon", "day"
grid_label = mapped_column()
class-attribute
instance-attribute
#
CV - e.g., "gn", "gr"
instance_id = mapped_column(index=True)
class-attribute
instance-attribute
#
CMIP7 DRS format unique identifier
institution_id = mapped_column()
class-attribute
instance-attribute
#
CV - registered by modeling group
license_id = mapped_column(nullable=True)
class-attribute
instance-attribute
#
CV - e.g., "CC-BY-4.0", "CC0-1.0"
long_name = mapped_column(nullable=True)
class-attribute
instance-attribute
#
Human-readable description
mip_era = mapped_column()
class-attribute
instance-attribute
#
Always "CMIP7"
nominal_resolution = mapped_column(nullable=True)
class-attribute
instance-attribute
#
CV - e.g., "100 km"
parent_activity_id = mapped_column(nullable=True)
class-attribute
instance-attribute
#
String - parent activity identifier
parent_experiment_id = mapped_column(nullable=True)
class-attribute
instance-attribute
#
String - parent experiment identifier
parent_mip_era = mapped_column(nullable=True)
class-attribute
instance-attribute
#
String - "CMIP6" or "CMIP7"
parent_source_id = mapped_column(nullable=True)
class-attribute
instance-attribute
#
String - parent model identifier
parent_time_units = mapped_column(nullable=True)
class-attribute
instance-attribute
#
String - time units used in parent
parent_variant_label = mapped_column(nullable=True)
class-attribute
instance-attribute
#
String - parent variant label
realm = mapped_column(nullable=True)
class-attribute
instance-attribute
#
CV - e.g., "atmos", "ocean" (replaces table_id for filtering)
region = mapped_column()
class-attribute
instance-attribute
#
CV - e.g., "glb" (global)
source_id = mapped_column(index=True)
class-attribute
instance-attribute
#
CV - model identifier
standard_name = mapped_column(nullable=True)
class-attribute
instance-attribute
#
CF standard name
time_units = mapped_column(nullable=True)
class-attribute
instance-attribute
#
Time encoding units (e.g. 'days since 1850-01-01')
units = mapped_column(nullable=True)
class-attribute
instance-attribute
#
Variable units
variable_id = mapped_column()
class-attribute
instance-attribute
#
CV - variable root name
variant_label = mapped_column()
class-attribute
instance-attribute
#
Template - e.g., "r1i1p1f1" (CMIP7 uses prefixed strings)
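As an illustration of the variant label template, the sketch below splits a label such as "r1i1p1f1" into its four indexes. The function name `parse_variant_label` and the dictionary keys are hypothetical, and the pattern assumes purely numeric indexes as in the example above; it is not part of climate_ref.

```python
import re

# Assumed pattern: r<realization>i<initialization>p<physics>f<forcing>,
# with numeric indexes only (an assumption based on the example label).
VARIANT_PATTERN = re.compile(r"^r(\d+)i(\d+)p(\d+)f(\d+)$")


def parse_variant_label(label: str) -> dict[str, int]:
    """Split a variant label into its four integer indexes (hypothetical helper)."""
    match = VARIANT_PATTERN.match(label)
    if match is None:
        raise ValueError(f"Unrecognised variant label: {label!r}")
    realization, initialization, physics, forcing = (int(g) for g in match.groups())
    return {
        "realization": realization,
        "initialization": initialization,
        "physics": physics,
        "forcing": forcing,
    }


indexes = parse_variant_label("r1i1p1f1")
print(indexes)
```

Labels that do not follow this numeric form raise a `ValueError` rather than being silently accepted.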
version = mapped_column()
class-attribute
instance-attribute
#
Template - e.g., "v20250622"
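A version string of this form can be read as a date. The sketch below is a minimal illustration that assumes the template is always "v" followed by an eight-digit YYYYMMDD date, as in the example; `parse_version_date` is a hypothetical helper, not a climate_ref function.

```python
from datetime import date, datetime


def parse_version_date(version: str) -> date:
    """Convert a version string such as "v20250622" into a date (hypothetical helper)."""
    # strptime raises ValueError if the string does not match "v" + YYYYMMDD.
    return datetime.strptime(version, "v%Y%m%d").date()


version_date = parse_version_date("v20250622")
print(version_date.isoformat())
```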
branded_variable()
#
Return branded variable: {variable_id}_{branding_suffix}.
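The format described above can be sketched with a small stand-in class. `BrandedExample` is hypothetical and is not the real ORM model; the variable name "tas" is an illustrative value, while the suffix is taken from the `branding_suffix` example on this page.

```python
from dataclasses import dataclass


@dataclass
class BrandedExample:
    """Hypothetical stand-in for CMIP7Dataset, holding only the two fields used here."""

    variable_id: str
    branding_suffix: str

    def branded_variable(self) -> str:
        # Join the variable root name and the branding suffix with an underscore.
        return f"{self.variable_id}_{self.branding_suffix}"


example = BrandedExample(variable_id="tas", branding_suffix="tavg-h2m-hxy-u")
print(example.branded_variable())  # tas_tavg-h2m-hxy-u
```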
Dataset
#
Bases: Base
Represents a dataset
A dataset is a collection of data files that is used as an input to the benchmarking process. Adding, removing, or updating a dataset will trigger a new diagnostic calculation.
A polymorphic association is used to capture the different types of datasets as each dataset type may have different metadata fields. This enables the use of a single table to store all datasets, but still allows for querying specific metadata fields for each dataset type.
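One way to picture this arrangement is a shared `dataset` table with a type discriminator, plus a per-type table for type-specific metadata. The sketch below uses plain SQL through the standard-library `sqlite3` module rather than the project's SQLAlchemy models. The table layout, the `cmip6_dataset` table name, the slugs, and the discriminator values are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Shared table: one row per dataset, whatever its type.
    CREATE TABLE dataset (
        id INTEGER PRIMARY KEY,
        slug TEXT UNIQUE,
        dataset_type TEXT NOT NULL
    );
    -- Type-specific table (assumed layout): extra metadata for CMIP6 datasets.
    CREATE TABLE cmip6_dataset (
        id INTEGER PRIMARY KEY REFERENCES dataset(id),
        instance_id TEXT,
        calendar TEXT
    );
    """
)
conn.execute("INSERT INTO dataset VALUES (1, 'example-cmip6-slug', 'cmip6')")
conn.execute("INSERT INTO cmip6_dataset VALUES (1, 'example-cmip6-slug', 'noleap')")
conn.execute("INSERT INTO dataset VALUES (2, 'example-obs-slug', 'obs4mips')")

# Every dataset is visible through the shared table...
all_types = [row[0] for row in conn.execute("SELECT dataset_type FROM dataset ORDER BY id")]

# ...while type-specific fields are reached by joining on the shared id.
cmip6_rows = conn.execute(
    "SELECT d.slug, c.calendar FROM dataset AS d JOIN cmip6_dataset AS c ON c.id = d.id"
).fetchall()

print(all_types)
print(cmip6_rows)
```

Queries that span every dataset type only need the shared table, while queries on type-specific fields such as `calendar` join to the matching per-type table.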
Source code in packages/climate-ref/src/climate_ref/models/dataset.py
created_at = mapped_column(server_default=(func.now()))
class-attribute
instance-attribute
#
When the dataset was added to the database
dataset_type = mapped_column(nullable=False, index=True)
class-attribute
instance-attribute
#
Type of dataset
finalised = mapped_column(default=True, nullable=False)
class-attribute
instance-attribute
#
Whether the complete set of metadata for the dataset has been finalised.
For CMIP6, ingestion may initially create unfinalised datasets (False) until all metadata is extracted. For other dataset types (e.g., obs4MIPs, PMP climatology), this should be True upon creation.
slug = mapped_column(unique=True)
class-attribute
instance-attribute
#
Globally unique identifier for the dataset.
In the case of CMIP6 datasets, this is the instance_id.
updated_at = mapped_column(server_default=(func.now()), onupdate=(func.now()))
class-attribute
instance-attribute
#
When the dataset was updated.
Updating a dataset will trigger a new diagnostic calculation.
DatasetFile
#
Bases: Base
Capture the metadata for a file in a dataset
A dataset may have multiple files, but is represented as a single dataset in the database. Much of the metadata is duplicated for each file in the dataset, but this duplication makes querying, filtering, and building a data catalog more efficient.
Source code in packages/climate-ref/src/climate_ref/models/dataset.py
dataset_id = mapped_column(ForeignKey('dataset.id', ondelete='CASCADE'), nullable=False, index=True)
class-attribute
instance-attribute
#
Foreign key to the dataset table
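The `ondelete='CASCADE'` option means that deleting a dataset row also removes its file rows at the database level. The sketch below shows that behaviour with the standard-library `sqlite3` module. The `dataset_file` table name, its columns, and the sample paths are assumptions for illustration, not the project's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is enabled on the connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE dataset (id INTEGER PRIMARY KEY, slug TEXT UNIQUE);
    CREATE TABLE dataset_file (
        id INTEGER PRIMARY KEY,
        dataset_id INTEGER NOT NULL REFERENCES dataset(id) ON DELETE CASCADE,
        path TEXT
    );
    CREATE INDEX ix_dataset_file_dataset_id ON dataset_file (dataset_id);
    """
)
conn.execute("INSERT INTO dataset VALUES (1, 'example-slug')")
conn.executemany(
    "INSERT INTO dataset_file (dataset_id, path) VALUES (?, ?)",
    [(1, "example/file_a.nc"), (1, "example/file_b.nc")],
)
files_before = conn.execute("SELECT COUNT(*) FROM dataset_file").fetchone()[0]

# Deleting the parent dataset removes its files through the cascade.
conn.execute("DELETE FROM dataset WHERE id = 1")
files_after = conn.execute("SELECT COUNT(*) FROM dataset_file").fetchone()[0]

print(files_before, files_after)
```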
end_time = mapped_column(nullable=True)
class-attribute
instance-attribute
#
End time of a given file (ISO string, supports cftime calendars)
path = mapped_column()
class-attribute
instance-attribute
#
Prefix that describes where the dataset is stored relative to the data directory
start_time = mapped_column(nullable=True)
class-attribute
instance-attribute
#
Start time of a given file (ISO string, supports cftime calendars)
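A short sketch of why a string representation helps with non-standard calendars: 30 February is a valid date in a 360_day calendar, but the standard-library `datetime` type cannot represent it. The exact string layout below is an assumption, not necessarily the format climate_ref writes.

```python
from datetime import datetime

# Valid in a 360_day calendar, invalid in the proleptic Gregorian calendar.
end_time = "2000-02-30T00:00:00"
start_time = "2000-01-01T00:00:00"

try:
    datetime.fromisoformat(end_time)
    parsed_ok = True
except ValueError:
    # datetime rejects the date, so a datetime column could not hold it.
    parsed_ok = False

# Zero-padded, fixed-width ISO strings still sort chronologically as text.
ordered = start_time < end_time

print(parsed_ok, ordered)
```

The text comparison is only reliable when every string uses the same zero-padded layout with four-digit years.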
tracking_id = mapped_column(nullable=True)
class-attribute
instance-attribute
#
Unique file identifier.
For CMIP7, this is the handle identifier (e.g., "hdl:21.14107/uuid").
Obs4MIPsDataset
#
Bases: Dataset
Represents an obs4MIPs dataset
TODO: Should the metadata fields be part of the file or dataset?
Source code in packages/climate-ref/src/climate_ref/models/dataset.py
instance_id = mapped_column()
class-attribute
instance-attribute
#
Unique identifier for the dataset.
PMPClimatologyDataset
#
Bases: Dataset
Represents a climatology dataset from PMP
These data are similar to obs4MIPs datasets, but have been post-processed.
Source code in packages/climate-ref/src/climate_ref/models/dataset.py
instance_id = mapped_column()
class-attribute
instance-attribute
#
Unique identifier for the dataset.