Requirements & Validation

PIPES performs several steps of validation to help users catch errors and maintain consistency throughout a project.

Requirements

What is a requirement?

Requirements are model- or project-specific metadata used to validate datasets, which facilitates data handoffs and tracking.

Why track requirements?

PIPES uses requirements to validate dataset metadata, and the validation results inform the progress status of the project. Requirements help track what each handoff needs and keep the data pipeline running smoothly.

For instance, if Model A produces a dataset with time_step = X but Model B requires time_step = Y, a transformation is needed before the dataset can be handed off.

Listing these requirements in PIPES makes the handoff process more efficient and lets it be tracked throughout the modeling pipeline. Each additional requirement adds a condition that a dataset must meet to be accepted.

Examples

model_years = [2025,2030]

The example requirement above indicates that modeling work, and therefore dataset outputs, should be relevant to the years 2025 and 2030.

Requirements can be single key/value pairs, as in the example above, or any valid nested TOML key/value pairs such as:

[spatial_info]
    [spatial_info.fidelity]
        dx = 0.1
        dy = 0.2
    [spatial_info.range]
        x = [0,5]
        y = [0,10]

The Pydantic schemas for the various PIPES objects have several pre-defined requirement keys (see Configs for more info). If a requirement key is not pre-defined by PIPES, the key/value pair will be stored in a field called other in the Pydantic schema.
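The split between pre-defined keys and the other field can be sketched in plain Python. This is only an illustration, not the actual PIPES Pydantic schema: the PREDEFINED_KEYS set and the split_requirements function are hypothetical names, and the keys listed are assumed for the example rather than taken from the PIPES configs.

```python
# Hypothetical set of pre-defined requirement keys (an assumption for this
# sketch; the real list is defined by the PIPES schemas, see Configs).
PREDEFINED_KEYS = {"model_years", "geographic_extent", "weather_year"}


def split_requirements(raw: dict) -> dict:
    """Keep pre-defined keys at the top level and collect the rest under 'other'."""
    known = {k: v for k, v in raw.items() if k in PREDEFINED_KEYS}
    other = {k: v for k, v in raw.items() if k not in PREDEFINED_KEYS}
    return {**known, "other": other}


raw = {
    "model_years": [2025, 2030],
    "spatial_info": {"fidelity": {"dx": 0.1, "dy": 0.2}},
}
result = split_requirements(raw)
print(result)
```

Here model_years stays a top-level field, while the nested spatial_info table, which is not in the assumed pre-defined set, ends up under other.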

How requirements validation works

Requirements are specified at three different node types in PIPES: project, project run, and model. Requirements at higher-level nodes (e.g., project) are superseded by requirements at lower-level nodes (e.g., models).

For example, consider the following PIPES graph:

[Figure: PIPES Validation]

Let’s assume the nodes in this graph have the following requirements:

Project requirements:

[project.requirements]
    model_years = [2020,2025,2030]
    geographic_extent = "regional"
    weather_year = 2024

Project run requirements:

[project_runs.requirements]
    model_years = [2020,2025]
    geographic_extent = "sub-regional"

Model A requirements:

[project_runs.models.requirements]
    model_years = [2020,2022,2024]

When Dataset 1 is checked in, PIPES will validate its metadata against the following requirements:

model_years = [2020,2022,2024]
geographic_extent = "sub-regional"
weather_year = 2024

In other words, PIPES first uses all requirements defined at the model level. It then adds any project run requirements that the model does not define, such as the geographic extent. Finally, it adds any project requirements not overridden by the project run or the model, such as the weather year. This allows projects to scope work and build up results iteratively.
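The precedence rule described above can be sketched as a layered dictionary merge in which lower-level nodes win. The function name resolve_requirements is hypothetical, and the merge is shallow: a nested table defined at a lower level replaces the higher-level table whole, which is an assumption of this sketch and not a statement about how PIPES treats nested requirements.

```python
def resolve_requirements(project: dict, project_run: dict, model: dict) -> dict:
    """Merge requirement layers; later (lower-level) layers win on key conflicts."""
    return {**project, **project_run, **model}


# Requirements from the example above.
project = {
    "model_years": [2020, 2025, 2030],
    "geographic_extent": "regional",
    "weather_year": 2024,
}
project_run = {
    "model_years": [2020, 2025],
    "geographic_extent": "sub-regional",
}
model_a = {"model_years": [2020, 2022, 2024]}

effective_a = resolve_requirements(project, project_run, model_a)
print(effective_a)
```

The result takes model_years from Model A, geographic_extent from the project run, and weather_year from the project, matching the list shown above.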

Now let’s assume that Model B has the following requirements:

[project_runs.models.requirements]
    geographic_extent = "nodal"

In order for Model A to hand off data to Model B, a dataset needs to meet the following requirements:

model_years = [2020,2025]
geographic_extent = "nodal"
weather_year = 2024

In other words, Dataset 1 needs to be transformed such that the model years and geographic extent match what Model B expects to ingest. When Dataset 1 Transformed is checked in, PIPES will verify that its metadata meets the requirements expected by Model B. If these requirements are not met, the data is not ready to be handed off and more work needs to be done. If the dataset does meet these requirements, the Model B team will be notified that they have data ready from Model A.
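A handoff check of this kind can be sketched as a comparison between dataset metadata and the receiving model's requirements. The function unmet_requirements is a hypothetical name, the dataset metadata values are assumed for illustration, and plain equality is used as the matching rule, which may differ from the comparison PIPES actually performs.

```python
def unmet_requirements(metadata: dict, requirements: dict) -> dict:
    """Return {key: (expected, actual)} for every requirement the metadata misses."""
    return {
        key: (expected, metadata.get(key))
        for key, expected in requirements.items()
        if metadata.get(key) != expected
    }


# Requirements a dataset must meet for Model B, as listed above.
model_b_requirements = {
    "model_years": [2020, 2025],
    "geographic_extent": "nodal",
    "weather_year": 2024,
}

# Assumed metadata: Dataset 1 as produced by Model A, and its transformed version.
dataset_1 = {
    "model_years": [2020, 2022, 2024],
    "geographic_extent": "sub-regional",
    "weather_year": 2024,
}
dataset_1_transformed = {
    "model_years": [2020, 2025],
    "geographic_extent": "nodal",
    "weather_year": 2024,
}

gaps = unmet_requirements(dataset_1, model_b_requirements)
ready = unmet_requirements(dataset_1_transformed, model_b_requirements)
print(gaps)
print(ready)
```

For Dataset 1 the check reports the model years and geographic extent as unmet, while the transformed dataset yields an empty result and would be ready for handoff under these assumptions.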

Schedule validation

PIPES ensures that all dates are within valid ranges.

For each project, project run, and model, the scheduled start is before the scheduled end.
The project run schedule is within the project schedule.
The model schedule is within the project run schedule.
The handoff schedule is within the project schedule.
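The schedule checks listed above can be sketched with two small predicates. The Schedule class, the function names, and the example dates are hypothetical, and treating the containment bounds as inclusive is an assumption of this sketch.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Schedule:
    start: date
    end: date


def is_ordered(s: Schedule) -> bool:
    """The scheduled start must come before the scheduled end."""
    return s.start < s.end


def is_within(inner: Schedule, outer: Schedule) -> bool:
    """The inner schedule must not extend past either end of the outer one."""
    return outer.start <= inner.start and inner.end <= outer.end


project = Schedule(date(2024, 1, 1), date(2024, 12, 31))
project_run = Schedule(date(2024, 2, 1), date(2024, 6, 30))
model = Schedule(date(2024, 3, 1), date(2024, 7, 31))  # ends after the project run

print(is_within(project_run, project))  # project run fits inside the project
print(is_within(model, project_run))    # model overruns the project run
```

In this example the model schedule would fail validation because it ends a month after its project run, even though it still falls inside the project schedule.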

ID and name uniqueness

At the project scope level:

Model names are unique.
Project scenario names are unique.
Handoff IDs are unique.

Note

Models have IDs and, optionally, “types”. This allows users to define two models of the same type but with different purposes, e.g., to perform circular work.

At the model scope level:

Model scenario names are unique.

At the model run scope level:

Task IDs are unique.
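A uniqueness check within one scope can be sketched by counting names and reporting any that appear more than once. The function find_duplicates and the example names are hypothetical and only illustrate the idea.

```python
from collections import Counter


def find_duplicates(names: list[str]) -> list[str]:
    """Return the names that occur more than once, in sorted order."""
    counts = Counter(names)
    return sorted(name for name, n in counts.items() if n > 1)


# Hypothetical names collected within a single project scope.
model_names = ["model_a", "model_b", "model_a"]
scenario_names = ["base", "high_demand"]

print(find_duplicates(model_names))
print(find_duplicates(scenario_names))
```

An empty result means every name in that scope is unique; the same function could be applied to handoff IDs, model scenario names, or task IDs at their respective scopes.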

Resource existence

PIPES checks for the existence of resources specified in the input.

User-specified tasks exist.
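An existence check for tasks can be sketched as a lookup of each referenced task against the set of tasks that are defined. The function missing_tasks and the task names are hypothetical; how PIPES stores and looks up tasks is not described here.

```python
def missing_tasks(referenced: list[str], available: set[str]) -> list[str]:
    """Return referenced task names that are not among the available tasks."""
    return [task for task in referenced if task not in available]


# Hypothetical task names for illustration.
available_tasks = {"transform_time_step", "quality_check"}
referenced_tasks = ["transform_time_step", "plot_results"]

missing = missing_tasks(referenced_tasks, available_tasks)
print(missing)
```

Any name returned by the check refers to a task that does not exist and would need to be defined or corrected in the input.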

PIPES validation is a critical step for ensuring the integrity and consistency of the project configuration.