The Graph

PIPES represents the integrated modeling pipeline as a graph. All components of an integrated modeling project, such as projects, models, datasets, and tasks, are modeled as nodes, while their relationships are captured as edges. Metadata for each project activity is stored as properties on these nodes and edges, enabling PIPES to efficiently query the graph data, perform validations, track status, send notifications, and more.

The diagram below shows an example PIPES project as a graph containing all node and edge types.

Figure: PIPES Graph

Graphs are well suited to representing entity relationships and data flows, which makes them a natural fit for data pipelines. Graphs also support unstructured querying and query evolution: as the requirements of a project evolve and new types of queries are needed, the graph structure can adapt without a complete overhaul of the system.
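To make the node, edge, and property model concrete, here is a minimal sketch in plain Python. It is not the PIPES implementation or API: the `Graph`, `Node`, and `Edge` classes, the `status` property, and the edge names `has_run` and `has_model` are all hypothetical, chosen only to illustrate how metadata stored as properties can be queried ad hoc.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    node_type: str                       # e.g. "Project", "Model", "Dataset"
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    source: str                          # name of the source node
    target: str                          # name of the target node
    edge_type: str                       # relationship name (hypothetical here)
    properties: dict = field(default_factory=dict)


class Graph:
    """A tiny in-memory property graph, for illustration only."""

    def __init__(self):
        self.nodes = {}
        self.edges = []

    def add_node(self, name, node_type, **properties):
        self.nodes[name] = Node(name, node_type, properties)

    def add_edge(self, source, target, edge_type, **properties):
        self.edges.append(Edge(source, target, edge_type, properties))

    def find_nodes(self, predicate):
        # Any function of a node can serve as a query, so new kinds of
        # queries do not require changing the stored structure.
        return [n for n in self.nodes.values() if predicate(n)]


graph = Graph()
graph.add_node("LA100", "Project")
graph.add_node("LA100 Run 0", "Project Run", status="planned")  # assumed property
graph.add_node("dsgrid", "Model", status="planned")             # assumed property
graph.add_edge("LA100", "LA100 Run 0", "has_run")               # hypothetical edge name
graph.add_edge("LA100 Run 0", "dsgrid", "has_model")            # hypothetical edge name

planned = graph.find_nodes(lambda n: n.properties.get("status") == "planned")
print([n.name for n in planned])
```

Because a query is just a predicate over node properties, a new question (for example, filtering on a property added later) needs no change to how nodes and edges are stored.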

Graph levels

The PIPES graph consists of two levels, which distinguish the planning phase (Level 1) from the operational activity phase (Level 2) of a project.

Level 1:

Highest level of the graph.
Represents the planning details of a project.
Nodes include Project, Project Run, and Model; edges capture handoff expectations and scheduling.
Initialized after a PI/data manager submits a Project Config.

Level 2:

Lower level of the graph.
Represents the operational activities of a project.
Nodes include Model Run, Dataset, and Task (QAQC, Visualization, Transformation).
The Model Run Config initializes Level 2 activity in the graph for each Model Run.
Other configs that initialize lower-level activities include the Dataset Config, Task Creation, and Task Planning configs.

Although PIPES was designed to manage integrated modeling projects from planning through execution, users can choose to interface only with Level 1 for planning purposes.

Warning

At this stage, PIPES has limited smart-update capabilities: when a configuration file needs to be updated, the graph pipeline is regenerated. Support for project pipeline updates is under development.

Node Types

There are 8 node types in the PIPES graph:

Node type	Description	Level
Project	The root node for any PIPES project (e.g., “LA100”).	1
Project Run	Represents different scrum-like phases of a project (e.g., “LA100 Run 0”, “LA100 Run Final”). It may have different requirements, assumptions, or scenarios from the project (e.g., it might cover a subset of the scenarios or model years considered).	1
Model	Represents different modeling activities (e.g., “dsgrid”). A descendant of a project run and part of the actual pipeline topology.	1
Model Run	An instance of running a model with certain specifications.	2
Dataset	The artifacts that are part of the final project results or are produced or consumed by other models in the project.	2
Task	Represents a task performed on one or more datasets. Task types include transformation, QAQC, and visualization.	2
Modeling Team	The parent node of a model, consisting of users.	N/A
User	The human participants in a project.	N/A
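The table above can be encoded as a small lookup, which also shows how a Level 1 only (planning) view could be selected. This is a sketch rather than PIPES code: the `NodeType` enum and the `types_at_level` helper are hypothetical names, and `None` stands in for the N/A level.

```python
from enum import Enum


class NodeType(Enum):
    # value = (label, level); level is None where the table lists N/A
    PROJECT = ("Project", 1)
    PROJECT_RUN = ("Project Run", 1)
    MODEL = ("Model", 1)
    MODEL_RUN = ("Model Run", 2)
    DATASET = ("Dataset", 2)
    TASK = ("Task", 2)
    MODELING_TEAM = ("Modeling Team", None)
    USER = ("User", None)

    @property
    def label(self):
        return self.value[0]

    @property
    def level(self):
        return self.value[1]


def types_at_level(level):
    """Return the labels of all node types at the given graph level."""
    return [t.label for t in NodeType if t.level == level]


planning_types = types_at_level(1)      # Level 1: planning
operational_types = types_at_level(2)   # Level 2: operational activity
print(planning_types)
print(operational_types)
```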

Edge Types

Edges between nodes of specific types have unique names that describe the relationship between those nodes. Although some edge type names are synonyms (e.g., generated and output), the unique names make it easier to query the graph for specific node relationships.
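The benefit of unique edge names can be sketched as follows. Only the names `generated` and `output` come from the text above. Which node types each name connects is an assumption made for this example (here, `generated` for a Model Run producing a Dataset and `output` for a Task producing a Dataset), and all node names are hypothetical.

```python
from collections import defaultdict

# Each edge is (source node, edge type, target node).
# The pairing of edge names to node types is assumed for illustration.
edges = [
    ("dsgrid run 1", "generated", "load profiles"),          # Model Run -> Dataset (assumed)
    ("dsgrid run 2", "generated", "load profiles v2"),       # Model Run -> Dataset (assumed)
    ("unit conversion", "output", "load profiles (MW)"),     # Task -> Dataset (assumed)
]

# Index edges by their type so each relationship can be looked up directly.
edges_by_type = defaultdict(list)
for source, edge_type, target in edges:
    edges_by_type[edge_type].append((source, target))

# With distinct names, "datasets produced by model runs" and "datasets
# produced by tasks" are separate lookups; no node-type check is needed.
model_run_datasets = [target for _, target in edges_by_type["generated"]]
task_datasets = [target for _, target in edges_by_type["output"]]
print(model_run_datasets)
print(task_datasets)
```

If both relationships shared a single name, the same query would have to inspect the type of each source node to tell them apart.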

Figure: Edge Types

The Handoffs

A key feature of PIPES is the management of data handoffs between models in an integrated modeling project. PIPES manages these handoffs across both the planning (Level 1) and the operations (Level 2) of a project. In general, a handoff should be specified for each dataset a model plans to provide to other models, and for each unique exchange of that dataset.
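The rule of one handoff per dataset and per unique exchange can be illustrated with a short sketch. The `Handoff` record, its field names, the `plan_handoffs` helper, and the model names `model_b` and `model_c` are hypothetical and do not reflect the actual PIPES handoff schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Handoff:
    dataset: str       # dataset being exchanged
    from_model: str    # model providing the dataset
    to_model: str      # model receiving the dataset


def plan_handoffs(provisions):
    """Expand {(provider, dataset): [consumers]} into one Handoff per exchange."""
    return [
        Handoff(dataset, provider, consumer)
        for (provider, dataset), consumers in provisions.items()
        for consumer in consumers
    ]


# One dataset provided to two different models yields two handoffs,
# because each exchange of the dataset is specified separately.
provisions = {
    ("dsgrid", "load profiles"): ["model_b", "model_c"],   # hypothetical consumers
}
handoffs = plan_handoffs(provisions)
for h in handoffs:
    print(f"{h.from_model} -> {h.to_model}: {h.dataset}")
```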