
The graph

PIPES uses a graph to represent the integrated modeling pipeline. Every entity in an integrated modeling project (e.g., the project itself, models, datasets, and tasks) is represented as a node in the graph, and the relationships between entities are represented as edges. Metadata about project activities are stored as properties on the nodes and edges in the graph database; PIPES uses these properties to query the graph, run validations, check status, send notifications, and so on.
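To make the node, edge, and property model concrete, the sketch below builds a toy in-memory property graph in Python. It is not PIPES code and does not reflect the PIPES database schema: PIPES stores its graph in a graph database, and the `Graph`, `Node`, and `Edge` classes, the `props` keys, and the `has_run` edge name are hypothetical, chosen only to illustrate how metadata stored on nodes and edges can drive a query.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class Node:
    id: str
    node_type: str  # e.g. "Project", "Project Run", "Model"
    props: dict[str, Any] = field(default_factory=dict)  # metadata stored on the node


@dataclass
class Edge:
    source: str  # id of the source node
    target: str  # id of the target node
    name: str  # relationship name (hypothetical in this sketch)
    props: dict[str, Any] = field(default_factory=dict)  # metadata stored on the edge


class Graph:
    """Toy in-memory property graph; PIPES itself uses a graph database."""

    def __init__(self) -> None:
        self.nodes: dict[str, Node] = {}
        self.edges: list[Edge] = []

    def add_node(self, node: Node) -> None:
        self.nodes[node.id] = node

    def add_edge(self, edge: Edge) -> None:
        # Refuse edges whose endpoints have not been added yet.
        if edge.source not in self.nodes or edge.target not in self.nodes:
            raise KeyError(f"unknown endpoint in edge {edge.name!r}")
        self.edges.append(edge)

    def find(self, node_type: str, **props: Any) -> list[Node]:
        """Return nodes of `node_type` whose properties match every keyword given."""
        return [
            n
            for n in self.nodes.values()
            if n.node_type == node_type
            and all(n.props.get(k) == v for k, v in props.items())
        ]


g = Graph()
g.add_node(Node("p1", "Project", {"name": "LA100"}))
g.add_node(Node("r0", "Project Run", {"name": "LA100 Run 0"}))
g.add_edge(Edge("p1", "r0", "has_run"))  # hypothetical edge name
print([n.id for n in g.find("Project Run", name="LA100 Run 0")])  # ['r0']
```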

Here we show an example of a PIPES project in the graph database that contains all node and edge types. 

[Figure: PIPES Graph]

Graphs are well suited to representing entity relationships and data flows, which makes them a natural fit for data pipelines. Graphs also support flexible, unstructured querying and query evolution: as a project's requirements evolve and new types of queries are needed, the graph structure can adapt to these changes without a complete overhaul of the system.

Graph levels

The PIPES graph consists of two levels that distinguish the planning (Level 1) and operational (Level 2) phases of a project.

Level 1:

  • Highest level of the graph.
  • Represents the planning details of a project.
  • Nodes include Project, Project Run, and Model; edges include handoff expectations and scheduling.
  • Initialized after a PI/data manager submits a Project Config.

Level 2:

  • Lower level of the graph.
  • Represents the operational activities of a project.
  • Nodes include Model Run, Dataset, and Task (QAQC, Visualization, Transformation).
  • The Model Run Config initializes Level 2 activity in the graph for each Model Run.
  • Other configs that initialize lower-level activities include the Dataset Config, Task Creation, and Task Planning configs.

Although PIPES was designed to manage integrated modeling projects from planning through execution, users can choose to interact with Level 1 only, for planning purposes.
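The level split can be illustrated with a small filter that keeps only the Level 1 (planning) part of a graph, which is roughly what a planning-only user would work with. This is a hypothetical sketch, not a PIPES function: the `planning_view` helper, the `NODE_LEVEL` mapping (taken from the Level column of the node-type table), and the direction of the example edges are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass

# Level per node type, following the Level column of the node-type table.
# Modeling Team and User have no level (N/A).
NODE_LEVEL: dict[str, int | None] = {
    "Project": 1,
    "Project Run": 1,
    "Model": 1,
    "Model Run": 2,
    "Dataset": 2,
    "Task": 2,
    "Modeling Team": None,
    "User": None,
}


@dataclass(frozen=True)
class Node:
    id: str
    node_type: str


def planning_view(
    nodes: list[Node], edges: list[tuple[str, str]]
) -> tuple[list[Node], list[tuple[str, str]]]:
    """Hypothetical helper: keep Level 1 nodes and only the edges between them."""
    keep = {n.id for n in nodes if NODE_LEVEL.get(n.node_type) == 1}
    kept_nodes = [n for n in nodes if n.id in keep]
    kept_edges = [(s, t) for s, t in edges if s in keep and t in keep]
    return kept_nodes, kept_edges


nodes = [
    Node("p1", "Project"),
    Node("r0", "Project Run"),
    Node("m1", "Model"),
    Node("mr1", "Model Run"),
]
edges = [("p1", "r0"), ("r0", "m1"), ("m1", "mr1")]  # hypothetical edge directions
kept_nodes, kept_edges = planning_view(nodes, edges)
print([n.id for n in kept_nodes])  # ['p1', 'r0', 'm1']
print(kept_edges)  # [('p1', 'r0'), ('r0', 'm1')]
```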

Warning

PIPES does not currently have smart update capabilities: if a configuration file needs to be updated, the graph pipeline is regenerated. Support for updating an existing project pipeline is under development.

Node types

There are 8 node types in the PIPES graph:

Node type | Description | Level
Project | The root node for any PIPES project (e.g., "LA100"). | 1
Project Run | A scrum-like phase of a project (e.g., "LA100 Run 0", "LA100 Run Final"). A project run may have different requirements, assumptions, or scenarios than the project as a whole (e.g., it might consider only a subset of the scenarios or model years). | 1
Model | A modeling activity (e.g., "dsgrid"). A descendant of a project run and part of the actual pipeline topology. | 1
Model Run | An instance of running a model with a specific set of specifications. | 2
Dataset | An artifact that is part of the final project results or is produced or consumed by other models in the project. | 2
Task | A task performed on one or more datasets. Task types include transformation, QAQC, and visualization. | 2
Modeling Team | The parent node of a model; consists of users. | N/A
User | A human participant in a project. | N/A

Edge types

Edges between nodes of specific types have unique names that describe the relationship between those nodes. Although some edge type names are synonyms (e.g., generated and output), the unique names make it easier to query the graph for specific node relationships.

[Figure: Edge Types]
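The benefit of distinct edge names can be shown with a minimal query over named edges. In this hypothetical Python sketch, the node ids and which node types the generated and output edges connect are assumptions; only the two edge names come from the text above.

```python
from __future__ import annotations

from typing import NamedTuple


class Edge(NamedTuple):
    source: str
    name: str
    target: str


# Hypothetical edges. "generated" and "output" are near-synonyms, but because
# each has its own name a query can select exactly one kind of relationship.
EDGES = [
    Edge("model_run_1", "generated", "dataset_a"),
    Edge("task_1", "output", "dataset_b"),
]


def targets(edges: list[Edge], name: str | None = None) -> list[str]:
    """Return the target ids of edges, optionally filtered by edge name."""
    return [e.target for e in edges if name is None or e.name == name]


print(targets(EDGES))  # ['dataset_a', 'dataset_b']
print(targets(EDGES, name="generated"))  # ['dataset_a']
print(targets(EDGES, name="output"))  # ['dataset_b']
```

Without a name filter the two relationships are indistinguishable in the result; filtering on the edge name recovers exactly one of them.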

Node and edge properties still need to be documented, especially those related to requirements, scenarios, assumptions, delegates, and scheduling.

The handoffs

A key feature of PIPES is managing data handoffs between two models in an integrated modeling project. PIPES manages these data handoffs across both the planning (Level 1) and operational (Level 2) phases of a project. In general, a handoff should be specified for each dataset a model plans to provide to other models, and for each unique exchange of that dataset.
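The rule that each dataset needs a handoff for every unique exchange can be sketched as a coverage check: enumerate every (dataset, producing model, consuming model) triple a plan implies and report the ones with no declared handoff. This is a hypothetical illustration, not the PIPES handoff API; the `Handoff` type, the `missing_handoffs` helper, and the model and dataset names other than dsgrid are made up for the example.

```python
from __future__ import annotations

from typing import NamedTuple


class Handoff(NamedTuple):
    """One exchange of one dataset from a producing model to a consuming model."""

    dataset: str
    from_model: str
    to_model: str


def missing_handoffs(
    planned: dict[tuple[str, str], set[str]],
    declared: set[Handoff],
) -> list[Handoff]:
    """List planned exchanges that have no declared handoff.

    `planned` maps (producing model, dataset) -> set of consuming models.
    Every (dataset, producer, consumer) triple is a unique exchange and
    needs its own handoff.
    """
    required = {
        Handoff(dataset, producer, consumer)
        for (producer, dataset), consumers in planned.items()
        for consumer in consumers
    }
    return sorted(required - declared)


# Hypothetical plan: dsgrid provides one dataset to two other models.
planned = {("dsgrid", "load_profiles"): {"model_b", "model_c"}}
declared = {Handoff("load_profiles", "dsgrid", "model_b")}
print(missing_handoffs(planned, declared))
# [Handoff(dataset='load_profiles', from_model='dsgrid', to_model='model_c')]
```

Treating each triple as a separate requirement mirrors the guidance above: a dataset sent to two models needs two handoffs, not one.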