Create tasks
Overview
Tasks in PIPES represent units of work performed on datasets during a Model Run. During Model Run initialization, users can define expected tasks (see Model Run Initialization), which helps the project team plan the expected work of the broader project. As the Model Run progresses and work is completed, tasks are performed and submitted to PIPES with more detailed information about each task (model version, system, output files, etc.). These task submissions can then be compared against the expected tasks to determine the progress of the Model Run and the broader Project.
Note
Additional tasks that were not defined during Model Run initialization can still be added to a Model Run. These tasks won’t count toward the progress of the Model Run, but they can be useful for tracking ad hoc analysis or visualization steps. They can also be incorporated into future Model Runs if they become a critical part of the workflow.
Task Creation Config
The Task Creation Config includes information about the task, the scripts used to perform it, and the output locations of any visualizations, reports, etc. To submit metadata on completed tasks during a Model Run, the user first requests a template with the following CLI command:
$ pipes task template -t task-creation --task-type {Transformation/QAQC/Visualization} -o my-transformation-task-creation.toml
where the user should pick one of the Transformation, QAQC, or Visualization task types.
To log progress in PIPES, users should use the subtask_ids list to indicate which of the tasks planned in the Model Run were performed.
Note
A task creation can include more than one planned task ID. For example, it might make sense to perform two transformations (such as changing the model and weather years) at the same time. When this is the case, users should include both task IDs in subtask_ids. Tasks of different types (e.g., Transformation and QAQC) cannot be created together.
Any checked-in datasets that are inputs to the task should be indicated in the dataset_ids list.
Artifacts of tasks that are not datasets, such as images or video files, can be listed under outputs, where a location should be specified; any other metadata a user wishes to log can also be added there.
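Putting the fields above together, a task-creation config might look like the following sketch. The field names subtask_ids, dataset_ids, and outputs come from this page; all values (task IDs, dataset IDs, the output location, and the description key) are hypothetical placeholders, and the full schema linked below is authoritative.

```toml
# Hypothetical task-creation config (illustrative values only)

# Planned task IDs from Model Run initialization that this work covers
subtask_ids = ["change-model-year", "change-weather-year"]

# Checked-in datasets that are inputs to the task
dataset_ids = ["load-profiles-2018"]

# Non-dataset artifacts, each with a location and any extra metadata
[[outputs]]
location = "s3://my-bucket/reports/transform-summary.html"
description = "Summary report of the transformation"
```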
A complete list and description of the schema can be found here.
The CLI command for submitting the config to PIPES is pipes task create. An example call is:
$ pipes task create -p test1 -r 1 -m dsgrid -x model-run-1 -f project/tasks/test_qaqc.toml --{task-fail/task-pass}
where users should pick either task-pass or task-fail depending on the outcome of their work. Failed tasks are not required to be submitted to PIPES; this option is most useful for capturing the outcomes of automated tasks.
See defining your context for more information on the rest of the flags.
Transformations
Transformation tasks require additional information to be submitted about the relevant output datasets. They follow the same structure defined above, but they also require a Dataset Config (provided with the -d or --dataset CLI flag) during task creation. As an example:
$ pipes task create -p test1 -r 1 -m dsgrid -x model-run-1 -f project/tasks/test_transform.toml -d project/tasks/transformed_dataset.toml --task-pass
Referencing datasets in other model runs
Transformations might need to reference datasets outside of the Model Run or Project (but still in the PIPES universe). To add links from these datasets to the transformed dataset, use the vertex_ids option. This takes a list of vertex IDs for any dataset and links them as an input to the transformed dataset. This can be useful for weather data or other static GIS data that is used across many projects but wasn’t specifically created during the current Model Run.
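For example, an external weather dataset could be linked as an input to the transformed dataset via vertex_ids. The option name comes from this page; the vertex ID value below is a hypothetical placeholder, and the exact ID format is defined by PIPES:

```toml
# Hypothetical snippet: link a dataset from outside this Model Run as an input
vertex_ids = ["dataset--weather-tmy-conus--v2"]
```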