Checking in a dataset

Checking in data with PIPES can be done in two ways: either as a standalone call or in conjunction with a task submission.

Method 1: Standalone call

To check in a Dataset, first request the metadata template from PIPES (see the Tip below if you want to automate this step):

$ pipes dataset template -t dataset-checkin -s <system-type>

Where <system-type> indicates the system where the dataset is stored, which tells PIPES what metadata to expect.

Info

For PIPES MVP, supported storage system types are one of:

- ESIFRepoAPI
- AmazonS3
- HPCStorage
- DataFoundry

Additional storage options will be added with future releases. If you have a specific storage type that you need, please reach out to the PIPES development team.

See the schema requirements for more information on the metadata fields in the Dataset Config file.
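For orientation, a filled-in Dataset Config might look roughly like the sketch below. The actual required fields are defined by the PIPES schema requirements linked above; the field names shown here (`name`, `system`, `location`, and so on) are illustrative assumptions, not the authoritative schema.

```toml
# Hypothetical Dataset Config sketch -- consult the PIPES schema
# requirements for the actual required fields and their names.
[dataset]
name = "my-dataset"             # assumed field: dataset identifier
description = "Example output"  # assumed field
system = "AmazonS3"             # one of the supported storage system types

[dataset.location]              # assumed table: storage-specific metadata
bucket = "my-bucket"
key = "path/to/data.csv"
```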

Once you have filled in the metadata fields in the config file, you can submit it to PIPES in the following way:

$ pipes dataset checkin -p project_name -r project_run_name -m model_name -x model_run_name -f path/to/toml/dataset.toml

Tip

If you would like to automate Dataset checkins, you do not need to request a template from PIPES. Instead, your application can reference the API directly to produce a config file that meets the schema requirements for Datasets and then submit the config to PIPES.

Method 2: In conjunction with transformation task submission

Warning

Please note that this method only applies if your dataset is the result of a transformation task. Otherwise, use the standalone command in Method 1.

There are three steps in this workflow.

Step 1: Get the transformation Task Creation Config template:

$ pipes task template -t task-submission --task-type Transformation -o transformation-task-submission1.toml

Then fill in this config template.

Step 2: Get a Dataset Config check-in template by specifying a system type (ESIFRepoAPI, AmazonS3, HPCStorage, or DataFoundry):

$ pipes dataset template -t dataset-checkin -s <system-type> -o my-transformed-dataset.toml

Step 3: Submit the transformation task along with the transformed dataset it generates:

$ pipes task submit -p test1 -r 1 -m dsgrid -x model-run-1 -f transformation-task-submission1.toml -d my-transformed-dataset.toml --task-pass
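The three commands above can be scripted, for example from Python. The sketch below only composes and prints the command lines, with the actual invocations left commented out; the project, run, and model names are the example values from this page, and in practice you would fill in both templates before running Step 3.

```python
import subprocess  # used by the commented-out invocation below

# Assumed storage system for the transformed dataset.
system_type = "AmazonS3"

steps = [
    # Step 1: fetch the transformation task template
    ["pipes", "task", "template", "-t", "task-submission",
     "--task-type", "Transformation",
     "-o", "transformation-task-submission1.toml"],
    # Step 2: fetch the dataset check-in template
    ["pipes", "dataset", "template", "-t", "dataset-checkin",
     "-s", system_type, "-o", "my-transformed-dataset.toml"],
    # Step 3: submit the task together with the transformed dataset
    # (both templates must be filled in before this step).
    ["pipes", "task", "submit", "-p", "test1", "-r", "1", "-m", "dsgrid",
     "-x", "model-run-1", "-f", "transformation-task-submission1.toml",
     "-d", "my-transformed-dataset.toml", "--task-pass"],
]

for argv in steps:
    print(" ".join(argv))
    # subprocess.run(argv, check=True)  # uncomment when the pipes CLI is available
```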

For additional information, please refer to submitting a transformation task.