Check in a dataset
You can check in a dataset with PIPES in two ways: as a standalone call, or together with a transformation task submission.
Method 1: Standalone call
To check in a Dataset, first request the metadata template from PIPES (if you want to automate checkins, see the Tip below):
Where <system-type> indicates the system where the dataset is stored, which tells PIPES which metadata fields to expect.
Info
For the PIPES MVP, the supported storage system types are:
- ESIFRepoAPI
- AmazonS3
- HPCStorage
- DataFoundry
Additional storage types will be added in future releases. If you need a storage type that is not listed, please contact the PIPES development team.
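If you script checkins, it can help to reject an unsupported system type before requesting a template. The following is a minimal POSIX shell sketch. The function name `is_supported_system_type` is hypothetical and not part of PIPES; the list is hard-coded to the MVP types above, so it needs updating as new types are added, and the case-sensitive match is an assumption about how PIPES expects the names to be spelled.

```shell
#!/bin/sh
# Hypothetical helper (not part of PIPES): succeeds only for the storage
# system types supported in the PIPES MVP. Update the list as types are added.
# Matching is case-sensitive (assumption).
is_supported_system_type() {
  case "$1" in
    ESIFRepoAPI|AmazonS3|HPCStorage|DataFoundry) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `is_supported_system_type "$SYSTEM_TYPE" || exit 1` stops a script before it asks PIPES for a template it cannot provide.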
See the schema requirements for more information on the metadata fields in the Dataset Config file.
Once you have filled in the metadata fields in the config file, submit it to PIPES:
$ pipes dataset checkin -p project_name -r project_run_name -m model_name -x model_run_name -f path/to/toml/dataset.toml
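A minimal shell sketch that wraps the checkin command above and fails early when the config file is missing. It uses only the flags shown in the example. The function name `checkin_dataset` and the `PIPES_BIN` variable (defaulting to `pipes`; set it to `echo` for a dry run) are hypothetical helpers, not part of PIPES.

```shell
#!/bin/sh
# Hypothetical wrapper around `pipes dataset checkin` (not part of PIPES).
# PIPES_BIN defaults to `pipes`; set PIPES_BIN=echo to print the command
# instead of running it.
PIPES_BIN="${PIPES_BIN:-pipes}"

# Args: project, project run, model, model run, path to the Dataset Config.
checkin_dataset() {
  if [ "$#" -ne 5 ]; then
    echo "usage: checkin_dataset PROJECT PROJECT_RUN MODEL MODEL_RUN CONFIG" >&2
    return 2
  fi
  # Fail early if the filled-in Dataset Config file does not exist.
  if [ ! -f "$5" ]; then
    echo "Dataset config not found: $5" >&2
    return 1
  fi
  "$PIPES_BIN" dataset checkin -p "$1" -r "$2" -m "$3" -x "$4" -f "$5"
}
```

For example, `checkin_dataset project_name project_run_name model_name model_run_name path/to/toml/dataset.toml` runs the same command as above.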
Tip
If you want to automate Dataset checkins, you do not need to request a template from PIPES. Instead, your application can reference the API directly to generate a config file that meets the schema requirements for Datasets, and then submit that config to PIPES.
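As an illustration of the automated path, the sketch below writes a Dataset Config file from values a script already knows. The function name `write_dataset_config` is hypothetical, and the keys `system_type`, `name`, and `location` are placeholders invented for this example, not the actual Dataset schema; replace them with the fields listed in the schema requirements. Values are inserted verbatim, so they must not contain double quotes or backslashes.

```shell
#!/bin/sh
# Hypothetical helper (not part of PIPES): writes a TOML Dataset Config.
# ASSUMPTION: the keys below are placeholders, NOT the real Dataset schema.
# Replace them with the fields from the schema requirements.
#
# Args: output path, system type, dataset name, storage location.
write_dataset_config() {
  # Unquoted EOF so the positional parameters are expanded into the file.
  cat > "$1" <<EOF
# Placeholder keys; see the Dataset schema requirements.
system_type = "$2"
name = "$3"
location = "$4"
EOF
}
```

The generated file can then be passed to `pipes dataset checkin` with the `-f` flag, as in Method 1.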
Method 2: In conjunction with a transformation task submission
Warning
This method applies only if your dataset is the output of a transformation task. Otherwise, use the standalone command in Method 1.
This workflow has three steps.
Step 1: Get the transformation Task Creation Config template:
$ pipes task template -t task-submission --task-type Transformation -o transformation-task-submission1.toml
Then fill in this config template.
Step 2: Get a Dataset Config checkin template by specifying a system type (ESIFRepoAPI, AmazonS3, HPCStorage, or DataFoundry), as described in Method 1, and fill it in.
Step 3: Submit the transformation task together with the Dataset Config for the transformed dataset that the task generates:
$ pipes task submit -p test1 -r 1 -m dsgrid -x model-run-1 -f transformation-task-submission1.toml -d my-transformed-dataset.toml --task-pass
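The sketch below strings Steps 1 and 3 together in POSIX shell. It reuses the exact commands and flags from the examples above, including `--task-pass`, without assuming anything about what that flag does. Step 2 is left as a manual step, since requesting and filling in the Dataset Config template follows Method 1. The function names and the `PIPES_BIN` variable (set it to `echo` for a dry run) are hypothetical, not part of PIPES.

```shell
#!/bin/sh
# Hypothetical helpers for the Method 2 workflow (not part of PIPES).
# PIPES_BIN defaults to `pipes`; set PIPES_BIN=echo to print commands only.
PIPES_BIN="${PIPES_BIN:-pipes}"

# Step 1: write the transformation Task Creation Config template to $1.
get_task_template() {
  "$PIPES_BIN" task template -t task-submission --task-type Transformation -o "$1"
}

# Step 3: submit the task together with the transformed dataset's config.
# Args: project, project run, model, model run, task config, dataset config.
submit_transformation() {
  # Both filled-in config files must exist before submitting.
  for f in "$5" "$6"; do
    if [ ! -f "$f" ]; then
      echo "Missing config file: $f" >&2
      return 1
    fi
  done
  # --task-pass is copied verbatim from the documented example command.
  "$PIPES_BIN" task submit -p "$1" -r "$2" -m "$3" -x "$4" \
    -f "$5" -d "$6" --task-pass
}
```

For example, run `get_task_template transformation-task-submission1.toml`, fill in the task config and the Dataset Config, then run `submit_transformation test1 1 dsgrid model-run-1 transformation-task-submission1.toml my-transformed-dataset.toml`.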
For more information, see Submitting a transformation task.