Check in a dataset

Checking in data with PIPES can be done in two ways: either as a standalone call or in conjunction with a task submission.

Method 1: Standalone call

  1. Request the metadata template from PIPES (see the Tip below if you want to automate this step):

    $ pipes dataset template -t dataset-checkin -s <system-type>
    

    Where <system-type> indicates the system where the dataset is stored, which tells PIPES what metadata to expect.

    Info

    For PIPES MVP, supported storage system types are:

    - ESIFRepoAPI
    - AmazonS3
    - HPCStorage
    - DataFoundry

    Additional storage options will be added with future releases. If you have a specific storage type that you need, please reach out to the PIPES development team.

    See the schema requirements for more information on the metadata fields in the Dataset Config file.

  2. Fill in the metadata fields in the config file.

  3. Submit it to PIPES using the following command:

    $ pipes dataset checkin -p project_name -r project_run_name -m model_name -x model_run_name -f path/to/toml/dataset.toml
    

    Tip

    If you would like to automate Dataset checkins, you do not need to request a template from PIPES. Instead, your application can reference the API directly to produce a config file that meets the schema requirements for Datasets and then submit the config to PIPES.
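As a sketch of that automation, the script below produces a minimal config file and assembles the check-in command from step 3. The metadata keys in `CONFIG` are hypothetical placeholders, not the real Dataset schema; a real application should generate fields that meet the schema requirements for Datasets.

```python
# Sketch of an automated dataset check-in (see Tip above).
# The metadata keys below are hypothetical placeholders -- the real
# fields must follow the PIPES schema requirements for Datasets.

CONFIG = {
    "name": "my-dataset",
    "system_type": "AmazonS3",          # one of the supported storage system types
    "location": "s3://my-bucket/data",  # hypothetical storage location
}


def to_toml(config: dict) -> str:
    """Render a flat dict of string values as minimal TOML."""
    return "".join(f'{key} = "{value}"\n' for key, value in config.items())


def checkin_command(config_path: str, project: str, project_run: str,
                    model: str, model_run: str) -> list:
    """Build the `pipes dataset checkin` invocation shown in step 3."""
    return [
        "pipes", "dataset", "checkin",
        "-p", project, "-r", project_run,
        "-m", model, "-x", model_run,
        "-f", config_path,
    ]
```

A driver script would write `to_toml(CONFIG)` to a file and then execute the command, for example with `subprocess.run(checkin_command(...), check=True)`.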

Method 2: In conjunction with transformation task submission

Warning

Please note that this method only applies if your dataset is the result of a transformation task. Otherwise, please use the standalone command described in Method 1.

  1. Get the transformation Task Creation Config template:

    $ pipes task template -t task-submission --task-type Transformation -o transformation-task-submission1.toml
    
  2. Then fill in this config template.

  3. Get a Dataset Config checkin template by specifying a system type (ESIFRepoAPI, AmazonS3, HPCStorage, or DataFoundry):

    $ pipes dataset template -t dataset-checkin -s <system-type> -o my-transformed-dataset.toml
    
  4. Submit the transformation task together with the transformed dataset generated by that task:

    $ pipes task submit -p test1 -r 1 -m dsgrid -x model-run-1 -f transformation-task-submission1.toml -d my-transformed-dataset.toml --task-pass
    

    For additional information, please refer to submitting a transformation task.

  5. For a more detailed description of datasets, please see the Dataset Reference.

  6. For more specifics on the metadata keys and their types in the Dataset template, check out the Dataset Config.
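The Method 2 submission in step 4 can likewise be scripted. The helper below simply assembles the argument list for `pipes task submit` from the flags shown above; the function name and its parameters are my own convenience wrappers, not part of PIPES.

```python
def task_submit_command(project: str, project_run: str, model: str,
                        model_run: str, task_config: str,
                        dataset_config: str, task_pass: bool = True) -> list:
    """Assemble the `pipes task submit` invocation from step 4.

    task_config is the filled-in transformation task config (-f);
    dataset_config is the filled-in dataset check-in config (-d).
    """
    cmd = [
        "pipes", "task", "submit",
        "-p", project, "-r", project_run,
        "-m", model, "-x", model_run,
        "-f", task_config, "-d", dataset_config,
    ]
    if task_pass:
        cmd.append("--task-pass")
    return cmd
```

With the example values from step 4, `task_submit_command("test1", "1", "dsgrid", "model-run-1", "transformation-task-submission1.toml", "my-transformed-dataset.toml")` reproduces the command line shown above.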