Use Case: how to reference data entities within a referenced RO-Crate #340

Open
elichad opened this issue Jul 19, 2024 · 2 comments
Labels
use-case A (potential) use-case for ROLite creation, consumption or integration

Comments

elichad commented Jul 19, 2024

As an RO-Crate creator, I want to reference a file within another RO-Crate (which may itself be a contextual or data entity), so that I do not need to duplicate it into my own crate and so that consumers of my crate can find the file I reference.

elichad added the use-case label on Jul 19, 2024

elichad commented Jul 19, 2024

More specifically: I am building a Workflow Run RO-Crate where the mainEntity workflow is a separate Workflow RO-Crate which is already published on WorkflowHub.

That workflow crate contains an example dataset which I want to reference in the workflow run crate (because in this case we can't include the actual inputs in the crate, so the example dataset acts as a reference for the format used). However, it's not clear how to identify this dataset, as it's a data entity with a local path in the other crate.

elichad commented Jul 24, 2024

Data entities proposal

Using arcp is probably the way to go here. The open question is where it makes the most sense to provide the arcp URI in the metadata. Here's an example:

...
{
  "@id": "https://workflowhub.eu/workflows/000/",
  "@type": [
    "Dataset",
    "ComputationalWorkflow"
  ],
  "conformsTo": [
    {
      "@id": "https://w3id.org/ro/crate"
    },
    {
      "@id": "https://w3id.org/ro/wfrun/process"
    }
  ],
  "url": "https://workflowhub.eu/workflows/000/",
  "distribution": {
    "@id": "https://workflowhub.eu/workflows/000/ro_crate"
  }
},
{
  "@id": "https://workflowhub.eu/workflows/000/ro_crate",
  "@type": "DataDownload",
  "encodingFormat": [
    "application/zip"
  ],
  "conformsTo": {
    "@id": "https://w3id.org/ro/crate"
  },
  "identifier": [
    "https://workflowhub.eu/workflows/000/ro_crate",
    "arcp://uuid,b89b5d50-3146-4600-b8b8-6dafc332e56e/"
  ]
},
{
  "@id": "arcp://uuid,b89b5d50-3146-4600-b8b8-6dafc332e56e/data.csv",
  "@type": "File",
  "name": "Data file from external crate.",
  "encodingFormat": "CSV"
},
...

In this case the base arcp URI is included as an identifier on a DataDownload which provides a zip of the external crate (which is in turn referenced as a distribution on a more general contextual entity representing the external crate). I can then use arcp URIs to reference data entities within the external crate.
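
As a rough illustration of how a consumer might follow this convention, the Python sketch below takes the `@id` of an arcp-identified data entity, derives its base arcp URI, and looks for a DataDownload whose `identifier` lists that base. The function names (`as_list`, `arcp_base`, `locate`) are hypothetical helpers, not part of any RO-Crate library, and the lookup rule is only this proposal's convention rather than anything defined by the specification. It uses only the standard library and does not percent-decode the path.

```python
from urllib.parse import urlsplit


def as_list(value):
    """JSON-LD values may be a single item or a list; normalise to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def arcp_base(uri):
    """Return the base of an arcp URI, e.g. 'arcp://uuid,1234/'."""
    parts = urlsplit(uri)
    if parts.scheme != "arcp" or not parts.netloc:
        raise ValueError(f"not an arcp URI: {uri!r}")
    return f"{parts.scheme}://{parts.netloc}/"


def locate(graph, entity_id):
    """Find the DataDownload that declares the arcp base of entity_id.

    Returns (download_id, path_inside_archive).
    Assumption: the base arcp URI appears in the DataDownload's 'identifier'.
    """
    base = arcp_base(entity_id)
    path = urlsplit(entity_id).path.lstrip("/")
    for entity in graph:
        if "DataDownload" not in as_list(entity.get("@type")):
            continue
        if base in as_list(entity.get("identifier")):
            return entity["@id"], path
    raise LookupError(f"no DataDownload declares {base}")


# Trimmed-down version of the example graph above.
graph = [
    {
        "@id": "https://workflowhub.eu/workflows/000/ro_crate",
        "@type": "DataDownload",
        "identifier": [
            "https://workflowhub.eu/workflows/000/ro_crate",
            "arcp://uuid,b89b5d50-3146-4600-b8b8-6dafc332e56e/",
        ],
    },
    {
        "@id": "arcp://uuid,b89b5d50-3146-4600-b8b8-6dafc332e56e/data.csv",
        "@type": "File",
    },
]

download_id, inner_path = locate(
    graph, "arcp://uuid,b89b5d50-3146-4600-b8b8-6dafc332e56e/data.csv"
)
print(download_id, inner_path)
```

With the returned pair, a consumer could download the zip from `download_id` and read `inner_path` from it. Fetching and unpacking are left out because they depend on how the external crate is hosted.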

I welcome suggestions on where else the base arcp URI could be defined; this is just one proposal for standardization.
