support export_workspace to static STAC #606

Closed
jdries opened this issue Dec 11, 2023 · 2 comments

@jdries
Contributor

jdries commented Dec 11, 2023

See: Open-EO/openeo-processes#485

I would like us to be able to define a (static) STAC collection that receives the job metadata, rather than only storing that metadata with the batch job results.

One of the first use cases would be to build a catalog of reference data extractions.

jdries changed the title from "support export_collection to static STAC" to "support export_workspace to static STAC" on Jan 24, 2024
@jdries
Contributor Author

jdries commented Jan 24, 2024

Constraint: the process that copies the (meta)data should not run as part of the batch job, so that batch jobs do not require elevated privileges or credentials.

This means that implementing it as part of the job_tracker or a NiFi flow is probably the better option.

As the first type of workspace, I propose a 'default' workspace that is simply a subdirectory of the existing bucket for openEO data. This requires little to no work.

We then need code that looks up the batch job results, copies the data, merges the metadata, and adjusts the job result metadata accordingly.

The user will have to configure a filename prefix to avoid filename collisions between jobs.
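The copy step described above (take a job's result assets, copy them into the workspace under a user-supplied prefix, and return adjusted asset locations for the job metadata) could look roughly like this. It is a minimal sketch only: the helper name `copy_results_to_workspace`, the flat directory layout and the `{asset_name: path}` shape of the metadata are hypothetical, not the backend's actual job-results model, and merging the STAC metadata itself is left out.

```python
from __future__ import annotations

import shutil
import tempfile
from pathlib import Path


def copy_results_to_workspace(
    assets: dict[str, Path], workspace: Path, prefix: str
) -> dict[str, Path]:
    """Copy batch job result assets into a workspace directory.

    Hypothetical helper: `assets` maps asset names to files in the job's
    results directory. Each copy gets `prefix` prepended to its filename so
    that jobs sharing a workspace do not overwrite each other's files.
    Returns the asset mapping adjusted to point at the workspace copies.
    """
    workspace.mkdir(parents=True, exist_ok=True)
    adjusted: dict[str, Path] = {}
    for name, source in assets.items():
        target = workspace / f"{prefix}{source.name}"
        if target.exists():
            # Fail loudly instead of silently clobbering another job's output.
            raise FileExistsError(f"{target} already exists; choose another prefix")
        shutil.copy2(source, target)
        adjusted[name] = target
    return adjusted


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        results_dir = Path(tmp) / "job-results"
        results_dir.mkdir()
        (results_dir / "openEO.tif").write_bytes(b"placeholder")
        copied = copy_results_to_workspace(
            {"openEO.tif": results_dir / "openEO.tif"}, Path(tmp) / "workspace", "extraction-a_"
        )
        print(copied["openEO.tif"].name)  # extraction-a_openEO.tif
```

Raising on an existing target rather than overwriting it is a design choice of this sketch: a missing or reused prefix shows up immediately instead of silently replacing another job's files.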

EDIT: as there can be follow-up nodes in the process graph, we actually need to do the writing within the job itself. The current assumption is that the user running the job also owns the workspace, so granting the job write access is safe.

@bossie
Collaborator

bossie commented Mar 6, 2024

Current implementation:

export_workspace (#676) writes STAC (Collection and Item) metadata next to a job's output assets in a workspace; this STAC Collection (a path on disk) is readable with load_stac.
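To make the on-disk layout concrete, here is a minimal sketch of "STAC Collection and Item metadata next to the output assets", using only the standard library. It is not the code from #676: the function name `write_stac`, the `<stem>_item.json` naming, the placeholder extent and the `proprietary` license are assumptions for illustration.

```python
from __future__ import annotations

import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path

STAC_VERSION = "1.0.0"


def write_stac(workspace_dir: Path, collection_id: str, asset_files: list[Path]) -> Path:
    """Write a minimal static STAC Collection with one Item per asset file.

    Hypothetical sketch: assumes the asset files already sit in
    `workspace_dir`. Returns the path of collection.json, the file a STAC
    reader such as load_stac would be pointed at.
    """
    collection_link = {"rel": "collection", "href": "./collection.json", "type": "application/json"}
    root_link = {"rel": "root", "href": "./collection.json", "type": "application/json"}
    item_links = []
    for asset in asset_files:
        item_id = asset.stem
        item = {
            "type": "Feature",
            "stac_version": STAC_VERSION,
            "id": item_id,
            "collection": collection_id,
            "geometry": None,  # real output would carry the asset's footprint
            "properties": {"datetime": datetime.now(timezone.utc).isoformat()},
            "links": [root_link, collection_link],
            # Relative href: the asset lives next to its Item file.
            "assets": {asset.name: {"href": f"./{asset.name}"}},
        }
        item_file = workspace_dir / f"{item_id}_item.json"
        item_file.write_text(json.dumps(item, indent=2))
        item_links.append({"rel": "item", "href": f"./{item_file.name}", "type": "application/geo+json"})

    collection = {
        "type": "Collection",
        "stac_version": STAC_VERSION,
        "id": collection_id,
        "description": f"Job results exported to workspace collection {collection_id}",
        "license": "proprietary",
        # Placeholder extent; a real export would compute it from the Items.
        "extent": {
            "spatial": {"bbox": [[-180.0, -90.0, 180.0, 90.0]]},
            "temporal": {"interval": [[None, None]]},
        },
        "links": [root_link] + item_links,
    }
    collection_path = workspace_dir / "collection.json"
    collection_path.write_text(json.dumps(collection, indent=2))
    return collection_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        workspace = Path(tmp)
        (workspace / "openEO.tif").write_bytes(b"placeholder")
        print(write_stac(workspace, "extractions", [workspace / "openEO.tif"]).name)  # collection.json
```

Using relative hrefs throughout keeps the catalog self-contained, so the whole directory (or a mounted bucket prefix) can be moved or read without rewriting links.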

export_workspace takes a merge parameter that is essentially a subdirectory within the workspace; it can be used to avoid file collisions between jobs writing to the same workspace. Currently, though, it overwrites an existing STAC Collection rather than merging with it (#677).
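The two behaviours mentioned above can be sketched as follows: resolving a `merge` value to a subdirectory of the workspace, and merging a new STAC Collection into an existing one instead of overwriting it (the behaviour tracked in #677). Both functions are hypothetical illustrations, not the backend's implementation; in particular, appending only item links whose href is not yet present is an assumed merge rule.

```python
from __future__ import annotations

from pathlib import Path


def resolve_merge_dir(workspace_root: Path, merge: str) -> Path:
    """Map a `merge` value onto a subdirectory of the workspace (hypothetical)."""
    root = workspace_root.resolve()
    target = (root / merge).resolve()
    # Reject values such as "../other" that would escape the workspace.
    if target != root and root not in target.parents:
        raise ValueError(f"merge path {merge!r} escapes the workspace")
    return target


def merge_collection(existing: dict, new: dict) -> dict:
    """Combine two STAC Collection dicts instead of letting `new` overwrite `existing`.

    Keeps every link of `existing` and appends item links from `new` whose
    href is not present yet. Extents are left untouched for brevity; a real
    merge would also union the spatial and temporal extents.
    """
    merged = dict(existing)
    merged["links"] = list(existing.get("links", []))
    seen = {link["href"] for link in merged["links"]}
    for link in new.get("links", []):
        if link.get("rel") == "item" and link["href"] not in seen:
            merged["links"].append(link)
            seen.add(link["href"])
    return merged


if __name__ == "__main__":
    old = {"type": "Collection", "links": [{"rel": "item", "href": "./a_item.json"}]}
    new = {"type": "Collection", "links": [{"rel": "item", "href": "./b_item.json"}]}
    print([link["href"] for link in merge_collection(old, new)["links"]])
    # ['./a_item.json', './b_item.json']
```

The merged result is a new dict, so the existing Collection is never mutated in place; this makes it easy to validate the merged document before writing it back to the workspace.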

A bucket in object storage can also act as a workspace, provided it is mounted as a directory on disk.

4 participants