Setup ground truth geoparquet generation pipeline using RDM API #125

Open
kvantricht opened this issue Aug 21, 2024 · 7 comments
@kvantricht
Contributor

kvantricht commented Aug 21, 2024

Create a function and an openEO job:

The function:

Creates a GeoParquet of point samples and matching metadata:

  • Either from a local dataset of polygons/points (WorldCereal Harmonized from @cbutsko)
  • Or from a spatial/temporal extent used to query the RDM

The function samples points from polygons using a strategy defined by the R&D team (starting with the centroid) and returns a GeoParquet of points with minimal metadata (GT code, SampleID, ...).
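
A minimal sketch of the "start with centroid" strategy, using only the standard library. The field names (`sample_id`, `gt_code`, `lon`, `lat`) and the GeoJSON-like feature layout are assumptions for illustration, not the project's schema, and writing the result to GeoParquet is left out:

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def polygon_centroid(ring: List[Point]) -> Point:
    """Area-weighted centroid of a simple polygon ring (shoelace formula)."""
    if ring[0] != ring[-1]:
        ring = ring + [ring[0]]  # close the ring if needed
    area2 = 0.0  # twice the signed area
    cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("degenerate polygon (zero area)")
    return cx / (3 * area2), cy / (3 * area2)


def sample_points(features: List[Dict]) -> List[Dict]:
    """Return one point per feature, keeping only minimal metadata."""
    rows = []
    for feat in features:
        geom = feat["geometry"]
        if geom["type"] == "Point":
            x, y = geom["coordinates"]
        elif geom["type"] == "Polygon":
            # Exterior ring only; holes are ignored in this sketch.
            x, y = polygon_centroid(list(geom["coordinates"][0]))
        else:
            raise ValueError(f"unsupported geometry type: {geom['type']}")
        rows.append({
            "sample_id": feat["sample_id"],  # hypothetical field name
            "gt_code": feat["gt_code"],      # hypothetical field name
            "lon": x,
            "lat": y,
        })
    return rows
```

Note that the centroid of a concave polygon can fall outside the polygon, which is one reason a later sampling strategy might differ from this starting point.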

openEO job:

Takes as input a GeoParquet of points to sample with additional metadata (GT, SampleID, ...)

From the collections of Sentinel-1 and Sentinel-2 patches, create a job that, for the input points, samples:

  • Sentinel-1 and Sentinel-2 data from the local STAC collections
  • METEO data from the pre-processed STAC collection available on the CDSE bucket
  • DEM data from the global collection available on CDSE

The job then performs processing such as monthly compositing. Finally, a post-job action concatenates the GT and other metadata, returning a final GeoParquet with all the sample information alongside the monthly-composited signal data and METEO/DEM.
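
To make the compositing and post-job join concrete, here is a small standard-library sketch. It assumes a median reducer (the issue only says "monthly composites"), and the tuple layout, band name, and metadata fields are hypothetical:

```python
from collections import defaultdict
from datetime import date
from statistics import median


def monthly_composite(observations):
    """Median per (sample_id, "YYYY-MM", band).

    `observations` is an iterable of (sample_id, date, band, value);
    a value of None stands for a masked observation and is skipped.
    """
    buckets = defaultdict(list)
    for sample_id, day, band, value in observations:
        if value is None:
            continue
        buckets[(sample_id, day.strftime("%Y-%m"), band)].append(value)
    return {key: median(values) for key, values in buckets.items()}


def attach_metadata(composites, metadata):
    """Post-job style join: one row per (sample, month), bands as columns."""
    rows = {}
    for (sample_id, month, band), value in composites.items():
        row = rows.setdefault(
            (sample_id, month),
            {"sample_id": sample_id, "month": month, **metadata[sample_id]},
        )
        row[band] = value
    return [rows[key] for key in sorted(rows)]
```

A possible usage, with made-up values:

```python
obs = [
    ("a", date(2021, 1, 3), "B04", 10),
    ("a", date(2021, 1, 13), "B04", 30),
    ("a", date(2021, 1, 23), "B04", 20),
    ("a", date(2021, 2, 2), "B04", 40),
]
rows = attach_metadata(monthly_composite(obs), {"a": {"gt_code": 1}})
```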

@VincentVerelst
Contributor

Additional note: the GeoParquet of points with minimal metadata (GT code, SampleID, ...) should be uploaded to somewhere like Artifactory, so that it can be read with openeo.

@kvantricht
Contributor Author

> Additional note: the GeoParquet of points with minimal metadata (GT code, SampleID, ...) should be uploaded to somewhere like Artifactory, so that it can be read with openeo.

based on what? Extractions that are finished?

@VincentVerelst
Contributor

> > Additional note: the GeoParquet of points with minimal metadata (GT code, SampleID, ...) should be uploaded to somewhere like Artifactory, so that it can be read with openeo.
>
> based on what? Extractions that are finished?

Well I think there are two options:

  • Create the point sample geoparquet file with STAC metadata. Then we can load_stac from disk
  • Create the point sample geoparquet and directly load the Parquet file in. This is only possible with load_url, which only works over HTTP
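
For illustration, the two options can be written as openEO process-graph nodes. This is only a sketch: the URLs are placeholders, the `"Parquet"` format name is an assumption about what the backend accepts, and the HTTP check simply mirrors the constraint mentioned above rather than anything enforced by the openEO specification.

```python
import json


def load_stac_node(url: str) -> dict:
    """Option 1: point samples described by STAC metadata."""
    return {"process_id": "load_stac", "arguments": {"url": url}}


def load_url_node(url: str, fmt: str = "Parquet") -> dict:
    """Option 2: load the Parquet file directly; needs an HTTP(S) URL."""
    if not url.startswith(("http://", "https://")):
        raise ValueError("load_url needs a file reachable over HTTP(S)")
    return {"process_id": "load_url", "arguments": {"url": url, "format": fmt}}


if __name__ == "__main__":
    # Placeholder URL, not a real location.
    node = load_url_node("https://example.com/points.parquet")
    print(json.dumps(node, indent=2))
```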

@VincentVerelst
Contributor

After discussions with the openeo devs, we have decided to use the load_url process to load the reference data in openeo. For this, two features need to be implemented on the VITO/CDSE backend:

@GriffinBabe
Contributor

Also depends on

I'm taking care of this this week

@HansVRP
Contributor

HansVRP commented Sep 17, 2024

The decision has been made to develop this on the Python client side.
It is very specific to WorldCereal, so it will end up in the WorldCereal repo.

Underlying technology to evaluate: DuckDB/pandas.

Requirement: a location to store the GeoParquet (needs to be accessible by the Terrascope backend).
--> Discuss with Peter/Jeroen

What defines success: speed and efficiency

VincentVerelst added a commit that referenced this issue Sep 19, 2024
VincentVerelst added a commit that referenced this issue Sep 19, 2024
VincentVerelst added a commit that referenced this issue Sep 19, 2024
VincentVerelst added a commit that referenced this issue Sep 19, 2024
VincentVerelst added a commit that referenced this issue Sep 24, 2024
VincentVerelst added a commit that referenced this issue Oct 2, 2024
4 participants