ML Data Cube Regularization #444
Conversation
@m-mohr, I am seeking your eyes whenever you have a moment. I have fixed most failures, but tracing this one is taking me much longer.
FYI: I won't get to it anytime soon, sorry.
Thanks for getting back. It's fine. I'll figure it out soon.
I'm not sure I understand why this process is necessary. The description talks about "irregular", but if your data is in an openEO data cube, then it's pretty regular already. Your time instants could be spaced unevenly, but that doesn't mean that an ML model could not handle that. This process looks like a combination between
In this state, I think more generally: is there a compelling reason to define
The use case has even been explored quite extensively in openEO Platform, and made it into public examples: https://github.com/Open-EO/openeo-community-examples/blob/main/python/BasicSentinelMerge/sentinel_merge.ipynb
@soxofaan thanks for the feedback. On the OEMC project we are planning to come up with a new openEO backend with a stronger focus on ML and DL capabilities for Satellite Image Time Series. A regular data cube in our case means that: (a) there is a unique field function; (b) the spatial support is georeferenced; (c) temporal continuity is assured; (d) all spatiotemporal locations share the same set of attributes; and (e) there are no gaps or missing values in the spatiotemporal extent. In our discussion, there were two philosophies, as shown in the image below, and we would like to support both, i.e. (1) allowing users to define their processes before ML/DL operations and (2) not bothering the users with the underlying processes. @jdries cool, I will check out the examples.
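To make condition (c) concrete, a minimal sketch of what "temporal continuity" could mean in practice: the labels of the temporal dimension are spaced by one fixed step, with no missing slots. The function name and the 10-day (dekadal) step are illustrative assumptions, not part of the openEO specification.

```python
from datetime import date, timedelta

def is_temporally_regular(timestamps, step=timedelta(days=10)):
    # Condition (c): every pair of consecutive temporal labels is
    # exactly `step` apart, so there are no gaps and no uneven spacing.
    return all(b - a == step for a, b in zip(timestamps, timestamps[1:]))

# A cube missing the 2021-01-21 slot fails the check:
labels = [date(2021, 1, 1), date(2021, 1, 11), date(2021, 1, 31)]
print(is_temporally_regular(labels))  # False
```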
Nice, this is exactly what I happen to be working on for the moment, in support of a couple of projects using ML. Maybe you already know, but openEO has a mechanism to build this kind of convenience function that is a combination of existing processes, the openEO 'user defined processes' (UDP). Using this has a couple of advantages:
I see this case arising more often, so maybe we can create an open-source GitHub repo with the definitions of these UDPs. That would allow users to reference the central repo, or allow backends to import those definitions. Now about the actual process:
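As a sketch of what such a shared UDP definition could look like: a hypothetical `regularize_datacube` UDP (the name is my assumption) expressed as a process graph that chains two existing openEO processes, `aggregate_temporal_period` to snap observations onto a fixed calendar grid and `apply_dimension` with `array_interpolate_linear` to fill the resulting gaps along `t`. This is not an official definition, just the shape such a JSON document might take.

```python
import json

# Hypothetical UDP: regularize a cube by aggregating to dekads, then
# linearly interpolating missing values along the temporal dimension.
udp = {
    "id": "regularize_datacube",  # assumed name, not an official process
    "parameters": [
        {"name": "data", "schema": {"type": "object", "subtype": "datacube"}}
    ],
    "process_graph": {
        "agg": {
            "process_id": "aggregate_temporal_period",
            "arguments": {
                "data": {"from_parameter": "data"},
                "period": "dekad",
                "reducer": {"process_graph": {"mean1": {
                    "process_id": "mean",
                    "arguments": {"data": {"from_parameter": "data"}},
                    "result": True,
                }}},
            },
        },
        "fill": {
            "process_id": "apply_dimension",
            "arguments": {
                "data": {"from_node": "agg"},
                "dimension": "t",
                "process": {"process_graph": {"interp1": {
                    "process_id": "array_interpolate_linear",
                    "arguments": {"data": {"from_parameter": "data"}},
                    "result": True,
                }}},
            },
            "result": True,
        },
    },
}
print(json.dumps(udp, indent=2)[:80])  # this is what the shared repo would store
```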
Maybe you already know, but openEO has a mechanism to build this kind of convenience function that is a combination of existing processes, the openEO 'user defined processes' (UDP).
Yeah, maybe some of these processes should go into openeo-community-examples if they can be built on top of other processes? This could also apply to the ard_* processes. All of these are very heavyweight processes that may not fit 100% into the current process landscape. I'll take this to the PSC for discussion.
I think we should at least consider trying to solve this use case with existing processes, i.e. add a "process_graph" member to the process description.
@PondiB I think it would make sense to make PRs against the ml branch, because otherwise all changes from the ml branch will also appear in this PR, which leads to confusion. Please rebase your changes against the ml branch if necessary and set the base branch of the PR to ml.
Sure.
Closing this.
Regularized data cubes are a necessity for machine learning and deep learning on EO time series data. This process aims to eliminate the need for users to chain processes manually in order to obtain a consistent data cube.
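A minimal sketch of the regularization idea being discussed, in plain Python: irregular (date, value) observations are snapped onto a fixed temporal grid, and slots with no nearby observation become `None`, to be gap-filled afterwards. The function name, the nearest-neighbour snapping rule, and the 10-day step are all illustrative assumptions, not the process's actual specification.

```python
from datetime import date, timedelta

def regularize(series, start, end, step=timedelta(days=10)):
    # Snap irregular {date: value} observations onto a regular grid.
    # Each grid slot takes the closest observation within half a step;
    # empty slots get None (left for a later gap-filling step).
    grid, t = [], start
    while t <= end:
        near = sorted(
            (abs(d - t), v) for d, v in series.items() if abs(d - t) <= step / 2
        )
        grid.append((t, near[0][1] if near else None))
        t += step
    return grid

obs = {date(2021, 1, 2): 0.3, date(2021, 1, 12): 0.5, date(2021, 2, 1): 0.6}
out = regularize(obs, date(2021, 1, 1), date(2021, 2, 1))
print([v for _, v in out])  # [0.3, 0.5, None, 0.6]
```

The `None` in the third slot is exactly the kind of gap that would violate condition (e) of a regular cube, and that a follow-up interpolation step would fill.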