Date Math processes #247
Indeed, we had date processes in the initial processes draft, because I thought that's an obvious use case, but some people argued that client libraries could handle this better, so they got removed. Anyway, I'm happy to make a draft. I think it will be based on ISO 8601 though, and then client libraries hopefully make it easy enough to construct those?! |
Indeed, but those dynamics change once you start working with UDPs |
This is not as trivial as it looks. Some things that need to be considered:
So we need language for that but not sure yet what most libraries actually do. Will dig into some examples. |
Another question is whether we support ISO 8601 timespans (e.g. |
Should we support strings that contain only a date or only a time? That seems to make it much more bothersome to document and implement, so I may restrict it to date AND time. Thoughts? |
We always work in UTC, right? So I don't think we should bother with summer-winter time changes.
I think that can cause invalid dates, e.g.
Alternative solutions:
|
I'd expect that the proverbial 99% use case will be just shifting one date level, so I think that your latter proposal is just fine. Client libraries can still jump in to hide the fact that you have to do multiple calls when you want to shift multiple date levels.
I wonder how important the time level actually is in EO in general. In all VITO use cases I've seen we never bother about time granularity. That being said, I think it shouldn't be too hard to cover both the cases "date" and "date+time" simultaneously in the spec and implementation. @jdries any thoughts? |
Unfortunately not (yet).
Indeed, okay, yet another "solution" that has disadvantages. There seems to be a good reason for pandas deprecating it.
Indeed, 12 months != 1 year would be an issue and can lead to quite unpredictable results. Then it's probably better to disallow month and year and let people compute the number of days on their own. At least then no unexpected surprises will happen.
Could be an option, but I think we should look into libraries in more detail and figure out what they are actually doing. If we find common ground there, it makes implementing the process easier in most cases. |
What would date_shift('2020-01-01', 1, 'hour') do? Throw an error or return '2020-01-01T01:00:00Z'? We already have the "defaults to 0" behavior for milliseconds... |
I would return "2020-01-01": assume time 00:00:00, keep granularity for output and round down to do so. Likewise, for |
A couple of quick observations from the wild (Python):
|
we could also start with not supporting year and months, and only provide support for days, (maybe weeks), hours, minutes, ... The discussion about years and months could be a different follow up issue |
On a more general note, what you commonly find in date math libraries like Currently in #248 we have the idea to do the date shift for a single temporal level (day/hour/..) at a time because that should be enough for most use cases. But for date replacing, you probably want to set multiple components at the same time (e.g. round a date down to the first of the month, or up to the last of the month).
where
|
Keeping granularity sounds reasonable.
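To make the "keep granularity" idea concrete, here is a minimal Python sketch using only the standard library. The signature `date_shift(date, value, unit)` follows the call form used earlier in this thread; the set of supported units, the UTC assumption for inputs without an offset, and the exact output format are assumptions for illustration, not the final process definition.

```python
from datetime import datetime, timedelta, timezone

# Assumption: only fixed-length units are handled in this sketch.
_UNITS = {"day": "days", "hour": "hours", "minute": "minutes", "second": "seconds"}


def date_shift(date_str: str, value: int, unit: str) -> str:
    """Shift a date or date-time string and keep the input granularity."""
    date_only = "T" not in date_str
    # fromisoformat() in older Python versions does not accept a trailing "Z".
    moment = datetime.fromisoformat(date_str.replace("Z", "+00:00"))
    if moment.tzinfo is None:
        # Date-only input (or no offset given): assume 00:00:00 in UTC.
        moment = moment.replace(tzinfo=timezone.utc)
    moment = moment.astimezone(timezone.utc) + timedelta(**{_UNITS[unit]: value})
    if date_only:
        # Keep date granularity: drop the time part, which rounds down to the day.
        return moment.date().isoformat()
    return moment.strftime("%Y-%m-%dT%H:%M:%SZ")


print(date_shift("2020-01-01", 1, "hour"))            # 2020-01-01
print(date_shift("2020-01-01", 24, "hour"))           # 2020-01-02
print(date_shift("2020-01-01T00:00:00Z", 1, "hour"))  # 2020-01-01T01:00:00Z
```

With this reading, a sub-day shift on a date-only input only changes the result once it crosses a day boundary.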
Thanks for those observations. I'll try to add some JS, Java and R later.
Indeed, although I feel that in our domain year and month would be the core use cases, right? I mean hours, minutes and (milli)seconds are not the granularity of usual EO processing, so we'd be left with just days (and the shortcut weeks).
While this looks like a unified behavior on the first look, the JSON Schemas would still be different:
So an alignment doesn't seem to help a lot here and I think we can define date_shift mostly independently of a potential date_replace. Also, in process definitions I usually try to avoid objects as parameters as they are harder to grasp from a user perspective. |
That behavior seems pretty logical and reasonable to me. Would that explain it well enough?
I found that much simpler to understand in contrast to:
|
I understand that it looks underwhelming to have a solution where a day level shift is the only practically relevant option, but in the original use case of this feature request (date manipulation in UDPs) that would already be a game changer.
Good point about the schemas, but that doesn't mean that you should drop a looser form of uniform interface altogether. Actually, in my former job I often had to do a lot of "date math" (using the arrow library) and that typically involved combinations of shift and replace (e.g.
True, but how would you propose to define The value of having both
Yes, I agree: of all the options I've seen so far, this one would also be my preference now. Also note that this behavior could also be used for date_replace : |
I've just pushed a new commit, please have another look. It mostly adds the date-only support.
The minutes/hours/seconds/milliseconds don't add much complexity so it's fine to keep them. For months and years we seem to be settling on something reasonable right now. I'll do further investigations in other languages next to verify that it's easy to implement though.
On the other hand, we could also make date_replace work with the same API that is available in date_shift. For the use case above that's totally fine at least.
In a world where there are only named parameters available, I'd go with parameters. But in JS for example that could lead to something like |
I agree here, having the days would be already a good start and covering many use cases. |
So I had a look at R's lubridate (nice cheatsheet, by the way: https://rawgit.com/rstudio/cheatsheets/master/lubridate.pdf ), JS's moment.js and Java's java.time. For them, all examples work correctly, including leap month and leap second "snapping". From what you have written, I assume Python's dateutil package will work, too. See the PR for a new version of the process, please. |
date_shift has been merged as a new process; date_replace and/or date_get could be the next potential candidates. See #254 |
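For reference, a rough standard-library Python sketch of month shifting with "snapping". It assumes that snapping means clamping the day to the last valid day of the target month, which is one interpretation of the behavior discussed here and not a quote from the merged process. The helper name `shift_months` is hypothetical.

```python
import calendar
from datetime import date


def shift_months(date_str: str, months: int) -> str:
    """Shift an ISO 8601 date by whole months, clamping to the month end."""
    d = date.fromisoformat(date_str)
    # Count months from year 0 so that divmod handles year rollover,
    # including negative shifts.
    index = d.year * 12 + (d.month - 1) + months
    year, month_zero_based = divmod(index, 12)
    month = month_zero_based + 1
    # monthrange() returns (weekday of first day, number of days in month).
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day)).isoformat()


print(shift_months("2020-01-31", 1))   # 2020-02-29 (leap year)
print(shift_months("2021-01-31", 1))   # 2021-02-28
print(shift_months("2020-02-29", 12))  # 2021-02-28
```

Note that clamping is not reversible: shifting 2020-01-31 forward by one month and back again yields 2020-01-29.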
I just got this feature request:
I think this is an interesting use case to start introducing "date math" processes, so that they can be used inside the UDP of this use case.
A good one to start with seems to be something like date_shift (or date_delta, date_offset), taking arguments:
- date (string): reference date to work from
- delta (string): a string expressing a time shift, like "90D" (90 days) in the example (also see https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_timedelta.html). The delta probably also has to include a sign to indicate whether it has to be added or subtracted.

and returning the shifted date (as string).
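A minimal Python sketch of the proposal above, using only the standard library. It assumes date-only ISO 8601 input and a delta restricted to a signed number of days such as "90D" or "-7D"; the real process may well accept other units or a different parameter layout.

```python
import re
from datetime import date, timedelta


def date_shift(date_str: str, delta: str) -> str:
    """Shift an ISO 8601 calendar date by a signed day delta like "90D"."""
    # Assumption: only whole days are supported, with an optional sign.
    match = re.fullmatch(r"([+-]?\d+)D", delta)
    if match is None:
        raise ValueError(f"unsupported delta: {delta!r}")
    shifted = date.fromisoformat(date_str) + timedelta(days=int(match.group(1)))
    return shifted.isoformat()


print(date_shift("2020-01-01", "90D"))  # 2020-03-31
print(date_shift("2020-01-01", "-1D"))  # 2019-12-31
```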