Time coordinate values as strings #14
While the format/meaning of the time strings may be obvious to you or a
human and may even follow a standard like ISO 8601:2004(E), there is no
standard way in CF to specify the format of a time string (e.g., Java's
"yyyy-MM-dd'T'HH:mm:ssZ")
Yes, there is: in the “timedelta units since an_epoch” string, the format
of the epoch is specified—I’m pretty sure it’s ISO 8601.
The problem is that CF requires time coordinates to be stored in that
“encoding”, rather than as an array of datetime strings.
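To make that encoding concrete, here's a minimal Python sketch of decoding such a coordinate. The units string and values are made up for illustration, and a real implementation would use a library like cftime rather than this hand-rolled unit table:

```python
from datetime import datetime, timedelta

# Hypothetical CF-style time coordinate: numeric offsets plus a units string.
# (These particular values are illustrative, not from any real file.)
units = "hours since 1990-01-01T00:00:00"
values = [0, 6, 12, 18]

def decode_cf_times(units, values):
    """Convert CF 'timedelta since epoch' values to datetime objects.

    Only handles the proleptic Gregorian calendar and a few unit names;
    real code would use a library such as cftime to handle CF calendars.
    """
    unit, _, epoch_str = units.partition(" since ")
    epoch = datetime.fromisoformat(epoch_str)
    seconds_per = {"seconds": 1, "minutes": 60, "hours": 3600, "days": 86400}
    return [epoch + timedelta(seconds=v * seconds_per[unit]) for v in values]

times = decode_cf_times(units, values)
print(times[1])  # 1990-01-01 06:00:00
```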
That was proposed and rejected a couple of years ago, though there is a
whole active discussion about time that MAY re-open that discussion.
But while datetime strings may seem more JSON-friendly -- I think the real
driver is use cases -- the CF way of describing time is a good one if you
want to work with the time axis numerically: computing rates of change,
etc.
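A toy example of what "working with the time axis numerically" buys you: with a numeric coordinate, a rate of change is just a difference quotient, no datetime parsing involved. The variable names and values here are invented for illustration:

```python
# CF-style numeric time coordinate (e.g. "hours since some epoch")
time_hours = [0.0, 6.0, 12.0, 18.0]
# Some measured quantity on the same axis (made-up values)
temperature = [10.0, 13.0, 16.0, 14.5]

# Rate of change between consecutive points: plain arithmetic.
rates = [
    (temperature[i + 1] - temperature[i]) / (time_hours[i + 1] - time_hours[i])
    for i in range(len(time_hours) - 1)
]
print(rates)  # [0.5, 0.5, -0.25]
```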
The string representation on the other hand is better for things like
timestamps when a measurement was taken.
But these considerations really aren’t any different for JSON than netCDF.
Either way, someone is going to need a decent datetime library for working
with time.
…-CHB
so there is no way for software written to follow the CF specification to
deal with String dimension values and know what the format is (how to parse
them).
There are literally thousands of time formats in use in scientific data files.
Some of them can't even be deciphered by humans, because 1- or 2-digit year
values make the values ambiguous. Let's avoid this problem or deal with it
properly (in CF).
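That ambiguity is easy to demonstrate: the same string parses to three different dates under three equally plausible formats (the format choices below are just examples):

```python
from datetime import datetime

# One string, three plausible interpretations -- the ambiguity that
# 1- or 2-digit year values create.
s = "01/02/03"
as_us = datetime.strptime(s, "%m/%d/%y")   # month/day/2-digit-year
as_eu = datetime.strptime(s, "%d/%m/%y")   # day/month/2-digit-year
as_ymd = datetime.strptime(s, "%y/%m/%d")  # 2-digit-year/month/day

print(as_us.date(), as_eu.date(), as_ymd.date())
# 2003-01-02 2003-02-01 2001-02-03
```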
One of the big advantages of following a standard is that software can work
with the files automatically. Otherwise, everyone has to write custom
software to deal with each of the non-standard file variants.
I totally agree that the "real driver" is, or should be, common use cases, and I think that's where JSON and netCDF have at least one major difference: JSON is human-readable (or rather can be, and in the vast majority of use cases I've come across, is), while netCDF isn't. So allowing human-readable date/time strings (ideally a very small subset of ISO 8601, or maybe even some derivative of RFC 3339 with some small modifications?) preserves this characteristic of JSON, which I think is very valuable (for development, especially across languages and related software ecosystems; manual manipulation in interactive environments of interpreted languages; logging; debugging; network inspection; in-browser inspection; database support; etc.).
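As a small illustration of how little tooling such a subset would need: a restricted ISO 8601 timestamp is parseable with nothing but the Python standard library. (Note: before Python 3.11, `datetime.fromisoformat()` does not accept a trailing "Z", so an explicit "+00:00" offset is used here.)

```python
from datetime import datetime, timezone

# A small ISO 8601 subset needs no netCDF or udunits tooling at all.
stamp = "1985-01-02T00:00:00+00:00"
dt = datetime.fromisoformat(stamp)

print(dt.tzinfo == timezone.utc)  # True
print(dt.isoformat())             # 1985-01-02T00:00:00+00:00
```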
Good point. In environments where the common format for the storage or transport of this kind of data is netCDF, I think there's a high chance that you'll also be able to directly or indirectly use the official netCDF (C/C++/Fortran; from memory I think the Java one can't write netCDF-4, right?) and/or udunits (C) libraries. JSON, on the other hand, is used in a lot of environments where that is not the case (e.g., in-browser, in-database, serverless cloud functions-as-a-service), so providing functionality to mimic udunits behaviour is tricky... while functionality to deal with ISO 8601 strings (or at least some subset) is widely available.
Good point -- I'll mention that the fact that the C netcdf lib's "ncdump" supports: ''' was used as an argument for why a string encoding for datetime was not necessary in netcdf.
udunits is almost irrelevant here -- yes, it does handle translation of seconds to hours, etc. (and annoyingly defines "month" and "year" for such translations)... but it is not a full-featured datetime lib. From the docs: "You should use a true calendar package rather than the UDUNITS-2 package to handle time." To do real work, you need something more complete -- e.g., the Python datetime module.
The string parsing is only a small part of what you need -- and it's the easy part (and, ironically, not handled by Python's datetime :-) ). My point is that if you want to do things like compute how much time has passed between two timestamps, you need something beyond string parsing. JSON is by no means only used in Javascript environments, but there have GOT to be datetime libs available there too.
Right, I vaguely remember reading that right at the end of that old trac ticket... So can we say that the fact that we're now having this discussion in the context of human-readable JSON rather than (only) binary netCDF is a new argument, and in favour of reopening that discussion for CF? |
Well, I'm not sure where the community is on the idea of de-coupling CF from netcdf at this point -- so I have no idea if that's an argument that will "fly". But NOTE: IF one allows datetime strings in cf-json, then you are going to run deep into the whole calendar question: what does "gregorian" mean? None of this is easy :-) -- but at least if you stick with the time encoding in CF, the problems are the same everywhere :-)
Don't current CF time coordinate conventions already not only allow but actually require a datetime string "encoding", namely in the reference time in the units attribute? And therefore have all the associated calendar issues anyway? How does the encoding (as in "numerical value in single given time unit since reference date and time" vs. "year/month/day/hour/minute/second/fraction/offset" with an implied since-calendar-origin) of the actual time coordinate variable values change that? I think I'm roughly aware of at least some of these calendar issues, but obviously not an expert :) Do I need to read all the way through that "Add calendars gregorian_tai and gregorian_utc" CF issue to understand this, or is there a simpler explanation that I'm missing? I agree that for further computational / numerical processing, having the numerical values directly available is more efficient. But once I've decided to work with JSON, computational / numerical efficiency is really not the first thing on my mind... more like the last :)
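The point that the units attribute already embeds a datetime string can be seen by just splitting it; the attribute value below is illustrative:

```python
from datetime import datetime

# A typical CF units attribute: the reference time after "since" is
# itself a datetime string, with all the calendar baggage that implies.
units = "days since 1970-01-01 00:00:00"
_, _, reference = units.partition(" since ")
epoch = datetime.fromisoformat(reference)
print(epoch)  # 1970-01-01 00:00:00
```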
@BobSimons , I agree with pretty much everything you said, I'm just honestly not sure whether we're reaching slightly different conclusions from the same premises, or we're both preaching to the choir and (I?) just haven't realised it yet... :)
Absolutely, I'll make every effort not to claim compliance unless it is truly compliant. Which is one of the reasons why I think we can't quite (yet) fully ditch
Are these ambiguities in ISO8601 that you're thinking of? I haven't encountered any, but I've only ever dealt with certain subsets of the ISO8601 options. And I very much hope that any CF or
Yup, exactly that last bit is one of the reasons why I like something like a subset of ISO8601, just that I'm probably thinking of a different environment / software ecosystem: if we only allow the current CF / udunits way, the consumers that I'm thinking of would be forced to reimplement a subset of the udunits functionality, instead of being able to plug-and-play one of the many existing implementations of ISO8601 parsing / generating functionality. |
No -- I'm not sure why there is discussion about that; parsing the string is NOT the problem.
I was thinking primarily of converting a CF netcdf file to JSON, but the same applies if you have, e.g., model data that is naturally in seconds since the start of the model run.
Well, that's not a very efficient way to get the info -- but yes, kinda :-( The key issue is that there are two primary use cases. The one that CF was originally developed for, often model data, is naturally in some timedelta since a timestamp -- that is, you started the model at some datetime, and it output at every timestep. A very different use case is things like measurements taken by an instrument, with a timestamp provided when the measurement was taken -- these are discrete events, each with a timestamp, so a string timestamp is natural here (though there are still issues with calendars, and leap seconds, and timezones...). But in the end, this is a big mess -- but if cf-json follows CF, at least it's the SAME mess :-)
It's not about efficiency, it's about computability: how many seconds have passed between Jan 20, 1985 at 10:43 and March 3, 1985 at 12:13? You need a datetime lib to do that, whereas if you want to know the duration between two points on a time axis encoded as "seconds since 1980-01-01T00:00", it's a trivial computation. And once you have that time lib, working with the standard CF approach is not hard. And no, we aren't having to implement a subset of udunits -- more like a superset, when it comes to time processing -- udunits does not provide much.
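Both halves of that comparison, sketched in Python (the numeric axis values below are made up for illustration):

```python
from datetime import datetime

# Timestamp strings: a datetime library is needed to get elapsed time.
t1 = datetime(1985, 1, 20, 10, 43)
t2 = datetime(1985, 3, 3, 12, 13)
elapsed = (t2 - t1).total_seconds()

# Numeric "seconds since 1980-01-01T00:00" axis: the same question is a
# plain subtraction.
a, b = 100.0, 3700.0
delta = b - a

print(elapsed, delta)  # 3634200.0 3600.0
```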
Please see my comments below...
Yes.
I agree very much about making a few small changes to CF. I wrote up 6 basic proposals and tried to get CF to make progress on the first two. But it was horrible. It just got bogged down in endless discussion where people were talking past each other. I gave up on trying to make changes to CF. Clearly, some people are more suited to that process and have the time and patience for it. Perhaps you will have better luck than I did.
Yes. There was a 1988 version of ISO 8601 that listed a larger number of formats. The 2004 version of ISO deprecated many of those in favor of a few formats (one for each of several different purposes). I advocate that people use that ISO 8601:2004(E) standard format, for example, 1985-01-02T00:00:00Z (but also with more or less precision, and with different time offset formats also allowed). There is a 2019 version of ISO 8601 -- I haven't read it yet. See [https://en.wikipedia.org/wiki/ISO_8601]
RFC3339 is similar to ISO 8601:2004(E) (same format) but less suitable because it is just for the Gregorian calendar and so doesn't deal well with dates before 1582 (the switch from Julian to Gregorian) or with other calendars (360 day? 365 day? used in models). (It's complicated.) RFC3339 is really just intended for a very limited scope: recent dates and common usage (e.g., dates on web documents). But ISO 8601 also doesn't deal with other eras. ISO 8601 says groups can extend use of ISO 8601 to BCE years for use in their group by following an agreed upon convention. For this, I advocate (and use in ERDDAP) using Astronomical Year Numbers (2 CE is year 2 in astronomical years, 1 CE is year 1, 1 BCE is year 0, 2 BCE is year -1, etc), not eras. Astronomical Year Numbers have a lot of advantages.
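The CE/BCE-to-astronomical-year conversion described above is a one-liner; this helper function is hypothetical, not from ERDDAP:

```python
# Astronomical year numbering, as advocated above: 1 BCE is year 0,
# 2 BCE is year -1, and so on; CE years are unchanged.
def era_to_astronomical(year, bce=False):
    """Convert a year in CE/BCE era notation to an astronomical year number."""
    return 1 - year if bce else year

print(era_to_astronomical(2))             # 2   (2 CE)
print(era_to_astronomical(1, bce=True))   # 0   (1 BCE)
print(era_to_astronomical(2, bce=True))   # -1  (2 BCE)
```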
I understand. That is why I said "Well, anyone can do what they want" within their community. Yes, dealing with CF times is a pain, but CF has answers for many of the complications related to time, and it is (reasonably) easy to work with numerically given a good datetime library. Although standardizing on ISO 8601 (hopefully :2004(E)) works well for humans, you still need a library to deal with those values if you want to compare them, manipulate them, analyze related data, etc. The real problem is that human dealings with time are horribly complex. The more you get into it, the more complex you see it is. There are no easy solutions (unless you limit the scope). (E.g., how does a given system deal with leap seconds? [http://leapsecond.com/java/gpsclock.htm] ). I am all for picking as few standards as possible, and sticking to those, in order to minimize the complexity and maximize the size of the community that it works for. But again, your community should feel free to do what is best for your community -- but consciously understand that in doing so you are walking away from other communities, their software tools, and their ability to work with your data (without extra effort). (I think you understand that.) Good luck. Best wishes.
Split out from #10, where the discussion had gotten this far:
On 2019-03-29T05:52:10Z, @aportagain said:
On 2019-03-29T15:45:45Z, @ChrisBarker-NOAA said:
On 2019-03-29T17:07:33Z, @BobSimons said: