Consolidation of cf-json and nco-json #10
Kyle,

Also hoping for clarification on if
As I see it, CF was/is built on the netCDF data model, so you should be able to "do" CF in any format that fully supports netCDF. With that in mind, ideally, "CF-JSON" would simply be: "Use nco-json, and follow all the same rules that CF has." It sounds like we are close to that, except that maybe it's: "Use nco-json in pedantic mode, and follow all the same rules that CF has." So if there are any remaining discrepancies, ideally we hash those out for nco-json. On to some points:
Interesting -- I had thought of "optional data" as covering the case where a variable exists only to store attributes, with no meaningful data to attach. In that case, while netCDF simply does not allow data-less variables, JSON does, so we should allow it -- no data means, well, no data that I care about. But this use case -- where there IS real data, but I want a metadata-only representation -- is tricky. In theory, you could want that with netCDF as well, but I suspect there isn't any CF-legal way to specify metadata-only. Frankly, I think the way to handle this is to alter CF, rather than making it JSON-specific. This is off the cuff, but something like: """ In any case, whether in all of CF or just JSON, it needs to be clear whether no data means there is no data, or that the data has been left out -- and if the data has been left out, it's probably a good idea to specify the data type and shape (dimensions?).
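A minimal sketch of the distinction Chris describes. The key names (`shape`, `type`, `data`, `attributes`) are illustrative assumptions, not taken from either spec: the idea is that a reader can tell "metadata-only" apart from "no data at all" only if type and shape survive when `data` is omitted.

```python
import json

# Hypothetical layout (not part of any current spec): a metadata-only
# variable omits "data" but still records type and shape, so a reader
# can distinguish "data elided" from "there is no data".
doc = json.loads("""
{
  "variables": {
    "temperature": {
      "shape": ["time", "lat", "lon"],
      "type": "float",
      "attributes": {"units": "K", "long_name": "air temperature"}
    }
  }
}
""")

var = doc["variables"]["temperature"]
# "data" absent while type/shape are present => metadata-only variable
is_metadata_only = "data" not in var and "type" in var and "shape" in var
print(is_metadata_only)  # True
```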
Really? Why is that? It makes no sense to me -- I would expect the JSON parser in a JS interpreter to be about as efficient as possible, and if you are going to generate all those sub-arrays anyway, there is no memory advantage. Personally, I'd much rather have "one way to do it", and that way should be keeping the structure in the JSON.
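For reference, this is the reshaping step a client is stuck with when `data` arrives flattened rather than nested, the step Chris is arguing should not be necessary. A sketch only, assuming row-major (C) order:

```python
# Rebuild a nested list from a flat, row-major "data" array; purely
# illustrative of the client-side work flattened data would require.
def reshape(flat, shape):
    """Nest a flat row-major list into the given shape."""
    if len(shape) == 1:
        assert len(flat) == shape[0], "data length must match shape"
        return list(flat)
    step = len(flat) // shape[0]
    return [reshape(flat[i * step:(i + 1) * step], shape[1:])
            for i in range(shape[0])]

flat = [1, 2, 3, 4, 5, 6]
print(reshape(flat, [2, 3]))  # [[1, 2, 3], [4, 5, 6]]
```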
@czender: what does this have to do with the consolidation of cf-json and nco-json? Did you accidentally post to the wrong issue?
Oops! Ignore my TMTS (Too-Many-Tabs Syndrome) behavior.
I thought there would be unanimous endorsement of making pedantic (
I'm in favor of a more open-ended spec allowing all of current
It sounds like @kwilcox and I agree that it should all be valid. I guess I'm still looking for guidance on whether the NCO default should be jsn_fmt=0, 1, or 2. Ideally that choice would be consistent with the wishes of the largest fraction of the userbase. It's almost a guess at this point, yet based on my informal polling at AGU in December, I think the ERDDAP folks generate the most NCO-JSON. Am I wrong? And Kyle, would it be helpful if NCO added a new jsn_fmt=3 that types all attributes except
@czender wrote:
I'm not sure that quite captures the difference -- I don't know that JSON embodies minimalism so much as it IS a format with minimal complexity. It was originally designed as a serialization for JavaScript, which is also a pretty simple language (or at least has a limited type system):
netCDF and CF support a wider set of types, so there is a mismatch. It does seem that there are enough use cases for simpler JSON that having only one full-featured mode is not going to fly. But I wonder if we can more clearly define the "simpler" use cases, and thus have semi-lossless conversion without being pedantic -- that is, there is a subset of netCDF data that can be expressed in a simpler form, with a clearly defined way to go from JSON to netCDF without a full spec. I've lost track a bit, but something like:

- If there is no datatype specified, then numerical data are doubles, and string data are Unicode text with a standard encoding. (Sorry, I've really lost track of how to express that in netCDF :-) ) [note: we could maybe make a distinction between integers and floats if we wanted]
- If there is no dimensionality provided for an array, it is the dimensionality it is presented in -- that is, 1D is 1D, nested for higher dimensions.
- If there is no data, then the data is irrelevant (that is, an attribute-only variable).

When going from netCDF => JSON, we could also have well-defined rules: all numeric types are JSON numbers. This would allow almost loss-free round tripping. In fact, the JSON writer could check for values that are out of bounds and raise a warning or error. Though now that I think about it, out-of-bounds values are going to be a problem anyway... Finally, maybe the default behaviour could be the less pedantic mode when there is no ambiguity.

-CHB
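The default-typing rule sketched above can be stated as a tiny function. This is a sketch of the proposal, not an implementation of either spec; the `"byte"` mapping for JSON booleans is my own assumption, since netCDF has no native boolean type:

```python
# Sketch of the proposed defaults: with no explicit type, JSON numbers
# map to double and JSON strings to Unicode text.
def infer_type(value):
    # bool must be checked before int: in Python, bool subclasses int.
    if isinstance(value, bool):
        return "byte"      # assumption: no direct netCDF boolean type
    if isinstance(value, str):
        return "string"
    if isinstance(value, (int, float)):
        return "double"
    raise TypeError(f"no default mapping for {type(value).__name__}")

print(infer_type(3.14))  # double
print(infer_type("K"))   # string
```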
@ChrisBarker-NOAA These are helpful suggestions, and they appeal to my minimalistic side. Moreover, they are largely though not completely consistent with how the default NCO implementation, aka jsn_fmt=0, prints attributes. Level 0 formats attributes without types if possible. One difference is that NCO currently has no way of printing variables that way. That calls for a new formatting level which simplifies the printing of variable data similarly to attribute data. I think it is worth doing because, as you say, it preserves the essential data in the spirit of JSON, with minimal overhead.
How often are non-string attributes actually used? For my part, I always thought that strings were the only option for attributes (until the conversation about JSON...). But anyway, it is fully specified and lossless to have default types for JSON. That is, specifying that [1.0, 2.3, 4.5] is an array of doubles is exactly the same as not specifying the type. So could we say that anywhere a type is not specified, it IS the JSON type? Then folks converting netCDF to JSON would, by default, get "untyped" JSON for doubles, and could opt in to untyped (up-typing) for floats, for example. I'm still a bit stuck on strings, as I am very confused about how one is supposed to handle Unicode in netCDF.
Non-string attributes are essential. I also wanted to clear up that the intention of
I'm glad this spurred more ideas and thoughts than I thought it would. I've noticed other groups talking about a generic text-based metadata format and
I'd like to see
I'm good with always keeping the data in the shape of the variable. We decided something already. High fives. ✋
For a lossless round trip of netCDF -> JSON -> netCDF, both the attributes and variables need to be typed. There is no alternative. If
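Kyle's round-trip point can be shown in a few lines. The `"type"`/`"data"` layout below is an illustrative assumption: the serialized numbers are byte-identical either way, so only an explicit type tag lets a writer restore `float` versus `double` on the way back to netCDF.

```python
import json

# Two attributes whose values serialize identically; only the (assumed)
# "type" member distinguishes a netCDF float from a double.
attr_float  = {"type": "float",  "data": [1.5, 2.5]}
attr_double = {"type": "double", "data": [1.5, 2.5]}

# Without the type tag, the two serializations collapse into one:
assert json.dumps(attr_float["data"]) == json.dumps(attr_double["data"])
# With it, netCDF -> JSON -> netCDF can restore the original type:
print(attr_float["type"], attr_double["type"])  # float double
```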
Thanks @kwilcox for bringing this discussion here, and everyone else for their contributions so far! And apologies for not getting involved earlier; unfortunate timing, this issue was opened the day after I went on annual leave and I just got back yesterday :) I'll have a chat with some of our (internal) active users of CF-JSON over the next few days to get their opinions and then reply in more detail.

Just one quick question from my side in the meantime: @czender, in addition to what's explained in the NCO User Guide (http://nco.sourceforge.net/nco.html#index-_002d_002djson) and some of your recent posters / slides (https://www.essoar.org/doi/pdf/10.1002/essoar.10500689.1 , http://dust.ess.uci.edu/smn/smn_json_esip_201807.pdf), is there any separate, single-document "specification" describing the NCO-JSON format (and levels)? I feel like this could be very valuable for people like us who are now comparing multiple specs, but maybe even more so for other producer / consumer developers to code against.
@aportagain You have mentioned the major written documentation sources. Those cover most but not all of the bases. I am (still) working on a manuscript to document NCO-JSON. It's almost ready to share, and I will share it once it is, as feedback would be valuable. The NCO implementation (e.g.,
I've had discussions with our various internal stakeholders, and also had a look at the 2018 email thread (http://mailman.cgd.ucar.edu/pipermail/cf-metadata/2017/009423.html) again that @davemetocean initiated before I myself got involved (he's taken some extended time off to sail around the world, so probably can't comment himself anytime soon). Seems to me we just kind of lost momentum back then? So the good news in summary from my perspective: from the
In the interest of one-issue-per-issue, I'd probably prefer to split this discussion up into multiple individual issues now, and keep this one for overarching questions, if that's alright with everyone... so that's what I'll do unless someone objects. Otherwise I'm worried it might get a bit overwhelming for anyone potentially joining or reading up on this later :)
I'd also be happy to try to organise a conference call; if people think that might be more efficient, please let me know. Otherwise I'll prep my various comments within the next week. Sorry for the delay, busy days...
Alright, I'll start with an easy one, coming back to @kwilcox's original four points:
My own perspective in summary: if making

Split out to: #11
And another, from my perspective, easy one:
I'm pretty certain for our typical current

Split out to: #12
Regarding
I guess

Having said that, I do think we'd like to preserve a clearly defined (and easily referenced) as-simple-as-possible format... most of our use cases really are quite simple, and we do not want to put an extra burden on developers who know they'll only ever process the simple cases (especially on the parser / consumer side). I think we have at least three options that have fairly high-level effects on the data structure / schema: classic or netCDF4 "structure" (maybe without opaque types); pedantic or non-pedantic handling of types (or level 0 / 1 / 2 / ...); metadata-only or metadata-and-data. Most of our use cases somehow interact with what I'd vaguely call the "web space", so for us it's very valuable to have a clear association of the expected data structure / schema with some agreed-upon MIME / media type, and ideally also to be able to represent it in OpenAPI, which in turn is somewhat tied to JSON Schema ("uses an extended subset of"... yet another somewhat messy coupling of standards). I haven't quite yet wrapped my head around how one format with multiple "format levels", and then possibly additional "options" like metadata-only, would translate to this space...
Like @ChrisBarker-NOAA and @kwilcox have mentioned, I too wish that for our |
The last separate issue that I can think of at the moment concerns JSON null.

That's probably all from my side before the weekend... :) Let me know what you guys think, and also if you think any of these should be split up further, obviously feel free to create new / separate issues!
> Maybe not impossible, but right now one deal breaker remaining seems to be time coordinate variables: the current CF working draft to my understanding still only allows numerical values, but we really want to keep strings as an option for this, so technically cf-json "breaks" CF compliance in this respect.

No more than storing time with a different encoding in netCDF breaks CF. In order to work with datetimes in any environment, you need a decent datetime library — expecting that in a consumer of the data is quite reasonable.

Ideally, we have one spec that maps the netCDF data model to JSON, and one that maps CF to JSON (which is probably almost "just do the same thing as you do in netCDF"). We should not extend CF just to make it a little easier for readers without the right tools.

(And most (I think JavaScript as well) datetime software works with C-style seconds-since-an-epoch under the hood anyway.)
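To illustrate how little machinery a numeric time coordinate actually demands of a consumer, here is a stdlib-only sketch; the epoch string is an assumed example, not taken from any particular file:

```python
from datetime import datetime, timedelta, timezone

# Decoding a CF-style numeric time coordinate, e.g. with units
# "seconds since 1970-01-01" (an assumed example epoch).
epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
values = [0, 3600, 86400]  # raw coordinate values in seconds
times = [epoch + timedelta(seconds=v) for v in values]

print(times[1].isoformat())  # 1970-01-01T01:00:00+00:00
```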
> I'm not entirely across the whole udunits "coupling"... CF still seems to do this, but maybe the Common Data Model version 4 actually doesn't?

I hope not — I hate that too. But again, that's an argument for CF — it's a bad idea to change/extend CF itself just for JSON. And much as I don't like the units coupling, if you see it as "unit handling and definitions for CF are specified in this other doc", it's not so bad.

> And funny enough this has recently come up again in a calendar issue: cf-convention/cf-conventions#148 so maybe there's hope?

Maybe — but that's kinda stalled out.

-CHB
> The last separate issue that I can think of at the moment concerns JSON null.

What would you use it for? Is null supported in netCDF? If not, do you see it as an alternative to special "missing values" or Fill_Value?

-CHB
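One possible answer to Chris's question, sketched in code: a reader could treat JSON `null` as exactly the "missing value" marker and substitute the variable's fill value on read. This is purely illustrative; neither spec currently defines this behavior.

```python
# Hypothetical reader behavior: map JSON null (Python None) onto a
# fill value, mirroring netCDF's _FillValue convention.
def apply_fill(data, fill_value):
    return [fill_value if v is None else v for v in data]

print(apply_fill([1.0, None, 3.0], -9999.0))  # [1.0, -9999.0, 3.0]
```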
TL;DR: I don't think cf-json should be used as a way to extend CF. As frustrating as it is, the place to move forward is CF itself.

> 2. `nco-json` supports `groups`. I've barely kept up with CF-2.0 discussions but supporting this now might be a good idea?

> I guess cf-json currently implicitly only supports the "classic" / netCDF3 data model, while nco-json supports most of netCDF4,

Again, CF is CF: if CF doesn't support groups, neither does CF-JSON. Whether nco-json supports groups or not is irrelevant.

-CHB
Well, anyone can do whatever they want. But there are advantages to sticking to the standard and (in this case) not writing string times as dimension values in files that are supposed to be CF compliant (until the CF standard says it's okay and how to do it):

While the format/meaning of the time strings may be obvious to you or to a human, and may even follow a standard like ISO 8601:2004(E), there is no standard way in CF to specify the format of a time string (e.g., Java's "yyyy-MM-dd'T'HH:mm:ssZ"), so there is no way for software written to follow the CF specification to deal with string dimension values and know what the format is (how to parse them).

There are literally thousands of time formats in use in scientific data files. Some of them can't even be deciphered by humans, because 1- or 2-digit year values make the values ambiguous. Let's avoid this problem or deal with it properly (in CF).

One of the big advantages of following a standard is that software can work with the files automatically. Otherwise, everyone has to write custom software to deal with each of the non-standard file variants.
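Bob's point in code: a time string parses correctly only if producer and consumer agree on an explicit format, and CF currently gives them no way to communicate one. The sample string and the `strptime` pattern below (roughly the Python equivalent of the Java pattern he cites) are illustrative assumptions:

```python
from datetime import datetime

# Parsing succeeds only because the format string is agreed upon
# out-of-band; nothing in CF carries this information.
s = "2019-03-28T22:52:00+0000"
t = datetime.strptime(s, "%Y-%m-%dT%H:%M:%S%z")
print(t.year)  # 2019

# By contrast, an ambiguous short form like "03/04/05" could be read
# as many different dates, and no CF attribute says which applies.
```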
On data types and "levels": I finally looked at the docs in NCO (thanks Charlie!). I think there is a distinction between what a file means and what you write, and for the spec, ideally we don't have a bunch of "levels". AND I think we can, in fact, do that by using essentially the level-1 spec:

- If the type is not specified, then it is the base JSON type: int, float64, string.
- If the type is specified, then it is (obviously) the type specified :-)

So a READER can simply use that spec -- it only has to read one thing. It does make reader code a bit more complex, 'cause it has to check whether or not the type info is there, but once that code is written, you're done. Writers, on the other hand, do need to know how they want to write the data -- essentially whether it's lossy or not (with regard to type) -- but either way, they are always writing to a single spec. Maybe we could call it "type_complete" or "type_lossy", or something like that?

NOTE: I'm a bit confused about the need for pedantic mode. If we say, for instance, that no type spec always means float64, then there is no ambiguity, is there? Though the way I'm thinking of it, a "pedantic" file would conform to the spec in any case. The only danger is that a reader that expects pedantic files will not be able to read not-quite-pedantic files.
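The single-spec reader Chris describes can be sketched in a few lines. The `"type"`/`"data"` wrapper and the default-type names are assumptions for illustration, not taken from either spec; the point is that one reader handles both the typed and the untyped forms:

```python
# One reader for one spec: an explicitly typed value is a (assumed)
# {"type": ..., "data": ...} object; anything else defaults to the
# base JSON type (string -> string, numbers -> float64).
def read_value(obj):
    if isinstance(obj, dict) and "type" in obj:
        return obj["type"], obj["data"]   # explicitly typed
    if isinstance(obj, str):
        return "string", obj              # untyped string
    return "float64", obj                 # untyped number(s)

print(read_value([1.0, 2.3, 4.5]))                    # ('float64', [1.0, 2.3, 4.5])
print(read_value({"type": "float", "data": [1.0]}))   # ('float', [1.0])
```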
Yup, "alternative" sounds right: since JSON conveniently does provide

Split out to: #13
I've also split out "time coordinate values as strings": #14. It's obviously a can of worms... no surprise there... :)
Hey folks, I think most of you were involved in the conversation about a year and a half ago regarding `CovJSON`, `cf-json` and `nco-json`. Most was on the `cf-metadata` mailing list and carried over to covjson/specification#86.

I'd like to focus in this issue on resolving the differences between `cf-json` and `nco-json` so we can focus on developing and promoting a single spec. I myself use `nco-json` extensively. Here are what I consider the main differences; please post any objections or clarifications, as it has been a long while since we had this conversation:

1. `data` is required in `cf-json` and optional in `nco-json`
2. `nco-json` supports `groups`. I've barely kept up with CF-2.0 discussions but supporting this now might be a good idea?
3. `data` arrays. The client is then responsible for reshaping to the defined variable shape

@czender @BobSimons @ChrisBarker-NOAA @rsignell-usgs @pedro-vicente @davemetocean @aportagain