Fix regrid example #305

Merged: 3 commits merged into COSIMA:main on Oct 24, 2023

Conversation

@anton-seaice (Collaborator) commented Oct 23, 2023

Closes #300

After the re-grid, the new index dimensions (x/y) are renamed to longitude and latitude. I added a line to explicitly delete the old longitude and latitude coordinates first to avoid there being a name conflict.

The fix is backward compatible: it works on both conda/analysis and analysis-unstable.
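For illustration, a minimal, self-contained sketch of the coordinate clean-up described above, using toy data and hypothetical names rather than the notebook's exact code:

```python
import numpy as np
import xarray as xr

# Toy stand-in for the regridded DataArray: the index dimensions are still
# x/y, but stale 1-D longitude/latitude coordinates survive from before the
# regrid (all names here are hypothetical).
ssh_regridded = xr.DataArray(
    np.zeros((2, 3)),
    dims=("y", "x"),
    coords={"longitude": ("x", [0.0, 1.0, 2.0]), "latitude": ("y", [-1.0, 1.0])},
)

# Delete the old coordinates first, then rename the index dimensions;
# renaming without the drop is what caused the name conflict.
ssh_regridded = ssh_regridded.drop_vars(["longitude", "latitude"])
ssh_regridded = ssh_regridded.rename({"x": "longitude", "y": "latitude"})
```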


review-notebook-app bot commented Oct 23, 2023 (conversation on ReviewNB):

navidcy commented on 2023-10-23T14:27:30Z
----------------------------------------------------------------

Can we deal with these warnings? They are trying to warn us about something and we're ignoring them. @dougiesquire?


anton-seaice commented on 2023-10-23T21:48:11Z
----------------------------------------------------------------

I don't think they are relevant for our circumstances.

  • We could just load the whole da into memory, it's small, but this makes the notebook harder/more confusing to modify later.
  • Or, if we leave it as is, the chunks seem fine - one chunk per file. I think the bottleneck in doing an operation on this da would be opening all the files, so one chunk per file seems fine (1 chunk per core would presumably be better, but we don't know how many cores the user has).

That said, I am definitely a dask amateur :)

dougiesquire commented on 2023-10-23T22:33:03Z
----------------------------------------------------------------

I think we want to understand what's going on here. The warnings are telling us that the cookbook is opening the data with dask chunks that split the chunking on disk, which is definitely not something we want to do. The default chunking of the cookbook should be the chunking on disk, so to me this looks like it could be a bug in either the cookbook or in xarray.

However, I can't reproduce the warnings. I don't get them with either analysis3-23.04 or analysis3-23.01. @anton-seaice, @navidcy, can you confirm that you do?

review-notebook-app bot commented Oct 23, 2023 (conversation on ReviewNB):

navidcy commented on 2023-10-23T14:27:31Z
----------------------------------------------------------------

Line #3.    ssh_1_regridded = tidy_coords(ssh_1_regridded)

This is just 1-2 operations, so let's not define a new function (e.g. tidy_coords); let's do them inline, one after the other.

Also, let's avoid numpy indexing (e.g. [:, 0]) and use xarray instead.
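For illustration, a small sketch of the difference between the two indexing styles, using a made-up 2-D coordinate rather than the notebook's variables:

```python
import numpy as np
import xarray as xr

# Hypothetical 2-D longitude coordinate on a (y, x) grid.
lon2d = xr.DataArray(np.tile([10.0, 20.0, 30.0], (2, 1)), dims=("y", "x"))

# numpy-style indexing drops the labels and dimension names:
lon_np = lon2d.values[0, :]    # plain ndarray

# the xarray equivalent keeps a labelled DataArray with its "x" dimension:
lon_xr = lon2d.isel(y=0)
```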


@navidcy added the 🛸 updating (An existing notebook needs to be updated) label on Oct 23, 2023

@anton-seaice (Collaborator, Author)

Ah thanks!

> However, I can't reproduce the warnings. I don't get them with either analysis3-23.04 or analysis3-23.01. @anton-seaice, @navidcy, can you confirm that you do?

They occur in analysis3-23.07

@dougiesquire (Collaborator)

> They occur in analysis3-23.07

Ah whoops - didn't notice 23.07 - ta

@anton-seaice (Collaborator, Author)

Adding a chunks={'time':'auto'} keyword argument to getvar removes the warning, which led me to look at this line in the cookbook's getvar:

xr_kwargs = {"chunks": _parse_chunks(ncfiles[0].NCVar)}

but I can't see anything obviously wrong there.
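For reference, the workaround mentioned above looks roughly like this; the experiment and variable names are placeholders rather than the notebook's actual ones, and the chunks keyword is simply passed through to xarray:

```python
import cosima_cookbook as cc

session = cc.database.create_session()

# Passing chunks through getvar overrides the cookbook's default chunking
# (derived from the variable's NetCDF _ChunkSizes) and silences the warning.
ssh = cc.querying.getvar(
    expt="01deg_jra55v13_ryf9091",   # hypothetical experiment name
    variable="sea_level",            # hypothetical variable name
    session=session,
    chunks={"time": "auto"},
)
```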

@dougiesquire (Collaborator) commented Oct 24, 2023

I think this is a bug in xarray. There's something funny going on with the calculation of the preferred_chunks attribute for NetCDF files. The implementation of preferred_chunks for NetCDF files is a new feature in v2023.09.0, which is why this warning has only started appearing recently. I'll keep digging, but for now I think we'll just have to accept (or ignore) this warning.

See update here: #305 (comment)

@navidcy (Collaborator) commented Oct 24, 2023

OK!

@dougiesquire (Collaborator) commented Oct 24, 2023

The issue comes about because NetCDF UNLIMITED dimensions can have ChunkSizes that are larger than the dimension size (as is the case for the time dimension in the data in this notebook and, it would seem, much of the COSIMA data), and the recently-added preferred_chunks implementation doesn't account for this. I'll open an issue with xarray.
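For anyone wanting to check this on their own files, a rough sketch of how the on-disk chunking can be compared with the dimension size using netCDF4 (file and variable names are placeholders):

```python
import netCDF4

# Compare a variable's on-disk chunk sizes with the (UNLIMITED) time
# dimension's current size.
with netCDF4.Dataset("ocean.nc") as nc:
    time = nc.dimensions["time"]
    print(time.isunlimited(), len(time))          # e.g. True, 12
    print(nc.variables["sea_level"].chunking())   # e.g. [512, 300, 360]: chunk > dim size
```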

See update here: #305 (comment)

@dougiesquire (Collaborator) commented Oct 24, 2023

Specifying chunks="auto" (or chunks={"time": "auto"}) as mentioned by @anton-seaice will avoid these warnings and is probably good practice in most cases anyway (note, however, that you can't do "auto" chunking with object dtype).

@anton-seaice (Collaborator, Author)

> Specifying chunks="auto" (or chunks={"time": "auto"}) as mentioned by @anton-seaice will avoid these warnings and is probably good practice in most cases anyway (note, however, that you can't do "auto" chunking with object dtype).

Maybe chunks='auto' should be the default in the cookbook, rather than loading the chunksizes from the file?

In this workbook, they re-chunk before doing processing, so probably no change needed for this one?

@dougiesquire (Collaborator) commented Oct 24, 2023

> Maybe chunks='auto' should be the default in the cookbook

I thought about this for Intake-ESM, but unfortunately object dtype data is too common for this to be a good default.

> In this workbook, they re-chunk before doing processing, so probably no change needed for this one?

I didn't get that far through the notebook. If that's the case, it's best to open the data with chunks as close to the final chunks as possible - see here

@anton-seaice (Collaborator, Author)

> > Maybe chunks='auto' should be the default in the cookbook
>
> I thought about this for Intake-ESM, but unfortunately object dtype data is too common for this to be a good default.
>
> > In this workbook, they re-chunk before doing processing, so probably no change needed for this one?
>
> I didn't get that far through the notebook. If that's the case, it's best to open the data with chunks as close to the final chunks as possible - see here

Thanks - I can't make the chunks='auto' change due to a cookbook issue, but will do chunks={"time": "auto"}

@navidcy (Collaborator) commented Oct 24, 2023

> > > Maybe chunks='auto' should be the default in the cookbook
> >
> > I thought about this for Intake-ESM, but unfortunately object dtype data is too common for this to be a good default.
> >
> > > In this workbook, they re-chunk before doing processing, so probably no change needed for this one?
> >
> > I didn't get that far through the notebook. If that's the case, it's best to open the data with chunks as close to the final chunks as possible - see here
>
> Thanks - I can't make the chunks='auto' change due to a cookbook issue, but will do chunks={"time": "auto"}

cc @angus-g, @micaeljtoliveira

@dougiesquire (Collaborator)

Sorry, I'm actually wrong about this being an issue in xarray. I didn't notice at first, but the average_* variables in the data have NetCDF _ChunkSizes = 512 (along time). The cosima-cookbook tries to open these data with the NetCDF chunking of the variable requested (in this case {'time': 1, 'yt_ocean': 300, 'xt_ocean': 360}), which does divide the chunks for the average_* variables. Of course, these variables aren't actually returned in this case, but the warning is still thrown.

The fix is to get rid of the logic in the cosima-cookbook that sets the default chunks to the variable's NetCDF chunks. This should be replaced with chunks={}, which opens each variable with its own NetCDF chunks.
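For illustration, a hedged sketch of the difference; the file name is a placeholder and the explicit chunks mirror the ones quoted above:

```python
import xarray as xr

# chunks={} tells dask to use each variable's own on-disk (NetCDF) chunking,
# so no variable's storage chunks get split.
ds_native = xr.open_dataset("ocean.nc", chunks={})

# Forcing one variable's chunking onto the whole dataset, as the cookbook's
# current default effectively does, can split the storage chunks of other
# variables (e.g. the average_* ones) and trigger the warning.
ds_forced = xr.open_dataset(
    "ocean.nc",
    chunks={"time": 1, "yt_ocean": 300, "xt_ocean": 360},
)
```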

review-notebook-app bot commented Oct 24, 2023 (conversation on ReviewNB):
View / edit / reply to this conversation on ReviewNB

anton-seaice commented on 2023-10-24T02:48:47Z
----------------------------------------------------------------

@dougiesquire Does this new description make sense?


dougiesquire commented on 2023-10-24T02:58:09Z
----------------------------------------------------------------

Looks good.


@anton-seaice merged commit c9af3d9 into COSIMA:main on Oct 24, 2023
2 checks passed
@access-hive-bot

This pull request has been mentioned on ACCESS Hive Community Forum. There might be relevant details there:

https://forum.access-hive.org.au/t/xarray-warnings-while-loading-data-using-cosima-cookbook/2169/4

Labels: 🛸 updating (An existing notebook needs to be updated)
Development

Successfully merging this pull request may close these issues.

Regridding.ipynb fails in conda/analysis3-23.07
4 participants