Loading CICE data is very expensive #287

Open
MartinDix opened this issue Jun 8, 2022 · 4 comments
@MartinDix

Loading a CICE variable takes much more time and memory than a MOM variable. E.g.

import cosima_cookbook as cc
session = cc.database.create_session()
expt = '025deg_jra55_ryf9091_gadi'
aice = cc.querying.getvar(expt, 'aice_m', session, n=120)

takes ~90 s and several GB of memory (from a notebook on OOD), compared to

sea_level = cc.querying.getvar(expt, 'sea_level', session, n=120)

which takes ~15 s. Trying to load the full run for a CICE variable requires an impractically large amount of memory.

I think the issue is that the CICE variables have

                aice_m:coordinates = "TLON TLAT time" ;

where TLON and TLAT are 2D variables included in the CICE files. MOM variables have

                sea_level:coordinates = "geolon_t geolat_t" ;

where geolon_t and geolat_t are not in the files.

I think this means that xarray.open_mfdataset is reading TLON and TLAT from each file to check whether it has to concatenate on those coordinates.

I couldn't see a way of persuading xarray that it should only try to concatenate on the time dimension.
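
For reference, one possible workaround (a sketch only; the glob pattern below is a placeholder rather than the cookbook's actual file layout) is to stop xarray treating TLON and TLAT as coordinates by dropping them when the files are opened:

import xarray as xr

# Sketch only: the glob below stands in for the CICE output files.
# Dropping the 2D TLON/TLAT variables means open_mfdataset no longer has to
# read and compare them for every file, so concatenation effectively happens
# along the time dimension only.
ds = xr.open_mfdataset(
    '/path/to/cice/output/iceh.*.nc',  # placeholder path
    combine='nested',
    concat_dim='time',
    drop_variables=['TLON', 'TLAT'],
    parallel=True,
)
aice = ds['aice_m']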

@rmholmes

rmholmes commented Jun 8, 2022

Hi Martin. I'm not sure about your specific case, but when loading datasets with xr.open_mfdataset I typically use something like:

OISST = xr.open_mfdataset(
    '/g/data/ua8/NOAA_OISST/AVHRR/v2-1_modified/*_' + str(year) + '.nc',
    concat_dim="time",
    combine="nested",
    data_vars='minimal',
    coords='minimal',
    compat='override',
    parallel=True,
)

This makes some extra assumptions about concat variables etc. and makes the loading much quicker. It's described in more detail in the "Note" at https://xarray.pydata.org/en/stable/user-guide/io.html#reading-multi-file-datasets

I would have to defer to @angus-g or @aidanheerdegen as to whether these options are, or should be, implemented in the cookbook.

@adele-morrison

decode_coords=False speeds it up a lot, as in this IcePlottingExample.
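
In cookbook terms that would be something like the following sketch, assuming getvar forwards extra keyword arguments through to xarray.open_mfdataset:

import cosima_cookbook as cc

session = cc.database.create_session()
expt = '025deg_jra55_ryf9091_gadi'

# decode_coords=False leaves TLON/TLAT as plain data variables, so xarray does
# not read and align them across every file when concatenating.
# Assumes getvar passes unrecognised keyword arguments on to open_mfdataset.
aice = cc.querying.getvar(expt, 'aice_m', session, n=120, decode_coords=False)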

@MartinDix

Thanks Adele, decode_coords is what I'd been looking for.

@access-hive-bot

This issue has been mentioned on the ACCESS Hive Community Forum. There might be relevant details there:

https://forum.access-hive.org.au/t/issues-loading-access-om2-01-data-from-cycle-4/418/3
