Improve netCDF4 compression #125

Merged (9 commits) on Sep 17, 2024

@brmather (Collaborator) commented Nov 7, 2023

  • Add option for quantisation by specifying significant digits
  • Replace the default zlib compression with the newer zstd compression
  • Increase compression level to 9

@brmather (Collaborator, Author)

Some of these compression methods are not compatible with all versions of netCDF, and the data quantisation controlled by the significant_digits keyword sometimes garbles the netCDF data. More investigation is required.

@brmather self-assigned this Sep 11, 2024

@brmather (Collaborator, Author) commented Sep 13, 2024

Turns out zlib is still the best way to ensure the .nc files can be uncompressed without errors.

significant_digits does work, and will preserve NaN masks so long as there are at least two significant digits.

Some timings on a 3601 x 1801 grid:

  • complevel=4: 5.6 MB file in 386 ms
  • complevel=6: 5.1 MB file in 410 ms
  • complevel=9: 5.1 MB file in 550 ms
  • significant_digits=2, complevel=4: 850 KB file in 300 ms
  • significant_digits=2, complevel=6: 450 KB file in 350 ms
  • significant_digits=2, complevel=9: 311 KB file in 1 s

A complevel of 6 or 7 seems to be the best tradeoff between compression speed and file size. Specifying significant_digits=2 shrinks the file by roughly 6 to 16 times in the timings above, and only at complevel=9 did it slow the write down.
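A rough way to see where the size reduction comes from, independent of netCDF: zeroing the low mantissa bits of each float32 leaves long runs of identical bits that zlib can squeeze. The sketch below is a simplified stand-in, not the quantisation algorithm the netCDF library uses. Treating keep_bits=7 as roughly two decimal digits is an assumption, and the helper names and the synthetic grid are hypothetical.

```python
import time
import zlib

import numpy as np


def keep_mantissa_bits(values, keep_bits):
    """Zero all but the top `keep_bits` of the 23 float32 mantissa bits."""
    if not 0 <= keep_bits <= 23:
        raise ValueError("keep_bits must be between 0 and 23")
    bits = np.ascontiguousarray(values, dtype=np.float32).view(np.uint32)
    # Sign and exponent bits are always kept; only trailing mantissa bits are cleared.
    mask = np.uint32((0xFFFFFFFF << (23 - keep_bits)) & 0xFFFFFFFF)
    return (bits & mask).view(np.float32)


def compressed_size(values, level):
    """Return (compressed byte count, seconds taken) for zlib at the given level."""
    raw = values.tobytes()
    start = time.perf_counter()
    size = len(zlib.compress(raw, level))
    return size, time.perf_counter() - start


def make_field(ny=181, nx=361, seed=0):
    """Synthetic smooth field plus noise (illustrative only)."""
    rng = np.random.default_rng(seed)
    lat = np.radians(np.linspace(-90.0, 90.0, ny))[:, None]
    lon = np.radians(np.linspace(-180.0, 180.0, nx))[None, :]
    field = 100.0 + 50.0 * np.cos(lat) * np.sin(lon) + rng.normal(0.0, 1.0, (ny, nx))
    return field.astype(np.float32)


if __name__ == "__main__":
    full = make_field()
    quantised = keep_mantissa_bits(full, keep_bits=7)
    for level in (4, 6, 9):
        for label, data in (("full", full), ("quantised", quantised)):
            size, seconds = compressed_size(data, level)
            print(f"level={level} {label}: {size} bytes in {seconds * 1e3:.1f} ms")
```

The absolute numbers will differ from the .nc timings above, since this skips the netCDF file layout and any filters the library applies, but the gap between the full and quantised rows shows the same effect.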

@brmather (Collaborator, Author)

The seafloor age gridding workflow has now been modified to use the new compression options in write_netcdf_grid. read_netcdf_grid could be more flexible in how it reads .nc files; I will address that later down the track.

@brmather merged commit 982efe1 into master Sep 17, 2024 (13 checks passed)
@brmather deleted the grid-compression branch September 17, 2024 01:29

@jcannon-gplates (Contributor)

Nice job with the compression! Not easy to compress floating-point numbers.
