Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[v2 / v3 compat] add Group.array and data kwarg to array creation #2042

Merged
merged 17 commits into from
Jul 26, 2024

Conversation

d-v-b
Copy link
Contributor

@d-v-b d-v-b commented Jul 19, 2024

  • add a data keyword argument for array creation, matching the behavior from v2. data is npt.ArrayLike | None, if None (the default) an array is initialized as normal (i.e., only metadata is written). Otherwise, after creating array metadata, we will write the contents of data to the array.
  • add an array method to Group, matching behavior from v2. this is basically an alias for Group.create_array, and has a deprecation warning
  • adds a bunch of tests to the group tests.
  • fixes a bug in MemoryStore.list_dir that was revealed by the aforementioned tests
  • Adds contains_array and contains_group functions that report on whether a StorePath contains an array or group, respectively.
  • Adds errors.py, a top-level module that contains ContainsGroupError and ContainsArrayError , which are much like the identically named classes from v2, but these ones also report the path to the container root in the error message which is much better than the v2 behavior of only reporting the name of the array / group.
  • Allows users to leave the chunks unspecified, in which case a chunk size will be guessed using logic re-used from zarr-python 2.x

fixes #2028

cc @tomwhite

TODO:

  • Add unit tests and/or doctests in docstrings
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented in docs/tutorial.rst
  • Changes documented in docs/release.rst
  • GitHub Actions have all passed
  • Test coverage is 100% (Codecov passes)

@d-v-b d-v-b requested review from jhamman and normanrz July 19, 2024 16:21
@d-v-b
Copy link
Contributor Author

d-v-b commented Jul 19, 2024

another change worth noting: in main we have some very thin array creation routines that just pass **kwargs down to another function. I filled out a complete function signature and docstring for these.

@d-v-b
Copy link
Contributor Author

d-v-b commented Jul 19, 2024

docs are failing but I don't think that's on me.

@dstansby
Copy link
Contributor

I think the doc fails are on you. They are:

docstring of zarr.AsyncGroup.create_array:22: WARNING: Line block ends without a blank line.
docstring of zarr.Group.array:22: WARNING: Line block ends without a blank line.
docstring of zarr.Group.create_array:22: WARNING: Line block ends without a blank line.

At a guess, it might be because you've used | to separate types in the docstring, whereas it should be a , used to separate types.

@d-v-b
Copy link
Contributor Author

d-v-b commented Jul 19, 2024

Are warnings failures?

@dstansby
Copy link
Contributor

Yes

@dstansby
Copy link
Contributor

(because warnings are caused by things that mess up formatting in the docs - sphinx can still generate docs if warnings are emitted, but they might not be formatted correctly, hence it is sensible to error the doc build on warnings)

@d-v-b
Copy link
Contributor Author

d-v-b commented Jul 19, 2024

the doc failures were due to breaking a long type annotation into multiple lines. it should be fixed now.

On the subect of docs, I don't think the sphinx error message was very clear, and I also don't think sphinx is doing a good job. See the below example. The function signature is very hard to read.

image

For a different project I'm using mkdocs-material, and I think the way it renders docstrings is much nicer.

Is there some way we can get sphinx to display API docs like this?

image

from zarr.sync import sync
from zarr.v2.util import guess_chunks
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably shouldn't import from zarr.v2 in v3 code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

generally I agree, but the explicit goal here is v2 compatibility. we could re-implement the exact same code in the v3 codebase, but I think that has greater drawbacks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't we decide to remove the zarr.v2 modules before releasing 3.0?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on the other hand, it would be good to get this routine in v3 so that we can make improvements to it. So I'm +1 on removing the v2 import and porting the logic. I will make that change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't we decide to remove the zarr.v2 modules before releasing 3.0?

yes, I think we did. that's another good reason to port this routine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

5abaab6 brings guess_chunks over to v3, and adds tests.

Copy link
Contributor

@normanrz normanrz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. See my one comment on importing from zarr.v2.

Copy link
Contributor

@normanrz normanrz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent!

@d-v-b
Copy link
Contributor Author

d-v-b commented Jul 22, 2024

this is not quite ready -- I realized that we are checking for pre-existing arrays, but not for pre-existing groups. fix + tests incoming.

@d-v-b d-v-b requested a review from normanrz July 22, 2024 18:38
@d-v-b
Copy link
Contributor Author

d-v-b commented Jul 22, 2024

in main, when creating an array or group with exists_ok = False, we check for metadata documents and raise an AssertionError if the metadata documents are there. Notably, for v2, when creating an array, we only check for .zarray, which means that an existing .zgroup would be ignored, potentially resulting in an invalid hierarchy!

In this PR, I added checks for groups or arrays (or both, in the case of v2), and created specific exceptions for both. I tried to implement the array / group checks in a way that's somewhat respectful of IO budget, which means that the functions for checking for groups / arrays returns Literal["group", "array", "nothing"] instead of a simple boolean. Until python adds a trinary boolean type I think this will suffice, but I'm happy to revise it.

@d-v-b d-v-b mentioned this pull request Jul 24, 2024
@d-v-b d-v-b merged commit aef47ac into zarr-developers:v3 Jul 26, 2024
24 checks passed
@d-v-b d-v-b deleted the array_on_group branch July 26, 2024 15:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[v3] Missing array method on groups
3 participants