Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improved Beam Opener PTransforms #375

Merged
merged 18 commits into from
Jun 13, 2022

Conversation

rabernat
Copy link
Contributor

@rabernat rabernat commented Jun 6, 2022

This PR tries to provide a template that we can use to efficiently add more and more beam PTransforms to pangeo_forge_recipes (see #376 for more context).

New public API members

It implements two PTransforms:

  • OpenURLWithFSSpec: - Open indexed string-based URLs with fsspec
  • OpenWithXarray: - Open indexed items with Xarray. Accepts either fsspec open-file-like objects
    or string URLs that can be passed directly to Xarray.

In addition, it exposes two functions in a new module called openers. The PTransforms are lightweight wrappers around these.

  • open_url - Open a string-based URL with fsspec.
  • open_with_xarray - Open item with Xarray.

All of these functions and transforms are fully type hinted, and the test pipelines are run with type checking.

Improved Testing

Along the way, I made some improvements to the test suit which I hope are creating a more solid foundation for moving forward, towards the goal of more flexible openers. Some of the changes I made are:

  1. make_netcdf_local_paths -> make_local_paths - this fixture is now parametrized to create NetCDF4, NetCDF3, and Zarr inputs. This gives us much better coverage of different possible inputs to our recipes. One downside is that conftest.py is now extremely large and quite hard to follow. We should revisit this once the refactor is farther along and perhaps try to cut some fat.
  2. New testing framework for PTransforms. I hope I have not made it too Byzantine. I am trying to avoid creating another rats nest of fixture as we did for XarrayZarrRecipe, but pytest just makes to too easy! 😅

Documentation

So far, I am just creating API docs. They can be previewed here: https://pangeo-forge--375.org.readthedocs.build/en/375/pangeo_forge_recipes/api_reference.html#processing-functions

Questions

  • Everyone - do these functions and transforms seem like a useful level of generality to expose as part of our public API?
  • @alxmrs - do these transforms seem to follow beam best practices?
  • @alxmrs - I would love to find a way to show the transformation type hints. I have followed the Beam type safety documentation, and I have verified via tests that runtime type checking is working. However, the sphinx documentation does not expose the transform-level type hints at all. Nor can I introspect the type hints manually, e.g.
    from pangeo_forge_recipes.transforms import OpenURLWithFSSpec
    pt = OpenURLWithFSSpec()
    pt.get_type_hints()
    # -> IOTypeHints[inputs=None, outputs=None]
    I would have expected something like OTypeHints[inputs=Tuple[Index, str], outputs=Tuple[Index, OpenFileType].
    Do you know of any way to discover these and inject them into the docs?

@rabernat rabernat changed the title Simplify beam transforms Restructure beam transforms Jun 8, 2022
@rabernat rabernat changed the title Restructure beam transforms Improved Beam Opener PTransforms Jun 8, 2022
@rabernat rabernat marked this pull request as ready for review June 8, 2022 14:30
@rabernat rabernat requested a review from cisaacstern June 8, 2022 14:30
Copy link
Member

@cisaacstern cisaacstern left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, Ryan! Just one small question for you below re: use of thing as parameter name.

Of course this is one of those large restructures that's hard to review in too great of detail, but the structure does make a lot sense to me, and seems like a clear template for how to proceed. Thanks for digging so far into the Beam best practices, to establish this foundation on solid footing.

Also totally agree that conftest.py is due for a revisit. Your comment about pytest making Byzantine layers of combinations too easy made me laugh. So true! 😅



def open_with_xarray(
thing: Union[OpenFileType, str],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you like the generality of thing as a parameter name, I'm open to it, but my gut reaction is that this is a somewhat opaque name, and perhaps something more descriptive would make the code more self-documenting. Even, obj_to_open or the like?

@cisaacstern
Copy link
Member

Thanks! The new parameter name is much more intuitive, imo.

@rabernat
Copy link
Contributor Author

Merging this so we can move forward. Alex, there will be many more opportunities to review if you are game. 😉

@rabernat rabernat merged commit a433941 into pangeo-forge:beam-refactor Jun 13, 2022
@alxmrs
Copy link
Contributor

alxmrs commented Jun 13, 2022

I'll try to catch up when I can! I just got back from my vacation :)

@rabernat rabernat mentioned this pull request Aug 10, 2022
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants