Improved Beam Opener PTransforms #375
Merged: rabernat merged 18 commits into pangeo-forge:beam-refactor from rabernat:simplify-beam-transforms, Jun 13, 2022
Commits (all by rabernat):
2b7bbe7 refactored OpenWithXarray
9fbfdfb pre commit
83dade7 tighten up type hints; beam is happy but mypy is not
9f6b4d6 finally satisfy mypy and beam type checks
019d4f7 remove wrong type hint
0f24c1c cleaned up testing
0f649ec add test_open_url
e10f80b tweak testing
22ce8d2 test coverage for openers.py
55729b4 added zarr fixture
7bbbd97 add netcdf3 fixture
d0382f0 comprehensive testing
4cc71bb clean up type hints and docstring
a44eefe rename make_netcdf_local_paths -> make_local_paths
1ed1a79 use pipeline fixture
829fab5 rename section in API docs
ee985ac add copy_to_local option
7f321be rename thing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,134 @@ | ||
"""Standalone functions for opening sources as Dataset objects.""" | ||
|
||
import io | ||
import tempfile | ||
import warnings | ||
from typing import Dict, Optional, Union | ||
|
||
import xarray as xr | ||
|
||
from .patterns import FileType | ||
from .storage import CacheFSSpecTarget, OpenFileType, _copy_btw_filesystems, _get_opener | ||
|
||
|
||
def open_url( | ||
url: str, | ||
cache: Optional[CacheFSSpecTarget] = None, | ||
secrets: Optional[Dict] = None, | ||
open_kwargs: Optional[Dict] = None, | ||
) -> OpenFileType: | ||
"""Open a string-based URL with fsspec. | ||
|
||
:param url: The URL to be parsed by fsspec. | ||
:param cache: If provided, data will be cached in the object before opening. | ||
:param secrets: If provided these secrets will be injected into the URL as a query string. | ||
:param open_kwargs: Extra arguments passed to fsspec.open. | ||
""" | ||
|
||
kw = open_kwargs or {} | ||
if cache is not None: | ||
# this has side effects | ||
cache.cache_file(url, secrets, **kw) | ||
open_file = cache.open_file(url, mode="rb") | ||
else: | ||
open_file = _get_opener(url, secrets, **kw) | ||
return open_file | ||
|
||
|
||
OPENER_MAP = { | ||
FileType.netcdf3: dict(engine="scipy"), | ||
FileType.netcdf4: dict(engine="h5netcdf"), | ||
FileType.zarr: dict(engine="zarr"), | ||
} | ||
|
||
|
||
def _set_engine(file_type, xr_open_kwargs): | ||
kw = xr_open_kwargs.copy() | ||
if "engine" in kw: | ||
engine_message_base = ( | ||
"pangeo-forge-recipes will automatically set the xarray backend for " | ||
f"files of type '{file_type.value}' to '{OPENER_MAP[file_type]['engine']}', " | ||
) | ||
warn_matching_msg = engine_message_base + ( | ||
"which is the same value you have passed via `xarray_open_kwargs`. " | ||
f"If this input file is actually of type '{file_type.value}', you can " | ||
f"remove `{{'engine': '{kw['engine']}'}}` from `xarray_open_kwargs`. " | ||
) | ||
error_mismatched_msg = engine_message_base + ( | ||
f"which is different from the value you have passed via " | ||
"`xarray_open_kwargs`. If this input file is actually of type " | ||
f"'{file_type.value}', please remove `{{'engine': '{kw['engine']}'}}` " | ||
"from `xarray_open_kwargs`. " | ||
) | ||
engine_message_tail = ( | ||
f"If this input file is not of type '{file_type.value}', please update" | ||
" this recipe by passing a different value to `FilePattern.file_type`." | ||
) | ||
warn_matching_msg += engine_message_tail | ||
error_mismatched_msg += engine_message_tail | ||
|
||
if kw["engine"] == OPENER_MAP[file_type]["engine"]: | ||
warnings.warn(warn_matching_msg) | ||
elif kw["engine"] != OPENER_MAP[file_type]["engine"]: | ||
raise ValueError(error_mismatched_msg) | ||
else: | ||
kw.update(OPENER_MAP.get(file_type, {})) | ||
return kw | ||
|
||
|
||
def open_with_xarray( | ||
thing: Union[OpenFileType, str], | ||
file_type: FileType = FileType.unknown, | ||
load: bool = False, | ||
copy_to_local=False, | ||
xarray_open_kwargs: Optional[Dict] = None, | ||
) -> xr.Dataset: | ||
"""Open item with Xarray. Accepts either fsspec open-file-like objects | ||
or string URLs that can be passed directly to Xarray. | ||
|
||
:param thing: The thing to be opened. | ||
:param file_type: Provide this if you know what type of file it is. | ||
:param load: Whether to eagerly load the data into memory after opening. | ||
:param copy_to_local: Whether to copy the file-like-object to a local path | ||
and pass the path to Xarray. Required for some file types (e.g. Grib). | ||
Can only be used with file-like-objects, not URLs. | ||
:param xarray_open_kwargs: Extra arguments to pass to Xarray's open function. | ||
""" | ||
# TODO: check file type matrix | ||
|
||
kw = xarray_open_kwargs or {} | ||
kw = _set_engine(file_type, kw) | ||
|
||
if copy_to_local: | ||
if file_type in [FileType.zarr, FileType.opendap]: | ||
raise ValueError(f"File type {file_type} can't be copied to a local file.") | ||
if isinstance(thing, str): | ||
raise ValueError( | ||
"Won't copy string URLs to local files. Please call ``open_url`` first." | ||
) | ||
ntf = tempfile.NamedTemporaryFile() | ||
tmp_name = ntf.name | ||
target_opener = open(tmp_name, mode="wb") | ||
_copy_btw_filesystems(thing, target_opener) | ||
thing = tmp_name | ||
|
||
if isinstance(thing, str): | ||
pass | ||
elif isinstance(thing, io.IOBase): | ||
# required to make mypy happy | ||
# LocalFileOpener is a subclass of io.IOBase | ||
pass | ||
elif hasattr(thing, "open"): | ||
# work around fsspec inconsistencies | ||
thing = thing.open() | ||
ds = xr.open_dataset(thing, **kw) | ||
if load: | ||
ds.load() | ||
|
||
if copy_to_local and not load: | ||
warnings.warn( | ||
"Input has been copied to a local file, but the Xarray dataset has not been loaded. " | ||
"The data may not be accessible from other hosts. Consider adding ``load=True``." | ||
) | ||
|
||
return ds |
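The engine-selection rules in `_set_engine` are easier to check in isolation. Below is a minimal, self-contained sketch of that logic, not the code in this PR: the `FileType` enum is a stand-in with assumed members, `resolve_engine` is a hypothetical name, and passing unknown file types through unchanged is an assumption of the sketch.

```python
import warnings
from enum import Enum


class FileType(Enum):
    # Stand-in for the project's FileType enum; the members are assumed.
    unknown = "unknown"
    netcdf3 = "netcdf3"
    netcdf4 = "netcdf4"
    zarr = "zarr"


OPENER_MAP = {
    FileType.netcdf3: {"engine": "scipy"},
    FileType.netcdf4: {"engine": "h5netcdf"},
    FileType.zarr: {"engine": "zarr"},
}


def resolve_engine(file_type, xr_open_kwargs):
    """Return a copy of the kwargs with a consistent ``engine`` entry."""
    kw = dict(xr_open_kwargs)
    default = OPENER_MAP.get(file_type, {}).get("engine")
    if default is None:
        # No default engine for this file type: leave the kwargs alone
        # (an assumption of this sketch).
        return kw
    if "engine" not in kw:
        # Nothing passed by the user: fill in the default.
        kw["engine"] = default
    elif kw["engine"] == default:
        # Redundant but harmless: warn so the user can drop it.
        warnings.warn(
            f"engine='{default}' is already implied by file type '{file_type.value}'"
        )
    else:
        # Conflicting engine: refuse rather than guess.
        raise ValueError(
            f"engine='{kw['engine']}' conflicts with the default '{default}' "
            f"for file type '{file_type.value}'"
        )
    return kw
```

The three branches correspond to the cases the diff handles: no engine passed, a matching engine, and a mismatched engine.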
If you like the generality of `thing` as a parameter name, I'm open to it, but my gut reaction is that this is a somewhat opaque name, and perhaps something more descriptive would make the code more self-documenting. Even `obj_to_open` or the like?
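To see how the suggested name reads in context, here is a minimal sketch of the type dispatch at the end of `open_with_xarray`, using `obj_to_open` in place of `thing`. The helper name `resolve_openable` is hypothetical and exists only for illustration; it is not part of this PR.

```python
import io
from typing import Any


def resolve_openable(obj_to_open: Any) -> Any:
    """Return an object a reader can consume directly (hypothetical helper)."""
    if isinstance(obj_to_open, (str, io.IOBase)):
        # String URLs/paths and already-open file objects pass through as-is.
        return obj_to_open
    if hasattr(obj_to_open, "open"):
        # Lazy openers (for example fsspec OpenFile-like objects) are opened here.
        return obj_to_open.open()
    # Anything else is handed on unchanged; the reader decides what to do with it.
    return obj_to_open
```

With the descriptive name, each branch states what kind of object it handles without needing the docstring.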