Binder #6

Merged: 13 commits, Feb 25, 2020
25 changes: 25 additions & 0 deletions .binder/dask_config.yaml
@@ -0,0 +1,25 @@
distributed:
  version: 2

  dashboard:
    link: /user/{JUPYTERHUB_USER}/proxy/{port}/status

  scheduler:
    idle-timeout: 3600s

  admin:
    tick:
      limit: 5s

logging:
  distributed: warning
  bokeh: critical
  tornado: critical
  tornado.application: error

labextension:
  factory:
    module: distributed
    class: LocalCluster
    args: []
    kwargs: {}
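As an illustration of how these settings parse, here is a minimal sketch (assuming PyYAML is installed; the nesting follows Dask's standard configuration layout, with `logging` and `labextension` as top-level keys):

```python
import yaml  # PyYAML; assumed available for this sketch

# A subset of .binder/dask_config.yaml, inlined for illustration
config_text = """
distributed:
  version: 2
  dashboard:
    link: /user/{JUPYTERHUB_USER}/proxy/{port}/status
  scheduler:
    idle-timeout: 3600s
labextension:
  factory:
    module: distributed
    class: LocalCluster
"""

config = yaml.safe_load(config_text)

# Dask merges files like this into its global config at import time;
# here we just inspect the parsed structure directly.
print(config["distributed"]["scheduler"]["idle-timeout"])  # 3600s
```

Note that `3600s` parses as a plain string; Dask converts duration strings like this internally.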
20 changes: 20 additions & 0 deletions .binder/environment.yml
@@ -0,0 +1,20 @@
name: xpublish
channels:
  - conda-forge
dependencies:
  - python=3
  - xarray
  - netcdf4
  - zarr
  - numcodecs
  - fastapi
  - uvicorn
  - fsspec
  - dask
  - distributed
  - dask-labextension
  - jupyter-server-proxy
  - toolz
  - bokeh
  - ipytree
  - pip
16 changes: 16 additions & 0 deletions .binder/postBuild
@@ -0,0 +1,16 @@
#!/bin/bash

set -euo pipefail

pip install -e .

# labextensions
jupyter labextension install --clean dask-labextension \
    @jupyter-widgets/jupyterlab-manager \
    ipytree

# dask config
# ${KERNEL_PYTHON_PREFIX} is set by repo2docker to sys.prefix
# of the python that the kernel is run in.
mkdir -p ${KERNEL_PYTHON_PREFIX}/etc/dask
cp .binder/dask_config.yaml ${KERNEL_PYTHON_PREFIX}/etc/dask/dask.yaml
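The comment above notes that repo2docker sets `${KERNEL_PYTHON_PREFIX}` to the `sys.prefix` of the kernel's Python. A minimal sketch of the equivalent path construction from inside Python, mirroring where the script copies the Dask config:

```python
import sys
from pathlib import Path

# Equivalent of ${KERNEL_PYTHON_PREFIX}/etc/dask from the postBuild script:
# repo2docker exports KERNEL_PYTHON_PREFIX as sys.prefix of the kernel's Python,
# and Dask looks for YAML config files under <prefix>/etc/dask.
dask_config_dir = Path(sys.prefix) / "etc" / "dask"
print(dask_config_dir)
```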
5 changes: 5 additions & 0 deletions .binder/start
@@ -0,0 +1,5 @@
#!/bin/bash

python .binder/test.py > logfile.txt 2>&1 &

exec "$@"
6 changes: 6 additions & 0 deletions test.py → .binder/test.py
@@ -1,8 +1,14 @@
import xarray as xr
from dask.distributed import Client

import xpublish # noqa: F401

if __name__ == "__main__":

    client = Client(n_workers=4, dashboard_address=8787)
    print(client.cluster)
    print(client.cluster.dashboard_link)

    ds = xr.tutorial.open_dataset("air_temperature", chunks=dict(lat=5, lon=5), decode_cf=False)
    print(ds)

2 changes: 1 addition & 1 deletion README.md
@@ -1,6 +1,6 @@
[![GitHub Workflow Status](https://img.shields.io/github/workflow/status/jhamman/xpublish/CI?logo=github)](https://github.com/jhamman/xpublish/actions?query=workflow%3ACI)
[![Documentation Status](https://readthedocs.org/projects/xpublish/badge/?version=latest)](https://xpublish.readthedocs.io/en/latest/?badge=latest)

[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/jhamman/xpublish/master)

# xpublish

4 changes: 3 additions & 1 deletion dev-requirements.txt
@@ -1,5 +1,7 @@
fsspec
netcdf4
pytest
pytest-sugar
pytest-cov
netcdf4
requests
-r requirements.txt
7 changes: 6 additions & 1 deletion docs/source/index.rst
@@ -2,7 +2,12 @@
xpublish
========

Xpublish lets you publish Xarray datasets via a Zarr-compatible REST API.
**Xpublish lets you publish Xarray datasets via a Zarr-compatible REST API.**

*You can run a short example application in a live session here:* |Binder|

.. |Binder| image:: https://mybinder.org/badge_logo.svg
:target: https://mybinder.org/v2/gh/jhamman/xpublish/master

On the server-side, datasets are published using a simple Xarray accessor:

1 change: 1 addition & 0 deletions docs/source/tutorial.rst
@@ -37,6 +37,7 @@ REST API
* ``/keys``: returns a list of variable keys, equivalent to ``list(ds.variables)``.
* ``/info``: returns a concise summary of a Dataset's variables and attributes, equivalent to ``ds.info()``.
* ``/dict``: returns a JSON dictionary of the full dataset. Accepts the ``?data={value}`` parameter to specify whether the returned dictionary should include the data in addition to the dataset schema.
* ``/versions``: returns a plain text summary of the versions of xarray and related libraries on the server side, equivalent to ``xr.show_versions()``.

Zarr API
~~~~~~~~
205 changes: 205 additions & 0 deletions examples/open_dataset.ipynb
@@ -0,0 +1,205 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from json import loads\n",
"\n",
"import numpy as np\n",
"import xarray as xr\n",
"import zarr\n",
"from dask.distributed import Client\n",
"from fsspec.implementations.http import HTTPFileSystem\n",
"from xarray.testing import assert_chunks_equal, assert_equal, assert_identical\n",
"\n",
"import xpublish"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Let's check to make sure our server started alright\n",
"!head logfile.txt"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Start a dask cluster for use on the client side\n",
"client = Client(n_workers=4, dashboard_address=43757)\n",
"client"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can now open three more browser tabs/windows:\n",
"\n",
"_Note that you will have to modify the URL prefix slightly; to do this, just copy the first part of your browser's URL_\n",
"\n",
"1. Xpublish Web App: e.g. https://hub.gke.mybinder.org/user/jhamman-xpublish-gbbqbxfi/proxy/9000\n",
"2. Xpublish's Dask Cluster Dashboard: e.g. https://hub.gke.mybinder.org/user/jhamman-xpublish-gbbqbxfi/proxy/8787/status\n",
"3. This Notebook's Dask Cluster Dashboard: e.g. https://hub.gke.mybinder.org/user/jhamman-xpublish-gbbqbxfi/proxy/43757/status\n",
"\n",
"_Also note that these port numbers may change. The server-side ports are available in `logfile.txt` (see above) and the client-side port is in the cell above._"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# We can access our API using fsspec's HTTPFileSystem\n",
"fs = HTTPFileSystem()\n",
"\n",
"# The http mapper gives us a dict-like interface to the API\n",
"http_map = fs.get_mapper(\"http://0.0.0.0:9000\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# We can access API endpoints by key now...\n",
"for key in [\".zmetadata\", \"keys\"]:\n",
" print(key, http_map[key], \"\\n\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# The .zmetadata key returns the json dictionary of consolidated zarr metadata\n",
"# We can load/decode that and access one array's attributes\n",
"d = loads(http_map[\".zmetadata\"])\n",
"d[\"metadata\"][\"air/.zattrs\"]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# We can pass that mapper object directly to zarr's open_consolidated function\n",
"# This returns a zarr group\n",
"zg = zarr.open_consolidated(http_map, mode=\"r\")\n",
"zg.tree()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# And we can do the same with xarray's open_zarr function\n",
"ds = xr.open_zarr(http_map, consolidated=True)\n",
"ds"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# The rest of this notebook applies some simple tests to show that the served dataset is identical to the\n",
"# \"air_temperature\" dataset in xarray's tutorial dataset.\n",
"ds_tutorial = xr.tutorial.open_dataset(\n",
" \"air_temperature\", chunks=dict(lat=5, lon=5), decode_cf=True\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ds_tutorial.air.attrs"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def test(actual, expected, index):\n",
" \"\"\"a simple equality test with index as a parameter\"\"\"\n",
" assert np.array_equal(actual[index].values, expected[index].values)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# test a bunch of indexing patterns\n",
"for index in [\n",
" (0, 0, 0),\n",
" (slice(0, 4), 0, 0),\n",
" (slice(0, 4), slice(0, 4), 0),\n",
" (slice(0, 4), slice(0, 4), slice(0, 4)),\n",
" (slice(-4), slice(0, 4), slice(0, 4)),\n",
" (slice(None), slice(0, 4), slice(0, 4)),\n",
" (slice(None), slice(None), slice(0, 4)),\n",
" (slice(None), slice(None), slice(None)),\n",
"]:\n",
" print(index)\n",
" test(ds_tutorial[\"air\"], ds[\"air\"], index)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"assert_equal(ds, ds_tutorial)\n",
"assert_chunks_equal(ds, ds_tutorial)\n",
"assert_identical(ds, ds_tutorial)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.6"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
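The notebook's `.zmetadata` step decodes Zarr consolidated metadata, which is plain JSON. A standalone sketch of the same decoding, using a miniature made-up payload (the `zarr_consolidated_format`/`metadata` keys and the `<var>/.zattrs` layout follow the consolidated-metadata convention, but the attribute values below are invented, not taken from the served dataset):

```python
import json

# A miniature stand-in for the body served at the .zmetadata endpoint.
zmetadata_text = json.dumps({
    "zarr_consolidated_format": 1,
    "metadata": {
        ".zgroup": {"zarr_format": 2},
        "air/.zattrs": {"units": "degK", "long_name": "4xDaily Air temperature"},
        "air/.zarray": {"shape": [2920, 25, 53], "chunks": [2920, 5, 5]},
    },
})

# Same decoding step as in the notebook: json.loads, then index into "metadata"
# to reach one array's attributes without touching any chunk data.
meta = json.loads(zmetadata_text)["metadata"]
print(meta["air/.zattrs"]["units"])  # degK
```

This is why consolidated metadata makes the Zarr-over-HTTP pattern cheap: a single `.zmetadata` request describes every array, so `open_consolidated`/`open_zarr` need no further round-trips until chunk data is actually read.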