[WIP]: Add Pyodide support and CI jobs for Zarr #1903

Draft
wants to merge 24 commits into base: v3

Commits (24)
9f5a110 Add CI job to test out-of-tree Pyodide builds (agriyakhetarpal, May 22, 2024)
29282fc Add `[msgpack]` dependency for `numcodecs` (agriyakhetarpal, May 23, 2024)
d465742 Bump to Pyodide 0.26.0, update comments (agriyakhetarpal, May 27, 2024)
cdf0bb2 Try to run tests without async (agriyakhetarpal, May 27, 2024)
dfe0321 Move shared file to rootdir, outside v2 and v3 (agriyakhetarpal, May 27, 2024)
b100ec9 Move `fasteners` import inside ThreadSynchronizer (agriyakhetarpal, May 27, 2024)
b0dddca Make the tests directory importable, fix `_shared` (agriyakhetarpal, May 27, 2024)
d227728 Import list of greetings from `numcodecs` (agriyakhetarpal, May 27, 2024)
fdb2bef Skip some tests that use threading (agriyakhetarpal, May 27, 2024)
621077a Skip some tests that use `fcntl` (agriyakhetarpal, May 27, 2024)
7ae9a97 Skip tests that require `dbm` (agriyakhetarpal, May 27, 2024)
22eb6da Move `IS_WASM` logic to internal `zarr` API (agriyakhetarpal, May 27, 2024)
6836947 Skip a few tests trying to import `multiprocessing` (agriyakhetarpal, May 27, 2024)
fe3bf27 Skip tests that use async and threading code (agriyakhetarpal, May 27, 2024)
08997ec Improve `asyncio_tests_wrapper`, fix test imports (agriyakhetarpal, May 27, 2024)
9bfc860 Skip entire `test_codecs.py` file (agriyakhetarpal, May 27, 2024)
9bcb350 Skip yet another test that requires threads (agriyakhetarpal, May 27, 2024)
9985abb xfail test where array's fill values are different (agriyakhetarpal, May 27, 2024)
7ea12ef xfail test because Emscripten FS (agriyakhetarpal, May 27, 2024)
a6565de Skip last test that tries to run threads (agriyakhetarpal, May 27, 2024)
85f621c Another test that tries to run threads (agriyakhetarpal, May 27, 2024)
1a64255 xfail another array's differing `fill_values` test (agriyakhetarpal, May 27, 2024)
c8cb38b Skip entire sync file under WASM, no threading (agriyakhetarpal, May 27, 2024)
eb36d40 Restore pytest config options, remove when needed (agriyakhetarpal, May 28, 2024)
78 changes: 78 additions & 0 deletions .github/workflows/emscripten.yml
@@ -0,0 +1,78 @@
# Attributed to NumPy https://github.com/numpy/numpy/pull/25894
# https://github.com/numpy/numpy/blob/d2d2c25fa81b47810f5cbd85ea6485eb3a3ffec3/.github/workflows/emscripten.yml

name: Pyodide wheel

on:
  # TODO: refine after this is ready to merge
  [push, pull_request, workflow_dispatch]

env:
  FORCE_COLOR: 3
  PYODIDE_VERSION: 0.26.0
  # PYTHON_VERSION and EMSCRIPTEN_VERSION are determined by PYODIDE_VERSION.
  # The appropriate versions can be found in the Pyodide repodata.json
  # "info" field, or in Makefile.envs:
  # https://github.com/pyodide/pyodide/blob/main/Makefile.envs#L2
  PYTHON_VERSION: 3.12.1
  EMSCRIPTEN_VERSION: 3.1.58
  NODE_VERSION: 18

concurrency:
  group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
  cancel-in-progress: true

permissions:
  contents: read # to fetch code (actions/checkout)

jobs:
  build_wasm_emscripten:
    name: Build and test Zarr for Pyodide
    runs-on: ubuntu-22.04
    # To enable this workflow on a fork, comment out:
    # FIXME: uncomment after this is ready to merge
    # if: github.repository == 'zarr-developers/zarr-python'
    steps:
      - name: Checkout Zarr repository
        uses: actions/checkout@v4

      - name: Set up Python ${{ env.PYTHON_VERSION }}
        id: setup-python
        uses: actions/setup-python@v5
        with:
          python-version: ${{ env.PYTHON_VERSION }}

      - name: Set up Emscripten toolchain
        uses: mymindstorm/setup-emsdk@v14
        with:
          version: ${{ env.EMSCRIPTEN_VERSION }}
          actions-cache-folder: emsdk-cache

      - name: Set up Node.js
        uses: actions/setup-node@v4
        with:
          node-version: ${{ env.NODE_VERSION }}

      - name: Install pyodide-build
        run: python -m pip install "pyodide-build==${{ env.PYODIDE_VERSION }}"

      - name: Build Zarr for Pyodide
        run: |
          pyodide build

      - name: Run Zarr tests for Pyodide
        run: |
          # Avoid missing asyncio plugin error from pytest, unavailable in Pyodide
          if grep -q 'asyncio_mode = "auto"' "pyproject.toml"; then sed '/asyncio_mode = "auto"/d' "pyproject.toml" > temp && mv temp "pyproject.toml"; fi
          pyodide venv .venv-pyodide
          source .venv-pyodide/bin/activate
          python -m pip install dist/*.whl
          python -m pip install pytest pytest-cov
          python -m pytest -v --cov=zarr --cov-config=pyproject.toml

      - name: Upload Pyodide wheel artifact for debugging
        # FIXME: Remove after this is ready to merge
        uses: actions/upload-artifact@v4
        with:
          name: zarr-pyodide-wheel
          path: dist/*.whl
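The grep/sed line in the test step deletes the `asyncio_mode = "auto"` setting from pyproject.toml before pytest runs inside the Pyodide venv. A rough Python equivalent is sketched below for readers who want to reproduce that step without sed. The helper names `drop_asyncio_mode` and `patch_pyproject` are hypothetical and not part of this PR.

```python
from pathlib import Path

# The exact text the workflow's grep/sed commands look for.
NEEDLE = 'asyncio_mode = "auto"'


def drop_asyncio_mode(text: str) -> str:
    """Return `text` without any line that contains the asyncio_mode setting."""
    kept = [line for line in text.splitlines(keepends=True) if NEEDLE not in line]
    return "".join(kept)


def patch_pyproject(path: str = "pyproject.toml") -> bool:
    """Rewrite the file in place; return True if a line was removed."""
    target = Path(path)
    original = target.read_text(encoding="utf-8")
    patched = drop_asyncio_mode(original)
    if patched != original:
        target.write_text(patched, encoding="utf-8")
        return True
    return False
```

Like the sed command, this removes whole lines that contain the setting and leaves every other line untouched.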
6 changes: 4 additions & 2 deletions pyproject.toml
@@ -24,8 +24,9 @@ requires-python = ">=3.10"
 dependencies = [
     'asciitree',
     'numpy>=1.24',
-    'fasteners',
-    'numcodecs>=0.10.0',
+    'fasteners; sys_platform != "emscripten"',
+    # 'numcodecs[msgpack]>=0.10.0; sys_platform != "emscripten"', # does not currently work
+    'numcodecs[msgpack]>=0.10.0', # this works
     'crc32c',
     'zstandard',
     'typing_extensions',
@@ -248,6 +249,7 @@ minversion = "7"
testpaths = ["tests"]
log_cli_level = "INFO"
xfail_strict = true
# Doesn't work under WASM, remove when running Pyodide test suite
asyncio_mode = "auto"
doctest_optionflags = [
"NORMALIZE_WHITESPACE",
6 changes: 6 additions & 0 deletions src/zarr/testing/utils.py
@@ -1,5 +1,8 @@
from __future__ import annotations

import platform
import sys

from zarr.buffer import Buffer
from zarr.common import BytesLike

@@ -16,3 +19,6 @@ def assert_bytes_equal(b1: Buffer | BytesLike | None, b2: Buffer | BytesLike | N
if isinstance(b2, Buffer):
b2 = b2.to_bytes()
assert b1 == b2


IS_WASM = sys.platform == "emscripten" or platform.machine() in ["wasm32", "wasm64"]
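The `IS_WASM` flag combines two checks: the platform name that Emscripten builds such as Pyodide report, and the machine architecture. The sketch below restates the same expression as a function with injectable arguments so both branches can be exercised on any host. The function name `is_wasm` and its parameters are illustrative and not part of zarr.

```python
import platform
import sys
from typing import Optional


def is_wasm(sys_platform: Optional[str] = None, machine: Optional[str] = None) -> bool:
    """Mirror the IS_WASM expression, defaulting to the running interpreter."""
    if sys_platform is None:
        sys_platform = sys.platform
    if machine is None:
        machine = platform.machine()
    # Either signal is enough: the Emscripten platform name, or a wasm machine type.
    return sys_platform == "emscripten" or machine in ("wasm32", "wasm64")
```

Calling `is_wasm()` with no arguments evaluates the same condition as the module-level constant. Passing explicit values, for example `is_wasm("emscripten", "wasm32")`, makes the behaviour checkable from a native interpreter.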
3 changes: 1 addition & 2 deletions src/zarr/v2/sync.py
@@ -2,8 +2,6 @@
from collections import defaultdict
from threading import Lock

-import fasteners
-

class ThreadSynchronizer:
"""Provides synchronization using thread locks."""
@@ -42,6 +40,7 @@ def __init__(self, path):

     def __getitem__(self, item):
         path = os.path.join(self.path, item)
+        import fasteners
         lock = fasteners.InterProcessLock(path)
         return lock
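Moving `import fasteners` from module level into `__getitem__` means the module can be imported on platforms where `fasteners` is not installed, and the import error only surfaces when a lock is actually requested. A minimal sketch of that deferred-import pattern follows. `LazyLockFactory` and its `lock_module` parameter are hypothetical names used for illustration, not the class in `zarr.v2.sync`.

```python
import importlib
import os


class LazyLockFactory:
    """Hypothetical stand-in for a synchronizer that hands out per-key locks."""

    def __init__(self, path, lock_module="fasteners"):
        # Nothing is imported here, so construction succeeds even when the
        # locking library is unavailable on this platform.
        self.path = path
        self.lock_module = lock_module

    def __getitem__(self, item):
        # The import is deferred to first use; a missing module fails only here.
        module = importlib.import_module(self.lock_module)
        return module.InterProcessLock(os.path.join(self.path, item))
```

The trade-off is that a missing dependency is reported later, at first use, rather than at import time.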

15 changes: 14 additions & 1 deletion tests/v2/test_core.py
@@ -27,7 +27,14 @@
Zlib,
)
from numcodecs.compat import ensure_bytes, ensure_ndarray
-from numcodecs.tests.common import greetings
+
+try:
+    from numcodecs.tests.common import greetings
+except ModuleNotFoundError:
+    greetings = ['¡Hola mundo!', 'Hej Världen!', 'Servus Woid!', 'Hei maailma!',
+                 'Xin chào thế giới', 'Njatjeta Botë!', 'Γεια σου κόσμε!',
+                 'こんにちは世界', '世界,你好!', 'Helló, világ!', 'Zdravo svete!',
+                 'เฮลโลเวิลด์']
from numpy.testing import assert_array_almost_equal, assert_array_equal

import zarr.v2
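The try/except around the `greetings` import in this hunk is an optional-import fallback: use the value from the installed package when its test module is present, otherwise fall back to a local copy. The same idea can be written as a small helper, sketched here with only the standard library. `import_or_default` is a hypothetical name and does not exist in zarr or numcodecs.

```python
import importlib


def import_or_default(module_name, attr, default):
    """Return `attr` from `module_name` if the module imports, else `default`."""
    try:
        module = importlib.import_module(module_name)
    except ModuleNotFoundError:
        # The module (or one of its parent packages) is not installed.
        return default
    return getattr(module, attr)


# Usage with a module that always exists, and with one that does not.
pi = import_or_default("math", "pi", 3.14)
missing = import_or_default("module_that_does_not_exist_xyz", "value", ["fallback"])
```

The case in this diff would correspond to calling the helper with `"numcodecs.tests.common"`, `"greetings"`, and the hard-coded list as the default.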
@@ -58,6 +65,8 @@
from zarr.v2.util import buffer_size
from .util import abs_container, skip_test_env_var, have_fsspec, mktemp

from zarr.testing.utils import IS_WASM

# noinspection PyMethodMayBeStatic


@@ -974,6 +983,7 @@ def test_0len_dim_2d(self):
z.store.close()

# noinspection PyStatementEffect
@pytest.mark.xfail(reason="Can't get this to pass under WASM right now")
def test_array_0d(self):
# test behaviour for array with 0 dimensions

@@ -1697,6 +1707,7 @@ def create_store(self):
store = N5Store(path)
return store

@pytest.mark.xfail(reason="Can't get this to pass under WASM right now")
def test_array_0d(self):
# test behaviour for array with 0 dimensions

@@ -1985,6 +1996,7 @@ def create_store(self):
return store


@pytest.mark.skipif(IS_WASM, reason="no dbm support in WASM")
class TestArrayWithDBMStore(TestArray):
def create_store(self):
path = mktemp(suffix=".anydbm")
@@ -1996,6 +2008,7 @@ def test_nbytes_stored(self):
pass # not implemented


@pytest.mark.skipif(IS_WASM, reason="no dbm support in WASM")
@pytest.mark.skip(reason="can't get bsddb3 to work on CI right now")
class TestArrayWithDBMStoreBerkeleyDB(TestArray):
def create_store(self):
4 changes: 4 additions & 0 deletions tests/v2/test_hierarchy.py
@@ -45,6 +45,8 @@
from zarr.v2.util import InfoReporter
from .util import skip_test_env_var, have_fsspec, abs_container, mktemp

from zarr.testing.utils import IS_WASM

# noinspection PyStatementEffect


@@ -1122,6 +1124,7 @@ def test_move(self):
pass


@pytest.mark.skipif(IS_WASM, reason="dbm not available in WASM")
class TestGroupWithDBMStore(TestGroup):
@staticmethod
def create_store():
@@ -1131,6 +1134,7 @@ def create_store():
return store, None


@pytest.mark.skipif(IS_WASM, reason="dbm not available in WASM")
class TestGroupWithDBMStoreBerkeleyDB(TestGroup):
@staticmethod
def create_store():
4 changes: 4 additions & 0 deletions tests/v2/test_storage.py
@@ -59,6 +59,7 @@
from .util import CountingDict, have_fsspec, skip_test_env_var, abs_container, mktemp
from zarr.v2.util import ConstantMap, json_dumps

from zarr.testing.utils import IS_WASM

@contextmanager
def does_not_raise():
@@ -938,6 +939,7 @@ def create_store(self, normalize_keys=False, dimension_separator=".", **kwargs):
)
return store

@pytest.mark.xfail(reason="Emscripten filesystem handles umasks differently")
def test_filesystem_path(self):
# test behaviour with path that does not exist
path = "data/store"
@@ -1765,6 +1767,7 @@ def test_store_and_retrieve_ndarray(self):
assert np.array_equiv(y, x)


@pytest.mark.skipif(IS_WASM, reason="dbm not available in WASM")
class TestDBMStore(StoreTests):
def create_store(self, dimension_separator=None):
path = mktemp(suffix=".anydbm")
@@ -1780,6 +1783,7 @@ def test_context_manager(self):
assert 2 == len(store)


@pytest.mark.skipif(IS_WASM, reason="dbm not available in WASM")
class TestDBMStoreDumb(TestDBMStore):
def create_store(self, **kwargs):
path = mktemp(suffix=".dumbdbm")
9 changes: 9 additions & 0 deletions tests/v2/test_sync.py
@@ -8,6 +8,7 @@

import numpy as np
from numpy.testing import assert_array_equal
import pytest

from zarr.v2.attrs import Attributes
from zarr.v2.core import Array
@@ -20,7 +21,10 @@
from .test_core import TestArray
from .test_hierarchy import TestGroup

from zarr.testing.utils import IS_WASM


@pytest.mark.skipif(IS_WASM, reason="no threading support in WASM")
class TestAttributesWithThreadSynchronizer(TestAttributes):
def init_attributes(self, store, read_only=False, cache=True):
key = ".zattrs"
@@ -30,6 +34,7 @@ def init_attributes(self, store, read_only=False, cache=True):
)


@pytest.mark.skipif(IS_WASM, reason="no threading support in WASM")
class TestAttributesProcessSynchronizer(TestAttributes):
def init_attributes(self, store, read_only=False, cache=True):
key = ".zattrs"
@@ -96,6 +101,7 @@ def test_parallel_append(self):
pool.terminate()


@pytest.mark.skipif(IS_WASM, reason="no multiprocessing support in WASM")
class TestArrayWithThreadSynchronizer(TestArray, MixinArraySyncTests):
def create_array(self, read_only=False, **kwargs):
store = KVStore(dict())
@@ -148,6 +154,7 @@ def test_hexdigest(self):
assert "05b0663ffe1785f38d3a459dec17e57a18f254af" == z.hexdigest()


@pytest.mark.skipif(IS_WASM, reason="fcntl not available in WASM")
class TestArrayWithProcessSynchronizer(TestArray, MixinArraySyncTests):
def create_array(self, read_only=False, **kwargs):
path = tempfile.mkdtemp()
@@ -259,6 +266,7 @@ def test_parallel_require_group(self):
pool.terminate()


@pytest.mark.skipif(IS_WASM, reason="no multiprocessing support in WASM")
class TestGroupWithThreadSynchronizer(TestGroup, MixinGroupSyncTests):
def create_group(
self, store=None, path=None, read_only=False, chunk_store=None, synchronizer=None
@@ -286,6 +294,7 @@ def test_synchronizer_property(self):
assert isinstance(g.synchronizer, ThreadSynchronizer)


@pytest.mark.skipif(IS_WASM, reason="fcntl not available in WASM")
class TestGroupWithProcessSynchronizer(TestGroup, MixinGroupSyncTests):
def create_store(self):
path = tempfile.mkdtemp()
8 changes: 7 additions & 1 deletion tests/v3/test_buffer.py
@@ -9,11 +9,17 @@

from zarr.array import AsyncArray
from zarr.buffer import ArrayLike, NDArrayLike, NDBuffer
from zarr.testing.utils import IS_WASM

if TYPE_CHECKING:
from typing_extensions import Self


# Helper function to skip async tests on WASM platforms
def asyncio_tests_wrapper(func):
    return func if IS_WASM else pytest.mark.asyncio(func)


class MyNDArrayLike(np.ndarray):
"""An example of a ndarray-like class"""

@@ -45,7 +51,7 @@ def test_nd_array_like(xp):
     assert isinstance(ary, NDArrayLike)


-@pytest.mark.asyncio
+@asyncio_tests_wrapper
async def test_async_array_factory(store_path):
expect = np.zeros((9, 9), dtype="uint16", order="F")
a = await AsyncArray.create(
8 changes: 7 additions & 1 deletion tests/v3/test_codecs.py
@@ -25,7 +25,12 @@
from zarr.config import config
from zarr.indexing import morton_order_iter
from zarr.store import MemoryStore, StorePath
-from zarr.testing.utils import assert_bytes_equal
+from zarr.testing.utils import IS_WASM, assert_bytes_equal

# Skip entire file if running on WASM platforms, see
# 1. https://github.com/pyodide/pyodide/issues/2221
# 2. https://github.com/pyodide/pyodide/issues/237
pytestmark = pytest.mark.skipif(IS_WASM, reason="Can't test async code in WASM")


@dataclass(frozen=True)
@@ -406,6 +411,7 @@ async def test_transpose(
assert await (store / "transpose/0.0").get() == await (store / "transpose_zarr/0.0").get()


@pytest.mark.skipif(IS_WASM, reason="Can't start new threads in WASM")
def test_transpose_invalid(
store: Store,
):
4 changes: 4 additions & 0 deletions tests/v3/test_group.py
@@ -6,6 +6,7 @@
from zarr.buffer import Buffer
from zarr.store.core import make_store_path
from zarr.sync import sync
from zarr.testing.utils import IS_WASM

if TYPE_CHECKING:
from zarr.common import ZarrFormat
@@ -18,6 +19,7 @@
from zarr.store import StorePath


@pytest.mark.skipif(IS_WASM, reason="Can't test async code in WASM")
Member commented:

This is going to be a problem for moving this PR forward. The entire design of V3 depends on being able to execute async. If threading and async are a show stopper, I think we should evaluate whether this can work another way.

Author (@agriyakhetarpal, May 28, 2024) commented:

I agree, I think about half of the test suite is being skipped because of these tests under v3. Both threading and async code are probably not a priority at this time for Pyodide (perhaps @hoodmane will have more insights).

Author commented:

From the latest run on my fork:

14 failed, 1606 passed, 1572 skipped, 9 xfailed, 93 warnings in 104.70s (0:01:44)
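As a quick check of the "about half of the test suite is being skipped" estimate, the shares can be computed from the summary line quoted above. This is plain arithmetic on the reported counts; warnings are left out because they are not test outcomes. The skipped share comes out at roughly 49%.

```python
# Counts copied from the pytest summary line quoted in the comment.
outcomes = {"failed": 14, "passed": 1606, "skipped": 1572, "xfailed": 9}

total = sum(outcomes.values())
skipped_share = outcomes["skipped"] / total
passed_share = outcomes["passed"] / total

print(f"total test outcomes: {total}")
print(f"skipped: {skipped_share:.1%}, passed: {passed_share:.1%}")
```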

Member commented:

I assume 99% of the passing tests are coming from the v2 test suite. I wouldn't put much weight on that. We need tests/v3 to be passing for this to work.

Author (@agriyakhetarpal, May 28, 2024) commented:

Yes. If there is a way to run asynchronous tests in a manner that is not asynchronous, then there might be a way forward, but if Zarr's underlying functionality is going to rely on this, I am not sure if there is a solution. The first way would be to get pytest-asyncio both packaged and running under Pyodide.

Edit: the related issue is pyodide/pyodide#2221.

It is at this point possible to run async tests in Pyodide using stack switching. Just have to implement it.
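The idea raised in this thread, running async tests "in a manner that is not asynchronous", can be sketched on ordinary CPython by driving each coroutine to completion with `asyncio.run`. The decorator `run_sync` below is hypothetical and is not part of zarr, pytest, or Pyodide. Whether blocking on the event loop like this works under Pyodide depends on the stack-switching support mentioned in the comment above, which this sketch does not demonstrate.

```python
import asyncio
import functools


def run_sync(async_fn):
    """Wrap a coroutine function so it can be called like a normal function."""

    @functools.wraps(async_fn)
    def wrapper(*args, **kwargs):
        # asyncio.run creates a fresh event loop, runs the coroutine to
        # completion, and closes the loop again.
        return asyncio.run(async_fn(*args, **kwargs))

    return wrapper


@run_sync
async def add_later(a, b):
    await asyncio.sleep(0)  # yield to the event loop once
    return a + b
```

With a wrapper of this kind, a test body written as `async def` is invoked as a plain synchronous call, so no asyncio plugin is needed to drive it.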

Thanks for pushing this forward @agriyakhetarpal.

# todo: put RemoteStore in here
@pytest.mark.parametrize("store", ("local", "memory"), indirect=["store"])
def test_group_children(store: MemoryStore | LocalStore) -> None:
@@ -55,6 +57,7 @@ def test_group_children(store: MemoryStore | LocalStore) -> None:
assert sorted(dict(members_observed)) == sorted(members_expected)


@pytest.mark.skipif(IS_WASM, reason="Can't test async code in WASM")
@pytest.mark.parametrize("store", (("local", "memory")), indirect=["store"])
def test_group(store: MemoryStore | LocalStore) -> None:
store_path = StorePath(store)
@@ -94,6 +97,7 @@ def test_group(store: MemoryStore | LocalStore) -> None:
assert dict(bar3.attrs) == {"baz": "qux", "name": "bar"}


@pytest.mark.skipif(IS_WASM, reason="Can't test async code in WASM")
@pytest.mark.parametrize("store", ("local", "memory"), indirect=["store"])
@pytest.mark.parametrize("exists_ok", (True, False))
def test_group_create(store: MemoryStore | LocalStore, exists_ok: bool) -> None:
3 changes: 3 additions & 0 deletions tests/v3/test_sync.py
@@ -6,6 +6,9 @@
import pytest

from zarr.sync import SyncError, SyncMixin, _get_lock, _get_loop, sync
from zarr.testing.utils import IS_WASM

pytestmark = pytest.mark.skipif(IS_WASM, reason="Can't test async code in WASM")


@pytest.fixture(params=[True, False])