Saving and reloading images directly on sdata within napari #310

Open
rtubelleza opened this issue Sep 6, 2024 · 3 comments

rtubelleza commented Sep 6, 2024

Hello,

Following the agenda from the 2024/09/05 SpatialData community meeting, below is a code example to reproduce the process of:

  1. Loading a SpatialData object into napari_spatialdata
  2. Performing image operations directly on the image (i.e. creating a binary mask -> Labels2DModel)
  3. Saving and writing the output image directly back onto the SpatialData object
  4. Refreshing the elements widget (in the future, to be done via public methods/API?)

# example image
import numpy as np
from spatialdata import SpatialData
from spatialdata.models import Image2DModel, Labels2DModel
from napari_spatialdata import Interactive
import tempfile
from pathlib import Path
import os

# example test SpatialData object containing only an image
temp_dir = tempfile.TemporaryDirectory()
shape = (5, 100, 100)
image_name = "test_image"
image_np = np.random.randint(0, 255, shape, dtype=np.uint8)
image_model = Image2DModel.parse(
    image_np,
    dims=("c", "y", "x"),
    c_coords=[f"chan{x}" for x in range(shape[0])],
    scale_factors=[2, 2, 2]  # mimic pyramidal/multiscale
    )

sdata = SpatialData(images={image_name: image_model})
temp_dir_zarr = os.path.join(Path(temp_dir.name), Path(f"{image_name}.zarr"))
sdata.write(temp_dir_zarr) # Write to disk

# Launch napari-spatialdata
interactive = Interactive(sdata)

# Below mimics the user selecting the global coord, and adding the image as a layer
viewer = interactive._viewer
sd_widget = viewer.window._dock_widgets["SpatialData"].widget()
sd_widget.coordinate_system_widget.setCurrentRow(0) #  User selects the global coord
image_element = sd_widget.elements_widget.item(0) # Get the 'test_image'
sd_widget.elements_widget.itemDoubleClicked.emit(image_element) # User double clicks the image, adds to layers

# Below mimics an external plugin that operates on the contained sdata object(s)
current_channel = interactive._viewer.dims.current_step[0] # Gets the user selected channel (using the var widget)
selected_layer = viewer.layers.selection.active # Gets the user selected image layer
ext_ref_sdata = selected_layer.metadata["sdata"]  # Gets the sdata metadata from the image layer

# Some skimage funcs aren't compatible with DataArrays, so these need to be passed the base numpy/dask arrays
working_image_scale0 = ext_ref_sdata.images["test_image"]["scale0"].isel(c=current_channel).image.data
working_image = working_image_scale0 - 255 # example image op 1
working_image_binary = working_image < 50 # example image op 2
label_name = "test_label"
labels = Labels2DModel.parse(working_image_binary, dims=("y", "x"))

# 'in-place' write operations
if ext_ref_sdata.is_backed():  # is_backed is a method; without the call the check is always truthy
    # If same label_name exists, 'overwrite' it by deleting the object from disk if it exists
    if label_name in ext_ref_sdata.labels:
        del ext_ref_sdata.labels[label_name]
        ext_ref_sdata.delete_element_from_disk(label_name)
    ext_ref_sdata.labels[label_name] = labels
    ext_ref_sdata.write_element(label_name)

# Refresh widgets to show updated sdata with new elements.
# NOTE: comment out the line below to see that the viewer is then not updated
sd_widget.elements_widget._onItemChange(selected_layer.metadata["_current_cs"])

# Cleanup
temp_dir.cleanup()

LucaMarconato commented Sep 8, 2024

Thanks for the code example. I opened an issue to track the public API for refreshing the elements list widget: #312.


aeisenbarth commented Sep 10, 2024

While the napari-spatialdata GUI is not notified about changes to the elements list and does not refresh the widget, a similar problem also occurs with SpatialData itself. The in-memory datastructure of SpatialData does not observe external changes on disk, and thus becomes outdated.

In this Napari use case, napari-spatialdata holds a SpatialData datastructure in memory in the layer's metadata["sdata"]. After you write your image to disk, sdata's images dictionary contains the same keys as before because it is not backed on disk, and misses the new image key.

In contrast to that, spatial elements that are dask arrays (an image itself) can be backed on disk and reflect changes on disk when the next read happens, since their image data is not fully held in memory. In Napari you may have to zoom in/out to refresh the loaded chunks in the view.

A similar use case is parallel processing on elements of the same SpatialData store. A process sees the dictionary (and contained elements) as it was initially, but doesn't see changes added by other processes that finished earlier. This becomes a problem when the process writes changes to disk that overwrite changes from other processes. Typically, elements are separate and can be written without conflict, but metadata .zgroup files and tables do cause access conflicts. My (hacky) work-around to refresh elements in the same SpatialData object can be found in the _refresh_spatialdata function here: scverse/spatialdata#412
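
The overwrite hazard described above can be reproduced without SpatialData at all. Below is a toy sketch using only the Python standard library: a JSON file stands in for shared store metadata, and two sequential "workers" stand in for parallel processes. The file name `metadata.json` and the helpers `read_meta` and `write_meta` are made up for illustration and are not part of SpatialData or Zarr.

```python
import json
import tempfile
from pathlib import Path


def read_meta(path: Path) -> dict:
    """Load the shared metadata file into a plain in-memory dict."""
    return json.loads(path.read_text())


def write_meta(path: Path, meta: dict) -> None:
    """Overwrite the shared metadata file with the given dict."""
    path.write_text(json.dumps(meta))


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-in for metadata shared by all elements of a store.
    meta_path = Path(tmp) / "metadata.json"
    write_meta(meta_path, {"elements": ["test_image"]})

    # Both workers take their snapshot before either one writes.
    snapshot_a = read_meta(meta_path)
    snapshot_b = read_meta(meta_path)

    # Worker A finishes first and records its new element.
    snapshot_a["elements"].append("labels_a")
    write_meta(meta_path, snapshot_a)

    # Worker B writes from its stale snapshot and silently drops "labels_a".
    snapshot_b["elements"].append("labels_b")
    write_meta(meta_path, snapshot_b)
    lost_update = read_meta(meta_path)["elements"]

    # Same scenario, but each worker re-reads right before writing.
    write_meta(meta_path, {"elements": ["test_image"]})
    for new_element in ("labels_a", "labels_b"):
        fresh = read_meta(meta_path)
        fresh["elements"].append(new_element)
        write_meta(meta_path, fresh)
    merged = read_meta(meta_path)["elements"]

print(lost_update)
print(merged)
```

Re-reading before the write only helps here because the toy workers run one after the other. Truly concurrent writers would still need some form of locking or per-element files, which this sketch does not attempt.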

Possible solutions:

  • SpatialData.refresh method: This is the simplest and clearest, and must be invoked explicitly when you think you need to refresh.
  • disk-backed dictionary: I don't know of any implementation, and it may have a huge performance impact.
  • file observers: These are OS-specific and may have limitations (e.g. Linux has inotify and fanotify, and the older one has a max number of files it can observe). Most of all, observing files for changes does not work for network access (S3…) which is an essential part of Zarr.
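
To make the first option concrete, here is a minimal sketch of what an explicit, caller-driven refresh looks like in general. `ToyStore` is a hypothetical stand-in written with the standard library only; it is not the SpatialData class and says nothing about how a real `SpatialData.refresh` would be implemented. Directories play the role of elements in a store.

```python
import tempfile
from pathlib import Path


class ToyStore:
    """Toy stand-in for an object that caches the element names of an on-disk store."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.elements = set()
        self.refresh()  # take the initial snapshot

    def refresh(self) -> None:
        """Re-scan the store directory and replace the cached element names."""
        self.elements = {p.name for p in self.root.iterdir() if p.is_dir()}


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "test_image").mkdir()

    store = ToyStore(root)         # snapshot taken here
    (root / "test_label").mkdir()  # an external writer adds an element

    stale = set(store.elements)    # the cache has not noticed the new element
    store.refresh()                # explicit refresh requested by the caller
    fresh = set(store.elements)

print(sorted(stale))
print(sorted(fresh))
```

The trade-off is the one named in the list: nothing is observed automatically, so the cached view stays outdated until the caller decides to call `refresh()`.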

@LucaMarconato

Thanks for the comment @aeisenbarth. From what I understand, refreshing the SpatialData object would not be needed here, because the line ext_ref_sdata.labels[label_name] = labels adds the labels to the original SpatialData object, so I would move the discussion to a separate issue. Does that sound good to you?

Meanwhile, a quick answer: SpatialData.refresh() sounds like a good approach to me, but I think it would be identical to SpatialData.read_zarr(). Or do you have a different behavior in mind? (Please let's continue the discussion in a separate issue or in a spatialdata-dev thread on Zulip.)
