read_* functions: make limit parameter accept regex pattern or slice
schlegelp committed Sep 18, 2024
1 parent 14218c1 commit b187de2
Showing 7 changed files with 114 additions and 48 deletions.
13 changes: 8 additions & 5 deletions docs/changelog.md
@@ -36,17 +36,17 @@ more consistent and easier to use.
- New function: [`navis.graph.skeleton_adjacency_matrix`][] computes the node adjacency for skeletons
- New function: [`navis.graph.simplify_graph`][] simplifies skeleton graphs to only root, branch and leaf nodes while preserving branch length (i.e. weights)
- New [`NeuronList`][navis.NeuronList] method: [`get_neuron_attributes`][navis.NeuronList.get_neuron_attributes] is analogous to `dict.get`
- [`NeuronLists`][navis.NeuronList] now implemented the `|` (`__or__`) operator which can be used to get the union of two [`NeuronLists`][navis.NeuronList]
- [`NeuronLists`][navis.NeuronList] now implement the `|` (`__or__`) operator which can be used to get the union of two [`NeuronLists`][navis.NeuronList]
- [`navis.Volume`][] now has an (optional) `.units` property similar to neurons

##### Improvements
- Plotting:
- [`navis.plot3d`][]:
- `legendgroup` parameter (plotly backend) now also sets the legend group's title
- new parameters for the plotly backend:
- `legend` (default `True`): determines whether the legend is shown
- `legend_orientation` (default `v`): determines whether the legend is arranged vertically (`v`) or horizontally (`h`)
- `linestyle` (default `-`): determines line style for skeletons
- default for `radius` is now `"auto"`
- [`navis.plot2d`][]:
- the `view` parameter now also works with `methods` `3d` and `3d_complex`
@@ -55,13 +55,16 @@ more consistent and easier to use.
- new parameters for methods `3d` and `3d_complex`: `mesh_shade=False` and `non_view_axes3d`
- the `scalebar` parameter can now be a dictionary used to style (color, width, etc) the scalebar
- the `connectors` parameter can now be used to show specific connector types (e.g. `connectors="pre"`)
- I/O:
- `read_*` functions are now able to read from FTP servers (`ftp://...`)
- the `limit` parameter used in many `read_*` functions can now also be a regex pattern or a `slice` (see the sketch after this list)
- General improvements to docs and tutorials
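
For example, the new `limit` forms can be used with any of the `read_*` functions (a quick sketch; `skeletons.zip` stands in for any local folder or archive):

>>> import navis
>>> nl = navis.read_swc('skeletons.zip', limit=10)              # first 10 files
>>> nl = navis.read_swc('skeletons.zip', limit=slice(10, 20))   # files 10 through 19
>>> nl = navis.read_swc('skeletons.zip', limit=r'.*_R.*')       # only filenames matching the pattern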

##### Fixes
- Memory usage of `Neuron/Lists` is now correctly re-calculated when the neuron is modified
- Various fixes and improvements for the MICrONS interface (`navis.interfaces.microns`)
- [`navis.graph.node_label_sorting`][] now correctly prioritizes total branch length
- [`navis.TreeNeuron.simple][] now correctly drops soma nodes if they aren't root, branch or leaf points themselves
- [`navis.TreeNeuron.simple`][] now correctly drops soma nodes if they aren't root, branch or leaf points themselves

## Version `1.7.0` { data-toc-label="1.7.0" }
_Date: 25/07/24_
55 changes: 41 additions & 14 deletions navis/io/base.py
@@ -48,6 +48,9 @@

DEFAULT_INCLUDE_SUBDIRS = False

# Regular expression to figure out if a string is a regex pattern
rgx = re.compile(r'[\\\.\?\[\]\+\^\$\*]')
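
For illustration, this heuristic treats strings without regex metacharacters as plain substrings (a sketch of the intended behaviour):

>>> bool(rgx.search('722817260'))   # no metacharacters -> plain substring match downstream
False
>>> bool(rgx.search(r'.*_R.*'))     # contains '.' and '*' -> treated as a regex pattern
True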


def merge_dicts(*dicts: Optional[Dict], **kwargs) -> Dict:
"""Merge dicts and kwargs left to right.
@@ -541,7 +544,7 @@ def read_ftp(
) -> "core.NeuronList":
"""Read files from an FTP server.
This is a dispatcher for `.read_from_tar`.
This is a dispatcher for `.read_from_ftp`.
Parameters
----------
@@ -613,6 +616,8 @@ def read_from_ftp(
core.NeuronList
"""
# When reading in parallel, we expect there to be a global FTP connection
# that was initialized once for each worker process.
if ftp == "GLOBAL":
if "_FTP" not in globals():
raise ValueError("No global FTP connection found.")
@@ -668,8 +673,18 @@ def read_directory(
"""
files = list(self.files_in_dir(Path(path), include_subdirs))

if limit:
if isinstance(limit, int):
files = files[:limit]
elif isinstance(limit, list):
files = [f for f in files if f in limit]
elif isinstance(limit, slice):
files = files[limit]
elif isinstance(limit, str):
# Check if limit is a regex
if rgx.search(limit):
files = [f for f in files if re.search(limit, str(f.name))]
else:
files = [f for f in files if limit in str(f)]

read_fn = partial(self.read_file_path, attrs=attrs)
neurons = parallel_read(read_fn, files, parallel)
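
Pulled out of the reader, the dispatch above boils down to something like the following self-contained sketch (the helper name `apply_limit` is illustrative, not part of navis):

from pathlib import Path
import re

rgx = re.compile(r'[\\\.\?\[\]\+\^\$\*]')  # same heuristic as above

def apply_limit(files, limit):
    """Restrict a list of file paths according to `limit` (int, list, slice or str)."""
    if isinstance(limit, int):
        return files[:limit]
    elif isinstance(limit, list):
        return [f for f in files if f in limit]
    elif isinstance(limit, slice):
        return files[limit]
    elif isinstance(limit, str):
        if rgx.search(limit):  # looks like a regex -> match against the file name
            return [f for f in files if re.search(limit, str(f.name))]
        return [f for f in files if limit in str(f)]  # plain substring match
    return files

files = [Path(f"neuron_{i:03}_L.swc") for i in range(5)] + [Path("neuron_005_R.swc")]
apply_limit(files, r".*_R.*")    # -> [Path('neuron_005_R.swc')]
apply_limit(files, slice(1, 3))  # -> second and third file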
Expand Down Expand Up @@ -1123,6 +1138,14 @@ def parallel_read_archive(

if isinstance(limit, list):
to_read = [f for f in to_read if f in limit]
elif isinstance(limit, slice):
to_read = to_read[limit]
elif isinstance(limit, str):
# Check if limit is a regex
if rgx.search(limit):
to_read = [f for f in to_read if re.search(limit, f)]
else:
to_read = [f for f in to_read if limit in f]

prog = partial(
config.tqdm,
@@ -1159,7 +1182,6 @@ def parallel_read_ftp(
file_ext,
limit=None,
parallel="auto",
ignore_hidden=True,
) -> List["core.NeuronList"]:
"""Read neurons from an FTP server, potentially in parallel.
@@ -1185,13 +1207,6 @@
parallel : str | bool | int
"auto" or True for n_cores // 2, otherwise int for number of
jobs, or false for serial.
ignore_hidden : bool
Archives zipped on OSX can end up containing a
`__MACOSX` folder with files that mirror the name of other
files. For example if there is a `123456.swc` in the archive
you might also find a `__MACOSX/._123456.swc`. Reading the
latter will result in an error. If ignore_hidden=True
we will simply ignore all file that starts with "._".
Returns
-------
@@ -1245,11 +1260,21 @@
elif file_ext and fname.endswith(file_ext):
to_read.append(file)

if isinstance(limit, int) and len(to_read) >= limit:
break

if isinstance(limit, list):
if isinstance(limit, int):
to_read = to_read[:limit]
elif isinstance(limit, list):
to_read = [f for f in to_read if f in limit]
elif isinstance(limit, slice):
to_read = to_read[limit]
elif isinstance(limit, str):
# Check if limit is a regex
if rgx.search(limit):
to_read = [f for f in to_read if re.search(limit, f)]
else:
to_read = [f for f in to_read if limit in f]

if not to_read:
return []

prog = partial(
config.tqdm,
@@ -1269,6 +1294,8 @@
else:
n_cores = int(parallel)

# We can't send the FTP object to the process (because its socket is not pickleable)
# Instead, we need to initialize a new FTP connection in each process via a global variable
with mp.Pool(
processes=n_cores, initializer=_ftp_pool_init, initargs=(server, port, path)
) as pool:
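
A minimal sketch of the initializer pattern referenced above, using the standard library `ftplib` (the connection details are placeholders and the body of `_ftp_pool_init` is illustrative, not navis' exact implementation):

from ftplib import FTP
import multiprocessing as mp

def _ftp_pool_init(server, port, path):
    # Runs once in every worker process: open a fresh connection and store it
    # as a process-global, because an open socket cannot be pickled and shipped
    # to the worker.
    global _FTP
    _FTP = FTP()
    _FTP.connect(server, port)
    _FTP.login()        # anonymous login
    _FTP.cwd(path)

# with mp.Pool(processes=4, initializer=_ftp_pool_init,
#              initargs=("ftp.example.org", 21, "/skeletons")) as pool:
#     ...  # each worker then reads via its own global _FTP connection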
17 changes: 12 additions & 5 deletions navis/io/mesh_io.py
@@ -62,11 +62,18 @@ def read_mesh(f: Union[str, Iterable],
Determines function's output. See Returns.
errors : "raise" | "log" | "ignore"
If "log" or "ignore", errors will not be raised.
limit : int, optional
If reading from a folder you can use this parameter to
read only the first `limit` files. Useful when
wanting to get a sample from a large library of
meshes.
limit : int | str | slice | list, optional
When reading from a folder or archive you can use this parameter to
restrict which files are read:
- if an integer, will read only the first `limit` files
(useful to get a sample from a large library of meshes)
- if a string, will interpret it as a filename (regex) pattern
and only read files that match the pattern; e.g. `limit='.*_R.*'`
will only read files that contain `_R` in their filename
- if a slice (e.g. `slice(10, 20)`) will read only the files in
that range
- a list is expected to be a list of filenames to read from
the folder/archive
**kwargs
Keyword arguments passed to [`navis.MeshNeuron`][]
or [`navis.Volume`][]. You can use this to e.g.
17 changes: 12 additions & 5 deletions navis/io/nmx_io.py
@@ -204,11 +204,18 @@ def read_nmx(f: Union[str, pd.DataFrame, Iterable],
Precision for data. Defaults to 32 bit integers/floats.
If `None` will let pandas infer data types - this
typically leads to higher than necessary precision.
limit : int, optional
If reading from a folder you can use this parameter to
read only the first `limit` NMX files. Useful if
wanting to get a sample from a large library of
skeletons.
limit : int | str | slice | list, optional
When reading from a folder or archive you can use this parameter to
restrict which files are read:
- if an integer, will read only the first `limit` NMX files
(useful to get a sample from a large library of skeletons)
- if a string, will interpret it as a filename (regex) pattern
and only read files that match the pattern; e.g. `limit='.*_R.*'`
will only read files that contain `_R` in their filename
- if a slice (e.g. `slice(10, 20)`) will read only the files in
that range
- a list is expected to be a list of filenames to read from
the folder/archive
**kwargs
Keyword arguments passed to the construction of
`navis.TreeNeuron`. You can use this to e.g. set
17 changes: 12 additions & 5 deletions navis/io/precomputed_io.py
@@ -252,11 +252,18 @@ def read_precomputed(f: Union[str, io.BytesIO],
- `False` = do not use/look for `info` file
- `str` = filepath to `info` file
- `dict` = already parsed info file
limit : int, optional
If reading from a folder you can use this parameter to
read only the first `limit` files. Useful if
wanting to get a sample from a large library of
skeletons/meshes.
limit : int | str | slice | list, optional
When reading from a folder or archive you can use this parameter to
restrict which files are read:
- if an integer, will read only the first `limit` files
(useful to get a sample from a large library of neurons)
- if a string, will interpret it as a filename (regex) pattern
and only read files that match the pattern; e.g. `limit='.*_R.*'`
will only read files that contain `_R` in their filename
- if a slice (e.g. `slice(10, 20)`) will read only the files in
that range
- a list is expected to be a list of filenames to read from
the folder/archive
parallel : "auto" | bool | int
Defaults to `auto` which means only use parallel
processing if more than 200 files are imported. Spawning
41 changes: 27 additions & 14 deletions navis/io/swc_io.py
@@ -273,14 +273,20 @@ def read_swc(f: Union[str, pd.DataFrame, Iterable],
Parameters
----------
f : str | pandas.DataFrame | iterable
Filename, folder, SWC string, URL or DataFrame.
If folder, will import all `.swc` files. If a
`.zip`, `.tar` or `.tar.gz` file will read all
SWC files in the file. See also `limit` parameter.
f : str | pandas.DataFrame | list thereof
Filename, folder, SWC string, URL or DataFrame:
- if folder, will import all `.swc` files
- if a `.zip`, `.tar` or `.tar.gz` archive, will read all
SWC files from the archive
- if a URL (http:// or https://), will download the
file and import it
- an FTP address (ftp://) can point to a folder or a single
file
- DataFrames are interpreted as SWC tables
See also the `limit` parameter to read only a subset of files.
connector_labels : dict, optional
If provided will extract connectors from SWC.
Dictionary must map type to label:
Dictionary must map types to labels:
`{'presynapse': 7, 'postsynapse': 8}`
include_subdirs : bool, optional
If True and `f` is a folder, will also search
@@ -293,7 +299,7 @@ def read_swc(f: Union[str, pd.DataFrame, Iterable],
and joining processes causes overhead and is
considerably slower for imports of small numbers of
neurons. Integer will be interpreted as the
number of cores (otherwise defaults to
number of processes to use (defaults to
`os.cpu_count() // 2`).
precision : int [8, 16, 32, 64] | None
Precision for data. Defaults to 32 bit integers/floats.
@@ -325,16 +331,23 @@
read_meta : bool
If True and SWC header contains a line with JSON-encoded
meta data e.g. (`# Meta: {'id': 123}`), these data
will be read as neuron properties. `fmt` takes
will be read as neuron properties. `fmt` still takes
precedence. Will try to assign meta data directly as
neuron attribute (e.g. `neuron.id`). Failing that
(can happen for properties intrinsic to `TreeNeurons`),
will add a `.meta` dictionary to the neuron.
limit : int, optional
If reading from a folder you can use this parameter to
read only the first `limit` SWC files. Useful if
wanting to get a sample from a large library of
skeletons.
limit : int | str | slice | list, optional
When reading from a folder or archive you can use this parameter to
restrict which files are read:
- if an integer, will read only the first `limit` SWC files
(useful to get a sample from a large library of skeletons)
- if a string, will interpret it as a filename (regex) pattern
and only read files that match the pattern; e.g. `limit='.*_R.*'`
will only read files that contain `_R` in their filename
- if a slice (e.g. `slice(10, 20)`) will read only the files in
that range
- a list is expected to be a list of filenames to read from
the folder/archive
**kwargs
Keyword arguments passed to the construction of
`navis.TreeNeuron`. You can use this to e.g. set
@@ -368,7 +381,7 @@ def read_swc(f: Union[str, pd.DataFrame, Iterable],
>>> s = navis.read_swc('skeletons.zip') # doctest: +SKIP
Sample first 100 SWC files in a zip archive:
Sample the first 100 SWC files in a zip archive:
>>> s = navis.read_swc('skeletons.zip', limit=100) # doctest: +SKIP
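
Read neurons straight from an FTP server (a sketch; the server address below is a placeholder):

>>> s = navis.read_swc('ftp://ftp.example.org/skeletons/', limit=10)  # doctest: +SKIP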
2 changes: 2 additions & 0 deletions navis/utils/misc.py
@@ -96,6 +96,8 @@ def is_url(x: str) -> bool:
False
>>> is_url('http://www.google.com')
True
>>> is_url("ftp://download.ft-server.org:8000")
True
"""
parsed = urllib.parse.urlparse(x)
