Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cv_spatial - double intersection #40

Open
bfakos opened this issue Oct 10, 2023 · 3 comments
Open

cv_spatial - double intersection #40

bfakos opened this issue Oct 10, 2023 · 3 comments

Comments

@bfakos
Copy link

bfakos commented Oct 10, 2023

Dear Roozbeh,
I tried to run cv_spatial(x), where nrow(x) == 37000000, and it does not finish within a day, even if iteration == 1 or selection == "systematic". I wanted to understand the source code, find the bottleneck, and modify the code in a way it run parallel in multiple cores, when I found this issue. The code of cv_spatial() calls sf::st_intersects() twice, even though st_intersects() is a really time-consuming function (and can be easily parallelized).

L254:  sub_blocks <- blocks[x, ]

Here, you run "[.sf", which call st_intersects() and create a logical mask from the results (please refer to L325 in sf.R in package "sf"), but drop the results of the predicate function

L277: sf::st_intersects(sf::st_geometry(x), sf::st_geometry(sub_blocks))

You call again the time-consuming st_intersects(), with almost the same inputs.
I recommend calling st_intersects() once, then calculating the logical mask, and finally subsetting the blocks. E.g. something like this:

blocks_intersection <- sf::st_intersects(sf::st_geometry(x), sf::st_geometry(sub_blocks))
mask <- lengths(blocks_intersection) != 0 
sub_blocks <- blocks[mask, ]
blocks_len <- nrow(sub_blocks)
blocks_df <- as.data.frame(blocks_intersection)

Also I suggest parallelizing st_intersects() that could make cv_spatial() much faster. Please let me know if I could collaborate in this development.
Have a nice week,
Ákos

@bfakos
Copy link
Author

bfakos commented Oct 10, 2023

Sorry, there were some mistakes, now corrected:

blocks_intersection <- sf::st_intersects(sf::st_geometry(x), sf::st_geometry(blocks))
mask <- lengths(blocks_intersection) != 0 
sub_blocks <- blocks[mask, ]
blocks_len <- nrow(sub_blocks)
blocks_df <- as.data.frame(blocks_intersection[mask])

@rvalavi
Copy link
Owner

rvalavi commented Oct 11, 2023

Thanks for the report @bfakos
I'll check it and get back to you soon

@bfakos
Copy link
Author

bfakos commented Oct 11, 2023

In the meanwhile, I realized that the two st_intersects() has different direction (i.e., x and y are swapped), but sgbp objects can be transposed by t(), which is really fast (implemented in C).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants