Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add PreparedGeometry to speed up repeated Relate operations. #1197

Merged
merged 4 commits into from
Jul 26, 2024

Conversation

michaelkirk
Copy link
Member

@michaelkirk michaelkirk commented Jul 4, 2024

  • I agree to follow the project's code of conduct.
  • I added an entry to CHANGES.md if knowledge of this change could be valuable to users.

Fixes #803

Much of the cost of the Relate operation comes from finding all
intersections between all segments of the two geometries.

To speed this up, the edges of each geometry are put into an R-Tree.

We also have to "self node" each geometry, meaning we need to split any
segments at the point that they'd intersect with themselves.

None of that work is new to this commit, but what is new is that we now
cache the self-noding work and the geometry's r-tree.

relate prepared polygons
                        time:   [49.036 ms 49.199 ms 49.373 ms]
relate unprepared polygons
                        time:   [842.91 ms 844.32 ms 845.82 ms]

@gauteh
Copy link

gauteh commented Jul 7, 2024

This looks very promising. I'm trying this out in: https://github.com/gauteh/roaring-landmask/blob/georust-prepared/src/shapes.rs#L22 , but I'm not able to share the PreparedGeometry'ies between threads (through pyclass). I've also put the Polygons of the MultiPolygon in an RTree (not sure if that is necessary anymore..?).

@urschrei
Copy link
Member

urschrei commented Jul 7, 2024

I've also put the Polygons of the MultiPolygon in an RTree (not sure if that is necessary anymore..?).

I don't think you need to do this…

Much of the cost of the `Relate` operation comes from finding all
intersections between all segments of the two geometries.

To speed this up, the edges of each geometry are put into an R-Tree.

We also have to "self node" each geometry, meaning we need to split any
segments at the point that they'd intersect with an edge from the other
geometry.

None of that work is new to this commit, but what is new is that we now
cache the self-noding work and the geometry's r-tree.

    relate prepared polygons
                            time:   [49.036 ms 49.199 ms 49.373 ms]
    relate unprepared polygons
                            time:   [842.91 ms 844.32 ms 845.82 ms]
@gauteh
Copy link

gauteh commented Jul 8, 2024

I've tried to change all the spots with RefCell (which can't just be put in an Arc). Eventually that will require a mut GeometryGraph which will prevent parallelism anyway. So far I've only found swap_labels that requires a mutable GeometryGraphy. Maybe the best solution is add another Label (https://github.com/georust/geo/pull/1197/files#diff-b64c206f2b566eef304be445dd47e93d22b59dd11b986e715b8152a367e862b1L22) for the other direction? Or some method on labels that swap them/reverse them only for the operation, but not changing the object. I assume the latter is not so good since they are already cached for a reason.

@michaelkirk
Copy link
Member Author

PreparedGeometry is not Send because GeometryGraph is not send. It seems like something worth addressing, but arguably out of scope for this PR.

@michaelkirk
Copy link
Member Author

I've tried to change all the spots with RefCell (which can't just be put in an Arc). Eventually that will require a mut GeometryGraph which will prevent parallelism anyway. So far I've only found swap_labels that requires a mutable GeometryGraphy. Maybe the best solution is add another Label (https://github.com/georust/geo/pull/1197/files#diff-b64c206f2b566eef304be445dd47e93d22b59dd11b986e715b8152a367e862b1L22) for the other direction? Or some method on labels that swap them/reverse them only for the operation, but not changing the object. I assume the latter is not so good since they are already cached for a reason.

Thanks for taking a look at this.

swap_labels only happens during construction - so there shouldn't be an issue here with mutating anything since no other thread could have a reference to the mutable thing we've just constructed. Right?

    pub fn clone_for_arg_index(&self, from_arg_index: usize, to_arg_index: usize) -> Self {
        let mut graph = Self {
            nodes: self.nodes.clone(),
            // deep copy edges
            edges: self
                .edges
                .iter()
                .map(|e| Rc::new(RefCell::new(e.borrow().clone())))
                .collect(),
        };
        assert_eq!(from_arg_index, 0);
        if from_arg_index != to_arg_index {
            graph.swap_labels(); // <-- mutates the graph, but nothing else could have a handle on `graph` yet.
        }
        graph
    }

@gauteh
Copy link

gauteh commented Jul 8, 2024

I am totally unfamiliar with the code, however: From the docs it seems that swap_labels is done depending on whether you do a.relate(b) or b.relate(a), which is not known in advance?

@gauteh
Copy link

gauteh commented Jul 8, 2024

On the other hand, it is weird that this works before this PR? Maybe it is not related.

@michaelkirk
Copy link
Member Author

michaelkirk commented Jul 8, 2024

From the docs it seems that swap_labels is done depending on whether you do a.relate(b) or b.relate(a), which is not known in advance?

Correct, but swap_labels is called on the cloned GeometryGraph owned by the context that calls swap_labels, so that should be fine. I think the issue is only that GeometryGraph has some internal Rc. This was not relevant before, because GeometryGraph was only created as a transient entity.

The whole point of PreparedGeometry is to keep that entity around for re-use. With this PR it works in a single threaded context. I'd propose that we leave that as the bar for now and follow up with "share a prepared geometry across threads" as future work since this PR is already pretty large.

Plus, that way we can have a focused study of the performance tradeoffs needed (e.g. Rc vs Arc).

Copy link
Member Author

@michaelkirk michaelkirk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whoops - meant to leave these comments to aid review when I first opened.

nodes: NodeMap::new(),
isolated_edges: vec![],
line_intersector: RobustLineIntersector::new(),
}
}

pub(crate) fn compute_intersection_matrix(&mut self) -> IntersectionMatrix {
let mut intersection_matrix = IntersectionMatrix::empty();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved this into the empty_disjoint constructor.

@@ -293,44 +285,6 @@ where
}
}

/// If the Geometries are disjoint, we need to enter their dimension and boundary dimension in
/// the `Outside` rows in the IM
fn compute_disjoint_intersection_matrix(&self, intersection_matrix: &mut IntersectionMatrix) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved this to InterseciontMatrix::compute_disjoint method.

};
}

macro_rules! cartesian_pairs_helper {
Copy link
Member Author

@michaelkirk michaelkirk Jul 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got rid of this awful sophisticated 🧐 macro and now the relate impl is generic - anything that implements Relate can now relate to anything else that implements Relate.

}
}

struct Segment<'a, F: GeoFloat + rstar::RTreeNum> {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved Segment to its own file.

@gauteh
Copy link

gauteh commented Jul 8, 2024

The whole point of PreparedGeometry is to keep that entity around for re-use. With this PR it works in a single threaded context. I'd propose that we leave that as the bar for now and follow up with "share a prepared geometry across threads" as future work since this PR is already pretty large.

Plus, that way we can have a focused study of the performance tradeoffs needed (e.g. Rc vs Arc).

Absolutely! I've been chipping away at moving to just &mut, don't think we need the Arc. There are a couple of cases where it's not possible to exactly replicate the behavior, but I suspect those cases may be not totally correct (or at least redundant). I'll put it in a separate PR for reference (whether it is usable or not).

@michaelkirk michaelkirk added this pull request to the merge queue Jul 26, 2024
Merged via the queue into main with commit 8eb091f Jul 26, 2024
15 checks passed
@michaelkirk michaelkirk deleted the mkirk/prepared-geom branch July 26, 2024 18:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Prepared Geometries
3 participants