Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add mutex around step callback #69

Merged
merged 5 commits into from
Feb 25, 2024
Merged

feat: add mutex around step callback #69

merged 5 commits into from
Feb 25, 2024

Conversation

domire8
Copy link
Member

@domire8 domire8 commented Feb 14, 2024

Description

This PR solves the issue by adding the mutex as proposed in the parent issue.

Would it be sensible to add the same thing in the python component interface?

Review guidelines

Estimated Time of Review: 5 minutes

Checklist before merging:

  • Confirm that the relevant changelog(s) are up-to-date in case of any user-facing changes

@domire8 domire8 linked an issue Feb 14, 2024 that may be closed by this pull request
@bpapaspyros
Copy link
Member

@JasperTan97 this would in principle make the mutexes in aica-technology/dynamic-components#360 unnecessary (but shouldn't break your code if you keep them). For testing purpose perhaps you could test (if/when you have the time) and remove them in your component?

JasperTan97
JasperTan97 previously approved these changes Feb 14, 2024
Copy link

@JasperTan97 JasperTan97 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

@JasperTan97
Copy link

JasperTan97 commented Feb 14, 2024

@JasperTan97 this would in principle make the mutexes in https://github.com/aica-technology/dynamic-components/pull/360 unnecessary (but shouldn't break your code if you keep them). For testing purpose perhaps you could test (if/when you have the time) and remove them in your component?

@bpapaspyros : So I am wondering about this too, because detect_markers is technically not a step function, rather, it's being called from PoseDetectionComponent via here and here. (There's a little bit of back and forth with the image_callback and on_image_callback, but in the end, on_image_callback is overidden)

Since it is using create_subscription and not the step callback, I am not sure if the mutex also protects this component. I will keep looking at the code, but my understanding right now is that we still need the mutex

@domire8
Copy link
Member Author

domire8 commented Feb 14, 2024

Since it is using create_subscription and not the step callback, I am not sure if the mutex also protects this component. I will keep looking at the code, but my understanding right now is that we still need the mutex

Very good point, we didn't look at the marker component closely enough. I think you are right that it's still needed

@bpapaspyros
Copy link
Member

@JasperTan97 this would in principle make the mutexes in https://github.com/aica-technology/dynamic-components/pull/360 unnecessary (but shouldn't break your code if you keep them). For testing purpose perhaps you could test (if/when you have the time) and remove them in your component?

@bpapaspyros : So I am wondering about this too, because detect_markers is technically not a step function, rather, it's being called from PoseDetectionComponent via here and here. (There's a little bit of back and forth with the image_callback and on_image_callback, but in the end, on_image_callback is overidden)

Since it is using create_subscription and not the step callback, I am not sure if the mutex also protects this component. I will keep looking at the code, but my understanding right now is that we still need the mutex

ah, yes. You are probably right. Nice, thanks for the feedback

bpapaspyros
bpapaspyros previously approved these changes Feb 14, 2024
Copy link
Member

@eeberhard eeberhard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is already a nice addition in my opinion. It would be a reasonable extension to check and implement a similar thread safety mechanism for the python interface.

The other valid point is that this will not safeguard other callbacks in case of concurrent destruction of the node. It's worth noting that the step wall-timer and all other callbacks are in the default mutually exclusive callback group and should never run concurrently in the executor; it's just unexpected external destruction of the node that may cause a race condition. Perhaps exploring callback groups and finding a way to check the current callback queue in the destructor may be a more general safeguarding approach.

@domire8
Copy link
Member Author

domire8 commented Feb 14, 2024

Perhaps exploring callback groups and finding a way to check the current callback queue in the destructor may be a more general safeguarding approach.

Fair enough, how do you see that in terms of priorities then? Should we implement the mutex in Python as proposed above and focus on callback groups another time or discard this PR and focus on callback groups now? To me it seems that looking into callback groups will be a bigger time investment (especially if I do it)

@domire8 domire8 dismissed stale reviews from bpapaspyros and JasperTan97 via 5b1a420 February 16, 2024 10:53
Co-authored-by: Enrico Eberhard <32450951+eeberhard@users.noreply.github.com>
JasperTan97
JasperTan97 previously approved these changes Feb 16, 2024
Copy link

@JasperTan97 JasperTan97 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@eeberhard eeberhard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how do you see that in terms of priorities then? Should we implement the mutex in Python as proposed above and focus on callback groups another time?

I think I'd like to see it also in Python before merging, so that we have parity between the implementations. From what I remember, Python components aren't fully deleted when they are removed from the executor and are just cleaned up by the garbage collector at some point, which makes weird race conditions on undefined memory a bit less likely. Still, if we make an explicit point of safe-guarding step, it frees us up to be more aggressive when unmounting Python components in the Python component manager.

This change invariably adds a bit more thread safety at the cost of a few more CPU operations per step, so it would be good to pass through. Achieving this level with managed callback groups is potentially infeasible. If anything, I think any efforts into callback groups should be mostly focused on addressing priority inversion and keeping step in a high priority group, but that requires design thinking across the stack (i.e. to synchronize those callback groups and threads with the executors of the respective component managers)

@domire8 domire8 dismissed stale reviews from JasperTan97 and bpapaspyros via 19fc6b0 February 19, 2024 06:54
@domire8 domire8 merged commit 59b4b6e into main Feb 25, 2024
4 checks passed
@domire8 domire8 deleted the feat/lock-step branch February 25, 2024 20:11
@github-actions github-actions bot locked and limited conversation to collaborators Feb 25, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add mutex around step callback
4 participants