Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enforce assumptions in System Mode Manager initialisation #13

Open
chgio opened this issue Jan 24, 2024 · 2 comments
Open

Enforce assumptions in System Mode Manager initialisation #13

chgio opened this issue Jan 24, 2024 · 2 comments
Assignees
Labels
bug Something isn't working

Comments

@chgio
Copy link
Member

chgio commented Jan 24, 2024

#4 and #11 introduced a relatively simple bug in the System Mode Manager: initial subsystem states are only assumed, not enforced.

As this bug became apparent through a discrepancy with the payload camera, for cloudview-1 (Release) this was hot-fixed by adding manual mode override through Serial in guorbit/camera-relay (cloudview-1 Release): src/main.cpp, lines 57-80 @ 3731638 (comment).

This "hack" adopts the approach of ensuring the assumptions are correct, which did work but is suboptimal in terms of robustness, especially in unmanned initialisation scenarios. Instead -- or, better yet, in addition -- it would be auspicable to programmatically enforce the assumptions in the initial transition of the System Mode Manager.
My idea is to substitute the assignments in the initialisation line with calls to the same code that requests the TC-based "arbitrary" subsystem mode changes, but requesting the currently assumed initial mode instead.

It should be possible to achieve this elegantly enough through SDL Joins or SDL Procedures: internal refactorings should be strongly preferred, to minimise the architectural impact and memory footprint of these changes.
(TASTE SDL Tutorial)

We should also think about how to handle failures: the most basic approach would be to wrap this in a loop to try again until success, but Rule 2 of NASA/JPL's The Power of 10 prescribes that indefinite loops should be avoided. This could potentially be an exception, as operational functionalities cannot be safely reached unless initialisation is successful; otherwise, a definite loop could leave an increasing time between successive tries to hopefully allow for completion of whatever process is causing the failure.
A perhaps better approach could be to get rid of the assumptions entirely, assigning the actual state contained in the subsystem's response instead of the requested assumption.

@chgio
Copy link
Member Author

chgio commented Feb 6, 2024

Solid work on 3cb6756 to 0e6cd71 -- very elegant control flow injection 😄

As we'd discussed previously already, nice refactoring of the initial assumptions and initialisation of the report variables with the assumed data!
Still, we would be much better off following @ZonaDashti's suggestion and just querying subsystems about their actual initial modes to take it from there, which would get rid of the assumptions altogether.
I've set up #14 about that -- it's more work, and definitely beyond the scope of this initial bugfix, but probably the only and definitely the most elegant way to really fix this bug without having it come back to haunt us again! Plus, the extension to #15 would be useful in general for operations.

Nice implementation of the retry loop too! I'm only a little worried about the isolation of the initial state in the System Mode Manager's statechart view -- though analysing the transitions does reveal that it can only end up in Wait_change. I guess the statechart generator is getting a little caught up in the goto loops, so it might be worth pinging Maxime about this.
At any rate, the "unassuming mode readouts" of #14 should relieve this section of a lot of its criticality, as we can try automatically a few times and then in case of continued failure just put the burden of fixing this on the ground operator.

In the Mission Mode Change TC Handler, very interesting introduction of a separate Wait_initial state as opposed to the earlier idea of just initialising the component into Wait_report. Perhaps this could be used to accept both telecommands and report relay requests, to let a human into the loop to fix initialisation issues -- however, we should think about this carefully to avoid the risk of bricking anything instead!
Again, the "unassuming mode readouts" of #14 should remove the need for the dedicated mission_initial and system_mode_initial and keep our data model nice and clean.

Also, one thing that I forgot to mention: I noticed you only changed the test_subsystem_mode_change_validation implementation of the system_mode_manager -- I thought I remembered more code in that component!
This is an alternative implementation I made, and deliberately stripped of its logic to validate the requested subsystem mode changes, to test that its duplication distributed into each subsystem for redundancy is correct.
However, I realise this was a mistake on my part: multiple implementations are confusing and a smelly way to do this, as any change to the logic would need to be duplicated manually in both instances; instead, we should use the SDL Alternative (compilation option) symbol and do separate test runs with the alternative flag set and unset.
Apologies for the pain, please just move your changes to the default implementation and test that one; I'll try to take care of the Alternative unchecking branches (#16) -- unless, of course, you are particularly adamant about doing it yourself 😄

@chgio
Copy link
Member Author

chgio commented Feb 7, 2024

⬆️ @xszymonzur pls write down anything you do differently to the Capella model so I can backpropagate it ⬆️

(don't let me catch up ;)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants