
Streamline observations and define terms and scope #14

Open
Tracked by #30
nkakouros opened this issue Mar 25, 2024 · 4 comments

Comments

@nkakouros
Collaborator

After each simulation step, the first item returned is the observations. A single observation currently contains the following keys:

dict_keys(['is_observable', 'observed_state', 'remaining_ttc', 'asset_type', 'asset_id', 'step_name', 'edges'])
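Concretely, a single observation might look like the following sketch. The value types and encodings are assumptions for illustration, not the actual simulator output:

```python
# Hypothetical example of a single per-node observation with the keys listed
# above; concrete values and type encodings are assumptions for illustration.
observation = {
    "is_observable": 1,          # whether this node can appear in observations
    "observed_state": 0,         # observed activation state of the node
    "remaining_ttc": 3,          # time-to-compromise countdown (simulation-internal)
    "asset_type": "Host",        # MAL asset type of the node's asset
    "asset_id": 42,              # id of the asset instance in the instance model
    "step_name": "access",       # (unqualified) attack step name
    "edges": [[0, 1], [1, 2]],   # edge indices of the attack graph
}

assert set(observation) == {
    "is_observable", "observed_state", "remaining_ttc",
    "asset_type", "asset_id", "step_name", "edges",
}
```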

I suggest we do the following:

  • Consider renaming some of the last 4 keys.

    • First, we could use edge_indices instead of edges to make it clearer what the key contains.
    • I also find step_name confusing: it can refer to either the qualified or the unqualified step name. I think we should always make the qualified/unqualified distinction explicit when talking about such a name.
    • I think the unqualified name makes sense to us because we think in a MAL way (of asset types having steps). When we look at the attack graph and the instance model, both assets and steps change meaning a bit: assets are instances of MAL assets, and steps are the nodes in the attack graph. And maybe this is how other people who work with attack graphs, but not with MAL, talk about these things. This is a broader discussion about standardizing names and what a term should unequivocally mean, so I will share some examples and thoughts. For example, we could talk about asset types for the MAL asset entities and assets or asset instances for the objects in the instance model; and attack steps for the steps in MAL specs and attack nodes or step nodes for the nodes of the attack graph.
  • Remove remaining_ttc. It does not make sense to disclose this to the agents; it is a simulation-internal concern ("when should the state of an attack node change?").

  • Move the functionality that generates the asset_type, asset_id, and step_name lists to the AttackGraph in the MAL toolbox. These lists are useful for any use of the attack graph to train models, not just for RL. They are also "about" the attack graph, so both conceptually and for discoverability they belong there.

  • Cache these lists. Since we do not do DynaMAL, we could have these lists be properties of the attack graph that are set during graph generation. If in the future we want dynamic behavior and need to regenerate these lists on the fly, we can easily do that in Python with @property.

  • Document why stuff like asset_type, asset_id, etc. are passed to the agents through the observations and not e.g. at agent creation (via their __init__ method). It seems passing this info through the observations is common practice and makes re-using 3rd-party agents easier. There are workarounds, but maybe we should be conformant with the common practice.

  • Decide whether we want to be MAL specific or not. Following from the above, should we try to cater to projects/people that do not use MAL? I think not. For instance, if our agents take the graph structure info at creation time while 3rd-party ones do not, it's super fine to do that if we have a good reason (e.g. performance). We base the simulation on MAL specs and instance models anyway.
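The "move the lists to the AttackGraph and cache them" idea above could be sketched as follows. Class and attribute names are hypothetical and not the actual mal-toolbox API:

```python
# Hypothetical sketch: expose the per-node lists on the attack graph itself,
# built once at generation time since the graph is static. Names are
# assumptions, not the actual mal-toolbox API.
class AttackGraph:
    def __init__(self, nodes):
        self.nodes = nodes
        # Built once during graph generation.
        self._asset_types = [n["asset_type"] for n in nodes]
        self._asset_ids = [n["asset_id"] for n in nodes]
        self._step_names = [n["step_name"] for n in nodes]

    @property
    def asset_types(self):
        # If the graph ever becomes dynamic, this property can recompute
        # the list instead of returning the cached one.
        return self._asset_types

    @property
    def asset_ids(self):
        return self._asset_ids

    @property
    def step_names(self):
        return self._step_names


graph = AttackGraph([
    {"asset_type": "Host", "asset_id": 1, "step_name": "access"},
    {"asset_type": "Host", "asset_id": 1, "step_name": "authenticate"},
])
assert graph.step_names == ["access", "authenticate"]
```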

@kasanari
Collaborator

  • I agree with renaming edges to either edge_index or edge_indices. It is more in line with how PyTorch Geometric names it, and it also allows us to add an edge_attr field for when we wish to have features for edges.
  • Remaining TTC could be removed for now, but one might argue that it is relevant to an attacker agent in a fully observable scenario. I used to have a search-based attacker that used the remaining TTC as a path cost, for example.
  • Regarding static information in the obs: to me this is only an issue if memory usage becomes too high. I agree that caching heavy calculations on the backend would be good to save CPU cycles, but adding the asset types to the obs every iteration is not a costly operation from that perspective. Not including it only makes the system less 'pure' from an MDP or programming perspective, because then not all information used to make decisions (and advance the system to the next state) is included in the state/observation representation.
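The PyTorch Geometric naming convention referred to above can be illustrated with plain NumPy arrays. The shapes follow the PyG convention; the values are made up:

```python
import numpy as np

# Sketch of the PyTorch Geometric convention: 'edge_index' is a
# 2 x num_edges array of source/target node indices, and 'edge_attr'
# holds one feature vector per edge. Values are illustrative.
edge_index = np.array([
    [0, 1, 1],   # source node indices
    [1, 2, 3],   # target node indices
])
edge_attr = np.zeros((edge_index.shape[1], 4))  # one 4-dim feature per edge

assert edge_index.shape == (2, 3)
assert edge_attr.shape[0] == edge_index.shape[1]
```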

@kasanari
Collaborator

For example: I am considering doing some offline RL. Then I would save a 'dataset' of observations that can then be used to train in a supervised manner. If the observations contain all the required info, I can just save a list of observations to a text file and train on it without even importing mal-toolbox.
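That offline-RL workflow could look like the following sketch, where self-contained observations are dumped to a JSON-lines file and reloaded without any mal-toolbox dependency. The observation contents are illustrative:

```python
import json

# Hedged sketch of the offline-RL idea: if each observation is
# self-contained, a trajectory can be written to a JSON-lines file and
# later reloaded for supervised training without importing mal-toolbox.
# Observation contents are illustrative.
observations = [
    {"asset_type": "Host", "asset_id": 1, "step_name": "access",
     "observed_state": 0, "edges": [[0, 1]]},
    {"asset_type": "Host", "asset_id": 1, "step_name": "access",
     "observed_state": 1, "edges": [[0, 1]]},
]

with open("trajectory.jsonl", "w") as f:
    for obs in observations:
        f.write(json.dumps(obs) + "\n")

with open("trajectory.jsonl") as f:
    loaded = [json.loads(line) for line in f]

assert loaded == observations  # lossless round-trip
```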

@andrewbwm
Collaborator

andrewbwm commented Mar 25, 2024

I categorically agree with everything you've said; I just need clarification on one point.

* Cache these lists. Since we do not do DynaMAL, we could have these lists be properties of the attack graph that are set during graph generation. If in the future we want dynamic stuff and we need to dynamically generate these lists again, we can easily do it in python with `@property`.

What do you mean by this? Just adding the cache hint like we have for the observation space? Because the list is not regenerated, but I think you know this already.

@nkakouros
Collaborator Author

What do you mean by this? Just adding the cache hint like we have for the observation space? Because the list is not regenerated, but I think you know this already.

I am not sure I remember. But given that these lists are already cached, moving them to the attack graph is still the relevant part of that comment of mine.
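For reference, the "cache hint" idea being discussed can be expressed in Python with functools.cached_property, which computes a value on first access and stores it so the list is not regenerated afterwards. The class and data are hypothetical:

```python
from functools import cached_property

# Hypothetical illustration of a cache hint: cached_property computes the
# list once on first access and stores it on the instance thereafter.
class AttackGraph:
    def __init__(self, nodes):
        self.nodes = nodes

    @cached_property
    def step_names(self):
        return [n["step_name"] for n in self.nodes]


g = AttackGraph([{"step_name": "access"}, {"step_name": "authenticate"}])
first = g.step_names
assert g.step_names is first  # computed once, then cached
```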
