
Total number of cores and nodes assumes one full node per running job #4

Open
SamBoutin opened this issue Aug 8, 2023 · 3 comments

Comments

@SamBoutin

The current code appears to assume that each job uses a full node. This is not always the case, for example when using a Slurm job array with multiple elements running on the same node, and it can lead to a large overestimate of cluster usage.

I'm guessing an (at least partial) fix would be to only count distinct nodes in the `process_data` function.
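For illustration, a minimal sketch of the deduplication idea (the function name and the job-record fields here are assumptions, not the project's actual API):

```python
from collections import defaultdict

def count_distinct_nodes(jobs):
    """Count each node once per user, even when several jobs from the
    same user (e.g. elements of a job array) run on that node."""
    nodes_per_user = defaultdict(set)
    for job in jobs:
        # Each job is assumed to look like {"user": ..., "nodes": [...]}.
        nodes_per_user[job["user"]].update(job["nodes"])
    return {user: len(nodes) for user, nodes in nodes_per_user.items()}

# Two array elements of the same user share node "n1", so it counts once.
jobs = [
    {"user": "sam", "nodes": ["n1"]},
    {"user": "sam", "nodes": ["n1"]},
    {"user": "sam", "nodes": ["n2"]},
]
print(count_distinct_nodes(jobs))  # {'sam': 2}
```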

@basnijholt
Owner

Thanks for reporting!

I didn't consider this case.

I still think the right thing to do is to count all the cores on a node, because in principle they are reserved and no one else can use them. However, as you suggest, we shouldn't double-count nodes when multiple jobs of the same person run on one node. If multiple people run jobs on the same node it's more complicated (and perhaps we can ignore this).
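That accounting could look something like the sketch below (all names and record fields are hypothetical; the project's real data model may differ): every reserved node contributes all of its cores, but each node is counted at most once per user.

```python
def cores_in_use(jobs, cores_per_node):
    """Per-user core usage: each node a user occupies counts once,
    and contributes all of its cores (the node is reserved)."""
    nodes_per_user = {}
    for job in jobs:
        # Hypothetical job records: {"user": ..., "nodes": [...]}.
        nodes_per_user.setdefault(job["user"], set()).update(job["nodes"])
    return {user: len(nodes) * cores_per_node
            for user, nodes in nodes_per_user.items()}

# Two jobs of "sam" on node "n1" count that node's cores only once.
jobs = [
    {"user": "sam", "nodes": ["n1"]},
    {"user": "sam", "nodes": ["n1", "n2"]},
]
print(cores_in_use(jobs, cores_per_node=64))  # {'sam': 128}
```

The multi-user-per-node case would need a policy for splitting a node's cores between users, which this sketch deliberately ignores, matching the discussion above.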

@SamBoutin
Author

Yeah, that sounds like a good way to go. It seems fine to me to ignore the edge case of multiple users having jobs on the same node.

@basnijholt
Owner

basnijholt commented Aug 15, 2024

@SamBoutin, I just updated the way the cores are counted. It should now correctly count `--exclusive` and `--exclusive=user` jobs. #6

Could you try v2.0.0 and see whether it works for you?

cc @jbweston @aeantipov
