
Total number of cores and nodes assumes one full node per running job #4

Open
SamBoutin opened this issue Aug 8, 2023 · 3 comments

Comments

@SamBoutin

The current code appears to assume that each job uses a full node. This is not always the case, for example when using a Slurm job array with multiple elements running on the same node, and it can lead to a large overestimate of cluster usage.

I'm guessing an (at least partial) fix would be to only count distinct nodes in the `process_data` function.
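For illustration, a minimal sketch of the deduplication idea (the function name and the job-record fields here are assumptions, not the project's actual API):

```python
from collections import defaultdict

def count_distinct_nodes(jobs):
    """Count each node once per user, even when several jobs from the
    same user (e.g. elements of a job array) run on that node."""
    nodes_per_user = defaultdict(set)
    for job in jobs:
        # Each job is assumed to look like {"user": ..., "nodes": [...]}.
        nodes_per_user[job["user"]].update(job["nodes"])
    return {user: len(nodes) for user, nodes in nodes_per_user.items()}

# Two array elements of the same user share node "n1", so it counts once.
jobs = [
    {"user": "sam", "nodes": ["n1"]},
    {"user": "sam", "nodes": ["n1"]},
    {"user": "sam", "nodes": ["n2"]},
]
print(count_distinct_nodes(jobs))  # {'sam': 2}
```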

@basnijholt
Owner

Thanks for reporting!

I didn't consider this case.

I still think the right thing to do is to count all the cores on a node, because in principle they are reserved and no one else can use them. However, as you suggest, we shouldn't double-count nodes when multiple jobs of the same person run on one node. If multiple people run jobs on the same node it's more complicated (and perhaps we can ignore this).
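That accounting could look something like the sketch below (all names and record fields are hypothetical; the project's real data model may differ): every reserved node contributes all of its cores, but each node is counted at most once per user.

```python
def cores_in_use(jobs, cores_per_node):
    """Per-user core usage: each node a user occupies counts once,
    and contributes all of its cores (the node is reserved)."""
    nodes_per_user = {}
    for job in jobs:
        # Hypothetical job records: {"user": ..., "nodes": [...]}.
        nodes_per_user.setdefault(job["user"], set()).update(job["nodes"])
    return {user: len(nodes) * cores_per_node
            for user, nodes in nodes_per_user.items()}

# Two jobs of "sam" on node "n1" count that node's cores only once.
jobs = [
    {"user": "sam", "nodes": ["n1"]},
    {"user": "sam", "nodes": ["n1", "n2"]},
]
print(cores_in_use(jobs, cores_per_node=64))  # {'sam': 128}
```

The multi-user-per-node case would need a policy for splitting a node's cores between users, which this sketch deliberately ignores, matching the discussion above.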

@SamBoutin
Author

Yeah, that sounds like a good way to go. It seems fine to me to ignore the edge case of multiple users having jobs on the same node.

@basnijholt
Owner

basnijholt commented Aug 15, 2024

@SamBoutin, I just updated the way the cores are counted. It should now correctly count `--exclusive` and `--exclusive=user` jobs. #6

Could you try v2.0.0 and see whether it works for you?

cc @jbweston @aeantipov
