
Support multiple GPUs on multiple nodes #9

Open
yuhc opened this issue Jun 8, 2020 · 0 comments
Labels
enhancement New feature or request


yuhc commented Jun 8, 2020

AvA already supports the single-node multi-GPU case, where a single process can access multiple GPUs on one GPU node.
The CUDA process must call cudaSetDevice explicitly at runtime to choose the in-use GPU, and this mechanism can be leveraged to support the multi-node multi-GPU case.

The basic idea is to run one worker per GPU (the workers may live on different GPU nodes). When the application calls cudaSetDevice, guestlib dynamically switches the worker address it targets, and all subsequent CUDA API calls are forwarded to that worker. This assumes there is no inter-GPU data transfer over channels such as NVLink.
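The switching step above can be sketched in guestlib as a routing table with one worker endpoint per GPU. This is a minimal illustration, not AvA's actual implementation: the `GuestLib` class, the `Worker` struct, and the endpoint strings are all hypothetical names, and the intercepted cudaSetDevice only flips the forwarding target rather than touching a local device.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical worker endpoint; each worker owns one GPU and may run on a
// different GPU node.
struct Worker {
    std::string address;  // e.g. "node0:4000"
};

// Sketch of the guestlib side: cudaSetDevice is intercepted and, instead of
// selecting a local GPU, it selects which worker later API calls go to.
class GuestLib {
public:
    explicit GuestLib(std::vector<Worker> workers)
        : workers_(std::move(workers)), current_(0) {}

    // Intercepted cudaSetDevice: switch the forwarding target.
    // Returns 0 (success) or -1 (invalid device), standing in for cudaError_t.
    int cudaSetDevice(int device) {
        if (device < 0 || device >= static_cast<int>(workers_.size()))
            return -1;
        current_ = device;
        return 0;
    }

    // Every subsequent CUDA API call is forwarded to this endpoint.
    const std::string& currentEndpoint() const {
        return workers_[current_].address;
    }

private:
    std::vector<Worker> workers_;
    int current_;
};
```

With two workers registered, `cudaSetDevice(1)` would make `currentEndpoint()` return the second worker's address, so the application's later CUDA calls are shipped to that node's GPU.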

An improvement would be to let a single worker manage multiple local GPUs: guestlib then switches the worker address and forwards cudaSetDevice with an adjusted (worker-local) GPU ID to that worker.
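The ID adjustment in that improvement amounts to mapping a global GPU ID to a (worker endpoint, local GPU ID) pair. The sketch below, again with hypothetical names (`GpuSlot`, `build_gpu_table`) and made-up endpoints, assumes a uniform number of GPUs per node purely for illustration.

```cpp
#include <string>
#include <vector>

// Hypothetical resolution target: which worker to contact, and which GPU
// index to pass to cudaSetDevice on that worker.
struct GpuSlot {
    std::string worker;  // worker endpoint, e.g. "node1:4000"
    int local_id;        // GPU index as seen by that worker
};

// Build the global-ID -> (worker, local-ID) table for `gpus_per_node` GPUs
// on each listed node. Global GPU k maps to entry table[k].
std::vector<GpuSlot> build_gpu_table(const std::vector<std::string>& nodes,
                                     int gpus_per_node) {
    std::vector<GpuSlot> table;
    for (const auto& node : nodes)
        for (int local = 0; local < gpus_per_node; ++local)
            table.push_back({node, local});
    return table;
}
```

On `cudaSetDevice(k)`, guestlib would look up `table[k]`, retarget forwarding to `table[k].worker`, and send `cudaSetDevice(table[k].local_id)` to it.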

@yuhc yuhc added the enhancement New feature or request label Jun 8, 2020