Dynamic PJRT plugin registration API #5644

Merged: 19 commits into master, Dec 18, 2023

Conversation

@will-cromar (Collaborator) commented on Sep 25, 2023

First pass at implementing a common API for device plugins. The eventual goal is to remove any cases where we have to hard-code the device type in our build, allowing truly dynamic plugins through the PJRT plugin API.

  • Add DevicePlugin API, including a sample implementation for TPU
  • Enable dynamic plugins with PJRT_DYNAMIC_PLUGINS=1 or plugins.use_dynamic_plugins
  • Add a registration mechanism for PJRT plugins. If you register a valid plugin with plugins.register_plugin and enable dynamic plugins, you can then select that device by name by setting PJRT_DEVICE (see the integration test in this PR for an example, and the sketch after this list).
    • Completely new devices will probably not work yet. We still rely too much on parsing hard-coded strings in the internals.
  • Default behavior doesn't change at all for now.
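
As a rough end-to-end sketch of that flow (illustrative only; it assumes the plugins module is importable as torch_xla.experimental.plugins, which may differ from the final layout in this PR):

# Hedged sketch of the registration flow described above.
import os

from torch_xla.experimental import plugins  # assumed module path
from torch_xla._internal import tpu

plugins.use_dynamic_plugins()      # enable dynamic plugins (or set PJRT_DYNAMIC_PLUGINS=1)
plugins.register_plugin('TPU', tpu.TpuPlugin())
os.environ['PJRT_DEVICE'] = 'TPU'  # devices are still selected by name

The same pattern is what an external package would use for its own device type once the hard-coded strings are cleaned up.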

Future work:

  • Remove or update XlaDeviceType so plugins don't have to register their device strings in this repository
  • Move GPU client into PJRT Plugin
  • Automatically register TPU and GPU plugins and remove hard-coded PJRT client initialization.

@will-cromar changed the title from "[WIP] Dynamic PJRT plugin API" to "[WIP] Dynamic PJRT plugin registration API" on Sep 25, 2023
@will-cromar (Collaborator, Author):

Heads up @jzhoulon @aws-kingrj, I'm working on a new way for external packages to register PJRT plugins with torch_xla. No action is required from you right now. I'll keep Neuron and XPU working within this repository while we develop the idea.

When this API is finalized, we can move plugin registration (something like TpuPlugin in this PR) into your respective packages.

@will-cromar changed the title from "[WIP] Dynamic PJRT plugin registration API" to "Dynamic PJRT plugin registration API" on Dec 14, 2023
@will-cromar (Collaborator, Author):

Leaving this as draft for now until I rebase after #5677, but this PR is largely ready for comments.

    assert len(xm.get_xla_supported_devices('TPU')) > 0

  def test_dynamic_plugin_api(self):
    with concurrent.futures.ProcessPoolExecutor(max_workers=1) as executor:
Collaborator:

I guess the difference between test_dynamic_plugin_api and test_spawn is that the former tests single-processing and the latter tests multi-processing?

@will-cromar (Collaborator, Author):

Yeah, I wrote test_dynamic_plugin_api before the other one. I'll change the name to something like test_single_process to be clearer.
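
For context, the contrast in question looks roughly like this (a sketch only; the helper and the renamed test are illustrative, not the exact code in this PR):

# Illustrative sketch of the two test shapes being compared.
import concurrent.futures

import torch_xla.core.xla_model as xm
import torch_xla.distributed.xla_multiprocessing as xmp


def _assert_tpus_exist(index=0):
  del index  # spawn passes a process index; unused here
  assert len(xm.get_xla_supported_devices('TPU')) > 0


def test_single_process():
  # Runs the check in a single worker process.
  with concurrent.futures.ProcessPoolExecutor(max_workers=1) as executor:
    executor.submit(_assert_tpus_exist).result()


def test_spawn():
  # Runs the check through the multiprocessing path, one process per replica.
  xmp.spawn(_assert_tpus_exist)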


def register_plugin(name: str, device_plugin: DevicePlugin):
  _plugin_registry[name.upper()] = device_plugin
  torch_xla._XLAC._register_pjrt_plugin(name, device_plugin.library_path())
Collaborator:

I wonder what library_path we should use for GPU. If I understand correctly, GPU doesn't involve the libtpu library.

@will-cromar (Collaborator, Author):

None right now. GPU support is statically linked in. When that moves to a plugin (say libsegpu.so), it will be the path to that binary.
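
For illustration, a plugin that ships its own PJRT binary might look something like this (a hedged sketch; the class and the bundled path are hypothetical, and the .so name just follows the comment above):

# Hypothetical sketch: a plugin whose library_path points at a PJRT C API
# shared object bundled with its own package instead of libtpu.
import os

from torch_xla.experimental import plugins  # assumed module path


class GpuPlugin(plugins.DevicePlugin):

  def library_path(self) -> str:
    # Resolve the plugin binary shipped inside this (hypothetical) package.
    return os.path.join(os.path.dirname(__file__), 'lib', 'libsegpu.so')


plugins.register_plugin('GPU', GpuPlugin())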


  @staticmethod
  def _assert_tpus_exist(index=0):
    del index
Collaborator:

What's this for?

@will-cromar (Collaborator, Author):

index is required for spawn, but we don't need it. I just explicitly delete it to mark it unused

@@ -96,7 +97,9 @@ def _run_singleprocess(fn: Callable[..., R], *args, **kwargs) -> Dict[int, R]:
   """
   os.environ.setdefault(xenv.PJRT_LOCAL_PROCESS_COUNT, '1')
 
-  if runtime.device_type() == 'TPU':
+  if plugins.using_dynamic_plugins():
+    plugins.default().configure_single_process()
Collaborator:

Should we throw a warning or something when people configure PJRT_DEVICE while also registering the plugin in code?

@will-cromar (Collaborator, Author):

You still select the device type with PJRT_DEVICE. Plugins will just let you register new device types when we clean up all of the hardcoded strings.
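
In other words, a plausible reading of the interaction (a sketch of the idea, not necessarily this PR's exact helpers):

# Sketch: PJRT_DEVICE still names the device type; the registry only decides
# which DevicePlugin backs that name.
import os

_plugin_registry = {}  # filled by register_plugin, keyed by upper-cased name


def default():
  # Hypothetical version of plugins.default(): return the plugin registered
  # for the currently selected device type.
  return _plugin_registry[os.environ['PJRT_DEVICE'].upper()]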

@@ -53,6 +53,21 @@ int64_t ComputationClient::GetDeviceOrdinal(const std::string& device) {
   return std::stoi(device.substr(pos + 1));
 }
 
+std::unordered_map<std::string, std::string> pjrt_plugins_;
Collaborator:

I thought the user can only register one plugin? What's the use case for registering multiple?

@will-cromar (Collaborator, Author):

We'll register TPU and GPU as default options, and then other packages will add plugins on top of those. JAX is also using Python entry points to register available plugins automatically, which we may also want to do.
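
A hedged sketch of the entry-point idea (the group name and the zero-argument factory convention are hypothetical; entry_points(group=...) needs Python 3.10+):

# Sketch: discover and register plugins advertised by installed packages.
from importlib import metadata

from torch_xla.experimental import plugins  # assumed module path


def register_installed_plugins():
  # Each entry point is assumed to resolve to a zero-arg DevicePlugin factory.
  for entry in metadata.entry_points(group='torch_xla.plugins'):
    plugins.register_plugin(entry.name, entry.load()())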

import torch_xla.runtime as xr
from torch_xla._internal import tpu

plugins.register_plugin('TPU', tpu.TpuPlugin())
Collaborator:

Are you going to put this in our init file eventually?

@will-cromar (Collaborator, Author):

Yeah. I'm avoiding any changes to the default behavior while this is WIP.

@will-cromar marked this pull request as ready for review on December 18, 2023 at 19:32
@vanbasten23 (Collaborator) left a comment:

LGTM! Thanks.

@will-cromar merged commit ad14582 into master on Dec 18, 2023
20 checks passed
mbzomowski pushed a commit to mbzomowski-test-org/xla that referenced this pull request Jan 3, 2024
bhavya01 pushed a commit that referenced this pull request Apr 22, 2024