[PJRT][IFRT] Move topology discovery into PJRT-IFRT. #68260
Currently each PJRT backend is responsible for figuring out the global topology. This has several downsides:
This change adds support for topology discovery to PJRT-IFRT and migrates the CPU PJRT plugin to use it instead of performing its own topology discovery. Future changes will migrate other PJRT plugins.
One notable effect of this change is that device IDs are no longer assigned contiguously. It's up to each PJRT plugin to choose globally unique device IDs, and after this change the CPU plugin forms a global device ID as `(process_id << 17) | local_device_id`. It is hard to say for sure whether any code relies on the device ID space being contiguous, but contiguity has never been a documented contract of PJRT or JAX.

Another downside of this change is that, on CPU and GPU, `PjRtTopologyDescription` may no longer describe the complete cluster topology: it is derived from a single PJRT client's view of the cluster, and after this change that view contains only local devices. We leave fixing this to a future change.