
Support dynamically allocating descriptors when pool is exhausted. #2291

Merged 1 commit into KhronosGroup:main on Jul 28, 2024

Conversation

@billhollings (Contributor) commented Jul 25, 2024

  • MVKDescriptorTypePool: if the pool is exhausted, report a warning, allocate a new descriptor instance, and destroy it when it is freed.
  • Deprecate MVKConfiguration::preallocateDescriptors and MVK_CONFIG_PREALLOCATE_DESCRIPTORS, as both preallocation and dynamic allocation of descriptors are now supported.
  • vkAllocateDescriptorSets(): Per Vulkan spec, if any descriptor set allocation fails, free any successful allocations, and populate all descriptor set pointers with VK_NULL_HANDLE. In addition, return VK_ERROR_FRAGMENTED_POOL if failure was due to Metal argument buffer fragmentation.
  • vkUpdateDescriptorSets() accepts null VkDescriptorSets as no-ops.
  • MVKDescriptorPool::getLogDescription() now includes the count of remaining descriptors in the pool, for diagnostic logging during debugging.

Fix #2282.
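To illustrate the caller-facing contract described above, here is a minimal C++ sketch (a hypothetical wrapper, not MoltenVK code) of an allocation call under the new behavior: on any failure, every returned handle comes back as VK_NULL_HANDLE, and the error code distinguishes fragmentation from exhaustion.

```cpp
#include <vulkan/vulkan.h>
#include <vector>

// Hypothetical wrapper; assumes a valid device, pool, and set layouts.
VkResult allocateSets(VkDevice device, VkDescriptorPool pool,
                      const std::vector<VkDescriptorSetLayout>& layouts,
                      std::vector<VkDescriptorSet>& sets) {
    sets.assign(layouts.size(), VK_NULL_HANDLE);

    VkDescriptorSetAllocateInfo ai{};
    ai.sType = VK_STRUCTURE_TYPE_DESCRIPTOR_SET_ALLOCATE_INFO;
    ai.descriptorPool = pool;
    ai.descriptorSetCount = static_cast<uint32_t>(layouts.size());
    ai.pSetLayouts = layouts.data();

    VkResult rslt = vkAllocateDescriptorSets(device, &ai, sets.data());
    if (rslt != VK_SUCCESS) {
        // Per the behavior above: any partial allocations have been freed and
        // every element of 'sets' is VK_NULL_HANDLE. VK_ERROR_FRAGMENTED_POOL
        // signals Metal argument buffer fragmentation;
        // VK_ERROR_OUT_OF_POOL_MEMORY signals pool exhaustion.
    }
    return rslt;
}
```

And because vkUpdateDescriptorSets() now treats null VkDescriptorSets as no-ops, a caller that overlooks the error and passes the nulled handles to an update will no longer crash.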

@billhollings (Contributor, Author) commented Jul 25, 2024

@cdavis5e @danginsburg

The Vulkan spec says:

If a call to vkAllocateDescriptorSets would cause the total number of descriptor sets allocated from the pool to exceed the value of VkDescriptorPoolCreateInfo::maxSets used to create pAllocateInfo->descriptorPool, then the allocation may fail due to lack of space in the descriptor pool. Similarly, the allocation may fail due to lack of space if the call to vkAllocateDescriptorSets would cause the number of any given descriptor type to exceed the sum of all the descriptorCount members of each element of VkDescriptorPoolCreateInfo::pPoolSizes with a type equal to that type.

So it's up to the implementation to define how strict it's going to be in enforcing these rules.

Previously, MoltenVK chose to fail and report in both of these situations. But based on the issue that triggered this PR (#2282), other implementations are more lax. This PR relaxes the second directive, dynamically allocating additional descriptors of a type once that type has been exhausted in the pool.
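As a concrete (hypothetical) example of that second directive, consider a pool sized for only four uniform-buffer descriptors; the values here are invented for illustration, and a valid VkDevice `device` is assumed:

```cpp
VkDescriptorPoolSize poolSize{};
poolSize.type = VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER;
poolSize.descriptorCount = 4;    // per-type limit from pPoolSizes

VkDescriptorPoolCreateInfo poolInfo{};
poolInfo.sType = VK_STRUCTURE_TYPE_DESCRIPTOR_POOL_CREATE_INFO;
poolInfo.maxSets = 8;            // maxSets remains strictly enforced (see point 2 below)
poolInfo.poolSizeCount = 1;
poolInfo.pPoolSizes = &poolSize;

VkDescriptorPool pool = VK_NULL_HANDLE;
vkCreateDescriptorPool(device, &poolInfo, nullptr, &pool);

// Allocating sets that together need a fifth uniform-buffer descriptor
// previously failed with VK_ERROR_OUT_OF_POOL_MEMORY; with this PR, the
// extra descriptor is allocated dynamically and a warning is logged.
```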

Design questions for you:

  1. Doing so means the app gets no feedback that its VkDescriptorPoolCreateInfo numbers are wrong, if the app intends to use this info to allocate an additional VkDescriptorPool once the current one has exhausted a type. Does this matter? Do apps follow this approach, and will they suffer if we blindly allocate descriptors dynamically beyond the pool limits?

  2. This PR does not relax the limit on VkDescriptorPoolCreateInfo::maxSets, and still reports failure when that limit is exhausted, because it wasn't needed for the issue that triggered this PR, and because it would mean dynamically allocating a Metal argument buffer for each descriptor set requested above the limit. However, we could easily add that capability and eliminate that restriction as well (basically never fail on a descriptor set allocation), if it makes sense.

More discussion in the triggering issue.

Thoughts?

@danginsburg (Collaborator) commented:

Thoughts?

A couple of thoughts:

  • I've definitely seen this behave differently on different implementations, for example in SDL3 I forgot to include a descriptor type at all in the pool and it was working with no validation errors on NVIDIA and AMD, but then failed on Android (Vulkan Renderer on Android libsdl-org/SDL#10208 (comment)). I considered that an application bug and fixed it.
  • The behavior of vkAllocateDescriptorSets changed in VK_KHR_maintenance1; does the app opt in to that? You might need to check whether that extension is enabled before returning VK_ERROR_OUT_OF_POOL_MEMORY (a sketch of such a check follows this list). I think apps written against Vulkan 1.0 without maintenance1 wouldn't expect that error.
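A minimal sketch of such a check, with a hypothetical helper name (VK_ERROR_OUT_OF_POOL_MEMORY was introduced by VK_KHR_maintenance1 and became core in Vulkan 1.1):

```cpp
#include <vulkan/vulkan.h>

// Hypothetical helper: whether the app can be expected to handle
// VK_ERROR_OUT_OF_POOL_MEMORY, which VK_KHR_maintenance1 introduced
// and which is core as of Vulkan 1.1.
bool appExpectsOutOfPoolMemory(uint32_t appApiVersion, bool maintenance1Enabled) {
    return maintenance1Enabled || appApiVersion >= VK_API_VERSION_1_1;
}
```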
  1. Doing so means the app gets no feedback that its VkDescriptorPoolCreateInfo numbers are wrong, if the app intends to use this info to allocate an additional VkDescriptorPool once the current one has exhausted a type. Does this matter? Do apps follow this approach, and will they suffer if we blindly allocate descriptors dynamically beyond the pool limits?

That's definitely what we do in our engine, yes. I guess it depends on what the cost/trade-off is going to be having MoltenVK dynamically allocate versus if the app had created its own new descriptor pool.
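For context, here is a sketch of that engine-side pattern (hypothetical helper names, not code from any particular engine): when the current pool reports exhaustion, create a fresh pool and retry the allocation.

```cpp
#include <vulkan/vulkan.h>

// Hypothetical helper: create a replacement pool using the engine's sizing policy.
static VkDescriptorPool createReplacementPool(VkDevice device) {
    VkDescriptorPoolSize size{VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER, 64};
    VkDescriptorPoolCreateInfo ci{};
    ci.sType = VK_STRUCTURE_TYPE_DESCRIPTOR_POOL_CREATE_INFO;
    ci.maxSets = 64;
    ci.poolSizeCount = 1;
    ci.pPoolSizes = &size;
    VkDescriptorPool pool = VK_NULL_HANDLE;
    vkCreateDescriptorPool(device, &ci, nullptr, &pool);
    return pool;
}

// Grow-on-exhaustion: if the current pool is out of space, create a new pool
// and retry, rather than relying on the implementation to over-allocate.
VkDescriptorSet allocateGrowing(VkDevice device, VkDescriptorPool& currentPool,
                                VkDescriptorSetLayout layout) {
    VkDescriptorSetAllocateInfo ai{};
    ai.sType = VK_STRUCTURE_TYPE_DESCRIPTOR_SET_ALLOCATE_INFO;
    ai.descriptorPool = currentPool;
    ai.descriptorSetCount = 1;
    ai.pSetLayouts = &layout;

    VkDescriptorSet set = VK_NULL_HANDLE;
    VkResult rslt = vkAllocateDescriptorSets(device, &ai, &set);
    if (rslt == VK_ERROR_OUT_OF_POOL_MEMORY || rslt == VK_ERROR_FRAGMENTED_POOL) {
        // Exhausted pools are retained elsewhere for a later vkResetDescriptorPool().
        currentPool = createReplacementPool(device);
        ai.descriptorPool = currentPool;
        vkAllocateDescriptorSets(device, &ai, &set);
    }
    return set;  // VK_NULL_HANDLE if both attempts failed
}
```

The trade-off raised in the question is exactly this: MoltenVK allocating individual descriptors dynamically versus the app creating a whole new pool itself.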

  2. This PR does not relax the limit on VkDescriptorPoolCreateInfo::maxSets, and still reports failure when that limit is exhausted, because it wasn't needed for the issue that triggered this PR, and because it would mean dynamically allocating a Metal argument buffer for each descriptor set requested above the limit. However, we could easily add that capability and eliminate that restriction as well (basically never fail on a descriptor set allocation), if it makes sense.

That seems like the right call.

@billhollings (Contributor, Author) commented:

@danginsburg

Thanks for all the background info.

  • The behavior of vkAllocateDescriptorSets changed in VK_KHR_maintenance1; does the app opt in to that? You might need to check whether that extension is enabled before returning VK_ERROR_OUT_OF_POOL_MEMORY. I think apps written against Vulkan 1.0 without maintenance1 wouldn't expect that error.

Thanks for pointing that out. The app (vkQuake) is not opting in on VK_KHR_maintenance1, but it is requesting Vulkan 1.1, so MoltenVK's error response was accurate. And even with 1.0 itself, the spec says any error returned should be treated as a failure.

  2. This PR does not relax the limit on VkDescriptorPoolCreateInfo::maxSets, and still reports failure when that limit is exhausted, because it wasn't needed for the issue that triggered this PR, and because it would mean dynamically allocating a Metal argument buffer for each descriptor set requested above the limit. However, we could easily add that capability and eliminate that restriction as well (basically never fail on a descriptor set allocation), if it makes sense.

That seems like the right call.

Can you clarify which of the two options I suggested you think is the right call?

@danginsburg (Collaborator) commented:

Can you clarify which of the two options I suggested you think is the right call?

Reporting a failure and not allocating a Metal argument buffer.

@cdavis5e (Collaborator) commented:

  • Doing so means the app gets no feedback that its VkDescriptorPoolCreateInfo numbers are wrong, if the app intends to use this info to allocate an additional VkDescriptorPool once the current one has exhausted a type. Does this matter? Do apps follow this approach, and will they suffer if we blindly allocate descriptors dynamically beyond the pool limits?

It's easy to dismiss vkQuake's behavior as being outside the spec, because it requested 1.1 behavior but failed to handle the out-of-memory condition. I suppose it's even easy enough to fix in vkQuake, given that John Carmack (thankfully) open-sourced his game engines. However, now I'm worried that there are other existing apps out there that can't be so easily updated.

Also, I'm not sure how valid this concern is, given how argument buffers work: memory is memory; there's no separate memory for texture references vs. buffer pointers. I think the approach here is the right call.

  • This PR does not relax the limit on VkDescriptorPoolCreateInfo::maxSets, and still reports failure when that limit is exhausted, because it wasn't needed for the issue that triggered this PR, and because it would mean dynamically allocating a Metal argument buffer for each descriptor set requested above the limit. However, we could easily add that capability and eliminate that restriction as well (basically never fail on a descriptor set allocation), if it makes sense.

I agree with @danginsburg here. I'd add that supporting this would come at the cost of additional complexity in the descriptor set code--that code is hairy enough as it is.

@billhollings (Contributor, Author) commented:

@cdavis5e

Thanks for the additional advice and feedback.

If you have the cycles, can you also review the PR content, please?

@cdavis5e (Collaborator) commented:

If you have the cycles, can you also review the PR content, please?

Done. Sorry about the delay.

@billhollings merged commit c2bca92 into KhronosGroup:main on Jul 28, 2024
6 checks passed
@billhollings deleted the dyn-alloc-descriptors branch on Jul 28, 2024
Successfully merging this pull request may close these issues.

EXC_BAD_ACCESS from mvkUpdateDescriptorSets on macOS (#2282)