
Support dynamically allocating descriptors when pool is exhausted. #2291

Merged 1 commit into KhronosGroup:main on Jul 28, 2024

Conversation

@billhollings (Contributor) commented Jul 25, 2024

  • MVKDescriptorTypePool: if the pool is exhausted, report a warning, allocate a new descriptor instance, and destroy it when it is freed.
  • Deprecate MVKConfiguration::preallocateDescriptors and MVK_CONFIG_PREALLOCATE_DESCRIPTORS, as both preallocation and dynamic allocation of descriptors are now supported.
  • vkAllocateDescriptorSets(): Per Vulkan spec, if any descriptor set allocation fails, free any successful allocations, and populate all descriptor set pointers with VK_NULL_HANDLE. In addition, return VK_ERROR_FRAGMENTED_POOL if failure was due to Metal argument buffer fragmentation.
  • vkUpdateDescriptorSets() accepts null VkDescriptorSets as no-ops.
  • MVKDescriptorPool::getLogDescription() now includes the count of remaining descriptors in the pool, for diagnostic logging during debugging.

Fix #2282.
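To illustrate the caller-facing contract described above, here is a minimal C++ sketch (a hypothetical wrapper, not MoltenVK code) of an allocation call under the new behavior: on any failure, every returned handle comes back as VK_NULL_HANDLE, and the error code distinguishes fragmentation from exhaustion.

```cpp
#include <vulkan/vulkan.h>
#include <vector>

// Hypothetical wrapper; assumes a valid device, pool, and set layouts.
VkResult allocateSets(VkDevice device, VkDescriptorPool pool,
                      const std::vector<VkDescriptorSetLayout>& layouts,
                      std::vector<VkDescriptorSet>& sets) {
    sets.assign(layouts.size(), VK_NULL_HANDLE);

    VkDescriptorSetAllocateInfo ai{};
    ai.sType = VK_STRUCTURE_TYPE_DESCRIPTOR_SET_ALLOCATE_INFO;
    ai.descriptorPool = pool;
    ai.descriptorSetCount = static_cast<uint32_t>(layouts.size());
    ai.pSetLayouts = layouts.data();

    VkResult rslt = vkAllocateDescriptorSets(device, &ai, sets.data());
    if (rslt != VK_SUCCESS) {
        // Per the behavior above: any partial allocations have been freed and
        // every element of 'sets' is VK_NULL_HANDLE. VK_ERROR_FRAGMENTED_POOL
        // signals Metal argument buffer fragmentation;
        // VK_ERROR_OUT_OF_POOL_MEMORY signals pool exhaustion.
    }
    return rslt;
}
```

And because vkUpdateDescriptorSets() now treats null VkDescriptorSets as no-ops, a caller that overlooks the error and passes the nulled handles to an update will no longer crash.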

@billhollings (Contributor, Author) commented Jul 25, 2024

@cdavis5e @danginsburg

The Vulkan spec says:

If a call to vkAllocateDescriptorSets would cause the total number of descriptor sets allocated from the pool to exceed the value of VkDescriptorPoolCreateInfo::maxSets used to create pAllocateInfo->descriptorPool, then the allocation may fail due to lack of space in the descriptor pool. Similarly, the allocation may fail due to lack of space if the call to vkAllocateDescriptorSets would cause the number of any given descriptor type to exceed the sum of all the descriptorCount members of each element of VkDescriptorPoolCreateInfo::pPoolSizes with a type equal to that type.

So it's up to the implementation to define how strict it's going to be in enforcing these rules.

Previously, MoltenVK chose to fail and report in both of these situations. But based on the issue that triggered this PR (#2282), other implementations are more lax. This PR relaxes the second directive, dynamically allocating additional descriptors of a type once that type has been exhausted in the pool.
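As a concrete (hypothetical) example of that second directive, consider a pool sized for only four uniform-buffer descriptors; the values here are invented for illustration, and a valid VkDevice `device` is assumed:

```cpp
VkDescriptorPoolSize poolSize{};
poolSize.type = VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER;
poolSize.descriptorCount = 4;    // per-type limit from pPoolSizes

VkDescriptorPoolCreateInfo poolInfo{};
poolInfo.sType = VK_STRUCTURE_TYPE_DESCRIPTOR_POOL_CREATE_INFO;
poolInfo.maxSets = 8;            // maxSets remains strictly enforced (see point 2 below)
poolInfo.poolSizeCount = 1;
poolInfo.pPoolSizes = &poolSize;

VkDescriptorPool pool = VK_NULL_HANDLE;
vkCreateDescriptorPool(device, &poolInfo, nullptr, &pool);

// Allocating sets that together need a fifth uniform-buffer descriptor
// previously failed with VK_ERROR_OUT_OF_POOL_MEMORY; with this PR, the
// extra descriptor is allocated dynamically and a warning is logged.
```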

Design questions for you:

  1. Doing so means the app gets no feedback that its VkDescriptorPoolCreateInfo numbers are wrong, if the app intends to use this info to allocate an additional VkDescriptorPool once the current one has exhausted a type. Does this matter? Do apps follow this approach, and will they suffer if we blindly allocate descriptors dynamically beyond the pool limits?

  2. This PR does not relax the limit on VkDescriptorPoolCreateInfo::maxSets, and still reports failure when that limit is exhausted, because it wasn't needed for the issue that triggered this PR, and because it would mean dynamically allocating a Metal argument buffer for each descriptor set requested above the limit. However, we could easily add that capability and eliminate that restriction as well (basically never fail on a descriptor set allocation), if it makes sense.

More discussion in the triggering issue.

Thoughts?

@danginsburg (Collaborator) commented:

Thoughts?

A couple of thoughts:

  • I've definitely seen this behave differently on different implementations, for example in SDL3 I forgot to include a descriptor type at all in the pool and it was working with no validation errors on NVIDIA and AMD, but then failed on Android (Vulkan Renderer on Android libsdl-org/SDL#10208 (comment)). I considered that an application bug and fixed it.
  • The behavior of vkAllocateDescriptorSets changed in VK_KHR_maintenance1; does the app opt in to that? You might need to check whether that extension is enabled before returning VK_ERROR_OUT_OF_POOL_MEMORY (a sketch of such a check follows this list). I think apps written against Vulkan 1.0 without maintenance1 wouldn't expect that error.
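A minimal sketch of such a check, with a hypothetical helper name (VK_ERROR_OUT_OF_POOL_MEMORY was introduced by VK_KHR_maintenance1 and became core in Vulkan 1.1):

```cpp
#include <vulkan/vulkan.h>

// Hypothetical helper: whether the app can be expected to handle
// VK_ERROR_OUT_OF_POOL_MEMORY, which VK_KHR_maintenance1 introduced
// and which is core as of Vulkan 1.1.
bool appExpectsOutOfPoolMemory(uint32_t appApiVersion, bool maintenance1Enabled) {
    return maintenance1Enabled || appApiVersion >= VK_API_VERSION_1_1;
}
```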
  1. Doing so means the app gets no feedback that its VkDescriptorPoolCreateInfo numbers are wrong, if the app intends to use this info to allocate an additional VkDescriptorPool once the current one has exhausted a type. Does this matter? Do apps follow this approach, and will they suffer if we blindly allocate descriptors dynamically beyond the pool limits?

That's definitely what we do in our engine, yes. I guess it depends on what the cost/trade-off is going to be having MoltenVK dynamically allocate versus if the app had created its own new descriptor pool.
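For context, here is a sketch of that engine-side pattern (hypothetical helper names, not code from any particular engine): when the current pool reports exhaustion, create a fresh pool and retry the allocation.

```cpp
#include <vulkan/vulkan.h>

// Hypothetical helper: create a replacement pool using the engine's sizing policy.
static VkDescriptorPool createReplacementPool(VkDevice device) {
    VkDescriptorPoolSize size{VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER, 64};
    VkDescriptorPoolCreateInfo ci{};
    ci.sType = VK_STRUCTURE_TYPE_DESCRIPTOR_POOL_CREATE_INFO;
    ci.maxSets = 64;
    ci.poolSizeCount = 1;
    ci.pPoolSizes = &size;
    VkDescriptorPool pool = VK_NULL_HANDLE;
    vkCreateDescriptorPool(device, &ci, nullptr, &pool);
    return pool;
}

// Grow-on-exhaustion: if the current pool is out of space, create a new pool
// and retry, rather than relying on the implementation to over-allocate.
VkDescriptorSet allocateGrowing(VkDevice device, VkDescriptorPool& currentPool,
                                VkDescriptorSetLayout layout) {
    VkDescriptorSetAllocateInfo ai{};
    ai.sType = VK_STRUCTURE_TYPE_DESCRIPTOR_SET_ALLOCATE_INFO;
    ai.descriptorPool = currentPool;
    ai.descriptorSetCount = 1;
    ai.pSetLayouts = &layout;

    VkDescriptorSet set = VK_NULL_HANDLE;
    VkResult rslt = vkAllocateDescriptorSets(device, &ai, &set);
    if (rslt == VK_ERROR_OUT_OF_POOL_MEMORY || rslt == VK_ERROR_FRAGMENTED_POOL) {
        // Exhausted pools are retained elsewhere for a later vkResetDescriptorPool().
        currentPool = createReplacementPool(device);
        ai.descriptorPool = currentPool;
        vkAllocateDescriptorSets(device, &ai, &set);
    }
    return set;  // VK_NULL_HANDLE if both attempts failed
}
```

The trade-off raised in the question is exactly this: MoltenVK allocating individual descriptors dynamically versus the app creating a whole new pool itself.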

  2. This PR does not relax the limit on VkDescriptorPoolCreateInfo::maxSets, and still reports failure when that limit is exhausted, because it wasn't needed for the issue that triggered this PR, and because it would mean dynamically allocating a Metal argument buffer for each descriptor set requested above the limit. However, we could easily add that capability and eliminate that restriction as well (basically never fail on a descriptor set allocation), if it makes sense.

That seems like the right call.

@billhollings (Contributor, Author) commented:

@danginsburg

Thanks for all the background info.

  • The behavior of vkAllocateDescriptorSets changed in VK_KHR_maintenance1; does the app opt in to that? You might need to check whether that extension is enabled before returning VK_ERROR_OUT_OF_POOL_MEMORY. I think apps written against Vulkan 1.0 without maintenance1 wouldn't expect that error.

Thanks for pointing that out. The app (vkQuake) is not opting in on VK_KHR_maintenance1, but it is requesting Vulkan 1.1, so MoltenVK's error response was accurate. And even with 1.0 itself, the spec says any error returned should be treated as a failure.

  2. This PR does not relax the limit on VkDescriptorPoolCreateInfo::maxSets, and still reports failure when that limit is exhausted, because it wasn't needed for the issue that triggered this PR, and because it would mean dynamically allocating a Metal argument buffer for each descriptor set requested above the limit. However, we could easily add that capability and eliminate that restriction as well (basically never fail on a descriptor set allocation), if it makes sense.

That seems like the right call.

Can you clarify which of the two options I suggested you think is the right call?

@danginsburg (Collaborator) commented:

Can you clarify which of the two options I suggested you think is the right call?

Reporting a failure and not allocating a Metal argument buffer.

@cdavis5e (Collaborator) commented:

  • Doing so means the app gets no feedback that its VkDescriptorPoolCreateInfo numbers are wrong, if the app intends to use this info to allocate an additional VkDescriptorPool once the current one has exhausted a type. Does this matter? Do apps follow this approach, and will they suffer if we blindly allocate descriptors dynamically beyond the pool limits?

It's easy to dismiss vkQuake's behavior as being outside the spec, because it requested 1.1 behavior but failed to handle the out-of-memory condition. I suppose it's even easy enough to fix in vkQuake, given that John Carmack (thankfully) open-sourced his game engines. However, now I'm worried that there are other existing apps out there that can't be so easily updated.

Also, I'm not sure how valid this concern is, given how argument buffers work: memory is memory; there's no separate memory for texture references vs. buffer pointers. I think the approach here is the right call.

  • This PR does not relax the limit on VkDescriptorPoolCreateInfo::maxSets, and still reports failure when that limit is exhausted, because it wasn't needed for the issue that triggered this PR, and because it would mean dynamically allocating a Metal argument buffer for each descriptor set requested above the limit. However, we could easily add that capability and eliminate that restriction as well (basically never fail on a descriptor set allocation), if it makes sense.

I agree with @danginsburg here. I'd add that supporting this would come at the cost of additional complexity in the descriptor set code--that code is hairy enough as it is.

@billhollings (Contributor, Author) commented:

@cdavis5e

Thanks for the additional advice and feedback.

If you have the cycles, can you also review the PR content, please?

@cdavis5e (Collaborator) commented:

If you have the cycles, can you also review the PR content, please?

Done. Sorry about the delay.

@billhollings merged commit c2bca92 into KhronosGroup:main on Jul 28, 2024
6 checks passed
@billhollings deleted the dyn-alloc-descriptors branch on Jul 28, 2024
Successfully merging this pull request may close these issues.

EXC_BAD_ACCESS from mvkUpdateDescriptorSets on macOS (#2282)