diff --git a/dev/.documenter-siteinfo.json b/dev/.documenter-siteinfo.json
index 85e420b4..fc4896f6 100644
--- a/dev/.documenter-siteinfo.json
+++ b/dev/.documenter-siteinfo.json
@@ -1 +1 @@
-{"documenter":{"julia_version":"1.10.4","generation_timestamp":"2024-08-23T07:50:36","documenter_version":"1.5.0"}}
\ No newline at end of file
+{"documenter":{"julia_version":"1.10.5","generation_timestamp":"2024-08-29T13:41:43","documenter_version":"1.5.0"}}
\ No newline at end of file
diff --git a/dev/api/array/index.html b/dev/api/array/index.html
index 3909a455..7687b073 100644
--- a/dev/api/array/index.html
+++ b/dev/api/array/index.html
@@ -14,4 +14,4 @@
 3-element MtlVector{Int64, Metal.PrivateStorage}:
  1
  2
- 3source
Metal.MtlArrayType
MtlArray{T,N,S} <: AbstractGPUArray{T,N}

N-dimensional Metal array with storage mode S and elements of type T.

S can be Metal.PrivateStorage (default), Metal.SharedStorage, or Metal.ManagedStorage.

See the Array Programming section of the Metal.jl docs for more details.

source
Metal.MtlVectorType
MtlVector{T,S} <: AbstractGPUVector{T}

One-dimensional array with elements of type T for use with Apple Metal-compatible GPUs. Alias for MtlArray{T,1,S}.

See also Vector, and the Array Programming section of the Metal.jl docs for more details.

source
Metal.MtlMatrixType
MtlMatrix{T,S} <: AbstractGPUMatrix{T}

Two-dimensional array with elements of type T for use with Apple Metal-compatible GPUs. Alias for MtlArray{T,2,S}.

See also Matrix, and the Array Programming section of the Metal.jl docs for more details.

source
Metal.MtlVecOrMatType
MtlVecOrMat{T,S}

Union type of MtlVector{T,S} and MtlMatrix{T,S} which allows functions to accept either an MtlMatrix or an MtlVector.

See also VecOrMat for examples.

source

Storage modes

The Metal API has various storage modes that dictate how a resource can be accessed. MtlArrays use Metal.PrivateStorage by default, but they can also use Metal.SharedStorage or Metal.ManagedStorage. For more information on storage modes, see the official Metal documentation.

Metal.MTL.PrivateStorageType
struct Metal.PrivateStorage <: MTL.StorageMode

Used to indicate that the resource is stored in memory using MTLStorageModePrivate.

For more information on Metal storage modes, refer to the official Metal documentation.

See also Metal.SharedStorage and Metal.ManagedStorage.

source
Metal.MTL.SharedStorageType
struct Metal.SharedStorage <: MTL.StorageMode

Used to indicate that the resource is stored in memory using MTLStorageModeShared.

For more information on Metal storage modes, refer to the official Metal documentation.

See also Metal.PrivateStorage and Metal.ManagedStorage.

source
Metal.MTL.ManagedStorageType
struct Metal.ManagedStorage <: MTL.StorageMode

Used to indicate that the resource is stored in memory using MTLStorageModeManaged.

For more information on Metal storage modes, refer to the official Metal documentation.

See also Metal.SharedStorage and Metal.PrivateStorage.

source

The following convenience functions check whether an MtlArray uses a specific storage mode:

Metal.is_privateFunction
is_private(A::MtlArray) -> Bool

Returns true if A has storage mode Metal.PrivateStorage.

See also is_shared and is_managed.

source
Metal.is_sharedFunction
is_shared(A::MtlArray) -> Bool

Returns true if A has storage mode Metal.SharedStorage.

See also is_private and is_managed.

source
Metal.is_managedFunction
is_managed(A::MtlArray) -> Bool

Returns true if A has storage mode Metal.ManagedStorage.

See also is_shared and is_private.

source
+ 3source
Metal.MtlArrayType
MtlArray{T,N,S} <: AbstractGPUArray{T,N}

N-dimensional Metal array with storage mode S and elements of type T.

S can be Metal.PrivateStorage (default), Metal.SharedStorage, or Metal.ManagedStorage.

See the Array Programming section of the Metal.jl docs for more details.

source
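For illustration, a minimal construction sketch (it assumes a Metal-capable device, and passes the storage mode as the third type parameter following the MtlArray{T,N,S} signature above):

using Metal

a = MtlArray{Float32}(undef, 16)                        # defaults to Metal.PrivateStorage
b = MtlArray{Float32,1,Metal.SharedStorage}(undef, 16)  # CPU-accessible shared memory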
Metal.MtlVectorType
MtlVector{T,S} <: AbstractGPUVector{T}

One-dimensional array with elements of type T for use with Apple Metal-compatible GPUs. Alias for MtlArray{T,1,S}.

See also Vector, and the Array Programming section of the Metal.jl docs for more details.

source
Metal.MtlMatrixType
MtlMatrix{T,S} <: AbstractGPUMatrix{T}

Two-dimensional array with elements of type T for use with Apple Metal-compatible GPUs. Alias for MtlArray{T,2,S}.

See also Matrix, and the Array Programming section of the Metal.jl docs for more details.

source
Metal.MtlVecOrMatType
MtlVecOrMat{T,S}

Union type of MtlVector{T,S} and MtlMatrix{T,S} which allows functions to accept either an MtlMatrix or an MtlVector.

See also VecOrMat for examples.

source

Storage modes

The Metal API has various storage modes that dictate how a resource can be accessed. MtlArrays use Metal.PrivateStorage by default, but they can also use Metal.SharedStorage or Metal.ManagedStorage. For more information on storage modes, see the official Metal documentation.

Metal.MTL.PrivateStorageType
struct Metal.PrivateStorage <: MTL.StorageMode

Used to indicate that the resource is stored in memory using MTLStorageModePrivate.

For more information on Metal storage modes, refer to the official Metal documentation.

See also Metal.SharedStorage and Metal.ManagedStorage.

source
Metal.MTL.SharedStorageType
struct Metal.SharedStorage <: MTL.StorageMode

Used to indicate that the resource is stored in memory using MTLStorageModeShared.

For more information on Metal storage modes, refer to the official Metal documentation.

See also Metal.PrivateStorage and Metal.ManagedStorage.

source
Metal.MTL.ManagedStorageType
struct Metal.ManagedStorage <: MTL.StorageMode

Used to indicate that the resource is stored in memory using MTLStorageModeManaged.

For more information on Metal storage modes, refer to the official Metal documentation.

See also Metal.SharedStorage and Metal.PrivateStorage.

source

The following convenience functions check whether an MtlArray uses a specific storage mode:

Metal.is_privateFunction
is_private(A::MtlArray) -> Bool

Returns true if A has storage mode Metal.PrivateStorage.

See also is_shared and is_managed.

source
Metal.is_sharedFunction
is_shared(A::MtlArray) -> Bool

Returns true if A has storage mode Metal.SharedStorage.

See also is_private and is_managed.

source
Metal.is_managedFunction
is_managed(A::MtlArray) -> Bool

Returns true if A has storage mode Metal.ManagedStorage.

See also is_shared and is_private.

source
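A usage sketch of these predicates, continuing from the constructors above:

julia> a = MtlArray{Float32}(undef, 4);

julia> is_private(a)
true

julia> is_shared(a)
false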
diff --git a/dev/api/compiler/index.html b/dev/api/compiler/index.html
index ba67a885..ec7c5244 100644
--- a/dev/api/compiler/index.html
+++ b/dev/api/compiler/index.html
@@ -1,8 +1,8 @@
-Compiler · Metal.jl

Compiler

Execution

The main entry-point to the compiler is the @metal macro:

Metal.@metalMacro
@metal threads=... groups=... [kwargs...] func(args...)

High-level interface for executing code on a GPU.

The @metal macro should prefix a call, with func a callable function or object that should return nothing. It will be compiled to a Metal function upon first use, and to a certain extent arguments will be converted and managed automatically using mtlconvert. Finally, a call to mtlcall is performed, creating a command buffer in the current global command queue then committing it.

Several supported keyword arguments influence the behavior of @metal:

  • launch: whether to launch this kernel, defaults to true. If false the returned kernel object should be launched by calling it and passing arguments again.
  • name: the name of the kernel in the generated code. Defaults to an automatically-generated name.
  • queue: the command queue to use for this kernel. Defaults to the global command queue.
source

If needed, you can use a lower-level API that lets you inspect the compiled kernel:

Metal.mtlconvertFunction

mtlconvert(x, [cce])

This function is called for every argument to be passed to a kernel, allowing it to be converted to a GPU-friendly format. By default, the function does nothing and returns the input object x as-is.

Do not add methods to this function; instead, extend the underlying Adapt.jl package and register methods for the Metal.Adaptor type.

source
Metal.mtlfunctionFunction
mtlfunction(f, tt=Tuple{}; kwargs...)

Low-level interface to compile a function invocation for the currently-active GPU, returning a callable kernel object. For a higher-level interface, use @metal.

The output of this function is automatically cached, i.e. you can simply call mtlfunction in a hot path without degrading performance. New code will be generated automatically when the function changes, or when different types or keyword arguments are provided.

source

Reflection

If you want to inspect generated code, you can use macros that resemble functionality from the InteractiveUtils standard library:

@device_code_lowered
+Compiler · Metal.jl

Compiler

Execution

The main entry-point to the compiler is the @metal macro:

Metal.@metalMacro
@metal threads=... groups=... [kwargs...] func(args...)

High-level interface for executing code on a GPU.

The @metal macro should prefix a call, with func a callable function or object that should return nothing. It will be compiled to a Metal function upon first use, and to a certain extent arguments will be converted and managed automatically using mtlconvert. Finally, a call to mtlcall is performed, creating a command buffer in the current global command queue then committing it.

Several supported keyword arguments influence the behavior of @metal:

  • launch: whether to launch this kernel, defaults to true. If false the returned kernel object should be launched by calling it and passing arguments again.
  • name: the name of the kernel in the generated code. Defaults to an automatically-generated name.
  • queue: the command queue to use for this kernel. Defaults to the global command queue.
source
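As a usage sketch, adapted from the package README (it assumes the array is small enough to launch as a single threadgroup):

using Metal

function vadd(a, b, c)
    i = thread_position_in_grid_1d()
    c[i] = a[i] + b[i]
    return
end

a = MtlArray([1f0, 2f0, 3f0])
b = MtlArray([4f0, 5f0, 6f0])
c = similar(a)
@metal threads=length(c) vadd(a, b, c)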

If needed, you can use a lower-level API that lets you inspect the compiled kernel:

Metal.mtlconvertFunction

mtlconvert(x, [cce])

This function is called for every argument to be passed to a kernel, allowing it to be converted to a GPU-friendly format. By default, the function does nothing and returns the input object x as-is.

Do not add methods to this function; instead, extend the underlying Adapt.jl package and register methods for the Metal.Adaptor type.

source
Metal.mtlfunctionFunction
mtlfunction(f, tt=Tuple{}; kwargs...)

Low-level interface to compile a function invocation for the currently-active GPU, returning a callable kernel object. For a higher-level interface, use @metal.

The output of this function is automatically cached, i.e. you can simply call mtlfunction in a hot path without degrading performance. New code will be generated automatically when the function changes, or when different types or keyword arguments are provided.

source
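A sketch of the lower-level path; calling the returned kernel object with an explicit launch configuration is inferred from the launch=false description above:

kernel = @metal launch=false vadd(a, b, c)  # compile only, don't launch
kernel(a, b, c; threads=length(c))          # launch later, passing arguments again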

Reflection

If you want to inspect generated code, you can use macros that resemble functionality from the InteractiveUtils standard library:

@device_code_lowered
 @device_code_typed
 @device_code_warntype
 @device_code_llvm
 @device_code_native
 @device_code_agx
-@device_code

For more information, please consult the GPUCompiler.jl documentation. Note that code_agx is simply an alias for code_native.

+@device_code

For more information, please consult the GPUCompiler.jl documentation. Note that code_agx is simply an alias for code_native.
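For example, to inspect the LLVM IR of the vadd kernel from above without running it (a sketch: launch=false still triggers compilation, so the reflection macro has something to show):

@device_code_llvm @metal launch=false vadd(a, b, c)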

diff --git a/dev/api/essentials/index.html b/dev/api/essentials/index.html
index cbf1a8cd..31424d5f 100644
--- a/dev/api/essentials/index.html
+++ b/dev/api/essentials/index.html
@@ -1,2 +1,2 @@
-Essentials · Metal.jl

Essentials

Versions and Support

Global State

Metal.device!Function
device!(dev::MTLDevice)

Set the Metal GPU device associated with the current Julia task.

source
Metal.deviceFunction
device()::MTLDevice

Return the Metal GPU device associated with the current Julia task.

Since all M-series systems currently expose only a single GPU externally, this function effectively returns the system's only GPU.

source
device(<:MtlArray)

Get the Metal device for an MtlArray.

source
Metal.global_queueFunction
global_queue(dev::MTLDevice)::MTLCommandQueue

Return the Metal command queue associated with the current Julia thread.

source
Metal.synchronizeFunction
synchronize(queue)

Wait for currently committed GPU work on this queue to finish.

Create a new MTLCommandBuffer from the global command queue, commit it, and wait for it to complete. Since command buffers execute in a first-in, first-out manner, this synchronizes the GPU.

source
+Essentials · Metal.jl

Essentials

Versions and Support

Global State

Metal.device!Function
device!(dev::MTLDevice)

Set the Metal GPU device associated with the current Julia task.

source
Metal.deviceFunction
device()::MTLDevice

Return the Metal GPU device associated with the current Julia task.

Since all M-series systems currently expose only a single GPU externally, this function effectively returns the system's only GPU.

source
device(<:MtlArray)

Get the Metal device for an MtlArray.

source
Metal.global_queueFunction
global_queue(dev::MTLDevice)::MTLCommandQueue

Return the Metal command queue associated with the current Julia thread.

source
Metal.synchronizeFunction
synchronize(queue)

Wait for currently committed GPU work on this queue to finish.

Create a new MTLCommandBuffer from the global command queue, commit it, and wait for it to complete. Since command buffers execute in a first-in, first-out manner, this synchronizes the GPU.

source
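Taken together, a sketch of typical global-state usage:

using Metal

dev = device()           # MTLDevice for the current task
q   = global_queue(dev)  # MTLCommandQueue for the current thread

a = Metal.ones(1024)
b = a .* 2               # work is committed asynchronously
synchronize(q)           # wait for committed work to finish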
diff --git a/dev/api/kernel/index.html b/dev/api/kernel/index.html
index ed522e67..b16d5527 100644
--- a/dev/api/kernel/index.html
+++ b/dev/api/kernel/index.html
@@ -1,24 +1,24 @@
-Kernel programming · Metal.jl

Kernel programming

This section lists the package's public functionality that corresponds to special Metal functions for use in device code. For more information about these functions, please consult the Metal Shading Language specification.

This is made possible by wrapping a subset of Metal's Objective-C APIs using ObjectiveC.jl. These low-level wrappers are available in the MTL submodule exported by Metal.jl.

Indexing and dimensions

Metal.thread_position_in_grid_1dFunction
thread_position_in_grid_1d()::UInt32
+Kernel programming · Metal.jl

Kernel programming

This section lists the package's public functionality that corresponds to special Metal functions for use in device code. For more information about these functions, please consult the Metal Shading Language specification.

This is made possible by wrapping a subset of Metal's Objective-C APIs using ObjectiveC.jl. These low-level wrappers are available in the MTL submodule exported by Metal.jl.

Indexing and dimensions

Metal.thread_position_in_grid_1dFunction
thread_position_in_grid_1d()::UInt32
 thread_position_in_grid_2d()::NamedTuple{(:x, :y), Tuple{UInt32, UInt32}}
-thread_position_in_grid_3d()::NamedTuple{(:x, :y, :z), Tuple{UInt32, UInt32, UInt32}}

Return the current thread's position in an N-dimensional grid of threads.

source
Metal.thread_position_in_threadgroup_1dFunction
thread_position_in_threadgroup_1d()::UInt32
+thread_position_in_grid_3d()::NamedTuple{(:x, :y, :z), Tuple{UInt32, UInt32, UInt32}}

Return the current thread's position in an N-dimensional grid of threads.

source
Metal.thread_position_in_threadgroup_1dFunction
thread_position_in_threadgroup_1d()::UInt32
 thread_position_in_threadgroup_2d()::NamedTuple{(:x, :y), Tuple{UInt32, UInt32}}
-thread_position_in_threadgroup_3d()::NamedTuple{(:x, :y, :z), Tuple{UInt32, UInt32, UInt32}}

Return the current thread's unique position within a threadgroup.

source
Metal.threadgroup_position_in_grid_1dFunction
threadgroup_position_in_grid_1d()::UInt32
+thread_position_in_threadgroup_3d()::NamedTuple{(:x, :y, :z), Tuple{UInt32, UInt32, UInt32}}

Return the current thread's unique position within a threadgroup.

source
Metal.threadgroup_position_in_grid_1dFunction
threadgroup_position_in_grid_1d()::UInt32
 threadgroup_position_in_grid_2d()::NamedTuple{(:x, :y), Tuple{UInt32, UInt32}}
-threadgroup_position_in_grid_3d()::NamedTuple{(:x, :y, :z), Tuple{UInt32, UInt32, UInt32}}

Return the current threadgroup's unique position within the grid.

source
Metal.threadgroups_per_grid_1dFunction
threadgroups_per_grid_1d()::UInt32
+threadgroup_position_in_grid_3d()::NamedTuple{(:x, :y, :z), Tuple{UInt32, UInt32, UInt32}}

Return the current threadgroup's unique position within the grid.

source
Metal.threadgroups_per_grid_1dFunction
threadgroups_per_grid_1d()::UInt32
 threadgroups_per_grid_2d()::NamedTuple{(:x, :y), Tuple{UInt32, UInt32}}
-threadgroups_per_grid_3d()::NamedTuple{(:x, :y, :z), Tuple{UInt32, UInt32, UInt32}}

Return the number of threadgroups per grid.

source
Metal.threads_per_grid_1dFunction
threads_per_grid_1d()::UInt32
+threadgroups_per_grid_3d()::NamedTuple{(:x, :y, :z), Tuple{UInt32, UInt32, UInt32}}

Return the number of threadgroups per grid.

source
Metal.threads_per_grid_1dFunction
threads_per_grid_1d()::UInt32
 threads_per_grid_2d()::NamedTuple{(:x, :y), Tuple{UInt32, UInt32}}
-threads_per_grid_3d()::NamedTuple{(:x, :y, :z), Tuple{UInt32, UInt32, UInt32}}

Return the grid size.

source
Metal.threads_per_threadgroup_1dFunction
threads_per_threadgroup_1d()::UInt32
 threads_per_threadgroup_2d()::NamedTuple{(:x, :y), Tuple{UInt32, UInt32}}
-threads_per_threadgroup_3d()::NamedTuple{(:x, :y, :z), Tuple{UInt32, UInt32, UInt32}}

Return the number of threads per threadgroup.

source
Metal.grid_size_1dFunction
grid_size_1d()::UInt32
+threads_per_threadgroup_3d()::NamedTuple{(:x, :y, :z), Tuple{UInt32, UInt32, UInt32}}

Return the number of threads per threadgroup.

source
Metal.grid_size_1dFunction
grid_size_1d()::UInt32
 grid_size_2d()::NamedTuple{(:x, :y), Tuple{UInt32, UInt32}}
-grid_size_3d()::NamedTuple{(:x, :y, :z), Tuple{UInt32, UInt32, UInt32}}

Return the maximum size of the grid for threads that read per-thread stage-in data.

source
Metal.grid_origin_1dFunction
grid_origin_1d()::UInt32
+grid_size_3d()::NamedTuple{(:x, :y, :z), Tuple{UInt32, UInt32, UInt32}}

Return the maximum size of the grid for threads that read per-thread stage-in data.

source
Metal.grid_origin_1dFunction
grid_origin_1d()::UInt32
 grid_origin_2d()::NamedTuple{(:x, :y), Tuple{UInt32, UInt32}}
-grid_origin_3d()::NamedTuple{(:x, :y, :z), Tuple{UInt32, UInt32, UInt32}}

Return the origin offset of the grid for threads that read per-thread stage-in data.

source

Device arrays

Metal.jl provides a primitive, lightweight array type to manage GPU data organized in a plain, dense fashion. This is the device counterpart to MtlArray, and implements (part of) the array interface as well as other functionality for use on the GPU:

Metal.MtlDeviceArrayType
MtlDeviceArray(dims, ptr)
+grid_origin_3d()::NamedTuple{(:x, :y, :z), Tuple{UInt32, UInt32, UInt32}}

Return the origin offset of the grid for threads that read per-thread stage-in data.

source
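To make the indexing intrinsics concrete, a sketch of a 2D kernel (it assumes the returned positions are 1-based, matching their direct use as array indices elsewhere in these docs):

function fill_coords(out)
    pos = thread_position_in_grid_2d()          # NamedTuple with fields x and y
    out[pos.x, pos.y] = Float32(pos.x + pos.y)  # assumes 1-based positions
    return
end

out = MtlArray{Float32}(undef, 8, 8)
@metal threads=(8, 8) fill_coords(out)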

Device arrays

Metal.jl provides a primitive, lightweight array type to manage GPU data organized in a plain, dense fashion. This is the device counterpart to MtlArray, and implements (part of) the array interface as well as other functionality for use on the GPU:

Metal.MtlDeviceArrayType
MtlDeviceArray(dims, ptr)
 MtlDeviceArray{T}(dims, ptr)
 MtlDeviceArray{T,A}(dims, ptr)
-MtlDeviceArray{T,A,N}(dims, ptr)

Construct an N-dimensional dense Metal device array with element type T wrapping a pointer, where N is determined from the length of dims and T is determined from the type of ptr.

dims may be a single scalar, or a tuple of integers corresponding to the lengths in each dimension. If the rank N is supplied explicitly, as in Array{T,N}(dims), then it must match the length of dims. The same applies to the element type T, which should match the type of the pointer ptr.

source
Metal.ConstType
Const(A::MtlDeviceArray)

Mark an MtlDeviceArray as constant/read-only, causing it to use the constant address space.

Warning

Experimental API. Subject to change without deprecation.

source

Shared memory

Metal.MtlThreadGroupArrayFunction
MtlThreadGroupArray(::Type{T}, dims)

Create an array local to each threadgroup launched during kernel execution.

source

Synchronization

Metal.MemoryFlagsType
MemoryFlags

Flags to set the memory synchronization behavior of threadgroup_barrier and simdgroup_barrier.

Possible values:

None: Set barriers to only act as an execution barrier and not apply a memory fence.
+MtlDeviceArray{T,A,N}(dims, ptr)

Construct an N-dimensional dense Metal device array with element type T wrapping a pointer, where N is determined from the length of dims and T is determined from the type of ptr.

dims may be a single scalar, or a tuple of integers corresponding to the lengths in each dimension. If the rank N is supplied explicitly, as in Array{T,N}(dims), then it must match the length of dims. The same applies to the element type T, which should match the type of the pointer ptr.

source
Metal.ConstType
Const(A::MtlDeviceArray)

Mark an MtlDeviceArray as constant/read-only, causing it to use the constant address space.

Warning

Experimental API. Subject to change without deprecation.

source

Shared memory

Metal.MtlThreadGroupArrayFunction
MtlThreadGroupArray(::Type{T}, dims)

Create an array local to each threadgroup launched during kernel execution.

source

Synchronization

Metal.MemoryFlagsType
MemoryFlags

Flags to set the memory synchronization behavior of threadgroup_barrier and simdgroup_barrier.

Possible values:

None: Set barriers to only act as an execution barrier and not apply a memory fence.
 
 Device: Ensure the GPU correctly orders the memory operations to device memory
         for threads in the threadgroup or simdgroup.
@@ -30,4 +30,4 @@
         threads in a threadgroup or simdgroup for a texture with the read_write access qualifier.
 
 ThreadGroup_ImgBlock: Ensure the GPU correctly orders the memory operations to threadgroup imageblock memory
-        for threads in a threadgroup or simdgroup.
source
Metal.threadgroup_barrierFunction
threadgroup_barrier(flag::MemoryFlags=MemoryFlagNone)

Synchronize all threads in a threadgroup.

Possible flags that affect the memory synchronization behavior are found in MemoryFlags.

source
Metal.simdgroup_barrierFunction
simdgroup_barrier(flag::MemoryFlags=MemoryFlagNone)

Synchronize all threads in a SIMD-group.

Possible flags that affect the memory synchronization behavior are found in MemoryFlags.

source
+ for threads in a threadgroup or simdgroup.
source
Metal.threadgroup_barrierFunction
threadgroup_barrier(flag::MemoryFlags=MemoryFlagNone)

Synchronize all threads in a threadgroup.

Possible flags that affect the memory synchronization behavior are found in MemoryFlags.

source
Metal.simdgroup_barrierFunction
simdgroup_barrier(flag::MemoryFlags=MemoryFlagNone)

Synchronize all threads in a SIMD-group.

Possible flags that affect the memory synchronization behavior are found in MemoryFlags.

source
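A sketch combining MtlThreadGroupArray and threadgroup_barrier to reverse each threadgroup's elements. The flag name Metal.MemoryFlagThreadGroup is an assumption based on the MemoryFlags values listed above, and the intrinsics are assumed to be 1-based as elsewhere in these docs:

function reverse_threadgroup(a)
    i = thread_index_in_threadgroup()                 # assumed 1-based
    n = threads_per_threadgroup_1d()
    shared = MtlThreadGroupArray(Float32, 256)
    shared[i] = a[i]
    threadgroup_barrier(Metal.MemoryFlagThreadGroup)  # make all threadgroup writes visible
    a[i] = shared[n - i + 1]
    return
end

a = MtlArray(rand(Float32, 256))
@metal threads=256 reverse_threadgroup(a)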
diff --git a/dev/api/mps/index.html b/dev/api/mps/index.html
index 754bcd11..be6a7175 100644
--- a/dev/api/mps/index.html
+++ b/dev/api/mps/index.html
@@ -1,4 +1,4 @@
-Metal Performance Shaders · Metal.jl

Metal Performance Shaders

This section lists the package's public functionality that corresponds to the Metal Performance Shaders functions. For more information about these functions, or to see which functions have yet to be implemented in this package, please consult the Metal Performance Shaders Documentation.

Matrices and Vectors

Metal.MPS.MPSMatrixType
MPSMatrix(mat::MtlMatrix)

Metal matrix representation used in Performance Shaders.

Note that this results in a transposed view of the input, as Metal stores matrices row-major instead of column-major.

source
MPSMatrix(vec::MtlVector)

Metal matrix representation used in Performance Shaders.

Note that this results in a transposed view of the input, as Metal stores matrices row-major instead of column-major.

source
MPSMatrix(arr::MtlArray{T,3})

Metal batched matrix representation used in Performance Shaders.

Note that this results in a transposed view of the input, as Metal stores matrices row-major instead of column-major.

source

Matrix Arithmetic Operators

Metal.MPS.matmul!Function
matMulMPS(a::MtlMatrix, b::MtlMatrix, c::MtlMatrix, alpha=1, beta=1,
-          transpose_left=false, transpose_right=false)

An MPSMatrixMultiplication kernel that computes: c = alpha * op(a) * op(b) + beta * c

This function should not typically be used. Rather, use the normal LinearAlgebra interface with any MtlArray and it should be accelerated using Metal Performance Shaders.

source
Metal.MPS.matvecmul!Function
matvecmul!(c::MtlVector, a::MtlMatrix, b::MtlVector, alpha=1, beta=1, transpose=false)

An MPSMatrixVectorMultiplication kernel that computes: c = alpha * op(a) * b + beta * c

This function should not typically be used. Rather, use the normal LinearAlgebra interface with any MtlArray and it should be accelerated using Metal Performance Shaders.

source
Metal.MPS.topkFunction
MPS.topk(A::MtlMatrix{T}, k) where {T<:MtlFloat}

Compute the top k values and their corresponding indices column-wise in a matrix A. Return the indices in I and the values in V.

k cannot be greater than 16.

Uses MPSMatrixFindTopK.

See also: topk!.

Warning

This interface is experimental, and might change without warning.

source
Metal.MPS.topk!Function
MPS.topk!(A::MtlMatrix{T}, I::MtlMatrix{Int32}, V::MtlMatrix{T}, k)
-                                                 where {T<:MtlFloat}

Compute the top k values and their corresponding indices column-wise in a matrix A. Return the indices in I and the values in V.

k cannot be greater than 16.

Uses MPSMatrixFindTopK.

See also: topk.

Warning

This interface is experimental, and might change without warning.

source
+Metal Performance Shaders · Metal.jl

Metal Performance Shaders

This section lists the package's public functionality that corresponds to the Metal Performance Shaders functions. For more information about these functions, or to see which functions have yet to be implemented in this package, please consult the Metal Performance Shaders Documentation.

Matrices and Vectors

Metal.MPS.MPSMatrixType
MPSMatrix(mat::MtlMatrix)

Metal matrix representation used in Performance Shaders.

Note that this results in a transposed view of the input, as Metal stores matrices row-major instead of column-major.

source
MPSMatrix(vec::MtlVector)

Metal matrix representation used in Performance Shaders.

Note that this results in a transposed view of the input, as Metal stores matrices row-major instead of column-major.

source
MPSMatrix(arr::MtlArray{T,3})

Metal batched matrix representation used in Performance Shaders.

Note that this results in a transposed view of the input, as Metal stores matrices row-major instead of column-major.

source
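For illustration, a sketch of wrapping an existing MtlMatrix (keeping in mind the transposed view noted above):

using Metal

A = MtlArray(rand(Float32, 3, 4))
mat = MPS.MPSMatrix(A)  # row-major view: dimensions appear swapped relative to A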

Matrix Arithmetic Operators

Metal.MPS.matmul!Function
matMulMPS(a::MtlMatrix, b::MtlMatrix, c::MtlMatrix, alpha=1, beta=1,
+          transpose_left=false, transpose_right=false)

An MPSMatrixMultiplication kernel that computes: c = alpha * op(a) * op(b) + beta * c

This function should not typically be used. Rather, use the normal LinearAlgebra interface with any MtlArray and it should be accelerated using Metal Performance Shaders.

source
Metal.MPS.matvecmul!Function
matvecmul!(c::MtlVector, a::MtlMatrix, b::MtlVector, alpha=1, beta=1, transpose=false)

An MPSMatrixVectorMultiplication kernel that computes: c = alpha * op(a) * b + beta * c

This function should not typically be used. Rather, use the normal LinearAlgebra interface with any MtlArray and it should be accelerated using Metal Performance Shaders.

source
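As these docstrings recommend, prefer the LinearAlgebra interface; a sketch:

using Metal, LinearAlgebra

A = MtlArray(rand(Float32, 8, 8))
B = MtlArray(rand(Float32, 8, 8))
C = similar(A)
mul!(C, A, B)  # dispatches to the MPS-accelerated kernel where supported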
Metal.MPS.topkFunction
MPS.topk(A::MtlMatrix{T}, k) where {T<:MtlFloat}

Compute the top k values and their corresponding indices column-wise in a matrix A. Return the indices in I and the values in V.

k cannot be greater than 16.

Uses MPSMatrixFindTopK.

See also: topk!.

Warning

This interface is experimental, and might change without warning.

source
Metal.MPS.topk!Function
MPS.topk!(A::MtlMatrix{T}, I::MtlMatrix{Int32}, V::MtlMatrix{T}, k)
+                                                 where {T<:MtlFloat}

Compute the top k values and their corresponding indices column-wise in a matrix A. Return the indices in I and the values in V.

k cannot be greater than 16.

Uses MPSMatrixFindTopK.

See also: topk.

Warning

This interface is experimental, and might change without warning.

source
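A sketch of this experimental top-k interface; the returned pair of index and value matrices is inferred from the description above:

A = MtlArray(rand(Float32, 32, 4))
inds, vals = MPS.topk(A, 5)  # top 5 values per column; k must not exceed 16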
diff --git a/dev/faq/contributing/index.html b/dev/faq/contributing/index.html
index 127045df..bedc1c07 100644
--- a/dev/faq/contributing/index.html
+++ b/dev/faq/contributing/index.html
@@ -7,4 +7,4 @@
                          uint i [[thread_position_in_grid]])
 {
     atomic_store_explicit(&out[i], 0.0f, memory_order_relaxed);
-}

To compile with Metal's tools and emit human-readable IR, run something roughly along the lines of: xcrun metal -S -emit-llvm dummy_kernel.metal

This will create a .ll file that you can then parse for whatever information you need. Be sure to double-check the metadata at the bottom for any significant changes your functionality introduces.

Test with different types and configurations to see what changes they cause. Also ensure that when writing very simple kernels, whatever you're interested in doesn't get optimized away. Double-check that the kernel's IR makes sense for what you wrote.

Metal Performance Shaders

Metal exposes a special interface to its library of optimized kernels. Rather than accepting the normal set of input GPU data structures, it requires special MPS datatypes that assume row-major memory layout. As this is not the Julia default, adapt accordingly. Adding MPS functionality should be mostly straightforward, so this can be an easy entry point to helping. To get started, you can have a look at the Metal Performance Shaders Documentation from Apple.

Exposing your Interface

There are varying degrees of user-facing interfaces in Metal.jl. At the lowest level is Metal.MTL.xxx. This is for low-level functionality close to or at bare Objective-C, or things that a normal user wouldn't directly be using. Metal.MPS.xxx is for Metal Performance Shaders specifics (like MPSMatrix). Next is Metal.xxx. This is for higher-level, usually pure-Julia functionality (like device()). The only thing beyond this is exporting into the global namespace, which is useful for uniquely-named functions/structures/macros with clear and common use-cases (MtlArray or @metal).

Additionally, you can override non-Metal.jl functions like LinearAlgebra.mul! seen here. This is essentially (ab)using multiple dispatch to specialize for certain cases (usually for more performant execution).

If your function is only available from within GPU kernels (like thread indexing intrinsics), be sure to properly annotate it with @device_function to ensure that calling it from the host doesn't kill your Julia process.

Generally, think about how frequently you expect your addition to be used, how complex its use-case is, and whether or not it clashes/reimplements/optimizes existing functionality from outside Metal.jl. Put it behind the corresponding interface.

Creating Tests

As it's good practice, and JuliaGPU has great CI/CD workflows, your addition should have associated tests to ensure correctness and cover edge cases. Look to existing examples under the test folder for initial guidance, and be sure to create tests for all valid types. Any new Julia file in this folder will be run as its own testset. If you feel your tests don't fit in any existing place, you'll probably want to create a new file with an appropriate name.

Running a Subset of the Existing Tests

Sometimes you won't want to run the entire testsuite. You may just want to run the tests for your new functionality. To do that, you can either pass the name of the testset to the test/runtests.jl script (julia --project=test test/runtests.jl metal), or isolate test files by running them alone after running the test/setup.jl script (julia --project=test -L test/setup.jl test/metal.jl).

Thank You and Good Luck

Open-source projects like this only happen because people like you are willing to spend their free time helping out. Almost anything you're able to do is helpful, but if you get stuck, seek guidance from Slack or Discourse. Don't feel like your contribution has to be perfect. If you put in effort and make progress, there will likely be some senior developer willing to polish your code before merging. Open-source software is a team effort...welcome to the team!

+}

To compile with Metal's tools and emit human-readable IR, run something roughly along the lines of: xcrun metal -S -emit-llvm dummy_kernel.metal

This will create a .ll file that you can then parse for whatever information you need. Be sure to double-check the metadata at the bottom for any significant changes your functionality introduces.

Test with different types and configurations to see what changes they cause. Also ensure that when writing very simple kernels, whatever you're interested in doesn't get optimized away. Double-check that the kernel's IR makes sense for what you wrote.

Metal Performance Shaders

Metal exposes a special interface to its library of optimized kernels. Rather than accepting the normal set of input GPU data structures, it requires special MPS datatypes that assume row-major memory layout. As this is not the Julia default, adapt accordingly. Adding MPS functionality should be mostly straightforward, so this can be an easy entry point to helping. To get started, you can have a look at the Metal Performance Shaders Documentation from Apple.

Exposing your Interface

There are varying degrees of user-facing interfaces in Metal.jl. At the lowest level is Metal.MTL.xxx. This is for low-level functionality close to or at bare Objective-C, or things that a normal user wouldn't directly be using. Metal.MPS.xxx is for Metal Performance Shaders specifics (like MPSMatrix). Next is Metal.xxx. This is for higher-level, usually pure-Julia functionality (like device()). The only thing beyond this is exporting into the global namespace, which is useful for uniquely-named functions/structures/macros with clear and common use-cases (MtlArray or @metal).

Additionally, you can override non-Metal.jl functions like LinearAlgebra.mul! seen here. This is essentially (ab)using multiple dispatch to specialize for certain cases (usually for more performant execution).

If your function is only available from within GPU kernels (like thread indexing intrinsics), be sure to properly annotate it with @device_function to ensure that calling it from the host doesn't kill your Julia process.

Generally, think about how frequently you expect your addition to be used, how complex its use-case is, and whether or not it clashes/reimplements/optimizes existing functionality from outside Metal.jl. Put it behind the corresponding interface.

Creating Tests

As it's good practice, and JuliaGPU has great CI/CD workflows, your addition should have associated tests to ensure correctness and cover edge cases. Look to existing examples under the test folder for initial guidance, and be sure to create tests for all valid types. Any new Julia file in this folder will be run as its own testset. If you feel your tests don't fit in any existing place, you'll probably want to create a new file with an appropriate name.

Running a Subset of the Existing Tests

Sometimes you won't want to run the entire testsuite. You may just want to run the tests for your new functionality. To do that, you can either pass the name of the testset to the test/runtests.jl script (julia --project=test test/runtests.jl metal), or isolate test files by running them alone after running the test/setup.jl script (julia --project=test -L test/setup.jl test/metal.jl).

Thank You and Good Luck

Open-source projects like this only happen because people like you are willing to spend their free time helping out. Almost anything you're able to do is helpful, but if you get stuck, seek guidance from Slack or Discourse. Don't feel like your contribution has to be perfect. If you put in effort and make progress, there will likely be some senior developer willing to polish your code before merging. Open-source software is a team effort...welcome to the team!

diff --git a/dev/faq/faq/index.html b/dev/faq/faq/index.html
index 63e36408..86449158 100644
--- a/dev/faq/faq/index.html
+++ b/dev/faq/faq/index.html
@@ -1,2 +1,2 @@
-Frequently Asked Questions · Metal.jl
+Frequently Asked Questions · Metal.jl
diff --git a/dev/index.html b/dev/index.html
index 7eb10692..e591deec 100644
--- a/dev/index.html
+++ b/dev/index.html
@@ -6,4 +6,4 @@
 # smoke test
 using Metal
 Metal.versioninfo()

If you want to ensure everything works as expected, you can execute the test suite.

using Pkg
-Pkg.test("Metal")

The following resources may also be of interest (although they are mainly focused on the CUDA GPU backend):

Contributing

If you want to help improve this package, look at the contributing page for more details.

Acknowledgements

The Julia Metal stack has been a collaborative effort by many individuals. Significant contributions have been made by the following individuals:

Supporting and Citing

Some of the software in this ecosystem was developed as part of academic research. If you would like to help support it, please star the repository, as such metrics may help us secure funding in the future. If you use our software as part of your research, teaching, or other activities, we would be grateful if you could cite our work. The CITATION.cff file in the root of this repository lists the relevant papers.

+Pkg.test("Metal")

The following resources may also be of interest (although they are mainly focused on the CUDA GPU backend):

Contributing

If you want to help improve this package, look at the contributing page for more details.

Acknowledgements

The Julia Metal stack has been a collaborative effort by many individuals. Significant contributions have been made by the following individuals:

Supporting and Citing

Some of the software in this ecosystem was developed as part of academic research. If you would like to help support it, please star the repository, as such metrics may help us secure funding in the future. If you use our software as part of your research, teaching, or other activities, we would be grateful if you could cite our work. The CITATION.cff file in the root of this repository lists the relevant papers.

diff --git a/dev/objects.inv b/dev/objects.inv index 98685638..641165c2 100644 Binary files a/dev/objects.inv and b/dev/objects.inv differ diff --git a/dev/profiling/index.html b/dev/profiling/index.html index 4d7766cd..a00a9906 100644 --- a/dev/profiling/index.html +++ b/dev/profiling/index.html @@ -31,4 +31,4 @@ julia> Metal.@capture @metal threads=length(c) vadd(a, b, c); ... -[ Info: GPU frame capture saved to julia_1.gputrace; open the resulting trace in Xcode +[ Info: GPU frame capture saved to julia_1.gputrace; open the resulting trace in Xcode diff --git a/dev/search_index.js b/dev/search_index.js index 16337237..56c98040 100644 --- a/dev/search_index.js +++ b/dev/search_index.js @@ -1,3 +1,3 @@ var documenterSearchIndex = {"docs": -[{"location":"usage/overview/#UsageOverview","page":"Overview","title":"Overview","text":"","category":"section"},{"location":"usage/overview/","page":"Overview","title":"Overview","text":"The Metal.jl package provides three distinct, but related, interfaces for Metal programming:","category":"page"},{"location":"usage/overview/","page":"Overview","title":"Overview","text":"the MtlArray type: for programming with arrays;\nnative kernel programming capabilities: for writing Metal kernels in Julia;\nMetal API wrappers: for low-level interactions with the Metal libraries.","category":"page"},{"location":"usage/overview/","page":"Overview","title":"Overview","text":"Much of the Julia Metal programming stack can be used by just relying on the MtlArray type, and using platform-agnostic programming patterns like broadcast and other array abstractions. Only once you hit a performance bottleneck, or some missing functionality, you might need to write a custom kernel or use the underlying Metal APIs.","category":"page"},{"location":"usage/overview/#The-MtlArray-type","page":"Overview","title":"The MtlArray type","text":"","category":"section"},{"location":"usage/overview/","page":"Overview","title":"Overview","text":"The MtlArray type is an essential part of the toolchain. Primarily, it is used to manage GPU memory, and copy data from and back to the CPU:","category":"page"},{"location":"usage/overview/","page":"Overview","title":"Overview","text":"a = MtlArray{Int}(undef, 1024)\n\n# essential memory operations, like copying, filling, reshaping, ...\nb = copy(a)\nfill!(b, 0)\n@test b == Metal.zeros(Int, 1024)\n\n# automatic memory management\na = nothing","category":"page"},{"location":"usage/overview/","page":"Overview","title":"Overview","text":"Beyond memory management, there are a whole range of array operations to process your data. This includes several higher-order operations that take other code as arguments, such as map, reduce or broadcast. 
With these, it is possible to perform kernel-like operations without actually writing your own GPU kernels:","category":"page"},{"location":"usage/overview/","page":"Overview","title":"Overview","text":"a = Metal.zeros(1024)\nb = Metal.ones(1024)\na.^2 .+ sin.(b)","category":"page"},{"location":"usage/array/#Array-programming","page":"Array programming","title":"Array programming","text":"","category":"section"},{"location":"usage/array/","page":"Array programming","title":"Array programming","text":"DocTestSetup = quote\n using Metal\nend","category":"page"},{"location":"usage/array/","page":"Array programming","title":"Array programming","text":"The easiest way to use the GPU's massive parallelism, is by expressing operations in terms of arrays: Metal.jl provides an array type, MtlArray, and many specialized array operations that execute efficiently on the GPU hardware. In this section, we will briefly demonstrate use of the MtlArray type. Since we expose Metal's functionality by implementing existing Julia interfaces on the MtlArray type, you should refer to the upstream Julia documentation for more information on these operations.","category":"page"},{"location":"usage/array/","page":"Array programming","title":"Array programming","text":"If you encounter missing functionality, or are running into operations that trigger so-called \"scalar iteration\", have a look at the issue tracker and file a new issue if there's none. Do note that you can always access the underlying Metal APIs by calling into the relevant submodule.","category":"page"},{"location":"usage/array/#Construction-and-Initialization","page":"Array programming","title":"Construction and Initialization","text":"","category":"section"},{"location":"usage/array/","page":"Array programming","title":"Array programming","text":"The MtlArray type aims to implement the AbstractArray interface, and provide implementations of methods that are commonly used when working with arrays. That means you can construct MtlArrays in the same way as regular Array objects:","category":"page"},{"location":"usage/array/","page":"Array programming","title":"Array programming","text":"julia> MtlArray{Int}(undef, 2)\n2-element MtlVector{Int64, Metal.PrivateStorage}:\n 0\n 0\n\njulia> MtlArray{Int}(undef, (1,2))\n1×2 MtlMatrix{Int64, Metal.PrivateStorage}:\n 0 0\n\njulia> similar(ans)\n1×2 MtlMatrix{Int64, Metal.PrivateStorage}:\n 0 0","category":"page"},{"location":"usage/array/","page":"Array programming","title":"Array programming","text":"Copying memory to or from the GPU can be expressed using constructors as well, or by calling copyto!:","category":"page"},{"location":"usage/array/","page":"Array programming","title":"Array programming","text":"julia> a = MtlArray([1,2])\n2-element MtlVector{Int64, Metal.PrivateStorage}:\n 1\n 2\n\njulia> b = Array(a)\n2-element Vector{Int64}:\n 1\n 2\n\njulia> copyto!(b, a)\n2-element Vector{Int64}:\n 1\n 2","category":"page"},{"location":"usage/array/#Higher-order-abstractions","page":"Array programming","title":"Higher-order abstractions","text":"","category":"section"},{"location":"usage/array/","page":"Array programming","title":"Array programming","text":"The real power of programming GPUs with arrays comes from Julia's higher-order array abstractions: Operations that take user code as an argument, and specialize execution on it. With these functions, you can often avoid having to write custom kernels. 
For example, to perform simple element-wise operations you can use map or broadcast:","category":"page"},{"location":"usage/array/","page":"Array programming","title":"Array programming","text":"julia> a = MtlArray{Float32}(undef, (1,2));\n\njulia> a .= 5\n1×2 MtlMatrix{Float32, Metal.PrivateStorage}:\n 5.0 5.0\n\njulia> map(sin, a)\n1×2 MtlMatrix{Float32, Metal.PrivateStorage}:\n -0.958924 -0.958924","category":"page"},{"location":"usage/array/","page":"Array programming","title":"Array programming","text":"To reduce the dimensionality of arrays, Metal.jl implements the various flavours of (map)reduce(dim):","category":"page"},{"location":"usage/array/","page":"Array programming","title":"Array programming","text":"julia> a = Metal.ones(2,3)\n2×3 MtlMatrix{Float32, Metal.PrivateStorage}:\n 1.0 1.0 1.0\n 1.0 1.0 1.0\n\njulia> reduce(+, a)\n6.0f0\n\njulia> mapreduce(sin, *, a; dims=2)\n2×1 MtlMatrix{Float32, Metal.PrivateStorage}:\n 0.59582335\n 0.59582335\n\njulia> b = Metal.zeros(1)\n1-element MtlVector{Float32, Metal.PrivateStorage}:\n 0.0\n\njulia> Base.mapreducedim!(identity, +, b, a)\n1×1 MtlMatrix{Float32, Metal.PrivateStorage}:\n 6.0","category":"page"},{"location":"faq/contributing/#Contributing","page":"Contributing","title":"Contributing","text":"","category":"section"},{"location":"faq/contributing/","page":"Contributing","title":"Contributing","text":"Metal.jl is an especially accessible GPU backend with the presence of GPUs on Apple's recent popular Macbooks. As a result, an average Julia user can now develop and test GPU-accelerated code locally on their laptop. If you're using this package and see a bug or want some additional functionality, this page is for you. Hopefully this information helps encourage you to contribute to the package yourself.","category":"page"},{"location":"faq/contributing/#What-needs-help?","page":"Contributing","title":"What needs help?","text":"","category":"section"},{"location":"faq/contributing/","page":"Contributing","title":"Contributing","text":"If you didn't come to this page with your own feature to add, look at the current issues in the git repo for bugs and requested functionality.","category":"page"},{"location":"faq/contributing/#I'm-a-beginner,-can-I-help?","page":"Contributing","title":"I'm a beginner, can I help?","text":"","category":"section"},{"location":"faq/contributing/","page":"Contributing","title":"Contributing","text":"Yes, but you may spend more time learning rather than directly contributing at the start. Depending on what your goals are though, this might be desirable. There are differing levels of difficulty when considering contributions to Metal.jl. If you're new to these things, check the issues for \"Good First Issue\" tags, look at the documentation for areas that could be added (beginners are especially good at detecting these sort of deficiencies), or message on the Slack #gpu channel asking for guidance.","category":"page"},{"location":"faq/contributing/","page":"Contributing","title":"Contributing","text":"Regardless, if you've never used Metal.jl before, it'd probably be best to gain some exposure to it before trying to contibute. 
You might run into bugs yourself or discover some area you'd really like to help with.","category":"page"},{"location":"faq/contributing/#General-Workflow-for-Adding-Functionality","page":"Contributing","title":"General Workflow for Adding Functionality","text":"","category":"section"},{"location":"faq/contributing/","page":"Contributing","title":"Contributing","text":"If you're adding some functionality that originates from Metal Shading Language (MSL) directly (rather than high-level Julia functionality), the workflow will likely look like the below. If you're adding something that only relies on pure Julia additions, you will skip the first two steps.","category":"page"},{"location":"faq/contributing/","page":"Contributing","title":"Contributing","text":"Create low-level, Julia wrappers for the Obj-C interface\nCreate high-level Julia structures and functionality\nCreate tests for added functionality","category":"page"},{"location":"faq/contributing/#Mapping-to-Metal-Intrinsics","page":"Contributing","title":"Mapping to Metal Intrinsics","text":"","category":"section"},{"location":"faq/contributing/","page":"Contributing","title":"Contributing","text":"Some Metal functions map directly to Apple intermediate representation intrinsics. In this case, wrapping them into Metal.jl is relatively easy. All that needs to be done is to create a mapping from a Julia function via a simple ccall. See the threadgroup barrier implementation for reference.","category":"page"},{"location":"faq/contributing/","page":"Contributing","title":"Contributing","text":"However, the Metal documentation doesn't tell you what the format of the intrinsic names should be. To find this out, you need to create your own test kernel directly in the Metal Shading Language, compile it using Apple's tooling, then view the created intermediate representation (IR).","category":"page"},{"location":"faq/contributing/#Reverse-Engineering-Bare-MSL/Apple-IR","page":"Contributing","title":"Reverse-Engineering Bare MSL/Apple IR","text":"","category":"section"},{"location":"faq/contributing/","page":"Contributing","title":"Contributing","text":"First, you need to write an MSL kernel that uses the functionality you're interested in. For example,","category":"page"},{"location":"faq/contributing/","page":"Contributing","title":"Contributing","text":"#include \n\nusing namespace metal;\n\nkernel void dummy_kernel(device volatile atomic_float* out,\n uint i [[thread_position_in_grid]])\n{\n atomic_store_explicit(&out[i], 0.0f, memory_order_relaxed);\n}","category":"page"},{"location":"faq/contributing/","page":"Contributing","title":"Contributing","text":"To compile with Metal's tools and emit human-readable IR, run something roughly along the lines of: xcrun metal -S -emit-llvm dummy_kernel.metal","category":"page"},{"location":"faq/contributing/","page":"Contributing","title":"Contributing","text":"This will create a .ll file that you can then parse for whatever information you need. Be sure to double-check the metadata at the bottom for any significant changes your functionality introduces.","category":"page"},{"location":"faq/contributing/","page":"Contributing","title":"Contributing","text":"Test with different types and configurations to see what changes are caused. Also ensure that when writing very simple kernels, whatever you're interested in doesn't get optimized away. 
Double-check that the kernel's IR makes sense for what you wrote.","category":"page"},{"location":"faq/contributing/#Metal-Performance-Shaders","page":"Contributing","title":"Metal Performance Shaders","text":"","category":"section"},{"location":"faq/contributing/","page":"Contributing","title":"Contributing","text":"Metal exposes a special interface to its library of optimized kernels. Rather than accepting the normal set of input GPU data structures, it requires special MPS datatypes that assume row-major memory layout. As this is not the Julia default, adapt accordingly. Adding MPS functionality should be mostly straightforward, so this can be an easy entry point to helping. To get started, you can have a look at the Metal Performance Shaders Documentation from Apple.","category":"page"},{"location":"faq/contributing/#Exposing-your-Interface","page":"Contributing","title":"Exposing your Interface","text":"","category":"section"},{"location":"faq/contributing/","page":"Contributing","title":"Contributing","text":"There are varying degrees of user-facing interfaces from Metal.jl. At the lowest level is Metal.MTL.xxx. This is for low-level functionality close to or at bare Objective-C, or things that a normal user wouldn't directly be using. Metal.MPS.xxx is for Metal Performance Shader specifics (like MPSMatrix). Next, is Metal.xxx. This is for higher-level, usually pure-Julian functionality (like device()). The only thing beyond this is exporting into the global namespace. That would be useful for uniquely-named functions/structures/macros with clear and common use-cases (MtlArray or @metal).","category":"page"},{"location":"faq/contributing/","page":"Contributing","title":"Contributing","text":"Additionally, you can override non-Metal.jl functions like LinearAlgebra.mul! seen here. This is essentially (ab)using multiple dispatch to specialize for certain cases (usually for more performant execution).","category":"page"},{"location":"faq/contributing/","page":"Contributing","title":"Contributing","text":"If your function is only available from within GPU kernels (like thread indexing intrinsics). Be sure to properly annotate with @device_function to ensure that calling from the host doesn't kill your Julia process.","category":"page"},{"location":"faq/contributing/","page":"Contributing","title":"Contributing","text":"Generally, think about how frequently you expect your addition to be used, how complex its use-case is, and whether or not it clashes/reimplements/optimizes existing functionality from outside Metal.jl. Put it behind the corresponding interface.","category":"page"},{"location":"faq/contributing/#Creating-Tests","page":"Contributing","title":"Creating Tests","text":"","category":"section"},{"location":"faq/contributing/","page":"Contributing","title":"Contributing","text":"As it's good practice, and JuliaGPU has great CI/CD workflows, your addition should have associated tests to ensure correctness and edge cases. Look to existing examples under the test folder for initial guidance, and be sure to create tests for all valid types. Any new Julia file in this folder will be ran as its own testset. 
If you feel your tests don't fit in any existing place, you'll probably want to create a new file with an appropriate name.","category":"page"},{"location":"faq/contributing/#Running-a-Subset-of-the-Existing-Tests","page":"Contributing","title":"Running a Subset of the Existing Tests","text":"","category":"section"},{"location":"faq/contributing/","page":"Contributing","title":"Contributing","text":"Sometimes you won't want to run the entire testsuite. You may just want to run the tests for your new functionality. To do that, you can either pass the name of the testset to the test/runtests.jl script: julia --project=test test/runtests.jl metal or you can isolate test files by running them alone after running the test/setup.jl script: julia --project=test -L test/setup.jl test/metal.jl","category":"page"},{"location":"faq/contributing/#Thank-You-and-Good-Luck","page":"Contributing","title":"Thank You and Good Luck","text":"","category":"section"},{"location":"faq/contributing/","page":"Contributing","title":"Contributing","text":"Open-source projects like this only happen because people like you are willing to spend their free time helping out. Most anything you're able to do is helpful, but if you get stuck, seek guidance from Slack or Discourse. Don't feel like your contribution has to be perfect. If you put in effort and make progress, there will likely be some senior developer willing to polish your code before merging. Open-source software is a team effort...welcome to the team!","category":"page"},{"location":"api/kernel/#Kernel-programming","page":"Kernel programming","title":"Kernel programming","text":"","category":"section"},{"location":"api/kernel/","page":"Kernel programming","title":"Kernel programming","text":"This section lists the package's public functionality that corresponds to special Metal functions for use in device code. For more information about these functions, please consult the Metal Shading Language specification.","category":"page"},{"location":"api/kernel/","page":"Kernel programming","title":"Kernel programming","text":"This is made possible by interfacing with the Metal libraries by wrapping a subset of the ObjectiveC APIs using ObjectiveC.jl. 
These low-level wrappers are available in the MTL submodule exported by Metal.jl.","category":"page"},{"location":"api/kernel/#Indexing-and-dimensions","page":"Kernel programming","title":"Indexing and dimensions","text":"","category":"section"},{"location":"api/kernel/","page":"Kernel programming","title":"Kernel programming","text":"thread_execution_width\nthread_index_in_quadgroup\nthread_index_in_simdgroup\nthread_index_in_threadgroup\nthread_position_in_grid_1d\nthread_position_in_threadgroup_1d\nthreadgroup_position_in_grid_1d\nthreadgroups_per_grid_1d\nthreads_per_grid_1d\nthreads_per_simdgroup\nthreads_per_threadgroup_1d\nsimdgroups_per_threadgroup\nsimdgroup_index_in_threadgroup\nquadgroup_index_in_threadgroup\nquadgroups_per_threadgroup\ngrid_size_1d\ngrid_origin_1d","category":"page"},{"location":"api/kernel/#Metal.thread_execution_width","page":"Kernel programming","title":"Metal.thread_execution_width","text":"thread_execution_width()::UInt32\n\nReturn the execution width of the compute unit.\n\n\n\n\n\n","category":"function"},{"location":"api/kernel/#Metal.thread_index_in_quadgroup","page":"Kernel programming","title":"Metal.thread_index_in_quadgroup","text":"thread_index_in_quadgroup()::UInt32\n\nReturn the index of the current thread in its quadgroup.\n\n\n\n\n\n","category":"function"},{"location":"api/kernel/#Metal.thread_index_in_simdgroup","page":"Kernel programming","title":"Metal.thread_index_in_simdgroup","text":"thread_index_in_simdgroup()::UInt32\n\nReturn the index of the current thread in its simdgroup.\n\n\n\n\n\n","category":"function"},{"location":"api/kernel/#Metal.thread_index_in_threadgroup","page":"Kernel programming","title":"Metal.thread_index_in_threadgroup","text":"thread_index_in_threadgroup()::UInt32\n\nReturn the index of the current thread in its threadgroup.\n\n\n\n\n\n","category":"function"},{"location":"api/kernel/#Metal.thread_position_in_grid_1d","page":"Kernel programming","title":"Metal.thread_position_in_grid_1d","text":"thread_position_in_grid_1d()::UInt32\nthread_position_in_grid_2d()::NamedTuple{(:x, :y), Tuple{UInt32, UInt32}}\nthread_position_in_grid_3d()::NamedTuple{(:x, :y, :z), Tuple{UInt32, UInt32, UInt32}}\n\nReturn the current thread's position in an N-dimensional grid of threads.\n\n\n\n\n\n","category":"function"},{"location":"api/kernel/#Metal.thread_position_in_threadgroup_1d","page":"Kernel programming","title":"Metal.thread_position_in_threadgroup_1d","text":"thread_position_in_threadgroup_1d()::UInt32\nthread_position_in_threadgroup_2d()::NamedTuple{(:x, :y), Tuple{UInt32, UInt32}}\nthread_position_in_threadgroup_3d()::NamedTuple{(:x, :y, :z), Tuple{UInt32, UInt32, UInt32}}\n\nReturn the current thread's unique position within a threadgroup.\n\n\n\n\n\n","category":"function"},{"location":"api/kernel/#Metal.threadgroup_position_in_grid_1d","page":"Kernel programming","title":"Metal.threadgroup_position_in_grid_1d","text":"threadgroup_position_in_grid_1d()::UInt32\nthreadgroup_position_in_grid_2d()::NamedTuple{(:x, :y), Tuple{UInt32, UInt32}}\nthreadgroup_position_in_grid_3d()::NamedTuple{(:x, :y, :z), Tuple{UInt32, UInt32, UInt32}}\n\nReturn the current threadgroup's unique position within the grid.\n\n\n\n\n\n","category":"function"},{"location":"api/kernel/#Metal.threadgroups_per_grid_1d","page":"Kernel programming","title":"Metal.threadgroups_per_grid_1d","text":"threadgroups_per_grid_1d()::UInt32\nthreadgroups_per_grid_2d()::NamedTuple{(:x, :y), Tuple{UInt32, UInt32}}\nthreadgroups_per_grid_3d()::NamedTuple{(:x, :y, 
:z), Tuple{UInt32, UInt32, UInt32}}\n\nReturn the number of threadgroups per grid.\n\n\n\n\n\n","category":"function"},{"location":"api/kernel/#Metal.threads_per_grid_1d","page":"Kernel programming","title":"Metal.threads_per_grid_1d","text":"threads_per_grid_1d()::UInt32\nthreads_per_grid_2d()::NamedTuple{(:x, :y), Tuple{UInt32, UInt32}}\nthreads_per_grid_3d()::NamedTuple{(:x, :y, :z), Tuple{UInt32, UInt32, UInt32}}\n\nReturn the grid size.\n\n\n\n\n\n","category":"function"},{"location":"api/kernel/#Metal.threads_per_simdgroup","page":"Kernel programming","title":"Metal.threads_per_simdgroup","text":"threads_per_simdgroup()::UInt32\n\nReturn the thread execution width of a simdgroup.\n\n\n\n\n\n","category":"function"},{"location":"api/kernel/#Metal.threads_per_threadgroup_1d","page":"Kernel programming","title":"Metal.threads_per_threadgroup_1d","text":"threads_per_threadgroup_1d()::UInt32\nthreads_per_threadgroup_2d()::NamedTuple{(:x, :y), Tuple{UInt32, UInt32}}\nthreads_per_threadgroup_3d()::NamedTuple{(:x, :y, :z), Tuple{UInt32, UInt32, UInt32}}\n\nReturn the thread execution width of a threadgroup.\n\n\n\n\n\n","category":"function"},{"location":"api/kernel/#Metal.simdgroups_per_threadgroup","page":"Kernel programming","title":"Metal.simdgroups_per_threadgroup","text":"simdgroups_per_threadgroup()::UInt32\n\nReturn the simdgroup execution width of a threadgroup.\n\n\n\n\n\n","category":"function"},{"location":"api/kernel/#Metal.simdgroup_index_in_threadgroup","page":"Kernel programming","title":"Metal.simdgroup_index_in_threadgroup","text":"simdgroup_index_in_threadgroup()::UInt32\n\nReturn the index of a simdgroup within a threadgroup.\n\n\n\n\n\n","category":"function"},{"location":"api/kernel/#Metal.quadgroup_index_in_threadgroup","page":"Kernel programming","title":"Metal.quadgroup_index_in_threadgroup","text":"quadgroup_index_in_threadgroup()::UInt32\n\nReturn the index of a quadgroup within a threadgroup.\n\n\n\n\n\n","category":"function"},{"location":"api/kernel/#Metal.quadgroups_per_threadgroup","page":"Kernel programming","title":"Metal.quadgroups_per_threadgroup","text":"quadgroups_per_threadgroup()::UInt32\n\nReturn the quadgroup execution width of a threadgroup.\n\n\n\n\n\n","category":"function"},{"location":"api/kernel/#Metal.grid_size_1d","page":"Kernel programming","title":"Metal.grid_size_1d","text":"grid_size_1d()::UInt32\ngrid_size_2d()::NamedTuple{(:x, :y), Tuple{UInt32, UInt32}}\ngrid_size_3d()::NamedTuple{(:x, :y, :z), Tuple{UInt32, UInt32, UInt32}}\n\nReturn the maximum size of the grid for threads that read per-thread stage-in data.\n\n\n\n\n\n","category":"function"},{"location":"api/kernel/#Metal.grid_origin_1d","page":"Kernel programming","title":"Metal.grid_origin_1d","text":"grid_origin_1d()::UInt32\ngrid_origin_2d()::NamedTuple{(:x, :y), Tuple{UInt32, UInt32}}\ngrid_origin_3d()::NamedTuple{(:x, :y, :z), Tuple{UInt32, UInt32, UInt32}}\n\nReturn the origin offset of the grid for threads that read per-thread stage-in data.\n\n\n\n\n\n","category":"function"},{"location":"api/kernel/#Device-arrays","page":"Kernel programming","title":"Device arrays","text":"","category":"section"},{"location":"api/kernel/","page":"Kernel programming","title":"Kernel programming","text":"Metal.jl provides a primitive, lightweight array type to manage GPU data organized in a plain, dense fashion. 
This is the device counterpart to the MtlArray, and implements (part of) the array interface as well as other functionality for use on the GPU:","category":"page"},{"location":"api/kernel/","page":"Kernel programming","title":"Kernel programming","text":"MtlDeviceArray\nMetal.Const","category":"page"},{"location":"api/kernel/#Metal.MtlDeviceArray","page":"Kernel programming","title":"Metal.MtlDeviceArray","text":"MtlDeviceArray(dims, ptr)\nMtlDeviceArray{T}(dims, ptr)\nMtlDeviceArray{T,A}(dims, ptr)\nMtlDeviceArray{T,A,N}(dims, ptr)\n\nConstruct an N-dimensional dense Metal device array with element type T wrapping a pointer, where N is determined from the length of dims and T is determined from the type of ptr.\n\ndims may be a single scalar, or a tuple of integers corresponding to the lengths in each dimension. If the rank N is supplied explicitly as in Array{T,N}(dims), then it must match the length of dims. The same applies to the element type T, which should match the type of the pointer ptr.\n\n\n\n\n\n","category":"type"},{"location":"api/kernel/#Metal.Const","page":"Kernel programming","title":"Metal.Const","text":"Const(A::MtlDeviceArray)\n\nMark a MtlDeviceArray as constant/read-only, making it use the constant address space.\n\nwarning: Warning\nExperimental API. Subject to change without deprecation.\n\n\n\n\n\n","category":"type"},{"location":"api/kernel/#Shared-memory","page":"Kernel programming","title":"Shared memory","text":"","category":"section"},{"location":"api/kernel/","page":"Kernel programming","title":"Kernel programming","text":"MtlThreadGroupArray","category":"page"},{"location":"api/kernel/#Metal.MtlThreadGroupArray","page":"Kernel programming","title":"Metal.MtlThreadGroupArray","text":"MtlThreadGroupArray(::Type{T}, dims)\n\nCreate an array local to each threadgroup launched during kernel execution.\n\n\n\n\n\n","category":"function"},{"location":"api/kernel/#Synchronization","page":"Kernel programming","title":"Synchronization","text":"","category":"section"},{"location":"api/kernel/","page":"Kernel programming","title":"Kernel programming","text":"MemoryFlags\nthreadgroup_barrier\nsimdgroup_barrier","category":"page"},{"location":"api/kernel/#Metal.MemoryFlags","page":"Kernel programming","title":"Metal.MemoryFlags","text":"MemoryFlags\n\nFlags to set the memory synchronization behavior of threadgroup_barrier and simdgroup_barrier.\n\nPossible values:\n\nNone: Set barriers to only act as an execution barrier and not apply a memory fence.\n\nDevice: Ensure the GPU correctly orders the memory operations to device memory\n for threads in the threadgroup or simdgroup.\n\nThreadGroup: Ensure the GPU correctly orders the memory operations to threadgroup\n memory for threads in a threadgroup or simdgroup.\n\nTexture: Ensure the GPU correctly orders the memory operations to texture memory for\n threads in a threadgroup or simdgroup for a texture with the read_write access qualifier.\n\nThreadGroup_ImgBlock: Ensure the GPU correctly orders the memory operations to threadgroup imageblock memory\n for threads in a threadgroup or simdgroup.\n\n\n\n\n\n","category":"type"},{"location":"api/kernel/#Metal.threadgroup_barrier","page":"Kernel programming","title":"Metal.threadgroup_barrier","text":"threadgroup_barrier(flag::MemoryFlags=MemoryFlagNone)\n\nSynchronize all threads in a threadgroup.\n\nPossible flags that affect the memory synchronization behavior are found in 
MemoryFlags.\n\n\n\n\n\n","category":"function"},{"location":"api/kernel/#Metal.simdgroup_barrier","page":"Kernel programming","title":"Metal.simdgroup_barrier","text":"simdgroup_barrier(flag::MemoryFlags=MemoryFlagNone)\n\nSynchronize all threads in a SIMD-group.\n\nPossible flags that affect the memory synchronization behavior are found in MemoryFlags.\n\n\n\n\n\n","category":"function"},{"location":"faq/faq/#Frequently-Asked-Questions","page":"Frequently Asked Questions","title":"Frequently Asked Questions","text":"","category":"section"},{"location":"faq/faq/#Can-you-wrap-this-Metal-API?","page":"Frequently Asked Questions","title":"Can you wrap this Metal API?","text":"","category":"section"},{"location":"faq/faq/","page":"Frequently Asked Questions","title":"Frequently Asked Questions","text":"Most likely. Any help on designing or implementing high-level wrappers for MSL's low-level functionality is greatly appreciated, so please consider contributing your uses of these APIs on the respective repositories.","category":"page"},{"location":"api/mps/#Metal-Performance-Shaders","page":"Metal Performance Shaders","title":"Metal Performance Shaders","text":"","category":"section"},{"location":"api/mps/","page":"Metal Performance Shaders","title":"Metal Performance Shaders","text":"This section lists the package's public functionality that corresponds to the Metal Performance Shaders functions. For more information about these functions, or to see which functions have yet to be implemented in this package, please consult the Metal Performance Shaders Documentation.","category":"page"},{"location":"api/mps/#Matrices-and-Vectors","page":"Metal Performance Shaders","title":"Matrices and Vectors","text":"","category":"section"},{"location":"api/mps/","page":"Metal Performance Shaders","title":"Metal Performance Shaders","text":"MPS.MPSMatrix\nMPS.MPSVector","category":"page"},{"location":"api/mps/#Metal.MPS.MPSMatrix","page":"Metal Performance Shaders","title":"Metal.MPS.MPSMatrix","text":"MPSMatrix(mat::MtlMatrix)\n\nMetal matrix representation used in Performance Shaders.\n\nNote that this results in a transposed view of the input, as Metal stores matrices row-major instead of column-major.\n\n\n\n\n\nMPSMatrix(vec::MtlVector)\n\nMetal matrix representation used in Performance Shaders.\n\nNote that this results in a transposed view of the input, as Metal stores matrices row-major instead of column-major.\n\n\n\n\n\nMPSMatrix(arr::MtlArray{T,3})\n\nMetal batched matrix representation used in Performance Shaders.\n\nNote that this results in a transposed view of the input, as Metal stores matrices row-major instead of column-major.\n\n\n\n\n\n","category":"type"},{"location":"api/mps/#Metal.MPS.MPSVector","page":"Metal Performance Shaders","title":"Metal.MPS.MPSVector","text":"MPSVector(arr::MtlVector)\n\nMetal vector representation used in Performance Shaders.\n\n\n\n\n\n","category":"type"},{"location":"api/mps/#Matrix-Arithmetic-Operators","page":"Metal Performance Shaders","title":"Matrix Arithmetic Operators","text":"","category":"section"},{"location":"api/mps/","page":"Metal Performance Shaders","title":"Metal Performance Shaders","text":"MPS.matmul!\nMPS.matvecmul!\nMPS.topk\nMPS.topk!","category":"page"},{"location":"api/mps/#Metal.MPS.matmul!","page":"Metal Performance Shaders","title":"Metal.MPS.matmul!","text":"matmul!(c::MtlMatrix, a::MtlMatrix, b::MtlMatrix, alpha=1, beta=1,\n transpose_left=false, transpose_right=false)\n\nAn MPSMatrixMultiplication kernel that computes: c = alpha 
* op(a) * op(b) + beta * c\n\nThis function should not typically be used. Rather, use the normal LinearAlgebra interface with any MtlArray and it should be accelerated using Metal Performance Shaders.\n\n\n\n\n\n","category":"function"},{"location":"api/mps/#Metal.MPS.matvecmul!","page":"Metal Performance Shaders","title":"Metal.MPS.matvecmul!","text":"matvecmul!(c::MtlVector, a::MtlMatrix, b::MtlVector, alpha=1, beta=1, transpose=false)\n\nAn MPSMatrixVectorMultiplication kernel that computes: c = alpha * op(a) * b + beta * c\n\nThis function should not typically be used. Rather, use the normal LinearAlgebra interface with any MtlArray and it should be accelerated using Metal Performance Shaders.\n\n\n\n\n\n","category":"function"},{"location":"api/mps/#Metal.MPS.topk","page":"Metal Performance Shaders","title":"Metal.MPS.topk","text":"MPS.topk(A::MtlMatrix{T}, k) where {T<:MtlFloat}\n\nCompute the top k values and their corresponding indices column-wise in a matrix A. Return the indices in I and the values in V.\n\nk cannot be greater than 16.\n\nUses MPSMatrixFindTopK.\n\nSee also: topk!.\n\nwarn: Warn\nThis interface is experimental, and might change without warning.\n\n\n\n\n\n","category":"function"},{"location":"api/mps/#Metal.MPS.topk!","page":"Metal Performance Shaders","title":"Metal.MPS.topk!","text":"MPS.topk!(A::MtlMatrix{T}, I::MtlMatrix{Int32}, V::MtlMatrix{T}, k)\n where {T<:MtlFloat}\n\nCompute the top k values and their corresponding indices column-wise in a matrix A. Return the indices in I and the values in V.\n\nk cannot be greater than 16.\n\nUses MPSMatrixFindTopK.\n\nSee also: topk.\n\nwarn: Warn\nThis interface is experimental, and might change without warning.\n\n\n\n\n\n","category":"function"},{"location":"api/array/#Array-programming","page":"Array programming","title":"Array programming","text":"","category":"section"},{"location":"api/array/","page":"Array programming","title":"Array programming","text":"The Metal array type, MtlArray, generally implements the Base array interface and all of its expected methods.","category":"page"},{"location":"api/array/","page":"Array programming","title":"Array programming","text":"However, there is the special function mtl for transferring an array over to the GPU. 
For compatibility reasons, it will automatically convert arrays of Float64 to Float32.","category":"page"},{"location":"api/array/","page":"Array programming","title":"Array programming","text":"mtl\nMtlArray\nMtlVector\nMtlMatrix\nMtlVecOrMat","category":"page"},{"location":"api/array/#Metal.mtl","page":"Array programming","title":"Metal.mtl","text":"mtl(A; storage=Metal.PrivateStorage)\n\nstorage can be Metal.PrivateStorage (default), Metal.SharedStorage, or Metal.ManagedStorage.\n\nOpinionated GPU array adaptor, which may alter the element type T of arrays:\n\nFor T<:AbstractFloat, it makes a MtlArray{Float32} for performance and compatibility reasons (except for Float16).\nFor T<:Complex{<:AbstractFloat} it makes a MtlArray{ComplexF32}.\nFor other isbitstype(T), it makes a MtlArray{T}.\n\nBy contrast, MtlArray(A) never changes the element type.\n\nUses Adapt.jl to act inside some wrapper structs.\n\nExamples\n\njulia> mtl(ones(3)')\n1×3 adjoint(::MtlVector{Float32, Metal.PrivateStorage}) with eltype Float32:\n 1.0 1.0 1.0\n\njulia> mtl(zeros(1,3); storage=Metal.SharedStorage)\n1×3 MtlMatrix{Float32, Metal.SharedStorage}:\n 0.0 0.0 0.0\n\njulia> mtl(1:3)\n1:3\n\njulia> MtlArray(1:3)\n3-element MtlVector{Int64, Metal.PrivateStorage}:\n 1\n 2\n 3\n\n\n\n\n\n","category":"function"},{"location":"api/array/#Metal.MtlArray","page":"Array programming","title":"Metal.MtlArray","text":"MtlArray{T,N,S} <: AbstractGPUArray{T,N}\n\nN-dimensional Metal array with storage mode S and elements of type T.\n\nS can be Metal.PrivateStorage (default), Metal.SharedStorage, or Metal.ManagedStorage.\n\nSee the Array Programming section of the Metal.jl docs for more details.\n\n\n\n\n\n","category":"type"},{"location":"api/array/#Metal.MtlVector","page":"Array programming","title":"Metal.MtlVector","text":"MtlVector{T,S} <: AbstractGPUVector{T}\n\nOne-dimensional array with elements of type T for use with Apple Metal-compatible GPUs. Alias for MtlArray{T,1,S}.\n\nSee also Vector(@ref), and the Array Programming section of the Metal.jl docs for more details.\n\n\n\n\n\n","category":"type"},{"location":"api/array/#Metal.MtlMatrix","page":"Array programming","title":"Metal.MtlMatrix","text":"MtlMatrix{T,S} <: AbstractGPUMatrix{T}\n\nTwo-dimensional array with elements of type T for use with Apple Metal-compatible GPUs. Alias for MtlArray{T,2,S}.\n\nSee also Matrix(@ref), and the Array Programming section of the Metal.jl docs for more details.\n\n\n\n\n\n","category":"type"},{"location":"api/array/#Metal.MtlVecOrMat","page":"Array programming","title":"Metal.MtlVecOrMat","text":"MtlVecOrMat{T,S}\n\nUnion type of MtlVector{T,S} and MtlMatrix{T,S} which allows functions to accept either an MtlMatrix or an MtlVector.\n\nSee also VecOrMat(@ref) for examples.\n\n\n\n\n\n","category":"type"},{"location":"api/array/#Storage-modes","page":"Array programming","title":"Storage modes","text":"","category":"section"},{"location":"api/array/","page":"Array programming","title":"Array programming","text":"The Metal API has various storage modes that dictate how a resource can be accessed. MtlArrays are Metal.PrivateStorage by default, but they can also be Metal.SharedStorage or Metal.ManagedStorage. 
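For example, a shared-storage array can be requested via the storage keyword of mtl and checked with the is_shared convenience function documented below (a minimal sketch):\n\njulia> a = mtl(zeros(Float32, 4); storage=Metal.SharedStorage);\n\njulia> is_shared(a)\ntrue\n\n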
For more information on storage modes, see the official Metal documentation.","category":"page"},{"location":"api/array/","page":"Array programming","title":"Array programming","text":"Metal.PrivateStorage\nMetal.SharedStorage\nMetal.ManagedStorage","category":"page"},{"location":"api/array/#Metal.MTL.PrivateStorage","page":"Array programming","title":"Metal.MTL.PrivateStorage","text":"struct Metal.PrivateStorage <: MTL.StorageMode\n\nUsed to indicate that the resource is stored using MTLStorageModePrivate in memory.\n\nFor more information on Metal storage modes, refer to the official Metal documentation.\n\nSee also Metal.SharedStorage and Metal.ManagedStorage.\n\n\n\n\n\n","category":"type"},{"location":"api/array/#Metal.MTL.SharedStorage","page":"Array programming","title":"Metal.MTL.SharedStorage","text":"struct Metal.SharedStorage <: MTL.StorageMode\n\nUsed to indicate that the resource is stored using MTLStorageModeShared in memory.\n\nFor more information on Metal storage modes, refer to the official Metal documentation.\n\nSee also Metal.PrivateStorage and Metal.ManagedStorage.\n\n\n\n\n\n","category":"type"},{"location":"api/array/#Metal.MTL.ManagedStorage","page":"Array programming","title":"Metal.MTL.ManagedStorage","text":"struct Metal.ManagedStorage <: MTL.StorageMode\n\nUsed to indicate that the resource is stored using MTLStorageModeManaged in memory.\n\nFor more information on Metal storage modes, refer to the official Metal documentation.\n\nSee also Metal.SharedStorage and Metal.PrivateStorage.\n\n\n\n\n\n","category":"type"},{"location":"api/array/","page":"Array programming","title":"Array programming","text":"There also exist the following convenience functions to check if an MtlArray is using a specific storage mode:","category":"page"},{"location":"api/array/","page":"Array programming","title":"Array programming","text":"is_private\nis_shared\nis_managed","category":"page"},{"location":"api/array/#Metal.is_private","page":"Array programming","title":"Metal.is_private","text":"is_private(A::MtlArray) -> Bool\n\nReturns true if A has storage mode Metal.PrivateStorage.\n\nSee also is_shared and is_managed.\n\n\n\n\n\n","category":"function"},{"location":"api/array/#Metal.is_shared","page":"Array programming","title":"Metal.is_shared","text":"is_shared(A::MtlArray) -> Bool\n\nReturns true if A has storage mode Metal.SharedStorage.\n\nSee also is_private and is_managed.\n\n\n\n\n\n","category":"function"},{"location":"api/array/#Metal.is_managed","page":"Array programming","title":"Metal.is_managed","text":"is_managed(A::MtlArray) -> Bool\n\nReturns true if A has storage mode Metal.ManagedStorage.\n\nSee also is_shared and is_private.\n\n\n\n\n\n","category":"function"},{"location":"api/essentials/#Essentials","page":"Essentials","title":"Essentials","text":"","category":"section"},{"location":"api/essentials/#Versions-and-Support","page":"Essentials","title":"Versions and Support","text":"","category":"section"},{"location":"api/essentials/","page":"Essentials","title":"Essentials","text":"Metal.macos_version\nMetal.darwin_version\nMetal.metal_support\nMetal.metallib_support\nMetal.air_support","category":"page"},{"location":"api/essentials/#Metal.macos_version","page":"Essentials","title":"Metal.macos_version","text":"Metal.macos_version() -> VersionNumber\n\nReturns the host macOS version.\n\nSee also 
Metal.darwin_version.\n\n\n\n\n\n","category":"function"},{"location":"api/essentials/#Metal.darwin_version","page":"Essentials","title":"Metal.darwin_version","text":"Metal.darwin_version() -> VersionNumber\n\nReturns the host Darwin kernel version.\n\nSee also Metal.macos_version.\n\n\n\n\n\n","category":"function"},{"location":"api/essentials/#Metal.metal_support","page":"Essentials","title":"Metal.metal_support","text":"Metal.metal_support() -> VersionNumber\n\nReturns the highest supported version for the Metal Shading Language.\n\nSee also Metal.metallib_support and Metal.air_support.\n\n\n\n\n\n","category":"function"},{"location":"api/essentials/#Metal.metallib_support","page":"Essentials","title":"Metal.metallib_support","text":"Metal.metallib_support() -> VersionNumber\n\nReturns the highest supported version for the metallib file format.\n\nSee also Metal.air_support and Metal.metal_support.\n\n\n\n\n\n","category":"function"},{"location":"api/essentials/#Metal.air_support","page":"Essentials","title":"Metal.air_support","text":"Metal.air_support() -> VersionNumber\n\nReturns the highest supported version for the embedded AIR bitcode format.\n\nSee also Metal.metallib_support and Metal.metal_support.\n\n\n\n\n\n","category":"function"},{"location":"api/essentials/#Global-State","page":"Essentials","title":"Global State","text":"","category":"section"},{"location":"api/essentials/","page":"Essentials","title":"Essentials","text":"Metal.device!\nMetal.devices\nMetal.device\nMetal.global_queue\nMetal.synchronize\nMetal.device_synchronize","category":"page"},{"location":"api/essentials/#Metal.device!","page":"Essentials","title":"Metal.device!","text":"device!(dev::MTLDevice)\n\nSets the Metal GPU device associated with the current Julia task.\n\n\n\n\n\n","category":"function"},{"location":"api/essentials/#Metal.MTL.devices","page":"Essentials","title":"Metal.MTL.devices","text":"devices()\n\nGet an iterator for the compute devices.\n\n\n\n\n\n","category":"function"},{"location":"api/essentials/#Metal.device","page":"Essentials","title":"Metal.device","text":"device()::MTLDevice\n\nReturn the Metal GPU device associated with the current Julia task.\n\nSince all M-series systems currently only externally show a single GPU, this function effectively returns the only system GPU.\n\n\n\n\n\ndevice(<:MtlArray)\n\nGet the Metal device for an MtlArray.\n\n\n\n\n\n","category":"function"},{"location":"api/essentials/#Metal.global_queue","page":"Essentials","title":"Metal.global_queue","text":"global_queue(dev::MTLDevice)::MTLCommandQueue\n\nReturn the Metal command queue associated with the current Julia thread.\n\n\n\n\n\n","category":"function"},{"location":"api/essentials/#Metal.synchronize","page":"Essentials","title":"Metal.synchronize","text":"synchronize(queue)\n\nWait for currently committed GPU work on this queue to finish.\n\nCreate a new MTLCommandBuffer from the global command queue, commit it to the queue, and simply wait for it to be completed. 
Since command buffers should execute in a First-In-First-Out manner, this synchronizes the GPU.\n\n\n\n\n\n","category":"function"},{"location":"api/essentials/#Metal.device_synchronize","page":"Essentials","title":"Metal.device_synchronize","text":"device_synchronize()\n\nSynchronize all committed GPU work across all global queues.\n\n\n\n\n\n","category":"function"},{"location":"usage/kernel/#Kernel-programming","page":"Kernel programming","title":"Kernel programming","text":"","category":"section"},{"location":"usage/kernel/","page":"Kernel programming","title":"Kernel programming","text":"Metal.jl is based on Apple's Metal Shading Language (MSL) and Metal framework. The interface allows you to utilize the graphics and computing power of Mac GPUs. Like many other GPU frameworks, Metal's history is rooted in graphics processing, but it has found use in computing/general purpose GPU (GPGPU) applications.","category":"page"},{"location":"usage/kernel/","page":"Kernel programming","title":"Kernel programming","text":"The most fundamental idea of programming GPUs (as compared to serial CPU programming) is their parallelism. A GPU function (kernel), when called, is not just run once in isolation. Rather, numerous (often thousands to millions) pseudo-independent instances (called threads) of the kernel are executed in parallel. These threads are arranged in a hierarchy that allows for varying levels of synchronization. For Metal, the hierarchy is as follows:","category":"page"},{"location":"usage/kernel/","page":"Kernel programming","title":"Kernel programming","text":"Thread: A single execution unit of the kernel\nThreadgroup: A collection of threads that share a common block of memory and synchronization barriers\nGrid: A collection of threadgroups","category":"page"},{"location":"usage/kernel/","page":"Kernel programming","title":"Kernel programming","text":"The threadgroup and grid sizes are set by the user when launching the GPU kernel. There are upper limits determined by the targeted hardware, and the sizes can be 1, 2, or 3-dimensional. For Metal.jl, these sizes are set using the @metal macro's keyword arguments. The groups keyword determines the grid size (in threadgroups), while the threads keyword determines the threadgroup size.","category":"page"},{"location":"usage/kernel/","page":"Kernel programming","title":"Kernel programming","text":"For example, given a 10x10x3 image for which you want to run a function independently on each pixel, the kernel launch code might look like the following: @metal threads=(10,10) groups=3 my_kernel(gpu_image_array). This would launch 3 separate threadgroups of 100 threads each (10 in the first dimension and 10 in the second dimension).","category":"page"},{"location":"usage/kernel/","page":"Kernel programming","title":"Kernel programming","text":"There are also additional hierarchy layers that consist of small groups of threads that execute in lockstep, called waves/SIMD groups/wavefronts and quadgroups. 
However, the basic three-tier hierarchy is enough to get started.","category":"page"},{"location":"usage/kernel/","page":"Kernel programming","title":"Kernel programming","text":"Here is a helpful link with good visualizations of Metal's thread hierarchy (also covering SIMD groups).","category":"page"},{"location":"usage/kernel/","page":"Kernel programming","title":"Kernel programming","text":"Each thread has its own set of private variables. Most importantly, each thread has associated unique indices to identify itself within its threadgroup and grid. These are traditionally what are used to differentiate execution across threads. You can also query the grid and threadgroup sizes.","category":"page"},{"location":"usage/kernel/","page":"Kernel programming","title":"Kernel programming","text":"For Metal.jl, these values are accessed via the following functions:","category":"page"},{"location":"usage/kernel/","page":"Kernel programming","title":"Kernel programming","text":"thread_index_in_threadgroup()\ngrid_size_Xd()\nthread_position_in_grid_Xd()\nthread_position_in_threadgroup_Xd()\nthreadgroup_position_in_grid_Xd()\nthreadgroups_per_grid_Xd()\nthreads_per_grid_Xd()\nthreads_per_threadgroup_Xd()","category":"page"},{"location":"usage/kernel/","page":"Kernel programming","title":"Kernel programming","text":"Where 'X' is 1, 2, or 3 according to the number of dimensions requested.","category":"page"},{"location":"usage/kernel/","page":"Kernel programming","title":"Kernel programming","text":"Using these in a kernel (taken directly from the vadd example):","category":"page"},{"location":"usage/kernel/","page":"Kernel programming","title":"Kernel programming","text":"function vadd(a, b, c)\n i = thread_position_in_grid_1d()\n c[i] = a[i] + b[i]\n return\nend","category":"page"},{"location":"usage/kernel/","page":"Kernel programming","title":"Kernel programming","text":"This kernel takes in three vectors (a, b, c), all of the same length, and stores the element-wise sum of a and b into c. Each thread in this kernel gets its unique position in the grid (arrangement of all threadgroups) and stores this value into the variable i, which is then used as the index into the vectors. Thus, each thread is computing one sum and storing the result in the output vector.","category":"page"},{"location":"usage/kernel/","page":"Kernel programming","title":"Kernel programming","text":"To ensure this kernel functions properly, we have to launch it with exactly as many threads as the length of the vectors. If we under- or over-launch threads, the result could be incorrect.","category":"page"},{"location":"usage/kernel/","page":"Kernel programming","title":"Kernel programming","text":"An example of a good launch:","category":"page"},{"location":"usage/kernel/","page":"Kernel programming","title":"Kernel programming","text":"len = prod(size(d_a))\n@metal threads=len vadd(d_a, d_b, d_c)","category":"page"},{"location":"usage/kernel/","page":"Kernel programming","title":"Kernel programming","text":"Additional notes:","category":"page"},{"location":"usage/kernel/","page":"Kernel programming","title":"Kernel programming","text":"Kernels must always return nothing\nKernels are asynchronous. To synchronize, use the Metal.@sync macro.
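\n\nFor example (a minimal sketch composing the vadd launch from above with the Metal.@sync macro):\n\nMetal.@sync @metal threads=len vadd(d_a, d_b, d_c)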
","category":"page"},{"location":"usage/kernel/#Other-Helpful-Links","page":"Kernel programming","title":"Other Helpful Links","text":"","category":"section"},{"location":"usage/kernel/","page":"Kernel programming","title":"Kernel programming","text":"Metal Shading Language Specification\nAn Introduction to GPU Programming course from the University of Illinois (primarily in CUDA, but the concepts are transferable)","category":"page"},{"location":"api/compiler/#Compiler","page":"Compiler","title":"Compiler","text":"","category":"section"},{"location":"api/compiler/#Execution","page":"Compiler","title":"Execution","text":"","category":"section"},{"location":"api/compiler/","page":"Compiler","title":"Compiler","text":"The main entry point to the compiler is the @metal macro:","category":"page"},{"location":"api/compiler/","page":"Compiler","title":"Compiler","text":"@metal","category":"page"},{"location":"api/compiler/#Metal.@metal","page":"Compiler","title":"Metal.@metal","text":"@metal threads=... groups=... [kwargs...] func(args...)\n\nHigh-level interface for executing code on a GPU.\n\nThe @metal macro should prefix a call, with func a callable function or object that should return nothing. It will be compiled to a Metal function upon first use, and to a certain extent arguments will be converted and managed automatically using mtlconvert. Finally, a call to mtlcall is performed, creating a command buffer in the current global command queue and then committing it.\n\nThe following keyword arguments influence the behavior of @metal:\n\nlaunch: whether to launch this kernel, defaults to true. If false, the returned kernel object should be launched by calling it and passing arguments again.\nname: the name of the kernel in the generated code. Defaults to an automatically-generated name.\nqueue: the command queue to use for this kernel. Defaults to the global command queue.\n\n\n\n\n\n","category":"macro"},{"location":"api/compiler/","page":"Compiler","title":"Compiler","text":"If needed, you can use a lower-level API that lets you inspect the compiled kernel:","category":"page"},{"location":"api/compiler/","page":"Compiler","title":"Compiler","text":"Metal.mtlconvert\nMetal.mtlfunction","category":"page"},{"location":"api/compiler/#Metal.mtlconvert","page":"Compiler","title":"Metal.mtlconvert","text":"mtlconvert(x, [cce])\n\nThis function is called for every argument to be passed to a kernel, allowing it to be converted to a GPU-friendly format. By default, the function does nothing and returns the input object x as-is.\n\nDo not add methods to this function, but instead extend the underlying Adapt.jl package and register methods for the Metal.Adaptor type.\n\n\n\n\n\n","category":"function"},{"location":"api/compiler/#Metal.mtlfunction","page":"Compiler","title":"Metal.mtlfunction","text":"mtlfunction(f, tt=Tuple{}; kwargs...)\n\nLow-level interface to compile a function invocation for the currently-active GPU, returning a callable kernel object. For a higher-level interface, use @metal.\n\nThe output of this function is automatically cached, i.e. you can simply call mtlfunction in a hot path without degrading performance. 
New code will be generated automatically when the function changes, or when different types or keyword arguments are provided.\n\n\n\n\n\n","category":"function"},{"location":"api/compiler/#Reflection","page":"Compiler","title":"Reflection","text":"","category":"section"},{"location":"api/compiler/","page":"Compiler","title":"Compiler","text":"If you want to inspect generated code, you can use macros that resemble functionality from the InteractiveUtils standard library:","category":"page"},{"location":"api/compiler/","page":"Compiler","title":"Compiler","text":"@device_code_lowered\n@device_code_typed\n@device_code_warntype\n@device_code_llvm\n@device_code_native\n@device_code_agx\n@device_code","category":"page"},{"location":"api/compiler/","page":"Compiler","title":"Compiler","text":"For more information, please consult the GPUCompiler.jl documentation. Note that @device_code_agx is simply an alias for @device_code_native.","category":"page"},{"location":"#MacOS-GPU-programming-in-Julia","page":"Home","title":"MacOS GPU programming in Julia","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"The Metal.jl package is the main entry point for GPU programming on macOS in Julia. The package makes it possible to do so at various abstraction levels, from easy-to-use arrays down to hand-written kernels using low-level Metal APIs.","category":"page"},{"location":"","page":"Home","title":"Home","text":"If you have any questions, please feel free to use the #gpu channel on the Julia Slack, or the GPU domain of the Julia Discourse.","category":"page"},{"location":"","page":"Home","title":"Home","text":"As this package is still under development, if you spot a bug, please file an issue.","category":"page"},{"location":"#Quick-Start","page":"Home","title":"Quick Start","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"Metal.jl ties into your system's existing Metal Shading Language compiler toolchain, so no additional installs are required (unless you want to view profiled GPU operations).","category":"page"},{"location":"","page":"Home","title":"Home","text":"# install the package\nusing Pkg\nPkg.add(\"Metal\")\n\n# smoke test\nusing Metal\nMetal.versioninfo()","category":"page"},{"location":"","page":"Home","title":"Home","text":"If you want to ensure everything works as expected, you can execute the test suite.","category":"page"},{"location":"","page":"Home","title":"Home","text":"using Pkg\nPkg.test(\"Metal\")","category":"page"},{"location":"","page":"Home","title":"Home","text":"The following resources may also be of interest (although they are mainly focused on the CUDA GPU backend):","category":"page"},{"location":"","page":"Home","title":"Home","text":"Effectively using GPUs with Julia: slides\nHow Julia is compiled to GPUs: video","category":"page"},{"location":"#Contributing","page":"Home","title":"Contributing","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"If you want to help improve this package, look at the contributing page for more details.","category":"page"},{"location":"#Acknowledgements","page":"Home","title":"Acknowledgements","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"The Julia Metal stack has been a collaborative effort by many individuals. 
Significant contributions have been made by the following individuals:","category":"page"},{"location":"","page":"Home","title":"Home","text":"Tim Besard (@maleadt) (lead developer)\nFilippo Vicentini (@PhilipVinc)\nMax Hawkins (@max-Hawkins)","category":"page"},{"location":"#Supporting-and-Citing","page":"Home","title":"Supporting and Citing","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"Some of the software in this ecosystem was developed as part of academic research. If you would like to help support it, please star the repository, as such metrics may help us secure funding in the future. If you use our software as part of your research, teaching, or other activities, we would be grateful if you could cite our work. The CITATION.cff file in the root of this repository lists the relevant papers.","category":"page"},{"location":"profiling/#Profiling","page":"Profiling","title":"Profiling","text":"","category":"section"},{"location":"profiling/","page":"Profiling","title":"Profiling","text":"Profiling GPU code is harder than profiling Julia code executing on the CPU. For one, kernels typically execute asynchronously, and thus require appropriate synchronization when measuring their execution time. Furthermore, because the code executes on a different processor, it is much harder to know what is currently executing.","category":"page"},{"location":"profiling/#Time-measurements","page":"Profiling","title":"Time measurements","text":"","category":"section"},{"location":"profiling/","page":"Profiling","title":"Profiling","text":"For robust measurements, it is advised to use the BenchmarkTools.jl package, which goes to great lengths to perform accurate measurements. Due to the asynchronous nature of GPUs, you need to ensure the GPU is synchronized at the end of every sample, e.g. by calling synchronize() or, even better, wrapping your code in Metal.@sync.","category":"page"},{"location":"profiling/","page":"Profiling","title":"Profiling","text":"Note that the allocations as reported by BenchmarkTools are CPU allocations.","category":"page"},{"location":"profiling/#Application-tracing","page":"Profiling","title":"Application tracing","text":"","category":"section"},{"location":"profiling/","page":"Profiling","title":"Profiling","text":"For profiling large applications, simple timings are insufficient. Instead, we want an overview of how and when the GPU was active, to spot times where the device was idle and/or to find which kernels need optimization.","category":"page"},{"location":"profiling/","page":"Profiling","title":"Profiling","text":"As we cannot use the Julia profiler for this task, we will use Metal's GPU profiler directly. Use the Metal.@profile macro to surround the code of interest. This macro tells your system to track GPU calls and usage statistics and will save this information in a temporary folder ending in '.trace'. 
For later viewing, copy this folder to a stable location, or use the 'dir' argument of the profile macro to store the gputrace in a different location directly.","category":"page"},{"location":"profiling/","page":"Profiling","title":"Profiling","text":"The resulting trace can be opened with the Instruments app, part of Xcode.","category":"page"},{"location":"profiling/","page":"Profiling","title":"Profiling","text":"julia> using Metal\n\njulia> function vadd(a, b, c)\n i = thread_position_in_grid_1d()\n c[i] = a[i] + b[i]\n return\n end\n\njulia> a = MtlArray([1]); b = MtlArray([2]); c = similar(a);\n\njulia> Metal.@profile @metal threads=length(c) vadd(a, b, c);\n...\n[ Info: System trace saved to julia_3.trace; open the resulting trace in Instruments","category":"page"},{"location":"profiling/","page":"Profiling","title":"Profiling","text":"It is possible to augment the trace with additional information by using signposts. Similar to NVTX markers and ranges in CUDA.jl, signpost intervals and events can be used to add time intervals and points of interest, respectively, to the trace. This can be done by using the signpost functionality from ObjectiveC.jl:","category":"page"},{"location":"profiling/","page":"Profiling","title":"Profiling","text":"using ObjectiveC, .OS\n\n@signpost_interval \"My Interval\" begin\n # code to profile\n @signpost_event \"My Event\"\nend","category":"page"},{"location":"profiling/","page":"Profiling","title":"Profiling","text":"For more information, e.g. how to pass additional messages to the signposts, or how to use a custom logger, consult the ObjectiveC.jl documentation, or the docstrings of the @signpost_interval and @signpost_event macros.","category":"page"},{"location":"profiling/#Frame-capture","page":"Profiling","title":"Frame capture","text":"","category":"section"},{"location":"profiling/","page":"Profiling","title":"Profiling","text":"For more details on specific operations, you can use Metal's frame capture feature to generate a more detailed and replayable trace of the GPU operations. This requires that Julia is started with the METAL_CAPTURE_ENABLED environment variable set to 1. Frames are captured by wrapping the code of interest in Metal.@capture, and the resulting trace can be opened with Xcode.","category":"page"},{"location":"profiling/","page":"Profiling","title":"Profiling","text":"$ METAL_CAPTURE_ENABLED=1 julia\n...\n\njulia> using Metal\n\njulia> function vadd(a, b, c)\n i = thread_position_in_grid_1d()\n c[i] = a[i] + b[i]\n return\n end\n\njulia> a = MtlArray([1]); b = MtlArray([2]); c = similar(a);\n... 
Metal GPU Frame Capture Enabled\n\njulia> Metal.@capture @metal threads=length(c) vadd(a, b, c);\n...\n[ Info: GPU frame capture saved to julia_1.gputrace; open the resulting trace in Xcode","category":"page"}] +[{"location":"usage/overview/#UsageOverview","page":"Overview","title":"Overview","text":"","category":"section"},{"location":"usage/overview/","page":"Overview","title":"Overview","text":"The Metal.jl package provides three distinct, but related, interfaces for Metal programming:","category":"page"},{"location":"usage/overview/","page":"Overview","title":"Overview","text":"the MtlArray type: for programming with arrays;\nnative kernel programming capabilities: for writing Metal kernels in Julia;\nMetal API wrappers: for low-level interactions with the Metal libraries.","category":"page"},{"location":"usage/overview/","page":"Overview","title":"Overview","text":"Much of the Julia Metal programming stack can be used by just relying on the MtlArray type, and using platform-agnostic programming patterns like broadcast and other array abstractions. Only once you hit a performance bottleneck, or some missing functionality, you might need to write a custom kernel or use the underlying Metal APIs.","category":"page"},{"location":"usage/overview/#The-MtlArray-type","page":"Overview","title":"The MtlArray type","text":"","category":"section"},{"location":"usage/overview/","page":"Overview","title":"Overview","text":"The MtlArray type is an essential part of the toolchain. Primarily, it is used to manage GPU memory, and copy data from and back to the CPU:","category":"page"},{"location":"usage/overview/","page":"Overview","title":"Overview","text":"a = MtlArray{Int}(undef, 1024)\n\n# essential memory operations, like copying, filling, reshaping, ...\nb = copy(a)\nfill!(b, 0)\n@test b == Metal.zeros(Int, 1024)\n\n# automatic memory management\na = nothing","category":"page"},{"location":"usage/overview/","page":"Overview","title":"Overview","text":"Beyond memory management, there are a whole range of array operations to process your data. This includes several higher-order operations that take other code as arguments, such as map, reduce or broadcast. With these, it is possible to perform kernel-like operations without actually writing your own GPU kernels:","category":"page"},{"location":"usage/overview/","page":"Overview","title":"Overview","text":"a = Metal.zeros(1024)\nb = Metal.ones(1024)\na.^2 .+ sin.(b)","category":"page"},{"location":"usage/array/#Array-programming","page":"Array programming","title":"Array programming","text":"","category":"section"},{"location":"usage/array/","page":"Array programming","title":"Array programming","text":"DocTestSetup = quote\n using Metal\n using GPUArrays\n\n import Random\n Random.seed!(1)\n\n Metal.seed!(1)\nend","category":"page"},{"location":"usage/array/","page":"Array programming","title":"Array programming","text":"The easiest way to use the GPU's massive parallelism, is by expressing operations in terms of arrays: Metal.jl provides an array type, MtlArray, and many specialized array operations that execute efficiently on the GPU hardware. In this section, we will briefly demonstrate use of the MtlArray type. 
Since we expose Metal's functionality by implementing existing Julia interfaces on the MtlArray type, you should refer to the upstream Julia documentation for more information on these operations.","category":"page"},{"location":"usage/array/","page":"Array programming","title":"Array programming","text":"If you encounter missing functionality, or are running into operations that trigger so-called \"scalar iteration\", have a look at the issue tracker and file a new issue if there's none. Do note that you can always access the underlying Metal APIs by calling into the relevant submodule.","category":"page"},{"location":"usage/array/#Construction-and-Initialization","page":"Array programming","title":"Construction and Initialization","text":"","category":"section"},{"location":"usage/array/","page":"Array programming","title":"Array programming","text":"The MtlArray type aims to implement the AbstractArray interface, and provide implementations of methods that are commonly used when working with arrays. That means you can construct MtlArrays in the same way as regular Array objects:","category":"page"},{"location":"usage/array/","page":"Array programming","title":"Array programming","text":"julia> MtlArray{Int}(undef, 2)\n2-element MtlVector{Int64, Metal.PrivateStorage}:\n 0\n 0\n\njulia> MtlArray{Int}(undef, (1,2))\n1×2 MtlMatrix{Int64, Metal.PrivateStorage}:\n 0 0\n\njulia> similar(ans)\n1×2 MtlMatrix{Int64, Metal.PrivateStorage}:\n 0 0","category":"page"},{"location":"usage/array/","page":"Array programming","title":"Array programming","text":"Copying memory to or from the GPU can be expressed using constructors as well, or by calling copyto!:","category":"page"},{"location":"usage/array/","page":"Array programming","title":"Array programming","text":"julia> a = MtlArray([1,2])\n2-element MtlVector{Int64, Metal.PrivateStorage}:\n 1\n 2\n\njulia> b = Array(a)\n2-element Vector{Int64}:\n 1\n 2\n\njulia> copyto!(b, a)\n2-element Vector{Int64}:\n 1\n 2","category":"page"},{"location":"usage/array/#Higher-order-abstractions","page":"Array programming","title":"Higher-order abstractions","text":"","category":"section"},{"location":"usage/array/","page":"Array programming","title":"Array programming","text":"The real power of programming GPUs with arrays comes from Julia's higher-order array abstractions: Operations that take user code as an argument, and specialize execution on it. With these functions, you can often avoid having to write custom kernels. 
For example, to perform simple element-wise operations you can use map or broadcast:","category":"page"},{"location":"usage/array/","page":"Array programming","title":"Array programming","text":"julia> a = MtlArray{Float32}(undef, (1,2));\n\njulia> a .= 5\n1×2 MtlMatrix{Float32, Metal.PrivateStorage}:\n 5.0 5.0\n\njulia> map(sin, a)\n1×2 MtlMatrix{Float32, Metal.PrivateStorage}:\n -0.958924 -0.958924","category":"page"},{"location":"usage/array/","page":"Array programming","title":"Array programming","text":"To reduce the dimensionality of arrays, Metal.jl implements the various flavours of (map)reduce(dim):","category":"page"},{"location":"usage/array/","page":"Array programming","title":"Array programming","text":"julia> a = Metal.ones(2,3)\n2×3 MtlMatrix{Float32, Metal.PrivateStorage}:\n 1.0 1.0 1.0\n 1.0 1.0 1.0\n\njulia> reduce(+, a)\n6.0f0\n\njulia> mapreduce(sin, *, a; dims=2)\n2×1 MtlMatrix{Float32, Metal.PrivateStorage}:\n 0.59582335\n 0.59582335\n\njulia> b = Metal.zeros(1)\n1-element MtlVector{Float32, Metal.PrivateStorage}:\n 0.0\n\njulia> Base.mapreducedim!(identity, +, b, a)\n1×1 MtlMatrix{Float32, Metal.PrivateStorage}:\n 6.0","category":"page"},{"location":"usage/array/#Random-numbers","page":"Array programming","title":"Random numbers","text":"","category":"section"},{"location":"usage/array/","page":"Array programming","title":"Array programming","text":"Base's convenience functions for generating random numbers are available in Metal as well:","category":"page"},{"location":"usage/array/","page":"Array programming","title":"Array programming","text":"julia> Metal.rand(2)\n2-element MtlVector{Float32, Metal.PrivateStorage}:\n 0.89025915\n 0.8946847\n\njulia> Metal.randn(Float32, 2, 1)\n2×1 MtlMatrix{Float32, Metal.PrivateStorage}:\n 1.2279074\n 1.2518331","category":"page"},{"location":"usage/array/","page":"Array programming","title":"Array programming","text":"Behind the scenes, these random numbers come from two different generators: one backed by Metal Performance Shaders, the other by the GPUArrays.jl random methods. Operations on these generators are implemented using methods from the Random standard library:","category":"page"},{"location":"usage/array/","page":"Array programming","title":"Array programming","text":"julia> using Random, GPUArrays\n\njulia> a = Random.rand(MPS.default_rng(), Float32, 1)\n1-element MtlVector{Float32, Metal.PrivateStorage}:\n 0.89025915\n\njulia> a = Random.rand!(GPUArrays.default_rng(MtlArray), a)\n1-element MtlVector{Float32, Metal.PrivateStorage}:\n 0.0705002","category":"page"},{"location":"usage/array/","page":"Array programming","title":"Array programming","text":"note: Note\nMPSMatrixRandom functionality requires Metal.jl >= v1.4","category":"page"},{"location":"usage/array/","page":"Array programming","title":"Array programming","text":"warning: Warning\nRandom.rand!(::MPS.RNG, args...) and Random.randn!(::MPS.RNG, args...) have a framework limitation that requires the byte offset and byte size of the destination array to be a multiple of 4.","category":"page"},{"location":"faq/contributing/#Contributing","page":"Contributing","title":"Contributing","text":"","category":"section"},{"location":"faq/contributing/","page":"Contributing","title":"Contributing","text":"Metal.jl is an especially accessible GPU backend, thanks to the presence of GPUs in Apple's recent, popular MacBooks. As a result, an average Julia user can now develop and test GPU-accelerated code locally on their laptop. 
If you're using this package and see a bug or want some additional functionality, this page is for you. Hopefully this information helps encourage you to contribute to the package yourself.","category":"page"},{"location":"faq/contributing/#What-needs-help?","page":"Contributing","title":"What needs help?","text":"","category":"section"},{"location":"faq/contributing/","page":"Contributing","title":"Contributing","text":"If you didn't come to this page with your own feature to add, look at the current issues in the git repo for bugs and requested functionality.","category":"page"},{"location":"faq/contributing/#I'm-a-beginner,-can-I-help?","page":"Contributing","title":"I'm a beginner, can I help?","text":"","category":"section"},{"location":"faq/contributing/","page":"Contributing","title":"Contributing","text":"Yes, but you may spend more time learning rather than directly contributing at the start. Depending on what your goals are though, this might be desirable. There are differing levels of difficulty when considering contributions to Metal.jl. If you're new to these things, check the issues for \"Good First Issue\" tags, look at the documentation for areas that could be added (beginners are especially good at detecting these sorts of deficiencies), or message on the Slack #gpu channel asking for guidance.","category":"page"},{"location":"faq/contributing/","page":"Contributing","title":"Contributing","text":"Regardless, if you've never used Metal.jl before, it'd probably be best to gain some exposure to it before trying to contribute. You might run into bugs yourself or discover some area you'd really like to help with.","category":"page"},{"location":"faq/contributing/#General-Workflow-for-Adding-Functionality","page":"Contributing","title":"General Workflow for Adding Functionality","text":"","category":"section"},{"location":"faq/contributing/","page":"Contributing","title":"Contributing","text":"If you're adding some functionality that originates from Metal Shading Language (MSL) directly (rather than high-level Julia functionality), the workflow will likely look like the following. If you're adding something that only relies on pure Julia additions, you will skip the first two steps.","category":"page"},{"location":"faq/contributing/","page":"Contributing","title":"Contributing","text":"Create low-level Julia wrappers for the Obj-C interface\nCreate high-level Julia structures and functionality\nCreate tests for added functionality","category":"page"},{"location":"faq/contributing/#Mapping-to-Metal-Intrinsics","page":"Contributing","title":"Mapping to Metal Intrinsics","text":"","category":"section"},{"location":"faq/contributing/","page":"Contributing","title":"Contributing","text":"Some Metal functions map directly to Apple intermediate representation intrinsics. In this case, wrapping them into Metal.jl is relatively easy. All that needs to be done is to create a mapping from a Julia function via a simple ccall. See the threadgroup barrier implementation for reference.","category":"page"},{"location":"faq/contributing/","page":"Contributing","title":"Contributing","text":"However, the Metal documentation doesn't tell you what the format of the intrinsic names should be. 
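Such a mapping is typically a one-liner along these lines (a sketch only: air.some_intrinsic is a placeholder, not a real intrinsic name):\n\n@device_function barrier_sketch() = ccall(\"extern air.some_intrinsic\", llvmcall, Cvoid, ())\n\nThe hard part is discovering the actual intrinsic name and signature. 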
To find this out, you need to create your own test kernel directly in the Metal Shading Language, compile it using Apple's tooling, then view the created intermediate representation (IR).","category":"page"},{"location":"faq/contributing/#Reverse-Engineering-Bare-MSL/Apple-IR","page":"Contributing","title":"Reverse-Engineering Bare MSL/Apple IR","text":"","category":"section"},{"location":"faq/contributing/","page":"Contributing","title":"Contributing","text":"First, you need to write an MSL kernel that uses the functionality you're interested in. For example,","category":"page"},{"location":"faq/contributing/","page":"Contributing","title":"Contributing","text":"#include <metal_stdlib>\n\nusing namespace metal;\n\nkernel void dummy_kernel(device volatile atomic_float* out,\n uint i [[thread_position_in_grid]])\n{\n atomic_store_explicit(&out[i], 0.0f, memory_order_relaxed);\n}","category":"page"},{"location":"faq/contributing/","page":"Contributing","title":"Contributing","text":"To compile with Metal's tools and emit human-readable IR, run something roughly along the lines of: xcrun metal -S -emit-llvm dummy_kernel.metal","category":"page"},{"location":"faq/contributing/","page":"Contributing","title":"Contributing","text":"This will create a .ll file that you can then parse for whatever information you need. Be sure to double-check the metadata at the bottom for any significant changes your functionality introduces.","category":"page"},{"location":"faq/contributing/","page":"Contributing","title":"Contributing","text":"Test with different types and configurations to see what changes are caused. Also ensure that when writing very simple kernels, whatever you're interested in doesn't get optimized away. Double-check that the kernel's IR makes sense for what you wrote.","category":"page"},{"location":"faq/contributing/#Metal-Performance-Shaders","page":"Contributing","title":"Metal Performance Shaders","text":"","category":"section"},{"location":"faq/contributing/","page":"Contributing","title":"Contributing","text":"Metal exposes a special interface to its library of optimized kernels. Rather than accepting the normal set of input GPU data structures, it requires special MPS datatypes that assume row-major memory layout. As this is not the Julia default, adapt accordingly. Adding MPS functionality should be mostly straightforward, so this can be an easy entry point to helping. To get started, you can have a look at the Metal Performance Shaders Documentation from Apple.","category":"page"},{"location":"faq/contributing/#Exposing-your-Interface","page":"Contributing","title":"Exposing your Interface","text":"","category":"section"},{"location":"faq/contributing/","page":"Contributing","title":"Contributing","text":"There are varying degrees of user-facing interfaces from Metal.jl. At the lowest level is Metal.MTL.xxx. This is for low-level functionality close to or at bare Objective-C, or things that a normal user wouldn't directly be using. Metal.MPS.xxx is for Metal Performance Shader specifics (like MPSMatrix). Next is Metal.xxx. This is for higher-level, usually pure-Julian functionality (like device()). The only thing beyond this is exporting into the global namespace. That would be useful for uniquely-named functions/structures/macros with clear and common use-cases (MtlArray or @metal).","category":"page"},{"location":"faq/contributing/","page":"Contributing","title":"Contributing","text":"Additionally, you can override non-Metal.jl functions like LinearAlgebra.mul! seen here. 
This is essentially (ab)using multiple dispatch to specialize for certain cases (usually for more performant execution).","category":"page"},{"location":"faq/contributing/","page":"Contributing","title":"Contributing","text":"If your function is only available from within GPU kernels (like thread indexing intrinsics), be sure to properly annotate it with @device_function to ensure that calling it from the host doesn't kill your Julia process.","category":"page"},{"location":"faq/contributing/","page":"Contributing","title":"Contributing","text":"Generally, think about how frequently you expect your addition to be used, how complex its use-case is, and whether or not it clashes/reimplements/optimizes existing functionality from outside Metal.jl. Put it behind the corresponding interface.","category":"page"},{"location":"faq/contributing/#Creating-Tests","page":"Contributing","title":"Creating Tests","text":"","category":"section"},{"location":"faq/contributing/","page":"Contributing","title":"Contributing","text":"As it's good practice, and JuliaGPU has great CI/CD workflows, your addition should have associated tests to ensure correctness and cover edge cases. Look to existing examples under the test folder for initial guidance, and be sure to create tests for all valid types. Any new Julia file in this folder will be run as its own testset. If you feel your tests don't fit in any existing place, you'll probably want to create a new file with an appropriate name.","category":"page"},{"location":"faq/contributing/#Running-a-Subset-of-the-Existing-Tests","page":"Contributing","title":"Running a Subset of the Existing Tests","text":"","category":"section"},{"location":"faq/contributing/","page":"Contributing","title":"Contributing","text":"Sometimes you won't want to run the entire testsuite. You may just want to run the tests for your new functionality. To do that, you can either pass the name of the testset to the test/runtests.jl script: julia --project=test test/runtests.jl metal or you can isolate test files by running them alone after running the test/setup.jl script: julia --project=test -L test/setup.jl test/metal.jl","category":"page"},{"location":"faq/contributing/#Thank-You-and-Good-Luck","page":"Contributing","title":"Thank You and Good Luck","text":"","category":"section"},{"location":"faq/contributing/","page":"Contributing","title":"Contributing","text":"Open-source projects like this only happen because people like you are willing to spend their free time helping out. Almost anything you're able to do is helpful, but if you get stuck, seek guidance from Slack or Discourse. Don't feel like your contribution has to be perfect. If you put in effort and make progress, there will likely be some senior developer willing to polish your code before merging. Open-source software is a team effort...welcome to the team!","category":"page"},{"location":"api/kernel/#Kernel-programming","page":"Kernel programming","title":"Kernel programming","text":"","category":"section"},{"location":"api/kernel/","page":"Kernel programming","title":"Kernel programming","text":"This section lists the package's public functionality that corresponds to special Metal functions for use in device code. 
For more information about these functions, please consult the Metal Shading Language specification.","category":"page"},{"location":"api/kernel/","page":"Kernel programming","title":"Kernel programming","text":"This is made possible by interfacing with the Metal libraries through ObjectiveC.jl, which wraps a subset of the Objective-C APIs. These low-level wrappers are available in the MTL submodule exported by Metal.jl.","category":"page"},{"location":"api/kernel/#Indexing-and-dimensions","page":"Kernel programming","title":"Indexing and dimensions","text":"","category":"section"},{"location":"api/kernel/","page":"Kernel programming","title":"Kernel programming","text":"thread_execution_width\nthread_index_in_quadgroup\nthread_index_in_simdgroup\nthread_index_in_threadgroup\nthread_position_in_grid_1d\nthread_position_in_threadgroup_1d\nthreadgroup_position_in_grid_1d\nthreadgroups_per_grid_1d\nthreads_per_grid_1d\nthreads_per_simdgroup\nthreads_per_threadgroup_1d\nsimdgroups_per_threadgroup\nsimdgroup_index_in_threadgroup\nquadgroup_index_in_threadgroup\nquadgroups_per_threadgroup\ngrid_size_1d\ngrid_origin_1d","category":"page"},{"location":"api/kernel/#Metal.thread_execution_width","page":"Kernel programming","title":"Metal.thread_execution_width","text":"thread_execution_width()::UInt32\n\nReturn the execution width of the compute unit.\n\n\n\n\n\n","category":"function"},{"location":"api/kernel/#Metal.thread_index_in_quadgroup","page":"Kernel programming","title":"Metal.thread_index_in_quadgroup","text":"thread_index_in_quadgroup()::UInt32\n\nReturn the index of the current thread in its quadgroup.\n\n\n\n\n\n","category":"function"},{"location":"api/kernel/#Metal.thread_index_in_simdgroup","page":"Kernel programming","title":"Metal.thread_index_in_simdgroup","text":"thread_index_in_simdgroup()::UInt32\n\nReturn the index of the current thread in its simdgroup.\n\n\n\n\n\n","category":"function"},{"location":"api/kernel/#Metal.thread_index_in_threadgroup","page":"Kernel programming","title":"Metal.thread_index_in_threadgroup","text":"thread_index_in_threadgroup()::UInt32\n\nReturn the index of the current thread in its threadgroup.\n\n\n\n\n\n","category":"function"},{"location":"api/kernel/#Metal.thread_position_in_grid_1d","page":"Kernel programming","title":"Metal.thread_position_in_grid_1d","text":"thread_position_in_grid_1d()::UInt32\nthread_position_in_grid_2d()::NamedTuple{(:x, :y), Tuple{UInt32, UInt32}}\nthread_position_in_grid_3d()::NamedTuple{(:x, :y, :z), Tuple{UInt32, UInt32, UInt32}}\n\nReturn the current thread's position in an N-dimensional grid of threads.\n\n\n\n\n\n","category":"function"},{"location":"api/kernel/#Metal.thread_position_in_threadgroup_1d","page":"Kernel programming","title":"Metal.thread_position_in_threadgroup_1d","text":"thread_position_in_threadgroup_1d()::UInt32\nthread_position_in_threadgroup_2d()::NamedTuple{(:x, :y), Tuple{UInt32, UInt32}}\nthread_position_in_threadgroup_3d()::NamedTuple{(:x, :y, :z), Tuple{UInt32, UInt32, UInt32}}\n\nReturn the current thread's unique position within a threadgroup.\n\n\n\n\n\n","category":"function"},{"location":"api/kernel/#Metal.threadgroup_position_in_grid_1d","page":"Kernel programming","title":"Metal.threadgroup_position_in_grid_1d","text":"threadgroup_position_in_grid_1d()::UInt32\nthreadgroup_position_in_grid_2d()::NamedTuple{(:x, :y), Tuple{UInt32, UInt32}}\nthreadgroup_position_in_grid_3d()::NamedTuple{(:x, :y, :z), Tuple{UInt32, UInt32, UInt32}}\n\nReturn the current threadgroup's unique position 
within the grid.\n\n\n\n\n\n","category":"function"},{"location":"api/kernel/#Metal.threadgroups_per_grid_1d","page":"Kernel programming","title":"Metal.threadgroups_per_grid_1d","text":"threadgroups_per_grid_1d()::UInt32\nthreadgroups_per_grid_2d()::NamedTuple{(:x, :y), Tuple{UInt32, UInt32}}\nthreadgroups_per_grid_3d()::NamedTuple{(:x, :y, :z), Tuple{UInt32, UInt32, UInt32}}\n\nReturn the number of threadgroups per grid.\n\n\n\n\n\n","category":"function"},{"location":"api/kernel/#Metal.threads_per_grid_1d","page":"Kernel programming","title":"Metal.threads_per_grid_1d","text":"threads_per_grid_1d()::UInt32\nthreads_per_grid_2d()::NamedTuple{(:x, :y), Tuple{UInt32, UInt32}}\nthreads_per_grid_3d()::NamedTuple{(:x, :y, :z), Tuple{UInt32, UInt32, UInt32}}\n\nReturn the grid size.\n\n\n\n\n\n","category":"function"},{"location":"api/kernel/#Metal.threads_per_simdgroup","page":"Kernel programming","title":"Metal.threads_per_simdgroup","text":"threads_per_simdgroup()::UInt32\n\nReturn the thread execution width of a simdgroup.\n\n\n\n\n\n","category":"function"},{"location":"api/kernel/#Metal.threads_per_threadgroup_1d","page":"Kernel programming","title":"Metal.threads_per_threadgroup_1d","text":"threads_per_threadgroup_1d()::UInt32\nthreads_per_threadgroup_2d()::NamedTuple{(:x, :y), Tuple{UInt32, UInt32}}\nthreads_per_threadgroup_3d()::NamedTuple{(:x, :y, :z), Tuple{UInt32, UInt32, UInt32}}\n\nReturn the thread execution width of a threadgroup.\n\n\n\n\n\n","category":"function"},{"location":"api/kernel/#Metal.simdgroups_per_threadgroup","page":"Kernel programming","title":"Metal.simdgroups_per_threadgroup","text":"simdgroups_per_threadgroup()::UInt32\n\nReturn the simdgroup execution width of a threadgroup.\n\n\n\n\n\n","category":"function"},{"location":"api/kernel/#Metal.simdgroup_index_in_threadgroup","page":"Kernel programming","title":"Metal.simdgroup_index_in_threadgroup","text":"simdgroup_index_in_threadgroup()::UInt32\n\nReturn the index of a simdgroup within a threadgroup.\n\n\n\n\n\n","category":"function"},{"location":"api/kernel/#Metal.quadgroup_index_in_threadgroup","page":"Kernel programming","title":"Metal.quadgroup_index_in_threadgroup","text":"quadgroup_index_in_threadgroup()::UInt32\n\nReturn the index of a quadgroup within a threadgroup.\n\n\n\n\n\n","category":"function"},{"location":"api/kernel/#Metal.quadgroups_per_threadgroup","page":"Kernel programming","title":"Metal.quadgroups_per_threadgroup","text":"quadgroups_per_threadgroup()::UInt32\n\nReturn the quadgroup execution width of a threadgroup.\n\n\n\n\n\n","category":"function"},{"location":"api/kernel/#Metal.grid_size_1d","page":"Kernel programming","title":"Metal.grid_size_1d","text":"grid_size_1d()::UInt32\ngrid_size_2d()::NamedTuple{(:x, :y), Tuple{UInt32, UInt32}}\ngrid_size_3d()::NamedTuple{(:x, :y, :z), Tuple{UInt32, UInt32, UInt32}}\n\nReturn maximum size of the grid for threads that read per-thread stage-in data.\n\n\n\n\n\n","category":"function"},{"location":"api/kernel/#Metal.grid_origin_1d","page":"Kernel programming","title":"Metal.grid_origin_1d","text":"grid_origin_1d()::UInt32\ngrid_origin_2d()::NamedTuple{(:x, :y), Tuple{UInt32, UInt32}}\ngrid_origin_3d()::NamedTuple{(:x, :y, :z), Tuple{UInt32, UInt32, UInt32}}\n\nReturn the origin offset of the grid for threads that read per-thread stage-in data.\n\n\n\n\n\n","category":"function"},{"location":"api/kernel/#Device-arrays","page":"Kernel programming","title":"Device arrays","text":"","category":"section"},{"location":"api/kernel/","page":"Kernel 
programming","title":"Kernel programming","text":"Metal.jl provides a primitive, lightweight array type to manage GPU data organized in an plain, dense fashion. This is the device-counterpart to the MtlArray, and implements (part of) the array interface as well as other functionality for use on the GPU:","category":"page"},{"location":"api/kernel/","page":"Kernel programming","title":"Kernel programming","text":"MtlDeviceArray\nMetal.Const","category":"page"},{"location":"api/kernel/#Metal.MtlDeviceArray","page":"Kernel programming","title":"Metal.MtlDeviceArray","text":"MtlDeviceArray(dims, ptr)\nMtlDeviceArray{T}(dims, ptr)\nMtlDeviceArray{T,A}(dims, ptr)\nMtlDeviceArray{T,A,N}(dims, ptr)\n\nConstruct an N-dimensional dense Metal device array with element type T wrapping a pointer, where N is determined from the length of dims and T is determined from the type of ptr.\n\ndims may be a single scalar, or a tuple of integers corresponding to the lengths in each dimension). If the rank N is supplied explicitly as in Array{T,N}(dims), then it must match the length of dims. The same applies to the element type T, which should match the type of the pointer ptr.\n\n\n\n\n\n","category":"type"},{"location":"api/kernel/#Metal.Const","page":"Kernel programming","title":"Metal.Const","text":"Const(A::MtlDeviceArray)\n\nMark a MtlDeviceArray as constant/read-only and to use the constant address space.\n\nwarning: Warning\nExperimental API. Subject to change without deprecation.\n\n\n\n\n\n","category":"type"},{"location":"api/kernel/#Shared-memory","page":"Kernel programming","title":"Shared memory","text":"","category":"section"},{"location":"api/kernel/","page":"Kernel programming","title":"Kernel programming","text":"MtlThreadGroupArray","category":"page"},{"location":"api/kernel/#Metal.MtlThreadGroupArray","page":"Kernel programming","title":"Metal.MtlThreadGroupArray","text":"MtlThreadGroupArray(::Type{T}, dims)\n\nCreate an array local to each threadgroup launched during kernel execution.\n\n\n\n\n\n","category":"function"},{"location":"api/kernel/#Synchronization","page":"Kernel programming","title":"Synchronization","text":"","category":"section"},{"location":"api/kernel/","page":"Kernel programming","title":"Kernel programming","text":"MemoryFlags\nthreadgroup_barrier\nsimdgroup_barrier","category":"page"},{"location":"api/kernel/#Metal.MemoryFlags","page":"Kernel programming","title":"Metal.MemoryFlags","text":"MemoryFlags\n\nFlags to set the memory synchronization behavior of threadgroup_barrier and simdgroup_barrier.\n\nPossible values:\n\nNone: Set barriers to only act as an execution barrier and not apply a memory fence.\n\nDevice: Ensure the GPU correctly orders the memory operations to device memory\n for threads in the threadgroup or simdgroup.\n\nThreadGroup: Ensure the GPU correctly orders the memory operations to threadgroup\n memory for threads in a threadgroup or simdgroup.\n\nTexture: Ensure the GPU correctly orders the memory operations to texture memory for\n threads in a threadgroup or simdgroup for a texture with the read_write access qualifier.\n\nThreadGroup_ImgBlock: Ensure the GPU correctly orders the memory operations to threadgroup imageblock memory\n for threads in a threadgroup or simdgroup.\n\n\n\n\n\n","category":"type"},{"location":"api/kernel/#Metal.threadgroup_barrier","page":"Kernel programming","title":"Metal.threadgroup_barrier","text":"threadgroup_barrier(flag::MemoryFlags=MemoryFlagNone)\n\nSynchronize all threads in a threadgroup.\n\nPossible flags that 
affect the memory synchronization behavior are found in MemoryFlags.\n\n\n\n\n\n","category":"function"},{"location":"api/kernel/#Metal.simdgroup_barrier","page":"Kernel programming","title":"Metal.simdgroup_barrier","text":"simdgroup_barrier(flag::MemoryFlags=MemoryFlagNone)\n\nSynchronize all threads in a SIMD-group.\n\nPossible flags that affect the memory synchronization behavior are found in MemoryFlags.\n\n\n\n\n\n","category":"function"},{"location":"faq/faq/#Frequently-Asked-Questions","page":"Frequently Asked Questions","title":"Frequently Asked Questions","text":"","category":"section"},{"location":"faq/faq/#Can-you-wrap-this-Metal-API?","page":"Frequently Asked Questions","title":"Can you wrap this Metal API?","text":"","category":"section"},{"location":"faq/faq/","page":"Frequently Asked Questions","title":"Frequently Asked Questions","text":"Most likely. Any help on designing or implementing high-level wrappers for MSL's low-level functionality is greatly appreciated, so please consider contributing your uses of these APIs on the respective repositories.","category":"page"},{"location":"api/mps/#Metal-Performance-Shaders","page":"Metal Performance Shaders","title":"Metal Performance Shaders","text":"","category":"section"},{"location":"api/mps/","page":"Metal Performance Shaders","title":"Metal Performance Shaders","text":"This section lists the package's public functionality that corresponds to the Metal Performance Shaders functions. For more information about these functions, or to see which functions have yet to be implemented in this package, please consult the Metal Performance Shaders Documentation.","category":"page"},{"location":"api/mps/#Matrices-and-Vectors","page":"Metal Performance Shaders","title":"Matrices and Vectors","text":"","category":"section"},{"location":"api/mps/","page":"Metal Performance Shaders","title":"Metal Performance Shaders","text":"MPS.MPSMatrix\nMPS.MPSVector","category":"page"},{"location":"api/mps/#Metal.MPS.MPSMatrix","page":"Metal Performance Shaders","title":"Metal.MPS.MPSMatrix","text":"MPSMatrix(mat::MtlMatrix)\n\nMetal matrix representation used in Performance Shaders.\n\nNote that this results in a transposed view of the input, as Metal stores matrices row-major instead of column-major.\n\n\n\n\n\nMPSMatrix(vec::MtlVector)\n\nMetal matrix representation used in Performance Shaders.\n\nNote that this results in a transposed view of the input, as Metal stores matrices row-major instead of column-major.\n\n\n\n\n\nMPSMatrix(arr::MtlArray{T,3})\n\nMetal batched matrix representation used in Performance Shaders.\n\nNote that this results in a transposed view of the input, as Metal stores matrices row-major instead of column-major.\n\n\n\n\n\n","category":"type"},{"location":"api/mps/#Metal.MPS.MPSVector","page":"Metal Performance Shaders","title":"Metal.MPS.MPSVector","text":"MPSVector(arr::MtlVector)\n\nMetal vector representation used in Performance Shaders.\n\n\n\n\n\n","category":"type"},{"location":"api/mps/#Matrix-Arithmetic-Operators","page":"Metal Performance Shaders","title":"Matrix Arithmetic Operators","text":"","category":"section"},{"location":"api/mps/","page":"Metal Performance Shaders","title":"Metal Performance Shaders","text":"MPS.matmul!\nMPS.matvecmul!\nMPS.topk\nMPS.topk!","category":"page"},{"location":"api/mps/#Metal.MPS.matmul!","page":"Metal Performance Shaders","title":"Metal.MPS.matmul!","text":"matMulMPS(a::MtlMatrix, b::MtlMatrix, c::MtlMatrix, alpha=1, beta=1,\n transpose_left=false, transpose_right=false)\n\nAn 
MPSMatrixMultiplication kernel that computes: c = alpha * op(a) * op(b) + beta * c\n\nThis function should not typically be used. Rather, use the normal LinearAlgebra interface with any MtlArray and it should be accelerated using Metal Performance Shaders.\n\n\n\n\n\n","category":"function"},{"location":"api/mps/#Metal.MPS.matvecmul!","page":"Metal Performance Shaders","title":"Metal.MPS.matvecmul!","text":"matvecmul!(c::MtlVector, a::MtlMatrix, b::MtlVector, alpha=1, beta=1, transpose=false)\n\nAn MPSMatrixVectorMultiplication kernel that computes: c = alpha * op(a) * b + beta * c\n\nThis function should not typically be used. Rather, use the normal LinearAlgebra interface with any MtlArray and it should be accelerated using Metal Performance Shaders.\n\n\n\n\n\n","category":"function"},{"location":"api/mps/#Metal.MPS.topk","page":"Metal Performance Shaders","title":"Metal.MPS.topk","text":"MPS.topk(A::MtlMatrix{T}, k) where {T<:MtlFloat}\n\nCompute the top k values and their corresponding indices column-wise in a matrix A. Return the indices in I and the values in V.\n\nk cannot be greater than 16.\n\nUses MPSMatrixFindTopK.\n\nSee also: topk!.\n\nwarn: Warn\nThis interface is experimental, and might change without warning.\n\n\n\n\n\n","category":"function"},{"location":"api/mps/#Metal.MPS.topk!","page":"Metal Performance Shaders","title":"Metal.MPS.topk!","text":"MPS.topk!(A::MtlMatrix{T}, I::MtlMatrix{Int32}, V::MtlMatrix{T}, k)\n where {T<:MtlFloat}\n\nCompute the top k values and their corresponding indices column-wise in a matrix A. Return the indices in I and the values in V.\n\nk cannot be greater than 16.\n\nUses MPSMatrixFindTopK.\n\nSee also: topk.\n\nwarn: Warn\nThis interface is experimental, and might change without warning.\n\n\n\n\n\n","category":"function"},{"location":"api/array/#Array-programming","page":"Array programming","title":"Array programming","text":"","category":"section"},{"location":"api/array/","page":"Array programming","title":"Array programming","text":"The Metal array type, MtlArray, generally implements the Base array interface and all of its expected methods.","category":"page"},{"location":"api/array/","page":"Array programming","title":"Array programming","text":"However, there is the special function mtl for transferring an array over to the GPU. 
For compatibility reasons, it will automatically convert arrays of Float64 to Float32.","category":"page"},{"location":"api/array/","page":"Array programming","title":"Array programming","text":"mtl\nMtlArray\nMtlVector\nMtlMatrix\nMtlVecOrMat","category":"page"},{"location":"api/array/#Metal.mtl","page":"Array programming","title":"Metal.mtl","text":"mtl(A; storage=Metal.PrivateStorage)\n\nstorage can be Metal.PrivateStorage (default), Metal.SharedStorage, or Metal.ManagedStorage.\n\nOpinionated GPU array adaptor, which may alter the element type T of arrays:\n\nFor T<:AbstractFloat, it makes a MtlArray{Float32} for performance and compatibility reasons (except for Float16).\nFor T<:Complex{<:AbstractFloat} it makes a MtlArray{ComplexF32}.\nFor other isbitstype(T), it makes a MtlArray{T}.\n\nBy contrast, MtlArray(A) never changes the element type.\n\nUses Adapt.jl to act inside some wrapper structs.\n\nExamples\n\njulia> mtl(ones(3)')\n1×3 adjoint(::MtlVector{Float32, Metal.PrivateStorage}) with eltype Float32:\n 1.0 1.0 1.0\n\njulia> mtl(zeros(1,3); storage=Metal.SharedStorage)\n1×3 MtlMatrix{Float32, Metal.SharedStorage}:\n 0.0 0.0 0.0\n\njulia> mtl(1:3)\n1:3\n\njulia> MtlArray(1:3)\n3-element MtlVector{Int64, Metal.PrivateStorage}:\n 1\n 2\n 3\n\n\n\n\n\n","category":"function"},{"location":"api/array/#Metal.MtlArray","page":"Array programming","title":"Metal.MtlArray","text":"MtlArray{T,N,S} <: AbstractGPUArray{T,N}\n\nN-dimensional Metal array with storage mode S and elements of type T.\n\nS can be Metal.PrivateStorage (default), Metal.SharedStorage, or Metal.ManagedStorage.\n\nSee the Array Programming section of the Metal.jl docs for more details.\n\n\n\n\n\n","category":"type"},{"location":"api/array/#Metal.MtlVector","page":"Array programming","title":"Metal.MtlVector","text":"MtlVector{T,S} <: AbstractGPUVector{T}\n\nOne-dimensional array with elements of type T for use with Apple Metal-compatible GPUs. Alias for MtlArray{T,1,S}.\n\nSee also Vector(@ref), and the Array Programming section of the Metal.jl docs for more details.\n\n\n\n\n\n","category":"type"},{"location":"api/array/#Metal.MtlMatrix","page":"Array programming","title":"Metal.MtlMatrix","text":"MtlMatrix{T,S} <: AbstractGPUMatrix{T}\n\nTwo-dimensional array with elements of type T for use with Apple Metal-compatible GPUs. Alias for MtlArray{T,2,S}.\n\nSee also Matrix(@ref), and the Array Programming section of the Metal.jl docs for more details.\n\n\n\n\n\n","category":"type"},{"location":"api/array/#Metal.MtlVecOrMat","page":"Array programming","title":"Metal.MtlVecOrMat","text":"MtlVecOrMat{T,S}\n\nUnion type of MtlVector{T,S} and MtlMatrix{T,S} which allows functions to accept either an MtlMatrix or an MtlVector.\n\nSee also VecOrMat(@ref) for examples.\n\n\n\n\n\n","category":"type"},{"location":"api/array/#Storage-modes","page":"Array programming","title":"Storage modes","text":"","category":"section"},{"location":"api/array/","page":"Array programming","title":"Array programming","text":"The Metal API has various storage modes that dictate how a resource can be accessed. MtlArrays are Metal.PrivateStorage by default, but they can also be Metal.SharedStorage or Metal.ManagedStorage. 
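For example, the storage mode can be chosen with the mtl adaptor's storage keyword (documented above) and checked with the convenience queries listed below; a small usage sketch:\n\njulia> a = mtl(zeros(2); storage=Metal.SharedStorage);\n\njulia> is_shared(a)\ntrue\n\n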
For more information on storage modes, see the official Metal documentation.","category":"page"},{"location":"api/array/","page":"Array programming","title":"Array programming","text":"Metal.PrivateStorage\nMetal.SharedStorage\nMetal.ManagedStorage","category":"page"},{"location":"api/array/#Metal.MTL.PrivateStorage","page":"Array programming","title":"Metal.MTL.PrivateStorage","text":"struct Metal.PrivateStorage <: MTL.StorageMode\n\nUsed to indicate that the resource is stored using MTLStorageModePrivate in memory.\n\nFor more information on Metal storage modes, refer to the official Metal documentation.\n\nSee also Metal.SharedStorage and Metal.ManagedStorage.\n\n\n\n\n\n","category":"type"},{"location":"api/array/#Metal.MTL.SharedStorage","page":"Array programming","title":"Metal.MTL.SharedStorage","text":"struct Metal.SharedStorage <: MTL.StorageMode\n\nUsed to indicate that the resource is stored using MTLStorageModeShared in memory.\n\nFor more information on Metal storage modes, refer to the official Metal documentation.\n\nSee also Metal.PrivateStorage and Metal.ManagedStorage.\n\n\n\n\n\n","category":"type"},{"location":"api/array/#Metal.MTL.ManagedStorage","page":"Array programming","title":"Metal.MTL.ManagedStorage","text":"struct Metal.ManagedStorage <: MTL.StorageMode\n\nUsed to indicate that the resource is stored using MTLStorageModeManaged in memory.\n\nFor more information on Metal storage modes, refer to the official Metal documentation.\n\nSee also Metal.SharedStorage and Metal.PrivateStorage.\n\n\n\n\n\n","category":"type"},{"location":"api/array/","page":"Array programming","title":"Array programming","text":"There also exist the following convenience functions to check if an MtlArray is using a specific storage mode:","category":"page"},{"location":"api/array/","page":"Array programming","title":"Array programming","text":"is_private\nis_shared\nis_managed","category":"page"},{"location":"api/array/#Metal.is_private","page":"Array programming","title":"Metal.is_private","text":"is_private(A::MtlArray) -> Bool\n\nReturns true if A has storage mode Metal.PrivateStorage.\n\nSee also is_shared and is_managed.\n\n\n\n\n\n","category":"function"},{"location":"api/array/#Metal.is_shared","page":"Array programming","title":"Metal.is_shared","text":"is_shared(A::MtlArray) -> Bool\n\nReturns true if A has storage mode Metal.SharedStorage.\n\nSee also is_private and is_managed.\n\n\n\n\n\n","category":"function"},{"location":"api/array/#Metal.is_managed","page":"Array programming","title":"Metal.is_managed","text":"is_managed(A::MtlArray) -> Bool\n\nReturns true if A has storage mode Metal.ManagedStorage.\n\nSee also is_shared and is_private.\n\n\n\n\n\n","category":"function"},{"location":"api/essentials/#Essentials","page":"Essentials","title":"Essentials","text":"","category":"section"},{"location":"api/essentials/#Versions-and-Support","page":"Essentials","title":"Versions and Support","text":"","category":"section"},{"location":"api/essentials/","page":"Essentials","title":"Essentials","text":"Metal.macos_version\nMetal.darwin_version\nMetal.metal_support\nMetal.metallib_support\nMetal.air_support","category":"page"},{"location":"api/essentials/#Metal.macos_version","page":"Essentials","title":"Metal.macos_version","text":"Metal.macos_version() -> VersionNumber\n\nReturns the host macOS version.\n\nSee also 
Metal.darwin_version.\n\n\n\n\n\n","category":"function"},{"location":"api/essentials/#Metal.darwin_version","page":"Essentials","title":"Metal.darwin_version","text":"Metal.darwin_version() -> VersionNumber\n\nReturns the host Darwin kernel version.\n\nSee also Metal.macos_version.\n\n\n\n\n\n","category":"function"},{"location":"api/essentials/#Metal.metal_support","page":"Essentials","title":"Metal.metal_support","text":"Metal.metal_support() -> VersionNumber\n\nReturns the highest supported version for the Metal Shading Language.\n\nSee also Metal.metallib_support and Metal.air_support.\n\n\n\n\n\n","category":"function"},{"location":"api/essentials/#Metal.metallib_support","page":"Essentials","title":"Metal.metallib_support","text":"Metal.metallib_support() -> VersionNumber\n\nReturns the highest supported version for the metallib file format.\n\nSee also Metal.air_support and Metal.metal_support.\n\n\n\n\n\n","category":"function"},{"location":"api/essentials/#Metal.air_support","page":"Essentials","title":"Metal.air_support","text":"Metal.air_support() -> VersionNumber\n\nReturns the highest supported version for the embedded AIR bitcode format.\n\nSee also Metal.metallib_support and Metal.metal_support.\n\n\n\n\n\n","category":"function"},{"location":"api/essentials/#Global-State","page":"Essentials","title":"Global State","text":"","category":"section"},{"location":"api/essentials/","page":"Essentials","title":"Essentials","text":"Metal.device!\nMetal.devices\nMetal.device\nMetal.global_queue\nMetal.synchronize\nMetal.device_synchronize","category":"page"},{"location":"api/essentials/#Metal.device!","page":"Essentials","title":"Metal.device!","text":"device!(dev::MTLDevice)\n\nSets the Metal GPU device associated with the current Julia task.\n\n\n\n\n\n","category":"function"},{"location":"api/essentials/#Metal.MTL.devices","page":"Essentials","title":"Metal.MTL.devices","text":"devices()\n\nGet an iterator for the compute devices.\n\n\n\n\n\n","category":"function"},{"location":"api/essentials/#Metal.device","page":"Essentials","title":"Metal.device","text":"device()::MTLDevice\n\nReturn the Metal GPU device associated with the current Julia task.\n\nSince all M-series systems currently only externally show a single GPU, this function effectively returns the only system GPU.\n\n\n\n\n\ndevice(<:MtlArray)\n\nGet the Metal device for an MtlArray.\n\n\n\n\n\n","category":"function"},{"location":"api/essentials/#Metal.global_queue","page":"Essentials","title":"Metal.global_queue","text":"global_queue(dev::MTLDevice)::MTLCommandQueue\n\nReturn the Metal command queue associated with the current Julia thread.\n\n\n\n\n\n","category":"function"},{"location":"api/essentials/#Metal.synchronize","page":"Essentials","title":"Metal.synchronize","text":"synchronize(queue)\n\nWait for currently committed GPU work on this queue to finish.\n\nCreate a new MTLCommandBuffer from the global command queue, commit it to the queue, and simply wait for it to be completed. 
Since command buffers should execute in a First-In-First-Out manner, this synchronizes the GPU.\n\n\n\n\n\n","category":"function"},{"location":"api/essentials/#Metal.device_synchronize","page":"Essentials","title":"Metal.device_synchronize","text":"device_synchronize()\n\nSynchronize all committed GPU work across all global queues.\n\n\n\n\n\n","category":"function"},{"location":"usage/kernel/#Kernel-programming","page":"Kernel programming","title":"Kernel programming","text":"","category":"section"},{"location":"usage/kernel/","page":"Kernel programming","title":"Kernel programming","text":"Metal.jl is based on Apple's Metal Shading Language (MSL) and Metal framework. The interface allows you to utilize the graphics and computing power of Mac GPUs. Like many other GPU frameworks, it has its roots in graphics processing but has found wide use in general-purpose GPU (GPGPU) computing.","category":"page"},{"location":"usage/kernel/","page":"Kernel programming","title":"Kernel programming","text":"The most fundamental idea of programming GPUs (when compared to serial CPU programming) is their parallelism. A GPU function (kernel), when called, is not just run once in isolation. Rather, numerous (often thousands to millions) pseudo-independent instances (called threads) of the kernel are executed in parallel. These threads are arranged in a hierarchy that allows for varying levels of synchronization. For Metal, the hierarchy is as follows:","category":"page"},{"location":"usage/kernel/","page":"Kernel programming","title":"Kernel programming","text":"Thread: A single execution unit of the kernel\nThreadgroup: A collection of threads that share a common block of memory and synchronization","category":"page"},{"location":"usage/kernel/","page":"Kernel programming","title":"Kernel programming","text":"barriers","category":"page"},{"location":"usage/kernel/","page":"Kernel programming","title":"Kernel programming","text":"Grid: A collection of threadgroups","category":"page"},{"location":"usage/kernel/","page":"Kernel programming","title":"Kernel programming","text":"The threadgroup and grid sizes are set by the user when launching the GPU kernel. There are upper limits determined by the targeted hardware, and the sizes can be 1, 2, or 3-dimensional. For Metal.jl, these sizes are set using the @metal macro's keyword arguments: the groups keyword determines the grid size (the number of threadgroups), while the threads keyword determines the threadgroup size.","category":"page"},{"location":"usage/kernel/","page":"Kernel programming","title":"Kernel programming","text":"For example, given a 10x10x3 image on which you want to run a function independently for each pixel, the kernel launch code might look like the following: @metal threads=(10,10) groups=3 my_kernel(gpu_image_array) This would launch 3 separate threadgroups of 100 threads each (10 in the first dimension and 10 in the second dimension).","category":"page"},{"location":"usage/kernel/","page":"Kernel programming","title":"Kernel programming","text":"There are also additional hierarchy layers that consist of small groups of threads that execute in lockstep, called waves/SIMD groups/wavefronts and quadgroups. 
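A kernel can query its position within these groups as well; as a minimal sketch using the indexing intrinsics from the kernel programming API reference:\n\nfunction lane_id(out)\n i = thread_position_in_grid_1d()\n out[i] = thread_index_in_simdgroup()\n return\nend\n\n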
However, the basic three-tier hierarchy is enough to get started.","category":"page"},{"location":"usage/kernel/","page":"Kernel programming","title":"Kernel programming","text":"Here is a helpful link with good visualizations of Metal's thread hierarchy (also covering SIMD groups).","category":"page"},{"location":"usage/kernel/","page":"Kernel programming","title":"Kernel programming","text":"Each thread has its own set of private variables. Most importantly, each thread has associated unique indices to identify itself within its threadgroup and grid. These are traditionally used to differentiate execution across threads. You can also query the grid and threadgroup sizes.","category":"page"},{"location":"usage/kernel/","page":"Kernel programming","title":"Kernel programming","text":"For Metal.jl, these values are accessed via the following functions:","category":"page"},{"location":"usage/kernel/","page":"Kernel programming","title":"Kernel programming","text":"thread_index_in_threadgroup()\ngrid_size_Xd()\nthread_position_in_grid_Xd()\nthread_position_in_threadgroup_Xd()\nthreadgroup_position_in_grid_Xd()\nthreadgroups_per_grid_Xd()\nthreads_per_grid_Xd()\nthreads_per_threadgroup_Xd()","category":"page"},{"location":"usage/kernel/","page":"Kernel programming","title":"Kernel programming","text":"Here, 'X' is 1, 2, or 3 according to the number of dimensions requested.","category":"page"},{"location":"usage/kernel/","page":"Kernel programming","title":"Kernel programming","text":"Using these in a kernel (taken directly from the vadd example):","category":"page"},{"location":"usage/kernel/","page":"Kernel programming","title":"Kernel programming","text":"function vadd(a, b, c)\n i = thread_position_in_grid_1d()\n c[i] = a[i] + b[i]\n return\nend","category":"page"},{"location":"usage/kernel/","page":"Kernel programming","title":"Kernel programming","text":"This kernel takes in three vectors (a, b, c), all of the same length, and stores the element-wise sum of a and b into c. Each thread in this kernel gets its unique position in the grid (arrangement of all threadgroups) and stores this value into the variable i, which is then used as the index into the vectors. Thus, each thread is computing one sum and storing the result in the output vector.","category":"page"},{"location":"usage/kernel/","page":"Kernel programming","title":"Kernel programming","text":"To ensure this kernel functions properly, we have to launch it with exactly as many threads as the length of the vectors. If we under or over-launch threads, the result could be incorrect.","category":"page"},{"location":"usage/kernel/","page":"Kernel programming","title":"Kernel programming","text":"An example of a good launch:","category":"page"},{"location":"usage/kernel/","page":"Kernel programming","title":"Kernel programming","text":"len = prod(size(d_a))\n@metal threads=len vadd(d_a, d_b, d_c)","category":"page"},{"location":"usage/kernel/","page":"Kernel programming","title":"Kernel programming","text":"Additional notes:","category":"page"},{"location":"usage/kernel/","page":"Kernel programming","title":"Kernel programming","text":"Kernels must always return nothing\nKernels are asynchronous. 
To synchronize, use the Metal.@sync macro.","category":"page"},{"location":"usage/kernel/#Other-Helpful-Links","page":"Kernel programming","title":"Other Helpful Links","text":"","category":"section"},{"location":"usage/kernel/","page":"Kernel programming","title":"Kernel programming","text":"Metal Shading Language Specification","category":"page"},{"location":"api/compiler/#Compiler","page":"Compiler","title":"Compiler","text":"","category":"section"},{"location":"api/compiler/#Execution","page":"Compiler","title":"Execution","text":"","category":"section"},{"location":"api/compiler/","page":"Compiler","title":"Compiler","text":"The main entry-point to the compiler is the @metal macro:","category":"page"},{"location":"api/compiler/","page":"Compiler","title":"Compiler","text":"@metal","category":"page"},{"location":"api/compiler/#Metal.@metal","page":"Compiler","title":"Metal.@metal","text":"@metal threads=... groups=... [kwargs...] func(args...)\n\nHigh-level interface for executing code on a GPU.\n\nThe @metal macro should prefix a call, with func a callable function or object that should return nothing. It will be compiled to a Metal function upon first use, and to a certain extent arguments will be converted and managed automatically using mtlconvert. Finally, a call to mtlcall is performed, creating a command buffer in the current global command queue and then committing it.\n\nThere are a few supported keyword arguments that influence the behavior of @metal:\n\nlaunch: whether to launch this kernel, defaults to true. If false the returned kernel object should be launched by calling it and passing arguments again.\nname: the name of the kernel in the generated code. Defaults to an automatically-generated name.\nqueue: the command queue to use for this kernel. Defaults to the global command queue.\n\n\n\n\n\n","category":"macro"},{"location":"api/compiler/","page":"Compiler","title":"Compiler","text":"If needed, you can use a lower-level API that lets you inspect the compiled kernel:","category":"page"},{"location":"api/compiler/","page":"Compiler","title":"Compiler","text":"Metal.mtlconvert\nMetal.mtlfunction","category":"page"},{"location":"api/compiler/#Metal.mtlconvert","page":"Compiler","title":"Metal.mtlconvert","text":"mtlconvert(x, [cce])\n\nThis function is called for every argument to be passed to a kernel, allowing it to be converted to a GPU-friendly format. By default, the function does nothing and returns the input object x as-is.\n\nDo not add methods to this function, but instead extend the underlying Adapt.jl package and register methods for the Metal.Adaptor type.\n\n\n\n\n\n","category":"function"},{"location":"api/compiler/#Metal.mtlfunction","page":"Compiler","title":"Metal.mtlfunction","text":"mtlfunction(f, tt=Tuple{}; kwargs...)\n\nLow-level interface to compile a function invocation for the currently-active GPU, returning a callable kernel object. For a higher-level interface, use @metal.\n\nThe output of this function is automatically cached, i.e. you can simply call mtlfunction in a hot path without degrading performance. 
New code will be generated automatically when the function changes, or when different types or keyword arguments are provided.\n\n\n\n\n\n","category":"function"},{"location":"api/compiler/#Reflection","page":"Compiler","title":"Reflection","text":"","category":"section"},{"location":"api/compiler/","page":"Compiler","title":"Compiler","text":"If you want to inspect generated code, you can use macros that resemble functionality from the InteractiveUtils standard library:","category":"page"},{"location":"api/compiler/","page":"Compiler","title":"Compiler","text":"@device_code_lowered\n@device_code_typed\n@device_code_warntype\n@device_code_llvm\n@device_code_native\n@device_code_agx\n@device_code","category":"page"},{"location":"api/compiler/","page":"Compiler","title":"Compiler","text":"For more information, please consult the GPUCompiler.jl documentation. Note that @device_code_agx is an alias for @device_code_native.","category":"page"},{"location":"#MacOS-GPU-programming-in-Julia","page":"Home","title":"MacOS GPU programming in Julia","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"The Metal.jl package is the main entry point for GPU programming on macOS in Julia. The package makes it possible to do so at various abstraction levels, from easy-to-use arrays down to hand-written kernels using low-level Metal APIs.","category":"page"},{"location":"","page":"Home","title":"Home","text":"If you have any questions, please feel free to use the #gpu channel on the Julia Slack, or the GPU domain of the Julia Discourse.","category":"page"},{"location":"","page":"Home","title":"Home","text":"As this package is still under development, if you spot a bug, please file an issue.","category":"page"},{"location":"#Quick-Start","page":"Home","title":"Quick Start","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"Metal.jl ties into your system's existing Metal Shading Language compiler toolchain, so no additional installs are required (unless you want to view profiled GPU operations).","category":"page"},{"location":"","page":"Home","title":"Home","text":"# install the package\nusing Pkg\nPkg.add(\"Metal\")\n\n# smoke test\nusing Metal\nMetal.versioninfo()","category":"page"},{"location":"","page":"Home","title":"Home","text":"If you want to ensure everything works as expected, you can execute the test suite.","category":"page"},{"location":"","page":"Home","title":"Home","text":"using Pkg\nPkg.test(\"Metal\")","category":"page"},{"location":"","page":"Home","title":"Home","text":"The following resources may also be of interest (although they are mainly focused on the CUDA GPU backend):","category":"page"},{"location":"","page":"Home","title":"Home","text":"Effectively using GPUs with Julia: slides\nHow Julia is compiled to GPUs: video","category":"page"},{"location":"#Contributing","page":"Home","title":"Contributing","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"If you want to help improve this package, look at the contributing page for more details.","category":"page"},{"location":"#Acknowledgements","page":"Home","title":"Acknowledgements","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"The Julia Metal stack has been a collaborative effort by many individuals. 
Significant contributions have been made by the following individuals:","category":"page"},{"location":"","page":"Home","title":"Home","text":"Tim Besard (@maleadt) (lead developer)\nFilippo Vicentini (@PhilipVinc)\nMax Hawkins (@max-Hawkins)","category":"page"},{"location":"#Supporting-and-Citing","page":"Home","title":"Supporting and Citing","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"Some of the software in this ecosystem was developed as part of academic research. If you would like to help support it, please star the repository, as such metrics may help us secure funding in the future. If you use our software as part of your research, teaching, or other activities, we would be grateful if you could cite our work. The CITATION.cff file in the root of this repository lists the relevant papers.","category":"page"},{"location":"profiling/#Profiling","page":"Profiling","title":"Profiling","text":"","category":"section"},{"location":"profiling/","page":"Profiling","title":"Profiling","text":"Profiling GPU code is harder than profiling Julia code executing on the CPU. For one, kernels typically execute asynchronously, and thus require appropriate synchronization when measuring their execution time. Furthermore, because the code executes on a different processor, it is much harder to know what is currently executing.","category":"page"},{"location":"profiling/#Time-measurements","page":"Profiling","title":"Time measurements","text":"","category":"section"},{"location":"profiling/","page":"Profiling","title":"Profiling","text":"For robust measurements, it is advised to use the BenchmarkTools.jl package, which goes to great lengths to perform accurate measurements. Due to the asynchronous nature of GPUs, you need to ensure the GPU is synchronized at the end of every sample, e.g. by calling synchronize() or, even better, wrapping your code in Metal.@sync.","category":"page"},{"location":"profiling/","page":"Profiling","title":"Profiling","text":"Note that the allocations as reported by BenchmarkTools are CPU allocations.","category":"page"},{"location":"profiling/#Application-tracing","page":"Profiling","title":"Application tracing","text":"","category":"section"},{"location":"profiling/","page":"Profiling","title":"Profiling","text":"For profiling large applications, simple timings are insufficient. Instead, we want an overview of how and when the GPU was active, to identify times where the device was idle and/or to find which kernels need optimization.","category":"page"},{"location":"profiling/","page":"Profiling","title":"Profiling","text":"As we cannot use the Julia profiler for this task, we will use Metal's GPU profiler directly. Use the Metal.@profile macro to surround the code of interest. This macro tells your system to track GPU calls and usage statistics and will save this information in a temporary folder ending in '.trace'. 
For later viewing, copy this folder to a stable location or use the 'dir' argument of the profile macro to save the trace to a different location directly.","category":"page"},{"location":"profiling/","page":"Profiling","title":"Profiling","text":"The resulting trace can be opened with the Instruments app, part of Xcode.","category":"page"},{"location":"profiling/","page":"Profiling","title":"Profiling","text":"julia> using Metal\n\njulia> function vadd(a, b, c)\n i = thread_position_in_grid_1d()\n c[i] = a[i] + b[i]\n return\n end\n\njulia> a = MtlArray([1]); b = MtlArray([2]); c = similar(a);\n\njulia> Metal.@profile @metal threads=length(c) vadd(a, b, c);\n...\n[ Info: System trace saved to julia_3.trace; open the resulting trace in Instruments","category":"page"},{"location":"profiling/","page":"Profiling","title":"Profiling","text":"It is possible to augment the trace with additional information by using signposts. Similar to NVTX markers and ranges in CUDA.jl, signpost intervals and events can be used to add time intervals and points of interest, respectively, to the trace. This can be done by using the signpost functionality from ObjectiveC.jl:","category":"page"},{"location":"profiling/","page":"Profiling","title":"Profiling","text":"using ObjectiveC, .OS\n\n@signpost_interval \"My Interval\" begin\n # code to profile\n @signpost_event \"My Event\"\nend","category":"page"},{"location":"profiling/","page":"Profiling","title":"Profiling","text":"For more information, e.g. how to pass additional messages to the signposts, or how to use a custom logger, consult the ObjectiveC.jl documentation, or the docstrings of the @signpost_interval and @signpost_event macros.","category":"page"},{"location":"profiling/#Frame-capture","page":"Profiling","title":"Frame capture","text":"","category":"section"},{"location":"profiling/","page":"Profiling","title":"Profiling","text":"For more details on specific operations, you can use Metal's frame capture feature to generate a more detailed and replayable trace of the GPU operations. This requires that Julia is started with the METAL_CAPTURE_ENABLED environment variable set to 1. Frames are captured by wrapping the code of interest in Metal.@capture, and the resulting trace can be opened with Xcode.","category":"page"},{"location":"profiling/","page":"Profiling","title":"Profiling","text":"$ METAL_CAPTURE_ENABLED=1 julia\n...\n\njulia> using Metal\n\njulia> function vadd(a, b, c)\n i = thread_position_in_grid_1d()\n c[i] = a[i] + b[i]\n return\n end\n\njulia> a = MtlArray([1]); b = MtlArray([2]); c = similar(a);\n... Metal GPU Frame Capture Enabled\n\njulia> Metal.@capture @metal threads=length(c) vadd(a, b, c);\n...\n[ Info: GPU frame capture saved to julia_1.gputrace; open the resulting trace in Xcode","category":"page"}] } diff --git a/dev/usage/array/index.html b/dev/usage/array/index.html index e9dbe439..b7d0f531 100644 --- a/dev/usage/array/index.html +++ b/dev/usage/array/index.html @@ -1,5 +1,5 @@ -Array programming · Metal.jl

Array programming

The easiest way to use the GPU's massive parallelism is by expressing operations in terms of arrays: Metal.jl provides an array type, MtlArray, and many specialized array operations that execute efficiently on the GPU hardware. In this section, we will briefly demonstrate use of the MtlArray type. Since we expose Metal's functionality by implementing existing Julia interfaces on the MtlArray type, you should refer to the upstream Julia documentation for more information on these operations.

If you encounter missing functionality, or are running into operations that trigger so-called "scalar iteration", have a look at the issue tracker and file a new issue if there's none. Do note that you can always access the underlying Metal APIs by calling into the relevant submodule.

Construction and Initialization

The MtlArray type aims to implement the AbstractArray interface, and provide implementations of methods that are commonly used when working with arrays. That means you can construct MtlArrays in the same way as regular Array objects:

julia> MtlArray{Int}(undef, 2)
+Array programming · Metal.jl

Array programming

The easiest way to use the GPU's massive parallelism is by expressing operations in terms of arrays: Metal.jl provides an array type, MtlArray, and many specialized array operations that execute efficiently on the GPU hardware. In this section, we will briefly demonstrate use of the MtlArray type. Since we expose Metal's functionality by implementing existing Julia interfaces on the MtlArray type, you should refer to the upstream Julia documentation for more information on these operations.

If you encounter missing functionality, or are running into operations that trigger so-called "scalar iteration", have a look at the issue tracker and file a new issue if there's none. Do note that you can always access the underlying Metal APIs by calling into the relevant submodule.

Construction and Initialization

The MtlArray type aims to implement the AbstractArray interface, and provide implementations of methods that are commonly used when working with arrays. That means you can construct MtlArrays in the same way as regular Array objects:

julia> MtlArray{Int}(undef, 2)
 2-element MtlVector{Int64, Metal.PrivateStorage}:
  0
  0
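 
 Construction from an existing host Array works the same way; for example (a small sketch, showing the default private storage mode):
 
 julia> MtlArray([1, 2])
 2-element MtlVector{Int64, Metal.PrivateStorage}:
  1
  2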
@@ -50,4 +50,20 @@
 
 julia> Base.mapreducedim!(identity, +, b, a)
 1×1 MtlMatrix{Float32, Metal.PrivateStorage}:
- 6.0
+ 6.0

Random numbers

Base's convenience functions for generating random numbers are available in Metal as well:

julia> Metal.rand(2)
+2-element MtlVector{Float32, Metal.PrivateStorage}:
+ 0.89025915
+ 0.8946847
+
+julia> Metal.randn(Float32, 2, 1)
+2×1 MtlMatrix{Float32, Metal.PrivateStorage}:
+ 1.2279074
+ 1.2518331
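+
+In-place variants also exist (a sketch; this assumes Metal.rand! mirrors the rand!/randn! convention of other JuliaGPU backends):
+
+julia> a = MtlArray{Float32}(undef, 2);
+
+julia> Metal.rand!(a);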

Behind the scenes, these random numbers come from two different generators: one backed by Metal Performance Shaders, the other by the GPUArrays.jl random methods. Operations on these generators are implemented using methods from the Random standard library:

julia> using Random, GPUArrays
+
+julia> a = Random.rand(MPS.default_rng(), Float32, 1)
+1-element MtlVector{Float32, Metal.PrivateStorage}:
+ 0.89025915
+
+julia> a = Random.rand!(GPUArrays.default_rng(MtlArray), a)
+1-element MtlVector{Float32, Metal.PrivateStorage}:
+ 0.0705002
Note

MPSMatrixRandom functionality requires Metal.jl >= v1.4

Warning

Random.rand!(::MPS.RNG, args...) and Random.randn!(::MPS.RNG, args...) have a framework limitation that requires the byte offset and byte size of the destination array to be a multiple of 4.

diff --git a/dev/usage/kernel/index.html b/dev/usage/kernel/index.html index a5e41a1a..e1ffdf66 100644 --- a/dev/usage/kernel/index.html +++ b/dev/usage/kernel/index.html @@ -4,4 +4,4 @@ c[i] = a[i] + b[i] return end

This kernel takes in three vectors (a, b, c), all of the same length, and stores the element-wise sum of a and b into c. Each thread in this kernel gets its unique position in the grid (arrangement of all threadgroups) and stores this value into the variable i, which is then used as the index into the vectors. Thus, each thread is computing one sum and storing the result in the output vector.

To ensure this kernel functions properly, we have to launch it with exactly as many threads as the length of the vectors. If we under or over-launch threads, the result could be incorrect.
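
A common defensive pattern (a sketch, not part of the original example) is to bounds-check inside the kernel so that any extra threads simply exit:

function vadd(a, b, c)
    i = thread_position_in_grid_1d()
    i > length(c) && return
    c[i] = a[i] + b[i]
    return
end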

An example of a good launch:

len = prod(size(d_a))
-@metal threads=len vadd(d_a, d_b, d_c)

Additional notes:

Metal Shading Language Specification An Introduction to GPU Programming course from University of Illinois (primarily in CUDA, but the concepts are transferable)

+@metal threads=len vadd(d_a, d_b, d_c)

Additional notes:

Metal Shading Language Specification

diff --git a/dev/usage/overview/index.html b/dev/usage/overview/index.html index bba0cb8a..514f973f 100644 --- a/dev/usage/overview/index.html +++ b/dev/usage/overview/index.html @@ -9,4 +9,4 @@ # automatic memory management a = nothing

Beyond memory management, there is a whole range of array operations to process your data. This includes several higher-order operations that take other code as arguments, such as map, reduce or broadcast. With these, it is possible to perform kernel-like operations without actually writing your own GPU kernels:

a = Metal.zeros(1024)
 b = Metal.ones(1024)
-a.^2 .+ sin.(b)
+a.^2 .+ sin.(b)
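+
+These array operations also compose with higher-order functions; for example (a sketch relying on the mapreduce support that GPUArrays.jl provides for MtlArray):
+
+mapreduce(x -> x^2, +, a)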