Class NVCudaKernelLaunch
The application first creates a CUDA module using CreateCudaModuleNV, then creates the CUDA function entry point with CreateCudaFunctionNV.
To dispatch this function, the application creates a command buffer and records the kernel launch into it with CmdCudaLaunchKernelNV.
When done, the application destroys the function handle and the CUDA module handle with DestroyCudaFunctionNV and DestroyCudaModuleNV.
To reduce the impact of compilation time, this extension can return a binary cache built from the PTX that was provided. The application first queries the required cache size by calling GetCudaModuleCacheNV with a NULL buffer pointer and a valid pointer that receives the size, then calls the same function again with a valid buffer pointer to retrieve the data. The resulting cache can then be used in later runs of the application by passing it to the same CreateCudaModuleNV instead of the PTX code, which significantly speeds up initialization of the CUDA module.
As with VkPipelineCache, the binary cache depends on the hardware architecture. The application must assume that loading the cache might fail, and needs to handle falling back to the original PTX code as necessary. Most often, the cache will succeed if the same GPU driver and architecture are used between the cache generation from PTX and the use of this cache. In the event of a new driver version, or if using a different GPU architecture, the cache is likely to become invalid.
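The fallback policy described above can be sketched as follows. This is a minimal illustration, not part of LWJGL: `CudaModuleLoader` and `ModuleFactory` are hypothetical names, and `ModuleFactory` merely stands in for whatever code wraps CreateCudaModuleNV and reports failure as an empty result.

```java
import java.util.Optional;

/** Sketch of a "try the binary cache, fall back to PTX" policy. Hypothetical helper, not an LWJGL class. */
final class CudaModuleLoader {

    /**
     * Hypothetical stand-in for a wrapper around CreateCudaModuleNV.
     * Returns the module handle, or empty if module creation failed.
     */
    interface ModuleFactory {
        Optional<Long> create(byte[] data);
    }

    /**
     * Tries the cached binary first (if any), then falls back to the original PTX.
     *
     * @throws IllegalStateException if the PTX cannot be loaded either
     */
    static long load(ModuleFactory factory, byte[] cachedBinary, byte[] ptx) {
        if (cachedBinary != null && cachedBinary.length > 0) {
            Optional<Long> fromCache = factory.create(cachedBinary);
            if (fromCache.isPresent()) {
                return fromCache.get();
            }
            // The cache was rejected (for example after a driver update): fall through to PTX.
        }
        return factory.create(ptx)
                .orElseThrow(() -> new IllegalStateException("CUDA module creation failed for PTX"));
    }
}
```

An application following this sketch would typically regenerate and store a fresh cache after a successful PTX load, so that the next run can take the fast path again.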
- Name String
- VK_NV_cuda_kernel_launch
- Extension Type
- Device extension
- Registered Extension Number
- 308
- Revision
- This is a provisional extension and must be used with caution. See the description of provisional header files for enablement and stability details.
- API Interactions
- Interacts with VK_EXT_debug_report
- Contact
- Tristan Lorach tlorach
Other Extension Metadata
- Last Modified Date
- 2020-09-30
- Contributors
- Eric Werness, NVIDIA
-
Field Summary
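Fields
Modifier and Type | Field | Description
static final int | VK_DEBUG_REPORT_OBJECT_TYPE_CUDA_FUNCTION_NV_EXT | Extends VkDebugReportObjectTypeEXT.
static final int | VK_DEBUG_REPORT_OBJECT_TYPE_CUDA_MODULE_NV_EXT | Extends VkDebugReportObjectTypeEXT.
static final String | VK_NV_CUDA_KERNEL_LAUNCH_EXTENSION_NAME | The extension name.
static final int | VK_NV_CUDA_KERNEL_LAUNCH_SPEC_VERSION | The extension specification version.
static final int | VK_OBJECT_TYPE_CUDA_FUNCTION_NV | Extends VkObjectType.
static final int | VK_OBJECT_TYPE_CUDA_MODULE_NV | Extends VkObjectType.
static final int | VK_STRUCTURE_TYPE_CUDA_FUNCTION_CREATE_INFO_NV | Extends VkStructureType.
static final int | VK_STRUCTURE_TYPE_CUDA_LAUNCH_INFO_NV | Extends VkStructureType.
static final int | VK_STRUCTURE_TYPE_CUDA_MODULE_CREATE_INFO_NV | Extends VkStructureType.
static final int | VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_CUDA_KERNEL_LAUNCH_FEATURES_NV | Extends VkStructureType.
static final int | VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_CUDA_KERNEL_LAUNCH_PROPERTIES_NV | Extends VkStructureType.
-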
Method Summary
Modifier and Type | Method | Description
static void | nvkCmdCudaLaunchKernelNV(org.lwjgl.vulkan.VkCommandBuffer commandBuffer, long pLaunchInfo) | Unsafe version of: CmdCudaLaunchKernelNV
static int | nvkCreateCudaFunctionNV(org.lwjgl.vulkan.VkDevice device, long pCreateInfo, long pAllocator, long pFunction) | Unsafe version of: CreateCudaFunctionNV
static int | nvkCreateCudaModuleNV(org.lwjgl.vulkan.VkDevice device, long pCreateInfo, long pAllocator, long pModule) | Unsafe version of: CreateCudaModuleNV
static void | nvkDestroyCudaFunctionNV(org.lwjgl.vulkan.VkDevice device, long function, long pAllocator) | Unsafe version of: DestroyCudaFunctionNV
static void | nvkDestroyCudaModuleNV(org.lwjgl.vulkan.VkDevice device, long module, long pAllocator) | Unsafe version of: DestroyCudaModuleNV
static int | nvkGetCudaModuleCacheNV(org.lwjgl.vulkan.VkDevice device, long module, long pCacheSize, long pCacheData) | Unsafe version of: GetCudaModuleCacheNV
static void | vkCmdCudaLaunchKernelNV(org.lwjgl.vulkan.VkCommandBuffer commandBuffer, VkCudaLaunchInfoNV pLaunchInfo) | Dispatch compute work items.
static int | vkCreateCudaFunctionNV(org.lwjgl.vulkan.VkDevice device, VkCudaFunctionCreateInfoNV pCreateInfo, @Nullable VkAllocationCallbacks pAllocator, long[] pFunction) | Array version of: CreateCudaFunctionNV
static int | vkCreateCudaFunctionNV(org.lwjgl.vulkan.VkDevice device, VkCudaFunctionCreateInfoNV pCreateInfo, @Nullable VkAllocationCallbacks pAllocator, LongBuffer pFunction) | Creates a new CUDA function object.
static int | vkCreateCudaModuleNV(org.lwjgl.vulkan.VkDevice device, VkCudaModuleCreateInfoNV pCreateInfo, @Nullable VkAllocationCallbacks pAllocator, long[] pModule) | Array version of: CreateCudaModuleNV
static int | vkCreateCudaModuleNV(org.lwjgl.vulkan.VkDevice device, VkCudaModuleCreateInfoNV pCreateInfo, @Nullable VkAllocationCallbacks pAllocator, LongBuffer pModule) | Creates a new CUDA module object.
static void | vkDestroyCudaFunctionNV(org.lwjgl.vulkan.VkDevice device, long function, @Nullable VkAllocationCallbacks pAllocator) | Destroy a CUDA function.
static void | vkDestroyCudaModuleNV(org.lwjgl.vulkan.VkDevice device, long module, @Nullable VkAllocationCallbacks pAllocator) | Destroy a CUDA module.
static int | vkGetCudaModuleCacheNV(org.lwjgl.vulkan.VkDevice device, long module, org.lwjgl.PointerBuffer pCacheSize, @Nullable ByteBuffer pCacheData) | Get CUDA module cache.
-
Field Details
-
VK_NV_CUDA_KERNEL_LAUNCH_SPEC_VERSION
public static final int VK_NV_CUDA_KERNEL_LAUNCH_SPEC_VERSION
The extension specification version.
-
VK_NV_CUDA_KERNEL_LAUNCH_EXTENSION_NAME
public static final String VK_NV_CUDA_KERNEL_LAUNCH_EXTENSION_NAME
The extension name.
-
VK_STRUCTURE_TYPE_CUDA_MODULE_CREATE_INFO_NV
public static final int VK_STRUCTURE_TYPE_CUDA_MODULE_CREATE_INFO_NV
Extends VkStructureType.
-
VK_STRUCTURE_TYPE_CUDA_FUNCTION_CREATE_INFO_NV
public static final int VK_STRUCTURE_TYPE_CUDA_FUNCTION_CREATE_INFO_NV
Extends VkStructureType.
-
VK_STRUCTURE_TYPE_CUDA_LAUNCH_INFO_NV
public static final int VK_STRUCTURE_TYPE_CUDA_LAUNCH_INFO_NV
Extends VkStructureType.
-
VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_CUDA_KERNEL_LAUNCH_FEATURES_NV
public static final int VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_CUDA_KERNEL_LAUNCH_FEATURES_NV
Extends VkStructureType.
-
VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_CUDA_KERNEL_LAUNCH_PROPERTIES_NV
public static final int VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_CUDA_KERNEL_LAUNCH_PROPERTIES_NV
Extends VkStructureType.
-
VK_OBJECT_TYPE_CUDA_MODULE_NV
public static final int VK_OBJECT_TYPE_CUDA_MODULE_NV
Extends VkObjectType.
-
VK_OBJECT_TYPE_CUDA_FUNCTION_NV
public static final int VK_OBJECT_TYPE_CUDA_FUNCTION_NV
Extends VkObjectType.
-
VK_DEBUG_REPORT_OBJECT_TYPE_CUDA_MODULE_NV_EXT
public static final int VK_DEBUG_REPORT_OBJECT_TYPE_CUDA_MODULE_NV_EXT
Extends VkDebugReportObjectTypeEXT.
-
VK_DEBUG_REPORT_OBJECT_TYPE_CUDA_FUNCTION_NV_EXT
public static final int VK_DEBUG_REPORT_OBJECT_TYPE_CUDA_FUNCTION_NV_EXT
Extends VkDebugReportObjectTypeEXT.
-
-
Method Details
-
nvkCreateCudaModuleNV
public static int nvkCreateCudaModuleNV(org.lwjgl.vulkan.VkDevice device, long pCreateInfo, long pAllocator, long pModule)
Unsafe version of: CreateCudaModuleNV
-
vkCreateCudaModuleNV
public static int vkCreateCudaModuleNV(org.lwjgl.vulkan.VkDevice device, VkCudaModuleCreateInfoNV pCreateInfo, @Nullable VkAllocationCallbacks pAllocator, LongBuffer pModule)
Creates a new CUDA module object.
C Specification
To create a CUDA module, call:
VkResult vkCreateCudaModuleNV(
    VkDevice device,
    const VkCudaModuleCreateInfoNV* pCreateInfo,
    const VkAllocationCallbacks* pAllocator,
    VkCudaModuleNV* pModule);
Description
Once a CUDA module has been created, the application may create the function entry point, which must refer to one function in the module.
Valid Usage (Implicit)
- device must be a valid VkDevice handle
- pCreateInfo must be a valid pointer to a valid VkCudaModuleCreateInfoNV structure
- If pAllocator is not NULL, pAllocator must be a valid pointer to a valid VkAllocationCallbacks structure
- pModule must be a valid pointer to a VkCudaModuleNV handle
Return Codes
- On success, this command returns
- On failure, this command returns
Parameters:
- device - the logical device that creates the CUDA module.
- pCreateInfo - a pointer to a VkCudaModuleCreateInfoNV structure.
- pAllocator - controls host memory allocation as described in the Memory Allocation chapter.
- pModule - a pointer to a VkCudaModuleNV handle in which the resulting CUDA module object is returned.
-
nvkGetCudaModuleCacheNV
public static int nvkGetCudaModuleCacheNV(org.lwjgl.vulkan.VkDevice device, long module, long pCacheSize, long pCacheData)
Unsafe version of: GetCudaModuleCacheNV
Parameters:
- pCacheSize - a pointer containing the number of bytes to be copied into pCacheData
-
vkGetCudaModuleCacheNV
public static int vkGetCudaModuleCacheNV(org.lwjgl.vulkan.VkDevice device, long module, org.lwjgl.PointerBuffer pCacheSize, @Nullable ByteBuffer pCacheData)
Get CUDA module cache.
C Specification
To get the CUDA module cache, call:
VkResult vkGetCudaModuleCacheNV(
    VkDevice device,
    VkCudaModuleNV module,
    size_t* pCacheSize,
    void* pCacheData);
Description
If pCacheData is NULL, then the size of the binary cache, in bytes, is returned in pCacheSize. Otherwise, pCacheSize must point to a variable set by the application to the size of the buffer, in bytes, pointed to by pCacheData, and on return the variable is overwritten with the amount of data actually written to pCacheData. If the value referenced by pCacheSize is less than the size of the binary cache, nothing is written to pCacheData, and INCOMPLETE will be returned instead of SUCCESS.
The returned cache may then be used later for further initialization of the CUDA module, by sending this cache instead of the PTX code when using CreateCudaModuleNV.
Note
Using the binary cache instead of the original PTX code should significantly speed up initialization of the CUDA module, given that the whole compilation and validation will not be necessary.
As with VkPipelineCache, the binary cache depends on the specific implementation. The application must assume the cache upload might fail in many circumstances and should therefore be prepared to fall back to the original PTX code if necessary. Most often, the cache will succeed if the same device driver and architecture are used between the cache generation from PTX and the use of this cache. In the event of a new driver version or if using a different device architecture, this cache may become invalid.
Valid Usage (Implicit)
- device must be a valid VkDevice handle
- module must be a valid VkCudaModuleNV handle
- pCacheSize must be a valid pointer to a size_t value
- If the value referenced by pCacheSize is not 0, and pCacheData is not NULL, pCacheData must be a valid pointer to an array of pCacheSize bytes
- module must have been created, allocated, or retrieved from device
Return Codes
- On success, this command returns
- On failure, this command returns
Parameters:
- device - the logical device that owns the CUDA module.
- module - the CUDA module.
- pCacheSize - a pointer containing the number of bytes to be copied into pCacheData
- pCacheData - a pointer to a buffer in which to copy the binary cache
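The two-call size query described for GetCudaModuleCacheNV can be sketched in isolation as below. This is an illustration under stated assumptions, not LWJGL code: `TwoCallCacheQuery` and `CacheGetter` are hypothetical names, `CacheGetter` only mimics the shape of the call (the device and module arguments are omitted), and a `long[1]` replaces the `PointerBuffer` that the real binding takes for pCacheSize. The numeric values used for SUCCESS and INCOMPLETE are those of VK_SUCCESS and VK_INCOMPLETE.

```java
import java.nio.ByteBuffer;

/** Sketch of the "query size, then fetch data" idiom. Hypothetical helper, not an LWJGL class. */
final class TwoCallCacheQuery {
    static final int SUCCESS = 0;    // numeric value of VK_SUCCESS
    static final int INCOMPLETE = 5; // numeric value of VK_INCOMPLETE

    /**
     * Hypothetical stand-in with the same shape as GetCudaModuleCacheNV.
     * pCacheSize[0] is read as the buffer capacity and overwritten with the byte count.
     */
    interface CacheGetter {
        int get(long[] pCacheSize, ByteBuffer pCacheData);
    }

    /** Runs both calls and returns a buffer whose remaining bytes are the binary cache. */
    static ByteBuffer fetch(CacheGetter getter) {
        long[] size = new long[1];

        // First call: a null data pointer means "only report the required size".
        int result = getter.get(size, null);
        if (result != SUCCESS) {
            throw new IllegalStateException("size query failed with result " + result);
        }

        // Second call: pass a buffer of the reported size to receive the data.
        ByteBuffer data = ByteBuffer.allocateDirect(Math.toIntExact(size[0]));
        result = getter.get(size, data);
        if (result != SUCCESS) {
            // INCOMPLETE would mean the buffer was too small and nothing was written.
            throw new IllegalStateException("cache retrieval failed with result " + result);
        }

        // size[0] now holds the number of bytes actually written.
        data.limit(Math.toIntExact(size[0]));
        return data;
    }
}
```

The returned bytes could then be written to disk and supplied to CreateCudaModuleNV in place of the PTX on a later run.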
-
nvkCreateCudaFunctionNV
public static int nvkCreateCudaFunctionNV(org.lwjgl.vulkan.VkDevice device, long pCreateInfo, long pAllocator, long pFunction)
Unsafe version of: CreateCudaFunctionNV
-
vkCreateCudaFunctionNV
public static int vkCreateCudaFunctionNV(org.lwjgl.vulkan.VkDevice device, VkCudaFunctionCreateInfoNV pCreateInfo, @Nullable VkAllocationCallbacks pAllocator, LongBuffer pFunction)
Creates a new CUDA function object.
C Specification
To create a CUDA function, call:
VkResult vkCreateCudaFunctionNV(
    VkDevice device,
    const VkCudaFunctionCreateInfoNV* pCreateInfo,
    const VkAllocationCallbacks* pAllocator,
    VkCudaFunctionNV* pFunction);
Valid Usage (Implicit)
- device must be a valid VkDevice handle
- pCreateInfo must be a valid pointer to a valid VkCudaFunctionCreateInfoNV structure
- If pAllocator is not NULL, pAllocator must be a valid pointer to a valid VkAllocationCallbacks structure
- pFunction must be a valid pointer to a VkCudaFunctionNV handle
Return Codes
- On success, this command returns
- On failure, this command returns
Parameters:
- device - the logical device that creates the CUDA function.
- pCreateInfo - a pointer to a VkCudaFunctionCreateInfoNV structure.
- pAllocator - controls host memory allocation as described in the Memory Allocation chapter.
- pFunction - a pointer to a VkCudaFunctionNV handle in which the resulting CUDA function object is returned.
-
nvkDestroyCudaModuleNV
public static void nvkDestroyCudaModuleNV(org.lwjgl.vulkan.VkDevice device, long module, long pAllocator)
Unsafe version of: DestroyCudaModuleNV
-
vkDestroyCudaModuleNV
public static void vkDestroyCudaModuleNV(org.lwjgl.vulkan.VkDevice device, long module, @Nullable VkAllocationCallbacks pAllocator)
Destroy a CUDA module.
C Specification
To destroy a CUDA module, call:
void vkDestroyCudaModuleNV(
    VkDevice device,
    VkCudaModuleNV module,
    const VkAllocationCallbacks* pAllocator);
Valid Usage (Implicit)
- device must be a valid VkDevice handle
- module must be a valid VkCudaModuleNV handle
- If pAllocator is not NULL, pAllocator must be a valid pointer to a valid VkAllocationCallbacks structure
- module must have been created, allocated, or retrieved from device
Parameters:
- device - the logical device that destroys the CUDA module.
- module - the handle of the CUDA module to destroy.
- pAllocator - controls host memory allocation as described in the Memory Allocation chapter.
-
nvkDestroyCudaFunctionNV
public static void nvkDestroyCudaFunctionNV(org.lwjgl.vulkan.VkDevice device, long function, long pAllocator)
Unsafe version of: DestroyCudaFunctionNV
-
vkDestroyCudaFunctionNV
public static void vkDestroyCudaFunctionNV(org.lwjgl.vulkan.VkDevice device, long function, @Nullable VkAllocationCallbacks pAllocator)
Destroy a CUDA function.
C Specification
To destroy a CUDA function handle, call:
void vkDestroyCudaFunctionNV(
    VkDevice device,
    VkCudaFunctionNV function,
    const VkAllocationCallbacks* pAllocator);
Valid Usage (Implicit)
- device must be a valid VkDevice handle
- function must be a valid VkCudaFunctionNV handle
- If pAllocator is not NULL, pAllocator must be a valid pointer to a valid VkAllocationCallbacks structure
- function must have been created, allocated, or retrieved from device
Parameters:
- device - the logical device that destroys the CUDA function.
- function - the handle of the CUDA function to destroy.
- pAllocator - controls host memory allocation as described in the Memory Allocation chapter.
-
nvkCmdCudaLaunchKernelNV
public static void nvkCmdCudaLaunchKernelNV(org.lwjgl.vulkan.VkCommandBuffer commandBuffer, long pLaunchInfo)
Unsafe version of: CmdCudaLaunchKernelNV
-
vkCmdCudaLaunchKernelNV
public static void vkCmdCudaLaunchKernelNV(org.lwjgl.vulkan.VkCommandBuffer commandBuffer, VkCudaLaunchInfoNV pLaunchInfo)
Dispatch compute work items.
C Specification
To record a CUDA kernel launch, call:
void vkCmdCudaLaunchKernelNV(
    VkCommandBuffer commandBuffer,
    const VkCudaLaunchInfoNV* pLaunchInfo);
Description
When the command is executed, a global workgroup consisting of gridDimX × gridDimY × gridDimZ local workgroups is assembled.
Valid Usage (Implicit)
- commandBuffer must be a valid VkCommandBuffer handle
- pLaunchInfo must be a valid pointer to a valid VkCudaLaunchInfoNV structure
- commandBuffer must be in the recording state
- The VkCommandPool that commandBuffer was allocated from must support graphics, or compute operations
- This command must only be called outside of a video coding scope
Host Synchronization
- Host access to the VkCommandPool that commandBuffer was allocated from must be externally synchronized
Command Properties
Command Buffer Levels | Render Pass Scope | Video Coding Scope | Supported Queue Types | Command Type
Primary, Secondary | Both | Outside | Graphics, Compute | Action
Parameters:
- commandBuffer - the command buffer into which the command will be recorded.
- pLaunchInfo - a pointer to a VkCudaLaunchInfoNV structure in which the grid (similar to workgroup) dimension, function handle and related arguments are defined.
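As a worked example of the grid arithmetic in the description of CmdCudaLaunchKernelNV, the sketch below computes grid dimensions for a launch and the resulting number of local workgroups. It assumes one thread per work item and a block size chosen by the application, which the text above does not specify; `GridDims` and its methods are hypothetical helpers, not part of LWJGL.

```java
/** Sketch of grid-size arithmetic for a kernel launch. Hypothetical helper, not an LWJGL class. */
final class GridDims {

    /**
     * Number of blocks needed along one axis so that blocks of blockDim threads
     * cover all items (ceiling division), assuming one thread per item.
     */
    static int blocksFor(int items, int blockDim) {
        if (items < 0 || blockDim <= 0) {
            throw new IllegalArgumentException("items must be >= 0 and blockDim must be > 0");
        }
        // Widen to long so that items + blockDim - 1 cannot overflow.
        return (int) (((long) items + blockDim - 1) / blockDim);
    }

    /** Total number of local workgroups assembled: gridDimX × gridDimY × gridDimZ. */
    static long totalBlocks(int gridDimX, int gridDimY, int gridDimZ) {
        return (long) gridDimX * gridDimY * gridDimZ;
    }
}
```

Because the last block along an axis may extend past the end of the data when the item count is not a multiple of the block size, a kernel launched this way normally needs its own bounds check.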
-
vkCreateCudaModuleNV
public static int vkCreateCudaModuleNV(org.lwjgl.vulkan.VkDevice device, VkCudaModuleCreateInfoNV pCreateInfo, @Nullable VkAllocationCallbacks pAllocator, long[] pModule)
Array version of: CreateCudaModuleNV
-
vkCreateCudaFunctionNV
public static int vkCreateCudaFunctionNV(org.lwjgl.vulkan.VkDevice device, VkCudaFunctionCreateInfoNV pCreateInfo, @Nullable VkAllocationCallbacks pAllocator, long[] pFunction)
Array version of: CreateCudaFunctionNV
-