Ray Tracing Acceleration Structure Builder Extension#

API#

Ray Tracing Acceleration Structure Builder#

The Ray Tracing Acceleration Structure Builder extension provides the functionality to build ray tracing acceleration structures (RTAS) for 3D scenes on the host for use with GPU devices.

It is the user’s responsibility to manage the acceleration structure buffer and scratch buffer resources. The required sizes may be queried via zeRTASBuilderGetBuildPropertiesExp. Once built, an acceleration structure is a self-contained entity; any input resources may be released after successful construction. Note that acceleration structures are non-copyable resources.

Scene Data#

To build an acceleration structure, first set up a scene that consists of one or more geometry infos.

The following example creates a ze_rtas_builder_triangles_geometry_info_exp_t to specify a triangle mesh:

std::vector<ze_rtas_triangle_indices_uint32_exp_t> triangleIndexBuffer;
std::vector<ze_rtas_float3_exp_t> triangleVertexBuffer;

// Populate vertex and index buffers
{
    // ...
}

ze_rtas_builder_triangles_geometry_info_exp_t mesh;
memset(&mesh, 0, sizeof(mesh));

mesh.geometryType = ZE_RTAS_BUILDER_GEOMETRY_TYPE_EXP_TRIANGLES;
mesh.geometryFlags = 0;
mesh.geometryMask = 0xFF;

mesh.triangleFormat = ZE_RTAS_BUILDER_INPUT_DATA_FORMAT_EXP_TRIANGLE_INDICES_UINT32;
mesh.triangleCount = triangleIndexBuffer.size();
mesh.triangleStride = sizeof(ze_rtas_triangle_indices_uint32_exp_t);
mesh.pTriangleBuffer = triangleIndexBuffer.data();

mesh.vertexFormat = ZE_RTAS_BUILDER_INPUT_DATA_FORMAT_EXP_FLOAT3;
mesh.vertexCount = triangleVertexBuffer.size();
mesh.vertexStride = sizeof(ze_rtas_float3_exp_t);
mesh.pVertexBuffer = triangleVertexBuffer.data();

The geometry info specifies the data formats of the triangle index and vertex buffers, including their strides, along with a pointer to the first element of each buffer. Geometry is considered opaque by default, which enables a fast mode where traversal does not return to the caller of ray tracing for each triangle or quad hit. To have an any-hit shader process each triangle or quad hit, the geometryFlags member of the geometry info must include the ZE_RTAS_BUILDER_GEOMETRY_EXP_FLAG_NON_OPAQUE flag.
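The buffer population step is elided in the example above. As a minimal sketch, the following fills both buffers with a unit quad made of two triangles. So that the sketch compiles without the Level Zero headers, `Float3` and `TriangleIndices` are hypothetical stand-ins assumed to mirror the layout of ze_rtas_float3_exp_t (x, y, z) and ze_rtas_triangle_indices_uint32_exp_t (v0, v1, v2); `QuadMesh` and `makeUnitQuad` are likewise illustrative names, not part of the API.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins, assumed to match the Level Zero struct layouts.
struct Float3 { float x, y, z; };
struct TriangleIndices { uint32_t v0, v1, v2; };

// The strides in the geometry info are taken from sizeof, so the stand-ins
// must be tightly packed for the example to be meaningful.
static_assert(sizeof(Float3) == 3 * sizeof(float), "unexpected padding");
static_assert(sizeof(TriangleIndices) == 3 * sizeof(uint32_t), "unexpected padding");

struct QuadMesh {
    std::vector<Float3> vertices;
    std::vector<TriangleIndices> triangles;
};

// Builds a unit quad in the z = 0 plane from two triangles that share
// the diagonal between vertex 0 and vertex 2.
inline QuadMesh makeUnitQuad()
{
    QuadMesh mesh;
    mesh.vertices = {
        {0.0f, 0.0f, 0.0f},
        {1.0f, 0.0f, 0.0f},
        {1.0f, 1.0f, 0.0f},
        {0.0f, 1.0f, 0.0f},
    };
    mesh.triangles = {
        {0, 1, 2},
        {0, 2, 3},
    };

    // Every index must refer to an existing vertex.
    for (const TriangleIndices& t : mesh.triangles) {
        assert(t.v0 < mesh.vertices.size());
        assert(t.v1 < mesh.vertices.size());
        assert(t.v2 < mesh.vertices.size());
    }
    return mesh;
}
```

In real code the two vectors would hold the Level Zero types directly, and their `data()` and `size()` would feed pVertexBuffer, vertexCount, pTriangleBuffer, and triangleCount as in the example above.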

To refer to multiple geometries that make a scene, pointers to geometry info structures can be put into an array as follows:

std::vector<ze_rtas_builder_geometry_info_exp_t*> geometries;
geometries.push_back((ze_rtas_builder_geometry_info_exp_t*)&mesh0);
geometries.push_back((ze_rtas_builder_geometry_info_exp_t*)&mesh1);
...

This completes the definition of the geometry for the scene for which to construct the acceleration structure.
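When a scene holds many meshes in a container, the pointer array can be filled in a loop. The sketch below assumes, as the casts above suggest, that every geometry info struct begins with the same geometryType member as the generic struct. `GeometryInfo`, `TrianglesInfo`, and `collectGeometries` are hypothetical stand-ins used so that the sketch compiles without the Level Zero headers.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for the generic and the triangle geometry info.
struct GeometryInfo { uint8_t geometryType; };
struct TrianglesInfo {
    uint8_t geometryType;
    uint32_t triangleCount;
};

// Collects one generic pointer per mesh, mirroring the casts used above.
// The returned pointers refer to elements of `meshes`, so `meshes` must
// not be resized or destroyed while the pointer array is still in use.
inline std::vector<GeometryInfo*> collectGeometries(std::vector<TrianglesInfo>& meshes)
{
    std::vector<GeometryInfo*> geometries;
    geometries.reserve(meshes.size());
    for (TrianglesInfo& mesh : meshes) {
        geometries.push_back(reinterpret_cast<GeometryInfo*>(&mesh));
    }
    assert(geometries.size() == meshes.size());
    return geometries;
}
```

The point of the sketch is the lifetime constraint: a `std::vector` may relocate its elements when it grows, which would leave the collected pointers dangling.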

Device Properties#

The next step is to query the target device for acceleration structure properties.

ze_rtas_device_exp_properties_t rtasDeviceProps;
rtasDeviceProps.stype = ZE_STRUCTURE_TYPE_RTAS_DEVICE_EXP_PROPERTIES;
rtasDeviceProps.pNext = nullptr;

ze_device_properties_t deviceProps;
deviceProps.stype = ZE_STRUCTURE_TYPE_DEVICE_PROPERTIES;
deviceProps.pNext = &rtasDeviceProps;

zeDeviceGetProperties(hDevice, &deviceProps);

The device properties contain information (a device-specific ray tracing acceleration structure format) that is required to complete an RTAS build operation.

Acceleration Structure Builder#

With the scene data prepared and relevant device properties known, create a ray tracing acceleration structure builder object and query for the necessary build properties.

ze_rtas_builder_exp_desc_t desc;
desc.stype = ZE_STRUCTURE_TYPE_RTAS_BUILDER_EXP_DESC;
desc.pNext = nullptr;
desc.builderVersion = ZE_RTAS_BUILDER_EXP_VERSION_CURRENT;

ze_rtas_builder_exp_handle_t hBuilder = nullptr;
ze_result_t result = zeRTASBuilderCreateExp(hDriver, &desc, &hBuilder);
assert(result == ZE_RESULT_SUCCESS);

ze_rtas_builder_exp_properties_t builderProps;
builderProps.stype = ZE_STRUCTURE_TYPE_RTAS_BUILDER_EXP_PROPERTIES;
builderProps.pNext = nullptr;

ze_rtas_builder_build_op_exp_desc_t buildOpDesc;
buildOpDesc.stype = ZE_STRUCTURE_TYPE_RTAS_BUILDER_BUILD_OP_EXP_DESC;
buildOpDesc.pNext = nullptr;
buildOpDesc.rtasFormat = rtasDeviceProps.rtasFormat;
buildOpDesc.buildQuality = ZE_RTAS_BUILDER_BUILD_QUALITY_HINT_EXP_MEDIUM;
buildOpDesc.buildFlags = 0;
buildOpDesc.ppGeometries = geometries.data();
buildOpDesc.numGeometries = geometries.size();

result = zeRTASBuilderGetBuildPropertiesExp(hBuilder, &buildOpDesc, &builderProps);
assert(result == ZE_RESULT_SUCCESS);

Note that the parameters of the build operation descriptor, such as the acceleration structure build quality, affect the reported buffer size requirements.

An application may create and use a single RTAS builder object, as multiple concurrent build operations may be performed with a single such object.

Buffers#

With the builder properties known, the resources for the acceleration structure may be allocated.

Scratch Buffer#

A system memory scratch buffer is required to perform the build operation. It is used by the implementation for intermediate storage.

void* pScratchBuffer = malloc(builderProps.scratchBufferSizeBytes);
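Because the scratch buffer is plain host memory, its lifetime can be tied to a scope so that it is released even on an early return. The following is a minimal sketch of that idea using `std::unique_ptr` with a deleter that calls `std::free`; the names `FreeDeleter`, `ScratchBuffer`, and `allocateScratch` are hypothetical and not part of the API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>

// Deleter that releases memory obtained from std::malloc.
struct FreeDeleter {
    void operator()(void* p) const noexcept { std::free(p); }
};

// Owning handle for a malloc-allocated scratch buffer.
using ScratchBuffer = std::unique_ptr<void, FreeDeleter>;

// Allocates sizeBytes of host memory. The returned handle is empty if the
// allocation fails, so callers should test it before starting a build.
inline ScratchBuffer allocateScratch(std::size_t sizeBytes)
{
    assert(sizeBytes > 0);
    return ScratchBuffer(std::malloc(sizeBytes));
}
```

With this helper, `allocateScratch(builderProps.scratchBufferSizeBytes)` would replace the bare `malloc` call, `get()` would supply the pointer passed to the build, and no explicit `free` would be needed.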

Acceleration Structure Buffer#

The acceleration structure buffer is where the ray tracing acceleration structure is written to. It must be accessible on the host as well as the device; consequently, it must be allocated as a USM resource. This example uses the worst-case sizing.

ze_raytracing_mem_alloc_ext_desc_t rtasMemAllocDesc;
rtasMemAllocDesc.stype = ZE_STRUCTURE_TYPE_RAYTRACING_MEM_ALLOC_EXT_DESC;
rtasMemAllocDesc.pNext = nullptr;
rtasMemAllocDesc.flags = 0;

ze_device_mem_alloc_desc_t deviceMemAllocDesc;
deviceMemAllocDesc.stype = ZE_STRUCTURE_TYPE_DEVICE_MEM_ALLOC_DESC;
deviceMemAllocDesc.pNext = &rtasMemAllocDesc;
deviceMemAllocDesc.flags = ZE_DEVICE_MEM_ALLOC_FLAG_BIAS_CACHED;
deviceMemAllocDesc.ordinal = 0;

ze_host_mem_alloc_desc_t hostMemAllocDesc;
hostMemAllocDesc.stype = ZE_STRUCTURE_TYPE_HOST_MEM_ALLOC_DESC;
hostMemAllocDesc.pNext = nullptr;
hostMemAllocDesc.flags = ZE_HOST_MEM_ALLOC_FLAG_BIAS_CACHED;

void* pRtasBuffer = nullptr;
result = zeMemAllocShared(hContext, &deviceMemAllocDesc, &hostMemAllocDesc, builderProps.rtasBufferSizeBytesMaxRequired, rtasDeviceProps.rtasBufferAlignment, hDevice, &pRtasBuffer);
assert(result == ZE_RESULT_SUCCESS);

Executing an Acceleration Structure Build#

Single-Threaded Build#

A single-threaded acceleration structure build on the host is initiated using zeRTASBuilderBuildExp.

result = zeRTASBuilderBuildExp(hBuilder, &buildOpDesc, pScratchBuffer, builderProps.scratchBufferSizeBytes, pRtasBuffer, builderProps.rtasBufferSizeBytesMaxRequired, nullptr, nullptr, nullptr, nullptr);
assert(result == ZE_RESULT_SUCCESS);

When the build completes successfully, the acceleration structure buffer is ready for use by the ray tracing API.

Parallel Build#

To speed up the build operation using multiple worker threads, a parallel operation object can be associated with the build operation and joined by application-provided worker threads, as in the following example.

Note: The following example uses oneTBB to dispatch worker threads, but this is not a requirement.

ze_rtas_parallel_operation_exp_handle_t hParallelOperation = nullptr;
result = zeRTASParallelOperationCreateExp(hDriver, &hParallelOperation);
assert(result == ZE_RESULT_SUCCESS);

// Initiate the acceleration structure build operation with a handle
// of a parallel operation object. This causes the parallel operation to be
// bound to the build operation and the function returns immediately without
// building any acceleration structure yet.
result = zeRTASBuilderBuildExp(hBuilder, &buildOpDesc, pScratchBuffer, builderProps.scratchBufferSizeBytes, pRtasBuffer, builderProps.rtasBufferSizeBytesMaxRequired, hParallelOperation, nullptr, nullptr, nullptr);
assert(result == ZE_RESULT_EXP_RTAS_BUILD_DEFERRED);

// Once the parallel operation is bound to the build operation the number
// of worker threads to join the parallel operation can be queried.
ze_rtas_parallel_operation_exp_properties_t parallelOpProps;
parallelOpProps.stype = ZE_STRUCTURE_TYPE_RTAS_PARALLEL_OPERATION_EXP_PROPERTIES;
parallelOpProps.pNext = nullptr;

result = zeRTASParallelOperationGetPropertiesExp(hParallelOperation, &parallelOpProps);
assert(result == ZE_RESULT_SUCCESS);

// Now worker threads can join the build operation to perform the actual build
// of the acceleration structure.
// The unsigned literals keep all three index arguments the same type as maxConcurrency.
tbb::parallel_for(0u, parallelOpProps.maxConcurrency, 1u, [&](uint32_t i) {
    ze_result_t buildResult = zeRTASParallelOperationJoinExp(hParallelOperation);
    assert(buildResult == ZE_RESULT_SUCCESS);
});

// With the parallel operation complete, the parallel operation object can be released.
result = zeRTASParallelOperationDestroyExp(hParallelOperation);
assert(result == ZE_RESULT_SUCCESS);

Note that the number of worker threads to be used can only be queried from the parallel operation object after it is bound to the build operation by the call to zeRTASBuilderBuildExp.
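Since oneTBB is not required, the same join pattern can be sketched with `std::thread`. In the sketch below, `joinWithThreads` is a hypothetical helper and `joinFn` is an assumed callable that wraps zeRTASParallelOperationJoinExp and returns true on success; taking a callable keeps the sketch compilable without the Level Zero headers.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Runs joinFn once on each of workerCount threads and waits for all of
// them to finish. Returns the number of joins that reported failure.
template <typename JoinFn>
uint32_t joinWithThreads(uint32_t workerCount, JoinFn joinFn)
{
    assert(workerCount > 0);

    std::atomic<uint32_t> failures{0};
    std::vector<std::thread> workers;
    workers.reserve(workerCount);

    for (uint32_t i = 0; i < workerCount; ++i) {
        workers.emplace_back([&failures, &joinFn] {
            if (!joinFn()) {
                failures.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }

    // All workers must return before the parallel operation is destroyed.
    for (std::thread& worker : workers) {
        worker.join();
    }
    return failures.load();
}
```

An application would pass parallelOpProps.maxConcurrency as the worker count and a lambda that calls zeRTASParallelOperationJoinExp(hParallelOperation) and compares the result with ZE_RESULT_SUCCESS. A thread pool would avoid the cost of creating threads for every build; plain threads are used here only to keep the sketch short.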

Conservative Acceleration Structure Buffer Size#

Sizing the acceleration structure buffer using the rtasBufferSizeBytesMaxRequired member of ze_rtas_builder_exp_properties_t guarantees that the build operation will not fail due to an out-of-memory condition. However, this size represents the worst-case memory requirement and is larger than is typically needed. To reduce memory usage, the application may instead attempt a build with an acceleration structure buffer sized to the rtasBufferSizeBytesExpected member of ze_rtas_builder_exp_properties_t. With the expected size, the build operation can fail with ZE_RESULT_EXP_RTAS_BUILD_RETRY. If this occurs, the application may reallocate the acceleration structure buffer using the updated size estimate returned by zeRTASBuilderBuildExp.

ze_result_t result;

void* pRtasBuffer = nullptr;
size_t rtasBufferSizeBytes = builderProps.rtasBufferSizeBytesExpected;

while (true)
{
    pRtasBuffer = allocate_accel_buffer(rtasBufferSizeBytes);

    result = zeRTASBuilderBuildExp(hBuilder, &buildOpDesc, pScratchBuffer, builderProps.scratchBufferSizeBytes, pRtasBuffer, rtasBufferSizeBytes, nullptr, nullptr, nullptr, &rtasBufferSizeBytes);

    if (result == ZE_RESULT_SUCCESS)
    {
        break;
    }

    assert(result == ZE_RESULT_EXP_RTAS_BUILD_RETRY);

    free_accel_buffer(pRtasBuffer);
}

The loop starts with the expected acceleration structure buffer size, for which the build will most likely succeed. If the build runs out of memory, ZE_RESULT_EXP_RTAS_BUILD_RETRY is returned and the build is retried with a larger acceleration structure buffer.

The example above passes a pointer to the rtasBufferSizeBytes variable as the last parameter of zeRTASBuilderBuildExp. If the build operation fails, the builder updates this variable with a larger acceleration structure buffer size estimate to be used in the next attempt. Alternatively, the application could increase the acceleration structure buffer size by some percentage for the next attempt, which could fail again, or simply use the maximum size from the builder properties for the second attempt.
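The sizing strategies described above can be combined. The following is a minimal sketch, not the specification's method: it follows the builder's estimate while that estimate grows and stays below the worst case, and otherwise falls back to the worst-case size. `BuildStatus`, `BuildAttempt`, and `buildWithFallback` are hypothetical names; the `attempt` callable is assumed to allocate a buffer of the given size, call zeRTASBuilderBuildExp, write the updated estimate on a retry, and free the buffer when the attempt fails.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Hypothetical outcome of one build attempt.
enum class BuildStatus { Success, Retry };

// Hypothetical callable standing in for one allocate-and-build attempt.
// On Retry it may write a larger size estimate to *updatedSize.
using BuildAttempt =
    std::function<BuildStatus(std::size_t bufferSize, std::size_t* updatedSize)>;

// Returns the buffer size that succeeded, or 0 if even the worst-case
// size was rejected (which the caller should treat as an error).
inline std::size_t buildWithFallback(const BuildAttempt& attempt,
                                     std::size_t expectedSize,
                                     std::size_t maxSize)
{
    assert(expectedSize > 0 && expectedSize <= maxSize);

    std::size_t size = expectedSize;
    for (;;) {
        std::size_t updated = size;
        if (attempt(size, &updated) == BuildStatus::Success) {
            return size;
        }
        if (size >= maxSize) {
            return 0;
        }
        // Follow the estimate only if it is a real increase below the
        // worst case; otherwise jump straight to the worst-case size.
        size = (updated > size && updated < maxSize) ? updated : maxSize;
    }
}
```

The size strictly increases on every retry and is capped at the worst case, so the loop always terminates.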

Cleaning Up#

Once the acceleration structure has been built, any resources associated with the build may be released. Any parallel operation objects and builder objects should also be destroyed.

// Free the scratch buffer
free(pScratchBuffer);

// Destroy the builder object
zeRTASBuilderDestroyExp(hBuilder);

// Use the acceleration structure buffer with the ray tracing API
{
    // ...
}

// Release the acceleration structure buffer once it is no longer needed
zeMemFree(hContext, pRtasBuffer);
pRtasBuffer = nullptr;