DirectX 12 - Root descriptor not working properly

Posted on 2025-01-12 01:44:33

In my test application, I passed the model, view, and projection matrices to the shaders as 32-bit root constants. I now want to switch to root descriptors in order to reduce the size of my root signature. I want to pass two constant buffers to the shader: the first contains the model matrix (one 4x4 matrix), and the second contains the view and projection matrices (two 4x4 matrices). The view and projection matrices work absolutely fine through a root descriptor. As soon as I switch the model matrix from 32-bit constants to a root descriptor, however, the scene no longer renders, although the procedure is exactly the same for both constant buffers. DirectX reports no errors, not even with the debug layer enabled.

Root Parameters Code

// Root Parameter: "CB_ModelMatrix"
rootParameters[0].ParameterType = D3D12_ROOT_PARAMETER_TYPE_CBV;
rootParameters[0].ShaderVisibility = D3D12_SHADER_VISIBILITY_VERTEX;
rootParameters[0].Descriptor.ShaderRegister = 0;
rootParameters[0].Descriptor.RegisterSpace = 0;

// Root Parameter: "CB_ViewProjectionMatrices"
rootParameters[1].ParameterType = D3D12_ROOT_PARAMETER_TYPE_CBV;
rootParameters[1].ShaderVisibility = D3D12_SHADER_VISIBILITY_VERTEX;
rootParameters[1].Descriptor.ShaderRegister = 1;
rootParameters[1].Descriptor.RegisterSpace = 0;

Resource Creation Code

// model matrix resource

D3D12_HEAP_PROPERTIES heapProperties = {};
heapProperties.Type = D3D12_HEAP_TYPE_UPLOAD;
heapProperties.CPUPageProperty = D3D12_CPU_PAGE_PROPERTY_UNKNOWN;
heapProperties.MemoryPoolPreference = D3D12_MEMORY_POOL_UNKNOWN;
heapProperties.CreationNodeMask = 1;
heapProperties.VisibleNodeMask = 1;

D3D12_RESOURCE_DESC resourceDescription = {};
resourceDescription.Dimension = D3D12_RESOURCE_DIMENSION_BUFFER;
resourceDescription.Alignment = 0;
resourceDescription.Width = (sizeof(t_ConstantBufferData_ModelMatrix) + 255) & ~255;
resourceDescription.Height = 1;
resourceDescription.DepthOrArraySize = 1;
resourceDescription.MipLevels = 1;
resourceDescription.Format = DXGI_FORMAT_UNKNOWN;
resourceDescription.SampleDesc.Count = 1;
resourceDescription.SampleDesc.Quality = 0;
resourceDescription.Layout = D3D12_TEXTURE_LAYOUT_ROW_MAJOR;
resourceDescription.Flags = D3D12_RESOURCE_FLAG_NONE;

ThrowIfFailed(g_GraphicsDevice->CreateCommittedResource(&heapProperties, D3D12_HEAP_FLAG_NONE,
  &resourceDescription, D3D12_RESOURCE_STATE_GENERIC_READ, nullptr, IID_PPV_ARGS(&g_ConstantBuffer_ModelMatrix)));

ThrowIfFailed(g_ConstantBuffer_ModelMatrix->Map(0, nullptr, reinterpret_cast<void**>(&g_ConstantBufferPointer_ModelMatrix)));

// view and projection matrices resource

D3D12_HEAP_PROPERTIES heapProperties = {};
heapProperties.Type = D3D12_HEAP_TYPE_UPLOAD;
heapProperties.CPUPageProperty = D3D12_CPU_PAGE_PROPERTY_UNKNOWN;
heapProperties.MemoryPoolPreference = D3D12_MEMORY_POOL_UNKNOWN;
heapProperties.CreationNodeMask = 1;
heapProperties.VisibleNodeMask = 1;

D3D12_RESOURCE_DESC resourceDescription = {};
resourceDescription.Dimension = D3D12_RESOURCE_DIMENSION_BUFFER;
resourceDescription.Alignment = 0;
resourceDescription.Width = (sizeof(t_ConstantBufferData_ViewProjectionMatrices) + 255) & ~255;
resourceDescription.Height = 1;
resourceDescription.DepthOrArraySize = 1;
resourceDescription.MipLevels = 1;
resourceDescription.Format = DXGI_FORMAT_UNKNOWN;
resourceDescription.SampleDesc.Count = 1;
resourceDescription.SampleDesc.Quality = 0;
resourceDescription.Layout = D3D12_TEXTURE_LAYOUT_ROW_MAJOR;
resourceDescription.Flags = D3D12_RESOURCE_FLAG_NONE;

ThrowIfFailed(g_GraphicsDevice->CreateCommittedResource(&heapProperties, D3D12_HEAP_FLAG_NONE,
  &resourceDescription, D3D12_RESOURCE_STATE_GENERIC_READ, nullptr, IID_PPV_ARGS(&g_ConstantBuffer_ViewProjectionMatrices)));

ThrowIfFailed(g_ConstantBuffer_ViewProjectionMatrices->Map(0, nullptr, reinterpret_cast<void**>(&g_ConstantBufferPointer_ViewProjectionMatrices)));

Resource Updating (Model Matrix)

t_ConstantBufferData_ModelMatrix CB_ModelMatrix = {};

// ...

std::memcpy(g_ConstantBufferPointer_ModelMatrix, &CB_ModelMatrix, sizeof(CB_ModelMatrix));
g_CommandList->SetGraphicsRootConstantBufferView(0, g_ConstantBuffer_ModelMatrix->GetGPUVirtualAddress());

Resource Updating (View and Projection Matrices)

t_ConstantBufferData_ViewProjectionMatrices CB_ViewProjectionMatrices = {};

// ...

std::memcpy(g_ConstantBufferPointer_ViewProjectionMatrices, &CB_ViewProjectionMatrices, sizeof(CB_ViewProjectionMatrices));
g_CommandList->SetGraphicsRootConstantBufferView(1, g_ConstantBuffer_ViewProjectionMatrices->GetGPUVirtualAddress());

Constant Buffers in the Vertex Shader

struct t_ConstantBufferData_ModelMatrix
{
  float4x4 ModelMatrix;
};

struct t_ConstantBufferData_ViewProjectionMatrices
{
  float4x4 ViewMatrix;
  float4x4 ProjectionMatrix;
};

ConstantBuffer<t_ConstantBufferData_ModelMatrix> CB_ModelMatrix : register(b0, space0);
ConstantBuffer<t_ConstantBufferData_ViewProjectionMatrices> CB_ViewProjectionMatrices : register(b1, space0);

The resources are created and mapped at the beginning of the program and are unmapped at the end of it. The view and projection matrices resource is updated once per frame. The model matrix resource is updated multiple times per frame since I have multiple game objects with different transformations.

I can't figure out why a root descriptor works for the view and projection matrices but won't work for the model matrix. Maybe I overlooked a basic mechanism of root descriptors? I would really appreciate it if someone could tell me what I missed.

A side question: is it safe to keep the resources mapped for the whole lifetime of the application? I assume this performs better than mapping and unmapping the resources every time they are updated.

EDIT:

The scene consists of 27 cubes that are rendered in a 3x3x3 grid at the center of the scene. Some of those cubes are rendered behind the camera. After exploring the scene more carefully with the camera and with the help of graphics debugging tools, I realized that only the last cube is rendered. To render the grid of cubes, I use the following loop in the render function:

for (unsigned char t = 0; t < 27; t++)
{
  t_ConstantBufferData_ModelMatrix CB_ModelMatrix = {};
  CB_ModelMatrix.ModelMatrix = createModelMatrix(g_Transforms[t]);

  std::memcpy(g_ConstantBufferPointer_ModelMatrix, &CB_ModelMatrix, sizeof(CB_ModelMatrix));
  
  g_CommandList->SetGraphicsRootConstantBufferView(0, g_ConstantBuffer_ModelMatrix->GetGPUVirtualAddress());
  g_CommandList->DrawIndexedInstanced(36, 1, 0, 0, 0);
}

I create a model matrix from the transformation of the current cube and copy the matrix to the constant buffer. After that, the cube is drawn. The model matrix is created correctly and the memory of the constant buffer changes as intended. However, only the last cube is drawn.

My assumption is that the draw commands in the command list store only a pointer to the constant buffer, and that by the time the command list is executed, only the model matrix of the last cube is present in the buffer. Could this be the cause of the issue? If so, what could be done to resolve it?
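The assumption above can be illustrated with a small CPU-only model. This is a sketch, not D3D12 code: `RecordedDraw`, `Execute`, `RecordWithSharedSlot`, and `RecordWithSlotPerDraw` are hypothetical names invented for illustration, and a plain `float` stands in for the model matrix. It only mimics the idea that a recorded draw keeps an address and that the data behind it is read later, when the list executes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// A "recorded draw" keeps only the address of its constants, in the same
// spirit as a root CBV that records a GPU virtual address.
struct RecordedDraw {
    const float* constants;  // dereferenced later, at "execution" time
};

// Stand-in for the GPU executing the command list after recording has ended.
inline std::vector<float> Execute(const std::vector<RecordedDraw>& draws) {
    std::vector<float> seen;
    for (const RecordedDraw& d : draws) {
        assert(d.constants != nullptr);
        seen.push_back(*d.constants);
    }
    return seen;
}

// Pattern from the question: every draw points at the same memory, which is
// overwritten again before anything executes.
inline std::vector<float> RecordWithSharedSlot(const std::vector<float>& values) {
    float slot = 0.0f;
    std::vector<RecordedDraw> draws;
    for (float v : values) {
        std::memcpy(&slot, &v, sizeof(v));
        draws.push_back({&slot});
    }
    return Execute(draws);
}

// Each draw gets its own slot, so earlier values survive until execution.
inline std::vector<float> RecordWithSlotPerDraw(const std::vector<float>& values) {
    std::vector<float> slots(values.size());  // sized up front: addresses stay stable
    std::vector<RecordedDraw> draws;
    for (std::size_t i = 0; i < values.size(); ++i) {
        std::memcpy(&slots[i], &values[i], sizeof(float));
        draws.push_back({&slots[i]});
    }
    return Execute(draws);
}
```

With three input values, the shared-slot version makes every draw see the last value, which matches the "only the last cube is drawn" symptom; the slot-per-draw version sees each value once.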

酷到爆炸 2025-01-19 01:44:33

The simple solution would be to have each object keep its own constant buffer. Bear in mind that you would need NUMBER_OF_OBJECTS * FRAME_COUNT cbuffers, since otherwise you could overwrite an old cbuffer that is still in use by the GPU. (That applies only if you have more than one frame in flight, i.e. you are not waiting at the end of each frame for the GPU to finish.)

A better solution would be to allocate from a large global upload buffer each time you need constants. There are two strategies here: a linear allocator or a ring allocator. I am going to explain the linear one here, but you can see the link at the end of the post for both implementations.
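As a rough sketch of the NUMBER_OF_OBJECTS * FRAME_COUNT bookkeeping, the slots could also live in one upload buffer instead of separate resources; that layout is an assumption of this example, not something stated above. `AlignUpTo`, `CBufferSlotOffset`, and `kCBufferAlignment` are hypothetical names. The constant 256 is the value of D3D12_CONSTANT_BUFFER_DATA_PLACEMENT_ALIGNMENT, written out so the snippet compiles without the D3D12 headers.

```cpp
#include <cassert>
#include <cstddef>

// Same value as D3D12_CONSTANT_BUFFER_DATA_PLACEMENT_ALIGNMENT.
constexpr std::size_t kCBufferAlignment = 256;

// Rounds value up to the next multiple of a power-of-two alignment.
constexpr std::size_t AlignUpTo(std::size_t value, std::size_t alignment) {
    return (value + alignment - 1) & ~(alignment - 1);
}

// Byte offset of one object's constants for one in-flight frame, assuming
// the slots are laid out frame by frame: [frame 0: obj 0..N-1][frame 1: ...].
inline std::size_t CBufferSlotOffset(std::size_t frame_index,
                                     std::size_t object_index,
                                     std::size_t object_count,
                                     std::size_t cbuffer_size) {
    assert(object_index < object_count);
    const std::size_t stride = AlignUpTo(cbuffer_size, kCBufferAlignment);
    return (frame_index * object_count + object_index) * stride;
}
```

The offset would be added both to the mapped CPU pointer (for the `memcpy`) and to the buffer's GPU virtual address (for the root CBV), so that no two objects and no two in-flight frames share a slot.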

It could look something like this:

template<typename CBuffer>
inline constexpr uint32_t GetCBufferSize()
{
    // Round sizeof(CBuffer) up to the next multiple of the constant buffer placement alignment.
    return static_cast<uint32_t>((sizeof(CBuffer) + (D3D12_CONSTANT_BUFFER_DATA_PLACEMENT_ALIGNMENT - 1)) & ~(D3D12_CONSTANT_BUFFER_DATA_PLACEMENT_ALIGNMENT - 1));
}

// A sub-range of the upload buffer handed out for one set of constants.
struct Allocation
{
    ID3D12Resource* buffer = nullptr;
    void* cpu_address = nullptr;                 // where the CPU writes the data
    D3D12_GPU_VIRTUAL_ADDRESS gpu_address = 0;   // what gets bound as the root CBV
    size_t offset = 0;
    size_t size = 0;

    void Update(void const* data, size_t size_in_bytes)
    {
        memcpy(cpu_address, data, size_in_bytes);
    }

    template<typename T>
    void Update(T const& data)
    {
        memcpy(cpu_address, &data, sizeof(T));
    }
};

class UploadBuffer
{
public:

    // Assumption: LinearAllocator (not shown here) is constructed with its capacity in bytes.
    UploadBuffer(ID3D12Device* device, SIZE_T max_size_in_bytes)
        : allocator(max_size_in_bytes)
    {
        auto heap_properties = CD3DX12_HEAP_PROPERTIES(D3D12_HEAP_TYPE_UPLOAD);
        auto buffer_desc = CD3DX12_RESOURCE_DESC::Buffer(max_size_in_bytes);
        BREAK_IF_FAILED(device->CreateCommittedResource(
            &heap_properties,
            D3D12_HEAP_FLAG_NONE,
            &buffer_desc,
            D3D12_RESOURCE_STATE_GENERIC_READ,
            nullptr,
            IID_PPV_ARGS(&buffer)));

        // An empty read range signals that the CPU will not read from this buffer.
        CD3DX12_RANGE read_range(0, 0);
        BREAK_IF_FAILED(buffer->Map(0, &read_range, reinterpret_cast<void**>(&cpu_address)));
        gpu_address = buffer->GetGPUVirtualAddress();
    }

    Allocation Allocate(SIZE_T size_in_bytes, SIZE_T alignment)
    {
        // The same offset is applied to both the CPU pointer and the GPU address.
        SIZE_T const offset_in_buffer = allocator.Allocate(size_in_bytes, alignment);
        Allocation allocation{};
        allocation.buffer = buffer.Get();
        allocation.cpu_address = cpu_address + offset_in_buffer;
        allocation.gpu_address = gpu_address + offset_in_buffer;
        allocation.offset = offset_in_buffer;
        allocation.size = size_in_bytes;

        return allocation;
    }

    // Resets the allocator; call only once the GPU has finished with this buffer.
    void Clear()
    {
        allocator.Clear();
    }

private:
    LinearAllocator allocator;
    ComPtr<ID3D12Resource> buffer;
    uint8_t* cpu_address = nullptr;
    D3D12_GPU_VIRTUAL_ADDRESS gpu_address = 0;
};
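The `LinearAllocator` that `UploadBuffer` relies on is not shown in this answer. A minimal sketch of what such a bump allocator might look like follows; the constructor signature, the `kInvalidOffset` sentinel, and the out-of-space behaviour are assumptions made for illustration and may differ from the implementation in the linked repo.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the LinearAllocator referenced above: it hands
// out increasing offsets into a fixed-size range and is reset all at once.
class LinearAllocator {
public:
    static constexpr std::size_t kInvalidOffset = static_cast<std::size_t>(-1);

    explicit LinearAllocator(std::size_t capacity) : capacity_(capacity) {}

    // Returns an offset aligned to `alignment` (a power of two), or
    // kInvalidOffset when the request does not fit in the remaining space.
    std::size_t Allocate(std::size_t size, std::size_t alignment) {
        assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
        const std::size_t aligned = (top_ + alignment - 1) & ~(alignment - 1);
        if (aligned > capacity_ || size > capacity_ - aligned) {
            return kInvalidOffset;
        }
        top_ = aligned + size;
        return aligned;
    }

    // Forgets all allocations; nothing is freed individually.
    void Clear() { top_ = 0; }

private:
    std::size_t capacity_;
    std::size_t top_ = 0;
};
```

Because nothing is freed individually, the only safe point to call `Clear()` is when the GPU is known to be done with everything allocated from the range, which is why the usage below keeps one upload buffer per frame in flight.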

Example of usage:

//initialization
for(size_t i = 0; i < FRAMES_IN_FLIGHT; ++i)
{
    upload_buffers[i] = UploadBuffer(device, MAX_UPLOAD_BUFFER_SIZE);
}
//frame
UploadBuffer& upload_buffer = GetUploadBufferForThisFrame(); //take a reference, do not copy the buffer
upload_buffer.Clear();
//...
for(auto&& object : scene)
{
    //...
    model_matrix_cbuf.model_matrix = object.model_matrix;

    //each object gets its own 256-byte-aligned slice of the upload buffer
    Allocation object_allocation = upload_buffer.Allocate(GetCBufferSize<ModelMatrixCBuffer>(), D3D12_CONSTANT_BUFFER_DATA_PLACEMENT_ALIGNMENT);
    object_allocation.Update(model_matrix_cbuf);
    cmd_list->SetGraphicsRootConstantBufferView(0, object_allocation.gpu_address); //or whatever root parameter index your cbuffer is
    //... issue the draw call for this object
}

Bear in mind that this is somewhat simplified so that it's easier to understand.

Also, if your application is multithreaded, you would need to protect allocator calls with a mutex or use an atomic. For an example implementation, you can see this repo; some, but not all, of the relevant files are: LinearUploadBuffer.h/cpp, RingUploadBuffer.h/cpp, LinearAllocator.h/cpp, RingAllocator.h/cpp, DynamicAllocation.h.
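To make the "use an atomic" option concrete, here is one possible lock-free variant of a bump allocator. It is a sketch under assumptions: `AtomicLinearAllocator` and `kInvalidOffset` are hypothetical names, and this is not taken from the linked repo, which may solve the problem differently (for example with a mutex).

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical thread-safe bump allocator: several threads may call
// Allocate() concurrently and each receives a distinct, aligned offset.
class AtomicLinearAllocator {
public:
    static constexpr std::size_t kInvalidOffset = static_cast<std::size_t>(-1);

    explicit AtomicLinearAllocator(std::size_t capacity) : capacity_(capacity) {}

    std::size_t Allocate(std::size_t size, std::size_t alignment) {
        assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
        std::size_t old_top = top_.load(std::memory_order_relaxed);
        for (;;) {
            const std::size_t aligned = (old_top + alignment - 1) & ~(alignment - 1);
            if (aligned > capacity_ || size > capacity_ - aligned) {
                return kInvalidOffset;
            }
            // If another thread moved top_ first, old_top is refreshed with
            // the current value and the computation is retried.
            if (top_.compare_exchange_weak(old_top, aligned + size,
                                           std::memory_order_relaxed)) {
                return aligned;
            }
        }
    }

    // Must only be called while no other thread is allocating.
    void Clear() { top_.store(0, std::memory_order_relaxed); }

private:
    std::size_t capacity_;
    std::atomic<std::size_t> top_{0};
};
```

The atomic only guarantees that offsets do not overlap. Each thread still writes its own constants into its own slice, so the `memcpy` calls themselves need no extra locking.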

最美不过初阳 2025-01-19 01:44:33

I did a bit more research, assuming that my approach was not the right one. I searched for techniques to render the same set of vertices multiple times with different transformations. Finally, I stumbled upon "Instancing" or "Instanced Drawing". I knew that such a technique existed, but in most of the tutorials I read it was classified as "advanced graphics programming" to be discussed later. With the help of this tutorial (which was originally written for DirectX 11 but could be ported to DirectX 12 really easily), I was able to render the scene correctly using multiple instances of my 3D object. In addition, I no longer need to pass my model matrices to the graphics pipeline as root parameters. Rendering performance also increases when instancing is used.

For those who are not familiar with this concept, here is a short summary:

Instancing is used when multiple objects that share the same set of vertices but have different transformations (and some other properties, according to the tutorial) need to be drawn. Instead of rendering each object with a separate draw call, all of these objects can be rendered with a single draw call. This requires passing some extra information (such as transformations) to the shader via the input layout of the pipeline. These layout elements can be configured so that the shader receives them per instance rather than per vertex. When many objects need to be rendered (hundreds or thousands), instancing can improve performance drastically.

Technical Details

When implementing this concept, several techniques are available.

The instance data can be calculated directly in the shader. This is done by manipulating the vertex data according to the instance number, which can be retrieved using the SV_InstanceID shader semantic. This approach may not be very flexible.

Additional data can also be passed to the shader using a constant buffer.

The last approach is to use an "instance buffer". An instance buffer is created in exactly the same way as a vertex buffer. It is a resource that must be created, filled with data (e.g. transformation data), and uploaded to the GPU. The instance buffer also needs to be referenced in the input layout of the pipeline. When a frame is rendered, this buffer must be set in the pipeline using the ID3D12GraphicsCommandList::IASetVertexBuffers function. Finally, ID3D12GraphicsCommandList::DrawIndexedInstanced must be called, which takes the instance count as a parameter. Further processing of the data is then up to the shader implementation.
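As an illustration of the "filled with data" step, the following CPU-side sketch builds the per-instance records for a 3x3x3 grid like the one in the question. `InstanceData` and `BuildGridInstances` are hypothetical names, the spacing parameter is an assumption, and a plain translation stands in for the full transformation. Creating the resource and binding it are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-instance record; a real instance buffer would often
// carry a full 4x4 matrix instead of just a translation.
struct InstanceData {
    float offset[3];
};

// Builds one record per cell of an n x n x n grid centered on the origin.
inline std::vector<InstanceData> BuildGridInstances(int n, float spacing) {
    assert(n > 0);
    std::vector<InstanceData> instances;
    instances.reserve(static_cast<std::size_t>(n) * n * n);
    const float half = 0.5f * static_cast<float>(n - 1);
    for (int z = 0; z < n; ++z) {
        for (int y = 0; y < n; ++y) {
            for (int x = 0; x < n; ++x) {
                InstanceData item = {{(x - half) * spacing,
                                      (y - half) * spacing,
                                      (z - half) * spacing}};
                instances.push_back(item);
            }
        }
    }
    return instances;
}
```

The resulting array would be copied into the instance buffer, whose input layout elements use D3D12_INPUT_CLASSIFICATION_PER_INSTANCE_DATA, and the number of records (27 for the grid in the question) would be passed to DrawIndexedInstanced as the instance count.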
