Streaming DMA in a PCIe Linux kernel driver
I'm working on an FPGA driver for the Linux kernel. The code seems to work fine on x86, but on x86_64 I've run into some problems. I implemented streaming DMA, so it goes like:

    get_user_pages(...);
    for (...) {
        sg_set_page(...);
    }
    pci_map_sg(...);

But pci_map_sg returned addresses like 0xbd285800, which are not aligned to PAGE_SIZE, so I can't send the full first page, because the PCIe specification says:

"Requests must not specify an Address/Length combination which causes a Memory Space access to cross a 4-KB boundary."

Is there any way to get aligned addresses, or did I just miss something important?
1 Answer
The first possibility that comes to mind is that the user buffer coming in does not start on a page boundary. If your start address is 0x800 bytes into a page, then the offset on your first sg_set_page call will be 0x800. This will produce a DMA address ending in 0x800. This is a normal thing to happen, not a bug.

As pci_map_sg coalesces pages, this first segment may be larger than one page. The important thing is that pci_map_sg produces contiguous blocks of DMA-addressable memory, but it does not produce a list of low-level PCIe transactions. On x64 you are more likely to get a large region, because most x64 platforms have an IOMMU.

Many devices I deal with have DMA engines that allow me to specify a logical transfer length of several megabytes. Normally the DMA implementation in the PCIe endpoint is responsible for starting a new PCIe transaction at each 4 KB boundary, and the programmer can ignore that constraint. If resources in the FPGA are too limited to handle that, you can consider writing driver code to convert the Linux list of memory blocks into a (much longer) list of PCIe transactions.