Here is what I have used, in brief:

- `get_user_pages` to pin the user page(s) and give you an array of `struct page *` pointers.
- `dma_map_page` on each `struct page *` to get the DMA address (aka "I/O address") for the page. This also creates an IOMMU mapping (if needed on your platform).
- Now tell your device to perform the DMA into the memory using those DMA addresses. Obviously they can be non-contiguous; memory is only guaranteed to be contiguous in multiples of the page size.
- `dma_sync_single_for_cpu` to do any necessary cache flushes or bounce buffer blitting or whatever. This call guarantees that the CPU can actually see the result of the DMA, since on many systems, modifying physical RAM behind the CPU's back results in stale caches.
- `dma_unmap_page` to free the IOMMU mapping (if it was needed on your platform).
- `put_page` to un-pin the user page(s).
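For concreteness, here is a rough sketch of that sequence for a single page. It assumes roughly a recent 5.x kernel (the `get_user_pages()` signature and locking rules have changed over the years), and the step that actually programs the hardware is left as a comment because it is entirely device-specific:

```c
#include <linux/mm.h>
#include <linux/sched.h>
#include <linux/dma-mapping.h>

/* Sketch only: 'dev' is the struct device doing the DMA. */
static int dma_one_user_page(struct device *dev, unsigned long uaddr)
{
	struct page *page;
	dma_addr_t dma;
	long pinned;
	int ret = 0;

	/* Pin the user page; FOLL_WRITE because the device will write it. */
	mmap_read_lock(current->mm);
	pinned = get_user_pages(uaddr & PAGE_MASK, 1, FOLL_WRITE, &page, NULL);
	mmap_read_unlock(current->mm);
	if (pinned < 1)
		return pinned < 0 ? pinned : -EFAULT;

	/* Get a DMA ("I/O") address; also sets up the IOMMU if needed. */
	dma = dma_map_page(dev, page, 0, PAGE_SIZE, DMA_FROM_DEVICE);
	if (dma_mapping_error(dev, dma)) {
		ret = -ENOMEM;
		goto unpin;
	}

	/* Tell the device to DMA into 'dma' here and wait for its IRQ. */

	/* Make the DMA'd data visible to the CPU, then tear it all down. */
	dma_sync_single_for_cpu(dev, dma, PAGE_SIZE, DMA_FROM_DEVICE);
	dma_unmap_page(dev, dma, PAGE_SIZE, DMA_FROM_DEVICE);
	set_page_dirty_lock(page);	/* the page's contents were changed */
unpin:
	put_page(page);
	return ret;
}
```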
Note that you must check for errors all the way through here, because there are limited resources all over the place. `get_user_pages` returns a negative number for an outright error (-errno), but it can return a positive number to tell you how many pages it actually managed to pin (physical memory is not limitless). If this is less than you requested, you still must loop through all of the pages it did pin in order to call `put_page` on them. (Otherwise you are leaking kernel memory; very bad.) `dma_map_page` can also return an error (-errno), because IOMMU mappings are another limited resource.

`dma_unmap_page` and `put_page` return `void`, as usual for Linux "freeing" functions. (Linux kernel resource-management routines only return errors because something actually went wrong, not because you screwed up and passed a bad pointer or something. The basic assumption is that you are never screwing up, because this is kernel code. Although `get_user_pages` does check to ensure the validity of the user addresses and will return an error if the user handed you a bad pointer.)

You can also consider using the `_sg` functions if you want a friendly interface to scatter/gather. Then you would call `dma_map_sg` instead of `dma_map_page`, `dma_sync_sg_for_cpu` instead of `dma_sync_single_for_cpu`, etc.

Also note that many of these functions may be more-or-less no-ops on your platform, so you can often get away with being sloppy. (In particular, `dma_sync_...` and `dma_unmap_...` do nothing on my x86_64 system.) But on those platforms, the calls themselves get compiled into nothing, so there is no excuse for being sloppy.
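For the scatter/gather flavour, a rough sketch might look like this, assuming the `pages` array came from `get_user_pages()` as above and that your kernel has `sg_alloc_table_from_pages()`:

```c
#include <linux/scatterlist.h>
#include <linux/dma-mapping.h>

/* Map an array of already-pinned pages as one scatter/gather list. */
static int map_pages_sg(struct device *dev, struct page **pages, int npages,
			struct sg_table *sgt)
{
	int nents, ret;

	/* Build a scatterlist covering the whole pages (offset 0). */
	ret = sg_alloc_table_from_pages(sgt, pages, npages, 0,
					(size_t)npages * PAGE_SIZE, GFP_KERNEL);
	if (ret)
		return ret;

	/* One call maps the whole list; nents may be smaller than npages
	 * if the IOMMU merged adjacent entries. */
	nents = dma_map_sg(dev, sgt->sgl, sgt->orig_nents, DMA_FROM_DEVICE);
	if (nents == 0) {
		sg_free_table(sgt);
		return -ENOMEM;
	}

	/* Hand the (sg_dma_address(sg), sg_dma_len(sg)) pairs to the device.
	 * Later: dma_sync_sg_for_cpu(), dma_unmap_sg(), sg_free_table(). */
	return nents;
}
```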
OK, this is what I did.
Disclaimer: I'm a hacker in the pure sense of the word and my code ain't the prettiest.
I read LDD3 and the Infiniband source code and other predecessor stuff and decided that `get_user_pages` and pinning them and all that other rigmarole was just too painful to contemplate while hungover. Also, I was working with another person across the PCIe bus, and I was also responsible for "designing" the userspace application.

I wrote the driver such that at load time it preallocates as many buffers as it can, of the largest size, by calling `myAddr[i] = pci_alloc_consistent(blah, size, &pci_addr[i])` until that fails (failure -> `myAddr[i]` is `NULL`, I think; I forget). I was able to allocate around 2.5 GB of buffers, each 4 MiB in size, on my meagre machine which only has 4 GiB of memory. The total number of buffers varies depending on when the kernel module is loaded, of course; load the driver at boot time and the most buffers get allocated. Each individual buffer's size maxed out at 4 MiB on my system; not sure why. I `cat`ted `/proc/buddyinfo` to make sure I wasn't doing anything stupid, which is of course my usual starting pattern.
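For illustration, the preallocation loop could look roughly like this; `MAX_BUFS` and `BUF_SIZE` are made-up names, and `pci_alloc_consistent()` is the legacy wrapper the answer mentions (newer kernels spell it `dma_alloc_coherent()`):

```c
#include <linux/pci.h>

#define MAX_BUFS 1024			/* hypothetical upper bound */
#define BUF_SIZE (4 * 1024 * 1024)	/* 4 MiB per buffer, as in the text */

static void      *myAddr[MAX_BUFS];
static dma_addr_t pci_addr[MAX_BUFS];
static int        nbufs;

static void prealloc_buffers(struct pci_dev *pdev)
{
	int i;

	/* Keep grabbing coherent 4 MiB buffers until the allocator says no. */
	for (i = 0; i < MAX_BUFS; i++) {
		myAddr[i] = pci_alloc_consistent(pdev, BUF_SIZE, &pci_addr[i]);
		if (!myAddr[i])		/* failure -> NULL: stop here */
			break;
	}
	nbufs = i;
}
```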
The driver then proceeds to give the array of `pci_addr` to the PCIe device along with their sizes. The driver then just sits there waiting for the interrupt storm to begin. Meanwhile, in userspace, the application opens the driver, queries the number of allocated buffers (n) and their sizes (using `ioctl`s or `read`s, etc.), and then proceeds to call `mmap()` multiple (n) times. Of course, `mmap()` must be properly implemented in the driver, and LDD3 pages 422-423 were handy. Userspace now has n pointers to n areas of driver memory.
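One possible shape for the driver-side `mmap()`, reusing the names from the sketch above; it uses `dma_mmap_coherent()` (LDD3's example uses `remap_pfn_range()` instead), and the buffer-index-in-the-offset convention is only an assumption for illustration:

```c
#include <linux/fs.h>
#include <linux/mm.h>
#include <linux/dma-mapping.h>

static struct pci_dev *my_pdev;	/* hypothetical: saved at probe() time */

static int my_mmap(struct file *filp, struct vm_area_struct *vma)
{
	/* Hypothetical convention: userspace picks buffer i by calling
	 * mmap(NULL, BUF_SIZE, PROT_READ, MAP_SHARED, fd, i * BUF_SIZE). */
	unsigned long idx = vma->vm_pgoff / (BUF_SIZE >> PAGE_SHIFT);
	size_t len = vma->vm_end - vma->vm_start;

	if (idx >= nbufs || len > BUF_SIZE)
		return -EINVAL;

	vma->vm_pgoff = 0;	/* the offset only served to select a buffer */
	return dma_mmap_coherent(&my_pdev->dev, vma, myAddr[idx],
				 pci_addr[idx], len);
}
```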
As the driver is interrupted by the PCIe device, it is told which buffers are "full" or "available" to be sucked dry. The application in turn is pending on a `read()` or `ioctl()` to be told which buffers are full of useful data. The tricky part was managing the userspace-to-kernel-space synchronization such that buffers which are being DMA'd into by the PCIe device are not also being modified by userspace, but that's what we get paid for. I hope this makes sense, and I'd be more than happy to be told I'm an idiot, but please tell me why.
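The "pending on a `read()`" part could look roughly like this, with a wait queue that the interrupt handler wakes; the buffer bookkeeping is hypothetical and omitted:

```c
#include <linux/fs.h>
#include <linux/wait.h>
#include <linux/atomic.h>
#include <linux/uaccess.h>

static DECLARE_WAIT_QUEUE_HEAD(full_wq);
static atomic_t nfull = ATOMIC_INIT(0);	/* the IRQ handler does atomic_inc()
					 * and wake_up_interruptible(&full_wq) */

static ssize_t my_read(struct file *filp, char __user *ubuf,
		       size_t count, loff_t *ppos)
{
	int idx;

	if (count < sizeof(idx))
		return -EINVAL;

	/* Sleep until the interrupt handler reports at least one full buffer. */
	if (wait_event_interruptible(full_wq, atomic_read(&nfull) > 0))
		return -ERESTARTSYS;

	atomic_dec(&nfull);
	idx = 0;	/* hypothetical: real code would pop a "full" index off a list */
	if (copy_to_user(ubuf, &idx, sizeof(idx)))
		return -EFAULT;
	return sizeof(idx);
}
```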
I recommend this book as well, by the way: http://www.amazon.com/Linux-Programming-Interface-System-Handbook/dp/1593272200. I wish I had that book seven years ago when I wrote my first Linux driver.
There is another type of trickery possible by adding even more memory and not letting the kernel use it, and `mmap`ping on both sides of the userspace/kernelspace divide, but the PCI device must also support DMA addressing higher than 32 bits. I haven't tried it, but I wouldn't be surprised if I'm eventually forced to.
Well, if you have LDD, you can have a look at chapter 15, and more precisely page 435, where Direct I/O operations are described.
The kernel call that will help you achieve this is `get_user_pages`. In your case, since you want to send data from the kernel to userspace, you should set the write flag to 1. Also be aware that asynchronous I/O may allow you to achieve the same result, but with your userspace application not having to wait for the read to finish, which can be better.
Take a good look at the Infiniband drivers. They go to much effort to make zero-copy DMA and RDMA to user-space work.
I forgot to add this before saving:
Doing DMA directly to user-space memory mappings is full of problems, so unless you have very high performance requirements like Infiniband or 10 Gb Ethernet, don't do it. Instead, copy the DMA'd data into the userspace buffers. It will save you much grief.
For just one example, what if the user's program exits before the DMA is complete? What if the user memory is reallocated to another process after exit but the hardware is still set to DMA into that page? Disaster!
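For illustration, the "just copy it" approach can be as simple as a `read()` handler that hands the driver's own DMA buffer to userspace with `copy_to_user()`; `dma_buf` and `dma_len` here are hypothetical driver state:

```c
#include <linux/fs.h>
#include <linux/kernel.h>
#include <linux/uaccess.h>

/* Hypothetical driver state: a kernel-owned DMA buffer (e.g. from
 * dma_alloc_coherent()) and the byte count of the last transfer. */
static void *dma_buf;
static size_t dma_len;

static ssize_t my_copying_read(struct file *filp, char __user *ubuf,
			       size_t count, loff_t *ppos)
{
	size_t n = min(count, dma_len);

	/* The device DMA'd into dma_buf; just copy the result out to the user. */
	if (copy_to_user(ubuf, dma_buf, n))
		return -EFAULT;
	return n;
}
```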
The `remap_pfn_range` function (used in the driver's `mmap` implementation) can be used to map kernel memory to user space.

A real example can be found in the mem character driver, drivers/char/mem.c.
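A minimal `mmap()` along those lines might look like this; `kbuf` and `KBUF_SIZE` are hypothetical stand-ins for a physically contiguous kernel buffer (e.g. from `kmalloc()`):

```c
#include <linux/fs.h>
#include <linux/mm.h>
#include <linux/io.h>

#define KBUF_SIZE (1 << 20)	/* hypothetical 1 MiB buffer */
static void *kbuf;		/* hypothetical, physically contiguous */

static int kbuf_mmap(struct file *filp, struct vm_area_struct *vma)
{
	size_t len = vma->vm_end - vma->vm_start;
	unsigned long pfn = virt_to_phys(kbuf) >> PAGE_SHIFT;

	if (len > KBUF_SIZE)
		return -EINVAL;

	/* Map the buffer's physical pages straight into the caller's VMA. */
	if (remap_pfn_range(vma, vma->vm_start, pfn, len, vma->vm_page_prot))
		return -EAGAIN;
	return 0;
}
```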