I'm trying to read raw GPU memory from a userspace application. The idea is to mmap /sys/bus/pci/devices/[device addr]/resource1 from the application and do loads and stores to it.
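For reference, this is roughly what that access path looks like. A minimal sketch, assuming the device address shown below (0000:01:00.0), an arbitrary 1 MiB mapping size, and root privileges; the accesses are volatile so the compiler really emits each load/store:

// Sketch: mmap BAR1 through sysfs and do loads/stores from user space.
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <cinttypes>
#include <cstdint>
#include <cstdio>

int main() {
    const char* path = "/sys/bus/pci/devices/0000:01:00.0/resource1"; // device-specific
    const size_t kMapBytes = 1 << 20;  // map only the first 1 MiB for the experiment

    int fd = open(path, O_RDWR | O_SYNC);  // O_SYNC: uncached access, as pcimem does
    if (fd < 0) { perror("open"); return 1; }

    void* base = mmap(nullptr, kMapBytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (base == MAP_FAILED) { perror("mmap"); return 1; }

    // volatile so every access really goes out over PCIe
    volatile uint64_t* p = static_cast<volatile uint64_t*>(base);
    p[0] = 0;                                              // store
    std::printf("readback: 0x%016" PRIx64 "\n", p[0]);     // load

    munmap(base, kMapBytes);
    close(fd);
    return 0;
}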
The device here is an Nvidia 3060Ti with 8GiB of on-board memory. The BAR is configured to be resizable, so all 8GiB of the memory should be accessible:
(base) [xps] pcimem git:(master) ✗ ls -lah /sys/bus/pci/devices/0000:01:00.0/resource*
-r--r--r-- 1 root root 4,0K avril 22 11:17 /sys/bus/pci/devices/0000:01:00.0/resource
-rw------- 1 root root 16M avril 22 11:17 /sys/bus/pci/devices/0000:01:00.0/resource0
-rw------- 1 root root 8,0G avril 22 11:17 /sys/bus/pci/devices/0000:01:00.0/resource1
-rw------- 1 root root 8,0G avril 22 11:17 /sys/bus/pci/devices/0000:01:00.0/resource1_wc
-rw------- 1 root root 32M avril 22 11:17 /sys/bus/pci/devices/0000:01:00.0/resource3
-rw------- 1 root root 32M avril 22 11:17 /sys/bus/pci/devices/0000:01:00.0/resource3_wc
-rw------- 1 root root 128 avril 22 11:17 /sys/bus/pci/devices/0000:01:00.0/resource5
Accessing the memory using pcimem doesn't work. Writing 0 to a location would return zero on the next read, but would return 0x000000005665BDF5 on any subsequent reads. The value 0x000000005665BDF5 is the same across all locations after the first read.
Benchmarking these (failed) reads/writes seems to suggest that they actually do reach the GPU. The read latency is around 900 ns, which is close to a PCIe round-trip time.
I have tried mmap-ing the framebuffer directly (/dev/fb0) and reading/writing to it. This works, and I see similar read/write latencies. But the framebuffer is way too small for my use case.
CUDA doesn't work because on a read from device memory, the GPU would move that page to the host.
Is there a way to access the memory on the GPU from Linux?
My goal here is to be able to map the GPU's memory in the userspace application and use it as memory expansion. The userspace application (running on the CPU) would allocate and access data-structures directly on the GPU's memory.
TIA
It seems like you could use the GDRCopy library, or at least its kernel driver. From the website:
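Roughly, GDRCopy pins an ordinary CUDA device allocation through its gdrdrv kernel module and maps the corresponding BAR1 range into the calling process, so the CPU can load/store GPU memory directly. A minimal sketch of that flow, assuming gdrcopy (gdrapi.h / libgdrapi) and the gdrdrv module are installed; the size and error handling are illustrative:

// Sketch: map a cudaMalloc'd buffer into the CPU's address space with GDRCopy.
#include <cuda_runtime.h>
#include <gdrapi.h>
#include <cstdio>

int main() {
    const size_t kBytes = 1 << 20;  // 1 MiB; real code should align the pinned
                                    // range to the GPU page size (GPU_PAGE_SIZE)

    // Ordinary device allocation -- it stays resident in VRAM.
    void* d_ptr = nullptr;
    if (cudaMalloc(&d_ptr, kBytes) != cudaSuccess) return 1;

    gdr_t g = gdr_open();
    if (!g) { std::puts("gdr_open failed (is gdrdrv loaded?)"); return 1; }

    gdr_mh_t mh;
    if (gdr_pin_buffer(g, (unsigned long)d_ptr, kBytes, 0, 0, &mh) != 0) return 1;

    void* map = nullptr;
    if (gdr_map(g, mh, &map, kBytes) != 0) return 1;

    // CPU-side stores/loads now go straight to GPU memory over PCIe.
    gdr_copy_to_mapping(mh, map, "hi", 3);     // write-combining-friendly helper
    volatile char* p = static_cast<volatile char*>(map);
    std::printf("first byte in VRAM: %c\n", p[0]);

    gdr_unmap(g, mh, map, kBytes);
    gdr_unpin_buffer(g, mh);
    gdr_close(g);
    cudaFree(d_ptr);
    return 0;
}

Unlike managed memory, the cudaMalloc allocation is never migrated back to the host, so the CPU accesses don't trigger the page movement mentioned in the question.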
The solution is to use the Vulkan API to allocate a heap on the GPU and access it. However, since x86 cannot cache MMIO addresses, every access would go to the GPU over PCIe.
The implementation has about the same latency as Nvidia's server solution.
Here is a quick and dirty implementation in C++ that abstracts the GPU as heap memory and allows malloc() and free() on it. To find out the heap types, check: http://vulkan.gpuinfo.org/displayreport.php?id=14928#memory
You'd need that to check which flags your GPU supports when making the call to findMemoryType() from createVertexBuffer().
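The core of that approach is finding a memory type that is both DEVICE_LOCAL and HOST_VISIBLE (with Resizable BAR this covers the whole VRAM heap) and mapping it with vkMapMemory. A stripped-down sketch of just that part, without the malloc()/free() layer; queue family 0 and the 64 MiB size are arbitrary assumptions:

// Sketch: allocate device-local, host-visible memory with Vulkan and map it.
#include <vulkan/vulkan.h>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <vector>

int main() {
    const VkDeviceSize kHeapBytes = 64ull * 1024 * 1024;  // arbitrary 64 MiB

    // Instance
    VkApplicationInfo app{VK_STRUCTURE_TYPE_APPLICATION_INFO};
    app.apiVersion = VK_API_VERSION_1_1;
    VkInstanceCreateInfo ici{VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO};
    ici.pApplicationInfo = &app;
    VkInstance instance;
    if (vkCreateInstance(&ici, nullptr, &instance) != VK_SUCCESS) return 1;

    // Physical device (assume the first one is the discrete GPU)
    uint32_t count = 0;
    vkEnumeratePhysicalDevices(instance, &count, nullptr);
    if (count == 0) return 1;
    std::vector<VkPhysicalDevice> gpus(count);
    vkEnumeratePhysicalDevices(instance, &count, gpus.data());
    VkPhysicalDevice gpu = gpus[0];

    // Logical device with one queue (needed even though no work is submitted)
    float prio = 1.0f;
    VkDeviceQueueCreateInfo qci{VK_STRUCTURE_TYPE_DEVICE_QUEUE_CREATE_INFO};
    qci.queueFamilyIndex = 0;  // assumption: use queue family 0
    qci.queueCount = 1;
    qci.pQueuePriorities = &prio;
    VkDeviceCreateInfo dci{VK_STRUCTURE_TYPE_DEVICE_CREATE_INFO};
    dci.queueCreateInfoCount = 1;
    dci.pQueueCreateInfos = &qci;
    VkDevice device;
    if (vkCreateDevice(gpu, &dci, nullptr, &device) != VK_SUCCESS) return 1;

    // Find VRAM that the CPU can map: DEVICE_LOCAL + HOST_VISIBLE (+ COHERENT)
    VkPhysicalDeviceMemoryProperties mem{};
    vkGetPhysicalDeviceMemoryProperties(gpu, &mem);
    const VkMemoryPropertyFlags wanted =
        VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT |
        VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT |
        VK_MEMORY_PROPERTY_HOST_COHERENT_BIT;
    uint32_t typeIndex = UINT32_MAX;
    for (uint32_t i = 0; i < mem.memoryTypeCount; ++i)
        if ((mem.memoryTypes[i].propertyFlags & wanted) == wanted) { typeIndex = i; break; }
    if (typeIndex == UINT32_MAX) { std::puts("no mappable VRAM type"); return 1; }

    // Allocate from that type and map it into the CPU's address space
    VkMemoryAllocateInfo mai{VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO};
    mai.allocationSize = kHeapBytes;
    mai.memoryTypeIndex = typeIndex;
    VkDeviceMemory vram;
    if (vkAllocateMemory(device, &mai, nullptr, &vram) != VK_SUCCESS) return 1;
    void* ptr = nullptr;
    vkMapMemory(device, vram, 0, kHeapBytes, 0, &ptr);

    // Plain CPU loads/stores; each one crosses PCIe since MMIO is uncacheable.
    std::memset(ptr, 0xAB, 4096);
    std::printf("first byte: 0x%02x\n", static_cast<unsigned char*>(ptr)[0]);

    vkUnmapMemory(device, vram);
    vkFreeMemory(device, vram, nullptr);
    vkDestroyDevice(device, nullptr);
    vkDestroyInstance(instance, nullptr);
    return 0;
}

A real heap allocator would sub-allocate out of one large VkDeviceMemory block, since implementations cap the number of live allocations (maxMemoryAllocationCount).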