What are the benefits and micro-ops of the ENQCMD instruction?

Posted 2025-02-04 13:13:44

ENQCMD and MOVDIR64B are two x86 instructions used to submit work to Intel DSA (Data Streaming Accelerator).

MOVDIR64B reads 64 bytes from the source memory address and performs a 64-byte direct-store operation to the destination address. The ENQCMD instruction allows software to write commands to enqueue registers, which are special device registers accessed through memory-mapped I/O (MMIO).

My question is: what were these two instructions designed for?

Based on my understanding, setting up the memory-mapped I/O area (the register) requires OS support, i.e. a device driver. Once the MMIO area is set up, we could access it with the write() system call, which the device driver also implements. On general architectures, Linux provides iowrite64() to write an 8-byte value at a time. Hence, writing 64 bytes requires calling iowrite64() 8 times.

With the help of MOVDIR64B, a new API was created for Intel DSA: __iowrite512(), which writes 64 bytes atomically.
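The two options can be illustrated with a user-space mock. This is not kernel code: `struct mock_mmio`, `mock_iowrite64` and `mock_iowrite512` are hypothetical stand-ins that only count how many separate write transactions the "device" observes. The mock says nothing about timing; it only shows that the loop hands the device eight 8-byte pieces, while the 64-byte write hands it one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a 64-byte MMIO register window that records
 * how many separate write transactions it receives. */
struct mock_mmio {
    uint8_t  reg[64];
    unsigned transactions;
};

/* Models one iowrite64(): a single 8-byte write transaction. */
static void mock_iowrite64(struct mock_mmio *dev, uint64_t val, size_t offset)
{
    assert(offset % 8 == 0 && offset + 8 <= sizeof dev->reg);
    memcpy(&dev->reg[offset], &val, sizeof val);
    dev->transactions++;
}

/* Option 1 from the question: 64 bytes written as eight 8-byte pieces. */
static void write64_with_iowrite64(struct mock_mmio *dev, const uint64_t src[8])
{
    for (size_t i = 0; i < 8; i++)
        mock_iowrite64(dev, src[i], i * 8);
}

/* Option 2: models __iowrite512() / MOVDIR64B, all 64 bytes in one write. */
static void mock_iowrite512(struct mock_mmio *dev, const void *src)
{
    memcpy(dev->reg, src, sizeof dev->reg);
    dev->transactions++;
}
```

Both paths leave the same 64 bytes in the register; the difference the device sees is eight transactions versus one.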

I agree that the latter is at least more efficient than the former, but I am unsure how long each one takes to transfer the data.

Consider the following case: given a device (Intel DSA) that supports MOVDIR64B and ENQCMD, suppose we want to transfer 64 bytes of data from memory to an MMIO register. There are two options: call iowrite64() 8 times in a loop, or call __iowrite512() once. Will the latter be 8 times faster than the former?

My guess is that the difference is unlikely to be a full 8x, but the latter will still be faster. How much faster would it be? Is this documented anywhere? I do not have an Intel DSA device, so I am not sure how to test it.

Besides, what other benefits does ENQCMD have? Is it broken up into several micro-operations? If so, which micro-ops does ENQCMD decode into?

Answer by 转瞬即逝, 2025-02-11 13:13:44

iowrite64() uses a UC (uncacheable) access to MMIO space, so writes are serialized, not pipelined. That is, only one UC write can be in flight at a time from a single CPU thread, and the CPU doesn't continue execution until the MMIO write is complete.

MOVDIR64B has the potential to be faster than even a single iowrite64(), because it uses the WC (write-combining) memory type instead of UC, even if the destination address is UC. After the CPU issues the write, it can continue execution. Multiple direct stores can be streamed to the device, which means several of them can be in flight at one time from a single CPU thread. MOVDIRI also behaves this way.
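Why "8 times faster" is not the right way to frame the question can be sketched with a toy cost model. It is an illustration under stated assumptions, not measured data: `round_trip`, `issue` and `one_way` are hypothetical per-platform latencies the caller must supply, and the function names are made up.

```c
#include <assert.h>

/* Toy latency model, NOT measured data. All times are in arbitrary units
 * chosen by the caller; none of these names are real kernel APIs. */

/* UC writes (e.g. iowrite64): serialized, so each one waits a full
 * round trip before the next can start. Returns total elapsed time. */
static unsigned long uc_serialized_time(unsigned n_writes,
                                        unsigned long round_trip)
{
    return (unsigned long)n_writes * round_trip;
}

/* Posted direct stores (e.g. MOVDIR64B): the CPU issues one, moves on,
 * and several can be in flight at once. Returns the time until the last
 * store reaches the device: n back-to-back issue slots plus one trip. */
static unsigned long posted_stream_time(unsigned n_stores,
                                        unsigned long issue,
                                        unsigned long one_way)
{
    assert(n_stores > 0);
    return (unsigned long)n_stores * issue + one_way;
}
```

Under this model, eight iowrite64() calls take `8 * round_trip`, while one MOVDIR64B reaches the device after about `issue + one_way`. The ratio is therefore set by the platform's latencies rather than by the factor of 8, which is consistent with the claim that even a single iowrite64() can be slower than one MOVDIR64B; real figures would have to be measured on DSA-capable hardware.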

As far as I know, the time to actually transfer the data to the destination is the same regardless of the size (between 1 and 64 bytes). Of course that is dependent on the width of the data path within the SoC, which could be different for different implementations.

The main advantage of MOVDIR64B is that the descriptor arrives at the device all at once instead of in pieces. The device doesn't have to worry about receiving a partial descriptor or receiving parts of two descriptors interleaved. In fact, Intel DSA ignores writes smaller than 64 bytes to a portal.
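A minimal model of that rule, assuming an opaque 64-byte descriptor. `struct desc64`, `struct mock_portal` and `portal_write` are hypothetical names, not the DSA driver's API, and the real descriptor's field layout is deliberately not reproduced. The model also shows why the eight-iowrite64() loop from the question would not work for a DSA portal at all: each 8-byte piece is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Opaque 64-byte descriptor (real DSA descriptors have defined fields). */
struct desc64 {
    uint8_t bytes[64];
};
static_assert(sizeof(struct desc64) == 64, "descriptor must be 64 bytes");

/* Hypothetical portal model: only whole 64-byte writes become descriptors. */
struct mock_portal {
    struct desc64 queue[16];
    size_t        count;
};

/* Returns true if the write was taken as a descriptor, false if ignored. */
static bool portal_write(struct mock_portal *p, const void *src, size_t len)
{
    if (len != sizeof(struct desc64))
        return false;  /* partial write: ignored, never half a descriptor */
    assert(p->count < sizeof p->queue / sizeof p->queue[0]);
    memcpy(&p->queue[p->count++], src, len);
    return true;
}
```

Because a descriptor only counts when it arrives as one 64-byte write, the device never has to reassemble pieces or untangle two interleaved descriptors.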

To realize the full benefit of streaming writes, the destination address for each MOVDIR64B from a single CPU thread should be different. Each Intel DSA portal is a 4096-byte page, so there are 64 unique addresses within each portal. Descriptor writes from a single CPU can be striped across the 64 addresses. (It doesn't matter whether writes from multiple CPUs use the same address or different addresses, but normally you would not expect multiple CPUs to be using the same dedicated WQ in DSA.)
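The striping arithmetic can be sketched as follows, assuming `portal_base` is the page-aligned virtual address at which the portal is mapped (how it gets mapped is out of scope, and `striped_portal_addr` is a hypothetical helper). In real code the returned address would be the destination operand of a MOVDIR64B.

```c
#include <assert.h>
#include <stdint.h>

#define PORTAL_SIZE 4096u                    /* one DSA portal = one 4 KiB page */
#define DESC_SIZE   64u                      /* one descriptor = one 64-byte write */
#define SLOTS       (PORTAL_SIZE / DESC_SIZE) /* 64 distinct addresses per portal */

/* Destination for the n-th descriptor submitted by this thread: cycle
 * through the 64 distinct 64-byte-aligned offsets in the portal page so
 * that consecutive direct stores target different addresses. */
static uintptr_t striped_portal_addr(uintptr_t portal_base, uint64_t n)
{
    assert(portal_base % PORTAL_SIZE == 0);
    return portal_base + (uintptr_t)(n % SLOTS) * DESC_SIZE;
}
```

Descriptor 64 wraps back to the first slot, so a thread that keeps at most 64 stores in flight never reuses an address that may still have a store outstanding.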

ENQCMD allows the device to tell software whether it accepted the descriptor or not. This allows multiple applications to use the same shared WQ without risk of a descriptor being lost because the shared WQ is full. Applications can submit descriptors without any driver involvement (after setup), and without any lock or communication between the applications.
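The accept/retry contract can be modeled in a few lines. This is a single-threaded toy: `struct mock_swq`, `mock_enqcmd`, `mock_complete_one` and `submit_with_retry` are hypothetical names, and the queue capacity is arbitrary. On real hardware, ENQCMD returns the device's answer to the submitting thread (as the ZF flag, set when the device responds with Retry), and the hardware, not a software lock, arbitrates between applications.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy shared work queue; capacity is chosen by the caller. */
struct mock_swq {
    size_t capacity;
    size_t occupied;
};

/* Models the device side of one ENQCMD: accept if there is room,
 * otherwise answer "retry". Nothing is silently dropped. */
static bool mock_enqcmd(struct mock_swq *wq)
{
    if (wq->occupied == wq->capacity)
        return false;  /* retry: the submitter still owns the descriptor */
    wq->occupied++;
    return true;       /* accepted */
}

/* Models the device finishing one descriptor and freeing a slot. */
static void mock_complete_one(struct mock_swq *wq)
{
    assert(wq->occupied > 0);
    wq->occupied--;
}

/* Submitter side: try up to max_tries times. Returns the attempt number
 * that succeeded, or 0 if every attempt got "retry". */
static unsigned submit_with_retry(struct mock_swq *wq, unsigned max_tries)
{
    for (unsigned i = 1; i <= max_tries; i++)
        if (mock_enqcmd(wq))
            return i;
    return 0;
}
```

The key property is that a full queue produces an explicit rejection the submitter can act on (retry later, back off, or fall back), instead of a descriptor that vanishes.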
