How to determine SSE prefetch instruction size?
I am working with code that contains inline assembly for SSE prefetch instructions. A preprocessor constant determines whether the instructions for 32-, 64- or 128-byte prefetches are used. The application is used on a wide variety of platforms, and so far I have had to investigate in each case which is the best option for the given CPU. I understand that the best value is the cache line size. Can this information be obtained automatically? It doesn't seem to be explicitly present in /proc/cpuinfo.
I think your question is related to this question or this one. Unless you can rely on an OS or library function, you will want to use the CPUID instruction; the question then becomes exactly which information you are looking for, and of course AMD's and Intel's implementations need not agree. This page suggests using CPUID.1.EBX[15:8] (i.e., BH) on Intel and function 80000005h on AMD. In addition, on Intel, CPUID.2... seems to contain the relevant information, but it looks like a real pain to parse out the desired information.
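If relying on the OS or a library function is acceptable, something like the following might work on Linux. It is only a sketch under assumptions: `_SC_LEVEL1_DCACHE_LINESIZE` is a glibc extension to `sysconf` (not POSIX, and it may report 0 on some systems), the sysfs path is the one Linux kernels usually expose, and the names `parse_line_size` and `os_cache_line_bytes` are made up for illustration.

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Parse a positive decimal number; returns 0 if the text is not one. */
static long parse_line_size(const char *text)
{
    char *end;
    long value = strtol(text, &end, 10);
    if (end == text || value <= 0)
        return 0;
    return value;
}

/* Ask the OS for the L1 data cache line size in bytes.
   Returns 0 when neither source yields an answer. */
static long os_cache_line_bytes(void)
{
#ifdef _SC_LEVEL1_DCACHE_LINESIZE
    /* glibc extension; assumed available only when the macro is defined. */
    long from_sysconf = sysconf(_SC_LEVEL1_DCACHE_LINESIZE);
    if (from_sysconf > 0)
        return from_sysconf;
#endif
    /* Fallback: Linux sysfs entry for cpu0's first cache (assumed path). */
    FILE *f = fopen("/sys/devices/system/cpu/cpu0/cache/index0/coherency_line_size", "r");
    if (f) {
        char buf[32];
        long from_sysfs = 0;
        if (fgets(buf, sizeof buf, f))
            from_sysfs = parse_line_size(buf);
        fclose(f);
        if (from_sysfs > 0)
            return from_sysfs;
    }
    return 0;
}
```

A caller would treat a return value of 0 as "unknown" and fall back to CPUID or to a compile-time default.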
From what I've read, I think both AMD and Intel CPUs support CPUID.1.EBX[15:8], which returns the size of one cache line in QUADWORDs (8-byte units) as used by the CLFLUSH instruction (which isn't present on all processors, so I don't know whether you'll always find something there). So, after executing CPUID.1, you'd have to multiply BH by 8 to get the cache line size in bytes. This hinges on my implicit assumption (can anyone say whether it is really valid?) that the definition of one cache line size is always the same for the CLFLUSH and PREFETCHh instructions.
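The CPUID route could be sketched roughly as follows. It assumes GCC or Clang on x86, where `<cpuid.h>` provides `__get_cpuid`; the helper names are hypothetical. The sketch checks the CLFSH feature flag (CPUID.1:EDX bit 19) before trusting BH, and otherwise tries function 80000005h, whose ECX[7:0] reports the L1 data cache line size in bytes on AMD. It still rests on the same unverified assumption that the CLFLUSH line size is the right stride for PREFETCHh.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>  /* GCC/Clang helper header (toolchain assumption) */
#endif

/* CPUID.1:EBX[15:8] (BH) counts 8-byte units; convert to bytes. */
static unsigned line_bytes_from_leaf1_ebx(uint32_t ebx)
{
    return ((ebx >> 8) & 0xffu) * 8u;
}

/* CPUID.80000005h:ECX[7:0] is the L1 data cache line size in bytes. */
static unsigned line_bytes_from_ext_ecx(uint32_t ecx)
{
    return ecx & 0xffu;
}

/* Returns a cache line size in bytes, or 0 if CPUID gives no answer. */
static unsigned cpuid_cache_line_bytes(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned eax, ebx, ecx, edx;

    /* Leaf 1: only meaningful when the CLFSH flag (EDX bit 19) is set. */
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx) && (edx & (1u << 19))) {
        unsigned bytes = line_bytes_from_leaf1_ebx(ebx);
        if (bytes != 0)
            return bytes;
    }
    /* Extended leaf 80000005h: populated on AMD, may be zero elsewhere. */
    if (__get_cpuid(0x80000005u, &eax, &ebx, &ecx, &edx))
        return line_bytes_from_ext_ecx(ecx);
#endif
    return 0;
}
```

For example, a BH value of 8 corresponds to 8 * 8 = 64 bytes.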
Also, Intel's manual states that PREFETCHh is only a hint, but that, if it prefetches anything, it will always fetch a minimum of 32 bytes.
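To illustrate how a run-time line size could replace the preprocessor constant, here is a minimal sketch. It uses the GCC/Clang builtin `__builtin_prefetch`, which on x86 normally compiles to a PREFETCHh instruction, rather than the original inline assembly. The function name `sum_with_prefetch` and the `ahead_lines` tuning parameter are hypothetical, and the 32-byte floor simply mirrors the minimum mentioned above.

```c
#include <stddef.h>

/* Sum n floats, issuing one prefetch hint per cache line, ahead_lines
   lines in front of the current position. line_bytes is the stride
   detected at run time; values below 32 are treated as 32. */
static float sum_with_prefetch(const float *data, size_t n,
                               size_t line_bytes, size_t ahead_lines)
{
    if (line_bytes < 32)
        line_bytes = 32;
    size_t per_line = line_bytes / sizeof(float);
    float total = 0.0f;

    for (size_t i = 0; i < n; ++i) {
        if (i % per_line == 0) {
            size_t target = i + ahead_lines * per_line;
            /* Stay inside the array; the hint is optional anyway. */
            if (target < n)
                __builtin_prefetch(&data[target], 0, 3);
        }
        total += data[i];
    }
    return total;
}
```

Because the prefetch is only a hint, a wrong stride affects performance but not the result, so measuring a few values of `ahead_lines` on each target CPU is still advisable.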
EDIT1:
Another useful resource (even if not directly answering your question) for the optimised use of PREFETCHh is Intel's optimisation manual here.