"cpuid" before "rdtsc"

Posted 2024-09-02 18:19:13

Sometimes I encounter code that reads the TSC with the rdtsc instruction, but calls cpuid right before.

Why is calling cpuid necessary? I realize it may have something to do with different cores having different TSC values, but what exactly happens when you call those two instructions in sequence?

Comments (3)

污味仙女 2024-09-09 18:19:13

It's to prevent out-of-order execution. From a link that has since disappeared from the web (but which was fortuitously copied here before it disappeared), this text is from an article entitled "Performance monitoring" by John Eckerdal:

The Pentium Pro and Pentium II processors support out-of-order execution: instructions may be executed in a different order than you programmed them. This can be a source of errors if not taken care of.

To prevent this, the programmer must serialize the instruction queue. This can be done by inserting a serializing instruction such as CPUID before the RDTSC instruction.
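
As a rough illustration of the quoted advice, here is a minimal sketch in C with GNU-style inline assembly. It assumes x86-64 and GCC or Clang; the helper name serialized_rdtsc is made up for this example and does not come from the article.

```c
#include <stdint.h>

/* Hypothetical helper: execute CPUID (serializing), then read the TSC.
   Assumes x86-64 and GNU-style inline assembly (GCC or Clang). */
static inline uint64_t serialized_rdtsc(void)
{
    uint32_t lo, hi;

    __asm__ __volatile__(
        "xorl %%eax, %%eax\n\t"   /* select CPUID leaf 0; only the side effect matters */
        "cpuid\n\t"               /* serializing: earlier instructions complete first */
        "rdtsc"                   /* EDX:EAX = timestamp counter */
        : "=a"(lo), "=d"(hi)
        :
        : "rbx", "rcx", "memory", "cc");  /* CPUID also overwrites EBX and ECX */

    return ((uint64_t)hi << 32) | lo;
}
```

Both instructions sit in one asm statement so the compiler cannot schedule anything between them. CPUID is slow (and typically traps when running under a hypervisor), so its cost ends up inside any interval measured this way.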

最偏执的依靠 2024-09-09 18:19:13

Two reasons:

  • As paxdiablo says, when the CPU sees a CPUID opcode it makes sure all the previous instructions have executed, then executes the CPUID itself, before any subsequent instructions execute. Without such an instruction, the CPU execution pipeline may end up executing RDTSC before the instruction(s) you'd like to time.
  • A significant proportion of machines fail to synchronise the TSC registers across cores. If you want to read it from the horse's mouth, see http://msdn.microsoft.com/en-us/library/ee417693%28VS.85%29.aspx. So, when measuring an interval between TSC readings, unless they're taken on the same core you'll have an effectively random but possibly constant (see below) offset introduced - it can easily be several seconds (yes, seconds) even soon after bootup. This effectively reflects how long the BIOS was running on a single core before kicking off the others, plus - if you have any nasty power-saving options on - increasing drift caused by cores running at different frequencies or shutting down again. So, if you haven't nailed the threads reading TSC registers to the same core, you'll need to build some kind of cross-core delta table and know the core id (which is returned by CPUID) of each TSC sample in order to compensate for this offset. That's another reason you can see CPUID alongside RDTSC, and indeed a reason why, with the newer RDTSCP, many OSes store core id numbers in the extra TSC_AUX[31:0] data returned. (Available from Core i7 and Athlon 64 X2, RDTSCP is a much better option in all respects - the OS normally gives you the core id as mentioned, atomic to the TSC read, and it keeps the read from being reordered ahead of earlier instructions.)
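
To make the second point concrete, here is a small sketch in C of pairing each TSC reading with the TSC_AUX value that RDTSCP returns, and refusing to compute an interval when the two readings carry different values. It assumes x86-64, GCC or Clang, and a CPU with RDTSCP; the struct and function names are invented for illustration, and what the OS stores in TSC_AUX is OS-specific.

```c
#include <stdbool.h>
#include <stdint.h>
#include <x86intrin.h>  /* __rdtscp (GCC/Clang) */

/* Hypothetical sample type: a TSC value plus the TSC_AUX value read with it. */
struct tsc_sample {
    uint64_t tsc;
    unsigned int aux;   /* OS-defined; commonly identifies the core */
};

/* Read the TSC and TSC_AUX together with one RDTSCP. */
static inline struct tsc_sample sample_tsc(void)
{
    struct tsc_sample s;
    s.tsc = __rdtscp(&s.aux);
    return s;
}

/* Store b - a in *delta and return true only if both samples carry the
   same TSC_AUX value, i.e. were (as far as the OS reports) taken on the
   same core. Otherwise the difference would include the cross-core offset. */
static bool tsc_delta(struct tsc_sample a, struct tsc_sample b, uint64_t *delta)
{
    if (a.aux != b.aux)
        return false;
    *delta = b.tsc - a.tsc;
    return true;
}
```

A caller would take one sample before and one after the region of interest and discard (or retry) the measurement when tsc_delta returns false. Building the cross-core delta table mentioned above is a separate step that this sketch does not attempt.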

你与清晨阳光 2024-09-09 18:19:13

CPUID is serializing, preventing out-of-order execution of RDTSC.

These days you can safely use LFENCE instead. It's documented as serializing on the instruction stream (but not stores to memory) on Intel CPUs, and now also on AMD after their microcode update for Spectre.

https://hadibrais.wordpress.com/2018/05/14/the-significance-of-the-x86-lfence-instruction/ explains more about LFENCE.

See also https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/ia-32-ia-64-benchmark-code-execution-paper.pdf for a way to use RDTSCP that keeps CPUID (or LFENCE) out of the timed region:

LFENCE          ; (or CPUID) don't start the timed region until everything above has executed
RDTSC           ; EDX:EAX = timestamp
mov  ebx, eax   ; low 32 bits of start time

   code under test (must preserve EBX)

RDTSCP          ; EDX:EAX = timestamp, ECX = TSC_AUX; built-in one-way barrier stops it from running early
LFENCE          ; (or CPUID) still use a barrier after, so later instructions can't start before the read
sub  eax, ebx   ; low 32 bits of end - start
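
The same pattern can be written in C with compiler intrinsics. This is only a sketch, assuming x86-64 and GCC or Clang (which provide __rdtsc, __rdtscp and _mm_lfence through x86intrin.h); the names time_tsc and spin are hypothetical. Unlike the assembly above, it keeps the full 64-bit difference.

```c
#include <stdint.h>
#include <x86intrin.h>  /* __rdtsc, __rdtscp, _mm_lfence (GCC/Clang) */

/* Hypothetical helper: time one call of fn(arg) in TSC ticks. */
static uint64_t time_tsc(void (*fn)(void *), void *arg)
{
    unsigned int aux;                /* receives TSC_AUX; unused here */

    _mm_lfence();                    /* everything above has executed */
    uint64_t start = __rdtsc();

    fn(arg);                         /* code under test */

    uint64_t end = __rdtscp(&aux);   /* does not run ahead of fn's instructions */
    _mm_lfence();                    /* later instructions wait for the read */

    return end - start;
}

/* Hypothetical workload used only to exercise the helper. */
static void spin(void *arg)
{
    volatile unsigned int *n = arg;
    for (unsigned int i = 0; i < 1000; i++)
        (*n)++;
}
```

For example, time_tsc(spin, &counter) returns the number of reference ticks one call took. The intrinsics order the hardware instructions, but the compiler may still move plain computations across them, which is why the workload here goes through a volatile pointer.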

See also "Get CPU cycle count?" for more about RDTSC caveats, like constant_tsc and nonstop_tsc.

As a bonus, RDTSCP gives you a core ID. You could use RDTSCP for the start time as well, if you want to check for core migration. But if your CPU has the constant_tsc feature, all cores in the package should have their TSCs synced, so you typically don't need this on modern x86.

You could get the core ID from CPUID instead, as @Tony's answer points out.
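
If you want to check these properties at run time rather than reading Linux's /proc/cpuinfo flags, CPUID reports them directly. The sketch below assumes GCC or Clang (for __get_cpuid in cpuid.h); the function names are made up. The invariant-TSC bit is CPUID.80000007H:EDX[8], which on current CPUs is roughly what Linux reports as constant_tsc together with nonstop_tsc, and RDTSCP support is CPUID.80000001H:EDX[27].

```c
#include <cpuid.h>    /* __get_cpuid (GCC/Clang) */
#include <stdbool.h>

/* Hypothetical helper: true if the CPU advertises an invariant TSC,
   i.e. CPUID.80000007H:EDX[8] is set. */
static bool has_invariant_tsc(void)
{
    unsigned int eax, ebx, ecx, edx;

    /* __get_cpuid returns 0 when the requested leaf is not supported. */
    if (!__get_cpuid(0x80000007, &eax, &ebx, &ecx, &edx))
        return false;
    return (edx >> 8) & 1;
}

/* Hypothetical helper: true if the RDTSCP instruction is available,
   i.e. CPUID.80000001H:EDX[27] is set. */
static bool has_rdtscp(void)
{
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(0x80000001, &eax, &ebx, &ecx, &edx))
        return false;
    return (edx >> 27) & 1;
}
```

Virtual machines do not always expose the invariant-TSC bit to guests, so a false result there does not necessarily describe the underlying hardware.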
