What is the correct ARM64 (AArch64) data memory barrier usage when reading a 64-bit timer value from two 32-bit timer registers?

Posted 2025-01-22 08:37:10

For the sequence that reads a 64-bit timer value from two 32-bit timer counter registers, described in
https://developer.arm.com/documentation/100400/0001/multiprocessing/global-timer/global-timer-registers

What is the correct way to insert ARM64 memory barriers between the reads?

Is something like the following correct? Can someone explain which data memory barriers to use in this case, and how?

do {
  high1 = read(base+4);
  asm volatile("dmb sy");
  low = read(base);
  asm volatile("dmb sy");
  high2 = read(base+4);
  asm volatile("dmb sy");
} while (high2 != high1);

I know a question on reading a 64-bit timer already exists, but it gives no detail on memory barrier usage, and I need this for ARM machines: How to read two 32bit counters as a 64bit integer without race condition

Comments (1)

沙沙粒小 2025-01-29 08:37:10

There are different memory types that a region can be mapped with. Each type defines how memory accesses are performed and which reorderings of reads and writes are allowed.
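As a concrete reference for the memory types mentioned here, the sketch below lists the 8-bit MAIR_ELx attribute encodings that Armv8-A defines for some common types, together with whether each type permits accesses to be reordered. The encodings come from the Armv8-A architecture; the struct, the table and `find_mem_type()` are hypothetical names for illustration, and which type a given timer is actually mapped with is decided by the OS or firmware page tables, not by code like this.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One Armv8-A memory type and its MAIR_ELx AttrN encoding. */
struct mem_type {
    const char *name;
    uint8_t mair_attr;  /* 8-bit AttrN field value in MAIR_ELx */
    bool reorders;      /* may the CPU reorder accesses of this type? */
};

/* For the Device nR ("non-reordering") types, accesses to the same
 * peripheral stay in program order; the R types and Normal memory
 * allow reordering. */
static const struct mem_type mem_types[] = {
    { "Device-nGnRnE",               0x00, false },
    { "Device-nGnRE",                0x04, false },
    { "Device-nGRE",                 0x08, true  },
    { "Device-GRE",                  0x0C, true  },
    { "Normal Non-cacheable",        0x44, true  },
    { "Normal Write-Back RW-alloc",  0xFF, true  },
};

/* Look up a memory type by name; returns NULL if it is not in the table. */
static const struct mem_type *find_mem_type(const char *name)
{
    for (size_t i = 0; i < sizeof mem_types / sizeof mem_types[0]; i++)
        if (strcmp(mem_types[i].name, name) == 0)
            return &mem_types[i];
    return NULL;
}
```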

Reordering here means, for example, that the CPU executes the instruction sequence high1 = read(base+4); low = read(base); as low = read(base); high1 = read(base+4);. That is perfectly reasonable from a performance point of view: by the time the CPU evaluates while (high2 != high1);, it generally does not matter which register, low or high1, was loaded first. The CPU simply does not know that the two 32-bit words depend on each other.

For this 64-bit value, we have to take extra steps to stop the CPU from breaking that implicit dependency by reordering the reads.

The first and "most correct" way is to map the timer as "Device" memory. Usually all memory-mapped hardware is mapped as Device memory. A Device mapping with the non-reordering (nR) attribute, such as Device-nGnRnE or Device-nGnRE, guarantees strict ordering of accesses to the same peripheral, so the CPU will not reorder the reads (or writes, or both), and the order will always be high1, low, high2. Device memory is also non-cacheable. That does not matter here, but for something like a DMA buffer it saves you from having to maintain cache coherency. In conclusion, with such a Device mapping no barriers are needed between these reads.
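A minimal sketch of the read under the assumption that the timer is mapped as Device memory with the nR attribute: no DMB at all, and `volatile` accesses so that the compiler also keeps the three reads in program order and does not merge or cache them. The register layout (low word at offset 0x0, high word at 0x4) follows the question; `timer_read64_device()` and the `TIMER_*` macros are hypothetical names. In Linux kernel code the same idea would typically be written with `readl_relaxed()` on an `ioremap()`ed pointer.

```c
#include <stdint.h>

/* Hypothetical layout from the question: read(base) is the low word,
 * read(base+4) the high word. Indices are in 32-bit units. */
#define TIMER_LO 0  /* byte offset 0x0 */
#define TIMER_HI 1  /* byte offset 0x4 */

/* ASSUMES `regs` points into a Device-nR mapping (e.g. Device-nGnRE), so the
 * hardware delivers the reads to the timer in program order. `volatile`
 * keeps the compiler from reordering or combining them. */
static uint64_t timer_read64_device(const volatile uint32_t *regs)
{
    uint32_t high1, low, high2;

    do {
        high1 = regs[TIMER_HI];
        low   = regs[TIMER_LO];
        high2 = regs[TIMER_HI];
    } while (high2 != high1);   /* high word changed between reads: retry */

    return ((uint64_t)high1 << 32) | low;
}
```

Without the Device-nR assumption this code is not sufficient on its own, which is where the barriers discussed next come in.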

If you are looking for trouble, the hardware can instead be mapped as ordinary ("Normal", in Arm terminology) memory.
Normal memory allows reordering, so you can end up in the following situation. Say the counter value is 0000-9999 (in decimal, with 4 digits for the high word and 4 digits for the low word).

  • high1 = read(base+4); low = read(base); is reordered and executed as low = read(base); high1 = read(base+4);
  • low is read as 9999; right after that read, the timer increments.
  • The timer is now 0001-0000.
  • high1 is read as 0001.
  • So we have 0001-9999.
    Reading high2 also gives 0001, so high2 == high1, the loop exits, and it returns 0001-9999, which is 9999 ticks ahead of the actual count.

So as I see it, it is necessary to prevent reordering of the high1 and low reads, as well as of the low and high2 reads, because either reordering can produce a wrong value: the first gives 0001-9999 as above, and the second (high2 read before low) gives high1 = 0000, high2 = 0000 and low = 0000, i.e. 0000-0000, with the carry into the high word (0001) lost.
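Both failure cases can be reproduced with a small deterministic simulation. This is a toy model, not hardware code: the decimal counter, the single injected tick and all names (`toy_timer`, `toy_read`, `toy_read64`) are hypothetical. It runs the same retry loop but issues the three reads in an explicit order, mimicking what a reordering CPU might do.

```c
#include <stdint.h>

/* Toy 8-digit decimal counter shown as HHHH-LLLL:
 * value 9999 is 0000-9999, value 10000 is 0001-0000. */
struct toy_timer {
    uint32_t value;   /* current count */
    int reads;        /* half-reads performed so far */
    int tick_after;   /* the count increments right after this read */
};

/* Read one half (high or low) of the toy counter. */
static uint32_t toy_read(struct toy_timer *t, int high)
{
    uint32_t v = high ? t->value / 10000 : t->value % 10000;
    if (++t->reads == t->tick_after)
        t->value++;   /* the timer ticks between two reads */
    return v;
}

/* The retry loop with the three reads performed in a given order.
 * order[i] says which variable the i-th read fills:
 * 0 = high1, 1 = low, 2 = high2. Returns HHHH * 10000 + LLLL. */
static uint32_t toy_read64(struct toy_timer *t, const int order[3])
{
    uint32_t r[3];    /* r[0] = high1, r[1] = low, r[2] = high2 */

    do {
        for (int i = 0; i < 3; i++)
            r[order[i]] = toy_read(t, order[i] != 1);
    } while (r[2] != r[0]);   /* high word changed: retry */

    return r[0] * 10000 + r[1];
}
```

Starting from 9999 with the reads in program order ({0, 1, 2}), a tick after either of the first two reads still yields 10000 (0001-0000) because the retry catches it. With low read first ({1, 0, 2}, tick after read 1) the loop accepts 19999 (0001-9999), and with high2 read before low ({0, 2, 1}, tick after read 2) it accepts 0 (0000-0000).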

So I'd write:

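do {
  high1 = read(base+4);                 // upper 32 bits
  asm volatile("dmb sy" ::: "memory");  // keep high1 before low
  low = read(base);                     // lower 32 bits
  asm volatile("dmb sy" ::: "memory");  // keep low before high2
  high2 = read(base+4);                 // upper 32 bits again
  // asm volatile("dmb sy" ::: "memory");  This one looks excessive
} while (high2 != high1);               // high word changed: retry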

PS: It does not look like you need ordering as strict as sy; a much weaker barrier that just guarantees the ordering of these loads on the specific CPU should be sufficient.
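Following the PS, here is a self-contained sketch of the loop above that swaps `dmb sy` for the load-only `dmb ld`. On AArch64, `dmb ld` orders earlier loads against later loads and stores with full-system scope, so it still covers the high1/low/high2 ordering while not waiting on stores. Whether an even narrower shareability domain (such as `oshld`) is enough depends on the system and is not assumed here. `TIMER_RMB()`, `timer_reg()` and `timer_read64_barriers()` are hypothetical names; on non-AArch64 builds the macro falls back to a compiler-only barrier so the logic can be exercised on a PC.

```c
#include <stdint.h>

/* Load-only barrier on AArch64; the "memory" clobber also stops the
 * compiler from moving memory accesses across it. Elsewhere this is
 * only a compiler barrier (no hardware ordering). */
#if defined(__aarch64__)
#define TIMER_RMB() __asm__ volatile("dmb ld" ::: "memory")
#else
#define TIMER_RMB() __asm__ volatile("" ::: "memory")
#endif

/* Hypothetical accessor: one 32-bit timer register at byte offset `off`. */
static inline uint32_t timer_reg(const volatile uint32_t *base, unsigned off)
{
    return base[off / 4];
}

/* Read the 64-bit counter (low word at 0x0, high word at 0x4) with
 * barriers between the reads, for a mapping that may reorder loads. */
static uint64_t timer_read64_barriers(const volatile uint32_t *base)
{
    uint32_t high1, low, high2;

    do {
        high1 = timer_reg(base, 4);
        TIMER_RMB();                 /* high1 before low */
        low = timer_reg(base, 0);
        TIMER_RMB();                 /* low before high2 */
        high2 = timer_reg(base, 4);
    } while (high2 != high1);        /* high word changed: retry */

    return ((uint64_t)high1 << 32) | low;
}
```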
