To read a 64-bit timer value from two 32-bit timer counter registers, using the sequence described in
https://developer.arm.com/documentation/100400/0001/multiprocessing/global-timer/global-timer-registers
what is the correct way to insert ARM64 memory barriers between the reads?
Is something like the code below correct? Can someone please explain which data memory barriers to use in this case, and how?
do {
    high1 = read(base+4);
    asm volatile("dmb sy");
    low = read(base);
    asm volatile("dmb sy");
    high2 = read(base+4);
    asm volatile("dmb sy");
} while (high2 != high1);
I know a question about reading a 64-bit timer already exists (How to read two 32bit counters as a 64bit integer without race condition), but it gives no detail on memory barrier usage, and I need this for ARM machines.
Comments (1)
There are different types of memory mapping. Each type defines how memory accesses are performed and whether reads and writes may be reordered.
Reordering here means, for example, that the instruction sequence
high1 = read(base+4); low = read(base);
is performed by the CPU as
low = read(base); high1 = read(base+4);
That is perfectly reasonable from a performance point of view. By the time the CPU executes while (high2 != high1); it generally does not matter which register was loaded first, low or high1. The CPU is simply not aware of the interdependence between the two words. For this 64-bit value we have to take extra steps to keep the CPU from breaking the dependency between these registers.
The first and 'most right' way is to map the timer as 'Device' memory. Usually all memory-mapped hardware is mapped as 'Device' memory. A 'Device' mapping guarantees strict ordering of accesses to the peripheral (on ARMv8 this holds for the non-reordering Device types, Device-nGnRnE and Device-nGnRE). So the CPU does not reorder the reads (or writes, or both), and the order will always be high1, low, high2. Device memory is also uncacheable. That does not matter in this case, but for something that uses DMA, for instance, it saves you from maintaining cache coherency by hand. In conclusion, any sync barriers are redundant for 'Device' memory in this case.
If one wants to go looking for trouble, the hardware might instead be mapped as 'generic'/'common' memory ('Normal' memory in ARM terms).
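For the 'Device' mapping case, the read loop could look roughly like the sketch below. It assumes that base points at a mapping with a non-reordering Device attribute, which the code itself cannot check. The names read32, timer_read64_device, TIMER_LOW and TIMER_HIGH are hypothetical and chosen only for illustration; the offsets mirror base and base+4 from the question.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register offsets, mirroring base and base+4 in the question. */
#define TIMER_LOW   0x0u
#define TIMER_HIGH  0x4u

/* Volatile 32-bit load: the compiler must perform it and must not reorder
 * it against other volatile accesses. */
static inline uint32_t read32(const volatile void *addr)
{
    return *(const volatile uint32_t *)addr;
}

/* Assumption: base is mapped as non-reordering Device memory, so the CPU
 * issues the three loads in program order and no dmb is used here. */
static uint64_t timer_read64_device(const volatile void *base)
{
    const volatile uint8_t *regs = base;
    uint32_t high1, low, high2;

    assert(regs != 0);
    do {
        high1 = read32(regs + TIMER_HIGH);
        low   = read32(regs + TIMER_LOW);
        high2 = read32(regs + TIMER_HIGH);
    } while (high1 != high2);   /* high word changed, so low may have wrapped: retry */

    return ((uint64_t)high2 << 32) | low;
}
```

The volatile accesses only stop the compiler from reordering or dropping the loads; the ordering on the CPU side comes entirely from the memory type of the mapping.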
For 'generic' memory, reordering is allowed, so you might end up in the following situation. Say the counter value is 0000-9999 (decimal, 4 digits for high and 4 digits for low), and
high1 = read(base+4); low = read(base);
is reordered and executed as
low = read(base); high1 = read(base+4);
low = read(base) gives 9999, and after that read finishes the timer is incremented to 0001-0000.
high1 = read(base+4) gives 0001.
The combined value so far is 0001-9999.
Reading high2 gives 0001 again, so the loop check passes, and life gets very interesting from this point.
So, as I see it, it is necessary to prevent reordering of the high1 and low reads, as well as of the low and high2 reads, because we could get a wrong value like 0001-9999 in both cases (well, in the second case it would be high1=0000, high2=0000 and low=0000, with the 0001 missing from high).
So I'd say a barrier is needed between high1 and low, and between low and high2.
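The two failure cases can be replayed with a small model of the decimal counter. This is only an illustration of the arithmetic, not of real hardware: struct counter, struct sample, tick and the two case functions are hypothetical names, and the counter "ticks" exactly once at the worst possible moment.

```c
#include <assert.h>

/* Toy model of the example counter: two 4-digit decimal halves. */
struct counter { unsigned high, low; };

/* The three values the read loop ends up with. */
struct sample { unsigned high1, low, high2; };

/* Advance the counter by one, carrying from low into high. */
static void tick(struct counter *c)
{
    assert(c->low < 10000);
    if (++c->low == 10000) {
        c->low = 0;
        c->high++;
    }
}

/* Case 1: low is read before high1, and the counter ticks in between. */
static struct sample low_before_high1(struct counter c)
{
    struct sample s;
    s.low = c.low;      /* old low word */
    tick(&c);
    s.high1 = c.high;   /* new high word */
    s.high2 = c.high;   /* same as high1, so the loop check passes */
    return s;
}

/* Case 2: high2 is read before low, and the counter ticks in between. */
static struct sample high2_before_low(struct counter c)
{
    struct sample s;
    s.high1 = c.high;   /* old high word */
    s.high2 = c.high;   /* still old, so the loop check passes */
    tick(&c);
    s.low = c.low;      /* new low word */
    return s;
}
```

Starting from a counter of {0, 9999}, the first function yields high 0001 with low 9999, and the second yields high 0000 with low 0000 while the real value is already 0001-0000. In both cases high1 equals high2, so the retry loop accepts the bad sample.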
PS: It does not look like you need ordering as strict as sy. A much weaker barrier, for example a load-only one, that guarantees ordering on the specific CPU should be sufficient.
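As a rough sketch of what a weaker barrier could look like for the 'generic' mapping case, the loop below puts a load barrier between high1 and low and between low and high2. It assumes GCC or Clang. The macro timer_rmb and the function timer_read64_barrier are hypothetical names, "dmb ld" is one possible choice rather than a verified minimum for any particular core, and the non-AArch64 fallback exists only so the sketch builds on other hosts.

```c
#include <assert.h>
#include <stdint.h>

/* Load barrier between the register reads.
 * On AArch64, "dmb ld" orders earlier loads before later loads and stores.
 * The "memory" clobber also stops the compiler from moving accesses across it.
 * Assumption: the fallback is only a stand-in for building on other hosts. */
#if defined(__aarch64__)
#define timer_rmb() __asm__ volatile("dmb ld" ::: "memory")
#else
#define timer_rmb() __atomic_thread_fence(__ATOMIC_ACQUIRE)
#endif

/* Reads high, low, high and retries if the high word changed in between. */
static uint64_t timer_read64_barrier(const volatile void *base)
{
    const volatile uint8_t *regs = base;
    uint32_t high1, low, high2;

    assert(regs != 0);
    do {
        high1 = *(const volatile uint32_t *)(regs + 4);
        timer_rmb();    /* keep the low read after the high1 read */
        low = *(const volatile uint32_t *)(regs + 0);
        timer_rmb();    /* keep the high2 read after the low read */
        high2 = *(const volatile uint32_t *)(regs + 4);
    } while (high1 != high2);

    return ((uint64_t)high2 << 32) | low;
}
```

Compared with the loop in the question, this drops the barrier after the high2 read, since the reasoning above only requires ordering between high1 and low and between low and high2.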