How fast is an L1 cache hit, compared with other ARM instructions?

Posted 2025-02-06 16:07:34


The newer ARM Architecture Reference Manuals don't give instruction timings any more. (Instruction timings were given, at least for the early ARM2 and ARM3 chips).

I know that cache misses result in external memory accesses that are very slow, compared with, say, data instructions like ADD x0, x1, x2 or BIC x0, x1, x2.

But how fast is an L1 cache hit?

If the answer is "it depends ..." what would be a rough guess (ballpark) figure?

Cache enabled (obviously). "Flat" memory mapping (i.e. virtual address = physical address).

I suppose the answer also depends on the precise hardware being used. And that one should simply write test cases and measure the specific timings one's interested in...

I'm interested in the ARMv8 Raspberry Pi models -- which I don't possess. (I'm using QEMU).

I'd also be interested in any other timings, say, relative to:

ADD x0, xzr, xzr         ; == 1

FADD d0, d1, d2          ; floating-point (ADD on d-registers would be a SIMD integer add)

LDR x0, [x2]             ; L1 cache hit
LDR x0, [x2]             ; L1 cache miss, L2 cache hit
LDR x0, [x2]             ; L1 cache miss, L2 cache miss

LDP x0, x1, [x2]         ; L1 cache hit
LDP x0, x1, [x2]         ; L1 cache miss, L2 cache hit
LDP x0, x1, [x2]         ; L1 cache miss, L2 cache miss

Basically, what I really want to know is "when is it faster to load a value from memory rather than compute it? (on a Raspberry Pi 4B)"
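As a concrete instance of the load-vs-compute question: an 8-bit popcount can either be fetched from a 256-entry table (one L1 load when the table is warm) or recomputed with a short chain of ALU ops. A sketch of both variants (function names are my own):

```c
#include <stdint.h>

/* 256-entry table: a warm lookup costs one dependent L1 load. */
static uint8_t popcount_table[256];

static void build_table(void) {
    for (int i = 0; i < 256; i++) {
        int c = 0;
        for (int b = i; b; b >>= 1)
            c += b & 1;
        popcount_table[i] = (uint8_t)c;
    }
}

static uint8_t popcount_lookup(uint8_t x) {
    return popcount_table[x];         /* one load, hopefully an L1 hit */
}

/* SWAR recomputation: three dependent add/mask steps, no memory traffic. */
static uint8_t popcount_compute(uint8_t x) {
    x = (x & 0x55) + ((x >> 1) & 0x55);   /* sum bit pairs   */
    x = (x & 0x33) + ((x >> 2) & 0x33);   /* sum nibbles     */
    x = (x & 0x0F) + (x >> 4);            /* sum both halves */
    return x;
}
```

If the latency figures cited in the answer below-left-of-4-cycles territory hold, the computed version is at least competitive here; a table only pays off when it stays hot in cache and replaces a longer dependent computation.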

There's the page Approximate cost to access various caches and main memory? but that refers to Intel chips.


Comments (1)

故事未完 2025-02-13 16:07:34


I found https://developer.arm.com/documentation/uan0016/a/ (the Cortex-A72 Software Optimization Guide; the Pi 4B uses Cortex-A72 cores), from which it appears that an LDR hitting L1 cache has a latency of 4 and a throughput of 1, while a basic ALU op has a latency of 1 and a throughput of 2.
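Taking those numbers at face value gives a back-of-envelope break-even rule: recomputing wins while the dependent ALU chain is shorter than the load-to-use latency. A toy model of that arithmetic (latency-only; it ignores throughput, issue width, and assumes the table line is already in L1):

```c
/* Cortex-A72 figures from the optimization guide: ~4-cycle L1 load-to-use
   latency, 1-cycle simple ALU ops. Critical-path latencies only. */
enum { L1_LOAD_LATENCY = 4, ALU_LATENCY = 1 };

/* Cycles to recompute a value as a chain of n dependent ALU ops. */
static int compute_cost(int n_dependent_ops) {
    return n_dependent_ops * ALU_LATENCY;
}

/* Cycles to fetch it instead, assuming an L1 hit. */
static int lookup_cost(void) {
    return L1_LOAD_LATENCY;
}
```

So on this model a lookup only beats recomputation once more than four dependent ALU ops would be replaced; an L2 hit or a DRAM access pushes the break-even far higher.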
