How to put data into the L2 cache on a Cortex-A72 core?

Posted on 2025-01-28 17:14:17


I have an array of data that looks like this :

uint32_t data[128]; //Could be more than L1D Cache size

In order to do computation on it, I want to put the data as close as possible to my computing unit, so in the L2 cache.

My target runs a Linux kernel and some additional apps.

I know that I can get access to a certain area of memory with mmap, and I have successfully done it in some part of my available memory shared between cores.

How to do the same thing, but in the L2 cache area?

I've read part of the GCC documentation and the AArch64 instruction set, but cannot figure out a way to achieve this.


Comments (2)

空城旧梦 2025-02-04 17:14:17


How to do the same thing, but in the L2 cache area?

Your hardware doesn't support that.

In general, the ARMv8 architecture doesn't make any guarantees about the contents of caches and does not provide any means to explicitly manipulate or query them - it only makes guarantees and provides tools for dealing with coherency.

Specifically, from section D4.4.1 "General behavior of the caches" of the spec:

[...] the architecture cannot guarantee whether:

• A memory location present in the cache remains in the cache.
• A memory location not present in the cache is brought into the cache.

Instead, the following principles apply to the behavior of caches:

• The architecture has a concept of an entry locked down in the cache.
  How lockdown is achieved is IMPLEMENTATION DEFINED, and lockdown might
  not be supported by:

  — A particular implementation.
  — Some memory attributes.

• An unlocked entry in a cache might not remain in that cache. The
  architecture does not guarantee that an unlocked cache entry remains in
  the cache or remains incoherent with the rest of memory. Software must
  not assume that an unlocked item that remains in the cache remains dirty.

• A locked entry in a cache is guaranteed to remain in that cache. The
  architecture does not guarantee that a locked cache entry remains
  incoherent with the rest of memory, that is, it might not remain dirty.

[...]

• Any memory location is not guaranteed to remain incoherent with the rest of memory.

So basically you want cache lockdown. Consulting the manual of your CPU though:

• The Cortex-A72 processor does not support TLB or cache lockdown.

So you can't put something in cache on purpose. Now, you might be able to tell whether something has been cached by trying to observe side effects. The two common side effects of caches are latency and coherency. So you could try and time access times or modify the contents of DRAM and check whether you see that change in your cached mapping... but that's still a terrible idea.
For one, both of these are destructive operations, meaning they will change the property you're measuring, by measuring it. And for another, just because you observe them once does not mean you can rely on that happening.

Bottom line: you cannot guarantee that something is held in any particular cache by the time you use it.

美胚控场 2025-02-04 17:14:17

缓存 - 不是应该存储数据的地方,只是...缓存吗? :)

我的意思是,您的处理器确定它应该缓存的数据以及何处(L1/L2/L3)和逻辑取决于CPU实现。

如果您愿意,您可以尝试通过使用专用说明来预获取数据,然后使用非临床说明来维护您的数据,然后使用专用说明来查找缓存和替换数据的算法并播放此算法(当然没有保证)其他程序。

也许对于现代手臂,我从x86/x64的角度讲话了,但我的全部观点是“您真的确定需要这个吗?”?
CPU足够聪明,可以缓存所需的数据,并且它们逐年做得更好。

我建议您使用任何可能向您显示缓存失误的探测器,以确保比缓存中没有显示数据。
如果没有,那么优化的第一件事就是算法。尝试弄清为什么会有缓存失误 - 也许您应该使用临时变量加载更少的循环数据,甚至可以手动移动循环以控制访问的位置以及所访问的内容。

Cache is not a place where data should be stored, it's just... a cache? :)

I mean, your processor decides which data it should cache and where (L1/L2/L3), and that logic depends on the CPU implementation.

If you wanted to, you could try to figure out the algorithm for placing and replacing data in the cache and play with it (without guarantees, of course): use dedicated instructions to prefetch your data, then keep it cached by using non-temporal (non-caching) instructions for the rest of your program's accesses.
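To illustrate the prefetch idea, here is a hedged sketch using GCC's `__builtin_prefetch`. The locality argument (0–3) is only a hint; which cache level, if any, the line ends up in is implementation-defined (on AArch64 the compiler typically lowers this builtin to a PRFM instruction):

```c
/* Hedged sketch: software-prefetch ahead of the read pointer while
 * summing an array. The distance of 16 elements is an arbitrary
 * illustrative choice, not a tuned value. */
#include <stdint.h>
#include <stddef.h>

#define N 128

static uint64_t sum_with_prefetch(const uint32_t *p)
{
    uint64_t s = 0;
    for (size_t i = 0; i < N; i++) {
        if (i + 16 < N)
            /* args: address, 0 = prefetch for read, 2 = moderate
             * temporal locality (only a hint to the hardware) */
            __builtin_prefetch(&p[i + 16], 0, 2);
        s += p[i];
    }
    return s;
}
```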

Maybe for modern ARM there are easier ways (I'm speaking from an x86/x64 perspective), but my whole point is: "are you really sure that you need this?"
CPUs are smart enough to cache the data they need, and they get better and better at it year by year.

I'd recommend using any profiler that can show you cache misses, to make sure your data isn't already present in the cache.
If it isn't, the first thing to optimize is the algorithm. Try to figure out why there was a cache miss: maybe you should load less data per loop iteration by using temp variables, for example, or even unroll the loop manually to control where and what is being accessed.
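The manual-unrolling suggestion can be sketched like this. Whether it helps is workload-dependent, so profile before and after:

```c
/* Hedged sketch of manual unrolling: four independent accumulators
 * reduce loop overhead and make the access pattern explicit. */
#include <stdint.h>
#include <stddef.h>

static uint64_t sum_unrolled(const uint32_t *p, size_t n)
{
    uint64_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += p[i];
        s1 += p[i + 1];
        s2 += p[i + 2];
        s3 += p[i + 3];
    }
    for (; i < n; i++)  /* remainder when n is not a multiple of 4 */
        s0 += p[i];
    return s0 + s1 + s2 + s3;
}
```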
