How do I put data into the L2 cache using an A72 core?
I have an array of data that looks like this:
uint32_t data[128]; // could be larger than the L1D cache
In order to do computation on it, I want to put the data as close as possible to my computing unit, i.e. in the L2 cache.
My target runs a Linux kernel and some additional apps.
I know that I can get access to a certain area of memory with mmap, and I have successfully done so in a part of my available memory that is shared between cores.
How can I do the same thing, but in the L2 cache area?
I've read part of the GCC documentation and the AArch64 instruction set reference, but cannot figure out how to achieve this.
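For reference, the kind of mmap call described above might look like this (a minimal sketch; `map_words` is a hypothetical helper, and the flags assume Linux — this maps ordinary shared memory, not any particular cache level):

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <stddef.h>
#include <sys/mman.h>

/* Map n uint32_t words of anonymous memory, shared with any process
   that fork()s after the call (hypothetical helper; Linux-specific). */
static uint32_t *map_words(size_t n)
{
    void *p = mmap(NULL, n * sizeof(uint32_t),
                   PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    return (p == MAP_FAILED) ? NULL : (uint32_t *)p;
}
```

Where the mapping physically ends up being cached is, as the answers below explain, not under the program's control.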
2 Answers
Your hardware doesn't support that.
In general, the ARMv8 architecture doesn't make any guarantees about the contents of caches and does not provide any means to explicitly manipulate or query them - it only makes guarantees and provides tools for dealing with coherency.
Specifically, from section D4.4.1 "General behavior of the caches" of the spec:
So basically you want cache lockdown. Consulting the manual of your CPU though:
So you can't put something in the cache on purpose. Now, you might be able to tell whether something has been cached by trying to observe side effects. The two common side effects of caches are latency and coherency: you could time access latencies, or modify the contents of DRAM and check whether you see that change through your cached mapping... but that's still a terrible idea.
For one, both of these are destructive operations, meaning they change the very property you're measuring by measuring it. And for another, just because you observed them once does not mean you can rely on them happening again.
Bottom line: you cannot guarantee that something is held in any particular cache by the time you use it.
Cache is not a place where data should be stored - it's just... a cache? :)
I mean, your processor decides which data it should cache and where (L1/L2/L3), and that logic depends on the CPU implementation.
If you wanted to, you could try to figure out the algorithm for placing and replacing data in the cache and play with it (without guarantees, of course): use dedicated instructions to prefetch your data, then use non-caching instructions in the rest of your program to avoid evicting it.
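With GCC or Clang, such a prefetch hint can be issued via the `__builtin_prefetch` builtin (a sketch; the 16-word stride assumes a 64-byte cache line, as on a Cortex-A72, and the core is free to ignore the hint entirely):

```c
#include <stdint.h>
#include <stddef.h>

/* Sum an array while hinting the next cache line into the cache ahead
   of use. 16 uint32_t = one 64-byte line (assumption for Cortex-A72). */
static uint64_t sum_with_prefetch(const uint32_t *data, size_t n)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + 16 < n)
            __builtin_prefetch(&data[i + 16], 0 /* read */, 3 /* high locality */);
        sum += data[i];
    }
    return sum;
}
```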
Maybe for modern ARM there are easier ways - I'm speaking from an x86/x64 perspective - but my whole point is: are you really sure you need this?
CPUs are smart enough to cache the data they need, and they do it better and better year by year.
I'd recommend using a profiler that can show you cache misses, to check whether your data isn't already present in the cache.
If it isn't, the first thing to optimize is the algorithm. Try to figure out why there was a cache miss - maybe you should load less data per loop iteration by using temporary variables, for example, or even unroll the loop manually to control where and what is being accessed.
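As a hypothetical illustration of that last point, a 4-way manual unroll with separate accumulators might look like this (a sketch; it assumes n is a multiple of 4 for brevity):

```c
#include <stdint.h>
#include <stddef.h>

/* Sum an array with a 4-way manual unroll. The four accumulators break
   the dependency chain and make the access pattern explicit; assumes
   n is a multiple of 4 to keep the sketch short. */
static uint64_t sum_unrolled(const uint32_t *data, size_t n)
{
    uint64_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    for (size_t i = 0; i < n; i += 4) {
        s0 += data[i];
        s1 += data[i + 1];
        s2 += data[i + 2];
        s3 += data[i + 3];
    }
    return s0 + s1 + s2 + s3;
}
```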