Measuring LLC/L3 cache misses on AMD Zen2 CPUs

Posted on 2025-02-04 21:06:56

I have a question related to this one.

I want to (programmatically) measure L3 hits (accesses) and misses on an AMD EPYC 7742 CPU (Zen2). I run Linux kernel 5.4.0-66-generic on Ubuntu Server 20.04.2 LTS. According to the question linked above, the events rFF04 (L3LookupState) and r0106 (L3CombClstrState) should represent L3 accesses and misses, respectively. Furthermore, kernel 5.4 should support these events.

However, when measuring with perf, I run into issues. Similar to the question linked above, if I run numactl -C 0 -m 0 perf stat -e instructions,cycles,r0106,rFF04 ./benchmark, I only measure 0 values. If I try numactl -C 0 -m 0 perf stat -e instructions,cycles,amd_l3/r8001/,amd_l3/r0106/, perf complains about "unknown terms". If I use the perf event names, i.e. numactl -C 0 -m 0 perf stat -e instructions,cycles,l3_request_g1.caching_l3_cache_accesses,l3_comb_clstr_state.request_miss, perf outputs <not supported> for these events.

Furthermore, I actually want to measure this using perf's C API. Currently, I set up a perf_event_attr with type PERF_TYPE_RAW and config set to, e.g., 0x8001. How do I get the amd_l3 PMU into my perf_event_attr object? Without it, my code is equivalent to numactl -C 0 -m 0 perf stat -e instructions,cycles,r0106,rFF04 ./benchmark, which measures undefined values.
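For context on the C API part of the question, here is a minimal sketch of how a dynamic PMU is usually selected through perf_event_open: the kernel publishes a numeric type id in a sysfs "type" file, and that id goes into perf_event_attr.type instead of PERF_TYPE_RAW. This assumes the running kernel actually exposes an amd_l3 PMU under /sys/bus/event_source/devices/, which is not guaranteed. The helper names read_pmu_type and open_pmu_event are hypothetical, and the config value to pass is whatever L3 event encoding the PPR gives for the CPU.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Read a dynamic PMU type id from a sysfs "type" file, e.g.
 * /sys/bus/event_source/devices/amd_l3/type (assumed to exist).
 * Returns -1 if the file is missing or cannot be parsed. */
int read_pmu_type(const char *type_path)
{
    FILE *f = fopen(type_path, "r");
    int type = -1;

    if (!f)
        return -1;
    if (fscanf(f, "%d", &type) != 1)
        type = -1;
    fclose(f);
    return type;
}

/* Open a counting (non-sampling) event on the given PMU.
 * Assumption: uncore-style PMUs count per CPU rather than per task,
 * so pid is -1 and cpu must be a valid CPU number. This usually
 * needs root or a permissive perf_event_paranoid setting.
 * Returns a file descriptor, or -1 with errno set. */
int open_pmu_event(int pmu_type, uint64_t config, int cpu)
{
    struct perf_event_attr attr;

    memset(&attr, 0, sizeof(attr));
    attr.size = sizeof(attr);
    attr.type = (uint32_t)pmu_type;  /* dynamic PMU id, not PERF_TYPE_RAW */
    attr.config = config;            /* raw event encoding for that PMU */
    attr.disabled = 1;               /* enable later via ioctl */

    return (int)syscall(SYS_perf_event_open, &attr, -1, cpu, -1, 0);
}
```

A caller would check that read_pmu_type returns a non-negative id before calling open_pmu_event, then enable the counter with ioctl(fd, PERF_EVENT_IOC_ENABLE, 0) and read a 64-bit count from the descriptor. If the sysfs entry is absent, the kernel in use has not registered the PMU and this approach does not apply.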

Comments (1)

沉溺在你眼里的海 2025-02-11 21:06:56

Short answer: Try the -e rFF0F00000040FF04 parameter, which is shown in the PPR doc for your CPU.

Detailed:

Maybe I can help with the first problem, the one described in your 3rd paragraph. I can't help with the second one, described in your 4th paragraph. Sorry.

Since your CPU is 'Family 23 Model 49', I referred to the AMD PPR doc for '17h Model 31h'. It says to use L3Event[0xFF0F00000040FF04] for 'L3 Accesses' (0xFF0F00000040FF04 is 64 bits wide, the same width as the L3 Performance Event Select register shown in the AMD doc). The perf-list man page also shows that AMD uses this kind of format, where the event has additional bits at '32-35'. The PPR doc doesn't say much about L3PMCx04 itself, but it has some useful information in the L3 Performance Event Select section.
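To make the 64-bit value easier to read, here is a small sketch that splits it into fields. The bit positions (event in bits 0-7, unit mask in 8-15, enable at bit 22, slice mask in 48-51, thread mask in 56-63) are my assumption about the Family 17h L3 Performance Event Select layout and should be checked against the PPR. The struct and function names are hypothetical.

```c
#include <stdint.h>

/* Fields of an L3 event select value, under the assumed layout. */
struct l3_event_fields {
    unsigned event;       /* bits 0-7   */
    unsigned umask;       /* bits 8-15  */
    unsigned enable;      /* bit 22     */
    unsigned slice_mask;  /* bits 48-51 */
    unsigned thread_mask; /* bits 56-63 */
};

/* Split a raw 64-bit L3 event select value into its fields. */
struct l3_event_fields decode_l3_event(uint64_t raw)
{
    struct l3_event_fields f;

    f.event       = (unsigned)(raw & 0xFF);
    f.umask       = (unsigned)((raw >> 8) & 0xFF);
    f.enable      = (unsigned)((raw >> 22) & 0x1);
    f.slice_mask  = (unsigned)((raw >> 48) & 0xF);
    f.thread_mask = (unsigned)((raw >> 56) & 0xFF);
    return f;
}
```

Under this layout, 0xFF0F00000040FF04 carries the same event (0x04) and unit mask (0xFF) as rFF04, plus a slice mask of 0xF and a thread mask of 0xFF, while plain rFF04 leaves both masks at zero.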

I tested on my own CPU, a Ryzen 7 4800H, which is a family 17h_60h Renoir processor and has no amd_l3 support. (It is also Zen2. Judging from two pieces of source code, the first of which lists some encodings for AMD CPUs, the Zen2 configs should be almost the same.) So here I used ls_dc_accesses as a stand-in. 729 is the code for All DC Accesses in the AMD doc for my CPU family, where PMCx029 gives the EventCode and the corresponding 8 bits give the UMask. It can also be found in the second piece of source code (in the PPR doc for your 17h_31h family, p182, the number is 0x430729):

$ ls /sys/devices/*/format | grep amd_l
$ perf list | grep ls_dc_accesses -A 1
  ls_dc_accesses
       [Number of accesses to the dcache for load/store references]
$ perf stat -e r729 ls          
...
 Performance counter stats for 'ls':

         1,097,092      r729                       
$ perf stat -e ls_dc_accesses ls
...
 Performance counter stats for 'ls':

           974,666      ls_dc_accesses   
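The relation between r729 and the PPR value 0x430729 can be sketched as follows. This assumes the usual x86 core event select layout: event select in bits 0-7 with its upper nibble in bits 32-35, unit mask in bits 8-15, and the user, OS and enable flags at bits 16, 17 and 22. The macro and function names are hypothetical.

```c
#include <stdint.h>

/* Flag bits of the core event select register (assumed layout). */
#define EVSEL_USR    (1ULL << 16)  /* count in user mode */
#define EVSEL_OS     (1ULL << 17)  /* count in kernel mode */
#define EVSEL_ENABLE (1ULL << 22)  /* counter enable */

/* Build the raw config that perf takes as rNNN from a 12-bit event
 * code (e.g. 0x029 for PMCx029) and an 8-bit unit mask. */
uint64_t encode_core_event(unsigned event, unsigned umask)
{
    uint64_t cfg = 0;

    cfg |= (uint64_t)(event & 0xFF);             /* event bits 0-7   */
    cfg |= (uint64_t)(umask & 0xFF) << 8;        /* umask bits 8-15  */
    cfg |= (uint64_t)((event >> 8) & 0xF) << 32; /* event bits 32-35 */
    return cfg;
}
```

With event 0x029 and unit mask 0x07 this gives 0x729, the value used with perf stat -e r729. OR-ing in the three flag bits gives 0x430729, the PPR number; perf sets those flags itself, so they are left out of the rNNN form.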

Not everyone has an EPYC CPU, so it may be hard for others to see where things go wrong in your case. If possible, please provide more information.

Hope this helps.
