What is ChaseNS in this pointer-chasing benchmark?
I'm trying to figure out the output of the following benchmark from Google:
https://github.com/google/multichase
The output looks like:
./multiload -s 16 -n 5 -t 16 -m 512M -c chaseload -l stream-sum
Samples , Byte/thd , ChaseThds , ChaseNS , ChaseMibs , ChDeviate , LoadThds , LdMaxMibs , LdAvgMibs , LdDeviate , ChaseArg , MemLdArg
5 , 536870912 , 1 , 212.726 , 36 , 0.017 , 15 , 17427 , 17331 , 0.012 , chaseload , stream-sum
What is ChaseNS here? Is it the time taken to access every 16th byte of the 512 MB array?
And is ChaseMibs the bandwidth we get when accessing every 16th byte?
1 Answer
I'd assume it's nanoseconds per something. Perhaps per load (dereference). 212 ns is a long time to wait for a cache-miss load, but with contention from multiple cores it's maybe plausible?
This is a pointer-chasing microbenchmark, like
p = p->next
, so you're measuring load latency by making each load-address dependent on the previous load's result. So hopefully the access pattern is not regular, otherwise hardware prefetching would defeat it, by having the next thing to load already in local L1d cache before the load-address is known.
E.g. make an array of structs that each hold a pointer (like
struct foo { struct foo *next; };
) with each one pointing to the next, then shuffle it, so iterating over that linked list touches cache lines in a random order within that 512 MiB working set.
I guess ChaseThds = 1 / LoadThds = 15 is saying that we have 1 thread chasing pointers, and 15 other threads trying to saturate memory bandwidth? That would do it. They're not waiting for a load to complete before starting the next one, so each of those load threads can achieve some memory-level parallelism (doing memset or memcpy, or memchr or whatever), and we can see they're achieving 17331 MiB/s.
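For concreteness, here's a rough sketch of that idea in C (my own illustration, not multichase's actual code; multichase has its own arena setup and stride options). It builds a randomly-ordered ring of 16-byte nodes in a 512 MiB buffer, then times the dependent-load chase loop with POSIX clock_gettime:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

struct foo { struct foo *next; char pad[8]; };   /* 16 bytes, like -s 16 */

int main(void)
{
    size_t bytes = (size_t)512 << 20;            /* 512 MiB working set */
    size_t n = bytes / sizeof(struct foo);
    struct foo *nodes = malloc(bytes);
    size_t *order = malloc(n * sizeof *order);
    if (!nodes || !order) return 1;

    /* Shuffle a visit order (Fisher-Yates; rand() is good enough for a
     * sketch), then link the nodes into one big ring in that order, so
     * the chase touches cache lines in an unpredictable sequence. */
    for (size_t i = 0; i < n; i++) order[i] = i;
    srand(12345);
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)rand() % (i + 1);
        size_t t = order[i]; order[i] = order[j]; order[j] = t;
    }
    for (size_t i = 0; i < n; i++)
        nodes[order[i]].next = &nodes[order[(i + 1) % n]];
    struct foo *p = &nodes[order[0]];
    free(order);

    /* The chase loop: each load address depends on the previous load,
     * so it runs at roughly one memory latency per iteration. */
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < n; i++)
        p = p->next;
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    printf("%.2f ns per dereference (ended at %p)\n", ns / n, (void *)p);
    free(nodes);
    return 0;
}

Printing p at the end keeps the compiler from optimizing the chase loop away, since nothing else uses its result.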
Oh, "stream-sum" is probably A[i] = B[i] + C[i], like the Dr. Bandwidth's STREAM benchmark, perhaps exactly that code.
(I doubt they really mean Mib, mebibits. It's weird to be precise about using the binary prefix Mi, but then use b (bits) instead of B (bytes). But 17 GiB/s is a typical memory-bandwidth number for a modern-ish system.)
I don't know exactly what this benchmark does to construct its data, or whether the load threads are reading their own block of memory or the same array of pointers. I didn't look at the GitHub page for it; the name alone and the results make the basics pretty clear: it's a memory latency benchmark done the usual way.
Dr. Bandwidth commented that this benchmark uses a geometric mean to calculate the
ChaseNS
. This is usually not what you want for average latency. Min / max / arithmetic mean ± standard deviation are typically more meaningful. And stuff like looking at 90th / 99th percentile worst cases is useful for exploring the long tail for real-time use-cases.
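As a sketch of that kind of reporting (a hypothetical helper, not how multichase actually aggregates its samples):

#include <math.h>
#include <stdio.h>
#include <stdlib.h>

static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Summarize per-load latency samples (in ns): min/max, arithmetic
 * mean +- standard deviation, and tail percentiles. */
void report_latency(double *ns, size_t n)
{
    double sum = 0, sumsq = 0;
    for (size_t i = 0; i < n; i++) { sum += ns[i]; sumsq += ns[i] * ns[i]; }
    double mean = sum / n;
    double var = sumsq / n - mean * mean;
    if (var < 0) var = 0;                 /* guard against FP rounding */

    qsort(ns, n, sizeof *ns, cmp_double); /* sort once for percentiles */
    printf("min %.1f  max %.1f  mean %.1f +- %.1f\n",
           ns[0], ns[n - 1], mean, sqrt(var));
    printf("p90 %.1f  p99 %.1f\n",
           ns[(size_t)(0.90 * (n - 1))], ns[(size_t)(0.99 * (n - 1))]);
}

A geometric mean, by contrast, systematically downweights the slow outliers, which is exactly the part of the distribution a real-time use-case cares about.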