Slow code when using random array indices

Posted 2025-02-03 07:04:05


Given a real number X in [0,1] and a specific binning, I have to identify which bin X falls into. Given the bin size dx, I am using i = std::size_t(X/dx), which works very well. I then look up the corresponding value in a given array v and set a second variable Y using double Y = v[i]. The whole code looks as follows:

double X = func();                     // X is in [0,1]
double dx = 0.01;                      // bin width
std::size_t i = std::size_t(X / dx);   // bin index
double Y = v[i];                       // table lookup
std::cout << Y << "\n";

This method correctly gives the expected value for the index i within the range [0, length(v)].

My main issue is not with finding the index but with using it: X is determined from an auxiliary function, and whenever I set Y = v[i] using the index determined above, the code becomes extremely slow.
Without commenting out or removing any lines, the code becomes much faster when I set X to some random value between 0 and 1 right after its definition, or when I set i to some random value between 0 and the length of v after the third line.

Can anyone tell why this occurs? The speed changes by a factor of 1000, if not more, and since the faster variant only adds steps and func() is called anyway, I can't understand why it should become faster.


Comments (1)

奈何桥上唱咆哮 2025-02-10 07:04:05


Since you have not put the complete code in the question, one can only make wild guesses like these:

  • You didn't sort all the X results before accessing the lookup table. Processing a sorted array is faster.

  • Some of the X values were denormalized, which takes a toll on computation time on certain CPU types, including yours (see the flush-to-zero sketch after this list).

  • The dataset is too big for the L3 cache, so it always has to go to RAM instead of getting the quick cache hits seen in the other test.

  • The compiler was optimizing all of the expensive function calls away, but in the real-world test scenario it is not (see the timing-loop sketch after this list).

  • The time measurement has bugs.

  • The computer's performance is not stable (for example it is a shared server, or an antivirus process is eating into the RAM bandwidth).
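On the denormal guess: below is a minimal x86 sketch, assuming SSE intrinsics are available, of how one could check whether a value is subnormal and tell the CPU to flush such values to zero so they stop triggering the slow microcode path. The 1e-320 literal is only an illustrative subnormal value, not something from the question.

#include <cmath>
#include <cstdio>
#include <pmmintrin.h>   // _MM_SET_DENORMALS_ZERO_MODE (SSE3)
#include <xmmintrin.h>   // _MM_SET_FLUSH_ZERO_MODE (SSE)

int main() {
    double X = 1e-320;   // illustrative subnormal double
    std::printf("X is subnormal: %d\n", std::fpclassify(X) == FP_SUBNORMAL);

    // Treat denormal inputs and results as zero in subsequent SSE math,
    // removing the per-operation penalty some CPUs charge for them.
    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
    _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);

    double y = X / 0.01; // now computed with denormals flushed to zero
    std::printf("X/dx = %g\n", y);
    return 0;
}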
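On the compiler guess: if Y is never used afterwards, the optimizer is allowed to drop the lookup (and possibly more), so the "slow" and "fast" variants may not be executing the same work at all. Below is a minimal sketch of a timing loop that uses a volatile sink to keep the lookup alive; func() here is only a hypothetical stand-in for the asker's auxiliary function, and the table size and iteration count are placeholders.

#include <chrono>
#include <cstdio>
#include <random>
#include <vector>

// Hypothetical stand-in for the auxiliary function; the real func() was not shown.
double func() {
    static std::mt19937 rng(42);
    static std::uniform_real_distribution<double> dist(0.0, 1.0);
    return dist(rng);
}

int main() {
    const double dx = 0.01;
    std::vector<double> v(101, 1.0);   // lookup table covering X in [0,1] with bin width dx
    volatile double sink = 0.0;        // forces the result of each lookup to be kept

    auto t0 = std::chrono::steady_clock::now();
    for (long long n = 0; n < 10'000'000; ++n) {
        double X = func();
        std::size_t i = std::size_t(X / dx);
        sink = v[i];                   // without the volatile sink this lookup could be deleted
    }
    auto t1 = std::chrono::steady_clock::now();

    std::printf("%lld ms\n", (long long)
        std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count());
    return 0;
}

Comparing both variants under the same optimization level with such a sink in place (or by inspecting the generated assembly) usually either removes a 1000x discrepancy or makes its cause visible.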
