Code speed difference when using a random array index
Given a real number X within [0,1], after a specific binning I have to identify which bin X falls into. Given the bin size dx, I am using i = std::size_t(X/dx), which works very well. I then look up the corresponding value in a given array v and set a second variable Y using double Y = v[i]. The whole code looks as follows:
double X = func();
double dx = 0.01;
std::size_t i = std::size_t(X / dx);
double Y = v[i];
std::cout << Y << '\n';
This method correctly gives the expected value for the index i, within the range [0, length(v)).
My main issue is not with finding the index, but with using it: X is determined from an auxiliary function, and whenever I need to set Y = v[i] using the index determined above, the code becomes extremely slow. Without commenting out or removing any of the lines, the code becomes much faster when I set X to some random value between 0 and 1 right after its definition, or set i to some random value between 0 and the length of v after the third line.
Can anyone tell me why this occurs? The speed changes by a factor of 1000, if not more, and since the faster method only has additional steps and func() is called anyway, I can't understand why it should be faster.
Since you have put no code in the question, one can only make wild guesses like these:
You didn't sort the X results before accessing the lookup table. Processing a sorted array is faster.
Some of the X values were denormalized, which takes a toll on computation time on certain CPU types, including yours.
The dataset is too big for the L3 cache, so it always went to RAM instead of getting the quick cache hits seen in the other test.
The compiler was optimizing all of the expensive function calls out, but in the real-world test scenario it is not.
The time measurement has bugs.
The computer's performance is not stable (e.g. it is a shared server, or an antivirus intervention is eating the RAM bandwidth).