Fastest way to create random vectors for benchmarking
So, I'm just playing around implementing some sorting algorithms in C++, but I'm finding it irritating to benchmark them at the moment, due to the length of time it takes to not run the algorithm, but to create the input data. I currently test each length of input (1000, 2000, ...) 10 times, to gain a somewhat averaged time. For each of these 10 times, I create a new random vector of the correct length, by doing:
// Each of the 10 times.
for (int j = 0; j < 10; j++) {
    A.clear();
    // 'i' is the current input size.
    for (int k = 0; k < i; k++) {
        A.push_back(rand() % 10000);
    }
    // Other stuff
}
Is there a better way to do this? Should I be bothering to cap the rand() at 10000, or is that just my OCD brain liking round numbers? (I.e., could that modulo operation actually be taking a serious amount of time when you consider it's performed up to - at the moment - 10,000 for each loop of the 10.) Alternatively, should I really create a new vector each time I run the sort? I've been doing so because I felt that it's possible that a created vector might be biased, and so if that one was generated and then used 10 times the answer might be quite off...
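For reference, a minimal sketch of one possible restructuring that takes data generation out of the timed region: the random data is built once per input size and copied before each run. benchmarkOneSize and mySort are hypothetical names, and whether reusing the same data for all 10 runs is acceptable depends on the bias concern raised above.

#include <cstdlib>
#include <vector>

// Generate the random data once per input size, then copy it into the working
// vector before each timed run, so all 10 runs sort identical data and the
// generation cost is paid only once. 'mySort' is a stand-in for the algorithm
// under test; 'i' is the current input size, as in the question.
void benchmarkOneSize(int i) {
    std::vector<int> master;
    master.reserve(i);                 // avoid repeated reallocations
    for (int k = 0; k < i; ++k) {
        master.push_back(rand() % 10000);
    }

    for (int j = 0; j < 10; ++j) {
        std::vector<int> A(master);    // copying is cheaper than regenerating
        // start timer
        // mySort(A);
        // stop timer, record the result
    }
}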
3 Answers
Yup, there are a few things you might want to do here to help speed things up.
As mentioned before, reserving space in the std::vector and then assigning values to the known elements is faster. Also, pre-incrementing (++var instead of var++) is faster when using non-optimized compilers. Just for the sake of keeping your code fast, no matter who builds it, you might want to consider doing that from now on. As far as memory goes, you may find it trivial, but when I use known sizes that are unsigned, and not unreasonably large, I use unsigned shorts for my for loops.
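A rough sketch of that suggestion, reusing the names from the question (assuming std::vector<int> A and int i as there). resize() is used so elements can be assigned by index; reserve() followed by push_back would likewise avoid reallocations.

// Pre-size the vector once, then assign into known elements.
A.resize(i);                      // or: A.reserve(i); and keep using push_back
for (int k = 0; k < i; ++k) {     // pre-increment, as suggested above
    A[k] = rand() % 10000;
}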
About the modulo, however. You might want to not use it, if you don't need it. Depending on the data type held in the vector, your results should wrap if they go above the maximum storage capacity of the type.
I don't know offhand if it eats up more processing power having variables wrap, and if it does, I'm still not sure if it's a less expensive operation than performing modulo. Might want to run some speed tests with known sizes before going with rand.
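As an illustration of the wrapping idea, a sketch assuming the vector holds unsigned short (and that <vector> and <cstdlib> are included): converting an int to an unsigned type is defined to wrap, here modulo 65536, so no explicit % is needed, though the value range becomes 0-65535 instead of 0-9999.

std::vector<unsigned short> B;    // 'B' is just a hypothetical name
B.reserve(i);
for (int k = 0; k < i; ++k) {
    // rand()'s int result wraps modulo 65536 on conversion to unsigned short
    B.push_back(static_cast<unsigned short>(rand()));
}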
Edit
Very small change to note: The loop is going only 10 times, so you might as well use an unsigned char, rather than a short. On Win32 at the very least, it takes half the memory.
A quote from cplusplus.com (http://www.cplusplus.com/reference/stl/vector/), which offers a very useful hint:
"Reallocations may be a costly operation in terms of performance, since they generally involve the entire storage space used by the vector to be copied to a new location. Therefore, whenever large increases in size are planned for a vector, it is recommended to explicitly indicate a capacity for the vector using member function
vector::reserve
."Using
vector::reserve
will almost certainly give a performance increase in your case.EDIT: You could try using
random_shuffle
(http://www.cplusplus.com/reference/algorithm/random_shuffle/) to shuffle your vector once it's been created (apparently,random_shuffle
is linear in the number of elements).我创建了一个,你看一下:
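A minimal sketch combining the two suggestions, assuming a random permutation of distinct values is an acceptable input; makeShuffledInput is a hypothetical name. (Note that std::random_shuffle was deprecated in C++14 and removed in C++17; std::shuffle is the modern replacement.)

#include <algorithm>
#include <cstddef>
#include <vector>

// Build a shuffled vector of the values 0 .. n-1 (no rand(), no modulo).
std::vector<int> makeShuffledInput(std::size_t n) {
    std::vector<int> A;
    A.reserve(n);                              // one allocation up front
    for (std::size_t k = 0; k < n; ++k) {
        A.push_back(static_cast<int>(k));
    }
    std::random_shuffle(A.begin(), A.end());   // linear in the number of elements
    return A;
}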
I create one, take a look: