When is memcpy faster than simple repeated assignment?
Assume that one wants to make a copy of an array declared as

DATA_TYPE src[N];

Is memcpy always as fast as or faster than the following code snippet, regardless of what DATA_TYPE and the number of elements of the array are?

DATA_TYPE dest[N];
for (int i = 0; i < N; i++)
    dest[i] = src[i];
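For reference, the memcpy alternative I have in mind is the obvious one-liner (assuming src and dest are the actual arrays, still in scope, so that sizeof gives the full array size):

#include <string.h>  /* for memcpy */

/* ... with src and dest declared as above ... */
memcpy(dest, src, sizeof src);  /* copies all N elements, i.e. N * sizeof(DATA_TYPE) bytes */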
For a small type like char and large N we can be sure that memcpy is faster (unless the compiler replaces the loop with a call to memcpy). But what if the type is larger, like double, and/or the number of array elements is small?
This question came to my mind when copying many arrays of doubles, each with 3 elements.
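Just to make that concrete, here is a stripped-down sketch of the situation (the function names and signatures are made up purely for illustration):

#include <string.h>

/* Copy m arrays of 3 doubles each, element by element. */
void copy_all_loop(double dest[][3], const double src[][3], int m)
{
    for (int k = 0; k < m; k++)
        for (int i = 0; i < 3; i++)
            dest[k][i] = src[k][i];
}

/* The same, but with one memcpy of 3 * sizeof(double) bytes per array. */
void copy_all_memcpy(double dest[][3], const double src[][3], int m)
{
    for (int k = 0; k < m; k++)
        memcpy(dest[k], src[k], sizeof dest[k]);
}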
I didn't find an answer to my question in the answer to the other question mentioned by wohlstad in the comments. The accepted answer in that question essentially says "leave it for the compiler to decide." That's not the sort of answer I'm looking for. The fact that a compiler can optimize memory copying by choosing one alternative is not an answer. Why and when is one alternative faster? Maybe compilers know the answer, but developers, including compiler developers, don't know!
1 Answer
Since memcpy is a library function, how efficient it actually is depends entirely on the library implementation, so no definitive answer is possible. That said, any provided standard library is likely to be highly optimised and may even use hardware-specific features such as DMA transfer. Your code loop, on the other hand, will vary in performance depending on the optimisation settings, so it is likely to perform much worse in unoptimised debug builds.
Another consideration is that the performance of memcpy() will be independent of the data type and generally deterministic, whereas your loop performance is likely to vary depending on DATA_TYPE, or even the value of N.
Generally, I would expect memcpy() to be optimal, as fast as or faster than an assignment loop, and certainly more consistent and deterministic, being independent of specific compiler settings and even the compiler used. In the end, the only way to tell is to measure it for your specific platform, toolchain, library and build options, and also for various data types. Ultimately, since you would have to measure it for every usage combination to know whether it is faster, I suggest that it is generally a waste of time and of academic interest only - use the library - not only for performance and consistency, but also for clarity and maintainability.
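For example, a rough starting point for such a measurement might look like this (plain ISO C; N, REPS and the use of clock() are arbitrary choices, and an optimising compiler may still hoist or otherwise transform the copies, so treat it only as a sketch, not a rigorous benchmark):

#include <stdio.h>
#include <string.h>
#include <time.h>

#define N 3
#define REPS 100000000UL

int main(void)
{
    static double src[N] = { 1.0, 2.0, 3.0 };
    static double dest[N];
    volatile double sink = 0.0;  /* volatile sink so the copied data is observably used */
    clock_t t;

    /* Alternative 1: element-by-element assignment loop. */
    t = clock();
    for (unsigned long r = 0; r < REPS; r++) {
        src[0] = (double)r;              /* vary the input to discourage hoisting */
        for (int i = 0; i < N; i++)
            dest[i] = src[i];
        sink += dest[0];
    }
    printf("loop:   %f s\n", (double)(clock() - t) / CLOCKS_PER_SEC);

    /* Alternative 2: memcpy of the whole array. */
    t = clock();
    for (unsigned long r = 0; r < REPS; r++) {
        src[0] = (double)r;
        memcpy(dest, src, sizeof dest);
        sink += dest[0];
    }
    printf("memcpy: %f s\n", (double)(clock() - t) / CLOCKS_PER_SEC);

    return 0;
}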