When is memcpy faster than a simple assignment loop?

Posted 2025-01-23 13:54:25

Assume that one wants to make a copy of an array declared as

DATA_TYPE src[N];

Is memcpy always as fast as or faster than the following code snippet, regardless of what DATA_TYPE and the number of elements of the array are?

DATA_TYPE dest[N];

for (int i=0; i<N; i++)
    dest[i] = src[i];

For a small type like char and large N we can be sure that memcpy is faster (unless the compiler replaces the loop with a call to memcpy). But what if the type is larger, like double, and/or the number of array elements is small?
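For reference, the memcpy equivalent of the loop above would be along these lines (a minimal sketch; DATA_TYPE and N are placeholders, filled in here as double and 3 to match the arrays mentioned below):

#include <string.h>

typedef double DATA_TYPE;
#define N 3

void copy_array(DATA_TYPE dest[N], const DATA_TYPE src[N])
{
    /* Copy the whole array in one call: N elements of sizeof(DATA_TYPE) bytes each. */
    memcpy(dest, src, N * sizeof(DATA_TYPE));
}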

This question came to my mind when copying many arrays of doubles each with 3 elements.

I didn't find an answer to my question in the answer to the other question mentioned by wohlstad in the comments. The accepted answer in that question essentially says "leave it for the compiler to decide." That's not the sort of answer I'm looking for. The fact that a compiler can optimize memory copying by choosing one alternative is not an answer. Why and when is one alternative faster? Maybe compilers know the answer, but developers, including compiler developers, don't know!

Comments (1)

彻夜缠绵 2025-01-30 13:54:25

Since memcpy is a library function, how efficient it actually is depends entirely on the library implementation, so no definitive answer is possible.

That said, any provided standard library is likely to be highly optimised and may even use hardware-specific features such as DMA transfer, whereas the performance of your copy loop will vary with the optimisation settings, so it is likely to perform much worse in unoptimised debug builds.
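One way to see this effect in practice (an illustrative check, not part of the original answer; the file and function names are made up) is to inspect the code the compiler generates for the loop. Modern GCC and Clang at -O2 commonly recognise such a copy loop and replace it with a call to memcpy or equivalent inline/vectorised code, while at -O0 it stays a per-element loop:

/* copy_loop.c -- hypothetical test file */
#include <stddef.h>

void copy_doubles(double *dest, const double *src, size_t n)
{
    /* Plain element-by-element copy loop, as in the question. */
    for (size_t i = 0; i < n; i++)
        dest[i] = src[i];
}

/* Compare the generated assembly at different optimisation levels:
 *   gcc -O0 -S copy_loop.c -o copy_O0.s
 *   gcc -O2 -S copy_loop.c -o copy_O2.s
 * At -O2 the body typically becomes a call to memcpy (or vectorised copies);
 * at -O0 it remains a scalar loop, which is usually much slower.
 */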

Another consideration is that the performance of memcpy() will be independent of data type and generally deterministic, whereas your loop performance is likely to vary depending on DATA_TYPE, or even the value of N.

Generally, I would expect memcpy() to be optimal, i.e. at least as fast as an assignment loop, and certainly more consistent and deterministic, being largely independent of the specific compiler settings and even of the compiler used.

In the end, the only way to tell is to measure it for your specific platform, toolchain, library and build options, and for the various data types you care about. Since you would have to measure every usage combination to know whether it is faster, I suggest that doing so is generally a waste of time and of academic interest only. Use the library, not only for performance and consistency, but also for clarity and maintainability.
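As a starting point for such a measurement, something along these lines could be used (illustrative only; the array count, repetition count and clock() granularity are arbitrary choices and would need adjusting for a real benchmark on your platform):

#include <stdio.h>
#include <string.h>
#include <time.h>

#define N      3         /* elements per array, as in the question */
#define ARRAYS 100000    /* number of small arrays to copy */
#define REPS   1000      /* repetitions to get a measurable duration */

static double src[ARRAYS][N];
static double dst[ARRAYS][N];

int main(void)
{
    /* Time the element-by-element assignment loop. */
    clock_t t0 = clock();
    for (int r = 0; r < REPS; r++)
        for (int a = 0; a < ARRAYS; a++)
            for (int i = 0; i < N; i++)
                dst[a][i] = src[a][i];
    clock_t t1 = clock();

    /* Time memcpy on the same data. */
    for (int r = 0; r < REPS; r++)
        for (int a = 0; a < ARRAYS; a++)
            memcpy(dst[a], src[a], sizeof dst[a]);
    clock_t t2 = clock();

    /* Read a result so the copies cannot be optimised away entirely. */
    printf("check:  %f\n", dst[0][0] + dst[ARRAYS - 1][N - 1]);
    printf("loop:   %f s\n", (double)(t1 - t0) / CLOCKS_PER_SEC);
    printf("memcpy: %f s\n", (double)(t2 - t1) / CLOCKS_PER_SEC);
    return 0;
}

Building this both with and without optimisation (e.g. -O0 vs -O2) also shows how strongly the loop's results depend on compiler settings.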
