What are the real, significant cases where memcpy() is faster than memmove()?
The key difference between memcpy() and memmove() is that memmove() works correctly when the source and destination overlap. When the buffers are known not to overlap, memcpy() is preferable since it is potentially faster.

What bothers me is this "potentially". Is it a micro-optimization, or are there real, significant cases where memcpy() is faster, so that we really need to use memcpy() and not stick to memmove() everywhere?
Comments (7)
There's at least an implicit branch in memmove() to copy either forwards or backwards if the compiler cannot deduce that an overlap is impossible. This means that, without the ability to optimize in favor of memcpy(), memmove() is slower by at least one branch, plus any additional space occupied by inlined instructions to handle each case (if inlining is possible).

Reading the eglibc-2.11.1 code for both memcpy() (www.eglibc.org/cgi-bin/viewcvs.cgi/trunk/libc/string/memcpy.c?rev=77&view=markup) and memmove() confirms this. Furthermore, page copying is not possible during a backward copy; it is a significant speedup that is only available when there is no chance of overlap.

In summary: if you can guarantee that the regions do not overlap, then selecting memcpy() over memmove() avoids a branch. If the source and destination contain corresponding page-aligned, page-sized regions and don't overlap, some architectures can employ hardware-accelerated copies for those regions, regardless of whether you called memmove() or memcpy().

Update0

There is actually one more difference beyond the assumptions and observations listed above. As of C99, the following prototypes exist for the two functions:
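For reference, here is a minimal sketch of how the two C99 signatures differ. The declarations quoted in the comment are the standard ones from <string.h>; the wrapper names copy_disjoint and copy_any are hypothetical and only illustrate which promise each signature makes.

```c
#include <stddef.h>
#include <string.h>

/*
 * C99 declarations from <string.h>, quoted for reference:
 *
 *   void *memcpy(void * restrict s1, const void * restrict s2, size_t n);
 *   void *memmove(void *s1, const void *s2, size_t n);
 */

/* Hypothetical wrapper: restrict tells the compiler (and the reader)
 * that the caller promises dst and src never overlap. */
void copy_disjoint(void * restrict dst, const void * restrict src, size_t n)
{
    memcpy(dst, src, n);
}

/* Hypothetical wrapper: no promise is made, so overlap is allowed. */
void copy_any(void *dst, const void *src, size_t n)
{
    memmove(dst, src, n);
}
```

Calling copy_disjoint with overlapping arguments would be undefined behavior, exactly as it is for memcpy itself.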
Because memcpy declares s1 and s2 as restrict, an implementation can assume that the two pointers do not point at overlapping memory. Straightforward C implementations of memcpy can leverage this to generate more efficient code without resorting to assembler. I'm sure that memmove can do this too, but it would require additional checks beyond those I saw in eglibc, meaning the performance cost may be slightly more than a single branch for C implementations of these functions.
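To make the extra branch concrete, here is a rough byte-wise sketch of both functions. The names sketch_memcpy and sketch_memmove are hypothetical, this is not the eglibc code, and comparing addresses through uintptr_t assumes a flat address space.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical byte-wise memcpy: restrict lets the compiler assume
 * the two regions never overlap, so one forward loop is enough. */
void *sketch_memcpy(void * restrict s1, const void * restrict s2, size_t n)
{
    unsigned char *d = s1;
    const unsigned char *s = s2;

    while (n--)
        *d++ = *s++;
    return s1;
}

/* Hypothetical byte-wise memmove: one address comparison picks the
 * copy direction so that overlapping bytes are read before they are
 * overwritten. */
void *sketch_memmove(void *s1, const void *s2, size_t n)
{
    unsigned char *d = s1;
    const unsigned char *s = s2;

    if ((uintptr_t)d <= (uintptr_t)s) {
        while (n--)            /* destination first: copy forwards */
            *d++ = *s++;
    } else {
        d += n;                /* source first: copy backwards */
        s += n;
        while (n--)
            *--d = *--s;
    }
    return s1;
}
```

The only difference between the two bodies is the direction test, which is the branch this answer refers to.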
At best, calling memcpy rather than memmove will save a pointer comparison and a conditional branch. For a large copy, this is completely insignificant. If you are doing many small copies, then it might be worth measuring the difference; that is the only way to tell whether it is significant.

It is definitely a micro-optimisation, but that doesn't mean you shouldn't use memcpy when you can easily prove that it is safe. Premature pessimisation is the root of much evil.
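One way to measure the difference for small copies is a timing loop like the sketch below. The helper name time_copies, the 64-byte buffers, and the use of clock() are assumptions made for illustration. Calling through a function pointer also prevents the compiler from inlining either function, so this measures only the library call.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Both memcpy and memmove can be passed through this pointer type. */
typedef void *(*copy_fn)(void *, const void *, size_t);

/* Hypothetical helper: CPU seconds spent on `iters` copies of `len`
 * bytes between two disjoint static buffers (len is capped at 64). */
double time_copies(copy_fn fn, size_t len, unsigned long iters)
{
    static unsigned char src[64], dst[64];
    volatile unsigned char sink = 0;   /* keeps the copies observable */
    clock_t start;

    if (len > sizeof src)
        len = sizeof src;

    start = clock();
    for (unsigned long i = 0; i < iters; i++) {
        src[0] = (unsigned char)i;     /* vary the data each round */
        fn(dst, src, len);
        sink ^= dst[0];
    }
    (void)sink;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Comparing time_copies(memcpy, 16, iters) with time_copies(memmove, 16, iters) for a large iters gives a first impression; the numbers depend heavily on the platform, the compiler flags, and the copy size.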
Well, memmove has to copy backwards when the source and destination overlap and the source is before the destination. So some implementations of memmove simply copy backwards whenever the source is before the destination, without regard for whether the two regions overlap.

A quality implementation of memmove can detect whether the regions overlap and do a forward copy when they don't. In that case, the only extra overhead compared to memcpy is the overlap check.
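A sketch of such an overlap check, assuming a flat address space in which pointers convert to uintptr_t linearly. The names regions_overlap and overlap_aware_move are hypothetical and do not come from any particular library.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 if [dst, dst+n) and [src, src+n)
 * share at least one byte, 0 otherwise. */
int regions_overlap(const void *dst, const void *src, size_t n)
{
    uintptr_t d = (uintptr_t)dst;
    uintptr_t s = (uintptr_t)src;

    /* Unsigned subtraction wraps around, so each test covers one
     * ordering of the two addresses. */
    return (d - s < n) || (s - d < n);
}

/* Hypothetical memmove-like function: disjoint regions take the
 * memcpy path, overlapping regions are copied byte by byte in the
 * safe direction. */
void *overlap_aware_move(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    if (!regions_overlap(dst, src, n))
        return memcpy(dst, src, n);

    if ((uintptr_t)d < (uintptr_t)s) {
        while (n--)
            *d++ = *s++;
    } else {
        d += n;
        s += n;
        while (n--)
            *--d = *--s;
    }
    return dst;
}
```

In the disjoint case the cost over a plain memcpy call is the two comparisons in regions_overlap.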
Simplistically, memmove needs to test for overlap and then do the appropriate thing; with memcpy, one asserts that there is no overlap, so no additional tests are needed.

Having said that, I have seen platforms that have exactly the same code for memcpy and memmove.
It's certainly possible that memcpy is merely a call to memmove, in which case there'd be no benefit to using memcpy. At the other extreme, it's possible that an implementor assumed memmove would rarely be used and implemented it with the simplest possible byte-at-a-time loops in C, in which case it could be ten times slower than an optimized memcpy. As others have said, the likeliest case is that memmove uses memcpy when it detects that a forward copy is possible, but some implementations may simply compare the source and destination addresses without looking for overlap.

With that said, I would recommend never using memmove unless you're shifting data within a single buffer. It might not be slower, but then again, it might be, so why risk it when you know there's no need for memmove?
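Shifting data within a single buffer is the case where memmove is required, because source and destination overlap. A small sketch, with remove_at as a hypothetical helper name:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: removes arr[idx] from an array holding *len
 * ints by shifting the tail one slot to the left. The source and
 * destination ranges overlap, so memmove is needed here. */
void remove_at(int *arr, size_t *len, size_t idx)
{
    if (idx >= *len)
        return;                          /* nothing to remove */

    memmove(&arr[idx], &arr[idx + 1],
            (*len - idx - 1) * sizeof arr[0]);
    (*len)--;
}
```

Copying the same array into a separate, freshly allocated buffer would be a case for memcpy instead, since the two regions cannot overlap.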
Just simplify and always use memmove. A function that's right all the time is better than a function that's only right half the time.

It is entirely possible that in most implementations, the cost of a memmove() function call will not be significantly greater than that of memcpy() in any scenario in which the behavior of both is defined. There are two points not yet mentioned, though:

For a copy whose size is a compile-time constant, a compiler may expand memcpy() in line, for example on x86:

```asm
mov esi,[src]
mov edi,[dest]
mov ecx,1234/4 ; compiler could notice it's a constant (the 2 leftover bytes are ignored here)
cld
rep movsl      ; copy ecx 32-bit words (movsd in Intel syntax)
```

This would take the same amount of in-line code, but run much faster than:

```asm
push [src]
push [dest]
push dword 1234
call _memcpy
...
_memcpy:
push ebp
mov ebp,esp
mov ecx,[ebp+numbytes]
test ecx,3 ; see if it's a multiple of four
jz multiple_of_four ; other sizes are not handled in this sketch
multiple_of_four:
push esi ; can't know if caller needs this value preserved
push edi ; can't know if caller needs this value preserved
mov esi,[ebp+src]
mov edi,[ebp+dest]
shr ecx,2 ; convert the byte count to a dword count for movsl
rep movsl
pop edi
pop esi
pop ebp ; restore the frame pointer pushed on entry
ret
```

Quite a number of compilers will perform such optimizations with memcpy(). I don't know of any that will do it with memmove, although in some cases an optimized version of memcpy may offer the same semantics as memmove. For example, if numbytes was 20, the in-line expansion can load all 20 bytes into registers and then store them. This works correctly even if the address ranges overlap, since the entire region is effectively copied (in registers) before any of it is written.

In theory, a compiler could process memmove() by seeing if treating it as memcpy() would yield an implementation that would be safe even if the address ranges overlap, and call _memmove in those cases where substituting the memcpy() implementation would not be safe. I don't know of any that do such optimization, though.
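The 20-byte case can be sketched in C as follows. The function name move20 is hypothetical, and whether the five temporaries actually end up in registers depends on the compiler and target; the point is only that every load happens before any store.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: moves exactly 20 bytes. All five words are
 * read into temporaries before anything is written, so the result is
 * correct even when the two regions overlap. */
void move20(void *dst, const void *src)
{
    uint32_t w[5];
    const unsigned char *s = src;
    unsigned char *d = dst;
    int i;

    for (i = 0; i < 5; i++)
        memcpy(&w[i], s + 4 * i, 4);   /* all loads first */
    for (i = 0; i < 5; i++)
        memcpy(d + 4 * i, &w[i], 4);   /* then all stores */
}
```

Each 4-byte memcpy here copies between the buffer and a local variable, so none of the individual calls ever sees overlapping arguments.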