C++ memcpy function: why use *s++ instead of s[i]?

Posted 2025-01-06 09:10:29

I am writing memcpy from scratch and have been looking at other people's implementations. My implementation is:

void* memcpy (void *destination, const void *source, size_t num)
{
    char *D = (char*)destination;
    char *S = (char*)source;
    for(int i = 0; i < num; i++)
            D[i] = S[i];
    return D;
}

Various other sources and references that I have looked at have:

void* memcpy (void *destination, const void *source, size_t num)
{
    char *D = (char*)destination;
    char *S = (char*)source;
    for(int i = 0; i < num; i++) 
    {
            *D = *S;
            D++;
            S++;
    }
    return D;
}

I am having trouble understanding the difference and whether they would produce different outputs. The portion that confuses me specifically is the D++; and S++;

许你一世情深 2025-01-13 09:10:30

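At the time that code was written, it may have been faster to avoid adding an index to a pointer.

In my own tests on x86, an architecture with indexed addressing modes built into its low-level instructions, the indexed version was slightly faster.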

牛↙奶布丁 2025-01-13 09:10:30

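The algorithms are equivalent. The second version uses pointer arithmetic to advance the pointers to the next position instead of indexing the array with the a[i] syntax.

This works because a[i] is really shorthand for *(a+i) (read: move i positions forward from a and read the value at that location). Instead of applying the full offset (+i) on every iteration, the second version applies a partial offset (++a) on every iteration and accumulates the result.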
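
The equivalence described above can be sketched as follows. The names copy_indexed, copy_stepped and demo_copies_match are hypothetical, chosen so that they do not clash with the library memcpy. One detail differs from the question's second listing: that listing returns D after it has been advanced, that is, a pointer num bytes past the start of the buffer, whereas the standard memcpy returns the original destination. This sketch therefore returns destination in both forms.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Indexed form: the pointers stay fixed and the offset i grows. */
void *copy_indexed(void *destination, const void *source, size_t num)
{
    char *d = (char *)destination;
    const char *s = (const char *)source;
    for (size_t i = 0; i < num; i++)
        d[i] = s[i];            /* same as *(d + i) = *(s + i) */
    return destination;
}

/* Pointer form: the pointers themselves advance one char per iteration. */
void *copy_stepped(void *destination, const void *source, size_t num)
{
    char *d = (char *)destination;
    const char *s = (const char *)source;
    for (size_t i = 0; i < num; i++) {
        *d = *s;
        d++;
        s++;
    }
    return destination;         /* d itself now points past the copied bytes */
}

/* Usage sketch: both forms leave identical bytes in the destination. */
void demo_copies_match(void)
{
    const char text[] = "pointer";
    char a[sizeof text] = {0};
    char b[sizeof text] = {0};
    copy_indexed(a, text, sizeof text);
    copy_stepped(b, text, sizeof text);
    assert(memcmp(a, b, sizeof text) == 0);
}
```

Keeping a separate working pointer (d) and returning the untouched destination avoids the question of which value D holds at the end of the loop.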

冷月断魂刀 2025-01-13 09:10:30

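What is confusing you is the pointer arithmetic: D++ and S++. Each pointer is incremented so that it refers to the next char (because they are char*).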

墨落成白 2025-01-13 09:10:30

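While both are semantically the same, the *s++ version avoids having to offset the initial pointer value on each step while the bytes of the array are copied. In other words, "under the hood" s[i] reads from the byte address s + i*sizeof(type), and multiplication, especially by large values of i, can be much slower than a simple small increment, at least depending on the machine architecture. (For char the element size is 1, so no scaling is involved in this particular loop.)

In the end, though, a libc implementation of memcpy will be much faster than anything you hand-write in C, because it uses machine-dependent, optimized memory-copy assembly instructions that you cannot deliberately reach from plain C code.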
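
The scaling mentioned above can be checked with a small sketch. In C source, s[i] means *(s + i), and the compiler scales i by the element size when it computes the address. The helper names below are hypothetical. They measure the byte offset of element i for int and for char. For char the element size is 1, so indexing a char buffer needs no multiplication.

```c
#include <assert.h>
#include <stddef.h>

/* Byte distance between base and &base[i], for int elements. */
size_t int_index_offset(const int *base, size_t i)
{
    const char *start = (const char *)base;
    const char *elem  = (const char *)&base[i];   /* &base[i] == base + i */
    return (size_t)(elem - start);                /* i * sizeof(int) */
}

/* Same measurement for char elements, where the scale factor is 1. */
size_t char_index_offset(const char *base, size_t i)
{
    return (size_t)(&base[i] - base);
}

/* Usage sketch: i must stay within the array (or one past its end). */
void demo_offsets(void)
{
    int nums[4] = {0};
    char bytes[4] = {0};
    assert(int_index_offset(nums, 2) == 2 * sizeof(int));
    assert(char_index_offset(bytes, 2) == 2);
}
```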

对风讲故事 2025-01-13 09:10:30

Here are the inner loops, as compiled by GCC. I just added restrict keywords and removed the return value and compiled 32 bit for Core 2:

First one, array version:

.L3:
  movzbl  (%edi,%edx), %ecx
  addl    $1, %eax  
  cmpl    %ebx, %eax
  movb    %cl, (%esi,%edx)
  movl    %eax, %edx
  jne .L3

Second one, increment version:

.L9:
  movzbl  (%edx), %ebx
  addl    $1, %ecx   
  addl    $1, %edx   
  movb    %bl, (%eax)
  addl    $1, %eax  
  cmpl    %ecx, %esi
  ja  .L9

The compiler, as you can see, saw right through both constructs.
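
For readers who want to repeat the experiment, here is a rough sketch of what the two test functions might have looked like, based only on the description above (restrict added, return value removed). The function names, the use of size_t for the counter and the exact compiler flags are assumptions, not the answerer's actual source. Something like `gcc -O2 -m32 -march=core2 -S copy.c` should produce comparable assembly, though newer compilers may emit different code, for example by recognizing the loop as a memory copy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Array version. restrict tells the compiler the buffers do not overlap. */
void copy_array(char *restrict dst, const char *restrict src, size_t num)
{
    for (size_t i = 0; i < num; i++)
        dst[i] = src[i];
}

/* Increment version: same loop, but the pointers advance instead. */
void copy_increment(char *restrict dst, const char *restrict src, size_t num)
{
    for (size_t i = 0; i < num; i++) {
        *dst = *src;
        dst++;
        src++;
    }
}

/* Usage sketch: the two versions must produce the same bytes. */
void demo_restrict_copies(void)
{
    const char text[] = "restrict";
    char a[sizeof text] = {0};
    char b[sizeof text] = {0};
    copy_array(a, text, sizeof text);
    copy_increment(b, text, sizeof text);
    assert(memcmp(a, b, sizeof text) == 0);
}
```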

长梦不多时 2025-01-13 09:10:30

Though it makes little difference either way, I'd be a bit surprised to see either of those. I'd expect something closer to:

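void* memcpy (void *destination, const void *source, size_t num) {
    const char *S = (const char *)source;
    char *D = (char *)destination;
    while (num--)          /* post-decrement: the body runs exactly num times */
        *D++ = *S++;       /* copy one byte, then advance both pointers */
    return destination;
}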

Most decent compilers will produce about the same code regardless. I haven't checked recently, but at one time most compilers targeting x86 would turn most of the loop into a single rep movsd instruction. They might not any more though -- that's no longer optimal.
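
Two details of this compact form are worth spelling out, shown here as a minimal sketch with hypothetical names (copy_bytes, demo_copy_bytes). First, postfix ++ binds more tightly than unary *, so *D++ means *(D++): the byte is stored through the old pointer value and the pointer is then advanced. Second, the loop condition matters. A post-decrement test, `while (num--)`, runs the body exactly num times and does nothing when num is 0, whereas a pre-decrement test would run one time fewer and would wrap around for a zero count.

```c
#include <assert.h>
#include <stddef.h>

/* Copies exactly num bytes; num == 0 copies nothing. */
void *copy_bytes(void *destination, const void *source, size_t num)
{
    const char *s = (const char *)source;
    char *d = (char *)destination;
    while (num--)           /* tests num, then decrements it */
        *d++ = *s++;        /* *(d++) = *(s++): copy, then advance both */
    return destination;
}

/* Usage sketch covering the zero-length case and a normal copy. */
void demo_copy_bytes(void)
{
    char out[4] = { 'x', 'x', 'x', 'x' };
    copy_bytes(out, "abc", 0);      /* zero-length copy leaves out untouched */
    assert(out[0] == 'x');
    copy_bytes(out, "abc", 4);      /* three letters plus the terminating NUL */
    assert(out[2] == 'c' && out[3] == '\0');
}
```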
