Making the compiler copy characters using movsd

Posted on 2024-07-29 02:36:36

I would like to copy a relatively short block of memory (less than 1 KB, typically 2-200 bytes) in a time-critical function. The best code for this on the CPU side seems to be rep movsd. However, I cannot get my compiler to generate it. I hoped (and I vaguely remember seeing this) that memcpy would be expanded as a compiler built-in intrinsic, but based on disassembly and debugging, the compiler seems to emit a call to the memcpy/memmove library implementation instead. I also hoped the compiler might be smart enough to recognize the following loop and use rep movsd on its own, but it seems it does not.

char *dst;
const char *src;
// ...
for (int r=size; --r>=0; ) *dst++ = *src++;

Is there a way to make the Visual Studio compiler generate a rep movsd sequence other than using inline assembly?

缱绻入梦 2024-08-05 02:36:36

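A few questions come to mind.

First, how do you know movsd would be faster? Have you looked up its latency and throughput? The x86 architecture is full of old instructions that should not be used because they are not efficient on modern CPUs.

Second, what happens if you use std::copy instead of memcpy? std::copy is potentially faster, because it can be specialized at compile time for the specific data type.

Third, have you enabled intrinsic functions under Project Properties -> C/C++ -> Optimization?

Of course, I assume other optimizations are enabled as well.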
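As a minimal sketch of the second suggestion, the copy could be written with std::copy as below. The helper name CopyChars is hypothetical and not from the question; whether this ends up faster than memcpy depends on the compiler and standard library, so it is worth checking the disassembly.

```cpp
#include <algorithm>  // std::copy
#include <cassert>    // assert
#include <cstddef>    // std::size_t

// Hypothetical helper (not from the original post): copies `size` chars
// from src to dst using std::copy and returns the end of the written range.
inline char* CopyChars(char* dst, const char* src, std::size_t size)
{
    assert(size == 0 || (dst != nullptr && src != nullptr));
    // The element type is known at compile time here, which gives the
    // library a chance to pick a specialized copy for trivially copyable types.
    return std::copy(src, src + size, dst);
}
```

The source and destination ranges are assumed not to overlap in a way that would require std::copy_backward.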

只等公子 2024-08-05 02:36:36

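Are you running an optimized build? The compiler won't use an intrinsic unless optimization is enabled. It is also worth noting that it may well use a better copy loop than rep movsd. It should at least try to use MMX to copy 64 bits at a time. In fact, six or seven years ago I wrote an MMX-optimized copy loop for this sort of thing. Unfortunately, the compiler's intrinsic memcpy outperformed my MMX copy by about 1%. That really taught me not to make assumptions about what the compiler is doing.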

避讳 2024-08-05 02:36:36

Using memcpy with a constant size

Here is what I have found in the meantime:

The compiler uses the intrinsic when the size of the copied block is known at compile time. When it is not, it calls the library implementation. When the size is known, the generated code is very good and is selected based on the size: it may be a single mov, a movsd, or a movsd followed by a movsb, as needed.

It seems that if I really want to always use movsb or movsd, even with a "dynamic" size, I will have to use inline assembly or a special intrinsic (see below). I know the size is "quite short", but the compiler does not, and I cannot communicate this to it. I even tried __assume(size<16), but that is not enough.

Demo code, compiled with -Ob1 (inline expansion only for functions marked inline):

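  #include <string.h>   // memcpy, size_t

  // Size is only known at run time: per the observation above,
  // this ends up as a call to the library memcpy.
  void MemCpyTest(void *tgt, const void *src, size_t size)
  {
    memcpy(tgt,src,size);
  }

  // Size is a compile-time constant: per the observation above,
  // the compiler expands memcpy inline here.
  template <int size>
  void MemCpyTestT(void *tgt, const void *src)
  {
    memcpy(tgt,src,size);
  }

  int main ( int argc, char **argv )
  {
    int src = 0;   // initialized so the copies do not read an indeterminate value
    int dst;
    MemCpyTest(&dst,&src,sizeof(dst));
    MemCpyTestT<sizeof(dst)>(&dst,&src);
    return 0;
  }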
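Building on the constant-size observation, one workaround that is sometimes used (it is not part of the original answer) is to dispatch a run-time size to a handful of constant-size memcpy calls. The sketch below assumes a few common sizes are worth special-casing; the names CopyFixed and CopySmall and the chosen sizes are hypothetical, and whether the switch beats a plain library call has to be measured.

```cpp
#include <cassert>   // assert
#include <cstddef>   // std::size_t
#include <cstring>   // std::memcpy

// Hypothetical helper: N is a compile-time constant, so the compiler
// is free to expand this memcpy inline.
template <std::size_t N>
inline void CopyFixed(void* dst, const void* src)
{
    std::memcpy(dst, src, N);
}

// Hypothetical dispatcher: maps a few run-time sizes onto constant-size
// copies and falls back to the library memcpy for everything else.
inline void CopySmall(void* dst, const void* src, std::size_t size)
{
    assert(size == 0 || (dst != nullptr && src != nullptr));
    switch (size) {
    case 0:  break;
    case 1:  CopyFixed<1>(dst, src);  break;
    case 2:  CopyFixed<2>(dst, src);  break;
    case 4:  CopyFixed<4>(dst, src);  break;
    case 8:  CopyFixed<8>(dst, src);  break;
    case 16: CopyFixed<16>(dst, src); break;
    default: std::memcpy(dst, src, size); break;
    }
}
```

The set of special-cased sizes is only an illustration; a real choice would come from profiling the sizes the time-critical function actually sees.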

Specialized intrinsics

I have recently found that there is a very simple and natural way to make the Visual Studio compiler copy characters using movsd: use intrinsics. The following intrinsics may come in handy:
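The list of intrinsics is cut off here. As a minimal sketch, assuming the MSVC intrinsics __movsb and __movsd from <intrin.h> are the kind meant, a copy with a run-time size could look like the following. The wrapper name CopyWithMovs and the HAVE_MOVS_INTRINSICS macro are hypothetical, and on other compilers or targets the sketch simply falls back to memcpy.

```cpp
#include <cassert>   // assert
#include <cstddef>   // std::size_t
#include <cstring>   // std::memcpy

#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#include <intrin.h>  // __movsb, __movsd (MSVC, x86/x64 only)
#define HAVE_MOVS_INTRINSICS 1
#endif

// Hypothetical wrapper (not from the original post): copies `size` bytes,
// using rep movsd for whole 32-bit words and rep movsb for the tail when
// the MSVC intrinsics are available.
inline void CopyWithMovs(void* dst, const void* src, std::size_t size)
{
    assert(size == 0 || (dst != nullptr && src != nullptr));
#ifdef HAVE_MOVS_INTRINSICS
    unsigned char* d = static_cast<unsigned char*>(dst);
    // const is dropped so the call compiles whether or not the header
    // declares the source parameter as pointer-to-const; it is only read.
    unsigned char* s = const_cast<unsigned char*>(
        static_cast<const unsigned char*>(src));
    const std::size_t dwords = size / 4;   // number of full 32-bit words
    __movsd(reinterpret_cast<unsigned long*>(d),
            reinterpret_cast<unsigned long*>(s), dwords);
    __movsb(d + dwords * 4, s + dwords * 4, size % 4);  // remaining 0-3 bytes
#else
    std::memcpy(dst, src, size);  // portable fallback, no movs guarantee
#endif
}
```

As the other answers point out, forcing rep movsd is not guaranteed to be faster than what the compiler or library picks, so the result should be benchmarked on the target CPU.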
