What is the difference between copying an unsigned int 2 times and an unsigned long 1 time on a 64-bit system?

Posted on 2024-12-10 04:14:37

What is the difference between

*(unsigned*)d = *(unsigned*)s; 
d+=4; s+=4; 
*(unsigned*)d = *(unsigned*)s; 
d+=4; s+=4;

and

*(unsigned long*)d = *(unsigned long*)s;
d+=8; s+=8;

on 64bit systems?

Comments (4)

紙鸢 2024-12-17 04:14:37

Provided that nothing unpleasant happens in respect of padding bits or strict aliasing rules, and assuming the sizes of the types are as you expect, and provided that the memory regions don't overlap, and are correctly aligned, then they each copy 8 bytes from one place to another.

Of course, aside from the practical effect there may be a difference in performance and/or code size.

If you're seeing something break, look at the actual code emitted; that might tell you what has gone wrong. Unless you have a lot of optimization switched on, and maybe even with optimization, I don't immediately see why those wouldn't be equivalent on AMD64, Ubuntu, and gcc.

Things I've mentioned that could go wrong:

  • padding bits - doesn't apply to GCC, but the standard permits unsigned and unsigned long to have padding bits, and if so then there could be bit patterns which are trap representations of one or both, which could explode as soon as you dereference.
  • strict aliasing - unlikely to affect what that code does, but could affect the code you use to check the result. For example, if s and d are the result of casting pointers-to-double to uint8_t*, and you look at the resulting double, then in one or both cases you might not see the effects of the change because you have an illegal type-pun.
  • sizes of the types - shouldn't apply here since 64-bit Linux is LP64, but obviously if sizeof(long) == 4 then the two aren't equivalent. long is 32 bits on 64-bit Windows systems, just not on 64-bit Linux ones.
  • overlap - if d == s + 4, then the two code snippets have different effect. Because of this, you won't see the first optimized to become the second unless the compiler knows that d and s point to entirely different places (and that's what C99 restrict is for).
  • alignment - I can't remember what the alignment requirements are for x86-64: for x86 you can get away with an unaligned read/write, it's just slower. In general, if s or d is correctly aligned for int but not long then there's a difference. (Edit: apparently you can enable or disable hardware exceptions for unaligned access on x86-64).
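
The overlap case can be made concrete with a small sketch. The function names below are hypothetical, and each 4-byte or 8-byte access is written with memcpy so that the sketch itself does not depend on aliasing or alignment; only the order of loads and stores mirrors the two snippets from the question.

```c
#include <stdint.h>
#include <string.h>

/* Mirrors the first snippet: two 4-byte load/store pairs, in order. */
void copy_2x4(unsigned char *d, const unsigned char *s)
{
    uint32_t w;
    memcpy(&w, s, 4);      /* load  source bytes 0..3 */
    memcpy(d, &w, 4);      /* store them */
    memcpy(&w, s + 4, 4);  /* load  source bytes 4..7 (may see the store above) */
    memcpy(d + 4, &w, 4);  /* store them */
}

/* Mirrors the second snippet: one 8-byte load, then one 8-byte store. */
void copy_1x8(unsigned char *d, const unsigned char *s)
{
    uint64_t w;
    memcpy(&w, s, 8);
    memcpy(d, &w, 8);
}

/* Returns 1 if the two strategies give different results when d == s + 4. */
int overlap_changes_result(void)
{
    unsigned char a[12] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11};
    unsigned char b[12] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11};

    copy_2x4(a + 4, a);  /* a[4..11] becomes 0 1 2 3 0 1 2 3 */
    copy_1x8(b + 4, b);  /* b[4..11] becomes 0 1 2 3 4 5 6 7 */

    return memcmp(a, b, sizeof a) != 0;
}
```

With d == s + 4, the second 4-byte load in copy_2x4 reads bytes that the first store has just overwritten, while copy_1x8 reads all eight source bytes before writing anything.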

寻找我们的幸福 2024-12-17 04:14:37

If you need to copy exactly eight bytes, why not use memcpy()?

memcpy(d, s, 8);

With GCC, this typically compiles to inline code rather than a call to the library function, so it should be as fast as your hand-written memory copy.

As an added bonus, your code will work on ILP32 systems, LP64 (most 64-bit Unix) and LLP64 (Win64), and even on systems with strict alignment requirements.
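
A minimal sketch of how the memcpy() version might be wrapped, assuming the regions never overlap (memmove() would be the choice if they might). The names copy8 and copy_blocks are hypothetical and only illustrate the idea; they are not part of any library.

```c
#include <stddef.h>
#include <string.h>

/* Copies exactly 8 bytes. Works for any alignment and does not depend
   on sizeof(long); the two regions must not overlap. */
void copy8(void *d, const void *s)
{
    memcpy(d, s, 8);
}

/* Copies n bytes in 8-byte steps, advancing through both buffers the way
   the "d += 8; s += 8;" lines in the question do. Any tail of fewer than
   8 bytes is left untouched in this sketch. */
void copy_blocks(unsigned char *d, const unsigned char *s, size_t n)
{
    for (size_t i = 0; i + 8 <= n; i += 8)
        copy8(d + i, s + i);
}
```

Because the size is a compile-time constant, an optimizing compiler is free to turn the call into a single 8-byte move where the target allows it; checking the generated assembly is the way to confirm that for a given compiler and flag set.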

猫烠⑼条掵仅有一顆心 2024-12-17 04:14:37

If performance is not critical, you should probably just use the memcpy() as in another answer.

If this code occurs soon after a write to *s, match the type used for that write; if this code occurs soon before a read from *d, match the type used for that read. This will ensure store-to-load forwarding (moving the data from the store directly to the load, without waiting for the store to write the data back into the data cache) will work on as many CPUs as possible. Store-to-load forwarding almost always works if the addresses and sizes of the store and load match and are aligned, and may work more often depending on the CPU. If store-to-load forwarding fails, the penalty tends to be on the order of 10 clock cycles.

If you can avoid a store-to-load forwarding problem by adding additional shift/and/or operations, this is often faster.

If you use C's type system more effectively and avoid casts, many store-to-load forwarding problems will be avoided.
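
As a rough illustration of the shift/and/or advice, the sketch below splits and joins a 64-bit value in registers instead of storing it at one width and reloading it at another. The helper names are hypothetical, and whether this is actually faster depends on the CPU and compiler, so it is worth measuring.

```c
#include <stdint.h>

/* Low 32 bits of v, obtained by masking rather than by a narrower reload. */
uint32_t low_half(uint64_t v)
{
    return (uint32_t)(v & 0xFFFFFFFFu);
}

/* High 32 bits of v, obtained by shifting. */
uint32_t high_half(uint64_t v)
{
    return (uint32_t)(v >> 32);
}

/* Builds one 64-bit value from two 32-bit halves, so a later 64-bit
   store does not have to be assembled from two separate 32-bit stores. */
uint64_t join_halves(uint32_t lo, uint32_t hi)
{
    return ((uint64_t)hi << 32) | lo;
}
```

For example, a value that was just written as a uint64_t can be split with low_half() and high_half() without ever reading memory at a different width from the store.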

回眸一笑 2024-12-17 04:14:37

Try casting to (unsigned long long*) instead; unsigned long long is at least 64 bits wide on both LP64 and LLP64 systems.
