How do realloc and memcpy work?

Posted on 2024-07-10 09:13:22

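I have two questions.

1 - Do realloc() and memcpy() copy the entries of an array into another array in a way that is faster than iterating over each element, which is O(N)? If the answer is yes, what do you think its complexity is?
2 - If the requested size is smaller than the original size, does realloc() copy the entries somewhere else, or does it leave them where they are since it is only decreasing the size of the array?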

空城仅有旧梦在 2024-07-17 09:13:22

Let's take a little closer look at memcpy and, while we're at it, at "big O" or Landau notation.

First, big-O. As I've talked about elsewhere, it's worth remembering the definition of big O, which is that some function g(n) is said to be O(f(n)) when there exists a constant k for which g(n) ≤ k·f(n) for all sufficiently large n. What the constant does is let you ignore the little details in favor of the important part. As everyone has noted, memcpy of n bytes will be O(n) on almost any normal architecture, because no matter what, you have to move those n bytes, one chunk at a time. So a first, naive implementation of memcpy in C could be written

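#include <stddef.h>

/* Naive byte-by-byte copy; renamed so it does not clash with the libc memcpy. */
void *
naive_memcpy(void *s1, const void *s2, size_t size){
    unsigned char *dst = s1;
    const unsigned char *src = s2;
    size_t ix;
    for(ix=0; ix < size; ix++)
        dst[ix] = src[ix];   /* one byte per loop step: n steps in total */
    return s1;
}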

This is in fact O(n), and might make you wonder why we even bother with a library routine. However, the thing about the libc functions is that they are the place where platform-specific utilities get written; if you want to optimize for the architecture, this is one of the places you can do it. So, depending on the architecture, there may be more efficient implementation options; for example, in the IBM 360 architecture, there is a MOVL instruction that moves data in big chunks using very highly optimized microcode. So in place of that loop, a 360 implementation of memcpy might instead look something like

LR 3,S1      LOAD S1 ADDR in Register 3
LR 4,S2      
MOVL 3,4,SIZE

(No guarantees that's exactly right 360 code by the way, but it'll serve for an illustration.) This implementation looks like instead of doing n steps around the loop as the C code did, it just executes 3 instructions.

What really happens, though, is that it's executing O(n) microinstructions under the covers. What's different between the two is the constant k; because the microcode is much faster, and because there are only three decode steps on the instructions, it is dramatically faster than the naive version, but it's still O(n) -- it's just that the constant is smaller.

And that's why you can make good use of memcpy -- it's not asymptotically faster, but the implementation is as fast as someone could make it on that particular architecture.
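
To make the constant-factor point concrete, here is a minimal sketch that counts loop steps for two copy strategies. The names copy_bytes and copy_chunks are hypothetical (they are not libc functions), and the 8-byte chunk width is just an assumption for illustration; a real memcpy picks its strategy per platform.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copies n bytes one at a time.
 * Returns the number of loop steps taken (always n). */
size_t copy_bytes(unsigned char *dst, const unsigned char *src, size_t n) {
    size_t steps = 0;
    for (size_t i = 0; i < n; i++) {
        dst[i] = src[i];
        steps++;
    }
    return steps;
}

/* Hypothetical helper: copies 8 bytes per step, then finishes the
 * remaining tail one byte at a time. Returns the number of loop steps. */
size_t copy_chunks(unsigned char *dst, const unsigned char *src, size_t n) {
    size_t steps = 0;
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        uint64_t w;
        /* Fixed-size copies through a local sidestep alignment and
         * aliasing problems; compilers typically emit one load and one store. */
        memcpy(&w, src + i, sizeof w);
        memcpy(dst + i, &w, sizeof w);
        steps++;
    }
    for (; i < n; i++) {
        dst[i] = src[i];
        steps++;
    }
    return steps;
}
```

For n = 1000, copy_bytes takes 1000 steps and copy_chunks takes 125. Doubling n doubles both counts, so both are O(n); only the constant differs.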

忆梦 2024-07-17 09:13:22

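1 - No. They copy a block at a time. See http://www.embedded.com/design/configurable-systems/4024961/Optimizing-Memcpy-improves-speed for a pretty good analysis.

2 - This is implementation dependent. See http://www.gnu.org/software/libtool/manual/libc/Changing-Block-Size.html for the glibc details: "In several allocation implementations, making a block smaller sometimes necessitates copying it".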

感情洁癖 2024-07-17 09:13:22

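1 - There is absolutely no way to copy N items faster than O(N). However, memcpy might be able to copy multiple items at once, or use special processor instructions, so it still might be faster than you could do it yourself.
2 - I don't know for sure, but I'd assume that the memory is completely reallocated. That's the safest assumption, and it's probably implementation dependent anyway.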

迷荒 2024-07-17 09:13:22

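1 - The performance of memcpy can't really be better than O(N), but it can be optimized so that it outperforms manual copying; for example, it might be able to copy 4 bytes in the time it takes you to copy 1 byte. Many memcpy implementations are written in assembly using optimized instructions that can copy multiple elements at a time, which is usually faster than copying data one byte at a time.
2 - I don't quite understand this question. If you use realloc to decrease the size of the memory and it succeeds (returns non-NULL), the new location will contain the same data as the old location, up to the size of the new request. If the memory location changed as a result of calling realloc (unusual when decreasing the size), the contents will have been copied; otherwise no copying needs to happen, because the memory hasn't moved.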
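
The shrinking case can be sketched as below. shrink_ints is a hypothetical helper written for this illustration, and whether the block actually moves is up to the allocator, so the sketch only reports it rather than predicting it. It assumes new_count is greater than zero.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: shrinks a heap block of ints to new_count elements.
 * Returns the (possibly moved) block, or NULL on failure, in which case
 * the original block is still valid and still owned by the caller.
 * If moved is non-NULL, *moved is set to 1 when the address changed. */
int *shrink_ints(int *block, size_t new_count, int *moved) {
    if (new_count == 0)
        return NULL;                        /* size 0 is deliberately not handled here */
    uintptr_t before = (uintptr_t)block;    /* remember the address before the call */
    int *result = realloc(block, new_count * sizeof *result);
    if (result == NULL)
        return NULL;
    if (moved != NULL)
        *moved = ((uintptr_t)result != before);
    return result;
}
```

Whatever the value of *moved turns out to be, the first new_count elements of the returned block hold the same values as before the call.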

白馒头 2024-07-17 09:13:22

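One can presume that memcpy could be written so that it moves a large number of bits at a time. For example, it's entirely possible to copy the data using SSE instructions, if that is advantageous.

As others have said, it won't be faster than O(n), but memory systems often have a preferred block size, and it's also possible to, say, write a whole cache line at a time.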
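
As a rough sketch of the SSE idea, the function below copies 16 bytes per step using SSE2 unaligned loads and stores when the compiler defines __SSE2__, and otherwise falls back to a plain byte loop. copy_sse2 is a hypothetical name, not a libc function, and this is not how any particular libc implements memcpy; the regions are assumed not to overlap.

```c
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics */
#endif

/* Hypothetical helper: copies n bytes from src to dst (no overlap allowed). */
void *copy_sse2(void *dst, const void *src, size_t n) {
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t i = 0;
#if defined(__SSE2__)
    /* 16 bytes per step; the "u" variants tolerate unaligned addresses. */
    for (; i + 16 <= n; i += 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)(s + i));
        _mm_storeu_si128((__m128i *)(d + i), v);
    }
#endif
    /* Tail bytes, or the whole copy when SSE2 is unavailable. */
    for (; i < n; i++)
        d[i] = s[i];
    return dst;
}
```

The loop still runs about n/16 times, so the cost stays linear in n; only the amount of work per step changes.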

一身软味 2024-07-17 09:13:22

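Presuming you are talking about glibc, and since your questions are implementation dependent, it's probably best just to check the source:

malloc.c

memcpy.c

The way I read it, the answers would be:

1 - O(N) --- there is no way to copy items in better than linear time.
2 - Occasionally, large items will be copied when realloc() is used to shrink them.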

雪若未夕 2024-07-17 09:13:22

The x86 has special instructions for scanning and matching a byte/word in a block of memory, as well as ones that can be used to copy a block of memory (it is a CISC CPU after all). A lot of C compilers that implement inline assembly language and a pragma to inline entire functions have for many, many years taken advantage of this in their library functions.

The ones used for memory copy are movsb/movsw in combination with the rep prefix.

CMPS/MOVS/SCAS/STOS
REP, REPE, REPNE, REPNZ, REPZ

Set up the registers with the source/target addresses and an integer count, and away you go.
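
As an illustration only, the register setup can be expressed with GCC-style extended inline assembly: the constraints place the destination in (E/R)DI, the source in (E/R)SI and the count in (E/R)CX, which is what rep movsb expects. copy_rep_movsb is a hypothetical name, the asm syntax is specific to GCC-compatible compilers on x86, and other targets fall back to a byte loop. It assumes the direction flag is clear, as the usual calling conventions require.

```c
#include <stddef.h>

/* Hypothetical helper: copies n bytes from src to dst (no overlap allowed). */
void *copy_rep_movsb(void *dst, const void *src, size_t n) {
    void *ret = dst;
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    /* "+D" = destination index, "+S" = source index, "+c" = count register.
     * All three are updated by the instruction, hence read-write operands. */
    __asm__ volatile("rep movsb"
                     : "+D"(dst), "+S"(src), "+c"(n)
                     :
                     : "memory");
#else
    /* Portable fallback for other compilers and architectures. */
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
#endif
    return ret;
}
```

The single instruction hides the loop, but the processor still moves n bytes, so the cost remains O(n).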

时光暖心i 2024-07-17 09:13:22

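Some important points related to realloc (checked on Dev-C++):
void *realloc(void *ptr, size_t size);

The realloc() function shall change the size of the memory object pointed to by ptr to the size specified by size.
The contents of the object shall remain unchanged up to the lesser of the new and old sizes.
If the new size is larger, the contents of the newly allocated portion of the object are unspecified.
If size is 0 and ptr is not a null pointer, the object pointed to is freed.
If ptr is a null pointer, realloc() shall be equivalent to malloc() for the specified size.
If ptr does not match a pointer returned earlier by calloc(), malloc(), or realloc(), or if the space has previously been deallocated by a call to free() or realloc(), the behavior is undefined.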
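
A minimal sketch of how these rules are usually applied in practice: resize through a temporary pointer, because when realloc() fails it returns NULL and leaves the original block allocated. resize_ints is a hypothetical helper for this illustration; it refuses a count of zero so that it never relies on the size-0 case.

```c
#include <stdlib.h>

/* Hypothetical helper: resizes *block to hold new_count ints.
 * *block may be NULL, in which case realloc() behaves like malloc().
 * Returns 1 on success, 0 on failure (with *block left untouched). */
int resize_ints(int **block, size_t new_count) {
    /* Reject zero and any count whose byte size would overflow size_t. */
    if (new_count == 0 || new_count > (size_t)-1 / sizeof **block)
        return 0;
    int *tmp = realloc(*block, new_count * sizeof **block);
    if (tmp == NULL)
        return 0;        /* old block is still valid and still owned by the caller */
    *block = tmp;        /* elements up to min(old, new) count are preserved */
    return 1;
}
```

After a successful grow, only the elements that existed before the call have defined values; the newly added ones must be written before they are read.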