How do realloc and memcpy work?

Posted 2024-07-10 09:13:22

I have two questions.

  1. Do realloc() and memcpy() copy the entries of one array into another in a way that is faster than simply iterating over each element, which is O(N)? If so, what is their complexity?

  2. If the requested size is smaller than the original size, does realloc() copy the entries somewhere else, or does it leave them where they are and just decrease the size of the array?

Comments (8)

空城仅有旧梦在 2024-07-17 09:13:22

Let's take a little closer look at memcpy and, while we're at it, at "big O" or Landau notation.

First, big-O. As I've talked about elsewhere, it's worth remembering the definition of big O, which is that some function g(n) is said to be O(f(n)) when there exists a constant k for which g(n) ≤ k·f(n) for all sufficiently large n. What the constant does is let you ignore the little details in favor of the important part. As everyone has noted, memcpy of n bytes will be O(n) on almost any normal architecture, because no matter what, you have to move those n bytes, one chunk at a time. So a first, naive implementation of memcpy in C could be written

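/* Naive byte-by-byte copy; named naive_memcpy so it does not clash with libc's memcpy. */
unsigned char *
naive_memcpy(unsigned char * s1, const unsigned char * s2, long size){
    long ix;
    for(ix=0; ix < size; ix++)   /* one byte per iteration: n steps in total */
        s1[ix] = s2[ix];
    return s1;
}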

This is in fact O(n), and might make you wonder why we even bother with a library routine. However, the thing about the libc functions is that they are the place where platform-specific utilities get written; if you want to optimize for the architecture, this is one of the places you can do it. So, depending on the architecture, there may be more efficient implementation options; for example, in the IBM 360 architecture, there is a MOVL instruction that moves data in big chunks using very highly optimized microcode. So in place of that loop, a 360 implementation of memcpy might instead look something like

LR 3,S1      LOAD S1 ADDR in Register 3
LR 4,S2      
MOVL 3,4,SIZE

(No guarantees that's exactly right 360 code, by the way, but it'll serve as an illustration.) Instead of doing n steps around the loop as the C code did, this implementation looks like it executes just 3 instructions.

What really happens, though, is that it's executing O(n) microinstructions under the covers. What's different between the two is the constant k; because the microcode is much faster, and because there are only three decode steps on the instructions, it is dramatically faster than the naive version, but it's still O(n) -- it's just that the constant is smaller.

And that's why you can make good use of memcpy -- it's not asymptotically faster, but the implementation is as fast as someone could make it on that particular architecture.
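
To make the point about the constant concrete, here is a minimal sketch (not how any real libc does it) that counts loop iterations for a byte-at-a-time copy and for a copy that handles 8 bytes per iteration. The names copy_bytes and copy_blocks and the block size of 8 are assumptions chosen purely for illustration; the inner fixed-length loop stands in for a single wide move.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copy one byte per loop iteration.
   Returns the number of iterations performed. */
static size_t copy_bytes(unsigned char *dst, const unsigned char *src, size_t n)
{
    assert(n == 0 || (dst != NULL && src != NULL));
    size_t steps = 0;
    for (size_t i = 0; i < n; i++) {
        dst[i] = src[i];
        steps++;
    }
    return steps;
}

/* Hypothetical helper: copy 8 bytes per outer iteration, then finish
   the remaining 0..7 bytes one at a time. Returns the number of outer
   iterations plus the number of tail bytes. */
static size_t copy_blocks(unsigned char *dst, const unsigned char *src, size_t n)
{
    assert(n == 0 || (dst != NULL && src != NULL));
    size_t steps = 0;
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        /* Stand-in for one wide load/store of 8 bytes. */
        for (size_t k = 0; k < 8; k++)
            dst[i + k] = src[i + k];
        steps++;
    }
    for (; i < n; i++) {      /* tail bytes */
        dst[i] = src[i];
        steps++;
    }
    return steps;
}
```

For n = 100, copy_bytes reports 100 steps and copy_blocks reports 16 (12 blocks plus 4 tail bytes). Both counts grow linearly with n; only the constant differs.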

忆梦 2024-07-17 09:13:22

1 - No. They copy a block at a time. See http://www.embedded.com/design/configurable-systems/4024961/Optimizing-Memcpy-improves-speed for a pretty good analysis.

2 - This is implementation dependent. See http://www.gnu.org/software/libtool/manual/libc/Changing-Block-Size.html for glibc details. "In several allocation implementations, making a block smaller sometimes necessitates copying it"

感情洁癖 2024-07-17 09:13:22
  1. There is absolutely no way to copy N items faster than O(N). However, it might be able to copy multiple items at once, or use special processor instructions - so it still might be faster than you could do it yourself.
  2. I don't know for sure, but I'd assume that the memory is completely reallocated. That's the safest assumption, and it's probably implementation dependent anyway.
迷荒 2024-07-17 09:13:22
  1. The performance of memcpy can't really be better than O(N) but it can be optimized so that it outperforms manual copying; for example, it might be able to copy 4 bytes in the time it takes you to copy 1 byte. Many memcpy implementations are written in assembly using optimized instructions that can copy multiple elements at a time which is usually faster than copying data one byte at a time.

  2. I don't quite understand this question. If you use realloc to decrease the size of a block and it succeeds (returns non-NULL), the new location will contain the same data as the old location, up to the size of the new request. If the memory location changed as a result of calling realloc (not usual when decreasing the size), the contents will have been copied; otherwise no copying needs to happen, because the memory hasn't moved.
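
The second point can be checked with a small sketch. The function name shrink_keeps_prefix and the fill pattern are assumptions made for illustration; whether the block actually moves is up to the allocator, so the code only reports it and does not rely on it.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical demo: shrink a block with realloc() and check that the
   first new_size bytes survive. Returns 1 if they do, 0 if they do not,
   and -1 if an allocation failed. If moved is non-NULL, *moved is set
   to 1 when the block's address changed and to 0 when it stayed put. */
static int shrink_keeps_prefix(size_t old_size, size_t new_size, int *moved)
{
    assert(new_size > 0 && new_size <= old_size);

    unsigned char *p = malloc(old_size);
    if (p == NULL)
        return -1;
    for (size_t i = 0; i < old_size; i++)
        p[i] = (unsigned char)(i & 0xff);

    /* Save the address as an integer: after a successful realloc()
       the old pointer value must not be used any more. */
    uintptr_t before = (uintptr_t)p;

    unsigned char *q = realloc(p, new_size);
    if (q == NULL) {          /* on failure the original block is still valid */
        free(p);
        return -1;
    }
    if (moved != NULL)
        *moved = ((uintptr_t)q != before);

    int ok = 1;
    for (size_t i = 0; i < new_size; i++)
        if (q[i] != (unsigned char)(i & 0xff))
            ok = 0;

    free(q);
    return ok;
}
```

Calling shrink_keeps_prefix(4096, 16, &moved) should return 1 on any conforming implementation, while the value left in moved depends on the allocator.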

白馒头 2024-07-17 09:13:22
  1. It is reasonable to expect that memcpy is written so that it moves many bits at a time. For example, it's entirely possible to copy the data using SSE instructions, if that is advantageous.

As others said, it won't be faster than O(n), but memory systems often have a preferred block size, and it's also possible to, say, write a whole cache line at a time.

一身软味 2024-07-17 09:13:22

Presuming you are talking about glibc, and since your questions are implementation dependent, it's probably best just to check the source:

malloc.c

memcpy.c

The way I read it, the answers would be:

  1. O(N) --- there is no way to copy items in better than linear time.
  2. Occasionally large items will be copied when realloc() is used to shrink them.
雪若未夕 2024-07-17 09:13:22

The x86 has special instructions for scanning and matching a byte/word in a block of memory, as well as one that can be used to copy a block of memory (it is a CISC CPU after all). A lot of C compilers that implement inline assembly language and a pragma to inline entire functions have for many years taken advantage of this in their library functions.

The ones used for memory copies are movsb/movsw in combination with the rep prefix.

CMPS/MOVS/SCAS/STOS
REP, REPE, REPNE, REPNZ, REPZ

Set up the registers with the source/target addresses and the count, and away you go.
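
As a rough sketch of that register setup, the following uses GCC/Clang extended inline assembly to issue rep movsb. The function name copy_rep_movsb is hypothetical, the code assumes the direction flag is clear (as the usual x86 calling conventions require), and on other compilers or architectures it simply falls back to memcpy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy n bytes with "rep movsb" where available. */
static void *copy_rep_movsb(void *dst, const void *src, size_t n)
{
    assert(n == 0 || (dst != NULL && src != NULL));
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    void *d = dst;
    /* "+D" = destination in (R|E)DI, "+S" = source in (R|E)SI,
       "+c" = byte count in (R|E)CX; the instruction updates all three. */
    __asm__ volatile("rep movsb"
                     : "+D"(d), "+S"(src), "+c"(n)
                     :
                     : "memory");
    return dst;
#else
    /* Assumption: no x86 inline assembly available, use the library. */
    return memcpy(dst, src, n);
#endif
}
```

The hardware still moves n bytes, so this is O(n) like every other variant; it only changes who runs the loop.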

时光暖心i 2024-07-17 09:13:22

Some important points about realloc() (checked with Dev-C++):
void *realloc(void *ptr, size_t size);

  1. The realloc() function shall change the size of the memory object pointed to by ptr to the size specified by size.

  2. The contents of the object shall remain unchanged up to the lesser of the new and old sizes.

  3. If the new size is larger, the contents of the newly allocated portion of the object are unspecified.

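  4. If size is 0 and ptr is not a null pointer, the object pointed to is freed. (This is the older POSIX wording; C17 makes the zero-size case implementation-defined and C23 makes it undefined, so calling free() directly is the safer choice.)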

  5. If ptr is a null pointer, realloc() shall be equivalent to malloc() for the specified size.

  6. If ptr does not match a pointer returned earlier by calloc(), malloc(), or realloc() or if the space has previously been deallocated by a call to free() or realloc(), the behavior is undefined.
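
These rules suggest a common calling pattern, sketched below. The helper name resize_ints is hypothetical; the key idea is to store the result of realloc() in a temporary so that the original block is not leaked when the call fails.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: resize *buf so it can hold new_count ints.
   *buf may be NULL, in which case realloc() behaves like malloc().
   Returns 0 on success, -1 on failure (then *buf is left unchanged). */
static int resize_ints(int **buf, size_t new_count)
{
    assert(buf != NULL && new_count > 0);

    if (new_count > SIZE_MAX / sizeof **buf)   /* size would overflow */
        return -1;

    int *tmp = realloc(*buf, new_count * sizeof **buf);
    if (tmp == NULL)
        return -1;            /* old block is still valid and still owned */

    *buf = tmp;               /* contents preserved up to the smaller size */
    return 0;
}
```

Starting from int *v = NULL, a call such as resize_ints(&v, 4) allocates, later calls grow or shrink the same buffer, and free(v) releases it at the end.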
