C dynamic allocation speed question
I'm using this code to dynamically create a 2d array:
char **FileTables;
int rows = 1000;
int i;
FileTables = (char**)malloc(rows * sizeof(char));
for (i = 0; i < rows; i++) {
FileTables[i] = (char*)malloc(256 * sizeof(char));
}
The problem is that with 1000 rows (and there could be more), it takes a couple of seconds to allocate all the memory.
Is there any faster/better way of doing this?
EDIT:
Is there an advantage to using one of these methods over the other, besides the obvious simpler code?
char **FileTables;
int rows = 1000;
int i;
FileTables = malloc(rows * sizeof(char*));
FileTables[0] = malloc(rows * 256 * sizeof(char));
for (i = 0; i < rows; i++) {
FileTables[i] = FileTables[0] + i * 256;
}
And..
char (*FileTables)[256];
int rows = 1000;
FileTables = malloc(rows * sizeof(*FileTables));
(And yes, I fixed the unnecessary casting)
Comments (7)
You could get away with just two allocations and some pointer arithmetic:
Also note that I fixed a bug in malloc(rows * sizeof(char)) (the sizeof(char) should be sizeof(char*), since you're allocating an array of pointers to char).
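The code that went with this answer is not preserved here. As a rough sketch of the two-allocation idea, assuming hypothetical helper names alloc_table and free_table (they are not from the original post), it might look like this. Overflow checks on rows * cols are left out for brevity.

```c
#include <stdlib.h>

/* Hypothetical helper: builds a rows x cols char table using exactly
 * two allocations. Returns NULL on failure or if a dimension is zero. */
char **alloc_table(size_t rows, size_t cols)
{
    if (rows == 0 || cols == 0)
        return NULL;

    /* Allocation 1: the array of row pointers. Note sizeof *table is
     * sizeof(char *), not sizeof(char). */
    char **table = malloc(rows * sizeof *table);
    if (table == NULL)
        return NULL;

    /* Allocation 2: one contiguous block that holds every row. */
    table[0] = malloc(rows * cols);
    if (table[0] == NULL) {
        free(table);
        return NULL;
    }

    /* Point each remaining row at its slice of the block. */
    for (size_t i = 1; i < rows; i++)
        table[i] = table[0] + i * cols;

    return table;
}

/* Hypothetical helper: releases both allocations made by alloc_table. */
void free_table(char **table)
{
    if (table != NULL) {
        free(table[0]);   /* the data block */
        free(table);      /* the pointer array */
    }
}
```

Cleanup is two free() calls regardless of the row count, and table[i][j] indexing keeps working as before.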
As long as the number of columns is constant, or if you're using C99, you can get away with a single malloc without having to do ugly row/column addressing arithmetic yourself:
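The constant-column case is the char (*FileTables)[256] form from the question's edit. For the C99 case, here is a small sketch assuming a compiler that supports variably modified types (optional since C11, but available in GCC and Clang). The function name demo_single_malloc and its parameters are made up for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical demo: allocates a rows x cols table with one malloc,
 * stores `name` (truncated to fit) in row r, and returns the stored
 * length. Returns 0 if the arguments are invalid or malloc fails. */
size_t demo_single_malloc(size_t rows, size_t cols, size_t r, const char *name)
{
    if (rows == 0 || cols == 0 || r >= rows || name == NULL)
        return 0;

    /* table points to arrays of `cols` chars; cols is a runtime value,
     * so this is a pointer to a variable length array type (C99). */
    char (*table)[cols] = malloc(rows * sizeof *table);
    if (table == NULL)
        return 0;

    /* table[r] is a whole row; the compiler does the r * cols math. */
    strncpy(table[r], name, cols - 1);
    table[r][cols - 1] = '\0';

    size_t len = strlen(table[r]);
    free(table);   /* one allocation, one free */
    return len;
}
```

Because the type of *table depends on cols, sizeof *table is evaluated at run time and equals cols bytes per row.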
If the array is always of size row × 256, then you might consider a one-dimensional array, malloc(row * 256), and access it in strides:
This avoids multiple allocations and gives better memory locality. On top of that, you can pick row or column ordering to micro-optimize.
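The strided access mentioned above could be sketched roughly as follows, assuming row-major order. COLS, alloc_flat and cell_index are illustrative names, not taken from the answer.

```c
#include <stdlib.h>

enum { COLS = 256 };   /* fixed row width, as in the question */

/* Hypothetical helper: one zero-initialised block for the whole table. */
char *alloc_flat(size_t rows)
{
    return calloc(rows, COLS);
}

/* Row-major stride: element (row, col) lives at row * COLS + col.
 * Swapping the roles of row and col would give column-major order. */
static inline size_t cell_index(size_t row, size_t col)
{
    return row * COLS + col;
}
```

A caller would write table[cell_index(i, j)] wherever the two-level version has FileTables[i][j], and release everything with a single free(table).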
Should be a better solution.
I don't believe you will get anywhere near seconds. Increasing the rows to 10 million is still under a second on my machine.
However, if you want to minimise allocations, you only need one.
A more efficient way to do this is to avoid the second level of indirection.
This avoids a pointer lookup, as the C compiler can calculate the rest.
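The snippets from this answer are missing here, so the following is only one plausible reading of "you only need one": a single block that holds the row pointers followed by the row data. The name alloc_table_one_block is hypothetical. The version without the second level of indirection is the char (*FileTables)[256] form already shown in the question's edit.

```c
#include <stdlib.h>

/* Hypothetical helper: one malloc holding the pointer table first and
 * the rows * cols bytes of row data right after it. */
char **alloc_table_one_block(size_t rows, size_t cols)
{
    if (rows == 0 || cols == 0)
        return NULL;

    char **table = malloc(rows * sizeof *table + rows * cols);
    if (table == NULL)
        return NULL;

    /* The data area starts just past the last row pointer. This is
     * safe for char, which has no alignment requirement; other element
     * types would need the offset rounded up to their alignment. */
    char *data = (char *)(table + rows);

    for (size_t i = 0; i < rows; i++)
        table[i] = data + i * cols;

    return table;
}
```

The whole table is released with one free(table), while table[i][j] still goes through a pointer lookup for each row.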
First of all, are you sure the memory allocation is the problem? Allocating 1000 blocks of memory should generally not take a few seconds.
You could look into alternative malloc implementations if you have particular needs (e.g., Google's tcmalloc if you are allocating memory in threads).
Otherwise, the real "slow" part of malloc is actually getting memory from the OS (with sbrk() or mmap()), and most malloc implementations will grab a big chunk at a time and give it back in smaller pieces, so there are not 1000 calls to allocate 1k each here; there are maybe 60 calls to allocate 16k. Running the program under strace or similar may give you an idea of how many slow system calls are really being made. You could implement similar behavior yourself by making a single call to allocate 256K and subdividing that into smaller chunks. You could also try allocating a big chunk of memory and then immediately free()-ing it, hoping that the library malloc holds onto that memory and doesn't go back to the OS for more.
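Before changing the allocation scheme, it may be worth checking that the per-row mallocs really are the slow part. A minimal timing sketch using the standard clock() function is below. The function name time_row_allocs is made up, and clock() measures CPU time, so it would not show time spent waiting on something else such as file I/O.

```c
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper: times `rows` separate mallocs of `cols` bytes.
 * Returns elapsed CPU seconds, or -1.0 if an allocation fails or a
 * dimension is zero. */
double time_row_allocs(size_t rows, size_t cols)
{
    if (rows == 0 || cols == 0)
        return -1.0;

    char **table = malloc(rows * sizeof *table);
    if (table == NULL)
        return -1.0;

    clock_t start = clock();
    size_t done = 0;
    for (; done < rows; done++) {
        table[done] = malloc(cols);
        if (table[done] == NULL)
            break;
        table[done][0] = '\0';   /* touch the block so it is really used */
    }
    clock_t end = clock();

    /* Free whatever was allocated, outside the timed region. */
    for (size_t i = 0; i < done; i++)
        free(table[i]);
    free(table);

    if (done != rows)
        return -1.0;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

If this reports a tiny fraction of a second for 1000 rows, the seconds are probably being spent elsewhere in the program.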
This really looks like premature optimization, because you are asking for faster but haven't indicated how fast is fast enough. Still, if you really need to do it this way...
Tips to speed up allocation:
As you can see, if you need 10M allocated, these tips soon become conflicting. To determine the right balance between smaller and fewer allocations, one needs to do profiling.
Look at your memory block size and allocate whole pages of memory at once. It's an old hardware hack, but it does guarantee that you don't ask for multiple pages of contiguous memory at once (which speeds up selection from the free page lists), and it also guarantees that you don't waste cycles or address space by asking for addresses already reserved by the block reservation subsystem of the memory manager.
If that doesn't get you the performance you need, then rewrite the code to not require allocation the way it's been presented.
Either way, it's not possible to guarantee optimum allocation speed without detailed knowledge of how the memory management subsystem on your computer is actually designed.
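As a small illustration of the "whole pages" tip, the request size can be rounded up to a multiple of the page size. This sketch assumes a POSIX system, where sysconf(_SC_PAGESIZE) reports the page size. The helper names and the 4096-byte fallback are assumptions for illustration. Note that malloc keeps its own bookkeeping, so a page-sized request is not necessarily page-aligned.

```c
#include <stddef.h>
#include <unistd.h>   /* sysconf, _SC_PAGESIZE (POSIX only) */

/* Hypothetical helper: rounds `bytes` up to a whole number of pages.
 * page_size must be non-zero. */
size_t round_up_to_pages(size_t bytes, size_t page_size)
{
    size_t pages = (bytes + page_size - 1) / page_size;
    return pages * page_size;
}

/* Hypothetical helper: asks the OS for its page size. The 4096-byte
 * fallback is an assumption used only if sysconf reports an error. */
size_t system_page_size(void)
{
    long ps = sysconf(_SC_PAGESIZE);
    return ps > 0 ? (size_t)ps : 4096;
}
```

For the 1000 × 256 table from the question, round_up_to_pages(1000 * 256, system_page_size()) gives the size to request in one call. Whether this helps at all depends on the allocator, which is why profiling comes first.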