如何生成具有不同“分支”的二维数组长度非常快
我是一名德尔福程序员。 在程序中,我必须生成具有不同“分支”长度的二维数组。 它们非常大,操作需要几秒钟(烦人)。
例如:
var a: array of array of Word;
i: Integer;
begin
SetLength(a, 5000000);
for i := 0 to 4999999 do
SetLength(a[i], Diff_Values);
end;
我知道命令 SetLength(a, dim1, dim2) 但不适用。甚至没有为dim2设置最小值(> 0)并从那里继续,因为dim2的最小值为0(某些“分支”可以为空)。
那么,有没有办法让它快一点呢?不只是 5..10%,而且真的很快...
谢谢。
I am a Delphi programmer.
In a program I have to generate bidimensional arrays with different "branch" lengths.
They are very big and the operation takes a few seconds (annoying).
For example:
var a: array of array of Word;
i: Integer;
begin
SetLength(a, 5000000);
for i := 0 to 4999999 do
SetLength(a[i], Diff_Values);
end;
I am aware of the command SetLength(a, dim1, dim2) but is not applicable. Not even setting a min value (> 0) for dim2 and continuing from there because min of dim2 is 0 (some "branches" can be empty).
So, is there a way to make it fast? Not just by 5..10% but really FAST...
Thank you.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(4)
处理大量数据时,需要完成大量工作,这对完成这些工作所需的时间设置了理论上的最小值。
500 万次迭代中的每一次,您需要:
步骤 1 完全在您的控制之下,并且可以进行优化。不过,如果您使用的是现代版本的 Delphi,则 2 和 3 的速度大约是最快的。 (如果您使用的是旧版本,您可能会受益于安装 FastMM 和 FastCode,它们可以加快这些操作的速度。)
如果合适的话,您可以做的另一件事是延迟初始化。不要尝试一次分配所有 500 万个数组,只需首先执行
SetLength(a, 5000000);
即可。然后,当你需要到达一个“分支”时,首先检查它的长度是否= 0。如果是,则它尚未初始化,因此将其初始化为适当的长度。这并不能节省总体时间,事实上,它总共花费的时间会稍长一些,但它确实分散了初始化时间,因此用户不会注意到。如果你的初始化已经尽可能快,并且你的情况是这里不能使用延迟初始化,那么你基本上就不走运了。这就是处理大量数据的代价。
When dealing with a large amount of data, there's a lot of work that has to be done, and this places a theoretical minimum on the amount of time it can be done in.
For each of 5 million iterations, you need to:
Step 1 is completely under your control and can possibly be optimized. 2 and 3, though, are about as fast as they're gonna get if you're using a modern version of Delphi. (If you're on an old version, you might benefit from installing FastMM and FastCode, which can speed up these operations.)
The other thing you might do, if appropriate, is lazy initialization. Instead of trying to allocate all 5 million arrays at once, just do the
SetLength(a, 5000000);
at first. Then when you need to get at a "branch", first check if its length = 0. If so, it hasn't been initialized, so initialize it to the proper length. This doesn't save time overall, in fact it will take slightly longer in total, but it does spread out the initialization time so the user doesn't notice.If your initialization is already as fast as it will get, and your situation is such that lazy initialization can't be used here, then you're basically out of luck. That's the price of dealing with large amounts of data.
我刚刚测试了您的确切代码,使用
Diff_Values
常量,使用GetTickCount()
对其进行计时以进行基本计时。如果Diff_Values
为186
,则需要 1466 毫秒,如果Diff_Values
为187
,则失败并显示内存不足
。您知道,内存不足
意味着地址空间不足
,而不是真正的内存不足
。在我看来,您分配的数据太多,导致 RAM 耗尽,Windows 开始分页,这就是速度缓慢的原因。在我的系统上,我有足够的 RAM 供进程分配所需的内存;确实如此,直到失败。
可能的解决方案
I just tested your exact code, with a constant for
Diff_Values
, timed it usingGetTickCount()
for rudimentary timing. IfDiff_Values
is186
it takes 1466 milliseconds, ifDiff_Values
is187
it fails withOut of Memory
. You know,Out of Memory
meansOut of Address Space
, not reallyOut of Memory
.In my opinion you're allocating so much data you run out of RAM and Windows starts paging, that's why it's slow. On my system I've got enough RAM for the process to allocate as much as it wants; And it does, until it fails.
Possible solutions
除了梅森的观点之外,这里还有一些需要考虑的想法:
如果分支长度在分配后永远不会改变,并且您对将存储在所有分支的数组中的项目总数有一个上限,那么您通过分配一大块内存并自己分配该内存块中的“分支”,也许可以节省一些时间。您的数组将成为一维指针数组,该数组中的每个条目都指向该分支的数据的开头。您可以使用单个指针变量跟踪大块中已用空间的“结束”,并且当您需要为新“分支”保留空间时,您可以将当前“结束”指针值作为新“分支”的开始分支并将“结束”指针增加分支所需的空间量。不要忘记四舍五入到双字边界以避免错位惩罚。
这种技术需要更多地使用指针,但它提供了消除所有堆分配开销的潜力,或者至少用与您的特定使用模式相匹配的专门构建的非常简单、非常快速的子分配器来替换通用堆分配。它的执行速度应该更快,但需要更多的时间来编写和测试。
此技术还将避免堆碎片,并将所有内存的释放减少到一次释放(而不是当前模型中的数百万次单独分配)。
另一个需要考虑的提示:如果您对每个新分配的数组“分支”所做的第一件事是将数据分配到每个槽中,那么您可以消除 Mason 示例中的步骤 3 - 如果全部都将内存清零,则无需将内存清零您要做的就是立即将真实数据分配给它。这将使您的内存写入操作减少一半。
In addition to Mason's points, here are some more ideas to consider:
If the branch lengths never change after they are allocated, and you have an upper bound on the total number of items that will be stored in the array across all branches, then you might be able to save some time by allocating one huge chunk of memory and divvying up the "branches" within that chunk yourself. Your array would become a 1 dimensional array of pointers, and each entry in that array points to the start of the data for that branch. You keep track of the "end" of the used space in your big block with a single pointer variable, and when you need to reserve space for a new "branch" you take the current "end" pointer value as the start of the new branch and increment the "end" pointer by the amount of space that branch requires. Don't forget to round up to dword boundaries to avoid misalignment penalties.
This technique will require more use of pointers, but it offers the potential of eliminating all the heap allocation overhead, or at least replacing the general purpose heap allocation with a purpose-built very simple, very fast suballocator that matches your specific use pattern. It should be faster to execute, but it will require more time to write and test.
This technique will also avoid heap fragmentation and reduces the releasing of all the memory to a single deallocation (instead of millions of separate allocations in your present model).
Another tip to consider: If the first thing you always do with the each newly allocated array "branch" is assign data into every slot, then you can eliminate step 3 in Mason's example - you don't need to zero out the memory if all you're going to do is immediately assign real data into it. This will cut your memory write operations by half.
假设您可以将整个数据结构放入连续的内存块中,则可以一次性完成分配,然后接管索引。
注意:即使您无法将数据放入单个连续的内存块中,您仍然可以通过分配多个大块然后将它们拼凑在一起来使用此技术。
首先形成一个辅助数组
colIndex
,它包含每行第一列的索引。将colIndex
的长度设置为RowCount+1
。您可以通过设置colIndex[0] := 0
然后设置colIndex[i+1] := colIndex[i] + ColCount[i]
来构建它。在 for 循环中执行此操作,该循环运行到并包括RowCount
。因此,在最后一个条目colIndex[RowCount]
中,您存储元素的总数。现在将 a 的长度设置为
colIndex[RowCount]
。这可能需要一些时间,但会比您之前所做的更快。现在您需要编写几个索引器。将它们放入一个类或一个记录中。
getter 看起来像这样:
setter 很明显。您可以内联这些访问方法以提高性能。将它们公开为索引属性以方便对象的客户端。
您需要添加一些代码来检查
row
和col
的有效性。对于后者,您需要使用colIndex
。如果您想模仿本机索引的范围检查,可以使用{$IFOPT R+}
将此检查设置为可选。当然,如果您想在初始实例化后更改任何列计数,那么这是完全行不通的!
Assuming you can fit the entire data structure into a contiguous block of memory, you can do the allocation in one shot and then take over the indexing.
Note: Even if you can't fit the data into a single contiguous block of memory, you can still use this technique by allocating multiple large blocks and then piecing them together.
First off form a helper array,
colIndex
, which is to contain the index of the first column of each row. Set the length ofcolIndex
toRowCount+1
. You build this by settingcolIndex[0] := 0
and thencolIndex[i+1] := colIndex[i] + ColCount[i]
. Do this in a for loop which runs up to and includingRowCount
. So, in the final entry,colIndex[RowCount]
, you store the total number of elements.Now set the length of a to be
colIndex[RowCount]
. This may take a little while, but it will be quicker than what you were doing before.Now you need to write a couple of indexers. Put them in a class or a record.
The getter looks like this:
The setter is obvious. You can inline these access methods for increased performance. Expose them as an indexed property for convenience to the object's clients.
You'll want to add some code to check for validity of
row
andcol
. You need to usecolIndex
for the latter. You can make this checking optional with{$IFOPT R+}
if you want to mimic range checking for native indexing.Of course, this is a total non-starter if you want to change any of your column counts after the initial instantiation!