How can I resolve this EOutOfMemory exception when encoding a very large file?
I am using Delphi 2009 with Unicode strings.
I'm trying to encode a very large file to convert it to Unicode:
    var
      Buffer: TBytes;
      Value: string;

    Value := Encoding.GetString(Buffer);
This works fine for a 40 MB Buffer: the data doubles in size and Value comes back as an 80 MB Unicode string.
When I try this with a 300 MB Buffer, it gives me an EOutOfMemory exception.
Well, that wasn't totally unexpected. But I decided to trace it through anyway.
It goes into the DynArraySetLength procedure in the System unit. In that procedure, it goes to the heap and calls ReallocMem. To my surprise, it successfully allocates 665,124,864 bytes!!!
But then towards the end of DynArraySetLength, it calls FillChar:
    // Set the new memory to all zero bits
    FillChar((PAnsiChar(p) + elSize * oldLength)^, elSize * (newLength - oldLength), 0);
You can see by the comment what that is supposed to do. There is not much to that routine, but that is the routine that causes the EOutOfMemory exception. Here is FillChar from the System unit:
    procedure _FillChar(var Dest; count: Integer; Value: Char);
    {$IFDEF PUREPASCAL}
    var
      I: Integer;
      P: PAnsiChar;
    begin
      P := PAnsiChar(@Dest);
      for I := count-1 downto 0 do
        P[I] := Value;
    end;
    {$ELSE}
    asm                                  // Size = 153 Bytes
            CMP   EDX, 32
            MOV   CH, CL                 // Copy Value into both Bytes of CX
            JL    @@Small
            MOV   [EAX  ], CX            // Fill First 8 Bytes
            MOV   [EAX+2], CX
            MOV   [EAX+4], CX
            MOV   [EAX+6], CX
            SUB   EDX, 16
            FLD   QWORD PTR [EAX]
            FST   QWORD PTR [EAX+EDX]    // Fill Last 16 Bytes
            FST   QWORD PTR [EAX+EDX+8]
            MOV   ECX, EAX
            AND   ECX, 7                 // 8-Byte Align Writes
            SUB   ECX, 8
            SUB   EAX, ECX
            ADD   EDX, ECX
            ADD   EAX, EDX
            NEG   EDX
    @@Loop:
            FST   QWORD PTR [EAX+EDX]    // Fill 16 Bytes per Loop
            FST   QWORD PTR [EAX+EDX+8]
            ADD   EDX, 16
            JL    @@Loop
            FFREE ST(0)
            FINCSTP
            RET
            NOP
            NOP
            NOP
    @@Small:
            TEST  EDX, EDX
            JLE   @@Done
            MOV   [EAX+EDX-1], CL        // Fill Last Byte
            AND   EDX, -2                // No. of Words to Fill
            NEG   EDX
            LEA   EDX, [@@SmallFill + 60 + EDX * 2]
            JMP   EDX
            NOP                          // Align Jump Destinations
            NOP
    @@SmallFill:
            MOV   [EAX+28], CX
            MOV   [EAX+26], CX
            MOV   [EAX+24], CX
            MOV   [EAX+22], CX
            MOV   [EAX+20], CX
            MOV   [EAX+18], CX
            MOV   [EAX+16], CX
            MOV   [EAX+14], CX
            MOV   [EAX+12], CX
            MOV   [EAX+10], CX
            MOV   [EAX+ 8], CX
            MOV   [EAX+ 6], CX
            MOV   [EAX+ 4], CX
            MOV   [EAX+ 2], CX
            MOV   [EAX   ], CX
            RET                          // DO NOT REMOVE - This is for Alignment
    @@Done:
    end;
    {$ENDIF}
So my memory was allocated, but it crashed trying to fill it with zeros. This doesn't make sense to me. As far as I'm concerned, the memory doesn't even need to be filled with zeros - and that is probably a time waster anyhow - since the Encoding statement is about to fill it anyway.
Can I somehow prevent Delphi from doing the memory fill?
Or is there some other way I can get Delphi to allocate this memory successfully for me?
My real goal is to do that Encoding statement for my very large file, so any solution that will allow this would be much appreciated.
Conclusion: See my comments on the answers.
This is a warning to be careful in debugging assembler code. Make sure you break on all the "RET" lines, since I missed the one in the middle of the FillChar routine and erroneously concluded that FillChar caused the problem. Thanks Mason, for pointing this out.
I will have to break the input into chunks to handle the very large file.
Comments (4)
FillChar isn't allocating any memory, so that's not your problem. Try tracing into it and placing breakpoints at the RET statements, and you'll see that the FillChar finishes. Whatever the problem is, it's probably in a later step.
Read a chunk from the file, encode and write to another file, repeat.
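A minimal sketch of that loop in Delphi 2009, under a few assumptions that are not in the original posts: the source file uses a single-byte encoding (so a chunk boundary can never split a character), the output should be UTF-16LE without a byte order mark, and the names ConvertFileToUnicode and ChunkSize are made up for illustration. For a multi-byte source such as UTF-8, the chunk boundaries would need extra handling that is not shown here.

```pascal
program ChunkedEncode;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

const
  // Hypothetical chunk size; any reasonably small value should do.
  ChunkSize = 64 * 1024;

// Converts InFile (bytes in the Source encoding) to OutFile as UTF-16LE,
// one chunk at a time. ASSUMPTION: Source is a single-byte encoding, so
// splitting the input at arbitrary byte offsets is safe.
procedure ConvertFileToUnicode(const InFile, OutFile: string; Source: TEncoding);
var
  Input, Output: TFileStream;
  Buffer, Encoded: TBytes;
  BytesRead: Integer;
  Value: string;
begin
  Input := TFileStream.Create(InFile, fmOpenRead or fmShareDenyWrite);
  try
    Output := TFileStream.Create(OutFile, fmCreate);
    try
      // Allocate the read buffer once and reuse it for every chunk.
      SetLength(Buffer, ChunkSize);
      repeat
        BytesRead := Input.Read(Buffer[0], ChunkSize);
        if BytesRead > 0 then
        begin
          // Decode only the bytes actually read in this pass.
          Value := Source.GetString(Buffer, 0, BytesRead);
          Encoded := TEncoding.Unicode.GetBytes(Value);
          if Length(Encoded) > 0 then
            Output.WriteBuffer(Encoded[0], Length(Encoded));
        end;
      until BytesRead = 0;
    finally
      Output.Free;
    end;
  finally
    Input.Free;
  end;
end;

begin
  if ParamCount = 2 then
    // TEncoding.Default (the system ANSI code page) is an assumed source encoding.
    ConvertFileToUnicode(ParamStr(1), ParamStr(2), TEncoding.Default)
  else
    Writeln('Usage: ChunkedEncode <input file> <output file>');
end.
```

Peak memory here is roughly the read buffer plus one decoded chunk, regardless of file size, so the 300 MB case never needs a single large allocation. If the consumer of the output expects a byte order mark, the bytes from TEncoding.Unicode.GetPreamble could be written before the loop.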
A wild guess: could the problem be that the memory is overcommitted, so that when FillChar actually touches it there is no page available to back it? I don't know whether Windows will even overcommit memory, but I do know that some OSes do, and you don't find out about it until you actually try to use the memory.
If this is the case, it could cause the blowup in FillChar.
Programs are great at looping. They loop tirelessly without complaining.
Allocating a huge amount of memory takes time. There will be many calls to the heap manager. Your OS won't even know ahead of time whether it has the amount of contiguous memory that you need. Your OS says, yeah, I have 1 GB free. But as soon as you go to use it, your OS says, wait, you want all of it in one chunk? Let me make sure I have enough all in one place. If it doesn't, you get the error.
If it does have the memory, well, there's still a lot of work for the heap manager in preparing the memory and marking it as used.
So, obviously, it makes some sense to allocate less memory and simply loop through it. This saves the computer from doing a lot of work that it will only have to undo when it's done. Why not have it do just a little bit of work in setting aside your memory, then just keep re-using it?
Stack memory is allocated much faster than heap memory. If you keep your buffer small (the default maximum stack size is 1 MB) and declare it as a fixed-size local array, it lives on the stack rather than the heap, which will make your loops even faster. Dynamic arrays such as TBytes, and strings, are always heap-allocated. In addition, local variables that the compiler keeps in registers are very fast.
Factors such as hard drive cluster and cache sizes and CPU cache sizes offer hints about the best chunk size. The key is to find a good number. I like to use 64 KB chunks.