TStringList of objects taking up lots of memory in Delphi XE

Posted on 2024-12-01 13:17:58, 513 words, 1 view, 0 comments

I'm working on a simulation program.

One of the first things the program does is read in a huge file (28 MB, about 79,000 lines), parse each line (about 150 fields), create an object for it, and add it to a TStringList.

It also reads in another file, which adds more objects during the run. At the end, it ends up being about 85,000 objects.

I was working with Delphi 2007, and the program used a lot of memory, but it ran OK. I upgraded to Delphi XE and migrated the program over, and now it's using a LOT more memory and ends up running out of memory halfway through the run.

So in Delphi 2007 it would end up using 1.4 GB after reading in the initial file, which is obviously a huge amount, but in XE it ends up using almost 1.8 GB, which is really huge and leads to running out of memory and getting an error.

So my questions are:

  1. Why is it using so much memory?
  2. Why is it using so much more memory in XE than 2007?
  3. What can I do about this? I can't change how big or long the file is, and I do need to create an object for each line and to store it somewhere

Thanks

Comments (10)

请止步禁区 2024-12-08 13:17:58

Just one idea which may save memory.

You could let the data stay on the original files, then just point to them from in-memory structures.

For instance, this is what we do to browse big log files almost instantly: we memory-map the log file content, then we parse it quickly to create indexes of useful information in memory, then we read the content dynamically. No string is created during the reading, only pointers to the beginning of each line, with dynamic arrays containing the needed indexes. Calling TStringList.LoadFromFile would definitely be much slower and more memory-consuming.

The code is here - see the TSynLogFile class. The trick is to read the file only once, and make all indexes on the fly.

For instance, here is how we retrieve a line of text from the UTF-8 file content:

function TMemoryMapText.GetString(aIndex: integer): string;
begin
  if (self=nil) or (cardinal(aIndex)>=cardinal(fCount)) then
    result := '' else
    result := UTF8DecodeToString(fLines[aIndex],GetLineSize(fLines[aIndex],fMapEnd));
end;

We use the exact same trick to parse JSON content. Such a mixed approach is also used by the fastest XML access libraries.

To handle your high-level data, and query it fast, you may try to use dynamic arrays of records, and our optimized TDynArray and TDynArrayHashed wrappers (in the same unit). Arrays of records will consume less memory and will be faster to search because the data won't be fragmented (even faster if you use ordered indexes or hashes), and you'll still have high-level access to the content (you can define custom functions to retrieve the data from the memory-mapped file, for instance). Dynamic arrays are not suited to fast deletion of items (or you'll have to use lookup tables), but you wrote that you are not deleting much data, so it won't be a problem in your case.

So you won't have any duplicated structures any more: only logic in RAM, and data in memory-mapped file(s). I added the "(s)" because the same logic could map perfectly well onto several source data files (as far as I know, you need some "merge" and "live refresh").
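To illustrate the "pointers to each line beginning" idea, here is a minimal sketch. It is not the TSynLogFile code: BuildLineIndex, GetLine and TLineOffsets are names made up for this example, and the buffer is a plain in-memory AnsiString standing in for a memory-mapped view, so that the sample runs without any Windows mapping calls.

```pascal
program LineIndexSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  TLineOffsets = array of Integer;

const
  // Stand-in for the mapped file content (assumption for this sketch)
  Data: AnsiString = 'alpha'#13#10'beta'#10'gamma';

// Scans the buffer once and records the offset at which each line starts.
// No per-line string is allocated here.
function BuildLineIndex(Buffer: PAnsiChar; Size: Integer): TLineOffsets;
var
  I, Count: Integer;
begin
  SetLength(Result, 16);
  Count := 0;
  I := 0;
  while I < Size do
  begin
    if Count = Length(Result) then
      SetLength(Result, Count * 2);   // grow geometrically
    Result[Count] := I;               // a line starts here
    Inc(Count);
    while (I < Size) and (Buffer[I] <> #10) do
      Inc(I);                         // advance to the end of this line
    Inc(I);                           // skip the #10 itself
  end;
  SetLength(Result, Count);
end;

// Builds the string for one line only when it is actually requested.
function GetLine(Buffer: PAnsiChar; Size: Integer;
  const Offsets: TLineOffsets; Index: Integer): AnsiString;
var
  StartPos, EndPos: Integer;
begin
  StartPos := Offsets[Index];
  EndPos := StartPos;
  while (EndPos < Size) and (Buffer[EndPos] <> #10) do
    Inc(EndPos);
  // Drop the #13 of a CR/LF pair
  if (EndPos > StartPos) and (Buffer[EndPos - 1] = #13) then
    Dec(EndPos);
  SetString(Result, Buffer + StartPos, EndPos - StartPos);
end;

var
  Offsets: TLineOffsets;
begin
  Offsets := BuildLineIndex(PAnsiChar(Data), Length(Data));
  Writeln(Length(Offsets));                                     // 3
  Writeln(GetLine(PAnsiChar(Data), Length(Data), Offsets, 1));  // beta
end.
```

The index costs 4 bytes per line here, whatever the line length, and the text itself is never copied until a line is read.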

两相知 2024-12-08 13:17:58

Without seeing the code and the class declarations, it's hard to say why your 28 MB file expands to 1.4 GB worth of objects when you parse it. Also, you say you're storing it in a TStringList instead of a TList or TObjectList. This sounds like you're using it as some sort of string->object key/value mapping. If so, you might want to look at the TDictionary class in the Generics.Collections unit in XE.

As for why you're using more memory in XE, it's because the string type changed from an ANSI string to a UTF-16 string in Delphi 2009. If you don't need Unicode, you could use a TDictionary keyed on AnsiString instead of string to save space.
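A rough sketch of such a dictionary, assuming the key is an AnsiString and the value is one of your own classes. TSimRecord and its Payload field are hypothetical placeholders, not something from the question.

```pascal
program AnsiKeyDictionarySketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  Generics.Collections;

type
  // Hypothetical stand-in for the class created per input line
  TSimRecord = class
  public
    Payload: AnsiString;
  end;

var
  Map: TDictionary<AnsiString, TSimRecord>;
  Rec: TSimRecord;
begin
  Map := TDictionary<AnsiString, TSimRecord>.Create;
  try
    Rec := TSimRecord.Create;
    Rec.Payload := 'first record';
    Map.Add('key-1', Rec);            // 1 byte per key character

    // Hashed lookup instead of a scan through a string list
    if Map.TryGetValue('key-1', Rec) then
      Writeln(Rec.Payload);
  finally
    // A plain TDictionary does not own its values, so free them here
    for Rec in Map.Values do
      Rec.Free;
    Map.Free;
  end;
end.
```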

Also, to save even more memory, there's another trick you could use if you don't need all 79,000 of the objects right away: lazy loading. The idea goes something like this:

  • Read the file into a TStringList. (This will use about as much memory as the file size. Maybe twice as much if it gets converted into Unicode strings.) Don't create any data objects.
  • When you need a specific data object, call a routine that checks the string list and looks up the string key for that object.
  • Check if that string has an object associated with it. If not, create the object from the string and associate it with the string in the TStringList.
  • Return the object associated with the string.

This will keep both your memory usage and your load time down, but it's only helpful if you don't need all (or a large percentage) of the objects immediately after loading.
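The steps above could look roughly like this. It is only a sketch: TSimRecord, its Id field and GetRecord are invented for the example, the lookup is by line index rather than by key, and the lines are added by hand where real code would call LoadFromFile.

```pascal
program LazyLoadSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

type
  // Hypothetical data object; real code would parse all the fields
  TSimRecord = class
  public
    Id: string;
    constructor Create(const Line: string);
  end;

constructor TSimRecord.Create(const Line: string);
var
  P: Integer;
begin
  inherited Create;
  P := Pos(',', Line);
  if P > 0 then
    Id := Copy(Line, 1, P - 1)   // first comma-separated field
  else
    Id := Line;
end;

// Returns the object for a line, creating it on first access only.
function GetRecord(Lines: TStringList; Index: Integer): TSimRecord;
begin
  Result := TSimRecord(Lines.Objects[Index]);
  if Result = nil then
  begin
    Result := TSimRecord.Create(Lines[Index]);
    Lines.Objects[Index] := Result;   // cache it next to its string
  end;
end;

var
  Lines: TStringList;
begin
  Lines := TStringList.Create;
  try
    Lines.OwnsObjects := True;        // cached objects are freed with the list
    Lines.Add('A1,foo,bar');
    Lines.Add('B2,baz,qux');
    Writeln(GetRecord(Lines, 1).Id);
    // The second call returns the cached instance, not a new object
    Writeln(GetRecord(Lines, 1) = GetRecord(Lines, 1));
  finally
    Lines.Free;
  end;
end.
```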

萌无敌 2024-12-08 13:17:58
  • In Delphi 2007 (and earlier), a string is an Ansi string, that is, every character occupies 1 byte of memory.

  • In Delphi 2009 (and later), a string is a Unicode string, that is, every character occupies 2 bytes of memory.

AFAIK, there is no way to make a Delphi 2009+ TStringList object use Ansi strings. Are you really using any of the features of the TStringList? If not, you could use an array of strings instead.

Then, naturally, you can choose between

type
  TAnsiStringArray = array of AnsiString;
  // or
  TUnicodeStringArray = array of string; // In Delphi 2009+, 
                                         // string = UnicodeString
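A tiny sketch of the difference in character payload, assuming Delphi 2009 or later (where UnicodeString exists). It only measures the characters themselves, not the string header or the memory manager's rounding.

```pascal
program StringSizeSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

var
  A: AnsiString;
  U: UnicodeString;
begin
  A := 'simulation';
  U := 'simulation';
  // Same ten characters, different storage per character
  Writeln(Length(A) * SizeOf(AnsiChar));   // 10 bytes
  Writeln(Length(U) * SizeOf(WideChar));   // 20 bytes
end.
```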
梅窗月明清似水 2024-12-08 13:17:58

Reading through the comments, it sounds like you need to lift the data out of Delphi and into a database.

From there it is easy to match organ donors to receivers*)

SELECT pw.* FROM patients_waiting pw
INNER JOIN organs_available oa ON (pw.bloodtype = oa.bloodtype) 
                              AND (pw.tissuetype = oa.tissuetype)
                              AND (pw.organ_needed = oa.organ_offered)
WHERE oa.id = '15484'

This query shows the patients that might match new organ donor 15484.

In memory you only handle the few patients that match.

*) simplified beyond all recognition, but still.

↘人皮目录ツ 2024-12-08 13:17:58

In addition to Andreas' post:

Before Delphi 2009, a string header occupied 8 bytes. Starting with Delphi 2009, a string header takes 12 bytes. So every unique string uses 4 bytes more than before, on top of the fact that each character takes twice the memory.

Also, starting with Delphi 2010 I believe, TObject started using 8 bytes instead of 4. So for every single object created, Delphi now uses 4 more bytes. Those 4 bytes were added to support the TMonitor class, I believe.
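As a back-of-the-envelope check, here is what those figures give for the file in the question. This is only an estimate under two loud assumptions: every one of the roughly 150 fields per line is kept as its own string, and all 28 MB of the file end up stored as characters (an upper bound, since delimiters would not be).

```pascal
program OverheadEstimate;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

const
  LineCount        = 79000;              // from the question
  FieldsPerLine    = 150;                // from the question
  FileBytes        = 28 * 1024 * 1024;   // "28 MB", read as MiB (assumption)
  ExtraHeaderBytes = 12 - 8;             // header growth quoted above
  FieldCount       = LineCount * FieldsPerLine;

begin
  Writeln('Strings:               ', FieldCount);                     // 11850000
  Writeln('Extra header bytes:    ', FieldCount * ExtraHeaderBytes);  // 47400000
  Writeln('Extra character bytes: ', FileBytes);                      // 29360128
end.
```

Under these assumptions the wider characters and larger headers add somewhere around 75 MB, which is well short of the roughly 400 MB difference reported in the question, so other costs (for example how the memory manager rounds up small blocks) presumably play a part too.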

If you're in desperate need of saving memory, here's a little trick that could help if you have a lot of string values that repeat themselves.

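var
  uUniqueStrings : TStringList; // must be created with Sorted := True, otherwise Find cannot be used

function ReduceStringMemory(const S : String) : string;
var idx : Integer;
begin
  // On a sorted list, Add inserts at the right position and returns that index
  if not uUniqueStrings.Find(S, idx) then
    idx := uUniqueStrings.Add(S);

  // Return the pooled instance so that equal strings share one memory block
  Result := uUniqueStrings[idx];
end;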

Note that this will help ONLY if you have a lot of string values that repeat themselves. For example, this code uses 150 MB less on my system.

var sl : TStringList;
  I: Integer;
begin
  sl := TStringList.Create;
  try
    for I := 0 to 5000000 do
      sl.Add(ReduceStringMemory(StringOfChar('A',5)));
  finally
    sl.Free;
  end;
end;
泪之魂 2024-12-08 13:17:58

I also read in a lot of strings in my program that can approach a couple of GB for large files.

Short of waiting for 64-bit XE2, here is one idea that might help you:

I found storing individual strings in a stringlist to be slow and wasteful in terms of memory. I ended up blocking the strings together. My input file has logical records, which may contain between 5 and 100 lines. So instead of storing each line in the stringlist, I store each record. Processing a record to find the line I need adds very little time to my processing, so this is possible for me.

If you don't have logical records, you might just want to pick a blocking size, and store every (say) 10 or 100 strings together as one string (with a delimiter separating them).

The other alternative is to store them in a fast and efficient on-disk file. The one I'd recommend is the open source Synopse Big Table by Arnaud Bouchez.
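Here is a minimal sketch of the blocking idea, assuming a fixed block size and #10 as the delimiter. PackLines, GetLine and TBlockArray are names invented for this example, and the source lines come from a TStringList only to keep the demo short; real code would pack the lines while reading the file.

```pascal
program BlockedLinesSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  Classes, StrUtils;

const
  LinesPerBlock = 3;     // small for the demo; 10 or 100 in practice
  Sep = #10;             // delimiter between lines inside a block

type
  TBlockArray = array of string;

// Packs every LinesPerBlock lines of Source into a single string.
function PackLines(Source: TStrings): TBlockArray;
var
  I, N: Integer;
  Block: string;
begin
  SetLength(Result, (Source.Count + LinesPerBlock - 1) div LinesPerBlock);
  N := 0;
  Block := '';
  for I := 0 to Source.Count - 1 do
  begin
    if (I mod LinesPerBlock) <> 0 then
      Block := Block + Sep;
    Block := Block + Source[I];
    // Close the block when it is full or the input is exhausted
    if ((I + 1) mod LinesPerBlock = 0) or (I = Source.Count - 1) then
    begin
      Result[N] := Block;
      Inc(N);
      Block := '';
    end;
  end;
end;

// Finds the block, then walks the delimiters to the requested line.
function GetLine(const Blocks: TBlockArray; Index: Integer): string;
var
  Block: string;
  Skip, StartPos, EndPos: Integer;
begin
  Block := Blocks[Index div LinesPerBlock];
  StartPos := 1;
  for Skip := 1 to Index mod LinesPerBlock do
    StartPos := PosEx(Sep, Block, StartPos) + 1;
  EndPos := PosEx(Sep, Block, StartPos);
  if EndPos = 0 then
    EndPos := Length(Block) + 1;      // last line of the block
  Result := Copy(Block, StartPos, EndPos - StartPos);
end;

var
  Source: TStringList;
  Blocks: TBlockArray;
  I: Integer;
begin
  Source := TStringList.Create;
  try
    for I := 0 to 6 do
      Source.Add('l' + Chr(Ord('0') + I));   // l0 .. l6
    Blocks := PackLines(Source);
  finally
    Source.Free;
  end;
  Writeln(Length(Blocks));      // 3
  Writeln(GetLine(Blocks, 4));  // l4
end.
```

The saving comes from having one heap block and one string header per block instead of one per line, at the cost of a short scan on each read.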

自由如风 2024-12-08 13:17:58

May I suggest you try the JEDI Code Library (JCL) class TAnsiStringList, which is like the TStringList from Delphi 2007 in that it is made up of AnsiStrings.

Even then, as others have mentioned, XE will be using more memory than Delphi 2007.

I really don't see the value of loading the full text of a giant flat file into a string list. Others have suggested a BigTable approach such as Arnaud Bouchez's, or using SQLite or something like that, and I agree with them.

I think you could also write a simple class that will load the entire file you have into memory, and provide a way to add line-by-line object links to a giant in-memory ansichar buffer.

流绪微梦 2024-12-08 13:17:58

Starting with Delphi 2009, not only strings but also every TObject has doubled in size. (See Why Has the Size of TObject Doubled In Delphi 2009?) But with only 85,000 objects, this would not explain the increase. The object size could be a relevant part of the memory usage only if these objects contain many nested objects.

病女 2024-12-08 13:17:58

Are there many duplicate strings in your list? Maybe storing only unique strings will help reduce the memory size. See my question about a string pool for a possible (but maybe too simple) answer.

橙味迷妹 2024-12-08 13:17:58

Are you sure you aren't suffering from memory fragmentation?

Be sure to use the latest FastMM (currently 4.97), then take a look at the UsageTrackerDemo demo that contains a memory map form showing the actual usage of the Delphi memory.

Finally take a look at VMMap that shows you how your process memory is used.
