TStringList object taking up a huge amount of memory in Delphi XE

Posted 2024-12-01 13:17:58

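I'm working on a simulation program.

One of the first things the program does is read a huge file (28 MB, about 79'000 lines) and parse each line (about 150 fields). It creates an object for each line and adds it to a TStringList.

It also reads another file, which adds more objects during the run. In the end there are about 85'000 objects.

I was using Delphi 2007, and the program used a lot of memory but ran fine. I then upgraded to Delphi XE and migrated the program. Now it uses even more memory and runs out of memory halfway through the run.

In Delphi 2007 the program ends up using 1.4 gig after reading the initial file, which is obviously a huge amount. In XE it ends up using almost 1.8 gig, which is really big: it runs out of memory and throws an error.

So my questions are:

Why does it use so much memory?
Why does XE use so much more memory than 2007?
What can I do about it? I can't change the size or length of the file, and I really do need to create an object for each line and store it somewhere.

Thanks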

请止步禁区 2024-12-08 13:17:58

Just one idea which may save memory.

You could let the data stay on the original files, then just point to them from in-memory structures.

For instance, that's what we do to browse big log files almost instantly. We memory-map the log file content, parse it quickly to build indexes of useful information in memory, and then read the content on demand. No string is created during the reading. We keep only pointers to the beginning of each line, with dynamic arrays holding the needed indexes. Calling TStringList.LoadFromFile would definitely be much slower and more memory-consuming.

The code is here - see the TSynLogFile class. The trick is to read the file only once, and make all indexes on the fly.

For instance, here is how we retrieve a line of text from the UTF-8 file content:

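function TMemoryMapText.GetString(aIndex: integer): string;
begin
  // fLines[] holds a pointer to the start of each line inside the mapped file;
  // the cardinal casts also reject negative indexes in a single comparison
  if (self=nil) or (cardinal(aIndex)>=cardinal(fCount)) then
    result := '' else
    // decode only this one line from UTF-8, on demand
    result := UTF8DecodeToString(fLines[aIndex],GetLineSize(fLines[aIndex],fMapEnd));
end;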
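The indexing idea can be sketched in plain Delphi without the library. This is a minimal illustration under stated assumptions, not the TSynLogFile code. A TBytes buffer stands in for the memory-mapped view; on Windows, a real program would get that pointer from CreateFileMapping/MapViewOfFile. The TLineIndex record, BuildIndex and GetLine are hypothetical names. Only integer offsets are stored, and a string is created only when one particular line is requested.

```pascal
program LineIndexSketch;
{$APPTYPE CONSOLE}
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF} // lets Free Pascal compile the same source

uses
  SysUtils;

type
  // Hypothetical index: the raw UTF-8 bytes plus the offset of each line start.
  TLineIndex = record
    Data: TBytes;             // stands in for the memory-mapped view
    Starts: array of Integer; // byte offset where each line begins
    Count: Integer;           // number of lines found
  end;

// Scan the buffer once and record where every line starts.
procedure BuildIndex(var Idx: TLineIndex; const Bytes: TBytes);
var
  I: Integer;
begin
  Idx.Data := Bytes;
  Idx.Count := 0;
  SetLength(Idx.Starts, 16);
  I := 0;
  while I < Length(Bytes) do
  begin
    if Idx.Count = Length(Idx.Starts) then
      SetLength(Idx.Starts, Idx.Count * 2); // grow geometrically
    Idx.Starts[Idx.Count] := I;
    Inc(Idx.Count);
    while (I < Length(Bytes)) and (Bytes[I] <> 10) do
      Inc(I);                               // advance to the LF
    Inc(I);                                 // step past it
  end;
end;

// Decode a single line on demand; nothing else is turned into a string.
function GetLine(const Idx: TLineIndex; Index: Integer): string;
var
  StartPos, EndPos: Integer;
begin
  Result := '';
  if (Index < 0) or (Index >= Idx.Count) then
    Exit;
  StartPos := Idx.Starts[Index];
  if Index + 1 < Idx.Count then
    EndPos := Idx.Starts[Index + 1]
  else
    EndPos := Length(Idx.Data);
  // drop the LF or CRLF terminator
  if (EndPos > StartPos) and (Idx.Data[EndPos - 1] = 10) then
    Dec(EndPos);
  if (EndPos > StartPos) and (Idx.Data[EndPos - 1] = 13) then
    Dec(EndPos);
  if EndPos > StartPos then
    Result := TEncoding.UTF8.GetString(Idx.Data, StartPos, EndPos - StartPos);
end;

var
  Idx: TLineIndex;

begin
  BuildIndex(Idx, TEncoding.UTF8.GetBytes('alpha'#13#10'beta'#10'gamma'));
  Writeln(Idx.Count);       // 3
  Writeln(GetLine(Idx, 1)); // beta
end.
```

With 79,000 lines, the index itself is one Integer per line, and the field data stays in the (mapped) file until it is needed.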

We use the exact same trick to parse JSON content. The fastest XML access libraries use the same kind of mixed approach.

To handle your high-level data and query it fast, you may try dynamic arrays of records, together with our optimized TDynArray and TDynArrayHashed wrappers (in the same unit). Arrays of records consume less memory and are faster to search because the data won't be fragmented (even faster if you use ordered indexes or hashes). You'll also be able to have high-level access to the content (for instance, you can define custom functions to retrieve the data from the memory-mapped file). Dynamic arrays are not suited to fast deletion of items (or you'll have to use lookup tables), but you wrote that you are not deleting much data, so it won't be a problem in your case.

So you won't have any duplicated structure any more: only logic in RAM, and data in memory-mapped file(s). I wrote "file(s)" because the same logic could map perfectly well to several source data files (you would need some "merge" and "live refresh" logic, AFAIK).
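A minimal sketch of the array-of-records layout follows. It uses a plain dynamic array rather than the TDynArray wrapper, whose API is not shown here. TLineRec and FindById are hypothetical names. The array is assumed to be sorted by Id already, for example because the file is ordered by key, or because it is sorted once after loading. Each record is stored inline in one contiguous block, with no per-object header and no separate heap allocation for its non-string fields.

```pascal
program RecordArraySketch;
{$APPTYPE CONSOLE}
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF} // lets Free Pascal compile the same source

type
  // Hypothetical record for one parsed line; a real one would declare the
  // ~150 fields with the narrowest type that fits each of them.
  TLineRec = record
    Id: Integer;
    Category: Byte; // a small code stored in 1 byte instead of a string
    Value: Double;
  end;
  TLineRecArray = array of TLineRec;

// Binary search; assumes Items[0..Count-1] is sorted by Id.
function FindById(const Items: TLineRecArray; Count, Id: Integer): Integer;
var
  Lo, Hi, Mid: Integer;
begin
  Lo := 0;
  Hi := Count - 1;
  while Lo <= Hi do
  begin
    Mid := (Lo + Hi) div 2;
    if Items[Mid].Id < Id then
      Lo := Mid + 1
    else if Items[Mid].Id > Id then
      Hi := Mid - 1
    else
    begin
      Result := Mid;
      Exit;
    end;
  end;
  Result := -1;
end;

var
  Items: TLineRecArray;
  Count, I: Integer;

begin
  Count := 0;
  SetLength(Items, 4);
  for I := 1 to 10 do
  begin
    if Count = Length(Items) then
      SetLength(Items, Count * 2); // grow geometrically, not one item at a time
    Items[Count].Id := I * 10;     // ids 10, 20, ..., 100 arrive already sorted
    Items[Count].Category := I mod 3;
    Items[Count].Value := I / 2;
    Inc(Count);
  end;
  SetLength(Items, Count);         // trim the spare capacity
  Writeln(FindById(Items, Count, 70)); // 6
  Writeln(FindById(Items, Count, 75)); // -1
end.
```

Most of the saving would come from storing codes and numbers as Byte, Integer or Double instead of strings. String fields inside a record still cost one heap block each.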

两相知 2024-12-08 13:17:58

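Without seeing the code and class declarations, it's hard to say why parsing a 28 MB file into objects would blow up to 1.4 GB. Also, you say you store them in a TStringList rather than a TList or TObjectList. That sounds like you're using it as some sort of string -> object key/value map. If so, you might want to look at the TDictionary class in the Generics.Collections unit in XE.

As for why it uses more memory in XE: in Delphi 2009 the string type changed from ANSI strings to UTF-16 strings. If you don't need Unicode, you can use a TDictionary with AnsiString keys to save space.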
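A minimal sketch of that suggestion, assuming the keys are plain text that fits in AnsiString. TLineData and the sample key and values are hypothetical placeholders for the asker's class and data. TObjectDictionary and doOwnsValues come from Generics.Collections; doOwnsValues makes the dictionary free the objects it holds.

```pascal
program DictionarySketch;
{$APPTYPE CONSOLE}
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF} // lets Free Pascal compile the same source

uses
  Generics.Collections;

type
  // Hypothetical stand-in for the asker's per-line class.
  TLineData = class
  public
    Fields: array of AnsiString;
  end;

var
  Map: TObjectDictionary<AnsiString, TLineData>;
  Item, Found: TLineData;
  FoundKind: AnsiString;
  HasMissing: Boolean;

begin
  // doOwnsValues: the dictionary frees the TLineData objects it holds
  Map := TObjectDictionary<AnsiString, TLineData>.Create([doOwnsValues]);
  try
    Item := TLineData.Create;
    SetLength(Item.Fields, 2);
    Item.Fields[0] := 'A+';
    Item.Fields[1] := 'kidney';
    Map.Add('15484', Item); // AnsiString key: 1 byte per character

    FoundKind := '';
    if Map.TryGetValue('15484', Found) then
      FoundKind := Found.Fields[1];
    HasMissing := Map.ContainsKey('99999');
    Writeln(FoundKind, ' ', HasMissing);
  finally
    Map.Free; // also frees Item, because of doOwnsValues
  end;
end.
```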

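Also, to save even more memory, there is another trick you can use if you don't need all 79,000 objects right away: lazy loading. The idea goes like this:

1. Read the file into a TStringList. (This will use about as much memory as the file size, or possibly twice as much once converted to Unicode strings.) Don't create any data objects.
2. When you need a particular data object, call a routine that checks the string list and looks up the string key for that object.
3. Check whether that string has an object associated with it. If not, create the object from the string and associate it with the string in the TStringList.
4. Return the object associated with the string.

This reduces both memory usage and load time, but it only helps if you don't need all (or most) of the objects immediately after loading.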
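The four steps above can be sketched as follows. All of these assumptions are hypothetical:

- Each line starts with its key followed by a comma, so setting NameValueSeparator to ',' lets IndexOfName find it.
- TLineData and GetData are illustrative names.
- The two hard-coded lines stand in for LoadFromFile.

OwnsObjects makes the list free the cached objects when the list itself is freed.

```pascal
program LazyLoadSketch;
{$APPTYPE CONSOLE}
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF} // lets Free Pascal compile the same source

uses
  Classes;

type
  // Hypothetical parsed object, built only the first time it is requested.
  TLineData = class
  public
    Fields: TStringList;
    constructor Create(const Line: string);
    destructor Destroy; override;
  end;

constructor TLineData.Create(const Line: string);
begin
  inherited Create;
  Fields := TStringList.Create;
  Fields.Delimiter := ',';        // assumption: comma-separated fields
  Fields.StrictDelimiter := True; // split on commas only, not on spaces
  Fields.DelimitedText := Line;
end;

destructor TLineData.Destroy;
begin
  Fields.Free;
  inherited;
end;

// Steps 2-4: find the raw line for Key, parse it on first use, cache the object.
function GetData(Lines: TStringList; const Key: string): TLineData;
var
  I: Integer;
begin
  Result := nil;
  I := Lines.IndexOfName(Key); // linear scan for a line starting with "Key,"
  if I < 0 then
    Exit;
  if Lines.Objects[I] = nil then
    Lines.Objects[I] := TLineData.Create(Lines[I]); // parse once, keep it
  Result := TLineData(Lines.Objects[I]);
end;

var
  Lines: TStringList;
  D: TLineData;
  FieldCount: Integer;
  SameObject: Boolean;

begin
  Lines := TStringList.Create;
  try
    Lines.OwnsObjects := True;       // the list frees the cached objects
    Lines.NameValueSeparator := ','; // the key is the text before the first comma
    // Step 1: only raw text is held; a real program would call LoadFromFile
    Lines.Add('15484,A+,kidney');
    Lines.Add('15485,O-,liver');
    D := GetData(Lines, '15485');
    FieldCount := D.Fields.Count;
    SameObject := GetData(Lines, '15485') = D; // second call reuses the cache
    Writeln(FieldCount, ' ', SameObject);
  finally
    Lines.Free;
  end;
end.
```

IndexOfName scans the list linearly. For tens of thousands of lookups, a sorted index or a dictionary from key to line number would be the natural next step.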

萌无敌 2024-12-08 13:17:58

In Delphi 2007 (and earlier), a string is an Ansi string, that is, every character occupies 1 byte of memory.
In Delphi 2009 (and later), a string is a Unicode string, that is, every character occupies 2 bytes of memory.

AFAIK, there is no way to make a Delphi 2009+ TStringList object use Ansi strings. Are you really using any of the features of the TStringList? If not, you could use an array of strings instead.

Then, naturally, you can choose between

type
  TAnsiStringArray = array of AnsiString;
  // or
  TUnicodeStringArray = array of string; // In Delphi 2009+, 
                                         // string = UnicodeString
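Here is a minimal sketch of the array-of-AnsiString option. It assumes the file is plain text in the system ANSI code page. LoadAnsiLines is a hypothetical helper that reads with the classic TextFile routines and grows the array geometrically, not one element at a time. The scratch file name is only for the demo.

```pascal
program AnsiArraySketch;
{$APPTYPE CONSOLE}
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF} // lets Free Pascal compile the same source

uses
  SysUtils;

type
  TAnsiStringArray = array of AnsiString;

// Hypothetical helper: reads a text file into an array of AnsiString
// (1 byte per character) instead of a TStringList of UnicodeString.
function LoadAnsiLines(const FileName: string): TAnsiStringArray;
var
  F: TextFile;
  Line: AnsiString;
  Count: Integer;
begin
  Count := 0;
  SetLength(Result, 1024);
  AssignFile(F, FileName);
  Reset(F);
  try
    while not Eof(F) do
    begin
      ReadLn(F, Line);
      if Count = Length(Result) then
        SetLength(Result, Count * 2); // grow geometrically
      Result[Count] := Line;
      Inc(Count);
    end;
  finally
    CloseFile(F);
  end;
  SetLength(Result, Count); // release the unused slots
end;

var
  T: TextFile;
  Lines: TAnsiStringArray;

begin
  // Write a small scratch file (hypothetical name) in the current directory
  AssignFile(T, 'ansi_lines_demo.txt');
  Rewrite(T);
  WriteLn(T, 'first line');
  WriteLn(T, 'second line');
  CloseFile(T);

  Lines := LoadAnsiLines('ansi_lines_demo.txt');
  DeleteFile('ansi_lines_demo.txt');
  Writeln(Length(Lines)); // 2
  Writeln(Lines[1]);      // second line
end.
```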

梅窗月明清似水 2024-12-08 13:17:58

Reading through the comments, it sounds like you need to move the data out of Delphi and into a database.

From there it is easy to match organ donors to receivers*)

SELECT pw.* FROM patients_waiting pw
INNER JOIN organs_available oa ON (pw.bloodtype = oa.bloodtype) 
                              AND (pw.tissuetype = oa.tissuetype)
                              AND (pw.organ_needed = oa.organ_offered)
WHERE oa.id = '15484'

That is the query to run if you want to see the patients who might match new organ donor 15484.

In memory you only handle the few patients that match.

*) simplified beyond all recognition, but still.

↘人皮目录ツ 2024-12-08 13:17:58

In addition to Andreas' post:

Before Delphi 2009, a string header occupied 8 bytes. Starting with Delphi 2009, a string header takes 12 bytes. So every unique string uses 4 bytes more than before, on top of each character taking twice the memory.
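To put numbers on this, consider a string of $n$ characters on a 32-bit compiler. The payload below counts the header and the #0 terminator but ignores the memory manager's rounding of block sizes. The 12-byte header is code page (2) + element size (2) + reference count (4) + length (4); the older 8-byte header is reference count + length.

$$
\begin{aligned}
\text{Delphi 2007 (AnsiString):}\quad & 8 + n + 1 \text{ bytes}\\
\text{Delphi 2009+ (UnicodeString):}\quad & 12 + 2n + 2 \text{ bytes}\\
\text{difference:}\quad & (12 + 2n + 2) - (8 + n + 1) = n + 5 \text{ bytes}
\end{aligned}
$$

For a 10-character field, that is 19 versus 34 bytes. Suppose, as an assumption, that every one of the roughly 79,000 × 150 = 11,850,000 fields were kept as its own string. Then the 4 extra header bytes alone would add 4 × 11,850,000 = 47,400,000 bytes (about 47 MB), before counting the doubled character data.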

Also, starting with Delphi 2010 I believe, TObject started using 8 bytes instead of 4, so every object Delphi creates now takes 4 more bytes. I believe those 4 bytes were added to support the TMonitor class.
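A quick way to check the object-header size on your own compiler is to print TObject's instance size with the standard InstanceSize class method. Note the scale, though. For the question's roughly 85,000 objects, 4 extra bytes each come to 85,000 × 4 = 340,000 bytes (about 0.3 MB). So the per-object header growth is small next to the string changes.

```pascal
program InstanceSizeCheck;
{$APPTYPE CONSOLE}
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF} // lets Free Pascal compile the same source

var
  BaseSize: Integer;

begin
  // InstanceSize is a class method of TObject: the bytes one bare instance takes
  BaseSize := TObject.InstanceSize;
  // Per the answer (32-bit compiler): 4 bytes before the change, 8 after it.
  // 64-bit compilers report different values because pointers are 8 bytes.
  Writeln('TObject instance size: ', BaseSize, ' bytes');
end.
```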

If you're in desperate need of saving memory, here's a little trick that could help if you have a lot of repeated string values.

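var
  uUniqueStrings : TStringList; // pool of distinct values, created on first use

function ReduceStringMemory(const S : String) : string;
var idx : Integer;
begin
  if uUniqueStrings = nil then
  begin
    uUniqueStrings := TStringList.Create;
    uUniqueStrings.Sorted := True;        // Find only works on a sorted list
    uUniqueStrings.CaseSensitive := True; // don't merge 'abc' with 'ABC'
  end;
  if not uUniqueStrings.Find(S, idx) then
    idx := uUniqueStrings.Add(S);         // on a sorted list, Add returns the new index
  // return the pooled copy, so equal values share one reference-counted buffer
  Result := uUniqueStrings[idx];
end;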

Note that this will help ONLY if you have a lot of string values that repeat themselves. For example, this code uses 150 MB less on my system.

var sl : TStringList;
  I: Integer;
begin
  sl := TStringList.Create;
  try
    for I := 0 to 5000000 do
      sl.Add(ReduceStringMemory(StringOfChar('A',5)));
  finally
    sl.Free;
  end;
end;
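With millions of strings to pool, a sorted TStringList has to shift the tail of its list on every insert. A hash-based pool is a common alternative; this is a swapped-in technique, not what the answer used. The sketch below keeps the pooled strings in a TDictionary<string, string> from Generics.Collections; Intern is a hypothetical name. The Pointer comparison checks that both results share one buffer, rather than merely having equal text.

```pascal
program InternPoolSketch;
{$APPTYPE CONSOLE}
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF} // lets Free Pascal compile the same source

uses
  Generics.Collections;

var
  Pool: TDictionary<string, string>;

// Returns a shared copy of S: equal values end up referencing one
// reference-counted buffer, so the caller's duplicate can be released.
function Intern(const S: string): string;
begin
  if not Pool.TryGetValue(S, Result) then
  begin
    Pool.Add(S, S); // key and value share the same buffer
    Result := S;
  end;
end;

var
  A, B: string;
  Shared: Boolean;

begin
  Pool := TDictionary<string, string>.Create;
  try
    A := Intern(StringOfChar('A', 5)); // StringOfChar builds a new buffer each call
    B := Intern(StringOfChar('A', 5));
    Shared := Pointer(A) = Pointer(B);  // same buffer, not just equal text
    Writeln('Shared buffer: ', Shared);
  finally
    Pool.Free;
  end;
end.
```

A dictionary likely costs more memory per entry than a string list. So, as the answer stresses, pooling only pays off when duplicates are common.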
