C# Garbage Collection

Posted on 2024-11-05 12:50:09

I have written a business app that recurses through a directory structure looking for specific Excel files and stores their paths. It then loops through these files and parses each one by creating a DocumentParser object for it. This is done one file at a time, not asynchronously. The software seems to be very stable, so much so that the business would like to run it against a massive directory containing upwards of 10,000 relevant Excel files.

My question is: since I am creating a new DocumentParser object each time, will the GC be effective enough to discard each object when it goes out of scope, i.e. when that Excel sheet has been parsed? Or is there a way I can monitor this and, where necessary, manually trigger a GC? I've never had to deal with such large amounts of data before; I have generally only tested on a maximum of 40-50 Excel files at a time.

Thanks.


Comments (4)

绳情 2024-11-12 12:50:09

The GC is a very complex piece of software, and it is the only component that really knows when a garbage collection is necessary. So my advice is to leave the GC alone.

Additionally, the GC will cope with this many objects. You may notice a decrease in performance; if that becomes a problem, you can try to optimize your code, but don't do it prematurely.

ゝ杯具 2024-11-12 12:50:09

I would leave the GC to its business. 10,000 objects is not really much work for the GC, and the cost of the GC work will likely be much lower than the cost of the Excel work. So it's not worth complicating your design to tweak things for the GC. If you end up with so many files to process that your application can't finish in time, it's most likely going to be the speed of the Excel processing holding you up.

However, one note which may be relevant: if the DocumentParser uses unmanaged memory in its work with the Excel file, you can use GC.AddMemoryPressure and GC.RemoveMemoryPressure to indicate to the GC the real added cost of opening the file. If you didn't write the DocumentParser yourself, the author may already be doing this.

The issue here is that you may have a managed object that costs something on the order of 100 bytes, but which allocates a large amount of unmanaged memory when it does Excel work. The GC has no way of knowing this, so these methods notify the GC that there is more memory pressure than it was aware of. This may change how and when it decides to collect, which may lead to the application maintaining a lower memory footprint. If the application's memory usage balloons over time, you may start seeing slowdowns from lengthy garbage collections and possibly paging on the machine (depending on how much memory you have). You'll want to keep an eye on its memory usage to make sure it's not leaking memory as it processes; a memory profiler may be helpful there.
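To make the memory-pressure idea concrete, here is a minimal sketch. `UnmanagedBuffer` is a hypothetical stand-in for whatever unmanaged allocation a parser might own; it is not taken from the question's DocumentParser, whose internals are unknown. It assumes the unmanaged size is known up front and reports exactly that size to the GC.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical wrapper: a small managed object that owns a large unmanaged buffer.
public sealed class UnmanagedBuffer : IDisposable
{
    private IntPtr _buffer;
    private readonly long _size;

    public UnmanagedBuffer(long sizeInBytes)
    {
        if (sizeInBytes <= 0) throw new ArgumentOutOfRangeException(nameof(sizeInBytes));
        _size = sizeInBytes;
        _buffer = Marshal.AllocHGlobal(new IntPtr(sizeInBytes));
        // Tell the GC this object costs more than its managed size suggests.
        GC.AddMemoryPressure(_size);
    }

    public bool IsDisposed => _buffer == IntPtr.Zero;

    public void Dispose()
    {
        ReleaseBuffer();
        // The finalizer no longer has anything to do.
        GC.SuppressFinalize(this);
    }

    // Safety net in case Dispose is never called.
    ~UnmanagedBuffer() => ReleaseBuffer();

    private void ReleaseBuffer()
    {
        if (_buffer == IntPtr.Zero) return; // already released
        Marshal.FreeHGlobal(_buffer);
        _buffer = IntPtr.Zero;
        // Every AddMemoryPressure call must be balanced by a matching removal.
        GC.RemoveMemoryPressure(_size);
    }
}
```

Creating each instance inside a `using` statement releases the unmanaged memory deterministically, so the pressure hint only matters for instances that are left for the finalizer.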

弥繁 2024-11-12 12:50:09

You don't need to call the GC manually unless you are holding some very large resource, which is not the case in your situation. The GC tunes itself with every collection, and if you call it manually you will just disrupt its internal profiling data.

BTW, the GC can collect an object not only once it goes out of scope but also right after its last use (i.e. while the variable is still in scope but is no longer used).
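A small sketch of that point, using `WeakReference` to observe whether an object is still alive. `LifetimeDemo` and its method names are made up for illustration, and `GC.Collect()` is called here only to make the effect observable, not as a recommendation. Whether the first method reports a collection depends on the runtime and build configuration, so treat its result as typical rather than guaranteed.

```csharp
using System;
using System.Runtime.CompilerServices;

public static class LifetimeDemo
{
    // Allocates in its own stack frame and hands back only a weak reference,
    // so no strong reference survives once this method returns.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static WeakReference AllocateUnreferenced()
    {
        return new WeakReference(new byte[1024]);
    }

    // Reports whether an object with no remaining strong references was
    // reclaimed by a forced collection (usually true, but not guaranteed).
    public static bool UnreferencedObjectWasCollected()
    {
        WeakReference weak = AllocateUnreferenced();
        GC.Collect();
        GC.WaitForPendingFinalizers();
        return !weak.IsAlive;
    }

    // Reports whether an object survived a collection while a later
    // GC.KeepAlive call still counts as a use of it.
    public static bool KeptAliveObjectSurvived()
    {
        var data = new byte[1024];
        var weak = new WeakReference(data);
        GC.Collect();
        bool alive = weak.IsAlive;
        // Without this call, an optimized build may treat 'data' as dead
        // before GC.Collect(), even though the variable is still in scope.
        GC.KeepAlive(data);
        return alive;
    }
}
```

In ordinary managed code this early eligibility is invisible. It mainly matters when an object wraps an unmanaged resource that is still in use after the last managed reference to the wrapper, which is where `GC.KeepAlive` comes in.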

眼藏柔 2024-11-12 12:50:09

Yes and no. The GC is effective enough to release memory when it needs to, but you generally can't be sure when that will be.

There is a way to force a collection (GC.Collect), but it's generally considered bad practice in production code: forcing a stack walk when it isn't required is worse than using a bit of extra memory until the GC decides it needs to free resources in order to allocate more objects.
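If the goal is to watch what the GC is doing rather than to force it, a rough sketch along these lines may be enough. `GcMonitor`, `ProcessAll`, and the `parseFile` delegate are hypothetical names; the delegate stands in for whatever creates and runs a DocumentParser for one file. It logs a snapshot every `reportEvery` files, using only counters the runtime already exposes.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class GcMonitor
{
    // One-line summary of managed heap size, process private bytes,
    // and how many collections each generation has had so far.
    public static string Snapshot()
    {
        long managedBytes = GC.GetTotalMemory(forceFullCollection: false);
        long privateBytes;
        using (Process current = Process.GetCurrentProcess())
        {
            // Private bytes also include unmanaged allocations.
            privateBytes = current.PrivateMemorySize64;
        }
        return $"managed={managedBytes / 1024} KB, private={privateBytes / 1024} KB, " +
               $"gen0={GC.CollectionCount(0)}, gen1={GC.CollectionCount(1)}, gen2={GC.CollectionCount(2)}";
    }

    // Processes files one at a time and logs a snapshot every 'reportEvery' files.
    public static void ProcessAll(
        IReadOnlyList<string> paths,
        Action<string> parseFile,   // hypothetical: parses a single Excel file
        int reportEvery,
        Action<string> log)
    {
        if (reportEvery <= 0) throw new ArgumentOutOfRangeException(nameof(reportEvery));
        for (int i = 0; i < paths.Count; i++)
        {
            parseFile(paths[i]);
            if ((i + 1) % reportEvery == 0)
            {
                log($"{i + 1}/{paths.Count} files: {Snapshot()}");
            }
        }
    }
}
```

If the managed figure stays roughly flat while the private figure keeps climbing across thousands of files, that points at unmanaged memory not being released rather than at the GC falling behind, and a memory profiler is the next step.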
