Garbage collection in a LINQ query

Posted on 2024-09-08 01:50:36

I have a question about how garbage collection is handled in a LINQ query. Suppose I am given a list of requests to process. Each request generates a very large set of data, but then a filter is applied to keep only the critical data from each requested load.

//Input data
List<request> requests;
IEnumerable<filteredData> results = requests.Select(request => Process(request)).Select(data => Filter(data));

So I know that the query is deferred for each data item until each filtered data item is requested, so that's good. But does that memory-intensive middle part persist until the enumerable is completed?

What I am hoping happens is that each data element can be garbage collected as soon as it has passed the filter stage, thus making sure I have enough memory to process the whole list. Is this the case, or does the middle enumerable keep everything around until the entire query ends? If so, is there a LINQ way to deal with this?


Note: the Process() function generates the memory-intensive data... that's what I'm worried about.

Comments (3)

鹿港巷口少年归 2024-09-15 01:50:36

As long as the return values of Process(...) and Filter(...) do not hold any references to the "large data items" used internally, the memory used in that process should become unrooted and a candidate for GC after each element is processed.

This doesn't mean it will get collected, only that it will be a candidate. If memory pressure gets high, the GC will most likely collect it.
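
To make the "do not hold any references" condition concrete, here is a minimal sketch. The types BigData, FilteredData and LeakyFilteredData, and the bodies of Process, Filter and LeakyFilter, are hypothetical stand-ins and not code from the question. The first filter copies out only the values it needs; the second stores the whole intermediate object, so every large payload stays reachable for as long as its result does.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the large intermediate data produced per request.
public sealed class BigData
{
    public int Id;
    public byte[] Payload = new byte[1024 * 1024]; // the memory-intensive part
}

// Safe result: copies only small values and keeps no reference to BigData.
public sealed class FilteredData
{
    public int Id;
    public int PayloadLength;
}

// Leaky result: keeps the whole BigData reachable for as long as the result lives.
public sealed class LeakyFilteredData
{
    public BigData Source;
}

public static class FilterDemo
{
    public static BigData Process(int request)
    {
        return new BigData { Id = request };
    }

    public static FilteredData Filter(BigData data)
    {
        return new FilteredData { Id = data.Id, PayloadLength = data.Payload.Length };
    }

    public static LeakyFilteredData LeakyFilter(BigData data)
    {
        return new LeakyFilteredData { Source = data };
    }

    public static void Main()
    {
        var requests = new List<int> { 1, 2, 3 };

        // Each BigData becomes unreachable once Filter has returned.
        List<FilteredData> safe =
            requests.Select(r => Process(r)).Select(d => Filter(d)).ToList();

        // Every BigData is still reachable through the result list.
        List<LeakyFilteredData> leaky =
            requests.Select(r => Process(r)).Select(d => LeakyFilter(d)).ToList();

        Console.WriteLine(safe.Count);
        Console.WriteLine(leaky[0].Source.Payload.Length);
    }
}
```

With the leaky variant, memory use grows with the number of requests no matter how the query is written, because the result list itself roots the large objects.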

摇划花蜜的午后 2024-09-15 01:50:36

The garbage collector is quite aggressive in .NET and can clean up intermediate objects when they are no longer referenced, even inside loops. In fact, in some cases it will clean up an object that is still referenced if it can see that it will never be accessed again.

Running this code shows that objects are cleaned up quite quickly and do not hang about until the query completes (which it never does):

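using System;
using System.Collections.Generic;
using System.Linq;

// Each finalizer prints a message when the GC reclaims an instance.
public class MyClass1 { ~MyClass1() { Console.WriteLine("Cleaned up MyClass1"); } }
public class MyClass2 { ~MyClass2() { Console.WriteLine("Cleaned up MyClass2"); } }

public class Program
{
    // Infinite, lazily generated source sequence.
    static IEnumerable<MyClass1> lotsOfObjects()
    {
        while (true)
            yield return new MyClass1();
    }

    static void Main()
    {
        var query = lotsOfObjects().Select(x => foo(x));
        // This loop never ends; the body is arbitrary work that ignores x.
        foreach (MyClass2 x in query)
            query.ToString();
    }

    // Maps each MyClass1 to a new MyClass2 without keeping a reference to the input.
    static MyClass2 foo(MyClass1 x)
    {
        return new MyClass2();
    }
}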

Result:

Cleaned up MyClass1
Cleaned up MyClass1
Cleaned up MyClass1
Cleaned up MyClass2
Cleaned up MyClass2
Cleaned up MyClass1
Cleaned up MyClass2
etc...

时光是把杀猪刀 2024-09-15 01:50:36

It's difficult to answer your question, as what you've posted won't actually compile: Select produces an IEnumerable<T>, but you're assigning it to a List<T>. Assuming the Filter(data) function returns a filteredData, you'd have to call ToList() on the query to store it in results.

requests is, I assume, already populated with data. This list will follow normal garbage collection rules. I'm assuming what you're worried about is the result of the Process function. I can't say specifically what will happen, because I also have no idea what your Filter function does. Unless the result of the Filter function holds on to a reference to its parameter (in other words, the result of the Process function), the objects created by Process will be fully out of scope upon the completion of the query and will follow normal garbage collection rules.

Bear in mind that these rules govern eligibility for collection. No objects are ever guaranteed to be collected during the lifetime of your application. The results, however, will be eligible, so the GC will be able to collect them.
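
As a rough illustration of where ToList() sits in the pipeline, the sketch below logs the order in which the two stages run. Process and Filter here are hypothetical stand-ins working on plain ints and arrays, not the asker's functions. When ToList() is called only at the end, each Process result is handed to Filter before the next request is processed; when ToList() is placed between the stages, every Process result is buffered first.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StreamingDemo
{
    // Records the order in which the two stages run.
    static readonly List<string> Log = new List<string>();

    // Hypothetical stand-in for Process: a large array tagged with the request id.
    static int[] Process(int request)
    {
        Log.Add("Process " + request);
        var data = new int[100000];
        data[0] = request;
        return data;
    }

    // Hypothetical stand-in for Filter: keeps one small value, not the array.
    static int Filter(int[] data)
    {
        Log.Add("Filter " + data[0]);
        return data[0];
    }

    // ToList() only at the end: the stages alternate per element.
    public static string RunStreaming(List<int> requests)
    {
        Log.Clear();
        requests.Select(r => Process(r)).Select(d => Filter(d)).ToList();
        return string.Join(", ", Log);
    }

    // ToList() between the stages: all Process results are held at once.
    public static string RunBuffered(List<int> requests)
    {
        Log.Clear();
        requests.Select(r => Process(r)).ToList().Select(d => Filter(d)).ToList();
        return string.Join(", ", Log);
    }

    public static void Main()
    {
        var requests = new List<int> { 1, 2 };
        Console.WriteLine(RunStreaming(requests)); // Process 1, Filter 1, Process 2, Filter 2
        Console.WriteLine(RunBuffered(requests));  // Process 1, Process 2, Filter 1, Filter 2
    }
}
```

In the streaming form the pipeline itself references at most one large array at a time, so earlier arrays are eligible for collection while later requests are still being processed. The buffered form keeps all of them reachable until the intermediate list is released.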
