When NOT to use yield (return)

Posted on 2024-09-28 07:36:00

This question already has an answer here:
Is there ever a reason to not use 'yield return' when returning an IEnumerable?

There are several useful questions here on SO about the benefits of yield return.

I'm looking for thoughts on when NOT to use yield return. For example, if I expect to need to return all items in a collection, it doesn't seem like yield would be useful, right?

What are the cases where use of yield will be limiting, unnecessary, get me into trouble, or otherwise should be avoided?

顾冷 2024-10-05 07:36:00

What are the cases where use of yield will be limiting, unnecessary, get me into trouble, or otherwise should be avoided?

It's a good idea to think carefully about your use of "yield return" when dealing with recursively defined structures. For example, I often see this:

public static IEnumerable<T> PreorderTraversal<T>(Tree<T> root)
{
    if (root == null) yield break;
    yield return root.Value;
    foreach(T item in PreorderTraversal(root.Left))
        yield return item;
    foreach(T item in PreorderTraversal(root.Right))
        yield return item;
}

Perfectly sensible-looking code, but it has performance problems. Suppose the tree is h deep. Then at most points there will be O(h) nested iterators built. Calling "MoveNext" on the outer iterator will then make O(h) nested calls to MoveNext. Since it does this O(n) times for a tree with n items, that makes the algorithm O(hn). And since the height of a binary tree is lg n <= h <= n, the algorithm is at best O(n lg n) and at worst O(n^2) in time, and best case O(lg n) and worst case O(n) in stack space. It is O(h) in heap space because each enumerator is allocated on the heap. (This holds on the implementations of C# I'm aware of; a conforming implementation might have other stack or heap space characteristics.)

But iterating a tree can be O(n) in time and O(1) in stack space. You can write this instead like:

public static IEnumerable<T> PreorderTraversal<T>(Tree<T> root)
{
    var stack = new Stack<Tree<T>>();
    stack.Push(root);
    while (stack.Count != 0)
    {
        var current = stack.Pop();
        if (current == null) continue;
        yield return current.Value;
        // Push Right first so that Left is popped, and therefore visited, first.
        stack.Push(current.Right);
        stack.Push(current.Left);
    }
}

which still uses yield return, but is much smarter about it. Now we are O(n) in time and O(h) in heap space, and O(1) in stack space.
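
To try the two versions side by side, here is a minimal sketch. The Tree<T> class is an assumption (the answer never shows its definition), and the method names Recursive and Iterative are made up for the comparison. The explicit-stack version pushes the right child before the left so that both methods produce the same preorder sequence.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical node type; the original answer does not show Tree<T>.
public sealed class Tree<T>
{
    public T Value;
    public Tree<T> Left;
    public Tree<T> Right;

    public Tree(T value, Tree<T> left = null, Tree<T> right = null)
    {
        Value = value;
        Left = left;
        Right = right;
    }
}

public static class Traversals
{
    // Recursive version: one nested iterator per level of depth.
    public static IEnumerable<T> Recursive<T>(Tree<T> root)
    {
        if (root == null) yield break;
        yield return root.Value;
        foreach (T item in Recursive(root.Left)) yield return item;
        foreach (T item in Recursive(root.Right)) yield return item;
    }

    // Explicit-stack version: a single iterator, O(h) heap for the stack.
    public static IEnumerable<T> Iterative<T>(Tree<T> root)
    {
        var stack = new Stack<Tree<T>>();
        stack.Push(root);
        while (stack.Count != 0)
        {
            var current = stack.Pop();
            if (current == null) continue;
            yield return current.Value;
            // Right goes on first so that Left is popped (visited) first.
            stack.Push(current.Right);
            stack.Push(current.Left);
        }
    }
}
```

Both methods stay lazy; the difference is only in how many iterator objects sit between the caller and the node currently being visited.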

Further reading: see Wes Dyer's article on the subject:

http://blogs.msdn.com/b/wesdyer/archive/2007/03/23/all-about-iterators.aspx

紅太極 2024-10-05 07:36:00

What are the cases where use of yield
will be limiting, unnecessary, get me
into trouble, or otherwise should be
avoided?

I can think of a couple of cases, namely:

  • Avoid using yield return when you return an existing iterator. Example:

    // Don't do this, it creates overhead for no reason
    // (a new state machine needs to be generated)
    public IEnumerable<string> GetKeys() 
    {
        foreach(string key in _someDictionary.Keys)
            yield return key;
    }
    // DO this
    public IEnumerable<string> GetKeys() 
    {
        return _someDictionary.Keys;
    }
    
  • Avoid using yield return when you don't want to defer execution of the method's code. Example:

    // Don't do this, the exception won't get thrown until the iterator is
    // iterated, which can be very far away from this method invocation
    public IEnumerable<string> Foo(Bar baz) 
    {
        if (baz == null)
            throw new ArgumentNullException();
         yield ...
    }
    // DO this
    public IEnumerable<string> Foo(Bar baz) 
    {
        if (baz == null)
            throw new ArgumentNullException();
         return new BazIterator(baz);
    }
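
BazIterator is not shown above, so as an alternative sketch of the same idea: a common way to get eager argument checks while keeping the lazy body is to split the method in two, a plain method that validates and a private iterator that yields. The names below (Validation, LettersDeferred, LettersEager, LettersCore) are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class Validation
{
    // Iterator method: nothing in the body runs until MoveNext is called,
    // so the null check is deferred along with everything else.
    public static IEnumerable<char> LettersDeferred(string text)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        foreach (char c in text)
            if (char.IsLetter(c)) yield return c;
    }

    // Plain method: the check runs at call time, then the lazy part is delegated.
    public static IEnumerable<char> LettersEager(string text)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        return LettersCore(text);
    }

    // Private iterator that assumes its argument was already validated.
    private static IEnumerable<char> LettersCore(string text)
    {
        foreach (char c in text)
            if (char.IsLetter(c)) yield return c;
    }
}
```

Calling LettersDeferred(null) returns normally and only throws once someone enumerates the result; LettersEager(null) throws at the call site.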
    
梦中楼上月下 2024-10-05 07:36:00

The key thing to realize is what yield is useful for, then you can decide which cases do not benefit from it.

In other words, when you do not need a sequence to be lazily evaluated you can skip the use of yield. When would that be? It would be when you do not mind immediately having your entire collection in memory. Otherwise, if you have a huge sequence that would negatively impact memory, you would want to use yield to work on it step by step (i.e., lazily). A profiler might come in handy when comparing both approaches.

Notice how most LINQ statements return an IEnumerable<T>. This allows us to chain different LINQ operations together without paying for each step up front (aka deferred execution). The alternative would be putting a ToList() call between each LINQ statement. That would force each preceding LINQ statement to execute immediately, before the next (chained) statement runs, thereby giving up the benefit of lazy evaluation, which is that the IEnumerable<T> is not evaluated until it is needed.
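
A small sketch of that difference, under the assumption that the projection is the expensive part. LazyDemo, Square and the Evaluations counter are hypothetical names used only to count how often the projection runs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LazyDemo
{
    // Counts how many times the projection actually ran.
    public static int Evaluations;

    // Hypothetical stand-in for an expensive per-element computation.
    public static int Square(int x)
    {
        Evaluations++;
        return x * x;
    }

    // Lazy pipeline: Take(3) stops pulling after three elements.
    public static List<int> FirstThreeLazy(IEnumerable<int> source)
    {
        return source.Select(x => Square(x)).Take(3).ToList();
    }

    // Eager pipeline: ToList() after Select forces every element through Square.
    public static List<int> FirstThreeEager(IEnumerable<int> source)
    {
        return source.Select(x => Square(x)).ToList().Take(3).ToList();
    }
}
```

With a 1000-element source, the lazy pipeline calls Square 3 times and the eager one calls it 1000 times, although both return the same three values.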

浸婚纱 2024-10-05 07:36:00

There are a lot of excellent answers here. I would add this one: Don't use yield return for small or empty collections where you already know the values:

IEnumerable<UserRight> GetSuperUserRights() {
    if(SuperUsersAllowed) {
        yield return UserRight.Add;
        yield return UserRight.Edit;
        yield return UserRight.Remove;
    }
}

In these cases the creation of the Enumerator object is more expensive, and more verbose, than just generating a data structure.

IEnumerable<UserRight> GetSuperUserRights() {
    return SuperUsersAllowed
           ? new[] {UserRight.Add, UserRight.Edit, UserRight.Remove}
           : Enumerable.Empty<UserRight>();
}

Update

Here are the results of my benchmark:

[Image: Benchmark Results]

These results show how long it took (in milliseconds) to perform the operation 1,000,000 times. Smaller numbers are better.

In revisiting this, the performance difference isn't significant enough to worry about, so you should go with whatever is the easiest to read and maintain.

Update 2

I'm pretty sure the above results were achieved with compiler optimization disabled. Running in Release mode with a modern compiler, it appears performance is practically indistinguishable between the two. Go with whatever is most readable to you.

感受沵的脚步 2024-10-05 07:36:00

Eric Lippert raises a good point (too bad C# doesn't have stream flattening like Cw). I would add that sometimes the enumeration process is expensive for other reasons, and therefore you should use a list if you intend to iterate over the IEnumerable more than once.

For example, LINQ-to-objects is built on "yield return". If you've written a slow LINQ query (e.g. that filters a large list into a small list, or that does sorting and grouping), it may be wise to call ToList() on the result of the query in order to avoid enumerating multiple times (which actually executes the query multiple times).

If you are choosing between "yield return" and List<T> when writing a method, consider: is each single element expensive to compute, and will the caller need to enumerate the results more than once? If you know the answers are yes and yes, you shouldn't use yield return (unless, for example, the List produced is very large and you can't afford the memory it would use. Remember, another benefit of yield is that the result list doesn't have to be entirely in memory at once).
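
To make the "enumerating twice runs the work twice" point concrete, here is a rough sketch. ReEnumeration, Costly, TwoPasses and the Computed counter are all invented for the example; the increment stands in for whatever makes each element expensive.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ReEnumeration
{
    // Counts how many elements the generator has computed in total.
    public static int Computed;

    // Hypothetical costly generator: each element is recomputed on every pass.
    public static IEnumerable<int> Costly(int count)
    {
        for (int i = 0; i < count; i++)
        {
            Computed++;          // stands in for expensive work
            yield return i * 2;
        }
    }

    // A caller that happens to enumerate its input twice.
    public static (int Sum, int Max) TwoPasses(IEnumerable<int> items)
    {
        return (items.Sum(), items.Max());
    }
}
```

TwoPasses(Costly(5)) runs the generator body twice, so Computed ends at 10; TwoPasses(Costly(5).ToList()) runs it once, so Computed ends at 5, at the price of holding the list in memory.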

Another reason not to use "yield return" is if interleaving operations is dangerous. For example, if your method looks something like this,

IEnumerable<T> GetMyStuff() {
    foreach (var x in MyCollection)
        if (...)
            yield return (...);
}

this is dangerous if there is a chance that MyCollection will change because of something the caller does:

foreach(T x in GetMyStuff()) {
    if (...)
        MyCollection.Add(...);
        // Oops, now GetMyStuff() will throw an exception
        // because MyCollection was modified.
}

yield return can cause trouble whenever the caller changes something that the yielding function assumes does not change.
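
A runnable version of that failure, plus one possible way around it. The Inventory class and its members are hypothetical; the snapshot variant simply finishes the filter before the caller gets control, which is only one of several ways to avoid the interleaving.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed class Inventory
{
    private readonly List<int> _items = new List<int> { 1, 2, 3 };

    public void Add(int item) => _items.Add(item);

    // Lazy: the caller's loop body runs while _items is still being enumerated.
    public IEnumerable<int> EvensLazy()
    {
        foreach (int x in _items)
            if (x % 2 == 0) yield return x;
    }

    // Snapshot: the filter runs to completion before the caller sees anything.
    public IReadOnlyList<int> EvensSnapshot()
    {
        return _items.Where(x => x % 2 == 0).ToList();
    }
}
```

Adding to the inventory inside a foreach over EvensLazy() throws InvalidOperationException on the next step of the loop, because List<T> detects the modification; the same loop over EvensSnapshot() completes.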

你是我的挚爱i 2024-10-05 07:36:00

I would avoid using yield return if the method has a side effect that you expect on calling the method. This is due to the deferred execution that Pop Catalin mentions.

One side effect could be modifying the system, which could happen in a method like IEnumerable<Foo> SetAllFoosToCompleteAndGetAllFoos(), which breaks the single responsibility principle. That's pretty obvious (now...), but a not so obvious side effect could be setting a cached result or similar as an optimisation.

My rules of thumb (again, now...) are:

  • Only use yield if the object being returned requires a bit of processing
  • No side effects in the method if I need to use yield
  • If I have to have side effects (limiting them to caching etc.), don't use yield, and make sure the benefits of expanding the iteration outweigh the costs
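
A sketch of how the deferred side effect bites, loosely modelled on the SetAllFoosToCompleteAndGetAllFoos example. The Foo class, its Complete field and both method names are assumptions made for the illustration.

```csharp
using System.Collections.Generic;

// Hypothetical entity; only the flag matters for the example.
public sealed class Foo
{
    public bool Complete;
}

public static class SideEffects
{
    // With yield, the "set to complete" side effect is deferred too:
    // it happens element by element, and only if someone enumerates.
    public static IEnumerable<Foo> CompleteAllLazy(List<Foo> foos)
    {
        foreach (Foo foo in foos)
        {
            foo.Complete = true;
            yield return foo;
        }
    }

    // Without yield, the side effect has happened by the time the method returns.
    public static IReadOnlyList<Foo> CompleteAllEager(List<Foo> foos)
    {
        foreach (Foo foo in foos) foo.Complete = true;
        return foos;
    }
}
```

A caller who invokes CompleteAllLazy(foos) and ignores the return value changes nothing at all, which is rarely what the method name promises.
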
浸婚纱 2024-10-05 07:36:00

Yield would be limiting/unnecessary when you need random access. If you need to access element 0 then element 99, you've pretty much eliminated the usefulness of lazy evaluation.
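
To illustrate, a minimal sketch with a hypothetical generator (RandomAccess, Numbers and the Produced counter are invented names). Indexing into a yield-based sequence with ElementAt walks it from the start every time, since there is nothing to index into.

```csharp
using System.Collections.Generic;

public static class RandomAccess
{
    // Counts how many elements the generator has produced in total.
    public static int Produced;

    // Hypothetical endless generator: 0, 1, 2, ...
    public static IEnumerable<int> Numbers()
    {
        for (int i = 0; ; i++)
        {
            Produced++;
            yield return i;
        }
    }
}
```

Fetching element 0 and then element 99 via Enumerable.ElementAt produces 1 + 100 elements in total, whereas a List<int> or array would answer both lookups by index.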

冰葑 2024-10-05 07:36:00

One that might catch you out is if you are serialising the results of an enumeration and sending them over the wire. Because the execution is deferred until the results are needed, you will serialise an empty enumeration and send that back instead of the results you want.

夕嗳→ 2024-10-05 07:36:00

I have to maintain a pile of code from a guy who was absolutely obsessed with yield return and IEnumerable. The problem is that a lot of third party APIs we use, as well as a lot of our own code, depend on Lists or Arrays. So I end up having to do:

IEnumerable<foo> myFoos = getSomeFoos();
List<foo> fooList = new List<foo>(myFoos);
thirdPartyApi.DoStuffWithArray(fooList.ToArray());

Not necessarily bad, but kind of annoying to deal with, and on a few occasions it's led to creating duplicate Lists in memory to avoid refactoring everything.

一生独一 2024-10-05 07:36:00

When you don't want a code block to return an iterator for sequential access to an underlying collection, you don't need yield return. You simply return the collection instead.

迷爱 2024-10-05 07:36:00

If you're defining a Linq-y extension method where you're wrapping actual Linq members, those members will more often than not return an iterator. Yielding through that iterator yourself is unnecessary.
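
For instance, a sketch with a made-up extension method (EnumerableExtensions, WhereNotNullYield and WhereNotNull are hypothetical names). Both variants are lazy and return the same elements; the first just adds a second state machine on top of the one Where already provides.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class EnumerableExtensions
{
    // Unnecessary: wraps the iterator returned by Where in a second state machine.
    public static IEnumerable<T> WhereNotNullYield<T>(this IEnumerable<T> source) where T : class
    {
        foreach (T item in source.Where(x => x != null))
            yield return item;
    }

    // Simpler: Where already returns a lazy sequence, so hand it back directly.
    public static IEnumerable<T> WhereNotNull<T>(this IEnumerable<T> source) where T : class
    {
        return source.Where(x => x != null);
    }
}
```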

Beyond that, you can't really get into much trouble using yield to define a "streaming" enumerable that is evaluated on a JIT basis.
