When NOT to use yield (return)
This question already has an answer here:
Is there ever a reason to not use 'yield return' when returning an IEnumerable?
There are several useful questions here on SO about the benefits of `yield return`.
I'm looking for thoughts on when NOT to use `yield return`. For example, if I expect to need to return all items in a collection, it doesn't seem like `yield` would be useful, right?
What are the cases where use of `yield` will be limiting, unnecessary, get me into trouble, or otherwise should be avoided?
It's a good idea to think carefully about your use of "yield return" when dealing with recursively defined structures. For example, I often see this:
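A sketch of the kind of recursive traversal being described, assuming a hypothetical minimal `Tree<T>` node type (the type and member names here are illustrative, not taken from the original post):

```csharp
using System.Collections.Generic;

// Hypothetical minimal binary tree node, assumed for illustration.
public sealed class Tree<T>
{
    public T Value;
    public Tree<T> Left;
    public Tree<T> Right;
}

public static class NaiveTraversal
{
    // Each recursive call allocates its own iterator, and every item is
    // re-yielded once per level between its node and the root.
    public static IEnumerable<T> Preorder<T>(Tree<T> root)
    {
        if (root == null) yield break;
        yield return root.Value;
        foreach (T item in Preorder(root.Left))
            yield return item;
        foreach (T item in Preorder(root.Right))
            yield return item;
    }
}
```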
Perfectly sensible-looking code, but it has performance problems. Suppose the tree is h deep. Then at most points there will be O(h) nested iterators built. Calling "MoveNext" on the outer iterator will then make O(h) nested calls to MoveNext. Since it does this O(n) times for a tree with n items, that makes the algorithm O(hn). And since the height of a binary tree is lg n <= h <= n, that means that the algorithm is at best O(n lg n) and at worst O(n^2) in time, and best case O(lg n) and worst case O(n) in stack space. It is O(h) in heap space because each enumerator is allocated on the heap. (This holds on the implementations of C# I'm aware of; a conforming implementation might have other stack or heap space characteristics.)
But iterating a tree can be O(n) in time and O(1) in stack space. You can write this instead like:
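One way to write it, sketched with an explicit stack and the same hypothetical `Tree<T>` node type (names are assumptions for illustration):

```csharp
using System.Collections.Generic;

// Hypothetical minimal binary tree node, assumed for illustration.
public sealed class Tree<T>
{
    public T Value;
    public Tree<T> Left;
    public Tree<T> Right;
}

public static class StackTraversal
{
    // A single iterator; pending nodes live in a heap-allocated stack
    // instead of in nested iterators.
    public static IEnumerable<T> Preorder<T>(Tree<T> root)
    {
        var stack = new Stack<Tree<T>>();
        stack.Push(root);
        while (stack.Count != 0)
        {
            Tree<T> current = stack.Pop();
            if (current == null) continue;
            yield return current.Value;
            // Push right first so the left subtree is popped (visited) first.
            stack.Push(current.Right);
            stack.Push(current.Left);
        }
    }
}
```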
which still uses yield return, but is much smarter about it. Now we are O(n) in time and O(h) in heap space, and O(1) in stack space.
Further reading: see Wes Dyer's article on the subject:
http://blogs.msdn.com/b/wesdyer/archive/2007/03/23/all-about-iterators.aspx
I can think of a couple of cases, namely:
Avoid using yield return when you return an existing iterator. Example:
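A minimal sketch of this first case. The `KeyStore` class and its dictionary contents are hypothetical, chosen only to make the snippet self-contained:

```csharp
using System.Collections.Generic;

public class KeyStore
{
    private readonly Dictionary<string, int> _someDictionary =
        new Dictionary<string, int> { ["a"] = 1, ["b"] = 2 };

    // Avoid: the compiler generates an extra state machine that only re-yields.
    public IEnumerable<string> GetKeysWithYield()
    {
        foreach (string key in _someDictionary.Keys)
            yield return key;
    }

    // Prefer: hand back the existing enumerable directly.
    public IEnumerable<string> GetKeys()
    {
        return _someDictionary.Keys;
    }
}
```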
Avoid using yield return when you don't want to defer execution code for the method. Example:
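A sketch of the deferred-execution case, using a hypothetical `Words` helper. Note that the "prefer" variant below still uses an iterator internally; it only moves the argument check into a non-iterator wrapper so that the check runs at call time:

```csharp
using System;
using System.Collections.Generic;

public static class Words
{
    // Avoid: the null check is part of the iterator body, so it runs
    // only when the caller starts enumerating.
    public static IEnumerable<string> SplitLazy(string text)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        foreach (string word in text.Split(' '))
            yield return word;
    }

    // Prefer: validate eagerly in a non-iterator wrapper, then defer the rest.
    public static IEnumerable<string> Split(string text)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        return SplitIterator(text);
    }

    private static IEnumerable<string> SplitIterator(string text)
    {
        foreach (string word in text.Split(' '))
            yield return word;
    }
}
```

With this shape, `Words.SplitLazy(null)` returns without complaint and fails only once it is enumerated, while `Words.Split(null)` throws immediately.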
The key thing to realize is what `yield` is useful for; then you can decide which cases do not benefit from it.
In other words, when you do not need a sequence to be lazily evaluated, you can skip the use of `yield`. When would that be? It would be when you do not mind immediately having your entire collection in memory. Otherwise, if you have a huge sequence that would negatively impact memory, you would want to use `yield` to work on it step by step (i.e., lazily). A profiler might come in handy when comparing both approaches.
Notice how most LINQ statements return an `IEnumerable<T>`. This allows us to continually string different LINQ operations together without negatively impacting performance at each step (aka deferred execution). The alternative picture would be putting a `ToList()` call in between each LINQ statement. This would cause each preceding LINQ statement to be immediately executed before performing the next (chained) LINQ statement, thereby forgoing any benefit of lazy evaluation, that is, of not consuming the `IEnumerable<T>` until it is needed.
There are a lot of excellent answers here. I would add this one: Don't use yield return for small or empty collections where you already know the values:
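A sketch of what this looks like in practice. The `UserRight` enum, the `RightsProvider` class, and the `SuperUsersAllowed` flag are hypothetical names used for illustration:

```csharp
using System.Collections.Generic;
using System.Linq;

public enum UserRight { Add, Edit, Remove }

public class RightsProvider
{
    public bool SuperUsersAllowed { get; set; }

    // Verbose: an iterator object for three values known up front.
    public IEnumerable<UserRight> GetSuperUserRightsWithYield()
    {
        if (SuperUsersAllowed)
        {
            yield return UserRight.Add;
            yield return UserRight.Edit;
            yield return UserRight.Remove;
        }
    }

    // Simpler: return an array, or the shared empty sequence.
    public IEnumerable<UserRight> GetSuperUserRights()
    {
        return SuperUsersAllowed
            ? new[] { UserRight.Add, UserRight.Edit, UserRight.Remove }
            : Enumerable.Empty<UserRight>();
    }
}
```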
In these cases the creation of the Enumerator object is more expensive, and more verbose, than just generating a data structure.
Update
Here are the results of my benchmark:
These results show how long it took (in milliseconds) to perform the operation 1,000,000 times. Smaller numbers are better.
In revisiting this, the performance difference isn't significant enough to worry about, so you should go with whatever is the easiest to read and maintain.
Update 2
I'm pretty sure the above results were achieved with compiler optimization disabled. Running in Release mode with a modern compiler, it appears performance is practically indistinguishable between the two. Go with whatever is most readable to you.
Eric Lippert raises a good point (too bad C# doesn't have stream flattening like Cω). I would add that sometimes the enumeration process is expensive for other reasons, and therefore you should use a list if you intend to iterate over the IEnumerable more than once.
For example, LINQ-to-objects is built on "yield return". If you've written a slow LINQ query (e.g. one that filters a large list into a small list, or that does sorting and grouping), it may be wise to call `ToList()` on the result of the query in order to avoid enumerating multiple times (which actually executes the query multiple times).
If you are choosing between "yield return" and `List<T>` when writing a method, consider: is each single element expensive to compute, and will the caller need to enumerate the results more than once? If you know the answers are yes and yes, you shouldn't use `yield return` (unless, for example, the List produced is very large and you can't afford the memory it would use. Remember, another benefit of `yield` is that the result list doesn't have to be entirely in memory at once).
Another reason not to use "yield return" is if interleaving operations is dangerous. For example, if your method looks something like this,
this is dangerous if there is a chance that MyCollection will change because of something the caller does:
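A sketch of both halves of this scenario, the lazy method and a caller that modifies the collection while enumerating. The `Inventory` and `Demo` names and the odd-number filter are assumptions made for illustration:

```csharp
using System;
using System.Collections.Generic;

public class Inventory
{
    public List<int> MyCollection { get; } = new List<int> { 1, 2, 3 };

    // Lazily walks MyCollection; the loop is still in progress while
    // the caller processes each yielded item.
    public IEnumerable<int> GetMyStuff()
    {
        foreach (int x in MyCollection)
            if (x % 2 == 1)
                yield return x;
    }
}

public static class Demo
{
    // Returns true if modifying the list during enumeration was detected.
    public static bool ModifyWhileEnumerating()
    {
        var inventory = new Inventory();
        try
        {
            foreach (int x in inventory.GetMyStuff())
                inventory.MyCollection.Add(x * 10); // mutates the list mid-enumeration
            return false;
        }
        catch (InvalidOperationException)
        {
            // List<T>'s enumerator notices the change on its next MoveNext.
            return true;
        }
    }
}
```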
`yield return` can cause trouble whenever the caller changes something that the yielding function assumes does not change.
I would avoid using `yield return` if the method has a side effect that you expect on calling the method. This is due to the deferred execution that Pop Catalin mentions.
One side effect could be modifying the system, which could happen in a method like `IEnumerable<Foo> SetAllFoosToCompleteAndGetAllFoos()`, which breaks the single responsibility principle. That's pretty obvious (now...), but a not so obvious side effect could be setting a cached result or similar as an optimisation.
My rules of thumb (again, now...) are:
- Only use `yield` if the object being returned requires a bit of processing
- No side effects in the method if I need to use `yield`
- If I have to have side effects (and limiting that to caching etc.), don't use `yield` and make sure the benefits of expanding the iteration outweigh the costs
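To illustrate the deferred side effect described in this answer, here is a minimal sketch. The `Foo` and `FooService` types and the `CompletedCount` helper are hypothetical; only the method name `SetAllFoosToCompleteAndGetAllFoos` comes from the answer:

```csharp
using System.Collections.Generic;

public class Foo
{
    public bool Complete;
}

public class FooService
{
    private readonly List<Foo> _foos = new List<Foo> { new Foo(), new Foo() };

    public int CompletedCount()
    {
        int count = 0;
        foreach (Foo foo in _foos)
            if (foo.Complete) count++;
        return count;
    }

    // The side effect lives inside the iterator body, so nothing is
    // marked complete until a caller actually enumerates the result.
    public IEnumerable<Foo> SetAllFoosToCompleteAndGetAllFoos()
    {
        foreach (Foo foo in _foos)
        {
            foo.Complete = true;
            yield return foo;
        }
    }
}
```

Calling the method and ignoring the result leaves every `Foo` incomplete, which is easy to miss when the method name promises otherwise.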
Yield would be limiting/unnecessary when you need random access. If you need to access element 0 then element 99, you've pretty much eliminated the usefulness of lazy evaluation.
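A small sketch of why random access works against a lazy sequence: each `ElementAt` call on an iterator restarts the enumeration from the beginning. The `RandomAccessDemo` class and its `Produced` counter are hypothetical, added only to make the cost visible:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class RandomAccessDemo
{
    public static int Produced;

    // Counts every item the iterator actually generates.
    public static IEnumerable<int> Numbers()
    {
        for (int i = 0; i < 1000; i++)
        {
            Produced++;
            yield return i;
        }
    }

    // Reads element 0 and then element 99, and reports how many items
    // had to be generated in total to serve those two reads.
    public static int Run()
    {
        Produced = 0;
        IEnumerable<int> sequence = Numbers();
        int first = sequence.ElementAt(0);
        int hundredth = sequence.ElementAt(99);
        return Produced;
    }
}
```

A `List<int>` or array would serve both reads by index without regenerating anything.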
One that might catch you out is if you are serialising the results of an enumeration and sending them over the wire. Because the execution is deferred until the results are needed, you will serialise an empty enumeration and send that back instead of the results you want.
I have to maintain a pile of code from a guy who was absolutely obsessed with yield return and IEnumerable. The problem is that a lot of third party APIs we use, as well as a lot of our own code, depend on Lists or Arrays. So I end up having to do:
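Roughly this kind of conversion chain, sketched with hypothetical names (`ThirdPartyApi`, `DoStuffWithArray`, and `GetSomeFoos` are stand-ins, not real APIs):

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for an API that only accepts arrays.
public static class ThirdPartyApi
{
    public static int DoStuffWithArray(string[] items) => items.Length;
}

public static class Caller
{
    // Hypothetical iterator-based method from the inherited code.
    private static IEnumerable<string> GetSomeFoos()
    {
        yield return "a";
        yield return "b";
    }

    public static int Run()
    {
        IEnumerable<string> myFoos = GetSomeFoos();
        List<string> fooList = new List<string>(myFoos);          // first copy
        return ThirdPartyApi.DoStuffWithArray(fooList.ToArray()); // second copy
    }
}
```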
Not necessarily bad, but kind of annoying to deal with, and on a few occasions it's led to creating duplicate Lists in memory to avoid refactoring everything.
When you don't want a code block to return an iterator for sequential access to an underlying collection, you don't need `yield return`. You simply `return` the collection then.
If you're defining a Linq-y extension method where you're wrapping actual Linq members, those members will more often than not return an iterator. Yielding through that iterator yourself is unnecessary.
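A minimal sketch of that situation, using a hypothetical `WhereNot` extension method that wraps `Enumerable.Where`:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MyExtensions
{
    // Unnecessary: re-yields what Where already streams lazily.
    public static IEnumerable<T> WhereNotWithYield<T>(
        this IEnumerable<T> source, Func<T, bool> predicate)
    {
        foreach (T item in source.Where(x => !predicate(x)))
            yield return item;
    }

    // Enough: return the LINQ iterator directly.
    public static IEnumerable<T> WhereNot<T>(
        this IEnumerable<T> source, Func<T, bool> predicate)
    {
        return source.Where(x => !predicate(x));
    }
}
```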
Beyond that, you can't really get into much trouble using yield to define a "streaming" enumerable that is evaluated on a JIT basis.