Memory-optimized OrderBy and Take?
I have 9 GB of data, and I want only 10 rows. When I do:
data.OrderBy(datum => datum.Column1)
    .Take(10)
    .ToArray();
I get an OutOfMemoryException. I would like to use an OrderByAndTake method, optimized for lower memory consumption. It's easy to write, but I guess someone already did. Where can I find it?
Edit: It's Linq-to-objects. The data comes from a file. Each row can be discarded if its value for Column1 is smaller than every value in the current list of the 10 biggest values.
I'm assuming you're doing this in Linq to Objects. You could do something like...
In this way, not all the items need to be kept in a new sorted collection, only the best 10 you're interested in.
This was the least-code way. Since you know the soFar list is sorted, testing where/if to insert current could be optimized. I didn't feel like doing ALL the work for you. ;-)
PS: Replace T with whatever your type is.
EDIT: Thinking about it, the most efficient way would actually be a plain old foreach that compares each item to the running list of best 10.
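The code from this answer did not survive on this page, so the following is only a sketch of the two ideas it describes, not the answerer's original code. The names OrderByAndTakeAggregate and OrderByAndTake, the keySelector parameter, and the class name TopN are assumptions made for illustration; soFar and current follow the names used in the answer. Both methods are meant to return the same rows as OrderBy(keySelector).Take(count) while holding at most count + 1 rows at a time.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TopN
{
    // Least-code variant: for every element, re-sort a list of at most
    // count + 1 items and keep the first count of them.
    public static List<T> OrderByAndTakeAggregate<T, TKey>(
        this IEnumerable<T> source, Func<T, TKey> keySelector, int count)
    {
        return source.Aggregate(
            new List<T>(),
            (soFar, current) => soFar
                .Concat(new[] { current })
                .OrderBy(keySelector)
                .Take(count)
                .ToList());
    }

    // foreach variant: soFar stays sorted, so each item is either discarded
    // after one comparison or inserted at its position via binary search.
    public static List<T> OrderByAndTake<T, TKey>(
        this IEnumerable<T> source, Func<T, TKey> keySelector, int count)
    {
        var soFar = new List<T>();
        if (count <= 0) return soFar;

        var comparer = Comparer<TKey>.Default;
        foreach (var current in source)
        {
            var key = keySelector(current);

            // List is full and current is not smaller than the largest kept item: discard.
            if (soFar.Count == count &&
                comparer.Compare(key, keySelector(soFar[count - 1])) >= 0)
            {
                continue;
            }

            // Find the first position whose key is greater than current's key,
            // so items with equal keys keep their input order (like OrderBy).
            int lo = 0, hi = soFar.Count;
            while (lo < hi)
            {
                int mid = lo + (hi - lo) / 2;
                if (comparer.Compare(keySelector(soFar[mid]), key) <= 0) lo = mid + 1;
                else hi = mid;
            }

            soFar.Insert(lo, current);
            if (soFar.Count > count) soFar.RemoveAt(count); // drop the largest
        }
        return soFar;
    }
}
```

With the question's data this would be called as data.OrderByAndTake(datum => datum.Column1, 10). To keep the 10 biggest values instead, the comparisons would need to be reversed.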
It figures: OrderBy is a Sort and that requires storing all the elements (deferred execution is cancelled).
It ought to work efficiently when data is an IQueryable; then it's up to the database.
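A small experiment can make the first point visible. The sketch below is an illustration, not part of the answer: Counted and ItemsPulledForFirstResult are hypothetical helpers that count how many source items LINQ to Objects pulls before OrderBy(...).Take(1) can produce its first element. Nothing is pulled until enumeration starts, but the first MoveNext has to consume the whole source.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OrderByBufferingDemo
{
    // Hypothetical helper: yields n-1 down to 0 and reports every item it hands out.
    public static IEnumerable<int> Counted(int n, Action onItem)
    {
        for (int i = n - 1; i >= 0; i--)
        {
            onItem();
            yield return i;
        }
    }

    // Returns how many source items were pulled to produce one sorted element.
    public static int ItemsPulledForFirstResult(int n)
    {
        int pulled = 0;
        var query = Counted(n, () => pulled++).OrderBy(x => x).Take(1);
        using (var e = query.GetEnumerator())
        {
            e.MoveNext(); // ask for just the first sorted element
        }
        return pulled;
    }
}
```

ItemsPulledForFirstResult(n) returns n: the smallest element cannot be known until every element has been seen, and OrderBy keeps what it has read in memory while sorting.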
To order a set of unordered objects you have to look at all of them, no?
I don't see how you'd be able to avoid parsing all 9 GB of data to get the first 10 ordered in a certain way unless the 9 GB of data was already ordered in that fashion or if there were indexes or other ancillary data structures that could be utilized.
Could you provide a bit more background on your question? Are you querying a database using LINQ to SQL or Entity Framework or some other O/RM?
You can use something like this together with a projection comparer:
Runtime should be O(log(count)*seq.Count()) and space O(min(log(count),seq.Count())).
One issue is that it will break if you have two elements for which comp.Compare(a,b)==0, since the set doesn't allow duplicate entries.
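The answer's code is missing from this page, so here is a rough sketch of what a bounded sorted-set approach could look like. It is an assumption about the general technique, not the answerer's code. The names TakeOrdered, SortedSetTopN and tieBreaking are made up for illustration; seq, count and comp follow the names in the answer. To sidestep the duplicate problem described above, this sketch pairs each item with its input position, which is an addition the answer does not mention.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SortedSetTopN
{
    public static IEnumerable<T> TakeOrdered<T>(
        this IEnumerable<T> seq, int count, IComparer<T> comp)
    {
        // Break ties by input position so that two items with
        // comp.Compare(a, b) == 0 are still distinct entries in the set.
        var tieBreaking = Comparer<(T Item, long Index)>.Create((a, b) =>
        {
            int c = comp.Compare(a.Item, b.Item);
            return c != 0 ? c : a.Index.CompareTo(b.Index);
        });

        var set = new SortedSet<(T Item, long Index)>(tieBreaking);
        long index = 0;
        foreach (var item in seq)
        {
            set.Add((item, index++));
            // Keep only the count smallest entries seen so far.
            if (set.Count > count) set.Remove(set.Max);
        }
        return set.Select(e => e.Item).ToList();
    }
}
```

A projection comparer for the question's rows could be built with Comparer<T>.Create, for example Comparer<Row>.Create((a, b) => a.Column1.CompareTo(b.Column1)), where Row stands in for the actual row type. This sketch holds at most count + 1 entries in the set at any time.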