Get a sample of all records in a table
I have a table with a few million records in it. I want to get a sampling, where I return every nth record in the table. My current solution looks like this:
myQuery.Where((rec, index) => index % interval == 0);
However, Linq to Entities does not support this and throws an exception. I also tried this:
myQuery.Select((rec, index) => new { Index = index, Record = rec })
    .Where(x => x.Index % interval == 0);
But Linq to Entities does not support that either.
Even if it did work that way, it doesn't seem very elegant. Is there another way of doing this?
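One workaround that stays close to the original attempt, sketched here under stated assumptions rather than as a recommendation: keep an explicit OrderBy in the query so "every nth" refers to a defined order, call AsEnumerable() so the indexed Where overload runs in LINQ to Objects, and accept that every row is then streamed to the client. The in-memory list of Ids and the EveryNth helper are hypothetical stand-ins for myQuery and its entity type.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the EF query: in real code this would be an
// IQueryable<T> from a DbContext. Here it is an in-memory list of Ids 1..20.
IQueryable<int> myQuery = Enumerable.Range(1, 20).AsQueryable();
int interval = 5;

// OrderBy stays on the query side so the position of each row is defined.
// AsEnumerable() switches the rest of the pipeline to LINQ to Objects, so the
// indexed Where overload runs in memory. Every row is still transferred to
// the client, which is the real cost on a table with millions of rows.
List<int> EveryNth(IQueryable<int> source, int n) =>
    source.OrderBy(id => id)
          .AsEnumerable()
          .Where((id, index) => index % n == 0)
          .ToList();

List<int> sample = EveryNth(myQuery, interval);
Console.WriteLine(string.Join(", ", sample)); // 1, 6, 11, 16
```

With millions of rows the transfer cost dominates, so this is only reasonable when the filtered result cannot be expressed in the database.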

Comments (5)
There is a similar Linq to SQL discussion here, "LINQ to SQL Every Nth Row From Table"; check it out.
I think SQL is really the way to go. In EF you can put your SQL in a DefiningQuery.
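A minimal sketch of the SQL route, assuming SQL Server syntax: ROW_NUMBER() numbers the rows in an explicit order, and a modulo filter keeps every nth one. The table name dbo.Records, the Id column, the @interval parameter, and the BuildEveryNthSql helper are all hypothetical. The resulting text could go into a DefiningQuery or be run through a raw-SQL API such as EF6's Database.SqlQuery<T>; check how the extra RowNum column is mapped before relying on it.

```csharp
using System;

// Hypothetical names: dbo.Records and Id are placeholders for the real table
// and key. Window functions cannot appear directly in WHERE, so the numbering
// happens in a derived table and the modulo filter is applied outside it.
// RowNum starts at 1, so RowNum % @interval = 0 keeps rows n, 2n, 3n, ...
string BuildEveryNthSql(string table, string orderColumn) =>
    $@"SELECT *
FROM (
    SELECT t.*, ROW_NUMBER() OVER (ORDER BY t.{orderColumn}) AS RowNum
    FROM {table} AS t
) AS numbered
WHERE numbered.RowNum % @interval = 0;";

string sql = BuildEveryNthSql("dbo.Records", "Id");
Console.WriteLine(sql);
```

Because the table and column names are interpolated into the string, they should come from trusted constants, never from user input; only @interval is meant to be bound as a real parameter.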
It doesn't seem like you've defined an order by which n is the index of the table. Since data in a table is unordered, the record number is meaningless unless you specify an order, so the nth record is undefined. Does that mean you just want a certain random percentage of the records? If so, an example of taking a random one-percent sample of the data would look like so:
This is not actually guaranteed to be a random sample. It simply returns an undefined sample out of all the records in the database, which may or may not be the same set across several queries. Again, this is due to the nature of a table being an unordered record set.
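If a genuinely random sample is what is wanted, one common LINQ pattern is to order by a fresh Guid and take the desired fraction. This is a separate sketch, not the answer's original code; the Ids, the RandomPercent helper, and the rounding choice (Math.Ceiling) are assumptions. With the SQL Server provider, Guid.NewGuid() inside OrderBy is usually translated to ORDER BY NEWID(), which sorts the entire table, so inspect the generated SQL before running it against millions of rows.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for an IQueryable<T> of records (Ids 1..1000).
IQueryable<int> records = Enumerable.Range(1, 1000).AsQueryable();

// Order by a new Guid per row (a random shuffle in memory), then take
// roughly `percent` percent of the rows, rounding up.
// With EF on SQL Server this pattern is usually translated to ORDER BY NEWID();
// verify the generated SQL and its cost on large tables.
List<int> RandomPercent(IQueryable<int> source, double percent)
{
    int total = source.Count();
    int take = (int)Math.Ceiling(total * percent / 100.0);
    return source.OrderBy(_ => Guid.NewGuid()).Take(take).ToList();
}

List<int> randomSample = RandomPercent(records, 1.0);
Console.WriteLine($"{randomSample.Count} of {records.Count()} rows sampled");
```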
Try ordering by index % interval. If that works, you can take the first total/interval records as your sample.
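A sketch of the ordering idea, assuming sequential integer keys stand in for the index, since LINQ to Entities exposes no row index. Keys with remainder 0 sort to the front, so the first total/interval rows are exactly the multiples of the interval. The Ids and the SampleByModuloOrder helper are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sequential Ids 1..20 standing in for the table's key column.
IQueryable<int> ids = Enumerable.Range(1, 20).AsQueryable();
int step = 5;

// Keys that are multiples of n have remainder 0 and sort first; ThenBy keeps
// them in key order. Taking total / n rows returns exactly those multiples
// when the keys run 1..total with no gaps.
List<int> SampleByModuloOrder(IQueryable<int> source, int n)
{
    int total = source.Count();
    return source.OrderBy(id => id % n)
                 .ThenBy(id => id)
                 .Take(total / n)
                 .ToList();
}

List<int> moduloSample = SampleByModuloOrder(ids, step);
Console.WriteLine(string.Join(", ", moduloSample)); // 5, 10, 15, 20
```

If the keys have gaps, the number of multiples no longer equals total/interval, so the Take may pull in rows with a nonzero remainder or miss some multiples; a plain Where(id => id % n == 0) expresses the same filter without the ordering trick.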
The direct SQL way is probably how I would go, but if you have sequential Ids, you can leverage Min and Max:
Note: I tested this with LINQPad, which uses LINQ to SQL, but I think the result will be the same in EF.
Generated SQL:
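As a hypothetical illustration of how Min and Max could be used with sequential Ids (not the answer's original code): anchor the sample at the smallest Id, keep rows where (Id - min) % n == 0, and use Max only to check that the Ids have no gaps. The Ids and the SampleFromMin helper are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sequential Ids 101..130 standing in for the key column.
IQueryable<int> keys = Enumerable.Range(101, 30).AsQueryable();
int every = 10;

// Anchor the sample at the smallest Id. Max is used only to check the
// sequential-Id assumption: with no gaps, Max - Min + 1 equals the row count.
List<int> SampleFromMin(IQueryable<int> source, int n)
{
    int min = source.Min();
    int max = source.Max();
    if (max - min + 1 != source.Count())
        Console.WriteLine("Warning: Ids have gaps; the sample will not be exactly every nth row.");
    return source.Where(id => (id - min) % n == 0).OrderBy(id => id).ToList();
}

List<int> minMaxSample = SampleFromMin(keys, every);
Console.WriteLine(string.Join(", ", minMaxSample)); // 101, 111, 121
```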
You could always do it with Skip and Take. I bet it's horribly inefficient, though.
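A sketch of the Skip/Take approach, assuming an integer key to order by: one query per sampled row, stepping the offset by the interval. The Ids and the SampleWithSkipTake helper are hypothetical. In classic Entity Framework (EF6 and earlier), Skip is only supported on ordered input, hence the OrderBy; against a real database each loop iteration is a separate round trip, which is where the inefficiency comes from.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical Ids 1..20 standing in for the table.
IQueryable<int> table = Enumerable.Range(1, 20).AsQueryable();
int gap = 5;

// One Skip/Take query per sampled row: offsets 0, n, 2n, ... in key order.
// Against a database every iteration is its own round trip.
List<int> SampleWithSkipTake(IQueryable<int> source, int n)
{
    var ordered = source.OrderBy(id => id);
    int total = source.Count();
    var result = new List<int>();
    for (int offset = 0; offset < total; offset += n)
        result.AddRange(ordered.Skip(offset).Take(1));
    return result;
}

List<int> skipTakeSample = SampleWithSkipTake(table, gap);
Console.WriteLine(string.Join(", ", skipTakeSample)); // 1, 6, 11, 16
```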