如何在C#中对数据库中的多个排序数据集进行排序?
我在SQL SERVER中有一个分区表,在ID上有一个聚集索引,并且该表通过period_key进行分区。 ID在分区内唯一,但跨分区不唯一。我需要的是找到所有唯一的ID。最简单的方法就是使用
select unique ID from tab
但是需要对数据库中的DB进行排序,这需要退出大量的临时磁盘,因此需要大量的磁盘IO。由于系统已经受到IO限制,我正在考虑削减磁盘IO。由于我们可以使用簇索引按顺序读取每个分区,假设我们有10个分区,我们可以从每个分区读取一行,然后比较它们,并输出具有最小ID的记录,例如从分区X,然后读取再次比较这 10 行,输出具有最小 ID 的记录等。就像外部排序一样。
我没有 C# 经验,但了解 java。谁能告诉我如何在 C# 中实现它?
I have a partitioned table in SQL SERVER, there is a clustered index on ID, and the table is partitioned by period_key. The ID is unique inside partition, but not unique cross partitions. What I need is to find all the unique ID. The simplest way is just use
select unique ID from tab
But that need to sort the DB in database which need quit a lot of temp disk, so lots of disk IO is required. Since the system is already IO bounded, I am thinking about cut the disk IO. Since we can read each partition in order by using the cluster index, suppose we have 10 partition, we can read one row from each partition, then compare them, and output the record with the min ID, say from partition X, and then read the next row from the partition X. And again compare these 10 rows, output the record with the min ID, etc. Just like external sort.
I don't have experience in C#, but know java. Could anyone give me some idea how to implement it in c#?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
好的,如果要求绕过数据库服务器端的排序,而是确定 ID 在客户端是否唯一,您可以执行以下操作 - 选择所有 ID 值(查询中没有不同的值) :
然后循环遍历所有值,将它们添加到
List
中。完成后,您可以要求列表返回其自身的版本,并删除重复项。这是一个简单的例子:免责声明 - 我是凭空写下的,所以它可能包含错误。 这篇文章包含基于
HashSet的更快实现.
OK, if the requirement is to bypass the sort on the DB server side, and rather work out if an ID is unique or not on the client side, you can do something like this - select all ID values (no distinct in the query):
Then loop through all the values, adding them to a
List
. When that's done, you can ask the list to give back a version of itself with the duplicates removed. Here's a simplistic example:Disclaimer - I wrote that off the top of my head, so it might contain errors. This post contains a faster implementation based on
HashSet
.