为数据组选择 DISTINCT
我有下表:
ID Data
1 A
2 A
2 B
3 A
3 B
4 C
5 D
6 A
6 B
等。换句话说,我每个 ID 有几组数据。您会注意到数据组(A,B)出现了多次。我想要一个可以识别不同数据组并对它们进行编号的查询,例如:
DataID Data
101 A
102 A
102 B
103 C
104 D
因此 DataID 102 将类似于数据(A,B),DataID 103 将类似于数据(C)等。为了能够重写我的原始数据这种形式的表格:
ID DataID
1 101
2 102
3 102
4 103
5 104
6 102
我该怎么做?
附言。生成第一个表的代码:
CREATE TABLE #t1 (id INT, data VARCHAR(10))
INSERT INTO #t1
SELECT 1, 'A'
UNION ALL SELECT 2, 'A'
UNION ALL SELECT 2, 'B'
UNION ALL SELECT 3, 'A'
UNION ALL SELECT 3, 'B'
UNION ALL SELECT 4, 'C'
UNION ALL SELECT 5, 'D'
UNION ALL SELECT 6, 'A'
UNION ALL SELECT 6, 'B'
I have following table:
ID Data
1 A
2 A
2 B
3 A
3 B
4 C
5 D
6 A
6 B
etc. In other words, I have groups of data per ID. You will notice that the data group (A, B) occurs multiple times. I want a query that can identify the distinct data groups and number them, such as:
DataID Data
101 A
102 A
102 B
103 C
104 D
So DataID 102 would resemble data (A,B), DataID 103 would resemble data (C), etc. In order to be able to rewrite my original table in this form:
ID DataID
1 101
2 102
3 102
4 103
5 104
6 102
How can I do that?
PS. Code to generate the first table:
CREATE TABLE #t1 (id INT, data VARCHAR(10))
INSERT INTO #t1
SELECT 1, 'A'
UNION ALL SELECT 2, 'A'
UNION ALL SELECT 2, 'B'
UNION ALL SELECT 3, 'A'
UNION ALL SELECT 3, 'B'
UNION ALL SELECT 4, 'C'
UNION ALL SELECT 5, 'D'
UNION ALL SELECT 6, 'A'
UNION ALL SELECT 6, 'B'
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(4)
在我看来,您必须创建一个连接数据的自定义聚合(出于性能原因,建议使用字符串 CLR 方法)。
然后,我将按 ID 进行分组并选择与分组不同的选项,添加 row_number() 函数或添加您选择的 dendense_rank() 。无论如何,它应该看起来像这样
In my opinion You have to create a custom aggregate that concatenates data (in case of strings CLR approach is recommended for perf reasons).
Then I would group by ID and select distinct from the grouping, adding a row_number()function or add a dense_rank() your choice. Anyway it should look like this
以下使用 CASE 的查询将给出如下所示的结果。
从那时起,获取不同的数据组并进一步进行应该不再是问题。
然而,这种逻辑只有在“数据”值既固定又事先已知的情况下才有效。
就你的情况而言,你确实说是这样。然而,考虑到您还说它们是其中的 1000 个,坦率地说,这肯定是一个看起来可笑的查询:-) 坦率地说,
LuckyLuke 的上述建议将是实现该解决方案的更通用的方法,并且可能是更明智的方法尽管就你而言。
The following query using CASE will give you the result shown below.
From there on, getting the distinct datagroups and proceeding further should not really be a problem.
However, this kind of logic will only work in case you the "Data" values are both fixed and known before hand.
In your case, you do say that is the case. However, considering that you also say that they are 1000 of them, this will be frankly, a ridiculous looking query for sure :-)
LuckyLuke's suggestion above would, frankly, be the more generic way and probably saner way to go about implementing the solution though in your case.
从您的示例数据(添加了缺少的 2,'A' 元组)中,以下给出了重新编号(且唯一)的数据:
给出:
我认为,这是一个关系划分的问题,将此输出中的行与中的行相匹配原来的表。
From your sample data (having added the missing 2,'A' tuple, the following gives the renumbered (and uniqueified) data:
Giving:
I think then, it's a matter of relational division to match up rows from this output with the rows in the original table.
只是分享我自己目前正在使用的肮脏解决方案:
然后模拟 LuckyLuke 的解决方案。
Just to share my own dirty solution that I'm using for the moment:
And then going analog to LuckyLuke's solution.