为数据组选择 DISTINCT

发布于 2024-10-04 08:06:03 字数 790 浏览 0 评论 0原文

我有下表:

ID  Data
1   A
2   A
2   B
3   A
3   B
4   C
5   D 
6   A
6   B

等。换句话说,我每个 ID 有几组数据。您会注意到数据组(A,B)出现了多次。我想要一个可以识别不同数据组并对它们进行编号的查询,例如:

DataID     Data
101        A
102        A
102        B
103        C
104        D

因此 DataID 102 将类似于数据(A,B),DataID 103 将类似于数据(C)等。为了能够重写我的原始数据这种形式的表格:

ID   DataID
1    101
2    102
3    102
4    103
5    104
6    102

我该怎么做?


附言。生成第一个表的代码:

CREATE TABLE #t1 (id INT, data VARCHAR(10))
INSERT INTO #t1
SELECT 1, 'A'
UNION ALL SELECT 2, 'A'
UNION ALL SELECT 2, 'B'
UNION ALL SELECT 3, 'A'
UNION ALL SELECT 3, 'B'
UNION ALL SELECT 4, 'C'
UNION ALL SELECT 5, 'D'
UNION ALL SELECT 6, 'A'
UNION ALL SELECT 6, 'B'

I have following table:

ID  Data
1   A
2   A
2   B
3   A
3   B
4   C
5   D 
6   A
6   B

etc. In other words, I have groups of data per ID. You will notice that the data group (A, B) occurs multiple times. I want a query that can identify the distinct data groups and number them, such as:

DataID     Data
101        A
102        A
102        B
103        C
104        D

So DataID 102 would resemble data (A,B), DataID 103 would resemble data (C), etc. In order to be able to rewrite my original table in this form:

ID   DataID
1    101
2    102
3    102
4    103
5    104
6    102

How can I do that?


PS. Code to generate the first table:

CREATE TABLE #t1 (id INT, data VARCHAR(10))
INSERT INTO #t1
SELECT 1, 'A'
UNION ALL SELECT 2, 'A'
UNION ALL SELECT 2, 'B'
UNION ALL SELECT 3, 'A'
UNION ALL SELECT 3, 'B'
UNION ALL SELECT 4, 'C'
UNION ALL SELECT 5, 'D'
UNION ALL SELECT 6, 'A'
UNION ALL SELECT 6, 'B'

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

子栖 2024-10-11 08:06:03

在我看来,您必须创建一个连接数据的自定义聚合(出于性能原因,建议使用字符串 CLR 方法)。
然后,我将按 ID 进行分组并选择与分组不同的选项,添加 row_number() 函数或添加您选择的 dendense_rank() 。无论如何,它应该看起来像这样

with groupings as (
select concat(data) groups
from Table1
group by ID
)
select groups, rownumber() over () from groupings

In my opinion You have to create a custom aggregate that concatenates data (in case of strings CLR approach is recommended for perf reasons).
Then I would group by ID and select distinct from the grouping, adding a row_number()function or add a dense_rank() your choice. Anyway it should look like this

with groupings as (
select concat(data) groups
from Table1
group by ID
)
select groups, rownumber() over () from groupings
花桑 2024-10-11 08:06:03

以下使用 CASE 的查询将给出如下所示的结果。

从那时起,获取不同的数据组并进一步进行应该不再是问题。

SELECT     
    id, 
     MAX(CASE data WHEN 'A' THEN data ELSE '' END) + 
     MAX(CASE data WHEN 'B' THEN data ELSE '' END) + 
     MAX(CASE data WHEN 'C' THEN data ELSE '' END) + 
     MAX(CASE data WHEN 'D' THEN data ELSE '' END) AS DataGroups
FROM  t1
GROUP BY id

ID  DataGroups
1   A
2   AB
3   AB
4   C
5   D
6   AB

然而,这种逻辑只有在“数据”值既固定又事先已知的情况下才有效。

就你的情况而言,你确实说是这样。然而,考虑到您还说它们是其中的 1000 个,坦率地说,这肯定是一个看起来可笑的查询:-) 坦率地说,

LuckyLuke 的上述建议将是实现该解决方案的更通用的方法,并且可能是更明智的方法尽管就你而言。

The following query using CASE will give you the result shown below.

From there on, getting the distinct datagroups and proceeding further should not really be a problem.

SELECT     
    id, 
     MAX(CASE data WHEN 'A' THEN data ELSE '' END) + 
     MAX(CASE data WHEN 'B' THEN data ELSE '' END) + 
     MAX(CASE data WHEN 'C' THEN data ELSE '' END) + 
     MAX(CASE data WHEN 'D' THEN data ELSE '' END) AS DataGroups
FROM  t1
GROUP BY id

ID  DataGroups
1   A
2   AB
3   AB
4   C
5   D
6   AB

However, this kind of logic will only work in case you the "Data" values are both fixed and known before hand.

In your case, you do say that is the case. However, considering that you also say that they are 1000 of them, this will be frankly, a ridiculous looking query for sure :-)

LuckyLuke's suggestion above would, frankly, be the more generic way and probably saner way to go about implementing the solution though in your case.

十六岁半 2024-10-11 08:06:03

从您的示例数据(添加了缺少的 2,'A' 元组)中,以下给出了重新编号(且唯一)的数据:

with NonDups as (
select t1.id
from #t1 t1 left join #t1 t2
on t1.id > t2.id and t1.data = t2.data
group by t1.id
having COUNT(t1.data) > COUNT(t2.data)
), DataAddedBack as (
    select ID,data
    from #t1 where id in (select id from NonDups)
), Renumbered as (
    select DENSE_RANK() OVER (ORDER BY id) as ID,Data from DataAddedBack
)
select * from Renumbered

给出:

1          A
2          A
2          B
3          C
4          D

我认为,这是一个关系划分的问题,将此输出中的行与中的行相匹配原来的表。

From your sample data (having added the missing 2,'A' tuple, the following gives the renumbered (and uniqueified) data:

with NonDups as (
select t1.id
from #t1 t1 left join #t1 t2
on t1.id > t2.id and t1.data = t2.data
group by t1.id
having COUNT(t1.data) > COUNT(t2.data)
), DataAddedBack as (
    select ID,data
    from #t1 where id in (select id from NonDups)
), Renumbered as (
    select DENSE_RANK() OVER (ORDER BY id) as ID,Data from DataAddedBack
)
select * from Renumbered

Giving:

1          A
2          A
2          B
3          C
4          D

I think then, it's a matter of relational division to match up rows from this output with the rows in the original table.

小梨窩很甜 2024-10-11 08:06:03

只是分享我自己目前正在使用的肮脏解决方案:

SELECT DISTINCT t1.id, D.data
FROM #t1 t1
CROSS APPLY ( 
    SELECT CAST(Data AS VARCHAR) + ','
    FROM #t1 t2
    WHERE t2.id = t1.id
    ORDER BY Data ASC
    FOR XML PATH('') )  
D ( Data )

然后模拟 LuckyLuke 的解决方案。

Just to share my own dirty solution that I'm using for the moment:

SELECT DISTINCT t1.id, D.data
FROM #t1 t1
CROSS APPLY ( 
    SELECT CAST(Data AS VARCHAR) + ','
    FROM #t1 t2
    WHERE t2.id = t1.id
    ORDER BY Data ASC
    FOR XML PATH('') )  
D ( Data )

And then going analog to LuckyLuke's solution.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文