SQL: finding duplicate values in rows

Posted 2025-02-10 16:21:16

I have a table with 22 columns and many rows.

I want to get a new table containing only the rows whose combination of values in columns V1, V3, V4, V13 appears twice in the table (that is, the duplicated rows).

I'm working in Microsoft SQL Server Management Studio.

I've tried something like this:

SELECT 
    V1, V3, V4, V13
FROM  
    table 
GROUP BY 
    V1, V3, V4, V13
HAVING 
    (COUNT (*) = 2)

帥小哥 2025-02-17 16:21:16

You can do it by first counting, with the COUNT window function, how many rows share each combination of "V1", "V3", "V4", "V13". Then join back to your original table, keeping only the rows whose count is greater than 1.

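WITH cte AS (
    -- count the rows sharing each (V1, V3, V4, V13) combination
    SELECT <your_table_identifier_field(s)>, 
           COUNT(*) OVER(PARTITION BY V1, V3, V4, V13) AS cnt 
    FROM tab  -- "tab" is a placeholder for your table name
)
SELECT tab.* 
FROM       tab
INNER JOIN cte
        ON tab.<your_table_identifier_field(s)> = cte.<your_table_identifier_field(s)>
       AND cte.cnt > 1  -- keep only rows whose combination repeats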
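
To try the window-function approach end to end, here is a minimal sketch that runs the same query shape against an in-memory SQLite database from Python. Everything around the SQL is an assumption made for illustration: the table name `tab`, the identifier column `id`, the `other` column standing in for the remaining columns, and the sample rows. It uses SQLite (3.25 or newer, for window functions) rather than SQL Server, so treat it as a way to check the logic, not as T-SQL you can paste unchanged.

```python
import sqlite3

# Hypothetical demo data: "tab" stands in for the real table, "id" is an
# assumed row identifier, and "other" stands in for the remaining columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (id INTEGER PRIMARY KEY, V1, V3, V4, V13, other)")
conn.executemany(
    "INSERT INTO tab VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "a", 1, 1, "x", "first"),
        (2, "a", 1, 1, "x", "second"),  # same key columns as row 1
        (3, "b", 2, 2, "y", "unique"),
        (4, "c", 3, 3, "z", "third"),
        (5, "c", 3, 3, "z", "fourth"),  # same key columns as row 4
    ],
)

query = """
WITH cte AS (
    -- one count per row: how many rows share its (V1, V3, V4, V13)
    SELECT id,
           COUNT(*) OVER (PARTITION BY V1, V3, V4, V13) AS cnt
    FROM tab
)
SELECT tab.*
FROM tab
INNER JOIN cte
        ON tab.id = cte.id
       AND cte.cnt > 1
ORDER BY tab.id
"""
duplicate_rows = conn.execute(query).fetchall()
for row in duplicate_rows:
    print(row)
```

Only the rows whose key combination occurs more than once come back, and each one keeps all of its columns, which is what the plain GROUP BY query in the question cannot give you.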

EDIT: What if there's no specific row identifier? You're forced to match all column values then.

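WITH cte AS (
    -- every column plus the per-combination row count
    SELECT *, 
           COUNT(*) OVER(PARTITION BY V1, V3, V4, V13) AS cnt 
    FROM tab  -- "tab" is a placeholder for your table name
)
SELECT tab.* 
FROM       tab
INNER JOIN cte
        ON tab.<field1> = cte.<field1>
       AND tab.<field2> = cte.<field2>
       AND ...
       AND cte.cnt > 1  -- keep only rows whose combination repeats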
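
One thing to watch with the all-columns join: if two rows are identical in every column, each matches both copies and the result contains extra rows, and columns holding NULL never match at all. A possible alternative, which is not what the answer above proposes, is to skip the join and filter the CTE directly, since it already carries every column. The sketch below assumes a hypothetical table `tab` with no identifier and an `other` column standing in for the rest, and runs on SQLite (3.25 or newer) from Python rather than on SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with no row identifier; "other" stands in for the
# remaining columns of the real 22-column table.
conn.execute("CREATE TABLE tab (V1, V3, V4, V13, other)")
conn.executemany(
    "INSERT INTO tab VALUES (?, ?, ?, ?, ?)",
    [
        ("a", 1, 1, "x", "first"),
        ("a", 1, 1, "x", "second"),  # same key columns as the row above
        ("b", 2, 2, "y", "unique"),
    ],
)

query = """
WITH cte AS (
    SELECT *,
           COUNT(*) OVER (PARTITION BY V1, V3, V4, V13) AS cnt
    FROM tab
)
-- no join needed: the CTE already holds every column of every row
SELECT V1, V3, V4, V13, other
FROM cte
WHERE cnt > 1
ORDER BY other
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Listing the columns in the outer SELECT keeps the helper column `cnt` out of the result.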

怎樣才叫好 2025-02-17 16:21:16

Why over-complicate things? Just take your original query, put it in a CTE and then join the CTE to the original table.

with cte as (
    SELECT V1, V3, V4, V13 FROM dbo.table 
    GROUP BY V1, V3, V4, V13
    HAVING COUNT (*) > 1
)
select t1.* 
from dbo.table as t1 
inner join cte on t1.V1 = cte.V1 and t1.V3 = cte.V3 ...
order by ...
;

I've made some changes to incorporate best practices: I added the schema name (assumed to be dbo) to your table name, a statement terminator, and an ORDER BY clause (because row order usually matters). I also changed the HAVING clause: you asked for duplicates, which means the count is > 1. Using COUNT(*) = 2 limits the result to combinations that occur exactly twice, so a combination that occurs three or more times would be missed. That is a very unusual requirement, but change it back if needed. Normally I would not use "*" as the column list, as it is generally far better to list the columns you need explicitly. I leave that to you as well.
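
To make the difference between the two HAVING conditions concrete, here is a small sketch that joins the grouped CTE back on all four key columns and runs it once with each condition. The table `t`, the `id` column, and the sample rows are assumptions for illustration, and the harness uses SQLite from Python rather than SQL Server; the GROUP BY / HAVING / JOIN shape itself is the one from the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: "id" is only there to make the result easy to read.
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, V1, V3, V4, V13)")
conn.executemany(
    "INSERT INTO t (V1, V3, V4, V13) VALUES (?, ?, ?, ?)",
    [("a", 1, 1, "x")] * 2      # ids 1-2: combination occurs twice
    + [("b", 2, 2, "y")] * 3    # ids 3-5: combination occurs three times
    + [("c", 3, 3, "z")],       # id 6: combination occurs once
)


def duplicated_ids(having):
    """Return ids of rows whose key combination passes the HAVING test.

    `having` is a trusted literal in this demo, never user input.
    """
    query = f"""
    WITH cte AS (
        SELECT V1, V3, V4, V13
        FROM t
        GROUP BY V1, V3, V4, V13
        HAVING {having}
    )
    SELECT t1.id
    FROM t AS t1
    INNER JOIN cte
            ON t1.V1 = cte.V1 AND t1.V3 = cte.V3
           AND t1.V4 = cte.V4 AND t1.V13 = cte.V13
    ORDER BY t1.id
    """
    return [row[0] for row in conn.execute(query)]


more_than_once = duplicated_ids("COUNT(*) > 1")
exactly_twice = duplicated_ids("COUNT(*) = 2")
print(more_than_once, exactly_twice)
```

With this data, the `= 2` version drops the combination that occurs three times, while `> 1` keeps it.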

Finally, consider changing your schema to prevent duplicates. That will avoid a lot of future work.
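
One way to read "changing your schema" is a unique index or constraint over the four columns, so the database itself rejects a second row with the same combination. The sketch below shows that idea on SQLite from Python; the table `t`, the index name `ux_t_key`, and the sample values are hypothetical. SQL Server has its own syntax for unique constraints and indexes, and in either system existing duplicates have to be cleaned up before such an index can be created.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; "other" stands in for the remaining columns.
conn.execute("CREATE TABLE t (V1, V3, V4, V13, other)")
# Assumed fix: at most one row per (V1, V3, V4, V13) combination.
conn.execute("CREATE UNIQUE INDEX ux_t_key ON t (V1, V3, V4, V13)")
conn.execute("INSERT INTO t VALUES ('a', 1, 1, 'x', 'first')")

try:
    # Same key combination as the row above, so the index rejects it.
    conn.execute("INSERT INTO t VALUES ('a', 1, 1, 'x', 'second')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
print(rejected, row_count)
```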
