列出具有 2 列子行作为重复项的所有行的 SQL 查询是什么？

发布于 2024-07-05 22:04:24 字数 555 浏览 16 评论 0原文

我有一个包含冗余数据的表，并且我正在尝试识别具有重复子行的所有行（因为缺乏更好的词）。我所说的子行是指仅考虑 COL1 和 COL2。

假设我有这样的东西：

 COL1   COL2   COL3
 ---------------------
 aa     111    blah_x
 aa     111    blah_j
 aa     112    blah_m
 ab     111    blah_s
 bb     112    blah_d
 bb     112    blah_d
 cc     112    blah_w
 cc     113    blah_p

我需要一个返回以下内容的 SQL 查询：

 COL1   COL2   COL3
 ---------------------
 aa     111    blah_x
 aa     111    blah_j
 bb     112    blah_d
 bb     112    blah_d

原文

I have a table that has redundant data and I'm trying to identify all rows that have duplicate sub-rows (for lack of a better word). By sub-rows I mean considering COL1 and COL2 only.

So let's say I have something like this:

 COL1   COL2   COL3
 ---------------------
 aa     111    blah_x
 aa     111    blah_j
 aa     112    blah_m
 ab     111    blah_s
 bb     112    blah_d
 bb     112    blah_d
 cc     112    blah_w
 cc     113    blah_p

I need a SQL query that returns this:

 COL1   COL2   COL3
 ---------------------
 aa     111    blah_x
 aa     111    blah_j
 bb     112    blah_d
 bb     112    blah_d

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

零度° 2024-07-12 22:04:25

像这样的事情应该有效：

SELECT a.COL1, a.COL2, a.COL3
FROM YourTable a
JOIN YourTable b ON b.COL1 = a.COL1 AND b.COL2 = a.COL2 AND b.COL3 <> a.COL3

一般来说，JOIN 子句应该包括您考虑作为“重复”部分的每一列（本例中为 COL1 和 COL2），以及至少一列（或尽可能多的列））以消除与其自身连接的行（在本例中为 COL3）。

Something like this should work:

SELECT a.COL1, a.COL2, a.COL3
FROM YourTable a
JOIN YourTable b ON b.COL1 = a.COL1 AND b.COL2 = a.COL2 AND b.COL3 <> a.COL3

In general, the JOIN clause should include every column that you're considering to be part of a "duplicate" (COL1 and COL2 in this case), and at least one column (or as many as it takes) to eliminate a row joining to itself (COL3, in this case).

回复收藏 0 原文

夏末染殇 2024-07-12 22:04:25

我天真的尝试是，

select a.*, b.* from table a, table b where a.col1 = b.col1 and a.col2 = b.col2 and a.col3 != b.col3;

但这会返回所有行两次。我不确定你如何将其限制为仅返回一次。也许如果有主键，您可以添加“and a.pkey < b.pkey”。

就像我说的，这并不优雅，可能有更好的方法来做到这一点。

My naive attempt would be

select a.*, b.* from table a, table b where a.col1 = b.col1 and a.col2 = b.col2 and a.col3 != b.col3;

but that would return all the rows twice. I'm not sure how you'd restrict it to just returning them once. Maybe if there was a primary key, you could add "and a.pkey < b.pkey".

Like I said, that's not elegant and there is probably a better way to to do this.

回复收藏 0 原文

や三分注定 2024-07-12 22:04:25

没有方便的数据库来测试这个，但我认为它应该可以工作......

select
  *
from
  theTable
where
  col1 in
    (
    select
      col1
    from
      theTable
    group by
      col1||col2
    having
      count(col1||col2) > 1
    )

Don't have a database handy to test this, but I think it should work...

select
  *
from
  theTable
where
  col1 in
    (
    select
      col1
    from
      theTable
    group by
      col1||col2
    having
      count(col1||col2) > 1
    )

回复收藏 0 原文

呆° 2024-07-12 22:04:25

像这样加入自己：

SELECT a.col3, b.col3, a.col1, a.col2 
FROM tablename a, tablename b
WHERE a.col1 = b.col1 AND a.col2 = b.col2 AND a.col3 != b.col3

如果您使用 postgresql，您可以使用 oid 使其返回更少的重复结果，如下所示：

SELECT a.col3, b.col3, a.col1, a.col2 
FROM tablename a, tablename b
WHERE a.col1 = b.col1 AND a.col2 = b.col2 AND a.col3 != b.col3
  AND a.oid < b.oid

Join on yourself like this:

SELECT a.col3, b.col3, a.col1, a.col2 
FROM tablename a, tablename b
WHERE a.col1 = b.col1 AND a.col2 = b.col2 AND a.col3 != b.col3

If you're using postgresql, you can use the oid to make it return less duplicated results, like this:

SELECT a.col3, b.col3, a.col1, a.col2 
FROM tablename a, tablename b
WHERE a.col1 = b.col1 AND a.col2 = b.col2 AND a.col3 != b.col3
  AND a.oid < b.oid

回复收藏 0 原文

叫思念不要吵 2024-07-12 22:04:25

根据您列出的数据，您的查询是不可能的。第 5 行和第 5 行的数据 6本身并不独特。

假设您的表名为“quux”，如果您从这样的内容开始：

SELECT a.COL1, a.COL2, a.COL3 
FROM quux a, quux b
WHERE a.COL1 = b.COL1 AND a.COL2 = b.COL2 AND a.COL3 <> b.COL3
ORDER BY a.COL1, a.COL2

您最终会得到这个答案：

 COL1   COL2   COL3
 ---------------------
 aa     111    blah_x
 aa     111    blah_j

那是因为第 5 行和第 5 行是 6 的 COL3 具有相同的值。任何返回第 5 行和第 5 行的查询 6 还将返回该数据集中所有行的重复项。

另一方面，如果您有主键 (ID)，那么您可以改用此查询：

SELECT a.COL1, a.COL2, a.COL3
FROM quux a, quux b
WHERE a.COL1 = b.COL1 AND a.COL2 = b.COL2 AND a.ID <> b.ID
ORDER BY a.COL1, a.COL2

[已编辑以简化 WHERE 子句]

您将得到您想要的结果：

COL1   COL2   COL3
---------------------
aa     111    blah_x
aa     111    blah_j
bb     112    blah_d
bb     112    blah_d

我只是我们在 SQL Server 2000 上对此进行了测试，但您应该在任何现代 SQL 数据库上看到相同的结果。

blorgbeard 证明了我错误——对他有好处！

With the data you have listed, your query is not possible. The data on rows 5 & 6 is not distinct within itself.

Assuming that your table is named 'quux', if you start with something like this:

SELECT a.COL1, a.COL2, a.COL3 
FROM quux a, quux b
WHERE a.COL1 = b.COL1 AND a.COL2 = b.COL2 AND a.COL3 <> b.COL3
ORDER BY a.COL1, a.COL2

You'll end up with this answer:

 COL1   COL2   COL3
 ---------------------
 aa     111    blah_x
 aa     111    blah_j

That's because rows 5 & 6 have the same values for COL3. Any query that returns both rows 5 & 6 will also return duplicates of ALL of the rows in this dataset.

On the other hand, if you have a primary key (ID), then you can use this query instead:

SELECT a.COL1, a.COL2, a.COL3
FROM quux a, quux b
WHERE a.COL1 = b.COL1 AND a.COL2 = b.COL2 AND a.ID <> b.ID
ORDER BY a.COL1, a.COL2

[Edited to simplify the WHERE clause]

And you'll get the results you want:

COL1   COL2   COL3
---------------------
aa     111    blah_x
aa     111    blah_j
bb     112    blah_d
bb     112    blah_d

I just tested this on SQL Server 2000, but you should see the same results on any modern SQL database.

blorgbeard proved me wrong -- good for him!

回复收藏 0 原文

够钟 2024-07-12 22:04:25

这与自连接非常相似，只是它不会有重复项。

select COL1,COL2,COL3
from theTable a
where exists (select 'x'
              from theTable b
              where a.col1=b.col1
              and   a.col2=b.col2
              and   a.col3<>b.col3)
order by col1,col2,col3

This is pretty similar to the self-join, except it will not have the duplicates.

select COL1,COL2,COL3
from theTable a
where exists (select 'x'
              from theTable b
              where a.col1=b.col1
              and   a.col2=b.col2
              and   a.col3<>b.col3)
order by col1,col2,col3

回复收藏 0 原文

夏末的微笑 2024-07-12 22:04:25

以下是查找重复项的方法。使用您的数据在 oracle 10g 中进行了测试。

从 tst 中选择 *
其中 (col1, col2) 在
（从 tst 组中选择 col1、col2，其中 count(*) > 1）

回复收藏 0 原文

北方的巷 2024-07-12 22:04:25

COL1、COL2、COL3

从表

组中选择 COL1、COL2、COL3

按 count(*)>1 的

回复收藏 0 原文

做个少女永远怀春 2024-07-12 22:04:25

忘记连接——使用分析函数：

select col1, col2, col3
from
(
select col1, col2, col3, count(*) over (partition by col1, col2) rows_per_col1_col2
from table
)
where rows_per_col1_col2 > 1

Forget joins -- use an analytic function:

select col1, col2, col3
from
(
select col1, col2, col3, count(*) over (partition by col1, col2) rows_per_col1_col2
from table
)
where rows_per_col1_col2 > 1

回复收藏 0 原文

孤者何惧 2024-07-12 22:04:24

这对你有用吗？

select t.* from table t
left join ( select col1, col2, count(*) as count from table group by col1, col2 ) c on t.col1=c.col1 and t.col2=c.col2
where c.count > 1

Does this work for you?

select t.* from table t
left join ( select col1, col2, count(*) as count from table group by col1, col2 ) c on t.col1=c.col1 and t.col2=c.col2
where c.count > 1

回复收藏 0 原文

~没有更多了~