Duplicates in the database: please help me edit my query to filter them out

Posted 2024-12-18 09:04:24

I have just finished my latest task of creating an RSS Feed using PHP to fetch data from a database.

I've only just noticed that a lot (if not all) of these items have duplicates and I was trying to work out how to only fetch one of each.

My first thought was to print only every second row in my PHP loop, so that I get one of each set of duplicates, but in some cases there are 3 or 4 copies of each article, so it has to be achieved in the query.

Query:

SELECT * 
FROM uk_newsreach_article t1
    INNER JOIN uk_newsreach_article_photo t2
        ON t1.id = t2.newsArticleID
    INNER JOIN uk_newsreach_photo t3
        ON t2.newsPhotoID = t3.id
ORDER BY t1.publishDate DESC;

Table Structures:

uk_newsreach_article
--------------------
id | headline | extract | text | publishDate | ...

uk_newsreach_article_photo
--------------------------
id | newsArticleID | newsPhotoID

uk_newsreach_photo
------------------
id | htmlAlt | URL | height | width | ...

For some reason there are lots of duplicates, and the only thing truly unique within each set of data is uk_newsreach_article_photo.id. Even though uk_newsreach_article_photo.newsArticleID and uk_newsreach_article_photo.newsPhotoID are identical within a set of duplicates, all I need is one row from each set, e.g.

Sample Data

id | newsArticleID | newsPhotoID
--------------------------------
 2 |     800482746 |     7044521
10 |     800482746 |     7044521
19 |     800482746 |     7044521
29 |     800482746 |     7044521
39 |     800482746 |     7044521
53 |     800482746 |     7044521
67 |     800482746 |     7044521

I tried sticking a DISTINCT into the query along with specifying the actual columns I wanted but this didn't work.
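
To make the problem concrete, here is a minimal reproduction sketch. It uses Python's standard-library sqlite3 module as a stand-in for the real database, which is an assumption, since the post does not name the database engine. Table and column names come from the post; the headline, date, and URL values are made-up placeholders. It also shows one possible reason DISTINCT might not have helped: if uk_newsreach_article_photo.id is among the selected columns, every row is still distinct.

```python
import sqlite3

# In-memory SQLite database standing in for the real one (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE uk_newsreach_article (id INTEGER PRIMARY KEY, headline TEXT, publishDate TEXT);
CREATE TABLE uk_newsreach_photo (id INTEGER PRIMARY KEY, URL TEXT);
CREATE TABLE uk_newsreach_article_photo (
    id INTEGER PRIMARY KEY, newsArticleID INTEGER, newsPhotoID INTEGER);
INSERT INTO uk_newsreach_article VALUES (800482746, 'Placeholder headline', '2011-01-01');
INSERT INTO uk_newsreach_photo VALUES (7044521, 'http://example.com/photo.jpg');
""")

# The seven duplicate link rows from the sample data.
link_ids = [2, 10, 19, 29, 39, 53, 67]
conn.executemany(
    "INSERT INTO uk_newsreach_article_photo VALUES (?, 800482746, 7044521)",
    [(i,) for i in link_ids],
)

JOINS = """
FROM uk_newsreach_article t1
    INNER JOIN uk_newsreach_article_photo t2 ON t1.id = t2.newsArticleID
    INNER JOIN uk_newsreach_photo t3 ON t2.newsPhotoID = t3.id
"""

# The original join: one output row per link row.
rows = conn.execute("SELECT t1.id, t2.id, t3.id" + JOINS).fetchall()

# DISTINCT with the link id selected: every row is still unique.
distinct_with_link_id = conn.execute(
    "SELECT DISTINCT t1.id, t2.id, t3.id" + JOINS).fetchall()

# DISTINCT without the link id: the duplicates collapse.
distinct_without_link_id = conn.execute(
    "SELECT DISTINCT t1.id, t3.id" + JOINS).fetchall()

print(len(rows), len(distinct_with_link_id), len(distinct_without_link_id))
```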

惯饮孤独 2024-12-25 09:04:24

As you have noticed, DISTINCT will still return every id, because that column differs on every row. You could use a GROUP BY instead.

You will have to decide which id you want to retain. In the example I have used MIN, but any aggregate function would do.

SQL Statement

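-- Keep the lowest uk_newsreach_article_photo.id from each set of duplicates.
SELECT MIN(t2.id) AS id, t2.newsArticleID, t2.newsPhotoID
FROM uk_newsreach_article t1
    INNER JOIN uk_newsreach_article_photo t2
        ON t1.id = t2.newsArticleID
    INNER JOIN uk_newsreach_photo t3
        ON t2.newsPhotoID = t3.id
-- publishDate is grouped so that it can be used in ORDER BY.
GROUP BY t2.newsArticleID, t2.newsPhotoID, t1.publishDate
ORDER BY t1.publishDate DESC;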
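
As a rough check of the GROUP BY approach, the sketch below runs a variant that applies MIN to uk_newsreach_article_photo.id against a small in-memory SQLite database via Python's sqlite3 module. SQLite is only a stand-in here (the real engine is not stated), and the second article, the headlines, dates, and URLs are hypothetical placeholder data. The extra columns are added to GROUP BY so the query does not rely on engine-specific handling of ungrouped columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE uk_newsreach_article (id INTEGER PRIMARY KEY, headline TEXT, publishDate TEXT);
CREATE TABLE uk_newsreach_photo (id INTEGER PRIMARY KEY, URL TEXT);
CREATE TABLE uk_newsreach_article_photo (
    id INTEGER PRIMARY KEY, newsArticleID INTEGER, newsPhotoID INTEGER);
-- Placeholder rows: one article with duplicate links, one without.
INSERT INTO uk_newsreach_article VALUES
    (800482746, 'Placeholder A', '2011-01-02'),
    (800482747, 'Placeholder B', '2011-01-01');
INSERT INTO uk_newsreach_photo VALUES
    (7044521, 'http://example.com/a.jpg'),
    (7044522, 'http://example.com/b.jpg');
INSERT INTO uk_newsreach_article_photo VALUES
    (2, 800482746, 7044521),
    (10, 800482746, 7044521),
    (19, 800482746, 7044521),
    (11, 800482747, 7044522);
""")

# One row per (article, photo) pair, keeping the lowest link id.
deduped = conn.execute("""
SELECT MIN(t2.id) AS linkID, t2.newsArticleID, t2.newsPhotoID, t1.headline
FROM uk_newsreach_article t1
    INNER JOIN uk_newsreach_article_photo t2 ON t1.id = t2.newsArticleID
    INNER JOIN uk_newsreach_photo t3 ON t2.newsPhotoID = t3.id
GROUP BY t2.newsArticleID, t2.newsPhotoID, t1.headline, t1.publishDate
ORDER BY t1.publishDate DESC
""").fetchall()

for row in deduped:
    print(row)
```

Any further article or photo column needed by the feed would have to be added to both the SELECT list and the GROUP BY in the same way.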

Disclaimer

Now while this would be an easy solution to your immediate problem, if you decide that duplicates should not happen, you really should consider redesigning your tables to prevent duplicates getting into your tables in the first place.
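
One way to act on this advice, sketched here under assumptions, is to delete the existing duplicates and then add a unique index on the (newsArticleID, newsPhotoID) pair. The example uses Python's sqlite3 module with an in-memory SQLite database as a stand-in; the index name uq_article_photo is hypothetical, and other database engines may need different syntax for the DELETE statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE uk_newsreach_article_photo (
    id INTEGER PRIMARY KEY, newsArticleID INTEGER, newsPhotoID INTEGER)
""")
# The seven duplicate rows from the sample data.
conn.executemany(
    "INSERT INTO uk_newsreach_article_photo VALUES (?, 800482746, 7044521)",
    [(i,) for i in (2, 10, 19, 29, 39, 53, 67)],
)

# Step 1: keep only the lowest id of each (article, photo) pair.
conn.execute("""
DELETE FROM uk_newsreach_article_photo
WHERE id NOT IN (
    SELECT MIN(id)
    FROM uk_newsreach_article_photo
    GROUP BY newsArticleID, newsPhotoID
)
""")

# Step 2: make the pair unique so new duplicates are rejected.
conn.execute("""
CREATE UNIQUE INDEX uq_article_photo
ON uk_newsreach_article_photo (newsArticleID, newsPhotoID)
""")

# Step 3: inserting the same pair again now fails.
try:
    conn.execute(
        "INSERT INTO uk_newsreach_article_photo (newsArticleID, newsPhotoID) "
        "VALUES (800482746, 7044521)"
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

remaining = conn.execute("SELECT id FROM uk_newsreach_article_photo").fetchall()
print(rejected, remaining)
```

The import job that fills the table would then need to handle the rejected insert, for example by skipping pairs that already exist.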

假装不在乎 2024-12-25 09:04:24

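Group by all of your selected columns, leaving out t2.id (the only value that differs within a set of duplicates), and each set collapses to a single row. Adding HAVING COUNT(*) > 1 would instead return only the sets that actually contain duplicates, so leave it off if you want every article:

SELECT t1.id, t1.headline, t1.extract, t1.text, t1.publishDate,
       t2.newsArticleID, t2.newsPhotoID,
       t3.htmlAlt, t3.URL, t3.height, t3.width
FROM uk_newsreach_article t1
    INNER JOIN uk_newsreach_article_photo t2
      ON t1.id = t2.newsArticleID
    INNER JOIN uk_newsreach_photo t3
      ON t2.newsPhotoID = t3.id
GROUP BY  t1.id, t1.headline, t1.extract, t1.text, t1.publishDate,
          t2.newsArticleID, t2.newsPhotoID,
          t3.htmlAlt, t3.URL, t3.height, t3.width
ORDER BY t1.publishDate DESC;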
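
A quick way to check how GROUP BY and HAVING COUNT(*) > 1 behave on data shaped like the sample is to run them against a small in-memory table. The sketch below does this with Python's sqlite3 module, using SQLite as a stand-in for the real database (an assumption). For brevity it queries only uk_newsreach_article_photo, and the second (article, photo) pair is hypothetical placeholder data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE uk_newsreach_article_photo (
    id INTEGER PRIMARY KEY, newsArticleID INTEGER, newsPhotoID INTEGER);
INSERT INTO uk_newsreach_article_photo VALUES
    (2, 800482746, 7044521),
    (10, 800482746, 7044521),
    (19, 800482746, 7044521),
    (11, 800482747, 7044522);
""")

# Grouping on the unique id puts every row in its own group,
# so no group ever has COUNT(*) > 1.
grouped_with_id = conn.execute("""
SELECT newsArticleID, newsPhotoID
FROM uk_newsreach_article_photo
GROUP BY id, newsArticleID, newsPhotoID
HAVING COUNT(*) > 1
""").fetchall()

# Without the id, HAVING COUNT(*) > 1 lists only the duplicated pairs.
duplicated_pairs = conn.execute("""
SELECT newsArticleID, newsPhotoID, COUNT(*)
FROM uk_newsreach_article_photo
GROUP BY newsArticleID, newsPhotoID
HAVING COUNT(*) > 1
""").fetchall()

# Without the id and without HAVING, each pair appears exactly once.
one_per_pair = conn.execute("""
SELECT newsArticleID, newsPhotoID
FROM uk_newsreach_article_photo
GROUP BY newsArticleID, newsPhotoID
ORDER BY newsArticleID
""").fetchall()

print(grouped_with_id)
print(duplicated_pairs)
print(one_per_pair)
```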
