SQL query with COUNT and GROUP BY on two columns is very slow

Posted on 2024-08-22 11:30:27

I'm archiving this web forum, which normally gets purged about once a week. So I'm screen-scraping it and storing it in my database (PostgreSQL).

I also do a little analysis on the data, with some graphs for users to enjoy, like what time of day is the forum most active, and so forth.

So I have a posts table, like so:

   Column   |            Type
------------+------------------------------
 id         | integer
 body       | text
 created_at | timestamp without time zone
 topic_id   | integer
 user_name  | text
 user_id    | integer

And I now want to have a post count for each user, for my little top 10 posters table.

I came up with this:

SELECT user_id, user_name, count(*)
FROM posts
GROUP BY user_id, user_name
ORDER BY count DESC LIMIT 10

This turns out to be very slow: 9 seconds, with only about 300,000 rows in the posts table at the moment.

It takes only half a second, if I group on just one column, but I need both.

I'm rather new to relational databases and SQL, so I'm not sure whether this is the right approach or what I'm doing wrong.

罪#恶を代价 2024-08-29 11:30:27

There's probably only one user with a particular ID, so max(user_name) should equal user_name. Then you can group on a single column, which your post indicates works faster:

SELECT user_id, max(user_name), count(*)
FROM posts
GROUP BY user_id
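
To check that the single-column form returns the same top-10 list as the original, here is a minimal sketch in Python. It makes several assumptions that are not in the question: an in-memory SQLite database (via the standard-library sqlite3 module) stands in for PostgreSQL, the table is cut down to three columns, the rows are made up, and the post_count alias and the user_id tie-breaker in ORDER BY are added only to make the output deterministic. It says nothing about timing on the real table, only that the two queries agree when each user_id has exactly one user_name.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL; the table is a cut-down
# version of the posts table from the question, filled with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, user_name TEXT)"
)
rows = [(1, "alice"), (1, "alice"), (1, "alice"), (2, "bob"), (2, "bob"), (3, "carol")]
conn.executemany("INSERT INTO posts (user_id, user_name) VALUES (?, ?)", rows)

# Original form: group on both columns.
two_column = conn.execute(
    """
    SELECT user_id, user_name, count(*) AS post_count
    FROM posts
    GROUP BY user_id, user_name
    ORDER BY post_count DESC, user_id
    LIMIT 10
    """
).fetchall()

# Suggested form: group on user_id only and pick the name with max().
one_column = conn.execute(
    """
    SELECT user_id, max(user_name) AS user_name, count(*) AS post_count
    FROM posts
    GROUP BY user_id
    ORDER BY post_count DESC, user_id
    LIMIT 10
    """
).fetchall()

print(one_column)  # [(1, 'alice', 3), (2, 'bob', 2), (3, 'carol', 1)]
```

If a user could ever change names while keeping the same ID, the two queries would no longer agree: the two-column form would report one row per name, while max(user_name) would fold them into a single row.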

渔村楼浪 2024-08-29 11:30:27

You could also use count > 0, so that you only return true.


About the author: 茶色山野
