SQL query using COUNT and GROUP BY on two columns is very slow
I'm archiving this web forum, which normally gets purged about once a week. So I'm screen scraping it, and storing it into my database (PostgreSQL).
I also do a little analysis on the data, with some graphs for users to enjoy, like what time of day is the forum most active, and so forth.
So I have a posts table, like so:
Column | Type
------------+------------------------------
id | integer
body | text
created_at | timestamp without time zone
topic_id | integer
user_name | text
user_id | integer
And I now want to have a post count for each user, for my little top 10 posters table.
I came up with this:
SELECT user_id, user_name, count(*)
FROM posts
GROUP BY user_id, user_name
ORDER BY count DESC LIMIT 10
This turns out to be very slow: 9 seconds, with only about 300,000 rows in the posts table at the moment.
It takes only half a second if I group on just one column, but I need both.
I'm rather new to relational databases and SQL, so I'm not sure whether this query is right, or what I'm doing wrong.
Comments (2)
There's probably only one user with a particular ID, so max(user_name) should equal user_name. Then you can group on a single column, which your post indicates works faster.
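A minimal sketch of what that single-column grouping might look like, assuming each user_id in posts maps to exactly one user_name. The alias post_count is introduced here for illustration and is not part of the original query:

```sql
-- Sketch only: assumes every user_id has a single user_name in posts.
SELECT user_id,
       max(user_name) AS user_name,  -- same as user_name when it is unique per user_id
       count(*)       AS post_count  -- illustrative alias, not from the original post
FROM posts
GROUP BY user_id                     -- group on the integer column only
ORDER BY post_count DESC
LIMIT 10;
```

If a user has changed names over time, max(user_name) returns the alphabetically last name rather than the most recent one.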
You could also add HAVING count(*) > 0, so that only groups for which that condition is true are returned.
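As a rough sketch, the HAVING clause would sit between GROUP BY and ORDER BY. In PostgreSQL it has to repeat the aggregate expression, because HAVING cannot refer to an output column name such as count. The alias post_count is an illustrative assumption:

```sql
-- Sketch only: HAVING filters groups after aggregation.
SELECT user_id,
       max(user_name) AS user_name,
       count(*)       AS post_count  -- illustrative alias
FROM posts
GROUP BY user_id
HAVING count(*) > 0                  -- must use count(*), not the alias
ORDER BY post_count DESC
LIMIT 10;
```

Every group produced by GROUP BY contains at least one row, so this particular condition is not expected to filter anything out; a higher threshold such as count(*) > 10 would.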