MySQL SELECT most frequent by group
How do I get the most frequently occurring category for each tag in MySQL? Ideally, I would want to simulate an aggregate function that would calculate the mode of a column.
SELECT
t.tag
, s.category
FROM tags t
LEFT JOIN stuff s
USING (id)
ORDER BY tag;
+------------------+----------+
| tag              | category |
+------------------+----------+
| automotive       |        8 |
| ba               |        8 |
| bamboo           |        8 |
| bamboo           |        8 |
| bamboo           |        8 |
| bamboo           |        8 |
| bamboo           |        8 |
| bamboo           |       10 |
| bamboo           |        8 |
| bamboo           |        9 |
| bamboo           |        8 |
| bamboo           |       10 |
| bamboo           |        8 |
| bamboo           |        9 |
| bamboo           |        8 |
| banana tree      |        8 |
| banana tree      |        8 |
| banana tree      |        8 |
| banana tree      |        8 |
| bath             |        9 |
+------------------+----------+
This is for simpler situations:
SELECT action, COUNT(action) AS ActionCount
FROM log
GROUP BY action
ORDER BY ActionCount DESC;
Here's a hacky approach to this which utilizes the max aggregate function, seeing as there is no mode aggregate function in MySQL (or windowing functions etc.) that would allow this:
Basically it utilizes the fact that we can find the lexical max of the counts of each individual category.
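A minimal sketch of the trick described above, assuming tables tags(id, tag) and stuff(id, category) as the question's join implies. The sample rows and the pad width of 20 are illustrative choices, and this is not necessarily the exact query the answer used.

```sql
-- Hypothetical sample data; the real schema is only implied by the question.
CREATE TABLE tags  (id INT, tag VARCHAR(50));
CREATE TABLE stuff (id INT, category INT);
INSERT INTO stuff VALUES (1, 8), (2, 8), (3, 10), (4, 9);
INSERT INTO tags  VALUES (1, 'bamboo'), (2, 'bamboo'), (3, 'bamboo'), (4, 'bath');

-- Inner query: one row per (tag, category) with its count c.
-- Outer query: zero-pad c to a fixed width, append the category, take the
-- lexical MAX per tag, then strip the 20-character count prefix again.
SELECT tag,
       CAST(SUBSTRING(MAX(CONCAT(LPAD(c, 20, '0'), category)), 21) AS UNSIGNED)
           AS most_frequent_category
FROM (
    SELECT t.tag, s.category, COUNT(*) AS c
    FROM tags t
    LEFT JOIN stuff s USING (id)
    GROUP BY t.tag, s.category
) AS grouped_cats
GROUP BY tag
ORDER BY tag;
-- With the sample rows: bamboo -> 8, bath -> 9
```

Zero-padding matters because the comparison is done on strings: without it, a count of 9 would sort above a count of 10. When two categories tie on count, the one whose text sorts higher wins, which may or may not be what you want.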
This is easier to see with named categories:
In which case we shouldn't be doing integer conversion on the most_frequent_category column:
And to delve a little bit more into what is going on, here's what the grouped_cats inner select looks like (I've added order by tag, c desc):
And we can see how the max of the count(*) column drags along its associated category if we omit the substring bit:
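To make the "drags along" behaviour concrete, here is a hedged sketch of the outer query with the substring and cast left out. It assumes tables tags(id, tag) and stuff(id, category) as the question's join implies; the pad width of 20 and the column alias are illustrative assumptions.

```sql
-- Assumes tags(id, tag) and stuff(id, category), as implied by the question.
-- The padded string is returned as-is, so both the count prefix and the
-- category that rides along behind it are visible in the result.
SELECT tag,
       MAX(CONCAT(LPAD(c, 20, '0'), category)) AS padded_count_and_category
FROM (
    SELECT t.tag, s.category, COUNT(*) AS c
    FROM tags t
    LEFT JOIN stuff s USING (id)
    GROUP BY t.tag, s.category
) AS grouped_cats
GROUP BY tag
ORDER BY tag;
```

For a tag whose top category 8 occurs 10 times, the value would be 000000000000000000108: the first 20 characters are the padded count, and everything from position 21 onward is the category.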
I agree this is kind of too much for a single SQL query. Any use of GROUP BY inside a subquery makes me wince. You can make it look simpler by using views:
But it's basically doing the same work behind the scenes.
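One way the view-based version could look, as a sketch only. The view names tag_category_counts and tag_max_counts are hypothetical, and the tables tags(id, tag) and stuff(id, category) are assumed from the question's join.

```sql
-- Hypothetical view 1: how often each category occurs for each tag.
CREATE VIEW tag_category_counts AS
SELECT t.tag, s.category, COUNT(*) AS c
FROM tags t
LEFT JOIN stuff s USING (id)
GROUP BY t.tag, s.category;

-- Hypothetical view 2: the highest count seen for each tag.
CREATE VIEW tag_max_counts AS
SELECT tag, MAX(c) AS max_c
FROM tag_category_counts
GROUP BY tag;

-- Join the two: keep the categories whose count equals the tag's maximum.
SELECT cc.tag, cc.category, cc.c
FROM tag_category_counts cc
JOIN tag_max_counts mc
  ON mc.tag = cc.tag AND mc.max_c = cc.c
ORDER BY cc.tag;
```

Unlike the string trick, this returns every category that ties for the top count, so a tag can appear more than once in the result.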
You comment that you could do a similar operation easily in application code. So why don't you do that? Do the simpler query to get the counts per category:
And sort through the result in application code.
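The simpler query mentioned here might look like the following sketch. Table and column names are taken from the question; the alias c and the ordering are assumptions added for convenience.

```sql
-- Assumes tags(id, tag) and stuff(id, category), as implied by the question.
-- One row per (tag, category) with its count, most frequent category first
-- within each tag.
SELECT t.tag, s.category, COUNT(*) AS c
FROM tags t
LEFT JOIN stuff s USING (id)
GROUP BY t.tag, s.category
ORDER BY t.tag, c DESC;
```

With this ordering, the application only has to keep the first row it sees for each tag and skip the rest.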
On your data, this returns the following:
Here's the test script:
(Edit: forgot DESC in ORDER BYs)
Easy to do with a LIMIT in the subquery. Does MySQL still have the no-LIMIT-in-subqueries restriction? Below example is using PostgreSQL.
Third column is only necessary if you need the count.
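A sketch of what the LIMIT-in-subquery idea could look like in MySQL syntax, assuming tables tags(id, tag) and stuff(id, category) as the question's join implies. As far as I know, MySQL's documented LIMIT restriction applies to subqueries used with IN, ALL, ANY or SOME, so a scalar subquery like this one should be accepted, but treat that as something to verify on your version. This two-column variant leaves out the count.

```sql
-- Assumes tags(id, tag) and stuff(id, category), as implied by the question.
-- For each distinct tag, a correlated scalar subquery counts its categories
-- and keeps only the top one.
SELECT t1.tag,
       (SELECT s.category
        FROM tags t2
        JOIN stuff s USING (id)
        WHERE t2.tag = t1.tag
        GROUP BY s.category
        ORDER BY COUNT(*) DESC, s.category  -- second key makes ties deterministic
        LIMIT 1) AS most_frequent_category
FROM (SELECT DISTINCT tag FROM tags) AS t1
ORDER BY t1.tag;
```

The subquery runs once per tag, so on large tables this may be slower than a single grouped pass.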