MySQL query: counting duplicate values in a very large table
I have this MySQL table:
CREATE TABLE `triple` (
`id_one` int(11) NOT NULL,
`id_two` int(11) NOT NULL,
`id_three` int(11) DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
It contains nearly 10 million rows. An id in the middle column ("id_two") can appear in several rows, and the number of occurrences differs from id to id.
Short Example:
id_one id_two id_three
1 2 3
2 2 3
3 2 1
68 98 1
1 4 3
2 4 4
4 5 33
6 5 3
90 5 3
34 5 83
9 3 98
Now I want to count how often each distinct id ("id_two") occurs, which for this example gives:
id_two count
2 -> 3
98 -> 1
4 -> 2
5 -> 4
3 -> 1
How can I achieve that efficiently? (It is a one-time job...) That is the first thing. The second thing I need is to count as above and then select only the ids that have a count of more than, let's say, 100.
Thank you very much for helping!
Kind Regards
Aufwind
Comments (6)
The basic command to accomplish this is:
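The query itself did not survive in this copy of the answer. A minimal sketch of what it would look like, assuming the `triple` table from the question (the alias `cnt` is just an illustrative name):

```sql
-- One row per distinct id_two, with the number of rows that carry it
SELECT id_two, COUNT(*) AS cnt
FROM triple
GROUP BY id_two;
```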
You can store that in a temporary table if you want to …
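A sketch of the temporary-table variant, again assuming the `triple` table from the question; the table name `id_two_counts` and the alias `cnt` are hypothetical:

```sql
-- Materialize the counts once (id_two_counts is a hypothetical name)
CREATE TEMPORARY TABLE id_two_counts AS
SELECT id_two, COUNT(*) AS cnt
FROM triple
GROUP BY id_two;

-- Then filter the much smaller result as often as needed
SELECT id_two, cnt
FROM id_two_counts
WHERE cnt > 100;
```

A temporary table is only visible to the session that created it and is dropped when that session ends, which fits a one-time job.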
… or use the result in an outer query …
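As an outer query this could look roughly as follows; the derived-table alias `t` and the column alias `cnt` are my own naming, not part of the original answer:

```sql
-- The inner query counts, the outer query keeps only the frequent ids
SELECT t.id_two, t.cnt
FROM (
    SELECT id_two, COUNT(*) AS cnt
    FROM triple
    GROUP BY id_two
) AS t
WHERE t.cnt > 100;
```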
… or use a HAVING-clause (as suggested by Marc in the comments):
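A minimal sketch of the HAVING form, assuming the threshold of 100 mentioned in the question:

```sql
-- HAVING filters on the aggregate after grouping,
-- so no temporary table or subquery is needed
SELECT id_two, COUNT(*) AS cnt
FROM triple
GROUP BY id_two
HAVING COUNT(*) > 100;
```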
For question 1:
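The snippet is missing here; presumably it was a plain grouped count. A sketch under that assumption (the alias `occurrences` is illustrative):

```sql
-- Count the rows for every distinct id_two
SELECT id_two, COUNT(*) AS occurrences
FROM triple
GROUP BY id_two;
```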
For question 2:
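The snippet is missing here as well; a sketch of the likely shape, assuming the threshold of 100 from the question. MySQL allows a select-list alias in HAVING, which this relies on:

```sql
-- Same count, restricted to ids that occur more than 100 times
SELECT id_two, COUNT(*) AS occurrences
FROM triple
GROUP BY id_two
HAVING occurrences > 100;
```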
For your "second thing", use HAVING.
Indexing the field id_two should give it a performance boost.
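For the indexing suggestion, a sketch assuming the `triple` table from the question; the index name `idx_id_two` and the alias `cnt` are hypothetical:

```sql
-- Add an index on the grouped column (idx_id_two is a hypothetical name)
ALTER TABLE triple ADD INDEX idx_id_two (id_two);

-- The grouped count can then be read from the index
-- instead of scanning the full rows
SELECT id_two, COUNT(*) AS cnt
FROM triple
GROUP BY id_two
HAVING cnt > 100;
```

Building the index on roughly 10 million rows takes time of its own, so for a strictly one-time job it is worth checking with EXPLAIN whether it pays off.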
For a one-time job on a 10 million row table, I'd just skip the SQL entirely. Trying to use GROUP BY could lock your table for way too long.
On a Unix-like system, this will produce a listing with two columns: number of occurrences, ID
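The original command is not preserved in this copy. One way to get such a listing, offered as an assumption and not as the author's exact command, is to dump the column to a text file and count outside the database with `sort -n /tmp/id_two.txt | uniq -c`, which prints the number of occurrences followed by the id. The path `/tmp/id_two.txt` is hypothetical:

```sql
-- Write one id_two value per line to a file on the database server.
-- Requires the FILE privilege, the target file must not exist yet,
-- and secure_file_priv may restrict which directories are allowed.
SELECT id_two
FROM triple
INTO OUTFILE '/tmp/id_two.txt';
```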
If you must use GROUP BY, add ORDER BY NULL to the end of your SELECT statement to save some time. Otherwise MySQL (before version 8.0) will also sort your groups by the default order (the value of id_two).
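A sketch of the ORDER BY NULL variant, assuming the `triple` table from the question (the alias `cnt` is illustrative):

```sql
-- ORDER BY NULL tells MySQL that the order of the groups does not matter,
-- so it can skip sorting the grouped result
SELECT id_two, COUNT(*) AS cnt
FROM triple
GROUP BY id_two
ORDER BY NULL;
```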
Try this: