MySQL query: counting duplicate values in a very large table

Posted on 2024-11-02 14:23:28

I have this MySQL table:

CREATE TABLE `triple` (
  `id_one` int(11) NOT NULL,
  `id_two` int(11) NOT NULL,
  `id_three` int(11) DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

It contains nearly 10 million rows. An id in the middle column ("id_two") can appear in several rows, any number of times.

Short Example:

id_one    id_two    id_three
1         2         3
2         2         3
3         2         1
68        98        1
1         4         3
2         4         4
4         5         33
6         5         3
90        5         3
34        5         83
9         3         98

Now I want to count how often each distinct id in "id_two" occurs. For this example that is:

id_two     count
2     ->  3
98    ->  1
4     ->  2
5     ->  4
3     ->  1

How can I do that efficiently? (It is a one-time job.) That is the first thing. The second thing I need is to count as above and then select only the ids that have a count of more than, let's say, 100.

Thank you very much for helping!

Kind Regards
Aufwind

往日情怀 2024-11-09 14:23:28

The basic query for this is:

SELECT id_two, count(*) FROM triple GROUP BY id_two;

You can store that in a temporary table if you want to …

CREATE TEMPORARY TABLE xxx SELECT id_two, count(*) AS c FROM …
SELECT * FROM xxx WHERE c > 100;

… or use the result in an outer query …

SELECT * FROM (SELECT id_two, count(*) AS c FROM triple GROUP BY id_two) t WHERE c > 100;

… or use a HAVING-clause (as suggested by Marc in the comments):

SELECT id_two, count(*) AS c FROM triple GROUP BY id_two HAVING c > 100;
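To sanity-check these queries before running them on the full table, here is a small sketch that rebuilds the sample rows from the question in a scratch table. The table name `triple_sample` is made up for this illustration; the columns follow the question's schema. The threshold is lowered to 2 because the sample has only eleven rows.

```sql
-- Hypothetical scratch table (not part of the original question) holding the sample rows.
CREATE TEMPORARY TABLE triple_sample (
  id_one   INT NOT NULL,
  id_two   INT NOT NULL,
  id_three INT DEFAULT NULL
);

INSERT INTO triple_sample (id_one, id_two, id_three) VALUES
  (1, 2, 3), (2, 2, 3), (3, 2, 1),
  (68, 98, 1),
  (1, 4, 3), (2, 4, 4),
  (4, 5, 33), (6, 5, 3), (90, 5, 3), (34, 5, 83),
  (9, 3, 98);

-- First task: one row per distinct id_two with its number of occurrences.
SELECT id_two, COUNT(*) AS c
FROM triple_sample
GROUP BY id_two;

-- Second task: keep only ids above a threshold (2 here, 100 on the real table).
SELECT id_two, COUNT(*) AS c
FROM triple_sample
GROUP BY id_two
HAVING c > 2;
```

The first query should return the counts listed in the question (2 -> 3, 98 -> 1, 4 -> 2, 5 -> 4, 3 -> 1, in whatever order the server chooses), and the second should return only ids 2 and 5.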

冷血 2024-11-09 14:23:28

SELECT id_two, COUNT(*)
    FROM triple
    GROUP BY id_two
    HAVING COUNT(*) > 100

夜清冷一曲。 2024-11-09 14:23:28

For question 1:

SELECT id_two, COUNT(1)
  FROM triple
GROUP BY id_two

For question 2:

SELECT id_two, COUNT(1)
  FROM triple
GROUP BY id_two
HAVING COUNT(1) > 100

你げ笑在眉眼 2024-11-09 14:23:28

For your "second thing", use HAVING. The question asks for counts of more than 100, so the comparison is strict:

SELECT id_two, count(*) AS nb FROM triple GROUP BY id_two HAVING nb > 100;

An index on id_two should give the query a performance boost.
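
A minimal sketch of that index, in case you want to try it. The index name `idx_id_two` is my own choice, not something from the question. Building it on a table of roughly 10 million rows takes time of its own, so for a one-time job it may or may not pay off; `EXPLAIN` shows whether the server would actually use it.

```sql
-- Hypothetical index name; any unused name works.
ALTER TABLE triple ADD INDEX idx_id_two (id_two);

-- Check the plan: the "key" column should list idx_id_two if the index is used.
EXPLAIN
SELECT id_two, COUNT(*) AS nb
FROM triple
GROUP BY id_two
HAVING nb > 100;

-- Optional cleanup once the one-time job is done.
ALTER TABLE triple DROP INDEX idx_id_two;
```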

赠我空喜 2024-11-09 14:23:28

For a one-time job on a 10-million-row table, I'd skip the GROUP BY entirely and do the counting outside the database. A long-running GROUP BY on a MyISAM table could lock it against writes for way too long.

SELECT id_two FROM triple INTO OUTFILE '/tmp/idtwo.txt';

The file is written on the database server host. On a Unix-like system, the following command then produces a listing with two columns: number of occurrences, then id.

sort -n /tmp/idtwo.txt | uniq -c

If you must use GROUP BY, add ORDER BY NULL to the end of your SELECT statement to save some time. Otherwise MySQL versions before 8.0 will sort your groups by the default order (the value of id_two).
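
As a rough sketch of that last suggestion, assuming the threshold of 100 from the question, the statement would look like this. The alias `c` is only there for readability.

```sql
-- ORDER BY NULL asks MySQL not to sort the grouped result.
SELECT id_two, COUNT(*) AS c
FROM triple
GROUP BY id_two
HAVING c > 100
ORDER BY NULL;
```

Whether this saves noticeable time depends on the server version and on how many distinct ids there are, so treat it as something to try rather than a guaranteed win.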

梦旅人picnic 2024-11-09 14:23:28

Try this:

select id_two, count(*) as Frequency
from triple
group by id_two
having count(*) > 1 -- replace 1 with desired threshold
