Optimizing a range query with GROUP BY

Posted on 2025-01-25 20:38:29

I'm having trouble with a query. Here is the outline:

Table structure:

CREATE TABLE `world` (
  `placeRef` int NOT NULL,
  `forenameRef` int NOT NULL,
  `surnameRef` int NOT NULL,
  `incidence` int NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8mb3;

ALTER TABLE `world`
  ADD KEY `surnameRef_forenameRef` (`surnameRef`,`forenameRef`),
  ADD KEY `forenameRef_surnameRef` (`forenameRef`,`surnameRef`),
  ADD KEY `forenameRef` (`forenameRef`,`placeRef`);
COMMIT;

The table has over 600,000,000 rows, with data like this:

placeRef    forenameRef    surnameRef    incidence
1           1              2             100
2           1              3             600

This represents the number of people with a given forename-surname combination in a place.

I would like to find all the forenames that a surname is attached to, and then run a second search for every place where those forenames occur, summing the incidence. For example: get all the forenames of people who have the surname "Smith", then list the rows for all those forenames, grouped by place, with the summed incidence per place. I can do this with the following query:

SELECT placeRef, SUM( incidence )
FROM world
WHERE forenameRef IN
(
    SELECT DISTINCT forenameRef
    FROM world
    WHERE surnameRef = 214488
)
GROUP BY world.placeRef
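
To make the two-step logic concrete, here is a minimal sketch on a handful of made-up rows. It uses Python's built-in sqlite3 module purely as a stand-in so it runs anywhere; the rows, the forename numbers, and the second surname number are assumptions for illustration, and nothing here says anything about MySQL performance.

```python
import sqlite3

# Toy stand-in for the `world` table. SQLite is used only so the example is
# runnable; the rows below are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE world ("
    " placeRef INT NOT NULL, forenameRef INT NOT NULL,"
    " surnameRef INT NOT NULL, incidence INT NOT NULL)"
)
rows = [
    # placeRef, forenameRef, surnameRef, incidence
    (1, 10, 214488, 100),  # forename 10 occurs with the searched surname
    (2, 10, 300000, 600),  # forename 10 with another surname, in another place
    (1, 11, 214488, 50),   # forename 11 occurs with the searched surname
    (2, 12, 300000, 999),  # forename 12 never occurs with the searched surname
]
conn.executemany("INSERT INTO world VALUES (?, ?, ?, ?)", rows)

# Step 1: the forenames attached to the surname.
forenames = [
    r[0]
    for r in conn.execute(
        "SELECT DISTINCT forenameRef FROM world"
        " WHERE surnameRef = ? ORDER BY forenameRef",
        (214488,),
    )
]

# Step 2: the query from the question, summing incidence per place over
# every row that carries one of those forenames (whatever its surname).
totals = conn.execute(
    """
    SELECT placeRef, SUM(incidence)
    FROM world
    WHERE forenameRef IN (
        SELECT DISTINCT forenameRef FROM world WHERE surnameRef = ?
    )
    GROUP BY placeRef
    ORDER BY placeRef
    """,
    (214488,),
).fetchall()

print(forenames)  # [10, 11]
print(totals)     # [(1, 150), (2, 600)]
```

Place 2 appears in the result even though nobody there has the searched surname, because forename 10 occurs there with a different surname.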

However, this query takes about a minute to execute and will take more time if the surname being searched for is common.

The root problem is: performing a range query with GROUP BY doesn't use the full index.

Any suggestions for how to improve the speed?

乖乖哒 2025-02-01 20:38:29

In my experience, if your query has a range condition (i.e. any kind of predicate other than = or IS NULL), the column for that condition is the last column in your index that can be used to optimize search, sort, or grouping.

In other words, suppose you have an index on columns (a, b, c).

The following query uses all three columns. It can optimize the ORDER BY c: all rows matching the specific values of a and b are by definition tied on those two columns, and within that set the index entries are already in order by c, so the ORDER BY is a no-op.

SELECT * FROM mytable WHERE a = 1 AND b = 2 ORDER BY c;

But the next example uses only columns a and b of the index. The ORDER BY needs a filesort, because across the range of b values the matching index entries are not in order by c.

SELECT * FROM mytable WHERE a = 1 AND b > 2 ORDER BY c;
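
The ordering argument can be checked without any database. The sketch below models an index on (a, b, c) as a sorted list of tuples; this is a simplification for illustration (made-up rows, not how MySQL stores an index), but the sort order is the part that matters here.

```python
# Model a composite index on (a, b, c) as a list of tuples kept in sorted
# order. The rows are invented for illustration.
index_abc = sorted([
    (1, 2, 5), (1, 2, 7), (1, 2, 9),
    (1, 3, 8), (1, 3, 1),
    (1, 4, 6), (1, 4, 2),
    (2, 2, 4),
])

# Equality on a and b: the matching entries form one contiguous slice,
# and within that slice c is already ascending.
eq_c = [c for (a, b, c) in index_abc if a == 1 and b == 2]

# Range on b: the entries come back ordered by (b, c), so c on its own
# is not ascending and a separate sort step is needed.
range_c = [c for (a, b, c) in index_abc if a == 1 and b > 2]

print(eq_c, eq_c == sorted(eq_c))            # [5, 7, 9] True
print(range_c, range_c == sorted(range_c))   # [1, 8, 2, 6] False
```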

A similar effect applies to GROUP BY. The following uses a and b for row selection, and it can also optimize the GROUP BY using the index, because the rows for each distinct value of c are guaranteed to be adjacent in the index. So it can count the rows for each value of c, and when it is done with one group, it is assured that no more rows with that value of c will appear later.

SELECT c, COUNT(*) FROM mytable WHERE a = 1 AND b = 2 GROUP BY c;

But the range condition spoils that. The rows for each value of c are no longer grouped together; they may be scattered among the different higher values of b.

SELECT c, COUNT(*) FROM mytable WHERE a = 1 AND b > 2 GROUP BY c;

In this case, MySQL can't use the index to optimize the GROUP BY. It must use a temporary table to count the rows per distinct value of c.
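
The same toy model shows the GROUP BY difference. In this sketch (an illustration with made-up rows, not MySQL's actual implementation), `streaming_counts` is a hypothetical helper that counts consecutive runs, the way a scan in index order could, and `Counter` plays the role of the temporary table.

```python
from collections import Counter
from itertools import groupby

# Composite index on (a, b, c), modeled as a sorted list of invented tuples.
index_abc = sorted([
    (1, 2, 5), (1, 2, 5), (1, 2, 7),
    (1, 3, 5), (1, 3, 7),
    (1, 4, 5), (1, 4, 9),
])


def streaming_counts(c_values):
    """Count consecutive runs of equal values, one pass, no extra storage."""
    return [(c, sum(1 for _ in run)) for c, run in groupby(c_values)]


eq_c = [c for (a, b, c) in index_abc if a == 1 and b == 2]
range_c = [c for (a, b, c) in index_abc if a == 1 and b > 2]

# With equality on b, each value of c forms exactly one run.
eq_stream = streaming_counts(eq_c)

# With a range on b, value 5 shows up in more than one run, so the
# one-pass count is wrong and a lookup structure is needed instead.
range_stream = streaming_counts(range_c)
range_hashed = sorted(Counter(range_c).items())

print(eq_stream)     # [(5, 2), (7, 1)]
print(range_stream)  # [(5, 1), (7, 1), (5, 1), (9, 1)]
print(range_hashed)  # [(5, 2), (7, 1), (9, 1)]
```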

MySQL 8.0.13 introduced a new type of optimizer behavior, the Skip Scan Range Access Method. But as far as I know, it only applies to range conditions, not ORDER BY or GROUP BY.

It's still true that if you have a range condition, this spoils the index optimization of ORDER BY and GROUP BY.

只是我以为 2025-02-01 20:38:29

Unless I misunderstand the task, it seems like this would work:

SELECT placeRef, SUM( incidence )
    FROM world
    WHERE surnameRef = 214488
    GROUP BY placeRef;

Give it a try.
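
One way to try it is to run both queries side by side on a few made-up rows. The sketch below does that with Python's built-in sqlite3 module as a stand-in; the rows and the second surname number are assumptions for illustration. On this toy data the two queries return different totals, because the query from the question also counts rows where the matching forenames occur with other surnames, so which one is right depends on what the task actually needs.

```python
import sqlite3

# Toy stand-in for the `world` table; rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE world ("
    " placeRef INT NOT NULL, forenameRef INT NOT NULL,"
    " surnameRef INT NOT NULL, incidence INT NOT NULL)"
)
conn.executemany(
    "INSERT INTO world VALUES (?, ?, ?, ?)",
    [
        (1, 10, 214488, 100),
        (2, 10, 300000, 600),  # same forename, different surname
        (1, 11, 214488, 50),
        (2, 12, 300000, 999),
    ],
)

# The shorter query: only rows that carry the surname itself.
by_surname = conn.execute(
    "SELECT placeRef, SUM(incidence) FROM world"
    " WHERE surnameRef = ? GROUP BY placeRef ORDER BY placeRef",
    (214488,),
).fetchall()

# The query from the question: every row whose forename occurs with the surname.
by_forenames = conn.execute(
    """
    SELECT placeRef, SUM(incidence)
    FROM world
    WHERE forenameRef IN (
        SELECT DISTINCT forenameRef FROM world WHERE surnameRef = ?
    )
    GROUP BY placeRef
    ORDER BY placeRef
    """,
    (214488,),
).fetchall()

print(by_surname)    # [(1, 150)]
print(by_forenames)  # [(1, 150), (2, 600)]
```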

It would benefit from a composite index in this order:

INDEX(surnameRef, placeRef, incidence)

Is incidence updated a lot? If so, leave it out of the index.

You should consider moving from MyISAM to InnoDB. It will need a suitable PK, probably

PRIMARY KEY(placeRef, surnameRef, forenameRef)

and it will take 2x-3x the disk space.
