大量相同值的索引/性能策略

发布于 2024-10-11 13:11:41 字数 629 浏览 1 评论 0原文

基本信息:这是 OpenStreetMap 数据索引过程的上下文。为了简化问题:核心信息分为3个主要类型,值为“W”、“R”、“N”(VARCHAR(1))。

该表大约有约 75M 行,所有带有“W”的列组成约 42M 行。现有索引与此问题无关。


现在问题本身:数据的索引是通过一个过程完成的。在此过程中,有一些循环执行以下操作:

[...] SELECT * FROM table WHERE the_key = "W"; [...]

结果再次循环,上面的查询本身也在循环中。这需要花费大量时间并大大减慢该过程。索引 the_key 显然是无用的,因为索引可能使用的所有值都是相同的(“W”)。脚本本身的运行速度还可以,只是 SELECT 需要很长时间。

我是否

  • 需要创建一种“特殊”类型的索引来考虑这一点并使 SELECT 更快?如果有,是哪一个?
  • 需要调整一些服务器参数(它们已经调整过,并且它们提供的结果似乎不错。如果需要,我可以发布它们)?
  • 必须适应速度并简单地获得更多硬件来获得更多功能(蒂姆·泰勒咕噜咕噜咕噜咕噜)?

上述几点的任何替代方案(除了重写或不使用它)?

Base information: This is in context to the indexing process of OpenStreetMap data. To simplify the question: the core information is divided into 3 main types with value "W", "R", "N" (VARCHAR(1)).

The table has somewhere around ~75M rows, all columns with "W" make up ~42M rows. Existing indexes are not relevant to this question.


Now the question itself: The indexing of the data is done via an procedure. Inside this procedure, there are some loops that do the following:

[...] SELECT * FROM table WHERE the_key = "W"; [...]

The results get looped again and the above query itself is also in a loop. This takes a lot of time and slows down the process massivly. An indexon the_key is obviously useless since all the values that the index might use are the same ("W"). The script itself is running with a speed that is OK, only the SELECTing takes very long.

Do I

  • need to create a "special" kind of index that takes this into account and makes the SELECT quicker? If so, which one?
  • need to tune some of the server parameters (they are already tuned and the result that they deliver seem to be good. If needed, I can post them)?
  • have to live with the speed and simply get more hardware to gain more power (Tim Taylor grunt grunt)?

Any alternatives to the above points (except rewriting it or not using it)?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

固执像三岁 2024-10-18 13:11:41

如果您将 work_mem 设置得足够高以启用位图索引扫描,则此查询可以使用索引。然而,优化器很可能仍然不会选择使用它。总而言之,没有太多可以优化的地方。看起来周围的循环代码需要改进。

This query could use an index if you set work_mem high enough to enable a bitmap index scan. It's quite possible, however, that the optimizer still won't choose to use it. Altogether, there isn't much to optimize about this. It looks like the surrounding looping code is in need of improvement.

可是我不能没有你 2024-10-18 13:11:41

首先你说:

该表大约有大约 75M
行,所有列由“W”组成
约 4200 万行。

那你说你

SELECT * FROM table WHERE the_key = "W";

循环执行几次并期望它执行?这是不可能的 - 没有索引会加速这个查询 - 它必须返回 42M 行 - 超过一半。如果您拒绝重写此索引过程以避免多次查询,那么它就值得 The Daily WTF 了。

First you say:

The table has somewhere around ~75M
rows, all columns with "W" make up
~42M rows.

Then you say that you do

SELECT * FROM table WHERE the_key = "W";

several times in a loop and expect it to perform? It is impossible - no indexing would speed up this query - it has to return 42M rows - more than half. If you refuse to rewrite this indexing procedure to avoid querying this more than once then it is just The Daily WTF worthy.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文