大量相同值的索引/性能策略
基本信息:这是 OpenStreetMap 数据索引过程的上下文。为了简化问题:核心信息分为3个主要类型,值为“W”、“R”、“N”(VARCHAR(1)
)。
该表大约有约 75M 行,所有带有“W”的列组成约 42M 行。现有索引与此问题无关。
现在问题本身:数据的索引是通过一个过程完成的。在此过程中,有一些循环执行以下操作:
[...] SELECT * FROM table WHERE the_key = "W"; [...]
结果再次循环,上面的查询本身也在循环中。这需要花费大量时间并大大减慢该过程。索引 the_key
显然是无用的,因为索引可能使用的所有值都是相同的(“W”)。脚本本身的运行速度还可以,只是 SELECT 需要很长时间。
我是否
- 需要创建一种“特殊”类型的索引来考虑这一点并使
SELECT
更快?如果有,是哪一个? - 需要调整一些服务器参数(它们已经调整过,并且它们提供的结果似乎不错。如果需要,我可以发布它们)?
- 必须适应速度并简单地获得更多硬件来获得更多功能(蒂姆·泰勒咕噜咕噜咕噜咕噜)?
上述几点的任何替代方案(除了重写或不使用它)?
Base information: This is in context to the indexing process of OpenStreetMap data. To simplify the question: the core information is divided into 3 main types with value "W", "R", "N" (VARCHAR(1)
).
The table has somewhere around ~75M rows, all columns with "W" make up ~42M rows. Existing indexes are not relevant to this question.
Now the question itself: The indexing of the data is done via an procedure. Inside this procedure, there are some loops that do the following:
[...] SELECT * FROM table WHERE the_key = "W"; [...]
The results get looped again and the above query itself is also in a loop. This takes a lot of time and slows down the process massivly. An indexon the_key
is obviously useless since all the values that the index might use are the same ("W"). The script itself is running with a speed that is OK, only the SELECT
ing takes very long.
Do I
- need to create a "special" kind of index that takes this into account and makes the
SELECT
quicker? If so, which one? - need to tune some of the server parameters (they are already tuned and the result that they deliver seem to be good. If needed, I can post them)?
- have to live with the speed and simply get more hardware to gain more power (Tim Taylor grunt grunt)?
Any alternatives to the above points (except rewriting it or not using it)?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
如果您将
work_mem
设置得足够高以启用位图索引扫描,则此查询可以使用索引。然而,优化器很可能仍然不会选择使用它。总而言之,没有太多可以优化的地方。看起来周围的循环代码需要改进。This query could use an index if you set
work_mem
high enough to enable a bitmap index scan. It's quite possible, however, that the optimizer still won't choose to use it. Altogether, there isn't much to optimize about this. It looks like the surrounding looping code is in need of improvement.首先你说:
那你说你
循环执行几次并期望它执行?这是不可能的 - 没有索引会加速这个查询 - 它必须返回 42M 行 - 超过一半。如果您拒绝重写此索引过程以避免多次查询,那么它就值得 The Daily WTF 了。
First you say:
Then you say that you do
several times in a loop and expect it to perform? It is impossible - no indexing would speed up this query - it has to return 42M rows - more than half. If you refuse to rewrite this indexing procedure to avoid querying this more than once then it is just The Daily WTF worthy.