Why doesn't a DBS tune/adjust its buffer sizes automatically?

Posted 2024-10-07 02:36:13


Not sure whether there isn't a DBS that does this, and whether it is indeed a useful feature, but:
There are a lot of suggestions on how to speed up DB operations by tuning buffer sizes. One example is importing Open Street Map data (the planet file) into a Postgres instance. There is a tool called osm2pgsql (http://wiki.openstreetmap.org/wiki/Osm2pgsql) for this purpose, and also a guide that suggests adapting specific buffer parameters to this end.
In the final step of the import, the database creates indexes and (according to my understanding of the docs) would benefit from a huge maintenance_work_mem, whereas during normal operation this wouldn't be too useful.
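Today this tuning is done by hand. A typical adjustment for such an import phase might look like the fragment below; shared_buffers and maintenance_work_mem are real Postgres settings, but the values here are placeholders for illustration, not recommendations:

```
# postgresql.conf fragment (values are placeholders, not recommendations)
shared_buffers = 2GB            # shared page cache, used by all sessions
maintenance_work_mem = 4GB      # memory for CREATE INDEX, VACUUM, etc.
```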
This thread http://www.mail-archive.com/pgsql-general@postgresql.org/msg119245.html suggests, on the contrary, that a large maintenance_work_mem would not make too much sense during final index creation.
Ideally (imo), the DBS should know best which buffer-size combination it would profit from most, given a limited amount of total buffer memory.
So, are there some good reasons why there isn't a built-in heuristic that is able to adapt the buffer sizes automatically according to the current task?
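A built-in heuristic of the kind the question imagines could, in a much simplified form, look like the sketch below. Python is used purely for illustration; the parameter names mirror Postgres settings, but the task names and split ratios are invented, not taken from any real DBMS:

```python
# Toy sketch of a task-aware buffer-size heuristic (illustration only;
# the split ratios are invented, not taken from any real DBMS).

def suggest_buffers(total_mem_mb: int, task: str) -> dict:
    """Split a fixed memory budget between the shared buffer cache and
    maintenance memory depending on the current kind of work."""
    if task == "bulk_index_build":
        # Index creation sorts large amounts of data: favour
        # maintenance memory over the shared buffer cache.
        split = {"shared_buffers": 0.25, "maintenance_work_mem": 0.75}
    elif task == "oltp":
        # Many small reads/writes: favour the shared buffer cache.
        split = {"shared_buffers": 0.75, "maintenance_work_mem": 0.25}
    else:
        split = {"shared_buffers": 0.5, "maintenance_work_mem": 0.5}
    return {k: int(total_mem_mb * v) for k, v in split.items()}

print(suggest_buffers(8192, "bulk_index_build"))
# → {'shared_buffers': 2048, 'maintenance_work_mem': 6144}
```

Even this toy version hints at the real difficulty: the "task" label is exactly the information the server does not reliably have in advance.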

Comments (2)

盗琴音 2024-10-14 02:36:13


The problem is the same as with any forecasting software. Just because something happened historically doesn't mean it will happen again. Also, you need to complete a task in order to fully analyze how you should have done it more efficiently. The problem is that the next task is not necessarily anything like the previously completed one. So if your import routine needed 8 GB of memory to complete, would it make sense to assign each read-only user 8 GB of memory? The other way around wouldn't work well either.
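The mismatch described here is easy to put into numbers. A quick back-of-the-envelope check (Python; the session count is invented for illustration):

```python
# Back-of-the-envelope: a memory budget that was right for one bulk
# import does not transfer to many concurrent read-only sessions.

import_mem_gb = 8          # memory the one-off import task used
readonly_sessions = 100    # typical concurrent read-only users (invented)

# If each session were granted the import-sized budget:
naive_total_gb = readonly_sessions * import_mem_gb
print(naive_total_gb)      # → 800 (GB), clearly not a sensible default
```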

In leaving this decision to humans, the database will exhibit performance characteristics that aren't optimal for all cases, but in return it lets us (the humans) optimize each case individually (if we like to).

Another important aspect is that most people/companies value reliable and stable levels over varying but potentially better levels. A high cost isn't as big a deal as large variations in cost. This is of course not always true, as entire companies are based around the fact that once in a while they hit that 1%.

Modern databases already put some effort into adapting themselves to the tasks presented, such as increasingly sophisticated query optimizers. At least Oracle has the option to keep track of some of the measures that influence the optimizer's decisions (e.g. the cost of a single-block read, which varies with the current load).

深府石板幽径 2024-10-14 02:36:13


My guess would be that it is awfully hard to get the knobs right by adaptive means. First you would have to query the machine for a lot of unknowns, like how much RAM it has available, but also the unknown "what else do you expect to run on the machine?".

Barring that, by setting only a max_mem_usage parameter, the problem is how to make a system which

  • adapts well to most typical loads,
  • doesn't have odd pathological problems with some loads, and
  • is somewhat comprehensible code without errors.
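To make the difficulty concrete, here is a deliberately naive sketch (Python; every name and number is invented) of an adaptive allocator driven by a single max_mem_usage-style budget. Even this toy shows where the hard work lies: a real system must damp oscillation, cope with many consumers, and stay predictable.

```python
# Deliberately naive adaptive allocator: one fixed budget (shares sum
# to 1.0) is shifted toward whichever consumer reported memory
# pressure in the last observation window. Illustration only.

def rebalance(shares: dict, misses: dict, step: float = 0.1) -> dict:
    """Move a fraction of the budget toward consumers that reported
    memory pressure ('misses'), then renormalise the shares."""
    total_miss = sum(misses.values())
    if total_miss == 0:
        return dict(shares)            # nothing under pressure
    new = {}
    for name, share in shares.items():
        want = misses[name] / total_miss   # relative pressure
        new[name] = share + step * (want - share)
    norm = sum(new.values())               # renormalise to 1.0
    return {k: v / norm for k, v in new.items()}

shares = {"shared_buffers": 0.5, "maintenance_work_mem": 0.5}
# An index build suddenly dominates memory pressure:
shares = rebalance(shares, {"shared_buffers": 0, "maintenance_work_mem": 10})
print(shares)
```

Note how the feedback rule reacts only after pressure has been observed, i.e. after performance has already suffered, which is exactly the forecasting problem the first answer describes.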

For postgresql, however, the answer could also be

  • Nobody wrote it yet because other stuff is seen as more important.
  • You didn't write it yet.