Database: more questions about (B-tree) indexes

Posted 2024-11-17 17:37:54


I've been studying indexes, and there are some questions that bother me and which I think are important.

If you can help or point me to sources, please feel free to do so.

Q1: B-tree indexes enable fast access to specific rows in a table. Considering an OLTP system with many simultaneous accesses, both reads and writes, do you think having many B-tree indexes on such a system could be a disadvantage? Why?

Q2: Why are B-tree indexes not fully occupied (typically only about 75%, if I'm not mistaken)?


Comments (1)

天气好吗我好吗 replied 2024-11-24 17:37:54


Q1: I have no hands-on administration experience with large indexing systems, but the most direct drawback is that every INSERT, DELETE, or update of an indexed column must maintain every index on the table, so write cost grows with the number of indexes. On top of that, the typical multiprocessing-environment drawbacks apply -- the cost of context switching, cache invalidation and flushing, poor IO scheduling, and the list goes on. On the other hand, IO inherently ought to be non-blocking for maximal use of resources, and that is hard to achieve without some sort of concurrency, even if done cooperatively. (For example, some people recommend event-based systems.) Also, many practical applications need multiple index structures, especially in OLTP. The biggest factors here are good IO scheduling, access patterns, and data caching tuned to those access patterns.
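The write-amplification point can be sketched with a toy model. This is not how any real database implements indexes -- the hypothetical `Table` class below uses a plain sorted Python list as a stand-in for each B-tree index -- but it shows why every row insert gets more expensive as indexes are added:

```python
# Toy model of index write amplification (assumption: each secondary
# index is a sorted Python list standing in for a real B-tree).
import bisect
import random

class Table:
    def __init__(self, num_indexes):
        self.rows = []                        # the table itself (heap order)
        self.indexes = [[] for _ in range(num_indexes)]
        self.index_writes = 0                 # extra work done for indexes

    def insert(self, key):
        self.rows.append(key)                 # one write for the row itself...
        for idx in self.indexes:              # ...plus one per index:
            bisect.insort(idx, key)           # each index must stay ordered
            self.index_writes += 1

random.seed(0)
t = Table(num_indexes=5)
for _ in range(1000):
    t.insert(random.random())
print(t.index_writes)  # 5000: five extra index updates per 1000 row inserts
```

Reads benefit symmetrically: a point lookup in any of the sorted indexes is O(log n) via binary search, which is exactly the read/write trade-off the question describes.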

Q2: Because splitting and re-balancing nodes is expensive. The naive methodology for speed is "only split when they're full." Given this, there are two extremes -- a node was just split and is half full, or a node is full and will be split on the next insert. The 'average' between the two cases (50% and 100%) is 75%. Yes, it's somewhat loose logic from a mathematics perspective, but it exposes the underlying reason the 75% figure appears.
