I don't understand the explanation in the Elasticsearch scroll search documentation

Posted on 01-16 16:53

"Keeping the initial search context alive has a high cost for actively updated indices."

Does the high cost in the sentence above refer to memory usage?

So, why is the memory usage so high?

Is it because update requests to the index have to be queued while the context is kept alive?

Or is it because a snapshot of the live index is cached in memory?

Comments (1)

有木有妳兜一样 · 2025-01-23 16:53:45

From the official documentation:

A scroll returns all the documents which matched the search at the time of the initial search request. It ignores any subsequent changes to these documents. [...] The search context is created by the initial request and kept alive by subsequent requests. [...]

Normally, the background merge process optimizes the index by merging together smaller segments to create new, bigger segments. Once the smaller segments are no longer needed they are deleted. This process continues during scrolling, but an open search context prevents the old segments from being deleted since they are still in use.

Keeping older segments alive means that more disk space and file handles are needed. Ensure that you have configured your nodes to have ample free file handles. See File Descriptors.

Additionally, if a segment contains deleted or updated documents then the search context must keep track of whether each document in the segment was live at the time of the initial search request. Ensure that your nodes have sufficient heap space if you have many open scrolls on an index that is subject to ongoing deletes or updates.

The passages quoted above show why it is costly to keep one or many scroll contexts alive for a substantial period of time. Elasticsearch does its best to keep everything fresh and to discard old data, but a scroll context basically puts old data on life support and stashes it in a corner for a bit longer, before letting it die once the scroll context is no longer needed.
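To make the "life support" idea concrete, here is a toy model in Python. It is only a sketch under stated assumptions: `ToyIndex` and its methods are hypothetical names invented for illustration, and the bookkeeping is a simplification, not how Lucene or Elasticsearch actually track segments. It shows that a merged-away segment can only be deleted once no open context still pins it.

```python
from dataclasses import dataclass, field


@dataclass
class ToyIndex:
    """Toy model of an index as a set of immutable segments (hypothetical, not Lucene internals)."""
    live: set = field(default_factory=set)        # segments used by the current index view
    on_disk: set = field(default_factory=set)     # segments still taking disk space and file handles
    contexts: dict = field(default_factory=dict)  # context id -> segments pinned by that context

    def add_segment(self, name):
        self.live.add(name)
        self.on_disk.add(name)

    def open_context(self, ctx_id):
        # A search context pins the segments that were live when it was opened.
        self.contexts[ctx_id] = frozenset(self.live)

    def merge(self, sources, target):
        # A merge replaces several small segments with one bigger segment in the live view.
        self.live -= set(sources)
        self.add_segment(target)
        self._collect()

    def close_context(self, ctx_id):
        self.contexts.pop(ctx_id, None)
        self._collect()

    def _collect(self):
        # A segment is deletable only if neither the live view nor any open context needs it.
        pinned = set().union(*self.contexts.values())
        self.on_disk = {s for s in self.on_disk if s in self.live or s in pinned}


index = ToyIndex()
for name in ("s1", "s2", "s3"):
    index.add_segment(name)

index.open_context("scroll-1")            # scroll starts: s1, s2, s3 are pinned
index.merge(["s1", "s2"], "s4")           # background merge continues during the scroll
after_merge = sorted(index.on_disk)       # s1 and s2 are merged away but cannot be deleted yet

index.close_context("scroll-1")           # scroll ends: nothing pins s1 and s2 anymore
after_close = sorted(index.on_disk)
```

In this model the index briefly holds both the old segments and the merged one, which is the extra disk space and file-handle usage the documentation refers to. The more often the index is updated while a scroll is open, the more such leftover segments pile up.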

That's why more resources (mainly disk space, file handles and heap) are needed to keep scroll contexts alive. That is what "high cost" refers to: it is not only memory, and it has nothing to do with queuing update requests.
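Since the cost grows with how long a context stays open, a common way to limit it is to use a short keep-alive and to clear the scroll explicitly when done. The sketch below only builds the three REST requests involved (it does not send them); the helper names, the index name `my-index`, and the scroll ID placeholder are assumptions made for illustration, while the endpoints and the `scroll` / `scroll_id` fields follow the scroll API as documented.

```python
import json


def open_scroll(index, keep_alive="1m", size=1000):
    """Initial search: creates the search context and starts its keep-alive timer."""
    # Sorting by _doc is the cheapest order when the result order does not matter.
    body = {"size": size, "sort": ["_doc"], "query": {"match_all": {}}}
    return "POST", f"/{index}/_search?scroll={keep_alive}", json.dumps(body)


def next_page(scroll_id, keep_alive="1m"):
    """Fetch the next batch; each call renews the keep-alive for the same context."""
    body = {"scroll": keep_alive, "scroll_id": scroll_id}
    return "POST", "/_search/scroll", json.dumps(body)


def clear_scroll(scroll_id):
    """Release the context right away instead of waiting for the keep-alive to expire."""
    return "DELETE", "/_search/scroll", json.dumps({"scroll_id": scroll_id})


# Hypothetical values, for illustration only.
requests_to_send = [
    open_scroll("my-index", keep_alive="30s", size=500),
    next_page("<scroll_id from the previous response>", keep_alive="30s"),
    clear_scroll("<scroll_id from the previous response>"),
]
```

The keep-alive only needs to cover the time it takes to process one batch, not the whole export, because every `next_page` call renews it.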
