Editing and re-indexing large amounts of data (millions of records) in Elasticsearch

Published 2025-01-21 14:16:07

I recently made a new version of an index for my Elasticsearch data, with some new fields included. I re-indexed from the old index, so the new index has all of the old data along with a new mapping that includes the new fields.

Now I'd like to update all of the Elasticsearch data in the index to populate these new fields, which I can calculate by making some separate database and API calls to other sources.

What is the best way to do this, given that there are millions of records in the index?

Logistically I'm not sure how to accomplish this. For example, how can I keep track of the records that I've already updated? I've been reading about the Scroll API, but I'm not certain it's viable because of the maximum scroll time of 24 hours (what if it takes longer than that?). Another serious consideration is that since I need to make other database calls to calculate the new field values, I don't want to hammer that database for too long in a single session.

Would there be some way to run an update for, say, 10 minutes every night, but keep track of which records have been updated and which still need updating?

I'm just not sure about a lot of this, and would appreciate any insights or other ideas on how to go about it.
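One way to make the nightly-batch idea concrete (a sketch under assumptions, not a definitive answer): add a marker field to the new mapping and query for documents that don't have it yet. Because every update sets the marker, re-running the same query each night naturally resumes where the previous run stopped, with no separate bookkeeping. The index name, the `fields_updated` marker, and the `enrich` callback below are all hypothetical; `enrich` stands in for the database/API calls described in the question.

```python
import time

INDEX = "my-index-v2"    # hypothetical index name
FLAG = "fields_updated"  # hypothetical marker field in the new mapping

def pending_query():
    # Match only documents that do not yet have the marker field.
    return {"bool": {"must_not": {"exists": {"field": FLAG}}}}

def run_batch(es, enrich, budget_seconds=600, page_size=500):
    """Update pending documents until the time budget expires.

    `es` is assumed to be compatible with the official Python client's
    search()/update() calls; `enrich` computes the new field values for
    one source document (e.g. via external database/API lookups).
    Returns the number of documents updated in this run.
    """
    deadline = time.monotonic() + budget_seconds
    updated = 0
    while time.monotonic() < deadline:
        resp = es.search(index=INDEX, query=pending_query(),
                         size=page_size, sort=[{"_doc": "asc"}])
        hits = resp["hits"]["hits"]
        if not hits:
            break  # nothing left to update
        for hit in hits:
            new_fields = enrich(hit["_source"])
            new_fields[FLAG] = True  # mark the document as done
            es.update(index=INDEX, id=hit["_id"], doc=new_fields)
            updated += 1
            if time.monotonic() >= deadline:
                break
    return updated
```

Since updated documents drop out of the query, there is no need for Scroll or `search_after` cursors, and the 24-hour scroll limit never comes into play; the budget also caps how long the external database gets hit per session.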


1 comment

丑丑阿 2025-01-28 14:16:07

You would need to run an update-by-query on your original index, which is expensive.
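For context, update-by-query runs a Painless script over every matching document inside the cluster, so it only works for fields derivable from data already in the document; it cannot make the external database/API calls the question needs. A minimal request body, with made-up index and field names:

```python
import json

# Hypothetical: derive a new field from an existing one. Field and
# index names are illustrative, not from the original question.
request = {
    "script": {
        "lang": "painless",
        # copy an existing field into the new one, uppercased
        "source": "ctx._source.name_upper = ctx._source.name.toUpperCase()",
    },
    "query": {
        # only touch documents that don't have the new field yet
        "bool": {"must_not": {"exists": {"field": "name_upper"}}},
    },
}

# Would be sent as: POST my-index-v2/_update_by_query?conflicts=proceed
body = json.dumps(request, indent=2)
print(body)
```

The `exists` guard makes the operation resumable if it is interrupted or throttled partway through.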

You might be able to use aliases pointing to the indices behind them: when you want to make a change, create a new index with the new mappings etc. and attach it to the alias, so new data coming in gets written correctly. Then reindex the "old" data into the new index.
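The alias pattern above could look like this (alias and index names are made up): clients only ever talk to the alias, the repoint happens atomically in one `_aliases` call, and `_reindex` copies the old documents over in the background.

```python
import json

ALIAS = "my-data"                      # hypothetical alias clients use
OLD, NEW = "my-data-v1", "my-data-v2"  # hypothetical index names

# Would be sent as: POST _aliases
# Both actions execute atomically, so readers never see a gap.
swap = {
    "actions": [
        {"remove": {"index": OLD, "alias": ALIAS}},
        {"add": {"index": NEW, "alias": ALIAS}},
    ]
}

# Would be sent as: POST _reindex
# Copies the existing documents into the index with the new mapping.
reindex = {
    "source": {"index": OLD},
    "dest": {"index": NEW},
}

print(json.dumps(swap), json.dumps(reindex))
```

After the reindex completes, the backfill of the genuinely new fields (the external lookups) still has to run as its own batched job against the new index.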

That will depend on the details of what you're doing, though.
