Identical documents get different scores after upgrading spring-data-elasticsearch from 4.2.1 to 4.3.0

Posted on 2025-02-10 15:37:13

I'm currently upgrading the Spring Boot version of my project. After upgrading from 2.5 to 2.6, a few tests that deal with the retrieval of Elasticsearch documents started failing. I'm trying to fetch only the highest-scoring documents, but when I expect 2 identical documents, only 1 is retrieved.

After reading up on the issue, I figured out that the problem comes down to the Elasticsearch index using multiple shards. Each shard scores documents using its own local statistics, and (probably?) the identical documents are fetched from different shards, so they end up with different scores despite being virtually the same.

Now, can anyone tell me why this happens in the newer spring-data-elasticsearch version, and whether there is a setting that restores the old behavior?

I've set up a little test project to play around with this. If anyone is interested in trying this for themselves, feel free to check it out: https://github.com/Moldavis/elasticsearch-scoring-poc

Comments (1)

罗罗贝儿 2025-02-17 15:37:14

Actually, I found my own answer in the Spring Data breaking changes documentation (duh).

https://docs.spring.io/spring-data/elasticsearch/docs/current/reference/html/#elasticsearch-migration-guide-4.2-4.3.breaking-changes

search_type default value
The default value for the search_type in Elasticsearch is query_then_fetch. This now is also set as default value in the Query implementations, it was previously set to dfs_query_then_fetch.

The dfs_query_then_fetch option first queries all shards for their document and term frequencies, so that scores are computed from the same statistics on every shard. This is no longer used by default, which is why the problem described above occurs.
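To illustrate why the same document can score differently per shard, here is a minimal sketch in plain Java. It is not Elasticsearch code: the class `ShardScoringSketch`, its methods, and the shard numbers are all hypothetical, and it assumes the score is just a BM25-style inverse document frequency (term frequency and length normalization are ignored, since they would be equal for identical documents anyway).

```java
// Hypothetical illustration only; none of these names come from Elasticsearch.
final class ShardScoringSketch {

    /** Statistics one shard holds for a single term. */
    static final class ShardStats {
        final long docCount; // documents on this shard
        final long docFreq;  // documents on this shard containing the term

        ShardStats(long docCount, long docFreq) {
            this.docCount = docCount;
            this.docFreq = docFreq;
        }
    }

    /** BM25-style IDF: ln(1 + (N - n + 0.5) / (n + 0.5)). */
    static double idf(long docCount, long docFreq) {
        return Math.log(1.0 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    /** query_then_fetch style: the shard only sees its own statistics. */
    static double localScore(ShardStats shard) {
        return idf(shard.docCount, shard.docFreq);
    }

    /** dfs_query_then_fetch style: statistics are summed over all shards first. */
    static double globalScore(ShardStats... shards) {
        long docCount = 0;
        long docFreq = 0;
        for (ShardStats s : shards) {
            docCount += s.docCount;
            docFreq += s.docFreq;
        }
        return idf(docCount, docFreq);
    }

    public static void main(String[] args) {
        // Made-up numbers: the term is rare on shard A and common on shard B.
        ShardStats shardA = new ShardStats(10, 1);
        ShardStats shardB = new ShardStats(10, 5);

        // The same document scores differently depending on which shard holds it.
        System.out.println("local score on shard A: " + localScore(shardA));
        System.out.println("local score on shard B: " + localScore(shardB));

        // With global statistics, every shard would use this one value.
        System.out.println("global score:           " + globalScore(shardA, shardB));
    }
}
```

With local statistics the two copies get different scores, so a "highest score only" filter keeps just one of them. With summed statistics both copies get the same score, at the cost of an extra round trip to the shards before the query runs.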

It can be fixed by setting the search type on the query, like so:

queryBuilder.withSearchType(SearchType.DFS_QUERY_THEN_FETCH);