Index/search algorithm stability between versions
I'm migrating from Elasticsearch 1.5 to 7.10. There are multiple required changes; the most relevant one is the removal of the document type concept, which started in version 6. To deal with it, I introduced a new field, doc_type, and I match on it when I search.
My question is: when I run the same (or an equivalent, since some changes were needed) search query, should I expect exactly the same result set? I'm seeing some differences, so I'd like to figure out whether I broke something in the new mappings or in the search query.
Thank you in advance.
Edit after first question:
In general: I have a service that communicates with ES 1.5, and I have to migrate it to ES 7.10 while keeping the external API as stable as possible.
- I'm not using scoring.
- Previously I had document types A and B, and a query looked like this, for example: host/indexname/A,B/_search. After the migration I keep A or B in doc_type, and the query becomes host/indexname/_search with "bool":{"should":[{"terms":{"doc_type":["A"],"boost":1.0}},{"terms":{"doc_type":["B"],"boost":1.0}}],"adjust_pure_negative":true,"boost":1.0} in the body. If I put A and B in different indexes and the user wants to match in both of them, I'll have to "merge" the search responses of both queries, and I don't know which strategy I should follow for that. By keeping everything together, I get a response with mixed (doc_type) results from ES. I followed this specific approach: https://www.elastic.co/blog/removal-of-mapping-types-elasticsearch#custom-type-field
- The differences are not so big. It's difficult to show a concrete example because it's a complex data/document structure, but the idea is that, for a given query, 1.5 returns a response such as:
[a, b, c, d, e, f, g, h, i, j]
(where each one may be of type A or B)
With 7.10 I'm getting responses like:
[a, b, e, c, d, f, g, h, i, j]
or
[a, b, c, d, e, g, i, j, k]
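As a rough sketch of the custom type field approach from the linked blog post, the 7.10 index would map doc_type as a keyword field, so that exact terms matches on A or B behave like the old type filter. This is a possible body for a PUT /indexname request; the other field names come from the query in the second edit, and their types (boolean, date, text) are assumptions. If doc_type were left to dynamic mapping, a string would become a text field (with a .keyword subfield), and a terms query for "A" would not match the lowercased token indexed in the text field. Note also that the query in the second edit filters on a field called type rather than doc_type; whichever name is actually used needs this kind of mapping.

```json
{
  "mappings": {
    "properties": {
      "doc_type": { "type": "keyword" },
      "mark_deleted": { "type": "boolean" },
      "a_specific_date": { "type": "date" },
      "body": { "type": "text" }
    }
  }
}
```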
Second edit:
This query was generated by the Java client.
```json
{
  "from": 0,
  "size": 100,
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "query": "mark_deleted:false",
            "fields": [],
            "type": "best_fields",
            "default_operator": "or",
            "max_determinized_states": 10000,
            "enable_position_increments": true,
            "fuzziness": "AUTO",
            "fuzzy_prefix_length": 0,
            "fuzzy_max_expansions": 50,
            "phrase_slop": 0,
            "escape": false,
            "auto_generate_synonyms_phrase_query": true,
            "fuzzy_transpositions": true,
            "boost": 1.0
          }
        },
        {
          "bool": {
            "should": [
              { "terms": { "type": ["A"], "boost": 1.0 } },
              { "terms": { "type": ["B"], "boost": 1.0 } },
              { "terms": { "type": ["D"], "boost": 1.0 } }
            ],
            "adjust_pure_negative": true,
            "boost": 1.0
          }
        }
      ],
      "adjust_pure_negative": true,
      "boost": 1.0
    }
  },
  "post_filter": {
    "term": {
      "mark_deleted": { "value": false, "boost": 1.0 }
    }
  },
  "sort": [
    { "a_specific_date": { "order": "desc" } }
  ],
  "highlight": {
    "pre_tags": ["<b>"],
    "post_tags": ["</b>"],
    "no_match_size": 120,
    "fields": {
      "body": { "fragment_size": 120, "number_of_fragments": 1 }
    }
  }
}
```
Answer:
First, since you don't care about scoring, you should use bool/filter instead of bool/must at the top level. Otherwise your results are sorted by _score by default, and between 1.5 and 7.10 there have been so many changes that this alone could explain the differences you get. So you're better off simply sorting the results on any field other than _score.
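To illustrate the first point, here is a sketch of the question's query with its two must clauses moved, unchanged, into a filter array. Filter clauses don't contribute to scoring, so the order of the hits comes only from the explicit sort. For brevity, from, size, post_filter and highlight are left out, as are the explicit boost, adjust_pure_negative and query_string settings that the Java client serializes, which appear to be the defaults. The nested bool contains only should clauses, so at least one of them must match, as before.

```json
{
  "query": {
    "bool": {
      "filter": [
        { "query_string": { "query": "mark_deleted:false" } },
        {
          "bool": {
            "should": [
              { "terms": { "type": ["A"] } },
              { "terms": { "type": ["B"] } },
              { "terms": { "type": ["D"] } }
            ]
          }
        }
      ]
    }
  },
  "sort": [
    { "a_specific_date": { "order": "desc" } }
  ]
}
```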
Second, instead of the bool/should on type, you can use a simple terms query, which does exactly the same job, yet in a simpler way.
Finally, I'm not sure why you're using a query_string query to do an exact match on mark_deleted:false; it doesn't make sense to me. A simple term query would be better and more adequate here. It's also not clear why you filter on mark_deleted:false again in your post_filter, since it's the same condition as in your query_string constraint.
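Putting the three suggestions together, the simplified request could look roughly like this. It is a sketch that assumes, as in the question's query, that type is a keyword field holding A, B or D and that mark_deleted is a boolean field; from, size, sort and highlight are copied unchanged from the question. A single terms query with several values matches documents having any of them, which is what the three should clauses did. The post_filter is dropped because its condition is already in the query: post_filter is applied to the hits only after aggregations are computed, so without aggregations it acts as just another filter.

```json
{
  "from": 0,
  "size": 100,
  "query": {
    "bool": {
      "filter": [
        { "term": { "mark_deleted": false } },
        { "terms": { "type": ["A", "B", "D"] } }
      ]
    }
  },
  "sort": [
    { "a_specific_date": { "order": "desc" } }
  ],
  "highlight": {
    "pre_tags": ["<b>"],
    "post_tags": ["</b>"],
    "no_match_size": 120,
    "fields": {
      "body": { "fragment_size": 120, "number_of_fragments": 1 }
    }
  }
}
```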