Elasticsearch aggregation shows incorrect total
Elasticsearch version is 7.4.2
I suck at Elasticsearch and I'm trying to figure out what's wrong with this query.
{
  "size": 10,
  "from": 0,
  "query": {
    "bool": {
      "must": [
        {
          "exists": {
            "field": "firstName"
          }
        },
        {
          "query_string": {
            "query": "*",
            "fields": [
              "params.display",
              "params.description",
              "params.name",
              "lastName"
            ]
          }
        },
        {
          "match": {
            "status": "DONE"
          }
        }
      ],
      "filter": [
        {
          "term": {
            "success": true
          }
        }
      ]
    }
  },
  "sort": {
    "createDate": "desc"
  },
  "collapse": {
    "field": "lastName.keyword",
    "inner_hits": {
      "name": "lastChange",
      "size": 1,
      "sort": [
        {
          "createDate": "desc"
        }
      ]
    }
  },
  "aggs": {
    "total": {
      "cardinality": {
        "field": "lastName.keyword"
      }
    }
  }
}
It returns:
"aggregations": {
  "total": {
    "value": 429896
  }
}
So there are about 430k results, but when paginating we stop getting results around the 426k mark. For example, when I run the query with
{
  "size": 10,
  "from": 427000,
  ...
}
I get:
{
  "took": 2215,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 10000,
      "relation": "gte"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "total": {
      "value": 429896
    }
  }
}
But if I change "from" to 426000, I still get results.
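The size of the discrepancy can be put into numbers using only the figures above. This is a small Python sketch that assumes each hit returned under the collapse on lastName.keyword is one distinct lastName value. Under that assumption, from=426000 returning hits means there are at least 426001 groups, and from=427000 returning nothing means there are at most 427000. The function and constant names are illustrative only.

```python
from typing import Tuple

# Figures copied from the question.
ESTIMATED_TOTAL = 429_896        # cardinality of lastName.keyword
LAST_OFFSET_WITH_HITS = 426_000  # from=426000 still returns hits
FIRST_EMPTY_OFFSET = 427_000     # from=427000 returns an empty hits array


def overestimate_bounds(estimate: int, last_with_hits: int, first_empty: int) -> Tuple[float, float]:
    """Bound the relative overestimate (estimate - true) / true.

    Assumes each collapsed hit is one distinct lastName value, so the true
    number of groups is at least last_with_hits + 1 and at most first_empty.
    """
    smallest_true = last_with_hits + 1
    largest_true = first_empty
    low = (estimate - largest_true) / largest_true
    high = (estimate - smallest_true) / smallest_true
    return low, high


low, high = overestimate_bounds(ESTIMATED_TOTAL, LAST_OFFSET_WITH_HITS, FIRST_EMPTY_OFFSET)
print(f"cardinality overestimates by {low:.2%} to {high:.2%}")
# prints: cardinality overestimates by 0.68% to 0.91%
```

Under that assumption, the aggregation overstates the number of groups by less than 1%. A gap that small is consistent with the count being approximate rather than with the query matching a different set of documents.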
Comments (2)
You are comparing the cardinality aggregation value of your field lastName.keyword with the total number of documents in the index, and those are two different things. You can check the total number of documents in your index with the count API. The from/size you define at the query level applies to the documents matching your search query. Because you have not set track_total_hits, the response shows 10k with relation gte, which means more than 10k documents match your search query. Your aggregation returns 429896 in both cases because it does not depend on the from/size you specify for the query.
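The answer above mentions track_total_hits and the count API. The Python sketch below builds both request bodies and assumes you send them with whatever HTTP client you already use, so no client library is shown. The helper names with_exact_hit_count and count_body are hypothetical, and the query has been trimmed to keep the example short.

```python
import json


def with_exact_hit_count(search_body: dict) -> dict:
    """Return a copy of a search body that asks for an exact hits.total.

    By default Elasticsearch 7.x stops counting at 10,000 hits and reports
    {"value": 10000, "relation": "gte"}; track_total_hits=True lifts that cap.
    """
    body = dict(search_body)  # shallow copy so the caller's dict is untouched
    body["track_total_hits"] = True
    return body


def count_body(search_body: dict) -> dict:
    """Return a request body for the count API (GET /<index>/_count).

    The count API takes only a query, so from/size, sort, collapse and
    aggs are left out.
    """
    return {"query": search_body["query"]}


# Trimmed-down version of the question's query, for illustration only.
query = {"bool": {"filter": [{"term": {"success": True}}]}}
search = {"size": 10, "from": 0, "query": query}

print(json.dumps(with_exact_hit_count(search)))
print(json.dumps(count_body(search)))
```

Both numbers count matching documents. With collapse, hits.total still reports documents rather than collapsed groups, so neither request gives the number of distinct lastName values on its own.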
I was surprised to find that the cardinality aggregation has a precision control (the precision_threshold parameter).
Setting it to the maximum value was the solution for me.
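In request terms, this fix adds precision_threshold to the question's "total" aggregation. The Python sketch below builds that aggregation body and assumes the documented Elasticsearch limits: a default of 3000 and a maximum of 40000, where larger values behave like 40000. The helper name cardinality_agg is hypothetical.

```python
import json

# Elasticsearch treats any precision_threshold above 40000 as 40000 (default is 3000).
MAX_PRECISION_THRESHOLD = 40_000


def cardinality_agg(field: str, precision_threshold: int = MAX_PRECISION_THRESHOLD) -> dict:
    """Build a cardinality aggregation with an explicit precision_threshold.

    Values above the maximum have the same effect as the maximum, so they
    are clamped to make the request state what it actually gets.
    """
    threshold = min(precision_threshold, MAX_PRECISION_THRESHOLD)
    return {"cardinality": {"field": field, "precision_threshold": threshold}}


aggs = {"total": cardinality_agg("lastName.keyword")}
print(json.dumps({"aggs": aggs}))
# prints {"aggs": {"total": {"cardinality": {"field": "lastName.keyword", "precision_threshold": 40000}}}}
```

With roughly 430k distinct values, the count is still above the 40000 threshold, so the result remains an estimate, just a more precise one. The extra precision is paid for with more memory per shard.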