Elasticsearch aggregation shows an incorrect total

Posted on 2025-01-12 12:58:23

Elasticsearch version is 7.4.2

I suck at Elasticsearch and I'm trying to figure out what's wrong with this query.

{
  "size": 10,
  "from": 0,
  "query": {
    "bool": {
      "must": [
        {
          "exists": {
            "field": "firstName"
          }
        },
        {
          "query_string": {
            "query": "*",
            "fields": [
              "params.display",
              "params.description",
              "params.name",
              "lastName"
            ]
          }
        },
        {
          "match": {
            "status": "DONE"
          }
        }
      ],
      "filter": [
        {
          "term": {
            "success": true
          }
        }
      ]
    }
  },
  "sort": {
    "createDate": "desc"
  },
  "collapse": {
    "field": "lastName.keyword",
    "inner_hits": {
      "name": "lastChange",
      "size": 1,
      "sort": [
        {
          "createDate": "desc"
        }
      ]
    }
  },
  "aggs": {
    "total": {
      "cardinality": {
        "field": "lastName.keyword"
      }
    }
  }
}

It returns:

    "aggregations": {
        "total": {
            "value": 429896
        }
    }

So the aggregation reports ~430k results, but when paginating we stop getting results around the 426k mark. That is, when I run the query with

{
  "size": 10,
  "from": 427000,
...
}

I get:

{
    "took": 2215,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 10000,
            "relation": "gte"
        },
        "max_score": null,
        "hits": []
    },
    "aggregations": {
        "total": {
            "value": 429896
        }
    }
}

But if I change "from" to 426000, I still get results.

Comments (2)

你穿错了嫁妆 2025-01-19 12:58:23

You are comparing the cardinality aggregation value of your field lastName.keyword to the total number of documents in your index, which are two different things.

You can check the total number of documents in your index using the count API. The from/size you define at the query level only controls which of the documents matching your search query are returned. Because you have not set track_total_hits, hits.total shows 10000 with relation gte, which means that at least 10,000 documents match your search query.

As for your aggregation, it returns 429896 in both cases because the aggregation does not depend on the from/size you specify in the query.
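
As a minimal sketch of the point about track_total_hits, the request body below is the question's query with track_total_hits set to true, so that hits.total should report an exact count (relation eq) instead of stopping at 10,000. The sort, collapse, and aggs sections are left out here only for brevity. As far as the Elasticsearch documentation describes it, hits.total counts matching documents before collapsing, so it is not expected to equal the number of distinct lastName.keyword values either.

```json
{
  "size": 10,
  "from": 0,
  "track_total_hits": true,
  "query": {
    "bool": {
      "must": [
        { "exists": { "field": "firstName" } },
        {
          "query_string": {
            "query": "*",
            "fields": [
              "params.display",
              "params.description",
              "params.name",
              "lastName"
            ]
          }
        },
        { "match": { "status": "DONE" } }
      ],
      "filter": [
        { "term": { "success": true } }
      ]
    }
  }
}
```

The "query" object on its own can also be sent to the index's _count endpoint to get the document count without fetching any hits.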

爱人如己 2025-01-19 12:58:23

I was surprised to find out that the cardinality aggregation has a precision control.

Setting it to the maximum value was the solution for me.
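
A sketch of what this could look like, assuming the setting meant here is the cardinality aggregation's precision_threshold option and that "the maximum value" is 40000, the documented upper limit. The aggregation name and field are taken from the question.

```json
{
  "aggs": {
    "total": {
      "cardinality": {
        "field": "lastName.keyword",
        "precision_threshold": 40000
      }
    }
  }
}
```

The cardinality aggregation is approximate, and counts above the threshold can still deviate somewhat. With roughly 430k distinct values, a small gap between the reported total and the number of collapsed groups you can actually page through may remain even at the maximum setting.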
