Elasticsearch aggregation shows an incorrect total

Posted on 2025-01-12 12:58:23

Elasticsearch version is 7.4.2

I suck at Elasticsearch and I'm trying to figure out what's wrong with this query.

{
  "size": 10,
  "from": 0,
  "query": {
    "bool": {
      "must": [
        {
          "exists": {
            "field": "firstName"
          }
        },
        {
          "query_string": {
            "query": "*",
            "fields": [
              "params.display",
              "params.description",
              "params.name",
              "lastName"
            ]
          }
        },
        {
          "match": {
            "status": "DONE"
          }
        }
      ],
      "filter": [
        {
          "term": {
            "success": true
          }
        }
      ]
    }
  },
  "sort": {
    "createDate": "desc"
  },
  "collapse": {
    "field": "lastName.keyword",
    "inner_hits": {
      "name": "lastChange",
      "size": 1,
      "sort": [
        {
          "createDate": "desc"
        }
      ]
    }
  },
  "aggs": {
    "total": {
      "cardinality": {
        "field": "lastName.keyword"
      }
    }
  }
}

It returns:

    "aggregations": {
        "total": {
            "value": 429896
        }
    }

So the aggregation reports ~430k results, but pagination stops returning hits around the 426k mark. That is, when I run the query with

{
  "size": 10,
  "from": 427000,
...
}

I get:

{
    "took": 2215,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 10000,
            "relation": "gte"
        },
        "max_score": null,
        "hits": []
    },
    "aggregations": {
        "total": {
            "value": 429896
        }
    }
}

But if I change "from" to 426000, I still get results.
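
One possible explanation, offered as a sketch rather than a confirmed diagnosis: the cardinality aggregation is approximate (it is based on HyperLogLog++), and its precision_threshold setting is capped at 40000, so a count near 430k cannot be made exact that way. A gap of roughly 1% between 429896 and the ~426k collapsed groups actually reachable by paging would be consistent with that approximation. To check the exact number of distinct lastName.keyword values, one option is to page through a composite aggregation and count the buckets. The aggregation name "distinct_last_names" and the page size of 1000 below are illustrative assumptions; the query is copied from the request above.

```json
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        { "exists": { "field": "firstName" } },
        {
          "query_string": {
            "query": "*",
            "fields": [
              "params.display",
              "params.description",
              "params.name",
              "lastName"
            ]
          }
        },
        { "match": { "status": "DONE" } }
      ],
      "filter": [
        { "term": { "success": true } }
      ]
    }
  },
  "aggs": {
    "distinct_last_names": {
      "composite": {
        "size": 1000,
        "sources": [
          { "lastName": { "terms": { "field": "lastName.keyword" } } }
        ]
      }
    }
  }
}
```

Each response includes an "after_key" object. Passing it back as "after" inside the composite block fetches the next page, and summing the bucket counts over all pages gives an exact distinct count to compare against both 429896 and the point where pagination runs dry.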
