How to limit cardinality to a threshold in Elasticsearch

Posted on 2025-02-02 17:24:15

I'm trying to find the number of unique values when filtering on a given name.
The problem arises when the number of results is very large, which increases the time it takes to compute the exact cardinality.
I don't actually need the exact cardinality. It would be fine to cap the number of unique values we count at 10000 and stop searching beyond that.

This is the current query:

GET /my_index/_search?size=0
{
  "query": {
    "bool": {
      "must": {
        "simple_query_string": {
          "query": "abc*",
          "fields": ["name"],
          "analyzer": "whitespace"
        }
      }
    }
  },
  "aggs": {
    "unique_values": {
      "cardinality": {
        "field": "name.keyword"
      }
    }
  }
}

Current response:

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 12,
    "successful" : 12,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10000,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "unique_values" : {
      "value" : 98504
    }
  }
}

I would expect the desired query to look something like this:

GET /my_index/_search?size=0
{
  "query": {
    "bool": {
      "must": {
        "simple_query_string": {
          "query": "abc*",
          "fields": ["name"],
          "analyzer": "whitespace"
        }
      }
    }
  },
  "aggs": {
    "unique_values": {
      "cardinality": {
        "field": "name.keyword",
        "limit": 10000
      }
    }
  }
}

The desired response:

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 12,
    "successful" : 12,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10000,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "unique_values" : {
      "value" : 10000
    }
  }
}
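
For comparison, here is a sketch of the closest thing I can express with options that do exist, as far as I know. It is not the same as the hypothetical `limit` above. `terminate_after` is a search request parameter that caps the number of documents collected per shard (not the number of unique values), and `precision_threshold` is a `cardinality` option that controls the count below which the estimate is expected to be close to exact. Using 10000 for both is my own assumption, not a recommended setting.

```
GET /my_index/_search?size=0&terminate_after=10000
{
  "query": {
    "bool": {
      "must": {
        "simple_query_string": {
          "query": "abc*",
          "fields": ["name"],
          "analyzer": "whitespace"
        }
      }
    }
  },
  "aggs": {
    "unique_values": {
      "cardinality": {
        "field": "name.keyword",
        "precision_threshold": 10000
      }
    }
  }
}
```

Because `terminate_after` applies per shard, with 12 shards up to 120000 documents could still be collected, so the returned value can exceed 10000 and would have to be clamped on the client side. It also counts whichever documents happen to be collected first, so the result is only a lower bound on the true cardinality.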
