How to limit cardinality to some threshold with Elasticsearch

Posted 2025-02-02 17:24:15

I'm trying to count the unique values that match a filter on a given name.
The problem is that when the number of results is very large, computing the exact cardinality takes too long.
I don't actually need the exact cardinality; it would be fine to cap the number of unique values counted at 10000 and stop searching beyond that.

This is the current query:

GET /my_index/_search?size=0
{
  "query": {
    "bool": {
      "must": {
        "simple_query_string": {
          "query": "abc*",
          "fields": ["name"],
          "analyzer": "whitespace"
        }
      }
    }
  },
  "aggs": {
    "unique_values": {
      "cardinality": {
        "field": "name.keyword"
      }
    }
  }
}

Current response:

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 12,
    "successful" : 12,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10000,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "unique_values" : {
      "value" : 98504
    }
  }
}

I would expect the desired query to look something like this:

GET /my_index/_search?size=0
{
  "query": {
    "bool": {
      "must": {
        "simple_query_string": {
          "query": "abc*",
          "fields": ["name"],
          "analyzer": "whitespace"
        }
      }
    }
  },
  "aggs": {
    "unique_values": {
      "cardinality": {
        "field": "name.keyword",
        "limit": 10000
      }
    }
  }
}

The desired response:

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 12,
    "successful" : 12,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10000,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "unique_values" : {
      "value" : 10000
    }
  }
}

Comments (1)

复古式 2025-02-09 17:24:15

It's not possible to cap the cardinality to some threshold. However, you have two ways of speeding up the aggregation computation.

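A. Tune the precision_threshold parameter. The default is 3000 and the maximum supported value is 40000, so try 10000. Keep in mind that this setting trades memory for accuracy: counts below the threshold are expected to be close to exact, so it may help accuracy more than speed.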

B. Use pre-computed hashes: if the hashes are stored at index time, they don't have to be computed at query time on every request.
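
As a rough sketch of how the two options could be combined, assuming the mapper-murmur3 plugin is installed and that `name` is a `text` field as the question's query suggests, the requests might look like the following. The sub-field name `name.hash` is a hypothetical name chosen for illustration; it does not come from the question.

```json
# Assumption: the mapper-murmur3 plugin is installed on every node.
# Adds a hypothetical "hash" sub-field that stores a murmur3 hash of each name.
PUT /my_index/_mapping
{
  "properties": {
    "name": {
      "type": "text",
      "fields": {
        "hash": { "type": "murmur3" }
      }
    }
  }
}

# Same query as in the question, but the cardinality aggregation runs on the
# pre-hashed sub-field, with precision_threshold set to the 10000 of interest.
GET /my_index/_search?size=0
{
  "query": {
    "bool": {
      "must": {
        "simple_query_string": {
          "query": "abc*",
          "fields": ["name"],
          "analyzer": "whitespace"
        }
      }
    }
  },
  "aggs": {
    "unique_values": {
      "cardinality": {
        "field": "name.hash",
        "precision_threshold": 10000
      }
    }
  }
}
```

Documents indexed before the mapping change do not have the new sub-field, so they would need to be reindexed (for example with an update by query) before the aggregation on `name.hash` counts them. The result is still an estimate of the full cardinality, not a value capped at 10000.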
