How to cap the cardinality at some threshold in Elasticsearch
I'm trying to count the unique values of a field while filtering on a given name. The problem appears when the number of matching results is very large, because finding the exact cardinality then takes much longer.
I don't actually need the exact cardinality. It would also be fine to cap the number of unique values we search for at 10000 and stop searching once that limit is reached.
This is the current query:
GET /my_index/_search?size=0
{
  "query": {
    "bool": {
      "must": {
        "simple_query_string": {
          "query": "abc*",
          "fields": ["name"],
          "analyzer": "whitespace"
        }
      }
    }
  },
  "aggs": {
    "unique_values": {
      "cardinality": {
        "field": "name.keyword"
      }
    }
  }
}
Current response:
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 12,
    "successful" : 12,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10000,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "unique_values" : {
      "value" : 98504
    }
  }
}
I would expect the desired query to look something like this:
GET /my_index/_search?size=0
{
  "query": {
    "bool": {
      "must": {
        "simple_query_string": {
          "query": "abc*",
          "fields": ["name"],
          "analyzer": "whitespace"
        }
      }
    }
  },
  "aggs": {
    "unique_values": {
      "cardinality": {
        "field": "name.keyword",
        "limit": 10000
      }
    }
  }
}
The desired response:
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 12,
    "successful" : 12,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10000,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "unique_values" : {
      "value" : 10000
    }
  }
}
Comments (1)
It's not possible to cap the cardinality at some threshold. However, you have two ways of speeding up the aggregation computation.
A. Tune the precision_threshold parameter, which trades memory for accuracy. The default is 3000 and the maximum supported value is 40000; try 10000.
B. Use pre-computed hashes. If the hashes are stored in the index (for example with the mapper-murmur3 plugin), they don't need to be computed at query time on every request.
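The two options can be sketched as request bodies. The following is a minimal illustration in Python (standard library only) that just builds and prints the JSON; it does not talk to a cluster. The helper names (`build_query`, `cardinality_body`, `murmur3_mapping`) and the `name.hash` sub-field are made up for this example, and the mapping assumes `name` is a text field with a `keyword` sub-field, which the question does not actually show. Option B also assumes the mapper-murmur3 plugin is installed.

```python
import json

# Documented upper bound; larger values behave like 40000.
MAX_PRECISION_THRESHOLD = 40000


def build_query(pattern="abc*"):
    """Return the simple_query_string filter used in the question."""
    return {
        "bool": {
            "must": {
                "simple_query_string": {
                    "query": pattern,
                    "fields": ["name"],
                    "analyzer": "whitespace",
                }
            }
        }
    }


def cardinality_body(field, precision_threshold=None, pattern="abc*"):
    """Build a search body with a cardinality aggregation on `field`."""
    agg = {"field": field}
    if precision_threshold is not None:
        if precision_threshold < 0:
            # Local sanity check in this sketch, not an Elasticsearch call.
            raise ValueError("precision_threshold must be >= 0")
        agg["precision_threshold"] = min(precision_threshold, MAX_PRECISION_THRESHOLD)
    return {
        "query": build_query(pattern),
        "aggs": {"unique_values": {"cardinality": agg}},
    }


def murmur3_mapping():
    """Mapping fragment for option B (assumed field layout, hypothetical 'hash' sub-field)."""
    return {
        "mappings": {
            "properties": {
                "name": {
                    "type": "text",
                    "fields": {
                        "keyword": {"type": "keyword"},
                        "hash": {"type": "murmur3"},
                    },
                }
            }
        }
    }


if __name__ == "__main__":
    # Option A: same aggregation as in the question, with an explicit precision_threshold.
    print(json.dumps(cardinality_body("name.keyword", precision_threshold=10000), indent=2))
    # Option B: index the hash once, then aggregate on the hashed sub-field.
    print(json.dumps(murmur3_mapping(), indent=2))
    print(json.dumps(cardinality_body("name.hash"), indent=2))
```

The printed bodies can be pasted after `GET /my_index/_search?size=0` (for the searches) or used when creating the index (for the mapping). Documents indexed before the hash sub-field was added would need to be reindexed before option B returns meaningful counts.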