大列表的Elasticsearch聚合
我正在尝试计算不同文档中出现多少次成分。我的索引主体与成分字段中的索引相似
index_body = {
"settings":{
"index":{
"number_of_replicas":0,
"number_of_shards":4,
"refresh_interval":"-1",
"knn":"true"
}
},
"mappings":{
"properties":{
"recipe_id":{
"type":"keyword"
},
"recipe_title":{
"type":"text",
"analyzer":"standard",
"similarity":"BM25"
},
"description":{
"type":"text",
"analyzer":"standard",
"similarity":"BM25"
},
"ingredient":{
"type":"keyword"
},
"image":{
"type":"keyword"
},
....
}
}
,我存储了每种成分[ingredient1,ingredient2,....]
我有大约900个文档的字符串
。每个都有自己的成分列表。
我已经尝试使用Elasticsearch的聚合,但似乎并没有返回我的期望。 这是我一直在使用的查询:
{
"size":0,
"aggs":{
"ingredients":{
"terms": {"field":"ingredient"}
}
}
}
但是它返回以下内容:
{'took': 4, 'timed_out': False, '_shards': {'total': 4, 'successful': 4, 'skipped': 0, 'failed': 0}, 'hits': {'total': {'value': 994, 'relation': 'eq'}, 'max_score': None, 'hits': []}, 'aggregations': {'ingredients': {'doc_count_error_upper_bound': 56, 'sum_other_doc_count': 4709, 'buckets': [{'key': 'salt', 'doc_count': 631}, {'key': 'oil', 'doc_count': 320}, {'key': 'sugar', 'doc_count': 314}, {'key': 'egg', 'doc_count': 302}, {'key': 'butter', 'doc_count': 291}, {'key': 'flour', 'doc_count': 264}, {'key': 'garlic', 'doc_count': 220}, {'key': 'ground pepper', 'doc_count': 185}, {'key': 'vanilla extract', 'doc_count': 146}, {'key': 'lemon', 'doc_count': 131}]}}}
这显然是错误的,因为我有很多成分。我在做什么错?为什么只返回这些?有没有办法迫使Elasticsearch返回所有计数?
I'm trying to count how many times ingredients show up in different documents. My index body is similar to this
index_body = {
"settings":{
"index":{
"number_of_replicas":0,
"number_of_shards":4,
"refresh_interval":"-1",
"knn":"true"
}
},
"mappings":{
"properties":{
"recipe_id":{
"type":"keyword"
},
"recipe_title":{
"type":"text",
"analyzer":"standard",
"similarity":"BM25"
},
"description":{
"type":"text",
"analyzer":"standard",
"similarity":"BM25"
},
"ingredient":{
"type":"keyword"
},
"image":{
"type":"keyword"
},
....
}
}
In the ingredient field, I've stored an array of strings of each ingredient [ingredient1,ingredient2,....]
I have around 900 documents. Each with their own ingredients list.
I've tried using Elasticsearch's aggregations but it seems to not return what I expected.
Here is the query I've been using:
{
"size":0,
"aggs":{
"ingredients":{
"terms": {"field":"ingredient"}
}
}
}
But it returns this:
{'took': 4, 'timed_out': False, '_shards': {'total': 4, 'successful': 4, 'skipped': 0, 'failed': 0}, 'hits': {'total': {'value': 994, 'relation': 'eq'}, 'max_score': None, 'hits': []}, 'aggregations': {'ingredients': {'doc_count_error_upper_bound': 56, 'sum_other_doc_count': 4709, 'buckets': [{'key': 'salt', 'doc_count': 631}, {'key': 'oil', 'doc_count': 320}, {'key': 'sugar', 'doc_count': 314}, {'key': 'egg', 'doc_count': 302}, {'key': 'butter', 'doc_count': 291}, {'key': 'flour', 'doc_count': 264}, {'key': 'garlic', 'doc_count': 220}, {'key': 'ground pepper', 'doc_count': 185}, {'key': 'vanilla extract', 'doc_count': 146}, {'key': 'lemon', 'doc_count': 131}]}}}
This is clearly wrong, as I have many ingredients. What am I doing wrong? Why is it returning only these ones? Is there a way to force Elasticsearch to return all counts?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
您需要在聚合中指定大小。
{
“大小”:0,
“ aggs”:{
“原料”:{
“术语”:{“ field”:“成分”,“大小”:10000}
}
}
}
You need to specify size inside the aggregation.
{
"size":0,
"aggs":{
"ingredients":{
"terms": {"field":"ingredient", "size": 10000}
}
}
}