Elasticsearch aggregation over a large list

Posted on 2025-02-07 12:27:29

I'm trying to count how many documents each ingredient appears in. My index body is similar to this:

index_body = {
   "settings":{
      "index":{
         "number_of_replicas":0,
         "number_of_shards":4,
         "refresh_interval":"-1",
         "knn":"true"
      }
   },
   "mappings":{
      "properties":{
         "recipe_id":{
            "type":"keyword"
         },
         "recipe_title":{
            "type":"text",
            "analyzer":"standard",
            "similarity":"BM25"
         },
         "description":{
             "type":"text",
             "analyzer":"standard",
             "similarity":"BM25"
         },
         "ingredient":{
            "type":"keyword"
         },
         "image":{
            "type":"keyword"
         },

         ....
   }
}

In the ingredient field, I've stored each recipe's ingredients as an array of strings: [ingredient1,ingredient2,....]

I have around 900 documents, each with its own ingredient list.

I've tried using Elasticsearch's aggregations, but they don't return what I expected.
Here is the query I've been using:

{
    "size": 0,
    "aggs": {
        "ingredients": {
            "terms": {"field": "ingredient"}
        }
    }
}

But it returns this:

{'took': 4, 'timed_out': False, '_shards': {'total': 4, 'successful': 4, 'skipped': 0, 'failed': 0}, 'hits': {'total': {'value': 994, 'relation': 'eq'}, 'max_score': None, 'hits': []}, 'aggregations': {'ingredients': {'doc_count_error_upper_bound': 56, 'sum_other_doc_count': 4709, 'buckets': [{'key': 'salt', 'doc_count': 631}, {'key': 'oil', 'doc_count': 320}, {'key': 'sugar', 'doc_count': 314}, {'key': 'egg', 'doc_count': 302}, {'key': 'butter', 'doc_count': 291}, {'key': 'flour', 'doc_count': 264}, {'key': 'garlic', 'doc_count': 220}, {'key': 'ground pepper', 'doc_count': 185}, {'key': 'vanilla extract', 'doc_count': 146}, {'key': 'lemon', 'doc_count': 131}]}}}

This is clearly wrong, as I have many more ingredients than these. What am I doing wrong? Why does it return only these ones? Is there a way to force Elasticsearch to return all counts?


Comments (1)

依 靠 2025-02-14 12:27:29

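You need to specify size inside the aggregation. The size at the top level of the request only controls how many search hits come back; a terms aggregation has its own size, which defaults to 10, so you only get the ten most common ingredients. The sum_other_doc_count of 4709 in your response is the combined document count of the buckets that were left out.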

{
    "size": 0,
    "aggs": {
        "ingredients": {
            "terms": {"field": "ingredient", "size": 10000}
        }
    }
}
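
If it helps, here is a minimal Python sketch of how the request body could be built and how the buckets in the response could be turned into a plain dict of counts. The helper names (build_terms_query, bucket_counts, is_truncated) are made up for illustration and are not part of any Elasticsearch client; the sample response is abridged from the output in the question, so only two of its buckets are shown. Sending the body to the cluster is left to whatever client you already use.

```python
def build_terms_query(field, size=10000):
    """Build a terms aggregation body that asks for up to `size` buckets."""
    return {
        "size": 0,  # return no search hits; only the aggregation is needed
        "aggs": {
            "ingredients": {
                "terms": {"field": field, "size": size}
            }
        },
    }


def bucket_counts(response, agg_name="ingredients"):
    """Map each bucket key to its doc_count from a search response dict."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}


def is_truncated(response, agg_name="ingredients"):
    """Return True when some terms were left out of the returned buckets."""
    return response["aggregations"][agg_name]["sum_other_doc_count"] > 0


# Abridged copy of the response from the question (hypothetical sample data:
# only the first two buckets are kept, the other fields are copied as-is).
sample_response = {
    "aggregations": {
        "ingredients": {
            "doc_count_error_upper_bound": 56,
            "sum_other_doc_count": 4709,
            "buckets": [
                {"key": "salt", "doc_count": 631},
                {"key": "oil", "doc_count": 320},
            ],
        }
    }
}

if __name__ == "__main__":
    print(build_terms_query("ingredient"))
    print(bucket_counts(sample_response))
    print(is_truncated(sample_response))
```

If is_truncated still reports True after raising size, the index has more distinct ingredients than the size you asked for, and the value needs to go up again.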
