Elasticsearch aggregation on a large list

Posted on 2025-02-07 12:27:29

I'm trying to count how many documents each ingredient shows up in. My index body is similar to this:

index_body = {
   "settings":{
      "index":{
         "number_of_replicas":0,
         "number_of_shards":4,
         "refresh_interval":"-1",
         "knn":"true"
      }
   },
   "mappings":{
      "properties":{
         "recipe_id":{
            "type":"keyword"
         },
         "recipe_title":{
            "type":"text",
            "analyzer":"standard",
            "similarity":"BM25"
         },
         "description":{
             "type":"text",
             "analyzer":"standard",
             "similarity":"BM25"
         },
         "ingredient":{
            "type":"keyword"
         },
         "image":{
            "type":"keyword"
         },

         # .... (remaining fields omitted)
      }
   }
}

In the ingredient field, I've stored an array of strings, one per ingredient: [ingredient1, ingredient2, ....]

I have around 900 documents, each with its own ingredient list.
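To make the expected result concrete, here is a minimal pure-Python sketch of the per-ingredient document count described above. The sample recipes and the helper name `ingredient_doc_counts` are made up for illustration; only the field names `recipe_id` and `ingredient` come from the mapping.

```python
from collections import Counter

# Hypothetical sample documents; the real index holds around 900 recipes.
recipes = [
    {"recipe_id": "1", "ingredient": ["salt", "oil", "egg"]},
    {"recipe_id": "2", "ingredient": ["salt", "sugar", "flour", "egg"]},
    {"recipe_id": "3", "ingredient": ["salt", "salt", "butter"]},
]


def ingredient_doc_counts(docs):
    """Count, for each ingredient, how many documents contain it."""
    counts = Counter()
    for doc in docs:
        # set() so that a value repeated inside one document counts once,
        # matching a per-document count rather than a per-occurrence count.
        counts.update(set(doc.get("ingredient", [])))
    return counts


counts = ingredient_doc_counts(recipes)
for name, doc_count in counts.most_common():
    print(name, doc_count)
```

This is the shape of output I would expect from the aggregation: one entry per distinct ingredient, not just the most frequent few.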

I've tried using Elasticsearch's aggregations, but they don't seem to return what I expected. Here is the query I've been using:

{
    "size": 0,
    "aggs": {
        "ingredients": {
            "terms": {"field": "ingredient"}
        }
    }
}

But it returns this:

{'took': 4, 'timed_out': False,
 '_shards': {'total': 4, 'successful': 4, 'skipped': 0, 'failed': 0},
 'hits': {'total': {'value': 994, 'relation': 'eq'}, 'max_score': None, 'hits': []},
 'aggregations': {'ingredients': {
     'doc_count_error_upper_bound': 56,
     'sum_other_doc_count': 4709,
     'buckets': [
         {'key': 'salt', 'doc_count': 631},
         {'key': 'oil', 'doc_count': 320},
         {'key': 'sugar', 'doc_count': 314},
         {'key': 'egg', 'doc_count': 302},
         {'key': 'butter', 'doc_count': 291},
         {'key': 'flour', 'doc_count': 264},
         {'key': 'garlic', 'doc_count': 220},
         {'key': 'ground pepper', 'doc_count': 185},
         {'key': 'vanilla extract', 'doc_count': 146},
         {'key': 'lemon', 'doc_count': 131}]}}}

This is clearly wrong, as I have many more ingredients than these ten. What am I doing wrong? Why is it returning only these? Is there a way to force Elasticsearch to return all counts?
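One thing worth checking, as a hedged sketch rather than a confirmed diagnosis: the `terms` aggregation returns only the top 10 buckets unless a `size` is given, and `sum_other_doc_count` in the response reports the document counts that fell into buckets left out. The helper names `terms_query` and `summarize` below are hypothetical, and the value 1000 is an arbitrary assumption that would need to be at least the number of distinct ingredients.

```python
def terms_query(field, size=10):
    """Build a terms-aggregation body; `size` caps the number of buckets returned."""
    return {
        "size": 0,  # return no hits, only the aggregation
        "aggs": {"ingredients": {"terms": {"field": field, "size": size}}},
    }


def summarize(response, agg_name="ingredients"):
    """Report how many buckets came back and how much was left out."""
    agg = response["aggregations"][agg_name]
    return {
        "buckets_returned": len(agg["buckets"]),
        "doc_count_in_omitted_buckets": agg["sum_other_doc_count"],
    }


# Aggregation part of the response shown above, copied from the question.
response = {
    "aggregations": {
        "ingredients": {
            "doc_count_error_upper_bound": 56,
            "sum_other_doc_count": 4709,
            "buckets": [
                {"key": "salt", "doc_count": 631},
                {"key": "oil", "doc_count": 320},
                {"key": "sugar", "doc_count": 314},
                {"key": "egg", "doc_count": 302},
                {"key": "butter", "doc_count": 291},
                {"key": "flour", "doc_count": 264},
                {"key": "garlic", "doc_count": 220},
                {"key": "ground pepper", "doc_count": 185},
                {"key": "vanilla extract", "doc_count": 146},
                {"key": "lemon", "doc_count": 131},
            ],
        }
    }
}

summary = summarize(response)
print(summary)

# Assumed upper bound on distinct ingredients; adjust to the real data.
bigger_query = terms_query("ingredient", size=1000)
print(bigger_query)
```

A non-zero `sum_other_doc_count` suggests the counts are not wrong, just truncated to the default bucket limit. If the number of distinct ingredients is very large, a composite aggregation, which pages through buckets, may be a better fit than one large `size`.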
