Sorted pagination with composite aggregation

Posted on 2025-02-11 16:30:58

I have Elasticsearch 7.1 documents with the following mapping:

{
  "event" : {
    "mappings" : {
      "properties" : {
        "Code1" : {
          "type" : "keyword"
        },
        "Code2" : {
          "type" : "keyword"
        },
        "Date1" : {
          "type" : "date"
        },
        "Date2" : {
          "type" : "date"
        },
        "Value" : {
          "type" : "long"
        }
      }
    }
  }
}

I want to group the documents into buckets by Code1, Code2, Date1 and Date2, and for each bucket compute:

TotalValue, the sum of the Value field over all documents in the bucket

and

Count, the number of documents in the bucket.

The final output I want looks like this:

{
    {
        "Code1": "ABC",
        "Code2": "XYZ",
        "Date1": "01/01/2022",
        "Date2": "31/01/2022",
        "TotalValue": "100",
        "Count": "3"
    },
    ...
}

I also want paginated output that can be sorted on any of the bucket's output fields, i.e. Code1, Code2, Date1, Date2, TotalValue or Count.

Using a composite aggregation, I came up with the query below. It aggregates as required, returns a paginated response and sorts on Code1, Code2, Date1 and Date2, but I cannot get proper sorted pagination on the TotalValue and Count (doc_count) fields.

GET event/_search
{
  "size":0,
  "aggs": {
      "AggregatedBucket": {
        "composite": {
          "size":"10",
          "sources": [
           {
              "Code1": {
                "terms": {
                  "field": "Code1",
                  "order": "desc"
                }
              }
            },
           {
              "Code2": {
                "terms": {
                  "field": "Code2",
                  "order": "desc"
                }
              }
            },
            {
              "Date1": {
                "terms": {
                  "field": "Date1",
                  "order": "desc"
                }
              }
            },
            {
              "Date2": {
                "terms": {
                  "field": "Date2",
                  "order": "desc"
                }
              }
            }
          ]
        },
        "aggs":{
            "TotalValue":{
              "sum": {
                "field": "Value"
              }
            }
        }
      }
    }
}

Here is the truncated response I am getting:

  "aggregations" : {
    "AggregatedBucket" : {
      "after_key" : {
        "Code1" : "ABC2",
        "Code2" : "XYZ2",
        "Date1" : "02/01/2022",
        "Date2" : "02/02/2022"
      },
      "buckets" : [
        {
          "key" : {
            "Code1" : "ABC1",
            "Code2" : "XYZ1",
            "Date1" : "01/01/2022",
            "Date2" : "01/02/2022"
          },
          "doc_count" : 1,
          "TotalValue" : {
            "value" : 4.0
          }
        },
        {
          "key" : {
            "Code1" : "ABC2",
            "Code2" : "XYZ2",
            "Date1" : "02/01/2022",
            "Date2" : "02/02/2022"
          },
          "doc_count" : 1,
          "TotalValue" : {
            "value" : 3.0
          }
        }
     ]
   }
 }

Any alternative way to return my expected response would also be helpful.

Comments (1)

毅然前行 2025-02-18 16:30:58

Sorry to say this, but you cannot paginate a composite aggregation using an arbitrary sort order. A composite aggregation is already "sorted" by the keys you specify as its sources, and that ordering is what its pagination is built on.
In your case the buckets are ordered source by source, each in the direction given by its "order" setting (ascending by default, "desc" in your query):

  1. First by Code1
  2. If two Code1 values are the same, then by Code2
  3. If two Code2 values are the same, then by Date1
  4. If two Date1 values are the same, then by Date2.
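
The ordering described in the list above behaves like a tuple comparison. The following is only a minimal Python sketch of that idea, not of how Elasticsearch implements it; the third key is made-up sample data, the dates are written as ISO strings purely for illustration, and `composite_order` is a hypothetical helper that assumes every source uses the same direction.

```python
# Composite keys compare source by source, much like tuples:
# (Code1, Code2, Date1, Date2).
keys = [
    ("ABC1", "XYZ1", "2022-01-01", "2022-02-01"),
    ("ABC2", "XYZ2", "2022-01-02", "2022-02-02"),
    ("ABC1", "XYZ2", "2022-01-01", "2022-02-01"),  # made-up extra key
]


def composite_order(keys, descending=True):
    """Order keys the way a composite aggregation would when every
    source shares one direction (hypothetical helper)."""
    return sorted(keys, reverse=descending)


ordered = composite_order(keys)
for key in ordered:
    print(key)
```

Nothing in this ordering looks at TotalValue or doc_count, which is why those values cannot drive the pagination.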

The sub-aggregation that you have created (TotalValue) cannot be used to sort a composite aggregation, and neither can doc_count.

This is and always has been a major drawback of composite aggregations.
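
One possible workaround, if the total number of buckets is small enough to hold in memory, is to walk every composite page with `after_key`, then sort and slice on the client. The sketch below assumes a hypothetical `search` callable that takes a request body and returns the parsed response; `fake_search` is an in-memory stand-in so the example runs without a cluster, and its third bucket is made-up sample data.

```python
import copy


def collect_buckets(search, body, agg_name):
    """Follow after_key until the composite aggregation is exhausted."""
    buckets, after = [], None
    while True:
        request = copy.deepcopy(body)
        if after is not None:
            request["aggs"][agg_name]["composite"]["after"] = after
        agg = search(request)["aggregations"][agg_name]
        page = agg.get("buckets", [])
        if not page:
            break
        buckets.extend(page)
        after = agg.get("after_key")
        if after is None:
            break
    return buckets


def flatten(bucket):
    """Turn one bucket into the flat row shape asked for in the question."""
    row = dict(bucket["key"])
    row["TotalValue"] = bucket["TotalValue"]["value"]
    row["Count"] = bucket["doc_count"]
    return row


def sorted_page(rows, sort_field, descending, page, page_size):
    """Sort all rows client-side and return one zero-based page."""
    ordered = sorted(rows, key=lambda r: r[sort_field], reverse=descending)
    start = page * page_size
    return ordered[start:start + page_size]


def _bucket(c1, c2, d1, d2, count, total):
    key = {"Code1": c1, "Code2": c2, "Date1": d1, "Date2": d2}
    return {"key": key, "doc_count": count, "TotalValue": {"value": total}}


# In-memory stand-in for the cluster; the last bucket is invented.
ALL = [
    _bucket("ABC1", "XYZ1", "01/01/2022", "01/02/2022", 1, 4.0),
    _bucket("ABC2", "XYZ2", "02/01/2022", "02/02/2022", 1, 3.0),
    _bucket("ABC3", "XYZ3", "03/01/2022", "03/02/2022", 3, 100.0),
]


def fake_search(request):
    comp = request["aggs"]["AggregatedBucket"]["composite"]
    after = comp.get("after")
    start = 0 if after is None else [b["key"] for b in ALL].index(after) + 1
    page = ALL[start:start + int(comp["size"])]
    agg = {"buckets": page}
    if page:
        agg["after_key"] = page[-1]["key"]
    return {"aggregations": {"AggregatedBucket": agg}}


body = {
    "size": 0,
    "aggs": {
        "AggregatedBucket": {
            "composite": {
                "size": 2,
                "sources": [
                    {f: {"terms": {"field": f}}}
                    for f in ("Code1", "Code2", "Date1", "Date2")
                ],
            },
            "aggs": {"TotalValue": {"sum": {"field": "Value"}}},
        }
    },
}

rows = [flatten(b) for b in collect_buckets(fake_search, body, "AggregatedBucket")]
top = sorted_page(rows, "TotalValue", True, 0, 2)
```

With a real client, `search` would wrap whatever call sends the body to the `event` index. The cost is that every page request for a sorted view needs the full set of buckets, so this only suits moderate cardinalities or results that can be cached.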

If you want to make this less complicated, a simpler way would be to build a concatenated field out of the four fields:
"Code1-Code2-Date1-Date2". Then index that field into every document. Perform a terms aggregation on the concatenated field: by default it returns buckets in descending order of doc_count (your "Count"), and it can also be ordered by a single-value sub-aggregation such as the sum (your "TotalValue"). This still does not allow you to paginate, but you can set the size of the returned aggregation response to something large enough to meet your requirement.
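
A rough Python sketch of the concatenated-field idea follows. The field name `GroupKey` and the `|` separator are assumptions made for illustration (a dash would clash with dashes inside date strings), and `GroupKey` would need to be mapped as a keyword and populated at index time. The `order` and `size` options of the terms aggregation are used here as documented for Elasticsearch, but treat the request as a starting point and not a tested query.

```python
SEP = "|"  # assumed separator; must not occur inside any field value
FIELDS = ("Code1", "Code2", "Date1", "Date2")


def group_key(doc):
    """Build the concatenated value to store in the hypothetical GroupKey field."""
    return SEP.join(str(doc[f]) for f in FIELDS)


def split_key(key):
    """Recover the four original values from a terms bucket key."""
    return dict(zip(FIELDS, key.split(SEP)))


def terms_request(sort_by="TotalValue", descending=True, size=1000):
    """Request body for a terms aggregation ordered by Count or TotalValue."""
    order_key = "_count" if sort_by == "Count" else sort_by
    return {
        "size": 0,
        "aggs": {
            "AggregatedBucket": {
                "terms": {
                    "field": "GroupKey",
                    "size": size,
                    "order": {order_key: "desc" if descending else "asc"},
                },
                "aggs": {"TotalValue": {"sum": {"field": "Value"}}},
            }
        },
    }


doc = {"Code1": "ABC", "Code2": "XYZ", "Date1": "2022-01-01",
       "Date2": "2022-01-31", "Value": 40}
key = group_key(doc)
request = terms_request(sort_by="Count")
```

Choosing `size` is the trade-off here: it has to cover every bucket you may want to show, because the terms aggregation returns only the top `size` buckets in the chosen order.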

Aggregations have very poor support for pagination. They are actually intended to take ALL the data in the index and produce a response. The concept of pagination is not designed around aggregations.

HTH.
