Aggregating into custom buckets based on field value

Posted on 2025-01-11 05:20:51

I'm interested in aggregating my data into buckets, but I want to put two distinct values to the same bucket.

This is what I mean:

Say I have this query:

GET _search
{
  "size": 0, 
  "aggs": {
    "my-agg-name": {
      "terms": {
        "field": "ecs.version"
      }
    }
  }
}

It returns this response:

"aggregations" : {
    "my-agg-name" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "1.12.0",
          "doc_count" : 642826144
        },
        {
          "key" : "8.0.0",
          "doc_count" : 204064845
        },
        {
          "key" : "1.1.0",
          "doc_count" : 16508253
        },
        {
          "key" : "1.0.0",
          "doc_count" : 9162928
        },
        {
          "key" : "1.6.0",
          "doc_count" : 1111542
        },
        {
          "key" : "1.5.0",
          "doc_count" : 10445
        }
      ]
    }
  }

Every distinct value of the field ecs.version is in its own bucket.

But say I wanted to define my buckets such that:
bucket1: [1.12.0, 8.0.0]
bucket2: [1.6.0, 8.4.0]
bucket3: [1.0.0, 8.8.0]

Is this possible in any way?

I know I can just return all the buckets and sum them programmatically, but this list can be very long, so I don't think that would be efficient. Am I wrong?
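For comparison, when the groups are known in advance, the buckets listed above can also be counted server-side without any script by using a filters aggregation with one terms query per bucket. This is a minimal sketch, assuming ecs.version is a keyword field (it works in the terms aggregation above); the aggregation name version_buckets and the bucket name other are illustrative choices, not anything from the original post.

```json
GET _search
{
  "size": 0,
  "aggs": {
    "version_buckets": {
      "filters": {
        "other_bucket_key": "other",
        "filters": {
          "bucket1": { "terms": { "ecs.version": ["1.12.0", "8.0.0"] } },
          "bucket2": { "terms": { "ecs.version": ["1.6.0", "8.4.0"] } },
          "bucket3": { "terms": { "ecs.version": ["1.0.0", "8.8.0"] } }
        }
      }
    }
  }
}
```

Each named filter becomes one bucket in the response, so only these counts come back no matter how many distinct versions exist. other_bucket_key collects documents that match none of the filters. Versions absent from the index (8.4.0 and 8.8.0 do not appear in the response above) simply contribute nothing to their bucket.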

Comments (1)

ペ泪落弦音 2025-01-18 05:20:51

You can use runtime mappings to generate a runtime field and then aggregate on that field. I ran the example below on ES 7.16.

I indexed some sample documents. Below is the aggregation output without combining multiple values:

"aggregations" : {
    "version" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "1.12.0",
          "doc_count" : 3
        },
        {
          "key" : "1.6.0",
          "doc_count" : 3
        },
        {
          "key" : "8.4.0",
          "doc_count" : 3
        },
        {
          "key" : "8.0.0",
          "doc_count" : 2
        }
      ]
    }
  }

You can use the query below with a runtime mapping, but you need to add an if condition for each group of versions you want to combine:

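{
  "size": 0,
  "runtime_mappings": {
    "normalized_version": {
      "type": "keyword",
      "script": """
        // Skip documents without a version; calling .value on an empty field throws
        if (doc['version.keyword'].size() == 0) {
          return;
        }
        String version = doc['version.keyword'].value;
        // Emit one shared key per group of versions
        if (version.equals('1.12.0') || version.equals('8.0.0')) {
          emit('1.12.0, 8.0.0');
        } else if (version.equals('1.6.0') || version.equals('8.4.0')) {
          emit('1.6.0, 8.4.0');
        } else {
          // Versions outside any group keep their own bucket
          emit(version);
        }
      """
    }
  },
  "aggs": {
    "genres": {
      "terms": {
        "field": "normalized_version"
      }
    }
  }
}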

Below is the output of the aggregation query above:

"aggregations" : {
    "genres" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "1.6.0, 8.4.0",
          "doc_count" : 6
        },
        {
          "key" : "1.12.0, 8.0.0",
          "doc_count" : 5
        }
      ]
    }
  }
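The chained if conditions above grow with every version group. One way to keep the script fixed is to pass the version-to-bucket mapping as script params and look each value up. This is a sketch under several assumptions: the field is ecs.version from the question (a keyword field, so no .keyword suffix); the bucket names bucket1 to bucket3 come from the question; and the runtime field name version_group, the aggregation name version_groups, and the terms size of 20 are illustrative.

```json
GET _search
{
  "size": 0,
  "runtime_mappings": {
    "version_group": {
      "type": "keyword",
      "script": {
        "source": """
          // Documents without ecs.version get no value for this runtime field
          if (doc['ecs.version'].size() == 0) {
            return;
          }
          String version = doc['ecs.version'].value;
          // Look up the bucket name; unmapped versions keep their own value
          String group = params.groups.getOrDefault(version, version);
          emit(group);
        """,
        "params": {
          "groups": {
            "1.12.0": "bucket1",
            "8.0.0": "bucket1",
            "1.6.0": "bucket2",
            "8.4.0": "bucket2",
            "1.0.0": "bucket3",
            "8.8.0": "bucket3"
          }
        }
      }
    }
  },
  "aggs": {
    "version_groups": {
      "terms": {
        "field": "version_group",
        "size": 20
      }
    }
  }
}
```

With this layout, adding a group only means editing params, not the script. The terms aggregation returns the top 10 buckets by default, so raise size if you expect more distinct groups. Like any runtime field, the value is computed at query time for every matching document.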