Aggregating into custom buckets based on field values

Posted on 2025-01-11 05:20:51

I'm interested in aggregating my data into buckets, but I want to put two distinct values into the same bucket.

This is what I mean:

Say I have this query:

GET _search
{
  "size": 0, 
  "aggs": {
    "my-agg-name": {
      "terms": {
        "field": "ecs.version"
      }
    }
  }
}

It returns this response:

"aggregations" : {
    "my-agg-name" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "1.12.0",
          "doc_count" : 642826144
        },
        {
          "key" : "8.0.0",
          "doc_count" : 204064845
        },
        {
          "key" : "1.1.0",
          "doc_count" : 16508253
        },
        {
          "key" : "1.0.0",
          "doc_count" : 9162928
        },
        {
          "key" : "1.6.0",
          "doc_count" : 1111542
        },
        {
          "key" : "1.5.0",
          "doc_count" : 10445
        }
      ]
    }
  }

Every distinct value of the field ecs.version is in its own bucket.

But say I wanted to define my buckets such that:
bucket1: [1.12.0, 8.0.0]
bucket2: [1.6.0, 8.4.0]
bucket3: [1.0.0, 8.8.0]

Is this possible in any way?

I know I can just return all the buckets and sum them programmatically, but this list can be very long, so I don't think that would be efficient. Am I wrong?
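The client-side approach mentioned above can be sketched as follows: invert the group definitions into a version-to-group lookup table and add up `doc_count` per group. This is a minimal illustration that assumes the response has already been parsed into Python dicts; `regroup` and `GROUPS` are hypothetical names. Keep in mind that a `terms` aggregation returns only 10 buckets by default, so with a long list of versions its `size` would have to be raised to get every key back, which is part of what makes this approach costly.

```python
from collections import defaultdict

# Group definitions from the question (hypothetical names bucket1..bucket3).
GROUPS = {
    "bucket1": ["1.12.0", "8.0.0"],
    "bucket2": ["1.6.0", "8.4.0"],
    "bucket3": ["1.0.0", "8.8.0"],
}


def regroup(buckets, groups):
    """Sum terms-aggregation doc counts into user-defined groups.

    Keys that belong to no group are kept under their own name.
    """
    # Invert {group: [versions]} into {version: group} for constant-time lookups.
    lookup = {v: name for name, versions in groups.items() for v in versions}
    totals = defaultdict(int)
    for bucket in buckets:
        key = bucket["key"]
        totals[lookup.get(key, key)] += bucket["doc_count"]
    return dict(totals)


# Buckets copied from the response shown in the question.
buckets = [
    {"key": "1.12.0", "doc_count": 642826144},
    {"key": "8.0.0", "doc_count": 204064845},
    {"key": "1.1.0", "doc_count": 16508253},
    {"key": "1.0.0", "doc_count": 9162928},
    {"key": "1.6.0", "doc_count": 1111542},
    {"key": "1.5.0", "doc_count": 10445},
]

totals = regroup(buckets, GROUPS)
print(totals["bucket1"])  # 642826144 + 204064845 = 846890989
```

Groups whose versions never appear in the response (such as 8.4.0 and 8.8.0 here) simply contribute nothing to their totals.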

ペ泪落弦音 2025-01-18 05:20:51

You can use a runtime mapping to generate a runtime field and then aggregate on that field. I tested the example below on ES 7.16.

I indexed some sample documents; below is the aggregation output without grouping multiple values together:

"aggregations" : {
    "version" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "1.12.0",
          "doc_count" : 3
        },
        {
          "key" : "1.6.0",
          "doc_count" : 3
        },
        {
          "key" : "8.4.0",
          "doc_count" : 3
        },
        {
          "key" : "8.0.0",
          "doc_count" : 2
        }
      ]
    }
  }

You can use the query below with a runtime mapping, but you need to add an if condition for each of your version mappings:

{
  "size": 0,
  "runtime_mappings": {
    "normalized_version": {
      "type": "keyword",
      "script": """
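        // Skip documents without a version value; .value would throw on them.
        if (doc['version.keyword'].size() == 0) {
          return;
        }
        String version = doc['version.keyword'].value;
        if (version.equals('1.12.0') || version.equals('8.0.0')) {
          emit('1.12.0, 8.0.0');
        } else if (version.equals('1.6.0') || version.equals('8.4.0')) {
          emit('1.6.0, 8.4.0');
        } else {
          emit(version);
        }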
      """
    }
  },
  "aggs": {
    "genres": {
      "terms": {
        "field": "normalized_version"
      }
    }
  }
}

Below is the output of the aggregation query above:

"aggregations" : {
    "genres" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "1.6.0, 8.4.0",
          "doc_count" : 6
        },
        {
          "key" : "1.12.0, 8.0.0",
          "doc_count" : 5
        }
      ]
    }
  }
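Each new group in the runtime-mapping query needs another `if` branch, which gets unwieldy for a long list. A possible variation, offered as an unverified sketch rather than a tested solution: generate the request body in Python and pass the version-to-group table to the Painless script through script `params`, so adding a group only changes data. The function `build_grouped_terms_query`, the aggregation name `version_groups`, and the assumption that `ecs.version` (the field from the question) is a keyword field are illustrative and not part of the answer; `runtime_mappings`, `emit`, and script `params` are standard Elasticsearch scripting features.

```python
import json


def build_grouped_terms_query(field, groups, agg_name="version_groups"):
    """Build a _search body that aggregates `field` into user-defined groups.

    `groups` maps a group label to the list of values that belong to it.
    Values that belong to no group keep a bucket of their own.
    """
    # Invert {label: [values]} into {value: label} so the script does one lookup.
    lookup = {value: label for label, values in groups.items() for value in values}
    # Painless source; the field name and lookup table arrive through params.
    source = """
        if (doc[params['field']].size() == 0) {
          return;  // skip documents without a value instead of failing on .value
        }
        String value = doc[params['field']].value;
        String group = params['groups'].get(value);
        emit(group == null ? value : group);
    """
    return {
        "size": 0,
        "runtime_mappings": {
            "normalized_version": {
                "type": "keyword",
                "script": {"source": source, "params": {"field": field, "groups": lookup}},
            }
        },
        "aggs": {agg_name: {"terms": {"field": "normalized_version"}}},
    }


# Groups from the question; "ecs.version" is assumed to be a keyword field.
body = build_grouped_terms_query(
    "ecs.version",
    {
        "bucket1": ["1.12.0", "8.0.0"],
        "bucket2": ["1.6.0", "8.4.0"],
        "bucket3": ["1.0.0", "8.8.0"],
    },
)
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a `_search` request. The group labels become the bucket keys, and unlisted versions fall through to their own bucket, just like the `else` branch in the query above.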
