ElasticSearch 在脚本化嵌套字段上聚合

发布于 2025-01-11 22:38:39 字数 1715 浏览 6 评论 0原文

我的 ElasticSearch 索引中有以下映射(由于其他字段不相关而进行了简化:

{
  "test": {
    "mappings": {
      "properties": {
        "name": {
          "type": "keyword"
        },
        "entities": {
          "type": "nested",
          "properties": {
            "text_property": {
              "type": "text"
            },
            "float_property": {
              "type": "float"
            }
          }
        }
      }
    }
  }
}

数据看起来像这样(再次简化):

[
  {
    "name": "a",
    "entities": [
      {
        "text_property": "foo",
        "float_property": 0.2
      },
      {
        "text_property": "bar",
        "float_property": 0.4
      },
      {
        "text_property": "baz",
        "float_property": 0.6
      }
    ]
  },
  {
    "name": "b",
    "entities": [
      {
        "text_property": "foo",
        "float_property": 0.9
      }
    ]
  },
  {
    "name": "c",
    "entities": [
      {
        "text_property": "foo",
        "float_property": 0.2
      },
      {
        "text_property": "bar",
        "float_property": 0.9
      }
    ]
  }
]

我正在尝试对 float_property 的最大值执行存储桶聚合因此,对于上面的示例,以下是所需的响应:

...
{
  "buckets": [
    {
      "key": "0.9",
      "doc_count": 2
    },
    {
      "key": "0.6",
      "doc_count": 1
    }
  ]
}

由于 doc afloat_property 的最高嵌套值为 0.6, b 的值为 0.9,c 的值为 0.9

我尝试过使用 nestedaggs 的混合 。 >,以及runtime_mappings,但我不确定以什么顺序使用它们,或者这是否可能。

I have the following mapping in my ElasticSearch index (simplified as the other fields are irrelevant:

{
  "test": {
    "mappings": {
      "properties": {
        "name": {
          "type": "keyword"
        },
        "entities": {
          "type": "nested",
          "properties": {
            "text_property": {
              "type": "text"
            },
            "float_property": {
              "type": "float"
            }
          }
        }
      }
    }
  }
}

The data looks like this (again simplified):

[
  {
    "name": "a",
    "entities": [
      {
        "text_property": "foo",
        "float_property": 0.2
      },
      {
        "text_property": "bar",
        "float_property": 0.4
      },
      {
        "text_property": "baz",
        "float_property": 0.6
      }
    ]
  },
  {
    "name": "b",
    "entities": [
      {
        "text_property": "foo",
        "float_property": 0.9
      }
    ]
  },
  {
    "name": "c",
    "entities": [
      {
        "text_property": "foo",
        "float_property": 0.2
      },
      {
        "text_property": "bar",
        "float_property": 0.9
      }
    ]
  }
]

I'm trying perform a bucket aggregation on the maximum value of float_property for each document. So for the example above, the following would be the desired response:

...
{
  "buckets": [
    {
      "key": "0.9",
      "doc_count": 2
    },
    {
      "key": "0.6",
      "doc_count": 1
    }
  ]
}

as doc a's highest nested value for float_property is 0.6, b's is 0.9 and c's is 0.9.

I've tried using a mixture of nested and aggs, along with runtime_mappings, but I'm not sure in which order to use these, or if this is even possible.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

荒人说梦 2025-01-18 22:38:40

我最终设法弄清楚了这一点。

我没有意识到的两件事是:

  1. 您可以为存储桶聚合提供一个 script 而不是 field 键。
  2. 您可以使用 params._source 直接访问嵌套值,而不是使用嵌套查询。

这两件事的结合使我能够编写正确的查询:

{
  "size": 0,
  "aggs": {
    "max.float_property": {
      "terms": {
        "script": "double max = 0; for (item in params._source.entities) { if (item.float_property > max) { max = item.float_property; }} return max;"
      }
    }
  }
}

响应:

{
  ...
  "aggregations": {
    "max.float_property": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "0.9",
          "doc_count": 2
        },
        {
          "key": "0.6",
          "doc_count": 1
        }
      ]
    }
  }
}

不过我很困惑,因为我认为访问 nested 字段的正确方法是使用 nested 查询类型。不幸的是,这方面的文档很少,所以我仍然不确定这是否是聚合脚本嵌套字段的预期/正确方法。

I've managed to figure this out in the end.

The two things I hadn't realised were:

  1. You can provide a script instead of a field key to bucket aggregations.
  2. Instead of using nested queries, you can access nested values directly using params._source.

The combination of these two things allowed me to write the correct query:

{
  "size": 0,
  "aggs": {
    "max.float_property": {
      "terms": {
        "script": "double max = 0; for (item in params._source.entities) { if (item.float_property > max) { max = item.float_property; }} return max;"
      }
    }
  }
}

Response:

{
  ...
  "aggregations": {
    "max.float_property": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "0.9",
          "doc_count": 2
        },
        {
          "key": "0.6",
          "doc_count": 1
        }
      ]
    }
  }
}

I'm confused though, because I thought the correct way to access nested fields was by using the nested query type. Unfortunately there's very little documentation for this, so I'm still unsure if this is the intended/correct way to aggregate on scripted nested fields.

夜未央樱花落 2025-01-18 22:38:40

我使用您的映射将 float_property 类型更改为 double 创建了索引。

PUT testindex
{
 "mappings": {
      "properties": {
        "name": {
          "type": "keyword"
        },
        "entities": {
          "type": "nested",
          "include_in_parent": true, 
          "properties": {
            "text_property": {
              "type": "text"
            },
            "float_property": {
              "type": "double"
            }
          }
        }
      }
    }
}

为“嵌套”类型添加了“include_in_parent”:true

索引文档:

PUT testindex/_doc/1
{
    "name": "a",
    "entities": [
      {
        "text_property": "foo",
        "float_property": 0.2
      },
      {
        "text_property": "bar",
        "float_property": 0.4
      },
      {
        "text_property": "baz",
        "float_property": 0.6
      }
    ]
  }
 
PUT testindex/_doc/2
{
    "name": "b",
    "entities": [
      {
        "text_property": "foo",
        "float_property": 0.9
      }
    ]
  }
  
PUT testindex/_doc/3 
{
    "name": "c",
    "entities": [
      {
        "text_property": "foo",
        "float_property": 0.2
      },
      {
        "text_property": "bar",
        "float_property": 0.9
      }
    ]
  }

然后对嵌套字段进行术语聚合:

POST testindex/_search
{
  "from": 0,
  "size": 30,
  "query": {
    "match_all": {}
  },
  "aggregations": {
    "entities.float_property": {
      "terms": {
        "field": "entities.float_property"
      }
    }
  }
}

它产生如下聚合结果:

"aggregations" : {
    "entities.float_property" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : 0.2,
          "doc_count" : 2
        },
        {
          "key" : 0.9,
          "doc_count" : 2
        },
        {
          "key" : 0.4,
          "doc_count" : 1
        },
        {
          "key" : 0.6,
          "doc_count" : 1
        }
      ]
    }
  }

I created the index with your mappings changing float_property type to double.

PUT testindex
{
 "mappings": {
      "properties": {
        "name": {
          "type": "keyword"
        },
        "entities": {
          "type": "nested",
          "include_in_parent": true, 
          "properties": {
            "text_property": {
              "type": "text"
            },
            "float_property": {
              "type": "double"
            }
          }
        }
      }
    }
}

Added "include_in_parent":true for "nested" type

Indexed the documents:

PUT testindex/_doc/1
{
    "name": "a",
    "entities": [
      {
        "text_property": "foo",
        "float_property": 0.2
      },
      {
        "text_property": "bar",
        "float_property": 0.4
      },
      {
        "text_property": "baz",
        "float_property": 0.6
      }
    ]
  }
 
PUT testindex/_doc/2
{
    "name": "b",
    "entities": [
      {
        "text_property": "foo",
        "float_property": 0.9
      }
    ]
  }
  
PUT testindex/_doc/3 
{
    "name": "c",
    "entities": [
      {
        "text_property": "foo",
        "float_property": 0.2
      },
      {
        "text_property": "bar",
        "float_property": 0.9
      }
    ]
  }

Then Terms Aggregation on nested field:

POST testindex/_search
{
  "from": 0,
  "size": 30,
  "query": {
    "match_all": {}
  },
  "aggregations": {
    "entities.float_property": {
      "terms": {
        "field": "entities.float_property"
      }
    }
  }
}

It produce the aggregation result as below:

"aggregations" : {
    "entities.float_property" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : 0.2,
          "doc_count" : 2
        },
        {
          "key" : 0.9,
          "doc_count" : 2
        },
        {
          "key" : 0.4,
          "doc_count" : 1
        },
        {
          "key" : 0.6,
          "doc_count" : 1
        }
      ]
    }
  }
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文