ElasticSearch aggregation on a scripted nested field
I have the following mapping in my ElasticSearch index (simplified, as the other fields are irrelevant):
{
  "test": {
    "mappings": {
      "properties": {
        "name": {
          "type": "keyword"
        },
        "entities": {
          "type": "nested",
          "properties": {
            "text_property": {
              "type": "text"
            },
            "float_property": {
              "type": "float"
            }
          }
        }
      }
    }
  }
}
The data looks like this (again simplified):
[
  {
    "name": "a",
    "entities": [
      {
        "text_property": "foo",
        "float_property": 0.2
      },
      {
        "text_property": "bar",
        "float_property": 0.4
      },
      {
        "text_property": "baz",
        "float_property": 0.6
      }
    ]
  },
  {
    "name": "b",
    "entities": [
      {
        "text_property": "foo",
        "float_property": 0.9
      }
    ]
  },
  {
    "name": "c",
    "entities": [
      {
        "text_property": "foo",
        "float_property": 0.2
      },
      {
        "text_property": "bar",
        "float_property": 0.9
      }
    ]
  }
]
I'm trying to perform a bucket aggregation on the maximum value of float_property for each document. So for the example above, the following would be the desired response:
...
{
  "buckets": [
    {
      "key": "0.9",
      "doc_count": 2
    },
    {
      "key": "0.6",
      "doc_count": 1
    }
  ]
}
as document a's highest nested value for float_property is 0.6, b's is 0.9, and c's is 0.9.
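To make the target concrete, here is a small Python sketch that computes the desired buckets locally from the sample data above. It only mirrors the intended result (one bucket per per-document maximum, counting top-level documents); it does not talk to Elasticsearch, and the helper names are hypothetical.

```python
from collections import Counter

# Sample documents from the question; text_property is omitted because only
# the numeric field matters for this aggregation.
docs = [
    {"name": "a", "entities": [{"float_property": 0.2}, {"float_property": 0.4}, {"float_property": 0.6}]},
    {"name": "b", "entities": [{"float_property": 0.9}]},
    {"name": "c", "entities": [{"float_property": 0.2}, {"float_property": 0.9}]},
]

def max_float(doc):
    """Highest float_property among one document's nested entities."""
    return max(e["float_property"] for e in doc["entities"])

def bucket_by_max(docs):
    """Count top-level documents per maximum value, largest count first."""
    counts = Counter(max_float(d) for d in docs)
    return [{"key": k, "doc_count": n} for k, n in counts.most_common()]

print(bucket_by_max(docs))
```

Note that the keys here are numbers, whereas the desired response above shows them as strings.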
I've tried using a mixture of nested and aggs, along with runtime_mappings, but I'm not sure in which order to use these, or if this is even possible.
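For the runtime_mappings route, one possible ordering is to define a runtime field on the top-level document whose Painless script reads the nested array from params['_source'] and emits its maximum, then run an ordinary terms aggregation on that runtime field. The sketch below only builds such a search body as a Python dict. It is untested against a cluster and assumes a version with runtime fields (7.11 onwards, as far as I know). The field name max_float_property and the aggregation name by_max are hypothetical.

```python
import json

# Painless script for a hypothetical runtime field: walk the nested entities in
# _source and emit the largest float_property (emit nothing if there are none).
MAX_SCRIPT = """
double m = Double.NEGATIVE_INFINITY;
def entities = params['_source']['entities'];
if (entities != null) {
  for (def e : entities) {
    def v = e['float_property'];
    if (v != null) { m = Math.max(m, ((Number) v).doubleValue()); }
  }
}
if (m != Double.NEGATIVE_INFINITY) { emit(m); }
"""

def runtime_max_body(field_name="max_float_property"):
    """Search body: runtime field first, then a terms aggregation over it."""
    return {
        "size": 0,
        "runtime_mappings": {
            field_name: {"type": "double", "script": {"source": MAX_SCRIPT}}
        },
        "aggs": {"by_max": {"terms": {"field": field_name}}},
    }

print(json.dumps(runtime_max_body(), indent=2))
```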
I've managed to figure this out in the end.
The two things I hadn't realised were:
1. You can provide a script instead of a field key to bucket aggregations.
2. Instead of using nested queries, you can access nested values directly using params._source.
The combination of these two things allowed me to write the correct query:
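The query itself did not survive on this page. As a hypothetical reconstruction of what the two points above describe (not the author's actual query), here is a terms aggregation keyed by a Painless script instead of a field, reading the nested values from params._source. It is built as a Python dict only; the aggregation name is made up, and the script assumes every document has at least one entity.

```python
import json

# Hypothetical terms-aggregation script: take the maximum float_property
# across the nested entities, read straight from _source (no nested agg).
# Assumes each document has a non-empty "entities" array.
MAX_FROM_SOURCE = (
    "double m = Double.NEGATIVE_INFINITY; "
    "for (def e : params._source.entities) { "
    "m = Math.max(m, ((Number) e.float_property).doubleValue()); } "
    "return m;"
)

body = {
    "size": 0,
    "aggs": {
        "max_float_property": {
            # "script" replaces the usual "field" key of the terms aggregation.
            "terms": {"script": {"source": MAX_FROM_SOURCE, "lang": "painless"}}
        }
    },
}

print(json.dumps(body, indent=2))
```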
Response:
I'm confused though, because I thought the correct way to access nested fields was by using the nested query type. Unfortunately, there's very little documentation for this, so I'm still unsure whether this is the intended/correct way to aggregate on scripted nested fields.
I created the index with your mappings, changing the float_property type to double.
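A likely reason for switching to double: a float field holds 32-bit values, so aggregation keys come back as the nearest 32-bit number (0.9 appears as roughly 0.8999999761581421) instead of 0.9. The sketch below reproduces that rounding with Python's struct module. It illustrates the numeric effect only and is not Elasticsearch code.

```python
import struct

def as_float32(x: float) -> float:
    """Round-trip a Python float (64-bit) through a 32-bit IEEE 754 float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

# Values from the sample data: none of them survive float32 exactly.
for v in (0.2, 0.4, 0.6, 0.9):
    print(v, "->", as_float32(v))
```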
Indexed the documents:
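The indexing request is not shown here. One common way to load the sample documents is the _bulk API, whose body is newline-delimited JSON: an action line followed by a source line for each document, with a trailing newline. This Python sketch only builds such a body; the index name test comes from the question's mapping, and the helper name is hypothetical.

```python
import json

# Sample documents from the question.
DOCS = [
    {"name": "a", "entities": [
        {"text_property": "foo", "float_property": 0.2},
        {"text_property": "bar", "float_property": 0.4},
        {"text_property": "baz", "float_property": 0.6}]},
    {"name": "b", "entities": [
        {"text_property": "foo", "float_property": 0.9}]},
    {"name": "c", "entities": [
        {"text_property": "foo", "float_property": 0.2},
        {"text_property": "bar", "float_property": 0.9}]},
]

def to_bulk_ndjson(index, docs):
    """Build a _bulk body: one 'index' action line plus one source line per doc."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # _bulk bodies must end with a newline

print(to_bulk_ndjson("test", DOCS))
```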
Then I ran a terms aggregation on the nested field:
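The exact aggregation is not shown. If it is a plain terms aggregation wrapped in a nested aggregation, the request body might look like the hypothetical dict below (aggregation names made up). As I understand nested aggregations, doc_count inside them counts nested objects, so every entity is tallied separately rather than once per document maximum (a reverse_nested sub-aggregation is the usual way to count parent documents instead). The local Counter only illustrates that per-entity counting on the sample data.

```python
import json
from collections import Counter

# Hypothetical request body: terms on the nested field, under a nested agg.
nested_terms_body = {
    "size": 0,
    "aggs": {
        "entities_nested": {
            "nested": {"path": "entities"},
            "aggs": {"values": {"terms": {"field": "entities.float_property"}}},
        }
    },
}

# Every float_property in the sample data, one entry per nested entity.
float_values = [0.2, 0.4, 0.6, 0.9, 0.2, 0.9]
per_entity_counts = Counter(float_values)  # counts entities, not documents

print(json.dumps(nested_terms_body, indent=2))
print(per_entity_counts)
```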
It produces the following aggregation result: