Elastic Search: add a text field to my aggregation

Posted on 2025-01-28 13:50:17 · 4868 words · 3 views · 0 comments

I have article information like this in Elastic Search:

{
   "ArticleId":355027,
   "ArticleNumber":"433398",
   "CharacteristicsMultiValue":[
      {
         "Name":"Aantal cartridges",
         "Value":"4",
         "NumValue":4,
         "Priority":2147483647
      },
      {
         "Name":"ADF",
         "Value":"Ja",
         "Priority":10,
         "Description":"Een Automatic Document Feeder (ADF), of automatische documentinvoer, laat een multifunctionele printer (all-in-one) automatisch meerdere vellen na elkaar verwerken. Door meerdere vellen in de ADF te plaatsen, wordt ieder vel papier stuk voor stuk automatisch gekopieerd of gescand."
      },
      {
         "Name":"Scanresolutie",
         "Value":"600x600 DPI",
         "Priority":2147483647
      }
   ]
}

I'm running the following query to retrieve all the occurrences of the CharacteristicsMultiValue for my search with all possible values and sort them to my liking.

{
  "query": {
    "query_string": {
     "query": "433398",
     "default_operator": "and"
    }
  },
  "aggs":{
    "CharacteristicsMultiValue":{
      "nested":{
        "path":"CharacteristicsMultiValue"
       },
       "aggs":{
         "Name":{
           "terms":{
            "field":"CharacteristicsMultiValue.Name",
            "size":25
          },
          "aggs":{
            "Value":{
              "terms":{
                "field":"CharacteristicsMultiValue.Value",
                "size":25
              }
            }, 
            "Priority":{
              "avg":{
                "field":"CharacteristicsMultiValue.Priority"
              }
            },
            "Characteristics_sort": {
              "bucket_sort": {
                "sort": [
                  { "Priority": { "order": "asc" } } 
                ]                               
              }
            }       
          }
        }
      }
    }
  }
}

The result shows a list of CharacteristicsMultiValue like below.

{
   "key":"ADF",
   "doc_count":1,
   "Priority":{
      "value":10
   },
   "Value":{
      "doc_count_error_upper_bound":0,
      "sum_other_doc_count":0,
      "buckets":[
         {
            "key":"Ja",
            "doc_count":1
         }
      ]
   }
}

This all works great. I want to make a change so that the CharacteristicsMultiValue.Description field is included in the aggregation. I'm not really experienced with Elastic Search, but I feel I should be able to do this pretty easily.

I did some research, and to my understanding I need to add a new sub-aggregation for the Description field. I tried to do that by adding the JSON below to my current query in several places, but I keep getting 404 errors. Could anyone tell me how to add (the first found) Description field to my aggregation?

"aggs":{
    "Description":{
        "terms":{
            "field":"CharacteristicsMultiValue.Description",
            "size":1
        }
    }
}

I tested the solution proposed by Joe. This results in the following error response:

{ 
  "error": { 
    "root_cause": [ 
      {
        "type": "illegal_argument_exception",
        "reason": "Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [CharacteristicsMultiValue.Description] in order to load field data by uninverting the inverted index. Note that this can use significant memory."
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "articles_dev1_nl",
        "node": "HiGH6JY9QvOozRSWJmFXpw",
        "reason": {
          "type": "illegal_argument_exception",
          "reason": "Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [CharacteristicsMultiValue.Description] in order to load field data by uninverting the inverted index. Note that this can use significant memory."
        }
      }
    ],
    "caused_by": {
      "type": "illegal_argument_exception",
      "reason": "Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [CharacteristicsMultiValue.Description] in order to load field data by uninverting the inverted index. Note that this can use significant memory.",
      "caused_by": {
        "type": "illegal_argument_exception",
        "reason": "Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [CharacteristicsMultiValue.Description] in order to load field data by uninverting the inverted index. Note that this can use significant memory."
      }
    }
  },
  "status": 400
}

Comments (1)

寻找我们的幸福 2025-02-04 13:50:17

I don't know why you're getting 404 errors -- it's usually 400 Bad Request if your aggregations' syntax is off.

Either way, if you want to find the top Description terms under every bucketed Value, you can nest a Description sub-aggregation under Value:

{
  "query": {
    "query_string": {
      "query": "433398",
      "default_operator": "and"
    }
  },
  "aggs": {
    "CharacteristicsMultiValue": {
      "nested": {
        "path": "CharacteristicsMultiValue"
      },
      "aggs": {
        "Name": {
          "terms": {
            "field": "CharacteristicsMultiValue.Name",
            "size": 25
          },
          "aggs": {
            "Value": {
              "terms": {
                "field": "CharacteristicsMultiValue.Value",
                "size": 25
              },
              "aggs": {
                "Description": {
                  "terms": {
                    "field": "CharacteristicsMultiValue.Description",
                    "size": 1
                  }
                }
              }
            },
            "Priority": {
              "avg": {
                "field": "CharacteristicsMultiValue.Priority"
              }
            },
            "Characteristics_sort": {
              "bucket_sort": {
                "sort": [
                  {
                    "Priority": {
                      "order": "asc"
                    }
                  }
                ]
              }
            }
          }
        }
      }
    }
  }
}

Generally speaking, sub-aggregations adhere to the following schema:

{
  "query": { },  // optional query
  "aggs": {
    "your_agg_name": {
      "agg_type": {
        // agg spec
      },
      "aggs": {
        "your_sub_agg_name_1": {
          "agg_type": {
            // agg spec
          }
        },
        "your_sub_agg_name_2_if_needed": {
          "agg_type": {
            // agg spec
          }
        },
        ...
      }
    }
  }
}

and you can:

  • nest further sub-aggs like you're already doing with Name->Value or Value->Description from my example
  • or keep them on the same level like you did with Name->Value and Name->Priority.
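
Both options can be sketched in Python as plain dictionaries, which may help if the request body is assembled programmatically. This is only an illustration: the helper names `terms_agg` and `with_sub_aggs` are hypothetical and not part of any Elasticsearch client, while the field names and sizes are taken from the query above.

```python
import json


def terms_agg(field, size):
    """Return a terms aggregation spec for the given field."""
    return {"terms": {"field": field, "size": size}}


def with_sub_aggs(agg, **sub_aggs):
    """Return a copy of `agg` with the named sub-aggregations attached."""
    merged = dict(agg)
    merged["aggs"] = {**agg.get("aggs", {}), **sub_aggs}
    return merged


PATH = "CharacteristicsMultiValue"

# Nested one level deeper: Value -> Description
value_agg = with_sub_aggs(
    terms_agg(f"{PATH}.Value", 25),
    Description=terms_agg(f"{PATH}.Description", 1),
)

# Siblings on the same level under Name: Value, Priority, Characteristics_sort
name_agg = with_sub_aggs(
    terms_agg(f"{PATH}.Name", 25),
    Value=value_agg,
    Priority={"avg": {"field": f"{PATH}.Priority"}},
    Characteristics_sort={"bucket_sort": {"sort": [{"Priority": {"order": "asc"}}]}},
)

body = {
    "query": {"query_string": {"query": "433398", "default_operator": "and"}},
    "aggs": {PATH: with_sub_aggs({"nested": {"path": PATH}}, Name=name_agg)},
}

print(json.dumps(body, indent=2))
```

The printed JSON should have the same shape as the full query shown earlier, with Description sitting inside Value's own "aggs" object.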

Tip: your query is already quite heavily nested so you could explore the typed_keys query parameter to determine more easily which bucket corresponds to which sub-aggregation.
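
As a rough illustration of what typed_keys changes: when the search request carries that parameter, each aggregation name in the response is prefixed with its type and a `#`, for example `sterms#Name` for a terms aggregation on a keyword field. The sketch below splits such keys and lists the aggregations it finds. The `sample` fragment is made up for illustration and is not real output from this index; the function names are hypothetical.

```python
def split_typed_key(key):
    """Split 'type#name' into (type, name); return (None, key) if untyped."""
    agg_type, sep, name = key.partition("#")
    return (agg_type, name) if sep else (None, key)


def describe_aggs(node, depth=0):
    """Yield (depth, type, name) for every typed aggregation key in a response."""
    if isinstance(node, dict):
        for key, child in node.items():
            agg_type, name = split_typed_key(key)
            if agg_type is not None:
                yield depth, agg_type, name
                yield from describe_aggs(child, depth + 1)
            else:
                yield from describe_aggs(child, depth)
    elif isinstance(node, list):
        for item in node:
            yield from describe_aggs(item, depth)


# Hypothetical response fragment shaped like a typed_keys result.
sample = {
    "nested#CharacteristicsMultiValue": {
        "doc_count": 3,
        "sterms#Name": {
            "buckets": [
                {
                    "key": "ADF",
                    "doc_count": 1,
                    "avg#Priority": {"value": 10.0},
                    "sterms#Value": {"buckets": [{"key": "Ja", "doc_count": 1}]},
                }
            ]
        },
    }
}

for depth, agg_type, name in describe_aggs(sample):
    print("  " * depth + f"{name} ({agg_type})")
```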


Edit

As described in the error message, the Description field needs to be aggregatable before any aggregations are performed on it.

So if you drop and recreate your index, you can turn fielddata on in the mapping:

PUT articles_dev1_nl
{
  "mappings": {
    "properties": {
      "CharacteristicsMultiValue": {
        "type": "nested",
        "properties": {
          .... other props ...
          
          "Description": {
            "type": "text",
            "fielddata": true
          }
        }
      }
    }
  }
}

or, if your index already exists, you can use the update mapping API:

PUT articles_dev1_nl/_mapping
{
  "properties": {
    "CharacteristicsMultiValue": {
      "type": "nested",
      "properties": {
        "Description": {
          "type": "text",
          "fielddata": true
        }
      }
    }
  }
}

You can learn more about fielddata vs. keyword in the Elasticsearch docs.
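
The error message itself points at the other route, a keyword field. One caveat with fielddata on a text field is that a terms aggregation buckets the individual analyzed tokens rather than the whole description string, so a keyword sub-field may be closer to "give me the description". Below is a hedged sketch of the two request bodies as Python dictionaries. The sub-field name `raw` and the `ignore_above` value of 1024 are assumptions picked for illustration: the sample description in the question is longer than 256 characters, the limit commonly seen on dynamically mapped keyword sub-fields, so a lower limit would leave it out. Documents indexed before the mapping change would need to be reindexed before the sub-field holds any values.

```python
import json

INDEX = "articles_dev1_nl"
PATH = "CharacteristicsMultiValue"
SUB_FIELD = "raw"      # assumed sub-field name
IGNORE_ABOVE = 1024    # assumed limit; longer strings are skipped by the sub-field

# Body for: PUT articles_dev1_nl/_mapping
mapping_body = {
    "properties": {
        PATH: {
            "type": "nested",
            "properties": {
                "Description": {
                    "type": "text",
                    "fields": {
                        SUB_FIELD: {"type": "keyword", "ignore_above": IGNORE_ABOVE}
                    },
                }
            },
        }
    }
}

# Sub-aggregation to place under "Value", pointing at the keyword sub-field.
description_agg = {
    "Description": {
        "terms": {"field": f"{PATH}.Description.{SUB_FIELD}", "size": 1}
    }
}

print(f"PUT {INDEX}/_mapping")
print(json.dumps(mapping_body, indent=2))
print(json.dumps({"aggs": description_agg}, indent=2))
```

With this variant the Description field itself stays a plain text field for searching, and only the aggregation targets the sub-field.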
