Failed to parse field of type integer

Posted on 2025-01-31 04:34:53 · 1266 words · 1 view · 0 comments

When importing a document, I get the error shown below.

I suspect the problem appeared when the data provider (esMapping.js) was changed to use an integer sub-field for sorting documents.

Is there a pattern I can use to sort the documents so that this error does not occur again? Does anyone have an idea?

This question follows up on one asked earlier: Enable ascending and descending sorting of numbers that are of the keyword type (Elasticsearch)

Error:

022-05-18 11:33:32.5830 [ERROR] ESIndexerLogger Failed to commit bulk. Errors:
index returned 400 _index: adama_gen_ro_importdocument _type: _doc _id: 4c616067-4beb-4484-83cc-7eb9d36eb175 _version: 0 error: Type: mapper_parsing_exception Reason: "failed to parse field [number.sequenceNumber] of type [integer] in document with id '4c616067-4beb-4484-83cc-7eb9d36eb175'. Preview of field's value: 'BS-000011/2022'" CausedBy: "Type: number_format_exception Reason: "For input string: "BS-000011/2022"""

Mapping (sequenceNumber used for sorting):

"number": {
    "type": "keyword",
    "copy_to": [
        "_summary"
    ],
    "fields": {
        "sequenceNumber": {
            "type": "integer"
        }
    }
}

Comments (1)

孤寂小茶 2025-02-07 04:34:53

In the returned error message, the value being indexed into the number field is a string containing alphabetical characters, 'BS-000011/2022'. That is no problem for the number field itself, which has the keyword type. It is a problem for the sequenceNumber sub-field, which has the integer type: a multi-field receives the same source value as its parent, so the text passed into number is also passed into number.sequenceNumber, hence the error.
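As a rough analogy only (this is plain Python, not Elasticsearch code), the failure can be reproduced by feeding one raw value to both a string slot and an integer slot. The function name `index_number` and the flattened key `number.sequenceNumber` are made up for illustration.

```python
def index_number(value: str) -> dict:
    """Mimic a keyword field with an integer multi-field: both get the same raw value."""
    indexed = {"number": value}  # keyword side: any string is accepted as-is
    try:
        # integer side: the very same string has to parse as a whole number
        indexed["number.sequenceNumber"] = int(value)
    except ValueError as exc:
        # stands in for the mapper_parsing_exception / number_format_exception pair
        raise ValueError(
            f"failed to parse field [number.sequenceNumber] of type [integer]: {value!r}"
        ) from exc
    return indexed
```

A purely numeric string such as "11" passes through both slots, while 'BS-000011/2022' is rejected by the integer side, which mirrors the behaviour reported in the log.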

Unfortunately, the text analyzer used in the previous question won't help either, because text fields are not sortable by default. However, the pattern used by the tokenizer of the custom analyzer document_number_analyzer can be repurposed in an ingest pipeline.

For context, this is the custom tokenizer the author provided in the previous question:

"tokenizer": {
    "document_number_tokenizer": {
        "type": "pattern",
        "pattern": "-0*([1-9][0-9]*)\/",
        "group": 1
    }
}

The custom analyzer can be tried out with the Elasticsearch _analyze API on the value above, like so (stack_index is a temporary index that defines the analyzer):

POST stack_index/_analyze
{
  "analyzer": "document_number_analyzer",
  "text": ["BS-000011/2022"]
}

The analyzer returns a single token, 11. However, tokens are used for search analysis, not for sorting.
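To see what the tokenizer's regular expression captures without setting up an index, it can be approximated in Python. This is only a sketch: Elasticsearch evaluates the pattern with Java's regex engine, and the function name `tokenize` is invented here. For a pattern this simple the two engines should agree.

```python
import re
from typing import List

# Same expression as the tokenizer above; the JSON escape "\/" is just "/".
DOCUMENT_NUMBER_PATTERN = re.compile(r"-0*([1-9][0-9]*)/")

def tokenize(text: str) -> List[str]:
    """Return capture group 1 of every match, like a pattern tokenizer with "group": 1."""
    return DOCUMENT_NUMBER_PATTERN.findall(text)

print(tokenize("BS-000011/2022"))  # prints ['11']
```

The leading "-0*" consumes the hyphen and the zero padding, so only the significant digits before the slash end up in the token.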

An Elasticsearch ingest pipeline using the grok processor can be applied to the index to extract the desired number from the value and index it as an integer. The processor needs to be configured for the expected format of the value, which would be similar to 'BS-000011/2022'. An example is provided below:

PUT _ingest/pipeline/numberSort
{
  "processors": [
    {
      "grok": {
        "field": "number",
        "patterns": ["%{WORD}%{ZEROS}%{SORTVALUES:sequenceNumber:int}%{SEPARATE}%{NUMBER}"],
        "pattern_definitions": {
          "SEPARATE":  "[/]",
          "ZEROS" : "[-0]*",
          "SORTVALUES":  "[1-9][0-9]*"
        }
      }
    }
  ]
}

Grok takes an input text value and extracts structured fields from it. The sortable number is captured by the SORTVALUES pattern, %{SORTVALUES:sequenceNumber:int}, where the :int suffix converts the captured text to an integer. A new top-level field called sequenceNumber is created in the document: when 'BS-000011/2022' is indexed into the number field, 11 is indexed into the sequenceNumber field as an integer.
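The extraction can be checked offline with a plain regular expression that approximates the grok expression. This is a sketch under assumptions: WORD is written as `\b\w+\b`, the built-in NUMBER pattern is simplified to `\d+`, and the names `GROK_LIKE` and `extract_sequence_number` are invented for illustration. It does not call Elasticsearch.

```python
import re
from typing import Optional

# WORD, then ZEROS ([-0]*), then SORTVALUES, then SEPARATE (/), then a simplified NUMBER.
GROK_LIKE = re.compile(r"\b\w+\b[-0]*(?P<sequenceNumber>[1-9][0-9]*)/\d+")

def extract_sequence_number(number: str) -> Optional[int]:
    """Return the sortable integer, or None when the value does not fit the format."""
    match = GROK_LIKE.search(number)
    if match is None:
        return None
    return int(match.group("sequenceNumber"))  # plays the role of the ":int" suffix

print(extract_sequence_number("BS-000011/2022"))  # prints 11
```

Note one difference: this sketch returns None for a value that does not fit, whereas the real grok processor, as far as I know, fails the document when no pattern matches unless failure handling such as ignore_failure or on_failure is configured on the pipeline.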

You can then create an index template to apply the ingest pipeline. The sequenceNumber field will need to be explicitly mapped as an integer type, and the integer sub-field under number should be dropped from the mapping, since it is what triggers the parsing error. The ingest pipeline will populate sequenceNumber automatically as long as the value indexed into the number field matches the format above. The sequenceNumber field will then be available to sort on.
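One possible shape for the template and the sort request is sketched below as Python dictionaries that are printed as JSON bodies. The template name `number_sort_template` and the index pattern are assumptions (the pattern is guessed from the index name in the error log), the sketch assumes a version that supports composable templates (`_index_template`) and the `index.default_pipeline` setting, and nothing here contacts a cluster.

```python
import json

PIPELINE_ID = "numberSort"  # the pipeline defined earlier in this answer

# Assumed template body: adjust the pattern to match your own indices.
index_template = {
    "index_patterns": ["adama_gen_ro_importdocument*"],  # assumption
    "template": {
        # run the grok pipeline on every document indexed without an explicit pipeline
        "settings": {"index.default_pipeline": PIPELINE_ID},
        "mappings": {
            "properties": {
                # keyword field without the integer sub-field that caused the error
                "number": {"type": "keyword", "copy_to": ["_summary"]},
                # top-level integer field filled in by the pipeline
                "sequenceNumber": {"type": "integer"},
            }
        },
    },
}

# Search body that sorts on the extracted integer.
sort_query = {"sort": [{"sequenceNumber": {"order": "asc"}}]}

print("PUT _index_template/number_sort_template")  # hypothetical template name
print(json.dumps(index_template, indent=2))
print("GET adama_gen_ro_importdocument/_search")
print(json.dumps(sort_query, indent=2))
```

Templates only take effect when an index is created, so an existing index would likely need to be recreated or reindexed before the new mapping and default pipeline apply.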
