Failed to parse field of type integer
When importing a document, I get the error attached below.
I guess the problem arose when the data provider (esMapping.js) was changed to use the integer sub-field to sort documents.
Is it possible to use some pattern to sort the documents so that this error does not occur again? Does anyone have an idea?
This question follows on from one already asked: "Enable ascending and descending sorting of numbers that are of the keyword type (Elasticsearch)".
Error:
022-05-18 11:33:32.5830 [ERROR] ESIndexerLogger Failed to commit bulk. Errors:
index returned 400 _index: adama_gen_ro_importdocument _type: _doc _id: 4c616067-4beb-4484-83cc-7eb9d36eb175 _version: 0 error: Type: mapper_parsing_exception Reason: "failed to parse field [number.sequenceNumber] of type [integer] in document with id '4c616067-4beb-4484-83cc-7eb9d36eb175'. Preview of field's value: 'BS-000011/2022'" CausedBy: "Type: number_format_exception Reason: "For input string: "BS-000011/2022"""
Mapping (sequenceNumber used for sorting):
"number": {
  "type": "keyword",
  "copy_to": [
    "_summary"
  ],
  "fields": {
    "sequenceNumber": {
      "type": "integer"
    }
  }
}
Answers (1)
In the returned error message, the value being indexed into the number field is a string containing alphabetical characters, 'BS-000011/2022'. This is no problem for the number field, which has a keyword type. However, it is a problem for the sequenceNumber sub-field, which has an integer type: a multi-field receives the same source value as its parent, so the text value passed into number is also passed into number.sequenceNumber, hence the error.
Unfortunately, the text analyzer used in the previous question won't help either, as sorting can't be performed on a text field by default. However, the tokenizer used by the custom analyzer document_number_analyzer can be repurposed into an ingest pipeline. For context, that custom tokenizer was provided by the author in the previous question.
If the custom analyzer is run through the Elasticsearch _analyze API on the value above (stack_index being a temporary index that carries the analyzer), it returns a single token, 11, but tokens are for search analysis, not sorting.
An Elasticsearch ingest pipeline using the grok processor can be applied to the index to extract the desired number from the value and index it as an integer. The processor needs to be configured for the value's expected format, which would be similar to 'BS-0000011/2022'.
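As a rough illustration of the pipeline described above, the Python sketch below builds a request body for PUT _ingest/pipeline and previews the extraction locally with a regular expression. The pipeline name sequence_number_pipeline, the overall pattern and the SORTVALUES definition are my assumptions (SORTVALUES is not, as far as I know, a built-in grok pattern), and the regex only approximates what grok would do; nothing here contacts a cluster.

```python
import json
import re

# Hypothetical pipeline name; any identifier would do.
PIPELINE_ID = "sequence_number_pipeline"

# Request body for PUT _ingest/pipeline/<PIPELINE_ID>.
# Assumption: SORTVALUES is a custom pattern, and this definition is a guess
# at what the value format 'BS-000011/2022' requires.
pipeline_body = {
    "description": "Extract the numeric part of number into sequenceNumber",
    "processors": [
        {
            "grok": {
                "field": "number",
                "patterns": ["%{WORD}-%{SORTVALUES:sequenceNumber:int}/%{YEAR}"],
                "pattern_definitions": {"SORTVALUES": "[0-9]+"},
            }
        }
    ],
}

# Approximate Python equivalent of the grok expression above, used only to
# preview what the pipeline is expected to extract.
GROK_EQUIVALENT = re.compile(
    r"\b\w+\b-(?P<sequenceNumber>[0-9]+)/(?:\d\d){1,2}"
)


def preview_extraction(number):
    """Return the integer expected in sequenceNumber, or None if no match."""
    match = GROK_EQUIVALENT.search(number)
    if match is None:
        return None
    # int() drops the leading zeros, mirroring the :int conversion.
    return int(match.group("sequenceNumber"))


if __name__ == "__main__":
    print(f"PUT _ingest/pipeline/{PIPELINE_ID}")
    print(json.dumps(pipeline_body, indent=2))
    print(preview_extraction("BS-000011/2022"))
```

For 'BS-000011/2022' the preview yields the integer 11, matching the token mentioned above. Values that don't fit the assumed format make the grok processor fail, so an on_failure handler or ignore_failure may be worth adding if other formats can appear.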
Grok takes an input text value and extracts structured fields from it. The sortable number is extracted by the SORTVALUES pattern, %{SORTVALUES:sequenceNumber:int}, which creates a new field called sequenceNumber in the document. When 'BS-000011/2022' is indexed in the number field, 11 is indexed into the sequenceNumber field as an integer.
You can then create an index template to apply the ingest pipeline. The sequenceNumber field will need to be explicitly added as an integer type. The ingest pipeline will automatically populate it as long as a value matching the format of the input above is indexed into the number field. The sequenceNumber field will then be available to sort on.
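The template and sort steps could look roughly like the Python sketch below, which only builds the request bodies. The template name, the pipeline name and the index pattern (borrowed from the index in the error log) are assumptions, as is the use of a composable index template via PUT _index_template; the integer multi-field is dropped from number and replaced by a top-level sequenceNumber field.

```python
import json

# Hypothetical names; only the index pattern is taken from the error log.
TEMPLATE_NAME = "importdocument_template"
PIPELINE_ID = "sequence_number_pipeline"

# Request body for PUT _index_template/<TEMPLATE_NAME>.
index_template_body = {
    "index_patterns": ["adama_gen_ro_importdocument*"],
    "template": {
        # Run the ingest pipeline on every document indexed without an
        # explicit pipeline parameter.
        "settings": {"index.default_pipeline": PIPELINE_ID},
        "mappings": {
            "properties": {
                # number stays a keyword, without the integer sub-field.
                "number": {"type": "keyword", "copy_to": ["_summary"]},
                # Explicit integer field filled by the grok processor.
                "sequenceNumber": {"type": "integer"},
            }
        },
    },
}


def build_sort_query(order="asc"):
    """Return a search body sorted on sequenceNumber in the given order."""
    if order not in ("asc", "desc"):
        raise ValueError("order must be 'asc' or 'desc'")
    return {
        "query": {"match_all": {}},
        "sort": [{"sequenceNumber": {"order": order}}],
    }


if __name__ == "__main__":
    print(f"PUT _index_template/{TEMPLATE_NAME}")
    print(json.dumps(index_template_body, indent=2))
    print(json.dumps(build_sort_query("desc"), indent=2))
```

Index templates are applied when an index is created, so an existing index would presumably need to be recreated or reindexed before the new mapping and default pipeline take effect.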