Convert two repeated values in an array into a string
I have some old documents in which a field holds an array with the same value repeated twice, something like this:
"task" : [
  "first_task",
  "first_task"
],
I'm trying to convert this array into a string, because both elements are the same value. I've seen the following script: Convert array with 2 equal values to single value, but in my case the problem can't be fixed through Logstash because it only affects old documents that are already stored.
I was thinking of doing something like this:
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "script": {
          "description": "Change task field from array to first element of this one",
          "lang": "painless",
          "source": """
            if (ctx['task'][0] == ctx['task'][1]) {
              ctx['task'] = ctx['task'][0];
            }
          """
        }
      }
    ]
  },
  "docs": [
    {
      "_index" : "tasks",
      "_type" : "_doc",
      "_id" : "1",
      "_score" : 1.0,
      "_source" : {
        "@timestamp" : "2022-05-03T07:33:44.652Z",
        "task" : ["first_task", "first_task"]
      }
    }
  ]
}
The result document is the following:
{
  "docs" : [
    {
      "doc" : {
        "_index" : "tasks",
        "_type" : "_doc",
        "_id" : "1",
        "_source" : {
          "@timestamp" : "2022-05-03T07:33:44.652Z",
          "task" : "first_task"
        },
        "_ingest" : {
          "timestamp" : "2022-05-11T09:08:48.150815183Z"
        }
      }
    }
  ]
}
We can see that the task field is reassigned and now holds the first element of the array as its value.
Is there a way to modify the data actually stored in Elasticsearch and convert all the documents with this characteristic using DSL queries?
Thanks.
Comments (1)
You can achieve this with the _update_by_query endpoint, using a script together with a match_all query. You can remove the match_all query if you want to update all documents, or you can filter documents by changing the conditions in the query.
Keep in mind that running a script to update all documents in the index may cause some performance issues while the update process is running.
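A minimal sketch of what such a request might look like, not necessarily the exact example from the answer. It assumes the index is named tasks (taken from the _index value in the question) and reuses the comparison from the question's script. The instanceof and size checks, the noop branch, and the conflicts=proceed parameter are additions made here for safety, not something the answer specifies:

```
POST tasks/_update_by_query?conflicts=proceed
{
  "query": {
    "match_all": {}
  },
  "script": {
    "lang": "painless",
    "source": """
      // In _update_by_query the document body is exposed as ctx._source
      def task = ctx._source['task'];
      // Only rewrite documents where task is a two-element list of equal values
      if (task instanceof List && task.size() == 2 && task[0] == task[1]) {
        ctx._source['task'] = task[0];
      } else {
        // Leave every other document untouched instead of reindexing it
        ctx.op = 'noop';
      }
    """
  }
}
```

Setting ctx.op to 'noop' skips documents that do not need the change, which should reduce the indexing load the answer warns about. It is probably worth trying the request against a copy of the index, or with a narrower query, before running it on all documents.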