Convert an array of two repeated values into a string

Posted on 2025-01-27 22:12:55

I have some old documents where a field holds an array with the same value repeated twice, something like this:

          "task" : [
            "first_task",
            "first_task"
          ],

I'm trying to convert this array into a string, because both entries are the same value. I've seen the question "Convert array with 2 equal values to single value", but in my case the problem can't be fixed through Logstash, because it only affects old documents that are already stored.

I was thinking of doing something like this:

POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "script": {
          "description": "Change task field from array to first element of this one",
          "lang": "painless",
          "source": """
            if (ctx['task'][0] == ctx['task'][1]) {
                ctx['task'] = ctx['task'][0];
            }
          """
        }
      }
    ]
  },
  "docs": [
    {
        "_index" : "tasks",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "@timestamp" : "2022-05-03T07:33:44.652Z",
          "task" : ["first_task", "first_task"]
        }
    }
  ]
}

The result document is the following:

{
  "docs" : [
    {
      "doc" : {
        "_index" : "tasks",
        "_type" : "_doc",
        "_id" : "1",
        "_source" : {
          "@timestamp" : "2022-05-03T07:33:44.652Z",
          "task" : "first_task"
        },
        "_ingest" : {
          "timestamp" : "2022-05-11T09:08:48.150815183Z"
        }
      }
    }
  ]
}

We can see the task field is reassigned and we have the first element of the array as a value.

Is there a way to modify the data actually stored in Elasticsearch and convert all the documents with this characteristic using DSL queries?

Thanks.

半透明的墙 2025-02-03 22:12:55

You can achieve this with the _update_by_query endpoint. Here is an example:

POST tasks/_update_by_query
{
  "script": {
    "source": """
      if (ctx._source['task'][0] == ctx._source['task'][1]) {
          ctx._source['task'] = ctx._source['task'][0];
      }
    """,
    "lang": "painless"
  },
  "query": {
    "match_all": {}
  }
}

If you want to update all documents, you can omit the match_all query, since matching everything is the default. To update only some of them, change the conditions in the query.
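
As a rough sketch of such a filter: the request below restricts the update to documents that have a task field and are older than a cutoff date. It also guards the script so that documents whose task is already a string, or an array of a different shape, are skipped instead of raising an error. The index and field names come from the question; the cutoff date is a placeholder assumption, not something stated there.

```
POST tasks/_update_by_query
{
  "script": {
    "lang": "painless",
    "source": """
      // Assumption: the range cutoff in the query below is a placeholder date.
      def task = ctx._source['task'];
      // Collapse only two-element arrays whose entries are equal.
      if (task instanceof List && task.size() == 2 && task[0] == task[1]) {
        ctx._source['task'] = task[0];
      } else {
        // Skip every other document without rewriting it.
        ctx.op = 'noop';
      }
    """
  },
  "query": {
    "bool": {
      "filter": [
        { "exists": { "field": "task" } },
        { "range": { "@timestamp": { "lt": "2022-06-01T00:00:00Z" } } }
      ]
    }
  }
}
```

Setting ctx.op to 'noop' means skipped documents are reported under noops in the response rather than being reindexed unchanged.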

Keep in mind that running a script to update all documents in the index may cause some performance issues while the update process is running.
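
One way to limit that impact, sketched here rather than prescribed, is to run the update as a throttled background task. wait_for_completion, requests_per_second and conflicts are standard _update_by_query URL parameters; the value 500 is an arbitrary assumption that would need tuning for a real cluster.

```
POST tasks/_update_by_query?wait_for_completion=false&requests_per_second=500&conflicts=proceed
{
  "script": {
    "lang": "painless",
    "source": """
      // Assumption: the throttle of 500 in the URL is only a starting point.
      def task = ctx._source['task'];
      if (task instanceof List && task.size() == 2 && task[0] == task[1]) {
        ctx._source['task'] = task[0];
      } else {
        // Leave documents that do not match the pattern untouched.
        ctx.op = 'noop';
      }
    """
  }
}
```

With wait_for_completion=false the response returns a task ID immediately. That ID can be passed to GET _tasks/<task_id> to check progress, or to POST _tasks/<task_id>/_cancel to stop the run.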
