Using jq to update a JSON array with values from another array (i.e. a JOIN)

Posted on 2025-01-12 02:17:17

Given two files, 1.json and 2.json, both arrays of objects, I need to update the ping_latency field in 1.json with the values from 2.json.

1.json

[
    {
      "domain": "ca944.nordvpn.com",
      "name": "Canada #944",
      "ip_address": "172.83.40.219"
    },
    {
      "domain": "pl128.nordvpn.com",
      "name": "Poland #128",
      "ip_address": "194.99.105.100"
    },
    {
      "domain": "dk151.nordvpn.com",
      "name": "Denmark #151",
      "ip_address": "82.102.20.236"
    },
    {
      "domain": "be148.nordvpn.com",
      "name": "Belgium #148",
      "ip_address": "82.102.19.137",
      "ping_latency": 334
    }
]

2.json

[
    {
      "domain": "ca944.nordvpn.com",
      "name": "Canada #944",
      "ip_address": "172.83.40.219",
      "ping_latency": 123
    },
    {
      "domain": "pl27.nordvpn.com",
      "name": "Poland #27",
      "ip_address": "194.99.105.27",
      "ping_latency": "REMOVED"
    },
    {
      "domain": "dk151.nordvpn.com",
      "name": "Denmark #151",
      "ip_address": "82.102.20.236",
      "ping_latency": 13
    },
    {
      "domain": "be148.nordvpn.com",
      "name": "Belgium #148",
      "ip_address": "82.102.19.137",
      "ping_latency": 67
    }
]

The object marked "REMOVED" should not appear in the result, because it is not in 1.json.

P.S. I do not work for NordVPN; this is just an example.

I tried to merge the arrays with the + or * operator, but that always adds the "REMOVED" domain.

jq -s 'map(INDEX(.domain)) | add | [.[]]' {1,2}.json

and

jq -s '(.[0]|INDEX(.domain)) as $x | (.[1]|INDEX(.domain)) as $y | $x *$y' {1,2}.json

Both add the "REMOVED" node from 2.json:

[
  {
    "domain": "ca944.nordvpn.com",
    "name": "Canada #944",
    "ip_address": "172.83.40.219",
    "ping_latency": 123
  },
  {
    "domain": "pl128.nordvpn.com",
    "name": "Poland #128",
    "ip_address": "194.99.105.100"
  },
  {
    "domain": "dk151.nordvpn.com",
    "name": "Denmark #151",
    "ip_address": "82.102.20.236",
    "ping_latency": 13
  },
  {
    "domain": "be148.nordvpn.com",
    "name": "Belgium #148",
    "ip_address": "82.102.19.137",
    "ping_latency": 67
  },
  {
    "domain": "pl27.nordvpn.com",
    "name": "Poland #27",
    "ip_address": "194.99.105.27",
    "ping_latency": "REMOVED"
  }
]

How can I handle this?

Update: after some struggle, I found a way to do this in jq:

jq 'INDEX(.domain) as $u | 
     reduce ($full[][] | {domain,ip_address,name}) as $i (
     []; . + [ $i | .ping_latency=( $u[$i.domain].ping_latency//98767 )]
    )' --slurpfile full 1.json <2.json

But compared to the * operator, my approach is about 100 times slower and takes up to 2 seconds on an Intel Core i7-11xxx with an array of 5474 objects. Its output:

[
  {
    "domain": "ca944.nordvpn.com",
    "ip_address": "172.83.40.219",
    "name": "Canada #944",
    "ping_latency": 123
  },
  {
    "domain": "pl128.nordvpn.com",
    "ip_address": "194.99.105.100",
    "name": "Poland #128",
    "ping_latency": 98767
  },
  {
    "domain": "dk151.nordvpn.com",
    "ip_address": "82.102.20.236",
    "name": "Denmark #151",
    "ping_latency": 13
  },
  {
    "domain": "be148.nordvpn.com",
    "ip_address": "82.102.19.137",
    "name": "Belgium #148",
    "ping_latency": 67
  }
]

Maybe you know a quicker, better way?

诠释孤独 2025-01-19 02:17:17

Assuming .domain is unique among the updating objects in 2.json (change it if this holds for another key instead; even spanning multiple keys is possible using an array, e.g. [.name, .ip_address]), you could use JOIN based on the unique INDEX to match corresponding pairs of objects.
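As a rough illustration of the multi-key idea, here is a minimal sketch that joins on the pair (name, ip_address) instead of .domain. The temporary directory, the trimmed sample data, and the helper name pairkey are assumptions made for this example only. The explicit tostring is a precaution so that the lookup key is a string, matching the string keys that INDEX produces.

```shell
# Hypothetical, trimmed-down stand-ins for 1.json and 2.json
dir=$(mktemp -d)
cat > "$dir/1.json" <<'EOF'
[{"name":"Canada #944","ip_address":"172.83.40.219"},
 {"name":"Poland #128","ip_address":"194.99.105.100"}]
EOF
cat > "$dir/2.json" <<'EOF'
[{"name":"Canada #944","ip_address":"172.83.40.219","ping_latency":123},
 {"name":"Poland #27","ip_address":"194.99.105.27","ping_latency":"REMOVED"}]
EOF

# pairkey (a name made up for this sketch) builds one string key from two fields.
composite=$(jq -c -s '
  def pairkey: [.name, .ip_address] | tostring;
  [JOIN(INDEX(last[]; pairkey); first[]; pairkey; add)]
' "$dir/1.json" "$dir/2.json")
echo "$composite"
```

Only objects whose name and ip_address both match receive a ping_latency; the "Poland #27" entry is not carried over because the join iterates over the first file only.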

Furthermore, to combine a match, you could simply add up the pair's members, as nothing from 2.json can effectively overwrite anything (else) in 1.json, assuming your sample is representative in this regard. If this isn't the case, use a more fine-grained combination method instead, e.g. first + (last | {ping_latency} | select(.[]) // {}) or similar.
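To make the fine-grained variant concrete, here is a small sketch in which 2.json deliberately disagrees with 1.json on name, and one update lacks ping_latency altogether. The sample data and the temporary directory are assumptions for illustration; the join expression is the one suggested above.

```shell
# Hypothetical data: 2.json carries a conflicting name and one entry without ping_latency
dir=$(mktemp -d)
cat > "$dir/1.json" <<'EOF'
[{"domain":"a.example","name":"kept name"},
 {"domain":"b.example","name":"B","ping_latency":334}]
EOF
cat > "$dir/2.json" <<'EOF'
[{"domain":"a.example","name":"other name","ping_latency":123},
 {"domain":"b.example","name":"B"}]
EOF

# Copy only ping_latency from the match, and only when it is present and truthy;
# every other field keeps its value from the first file.
fine=$(jq -c -s '
  [JOIN(INDEX(last[]; .domain); first[]; .domain;
        first + (last | {ping_latency} | select(.[]) // {}))]
' "$dir/1.json" "$dir/2.json")
echo "$fine"
```

With plain add, the first object's name would be replaced by the one from 2.json; here it stays as it was in 1.json, and the second object keeps its existing ping_latency because the update has none.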

Lastly, backed by your description

Object with mark "REMOVED" should not appear in result. Because it is not in 1.json.

it is further assumed that there is generally no object in 2.json which should newly be added to 1.json. As JOIN behaves exactly that way, checking for "REMOVED" should not be necessary.
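One way to see why the "REMOVED" entry drops out is to spell the join out by hand: build a lookup table from 2.json, then walk over 1.json only. The following is a sketch with made-up sample data, not a replacement for JOIN.

```shell
# Hypothetical miniature versions of 1.json and 2.json
dir=$(mktemp -d)
cat > "$dir/1.json" <<'EOF'
[{"domain":"a.example","name":"A"},
 {"domain":"b.example","name":"B"}]
EOF
cat > "$dir/2.json" <<'EOF'
[{"domain":"a.example","name":"A","ping_latency":123},
 {"domain":"x.example","name":"X","ping_latency":"REMOVED"}]
EOF

# $idx maps each domain in 2.json to its object. Mapping over the first file
# means a domain that exists only in 2.json is never visited.
manual=$(jq -c -s '
  (last | INDEX(.domain)) as $idx
  | first
  | map(. + ($idx[.domain] // {}))
' "$dir/1.json" "$dir/2.json")
echo "$manual"
```

The result has exactly as many elements as the first file, which is the left-join behaviour relied on above.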

Having the first file as input while reading the second file into a variable using --argfile:

jq --argfile a 2.json '[JOIN(INDEX($a[]; .domain); .[]; .domain; add)]' 1.json

Or, equivalently, reading both files into one array using --slurp:

jq -s '[JOIN(INDEX(last[]; .domain); first[]; .domain; add)]' 1.json 2.json

[
  {
    "domain": "ca944.nordvpn.com",
    "name": "Canada #944",
    "ip_address": "172.83.40.219",
    "ping_latency": 123
  },
  {
    "domain": "pl128.nordvpn.com",
    "name": "Poland #128",
    "ip_address": "194.99.105.100"
  },
  {
    "domain": "dk151.nordvpn.com",
    "name": "Denmark #151",
    "ip_address": "82.102.20.236",
    "ping_latency": 13
  },
  {
    "domain": "be148.nordvpn.com",
    "name": "Belgium #148",
    "ip_address": "82.102.19.137",
    "ping_latency": 67
  }
]

Demo: https://jqplay.org/s/WUNqbQFC_q
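Older jq manuals advise using --slurpfile in place of --argfile. A sketch of the same join with --slurpfile follows, assuming your jq supports that option; the sample data and temporary directory are made up for the example. The one difference is that the variable then holds an array of the file's JSON texts, so the file's own array is reached as $a[0].

```shell
# Hypothetical miniature versions of 1.json and 2.json
dir=$(mktemp -d)
cat > "$dir/1.json" <<'EOF'
[{"domain":"a.example","name":"A"},
 {"domain":"b.example","name":"B"}]
EOF
cat > "$dir/2.json" <<'EOF'
[{"domain":"a.example","name":"A","ping_latency":123},
 {"domain":"x.example","name":"X","ping_latency":"REMOVED"}]
EOF

# --slurpfile wraps the file contents in an array, hence $a[0][] instead of $a[]
slurped=$(jq -c --slurpfile a "$dir/2.json" \
  '[JOIN(INDEX($a[0][]; .domain); .[]; .domain; add)]' "$dir/1.json")
echo "$slurped"
```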
