使用 JQ - 连接 2 个结构不同但具有公共密钥的 json 文件

发布于 2025-01-09 09:59:48 字数 1806 浏览 1 评论 0原文

我有 2 个文件想要基于公用密钥加入。

第一个文件是一个数组,其中:

{"_time": "2022-02-20T23","csp_name": "1","tool_bf_id": "1234", "dvc_ssr": "aa-1111"}
{"_time": "2022-02-20T23","csp_name": "2","tool_bf_id": "4567", "dvc_ssr": "aa-2222"}
{"_time": "2022-02-20T23","csp_name": "3","tool_bf_id": "1357", "dvc_ssr": "null"}
{"_time": "2022-02-20T23","csp_name": "4","tool_bf_id": "2468", "dvc_ssr": "aa-1111"}
{"_time": "2022-02-20T23","csp_name": "5","tool_bf_id": "1246", "dvc_ssr": "null"}

第二个文件也是一个数组,是所有不同“dvc_ssr”的完整列表,其中:

{"host": "hostId1","dvc_ssr": "aa-1111"}
{"host": "hostId2","dvc_ssr": "aa-2222"}

我试图通过使用 dvc_ssr 键将第二个文件中的信息添加到第一个文件中来加入这两个文件价值。

我期待这样的事情:

{"_time": "2022-02-20T23","csp_name": "1","tool_bf_id": "1234", "dvc_ssr": "aa-1111","host": "hostId1"}
{"_time": "2022-02-20T23","csp_name": "2",,"tool_bf_id": "4567", "dvc_ssr": "aa-2222","host": "hostId2"}
{"_time": "2022-02-20T23","csp_name": "3","tool_bf_id": "1357", "dvc_ssr": "null"}
{"_time": "2022-02-20T23","csp_name": "4","tool_bf_id": "2468", "dvc_ssr": "aa-1111","host": "hostId1"}
{"_time": "2022-02-20T23","csp_name": "5","tool_bf_id": "1246", "dvc_ssr": "null"}

/group_by/add 的想法。

经过一些研究,我发现了使用flatten group_by(.dvc_ssr) | group_by(.dvc_ssr) | map(reduce .[] as $x ({}; . * $x))

[.[1] + . [0]| group_by(.dvc_ssr) [] | group_by(.dvc_ssr) [] | add]

但这不起作用。 问题是通过对它们进行分组,我将“null”以及具有相同“dvc_ssr”的那些分组。最后我失去了一些记录。

我能够使用 jq -s (--slurp) 连接文件并尝试对数组

jq -s '[.[0] + .[1] | group_by(.dvc_ssr) []]' file1.json file2.json

然后我可以使用以下方法从第二个文件中删除未使用的记录: |map(select(if ._time!=null then .else empty end))| .[]

这个想法实际上是像 SQL 中那样进行 JOIN,其中“dvc_ssr”是相同的。

I have 2 files that I would like to join based a common key.

The first file is an array with:

{"_time": "2022-02-20T23","csp_name": "1","tool_bf_id": "1234", "dvc_ssr": "aa-1111"}
{"_time": "2022-02-20T23","csp_name": "2","tool_bf_id": "4567", "dvc_ssr": "aa-2222"}
{"_time": "2022-02-20T23","csp_name": "3","tool_bf_id": "1357", "dvc_ssr": "null"}
{"_time": "2022-02-20T23","csp_name": "4","tool_bf_id": "2468", "dvc_ssr": "aa-1111"}
{"_time": "2022-02-20T23","csp_name": "5","tool_bf_id": "1246", "dvc_ssr": "null"}

The second file is also an array, a complete list of all the different "dvc_ssr" with:

{"host": "hostId1","dvc_ssr": "aa-1111"}
{"host": "hostId2","dvc_ssr": "aa-2222"}

I am trying to join both files by adding information from the second file into the first one using the dvc_ssr key value.

I am expecting something like this:

{"_time": "2022-02-20T23","csp_name": "1","tool_bf_id": "1234", "dvc_ssr": "aa-1111","host": "hostId1"}
{"_time": "2022-02-20T23","csp_name": "2",,"tool_bf_id": "4567", "dvc_ssr": "aa-2222","host": "hostId2"}
{"_time": "2022-02-20T23","csp_name": "3","tool_bf_id": "1357", "dvc_ssr": "null"}
{"_time": "2022-02-20T23","csp_name": "4","tool_bf_id": "2468", "dvc_ssr": "aa-1111","host": "hostId1"}
{"_time": "2022-02-20T23","csp_name": "5","tool_bf_id": "1246", "dvc_ssr": "null"}

After some research I found ideas using flatten/group_by/add ..

flatten | group_by(.dvc_ssr) | map(reduce .[] as $x ({}; . * $x))

or

[.[1] + . [0] | group_by(.dvc_ssr) [] | add]

But this is not working out.
The issue is by grouping them, I am grouping the "null" and also those which have the same "dvc_ssr". At then end I am losing some records.

I was able to join the files with jq -s (--slurp) and try to group array

jq -s '[.[0] + .[1] | group_by(.dvc_ssr) []]' file1.json file2.json

and I can remove then the not used record from the second file, by using:
|map(select(if ._time!=null then . else empty end))| .[]

The idea is really to do a JOIN like in SQL where "dvc_ssr" are identical.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

青巷忧颜 2025-01-16 09:59:48

使用 JOIN 的所有参数,您可能会

jq --slurpfile fst first.json --slurpfile snd second.json -nc '
  JOIN(INDEX($snd[]; .dvc_ssr); $fst[]; .dvc_ssr; add)
'
{"_time":"2022-02-20T23","csp_name":"1","tool_bf_id":"1234","dvc_ssr":"aa-1111","host":"hostId1"}
{"_time":"2022-02-20T23","csp_name":"2","tool_bf_id":"4567","dvc_ssr":"aa-2222","host":"hostId2"}
{"_time":"2022-02-20T23","csp_name":"3","tool_bf_id":"1357","dvc_ssr":"null"}
{"_time":"2022-02-20T23","csp_name":"4","tool_bf_id":"2468","dvc_ssr":"aa-1111","host":"hostId1"}
{"_time":"2022-02-20T23","csp_name":"5","tool_bf_id":"1246","dvc_ssr":"null"}

根据您的应用程序上下文来删除其中一些参数。例如,仅包含两个参数的 JOIN 接受流作为输入,并允许您在 JOIN 外部处理连接。

相同的示例(注意 -s 而不是 -n,以及 JOIN(…)[] 而不是 JOIN(…)):

jq --slurpfile snd second.json -sc '
  JOIN(INDEX($snd[]; .dvc_ssr); .dvc_ssr)[] | add
' first.json

Using all parameters of JOIN, you'd do

jq --slurpfile fst first.json --slurpfile snd second.json -nc '
  JOIN(INDEX($snd[]; .dvc_ssr); $fst[]; .dvc_ssr; add)
'
{"_time":"2022-02-20T23","csp_name":"1","tool_bf_id":"1234","dvc_ssr":"aa-1111","host":"hostId1"}
{"_time":"2022-02-20T23","csp_name":"2","tool_bf_id":"4567","dvc_ssr":"aa-2222","host":"hostId2"}
{"_time":"2022-02-20T23","csp_name":"3","tool_bf_id":"1357","dvc_ssr":"null"}
{"_time":"2022-02-20T23","csp_name":"4","tool_bf_id":"2468","dvc_ssr":"aa-1111","host":"hostId1"}
{"_time":"2022-02-20T23","csp_name":"5","tool_bf_id":"1246","dvc_ssr":"null"}

Depending on your application context, you could probably cut off some of them. For instance, JOIN with just two parameters accepts the stream as input and lets you process the joining outside of JOIN.

Same example (notice -s instead of -n, and JOIN(…)[] instead of JOIN(…)):

jq --slurpfile snd second.json -sc '
  JOIN(INDEX($snd[]; .dvc_ssr); .dvc_ssr)[] | add
' first.json
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文