Iteratively splitting the output of a stream on STDIN

Posted on 2025-01-30 18:20:41

I have a large JSON file that I am streaming with jq.

This can be used as a test file:

{
    "a": "some",
    "b": [
        {
            "d": "some"
        },
        {
            "d": "some"
        },
        {
            "d": "some"
        },
        {
            "d": "some"
        },
        {
            "d": "some"
        },
        {
            "d": "some"
        },
        {
            "d": "some"
        },
        {
            "d": "some"
        },
        {
            "d": "some"
        },
        {
            "d": "some"
        }
    ]
}

I am trying to save separate files once a defined number of lines has been provided on STDIN. Multiple answers (How can I split one text file into multiple *.txt files?, How can I split a large text file into smaller files with an equal number of lines?, Using jq how can I split a very large JSON file into multiple files, each a specific quantity of objects?, Split a JSON array into multiple files using command line tools) suggest piping the output of the initial command into split:

jq -c --stream 'fromstream(0|truncate_stream(inputs|select(.[0][0]=="b")| del(.[0][0:2])))' ex.json | split -l 4 --numeric-suffixes=1 - part_ --additional-suffix=.json

This works. However, based on my understanding of | in Unix, it takes the output of the first command and sends it to the second, so STDIN would contain all of the lines (making the streaming useless, although STDIN would probably not run out of memory, since it can be saved on disk).

I have read that xargs can send a predefined number of lines to a command, so I tried this:

jq -c --stream 'fromstream(0|truncate_stream(inputs|select(.[0][0]=="b")| del(.[0][0:2])))' ex.json | xargs -I -l5 split -l 4 --numeric-suffixes=1 - part_ --additional-suffix=.json

However, no output is generated, and the | is still there, so I assume I would get the same behavior. In addition, I believe split would overwrite the previously created files, as each run would be a new invocation.

Does anyone have any advice? Am I missing something in my unix terminal knowledge?

(The question How to 'grep' a continuous stream? shows how to grep a continuous stream using the --line-buffered approach. Is there an equivalent for split?)


Comments (1)

百思不得你姐 2025-02-06 18:20:41

As commented by @Fravadona:

"the | in unix, it takes the output of the first command and sends it to the second so STDIN will contain all of the lines"

No, the commands in a pipe run in parallel; each command might do a little buffering internally for optimizing IO though.
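The concurrency can be checked with a small experiment that needs neither jq nor split. This is only a sketch: the brace group on the left stands in for a slow producer, and the names demo_dir, marker and result are made up for the demonstration. The consumer records whether the producer had already finished at the moment the first line arrived.

```shell
#!/bin/sh
# Scratch directory so the demo does not touch existing files.
demo_dir=$(mktemp -d)
marker="$demo_dir/producer_done"   # created only when the producer finishes
result="$demo_dir/result.txt"      # what the consumer observed

{
  printf 'line 1\n'   # written to the pipe right away
  sleep 1             # simulate a slow producer (e.g. parsing a large file)
  printf 'line 2\n'
  : > "$marker"       # the producer is done
} | {
  IFS= read -r first  # returns as soon as the first line is available
  if [ -e "$marker" ]; then
    echo "got '$first' AFTER the producer finished" > "$result"
  else
    echo "got '$first' BEFORE the producer finished" > "$result"
  fi
  cat > /dev/null     # drain the remaining input
}

cat "$result"
```

If the pipe delivered data only after the first command had exited, the consumer would report AFTER. Because both sides run at the same time, it should report BEFORE. A real producer such as jq may hold back output in its own buffer when writing to a pipe, which delays delivery somewhat but does not change the picture.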

So the indicated command has the expected behavior.
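To make the idea of chunking a stream line by line concrete, here is a rough sketch that does the same job with awk instead of split. It is not how split is implemented, only an illustration under assumptions: the while loop is a stand-in for the jq stream, the chunk size of 4 and the part_NN.json naming mirror the command in the question, and demo_dir is a made-up scratch directory.

```shell
#!/bin/sh
# Work in a scratch directory so existing part_* files are not touched.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

# Stand-in for the jq stream: ten compact JSON objects, one per line.
i=1
while [ "$i" -le 10 ]; do
  printf '{"d":"some"}\n'
  i=$((i + 1))
done |
awk -v size=4 '
  # At the first line of every chunk, close the previous file and open the next.
  (NR - 1) % size == 0 {
    if (out != "") close(out)
    out = sprintf("part_%02d.json", ++n)
  }
  # Write the current line; awk never needs the whole input at once.
  { print > out }
'

# Show how many lines ended up in each chunk.
wc -l part_*.json
```

The consumer handles one line at a time and keeps only the current output file open, so a single long-running process can write all chunks without ever seeing the complete input. That is also why a single split at the end of the pipe does not overwrite its own earlier files.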
