Is the order in which tee prints to standard output guaranteed?

Posted on 2024-09-07 12:15:04

You can split a pipe using the tee command under Linux as follows:

printf "line1\nline2\nline3\n" | tee >(wc -l ) | (awk '{print "this is awk: "$0}')

which yields the output

this is awk: line1
this is awk: line2
this is awk: line3
this is awk: 3

My question: is that order of printing guaranteed? Will the tee branch that counts the lines always print at the end? Is there a way to always print it at the start? Or is the order of tee's output never guaranteed?

夜深人未静 2024-09-14 12:15:10

~~I don't think that you can count on it. The wc here runs in a separate process, so there is no synchronization.~~ My trial run suggests that it might be (at least in bash). As Daenyth explains, this particular case is special, but try it with grep -o line instead of wc and see what you get.

That said, on my MacBook I get:

$ printf "line1\nline2\nline3\nline4\nline5\n" | tee >(grep -o line ) | (awk '{print "this is awk: "$0}')
this is awk: line1
this is awk: line2
this is awk: line3
this is awk: line4
this is awk: line5
this is awk: line
this is awk: line
this is awk: line
this is awk: line
this is awk: line

very consistently. I'd have to read the bash man page very closely to be sure.

Similarly:

$ printf "line1\nline2\nline3\nline4\nline5\n" | tee >(awk '{print "*" $0 "*"}' ) | (awk '{print "this is awk: "$0}')
this is awk: line1
this is awk: line2
this is awk: line3
this is awk: line4
this is awk: line5
this is awk: *line1*
this is awk: *line2*
this is awk: *line3*
this is awk: *line4*
this is awk: *line5*

every time... and

$ printf "line1\nline2\nline3\nline4\nline5\n" | tee >(awk '{print "*" $0 "*"}' ) | (grep line)
line1
line2
line3
line4
line5
*line1*
*line2*
*line3*
*line4*
*line5*

旧瑾黎汐 2024-09-14 12:15:09

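I suspect that in this case wc is waiting for EOF, so it won't return (or print any output) until the first command has finished sending input, whereas awk works line by line and so will always print first. I don't know whether the order is defined when the data is sent to other processes.

Why not have awk count the lines itself before printing them?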
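
One way to read that suggestion, as a minimal sketch: awk holds every line in memory and prints the total before replaying the lines. The helper name `count_first` is made up for this example, and the approach assumes the input is small enough to buffer.

```shell
# count_first is a hypothetical helper name used only for this sketch.
count_first() {
  awk '
    { lines[NR] = $0 }              # hold each input line in memory
    END {
      print "this is awk: " NR      # NR is the total line count at this point
      for (i = 1; i <= NR; i++)     # replay the buffered lines in input order
        print "this is awk: " lines[i]
    }'
}

printf 'line1\nline2\nline3\n' | count_first
# this is awk: 3
# this is awk: line1
# this is awk: line2
# this is awk: line3
```

Because a single process produces all of the output, the order no longer depends on scheduling.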

鸢与 2024-09-14 12:15:07

It is not defined by tee, but as Daenyth says, wc won't be finished until tee has finished passing it data - so usually tee will have passed it on to awk by then too. In this instance it might be better to have awk do the counting.

echo -ne {one,two,three,four}\\n | \
awk '{print "awk processing line " NR ": "$0} END {print "Awk saw " NR " lines"}'

The downside is that awk won't know the count until it finishes, so printing the count first requires buffering the data. In your example, both tee and wc have stdout connected to the same pipe (stdin for awk), but the order in which their output arrives there is undefined. cat (and most other piping tools) can be used to assemble files in a known order.
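
As a rough illustration of assembling the pieces with cat, the sketch below spools the stream into regular files first and only then concatenates them. The function name `count_then_lines` and the file names are assumptions made for this example, and the `tr -d ' '` is there only because some wc implementations pad the count with spaces.

```shell
# count_then_lines is a hypothetical helper for this sketch.
# The ( ... ) body runs in a subshell, so its variables stay local.
count_then_lines() (
  dir=$(mktemp -d) || exit 1
  # tee saves a copy of stdin in a regular file while wc counts the same stream.
  tee "$dir/lines" | wc -l | tr -d ' ' > "$dir/count"
  # cat reads its arguments strictly in order: the count, then the lines.
  cat "$dir/count" "$dir/lines"
  rm -rf "$dir"
)

printf 'line1\nline2\nline3\n' | count_then_lines | awk '{print "this is awk: "$0}'
# this is awk: 3
# this is awk: line1
# this is awk: line2
# this is awk: line3
```

The trade-off is disk space instead of memory: nothing reaches awk until the whole input has been written out.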

There are more advanced piping techniques that could be used, such as bash coprocesses (coproc) or named pipes (mkfifo or mknod p). The latter give you names in the filesystem, which can be passed to other processes, but you'll have to clean them up and avoid collisions. tempfile or $$ may be useful for that. Pipes are not meant for buffering data: they often have a limited size, and writes simply block once they are full.

An example of where pipes are the wrong solution:

mkfifo wcin wcout
wc -l < wcin > wcout &
yes | dd count=1 bs=8M | tee wcin | cat -n wcout - | head

The problem here is that tee will get stuck trying to write things to cat, which wants to finish with wcout first. There's simply too much data for the pipe from tee to cat.
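
A possible way around that deadlock, sketched under the assumption that temporary disk space is acceptable: keep the named pipe for wc's input, but let tee's main output go to a regular file, which never blocks the writer the way a full pipe does. The function name `count_first_fifo` and the file names are invented for this example; `mktemp -d` is used here to avoid name collisions, and a trap handles the cleanup.

```shell
# count_first_fifo is a hypothetical helper for this sketch.
# The ( ... ) body runs in a subshell, so the EXIT trap fires when it ends.
count_first_fifo() (
  dir=$(mktemp -d) || exit 1
  trap 'rm -rf "$dir"' EXIT          # remove the FIFO and spool files afterwards
  mkfifo "$dir/wcin"
  # The reader side of the FIFO: count lines and store the result in a file.
  wc -l < "$dir/wcin" | tr -d ' ' > "$dir/count" &
  # tee feeds the FIFO and spools the main stream to a regular file.
  tee "$dir/wcin" > "$dir/lines"
  wait                               # make sure wc has written the count
  cat "$dir/count" "$dir/lines"      # known order: count first, then the data
)

seq 100000 | count_first_fifo | head -n 3
# 100000
# 1
# 2
```

The input here is well beyond what a pipe typically holds, yet nothing stalls, because no stage has to wait on a reader that is itself waiting on something else.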

Edit regarding dmckee's answer:
Yes, the order may be repeatable, but it is not guaranteed. It is a matter of scale, scheduling and buffer sizes. On this GNU/Linux box, the example starts breaking up after a few thousand lines:

seq -f line%g 20000 | tee >(awk '{print "*" $0 "*"}' ) | \
(awk '{print "this is awk: "$0}') | less
this is awk: line2397
this is awk: line2398
this is awk: line2*line1*
this is awk: *line2*
this is awk: *line3*
