Is the order in which tee prints to standard output guaranteed?
You can split a pipe using the tee command under Linux, as follows:
printf "line1\nline2\nline3\n" | tee >(wc -l ) | (awk '{print "this is awk: "$0}')
which yields the output
this is awk: line1
this is awk: line2
this is awk: line3
this is awk: 3
My question: is that order of printing guaranteed? Will the output of the tee split pipe that counts the number of lines always print at the end? Is there a way to always print it at the start? Or is the order of printing with tee never guaranteed?
Comments (3)
I don't think that you can count on it. The wc here runs in a separate process, so there is no synchronization. My trial run suggests that it might be (at least in bash). As Daenyth explains, this particular case is special, but try it with grep -o line instead of wc and see what you get.
That said, on my MacBook I get:
very consistently. I'd have to read the bash man page very closely to be sure.
Similarly:
every time... and
I suspect that in this case, wc is waiting for EOF, and so it will not return (or print output) until the first command is done sending input, whereas awk acts line by line and so will always print first. I don't know if it's defined when sending to other processes.
Why not just have awk count the lines before printing the lines themselves?
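A minimal sketch of that suggestion, assuming the whole input fits in memory: awk stores every line, then prints the count followed by the lines once it reaches the end of input. The function name count_then_print is hypothetical and not part of the original question.

```shell
# Hypothetical helper: buffer all lines in awk, print the count first.
count_then_print() {
    awk '
        { lines[NR] = $0 }                  # remember each line by its number
        END {
            print "this is awk: " NR        # NR is the total line count here
            for (i = 1; i <= NR; i++)
                print "this is awk: " lines[i]
        }
    '
}

printf "line1\nline2\nline3\n" | count_then_print
```

Because a single process produces all of the output, the order no longer depends on scheduling between tee, wc and awk.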
It is not defined by tee, but as Daenyth says, wc won't be finished until tee has finished passing it data - so usually tee will have passed it on to awk by then too. In this instance it might be better to have awk do the counting.
The downside is that awk won't know the number until it finishes, so printing the count first requires buffering the data. In your example, both tee and wc have stdout connected to the same pipe (stdin for awk), but the order in which their output arrives is undefined. cat (and most other piping tools) can be used to assemble files in a known order.
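One way to get a known order, sketched here under the assumption that a temporary file is acceptable as the buffer: save the input to a file, then emit the count and the contents one after the other. The function name count_first_buffered is hypothetical, and mktemp is assumed to be available (it is on common GNU and BSD systems).

```shell
# Hypothetical helper: buffer stdin in a temp file, then print count and data.
count_first_buffered() {
    tmp=$(mktemp) || return 1
    cat > "$tmp"          # buffer all of stdin on disk
    wc -l < "$tmp"        # the count is emitted first
    cat "$tmp"            # then the original lines, in order
    rm -f "$tmp"          # clean up the buffer
}

printf "line1\nline2\nline3\n" | count_first_buffered |
    awk '{print "this is awk: "$0}'
```

The commands inside the function run sequentially, so the count always reaches awk before the lines. Some wc implementations pad the number with leading spaces.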
There are more advanced piping techniques that could be used, such as bash coprocesses (coproc) or named pipes (mkfifo or mknod p). The latter gets you names in the filesystem, which can be passed to other processes, but you'll have to clean them up and avoid collisions. tempfile or $$ may be useful for that. Pipes are not for buffering data, as they often have limited size and will simply block writes.
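A rough sketch of the named-pipe approach, with the caveat that it is only safe for input small enough to fit in a pipe buffer; with more data, tee can block for exactly the reason given above. The function name count_first_fifo and the FIFO names wcin and wcout are assumptions for illustration, and mktemp -d is used to avoid name collisions.

```shell
# Hypothetical helper: count first via named pipes. Small inputs only.
count_first_fifo() {
    dir=$(mktemp -d) || return 1
    mkfifo "$dir/wcin" "$dir/wcout" || { rm -rf "$dir"; return 1; }
    # wc reads a copy of the data from one FIFO and writes the count to another
    wc -l < "$dir/wcin" > "$dir/wcout" &
    # cat drains the count FIFO first, then the data waiting on its stdin
    tee "$dir/wcin" | cat "$dir/wcout" -
    wait                  # reap the background wc
    rm -rf "$dir"         # remove the FIFOs from the filesystem
}

printf "line1\nline2\nline3\n" | count_first_fifo
```

cat cannot start on its stdin until wc has seen end of input, so everything tee writes in the meantime has to sit in the pipe between tee and cat.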
An example of where pipes are the wrong solution:
The problem here is that tee will get stuck trying to write things to cat, which wants to finish with wcout first. There's simply too much data for the pipe from tee to cat.
Edit regarding dmckee's answer:
Yes, the order may be repeatable, but it is not guaranteed. It is a matter of scale, scheduling and buffer sizes. On this GNU/Linux box, the example starts breaking up after a few thousand lines: