Sort & uniq in Linux shell

Posted on 2024-09-12 08:04:22

What is the difference between the following two commands?

sort -u FILE

sort FILE | uniq

旧人九事 2024-09-19 08:04:22

Using sort -u does less I/O than sort | uniq, but the end result is the same. In particular, if the file is big enough that sort has to create intermediate files, there's a decent chance that sort -u will use slightly fewer or slightly smaller intermediate files, because it can eliminate duplicates while it is sorting each set. If the data is highly duplicative, this could be beneficial; if there are in fact few duplicates, it won't make much difference (definitely a second-order performance effect, compared to the first-order effect of the pipe).
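
To make the "same end result" claim concrete, here is a minimal sketch that runs both commands on the same input and compares the output. The sample data, the temporary file, and the variable names are made up for illustration, and LC_ALL=C is set only so the ordering is predictable.

```shell
# Hypothetical sample data written to a temporary file.
tmp=$(mktemp)
printf '%s\n' pear apple pear banana apple > "$tmp"

# One process: sort and drop duplicates in a single step.
out_u=$(LC_ALL=C sort -u "$tmp")

# Two processes: sort, then let uniq drop adjacent duplicates.
out_pipe=$(LC_ALL=C sort "$tmp" | uniq)

if [ "$out_u" = "$out_pipe" ]; then
    echo "same output"
fi

rm -f "$tmp"
```

This only covers plain whole-line sorting. With key or folding options such as -k or -f, sort -u compares lines by the key alone, while uniq still compares whole lines, so the two can differ.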

Note that there are times when the piping is appropriate. For example:

sort FILE | uniq -c | sort -n

This sorts the file into order of the number of occurrences of each line in the file, with the most repeated lines appearing last. (It wouldn't surprise me to find that this combination, which is idiomatic for Unix or POSIX, can be squished into one complex 'sort' command with GNU sort.)
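
A small sketch of that frequency-count pipeline on made-up data may help. The file contents and variable names are assumptions for illustration, and the exact column padding that uniq -c prints varies between implementations, so no literal output is shown.

```shell
# Hypothetical input: "b" appears 3 times, "a" twice, "c" once.
tmp=$(mktemp)
printf '%s\n' b a b c b a > "$tmp"

# uniq -c prefixes each distinct line with its count;
# the second sort orders those lines numerically by that count.
counts=$(LC_ALL=C sort "$tmp" | uniq -c | sort -n)
printf '%s\n' "$counts"

# The most repeated line ends up last.
most_common=$(printf '%s\n' "$counts" | tail -n 1 | awk '{print $2}')
echo "most common: $most_common"

rm -f "$tmp"
```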

There are times when not using the pipe is important. For example:

sort -u -o FILE FILE

This sorts the file 'in situ'; that is, the output file is specified by -o FILE, and this operation is guaranteed safe (the file is read before being overwritten for output).
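
As a sketch of the in-place form, the following uses a throwaway temporary file (the data and names are hypothetical). The comment also notes why the obvious redirect is not a substitute.

```shell
# Hypothetical unsorted data with a duplicate.
tmp=$(mktemp)
printf '%s\n' pear apple pear banana > "$tmp"

# Safe: sort reads all of its input before it opens the -o file for writing.
LC_ALL=C sort -u -o "$tmp" "$tmp"

# Not safe (do not run on real data): sort -u "$tmp" > "$tmp"
# The shell truncates the file for the redirect before sort ever reads it.

result=$(cat "$tmp")
printf '%s\n' "$result"

rm -f "$tmp"
```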


不必你懂 2024-09-19 08:04:22

There is one slight difference: the return code.

Unless pipefail is enabled (set -o pipefail in bash), the return code of a pipeline is the return code of its last command. Here uniq reads an empty input and returns zero (success), so the failure of sort is hidden. Examine the exit codes and you'll see something like this (pipefail is not set here):

pavel@lonely ~ $ sort -u file_that_doesnt_exist ; echo $?
sort: open failed: file_that_doesnt_exist: No such file or directory
2
pavel@lonely ~ $ sort file_that_doesnt_exist | uniq ; echo $?
sort: open failed: file_that_doesnt_exist: No such file or directory
0

Other than this, the commands are equivalent.
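
For comparison, here is a rough sketch of the same pipeline with and without pipefail. It assumes bash is installed (pipefail is not available in every sh), and the missing path and variable names are hypothetical.

```shell
# Hypothetical path that is assumed not to exist.
missing=/nonexistent/file_that_doesnt_exist

# Default behaviour: the pipeline's status is that of uniq, the last command.
status_default=0
bash -c 'sort "$1" 2>/dev/null | uniq' _ "$missing" || status_default=$?

# With pipefail: a failing command anywhere in the pipeline makes it fail.
status_pipefail=0
bash -c 'set -o pipefail; sort "$1" 2>/dev/null | uniq' _ "$missing" || status_pipefail=$?

echo "without pipefail: $status_default"
echo "with pipefail:    $status_pipefail"
```

Without pipefail the status is zero even though sort failed; with it, the status is the non-zero one from sort. In bash, the PIPESTATUS array is another way to inspect each stage's exit code.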
