Which is faster for printing columns: awk or cut?
I have a file with about 100 million rows and 6 space-separated fields; each field is a seven-digit number.
I would like to delete the 2nd field, and can do so with either of the following:
1. awk '{print $1,$3,$4,$5,$6}' input.txt
2. cut --delimiter=' ' --fields=1,3-6 input.txt
Which one produces the desired output faster? Is there a way to time the process?
Thank you for your help.
Comments (2)
Yes. Just prepend the time command to each of your commands and it will report how long the command took. Do it for each one.
With a quick bit of profiling, it looks like cut just barely wins out in this scenario. It's still quite an impressive time for awk, considering how much more capable it is than cut.
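A minimal sketch of the timing approach described above, assuming bash and GNU cut (for the long options used in the question). The file name sample.txt and its size (100,000 rows) are placeholders chosen for illustration, not the asker's data. Actual timings will vary with the machine, the installed awk implementation, and the disk cache.

```shell
#!/usr/bin/env bash
# Build a small stand-in input: 100000 rows, 6 space-separated 7-digit fields.
# (sample.txt and the row count are assumptions for illustration.)
awk 'BEGIN {
    srand(1)
    for (i = 0; i < 100000; i++) {
        line = ""
        for (j = 1; j <= 6; j++)
            line = line (j > 1 ? " " : "") sprintf("%07d", int(rand() * 10000000))
        print line
    }
}' > sample.txt

# Time each command; output goes to /dev/null so terminal I/O is not measured.
time awk '{print $1,$3,$4,$5,$6}' sample.txt > /dev/null
time cut --delimiter=' ' --fields=1,3-6 sample.txt > /dev/null
```

The time keyword prints real, user, and sys times to standard error after each command finishes; the real figure is the wall-clock time to compare.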
I've tested it, and the result is:
I did my testing with a file of about 2 million rows:
The test is just a cut with a standard delimiter, printing to a file.
As you can see, awk is about 3x faster in this case (try the same test yourself).
Demonstration:
Here are some docs about the reasons awk can be faster:
http://www.linuxquestions.org/questions/programming-9/which-one-is-efficient-cut-cmd-or-using-awk-783673/
https://lyness.io/the-functional-and-performance-differences-of-sed-awk-and-other-unix-parsing-utilities
Hope it helps
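For anyone who wants to repeat this kind of test, here is a rough sketch, assuming bash and GNU cut. The names bench.txt, out_awk.txt, and out_cut.txt and the 200,000-row size are hypothetical, not the file used above. Results can depend on the awk implementation, the locale, and the input size, so the sketch sets LC_ALL=C for both tools and repeats each run; whether the locale matters on a given system is something to measure rather than assume.

```shell
#!/usr/bin/env bash
# Stand-in input: 200000 rows of six 7-digit fields.
# (bench.txt and the row count are assumptions, not the answerer's file.)
awk 'BEGIN {
    for (i = 0; i < 200000; i++)
        printf "%07d %07d %07d %07d %07d %07d\n", i, i + 1, i + 2, i + 3, i + 4, i + 5
}' > bench.txt

# Use the C locale for both tools so they run under the same locale settings.
export LC_ALL=C

# Run each command a few times; the first run may include disk-cache warm-up.
for run in 1 2 3; do
    echo "run $run: awk" >&2
    time awk '{print $1,$3,$4,$5,$6}' bench.txt > out_awk.txt
    echo "run $run: cut" >&2
    time cut --delimiter=' ' --fields=1,3-6 bench.txt > out_cut.txt
done

# Both commands should produce byte-identical output on this input.
cmp out_awk.txt out_cut.txt && echo "outputs match"
```

Comparing the two output files first is worthwhile: a speed comparison only means something if both commands produce the same result on the data in question.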