Which is faster for printing columns: awk or cut?

I have about 100 million rows with 6 fields separated by spaces; each field contains a seven-digit number.

I'd like to delete the 2nd field, and I can achieve it with either of the following:

1. awk '{print $1,$3,$4,$5,$6}' input.txt

2. cut --delimiter=' ' --fields=1,3-6 input.txt

Which one produces the desired output faster? Is there a way to time the process?

Thank you for your help.


酒与心事 2024-10-13 09:10:26

Is there a way to time the process?

Yes. Just prepend the command time before your code and it will return how long it took. Do it for each one.

time awk '{print $1,$3,$4,$5,$6}' input.txt
time cut --delimiter=' ' --fields=1,3-6 input.txt

With a quick bit of profiling it looks like cut just barely wins out in this scenario. It's still quite an impressive time for awk considering how much more capable it is than cut.

cut

$ time for i in {1..1000}; do cut --delimiter=' ' --fields=1,3-6 >/dev/null <<<"one two three four five six seven"; done

real    0m4.074s
user    0m0.496s
sys     0m2.799s

awk

$ time for i in {1..1000}; do awk '{print $1,$3,$4,$5,$6}' >/dev/null <<<"one two three four five six seven"; done

real    0m4.511s
user    0m0.728s
sys     0m3.165s
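
For input on the scale described in the question, a more representative benchmark is to time each command against one large file rather than a one-line here-string in a loop, which mostly measures process start-up cost. A minimal sketch, assuming GNU awk and cut and a writable working directory; the file name big_input.txt and the 1-million-row size are made up for illustration:

# Generate a test file with six space-separated seven-digit fields per row.
awk 'BEGIN { srand(); for (i = 0; i < 1000000; i++) { for (j = 1; j <= 6; j++) printf "%07d%s", int(rand() * 10000000), (j < 6 ? " " : "\n") } }' > big_input.txt

# Time each approach on the same file, discarding the output.
time awk '{print $1,$3,$4,$5,$6}' big_input.txt > /dev/null
time cut --delimiter=' ' --fields=1,3-6 big_input.txt > /dev/null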
远山浅 2024-10-13 09:10:26

I've tested it, and the result is:

AWK IS FASTER

I did my testing with a file of ~2 million rows:

It's just a cut with a standard delimiter, printing to a file.

As you can see, AWK is ~3x faster in this case (try the same yourself).

Demonstration:

# wc -l prueba
2088036 prueba    
# cat test.sh
date +%s
awk '{print $2}' prueba > ok
date +%s
cut -d" " -f2 prueba > ok2
date +%s
# ./test.sh
1484848197
1484848199
1484848204
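
Note that date +%s only has one-second resolution, so the differences above are coarse. A sketch of the same test with millisecond output, assuming GNU date (which supports %N for nanoseconds) and the same input file prueba:

# Same comparison, printed in milliseconds.
start=$(date +%s%N)
awk '{print $2}' prueba > ok
mid=$(date +%s%N)
cut -d" " -f2 prueba > ok2
end=$(date +%s%N)
echo "awk: $(( (mid - start) / 1000000 )) ms"
echo "cut: $(( (end - mid) / 1000000 )) ms"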

Here are some docs about the reasons that make awk faster:

http://www.linuxquestions.org/questions/programming-9/which-one-is-efficient-cut-cmd-or-using-awk-783673/

https://lyness.io/the-functional-and-performance-differences-of-sed-awk-and-other-unix-parsing-utilities

Hope it helps
