About awk, the shell, and pipes in Linux

Posted on 2024-12-21 00:49:22


Hi everyone, I am processing a log file of about 5 million lines, so I am using awk from the shell on Linux.

I need to extract the domains and find the 100 most frequent ones in the log, so I wrote this:

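          # Extract field 19, keep the part before the first "/", count each
          # domain, then sort by count and keep the 100 largest.
          awk '{print $19}' "$1" |
          awk '{ split($0, string, "/"); print string[1] }' |
          awk '{ domains[$0]++ } END { for (j in domains) print domains[j], j }' |
          sort -n | tail -n 100 > "$2"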

It runs in about 13 seconds.

Then I changed the script to this:

          # Same computation in a single awk process.
          awk '{ split($19, string, "/"); domains[string[1]]++ }
               END { for (j in domains) print domains[j], j }' "$1" |
          sort -n | tail -n 100 > "$2"

It runs in about 21 seconds.

Why?

I expected a single awk invocation to reduce the total amount of computation, since it reads each line only once, but the running time increased.

If you know the answer, please tell me.


倾城月光淡如水﹏ 2024-12-28 00:49:22

When you pipe commands together, they run concurrently: each stage works on data as soon as the previous stage has written it into the pipe.

So my guess is that in the first version the work is spread across your CPU cores, while in the second all of it is done by a single core.
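
A quick way to see that the stages of a pipeline run at the same time is to make both sides slow. This is only a sketch for illustration (the `sleep` durations and messages are made up, not taken from the question): if the two stages ran one after the other it would take about 4 seconds, but run concurrently it takes about 2.

```shell
#!/usr/bin/env bash
# Each side of the pipe sleeps for 2 seconds before doing its work.
# Sequential execution would need about 4 seconds in total;
# concurrent execution needs only about 2.
start=$(date +%s)
{ sleep 2; echo "left side done"; } | { sleep 2; cat; }
end=$(date +%s)
elapsed=$((end - start))
echo "elapsed: ${elapsed}s"
```

The same overlap is what lets the three-awk pipeline keep several cores busy at once.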

You can verify this with top (or, better, htop).
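
If you would rather have numbers than watch top, bash's `time` keyword can also hint at this: when `real` is clearly smaller than `user` + `sys`, more than one core was busy. A rough sketch, assuming bash and a synthetic log generated on the fly (the `piped` and `single` function names and the `hostN.example.com` data are hypothetical; point the functions at your real log for a meaningful comparison):

```shell
#!/usr/bin/env bash
# Build a synthetic log (made-up data): 18 filler fields, then "domain/path"
# as field 19, which is the layout the question's scripts assume.
log=$(mktemp)
awk 'BEGIN {
    prefix = ""
    for (f = 1; f <= 18; f++) prefix = prefix "x "
    for (i = 1; i <= 200000; i++)
        print prefix "host" (i % 50) ".example.com/page" i
}' > "$log"

# Version 1 from the question: three awk processes connected by pipes.
piped() {
    awk '{print $19}' "$1" |
    awk '{ split($0, s, "/"); print s[1] }' |
    awk '{ d[$0]++ } END { for (j in d) print d[j], j }' |
    sort -n | tail -n 100
}

# Version 2 from the question: one awk process does all the work.
single() {
    awk '{ split($19, s, "/"); d[s[1]]++ } END { for (j in d) print d[j], j }' "$1" |
    sort -n | tail -n 100
}

# The time keyword reports real, user and sys for the whole pipeline.
time piped "$log" > /dev/null
time single "$log" > /dev/null
rm -f "$log"
```

On a file this small the difference may be within noise, so treat the numbers as a hint rather than a measurement.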

Out of curiosity, is this faster? (untested):

cut -f 19 -d' ' "$1" | cut -f1 -d'/' | sort | uniq -c | sort -nr | head -n 100 > "$2"
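
One caveat with the `cut` version: `cut -d' '` treats every single space as a delimiter, while awk collapses runs of whitespace, so the two only agree when the log is strictly single-space separated. A small sketch to check that both approaches produce the same counts on such data (the sample lines and the `top_domains_cut` / `top_domains_awk` names are made up for illustration; `uniq -c` pads its counts, so the output is normalized with awk):

```shell
#!/usr/bin/env bash
# cut-based pipeline from the answer, with the padded uniq -c output
# normalized to "count domain".
top_domains_cut() {
    cut -f 19 -d' ' "$1" | cut -f1 -d'/' | sort | uniq -c |
    sort -nr | head -n 100 | awk '{print $1, $2}'
}

# Single-awk counting from the question, sorted the same way.
top_domains_awk() {
    awk '{ split($19, s, "/"); d[s[1]]++ } END { for (j in d) print d[j], j }' "$1" |
    sort -nr | head -n 100
}

# Sample data: 18 single-space-separated filler fields, then "domain/path".
sample=$(mktemp)
prefix=$(printf 'f%d ' $(seq 1 18))
for url in a.com/x b.com/y a.com/z c.com/ a.com/w b.com/q; do
    printf '%s%s\n' "$prefix" "$url"
done > "$sample"

# Both calls should print:
#   3 a.com
#   2 b.com
#   1 c.com
top_domains_cut "$sample"
top_domains_awk "$sample"
rm -f "$sample"
```

If the real log has repeated spaces or tabs between fields, field 19 as seen by `cut` will not be the same column that awk calls `$19`.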


About the author: 救赎№
