Sorting unique URLs from a log

Posted 2024-12-16 17:21:07

I need to get the unique URLs from a web log and then sort them. I was thinking of using the grep, uniq, and sort commands and writing the output to another file.

I executed this command:

cat access.log | awk '{print $7}' > url.txt

Then, to keep only the unique ones and sort them:

cat url.txt | uniq | sort > urls.txt

The problem is that I can still see duplicates, even though the file is sorted, so the command clearly ran. Why?

月亮是我掰弯的 2024-12-23 17:21:08

Try something like this:

cat url.txt | sort | uniq

情场扛把子 2024-12-23 17:21:08

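For nginx access logs, this lists each distinct URL that was requested:

 sed -E -n "s/.*(GET|POST|PUT|DELETE|HEAD) ([^ ]*) HTTP.*/\2/p" /var/log/nginx/access.log | sort -u

(uniq -u would be wrong at the end of this pipeline: it prints only lines that occur exactly once, so any URL requested more than once would vanish from the output. The -n option together with the trailing p flag also stops non-matching log lines from being printed whole.)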

Reference:
https://www.guyrutenberg.com/2008/08/10/generating-url-list-from-access-log-access_log/
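A runnable sketch of this sed approach, fed with three invented log lines in nginx's default "combined" format instead of the real /var/log/nginx/access.log. The helper name unique_urls and the sample data are hypothetical; sort -u is used so each URL appears exactly once, however many times it was requested.

```shell
#!/bin/sh
# Hypothetical helper: print each distinct request path found in log file $1.
# -E: extended regex; -n plus the trailing p flag: print only lines where the
# substitution matched, so unparseable lines are skipped instead of echoed.
unique_urls() {
    sed -E -n 's/.*(GET|POST|PUT|DELETE|HEAD) ([^ ]*) HTTP.*/\2/p' "$1" | sort -u
}

# Invented sample lines in nginx's default "combined" format.
log=$(mktemp)
cat > "$log" <<'EOF'
203.0.113.5 - - [16/Dec/2024:17:21:07 +0000] "GET /index.html HTTP/1.1" 200 612 "-" "curl/8.5.0"
203.0.113.9 - - [16/Dec/2024:17:21:08 +0000] "POST /api/login HTTP/1.1" 302 0 "-" "curl/8.5.0"
203.0.113.5 - - [16/Dec/2024:17:21:09 +0000] "GET /index.html HTTP/1.1" 304 0 "-" "curl/8.5.0"
EOF

unique_urls "$log"
# /api/login
# /index.html

rm -f "$log"
```

/index.html is requested twice in the sample but printed once; with uniq -u it would not be printed at all.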

不忘初心 2024-12-23 17:21:07

uniq | sort doesn't work: uniq only removes consecutive (adjacent) duplicate lines.

The correct way is sort | uniq, or better, sort -u, since it runs only one process.
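A minimal, self-contained illustration of this answer, using three made-up URL lines in place of url.txt: the duplicate /a survives uniq | sort because its two copies are not adjacent when uniq sees them.

```shell
#!/bin/sh
# Made-up sample input: /a appears twice, but not on adjacent lines.
sample='/a
/b
/a'

echo "uniq | sort (wrong order):"
printf '%s\n' "$sample" | uniq | sort    # /a /a /b : the duplicate survives

echo "sort | uniq:"
printf '%s\n' "$sample" | sort | uniq    # /a /b

echo "sort -u:"
printf '%s\n' "$sample" | sort -u        # /a /b
```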

痴者 2024-12-23 17:21:07

uniq needs its input sorted, but you sorted after uniq. Try:

$ sort -u < url.txt > urls.txt
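Combining the answers, the question's two steps can also run as one pipeline straight from access.log, with no cat and no intermediate url.txt. This is a sketch that assumes, like the question's awk '{print $7}', that the request path is the 7th whitespace-separated field, as in the common and combined log formats; the function name unique_urls_from_log is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print each distinct request path from log file $1.
# awk splits on whitespace, so field 7 is the path in common/combined format:
#   host ident user [date tz] "METHOD /path HTTP/x.y" status bytes ...
unique_urls_from_log() {
    awk '{ print $7 }' "$1" | sort -u
}

# Usage with the file names from the question:
#   unique_urls_from_log access.log > urls.txt
```

Because sort -u sorts and removes duplicates in the same step, the ordering mistake from the question cannot occur.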

About the author: 女皇必胜