Sort unique URLs from a log
I need to get the unique URLs from a web log and then sort them. I was thinking of using the grep, uniq, and sort commands and writing the output to another file.
I executed this command:
cat access.log | awk '{print $7}' > url.txt
Then, to keep only the unique entries and sort them:
cat url.txt | uniq | sort > urls.txt
The problem is that I can still see duplicates, even though the file is sorted, which means my command worked. Why?
Comments (4)
Try something like this:
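A sketch of that idea, assuming (as in the question) that the URL is the 7th whitespace-separated field:

awk '{print $7}' access.log | sort | uniq > urls.txt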
For nginx access logs, this gives the unique URLs being called:
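For example (a sketch; with nginx's default combined log format the request path is also the 7th field, and the log path below is an assumption):

awk '{print $7}' /var/log/nginx/access.log | sort -u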
Reference:
https://www.guyrutenberg.com/2008/08/10/generating-url-list-from-access-log-access_log/
uniq | sort does not work: uniq only removes contiguous duplicates. The correct way is sort | uniq, or better, sort -u, since only one process is spawned.
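You can see the contiguity requirement on a made-up input:

printf 'a\nb\na\n' | uniq          # prints a, b, a: the repeated a is not adjacent, so it stays
printf 'a\nb\na\n' | sort | uniq   # prints a, b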
uniq needs its input sorted, but you sorted after uniq. Try:
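For example (a sketch reusing the url.txt file from the question):

sort url.txt | uniq > urls.txt

or, in a single step, sort -u url.txt > urls.txt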