Grep IP addresses from a log and count occurrences
I am quite bad at using "basic?" unix commands and this question puts my knowledge even more to the test. What I would like to do is grep all IP addresses from a log (e.g. access.log from apache) and count how often they occur. Can I do that with one command or do I need to write a script for that?
You'll need a short pipeline at least.
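The command itself is not shown in the answer; the following is a sketch that matches its description (assumptions: GNU sed, a log file named access.log, and the IPv4 address at the start of each line; the printf lines stand in for real log data):

```shell
# Sample lines standing in for an Apache access.log (assumption: the IP
# is the first thing on each line).
printf '1.2.3.4 - - "GET / HTTP/1.1" 200\n1.2.3.4 - - "GET /a HTTP/1.1" 200\n5.6.7.8 - - "GET / HTTP/1.1" 404\n' > access.log

# Keep only the leading IPv4 address (4 groups of digits with periods in
# between); -e t skips to the next line after a successful substitution,
# -e d deletes lines with no IP. sort groups duplicates so uniq -c can
# count them.
sed -e 's/^\([0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\).*/\1/' \
    -e t -e d access.log | sort | uniq -c
```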
Which will print each IP (it only works with IPv4, though), sorted and prefixed with its count.

I tested it with apache2's access.log (it's configurable though, so you'll need to check), and it worked for me. It assumes the IP address is the first thing on each line.

The sed collects the IP address (actually it looks for 4 groups of digits with periods in between) and replaces the entire line with it. -e t continues to the next line if it managed to do a substitution, and -e d deletes the line (if there was no IP address on it). sort sorts.. :) And uniq -c counts instances of consecutive identical lines (which, since we've sorted them, corresponds to the total count).
None of the answers presented here worked for me, so here is a working one:

It uses grep to isolate all IPs, then sorts them, counts them, and sorts that result again.
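The concrete command is missing from the answer; a sketch of the grep-based approach it describes (assumptions: GNU grep for -o/-E and a log file named access.log; the printf line is stand-in data):

```shell
# Stand-in log lines; any text containing IPv4 addresses works.
printf '1.2.3.4 - - "GET /"\nreferrer 9.9.9.9\n1.2.3.4 - - "GET /x"\n' > access.log

# -o prints each IPv4-shaped match on its own line; sort groups duplicates,
# uniq -c counts them, and the final sort -n orders by frequency.
grep -oE '([0-9]{1,3}\.){3}[0-9]{1,3}' access.log | sort | uniq -c | sort -n
```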
You can do the following (where datafile is the name of the log file):

Edit: missed the part about counting addresses, now added:
egrep '[[:digit:]]{1,3}(\.[[:digit:]]{1,3}){3}' datafile | awk '{print $1}' | sort | uniq -c
The following is a script I wrote several years ago. It greps out addresses from apache access logs. I just tried running it on Ubuntu 11.10 (oneiric) 3.0.0-32-generic #51-Ubuntu SMP Thu Mar 21 15:51:26 UTC 2013 i686 i686 i386 GNU/Linux and it works fine. Use Gvim or Vim to read the resulting file, which will be called unique_visits and will list the unique IPs in a column. The key to this is in the lines used with grep. Those expressions work to extract the IP address numbers. IPv4 only. You may need to go through and update browser version numbers. Another similar script that I wrote for a Slackware system is here:
http://www.perpetualpc.net/srtd_bkmrk.html
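The script itself is not reproduced in the answer; a minimal sketch of the idea it describes (the log path, the demo data, and the grep expression are assumptions, and the author's browser-version handling is omitted):

```shell
#!/bin/sh
# Sketch only: LOG is a hypothetical path; point it at your real access log.
LOG=access.log
printf '1.2.3.4 x\n5.6.7.8 y\n1.2.3.4 z\n' > "$LOG"   # demo data

# Extract every IPv4-shaped token (GNU grep) and keep one copy of each
# address, written in a single column to unique_visits.
grep -oE '([0-9]{1,3}\.){3}[0-9]{1,3}' "$LOG" | sort -u > unique_visits
```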
Since in an IP address 3-digits-then-a-dot repeats itself 3 times, we can write it this way:

Even shorter, using a bash variable:

To print only the unique IP addresses in the file, pipe the output through sort -u.
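A sketch of the repeated-group idea and the variable shorthand (assumptions: GNU grep, a bash-like shell, and a file named access.log; the printf line is stand-in data):

```shell
# Stand-in log data.
printf '1.2.3.4 a\n1.2.3.4 b\n10.0.0.1 c\n' > access.log

# A dotted quad is "digits then a dot" three times, then digits once more.
grep -oE '([0-9]{1,3}\.){3}[0-9]{1,3}' access.log

# Even shorter with a shell variable holding the octet pattern;
# sort -u keeps only the unique addresses.
oct='[0-9]{1,3}'
grep -oE "($oct\.){3}$oct" access.log | sort -u
```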
Using sed:

You can search the Internet for a regex that matches IP addresses and substitute it for <regex_for_ip_address>, e.g. from answers to a related question on Stack Overflow.
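A sketch of the sed version, with a concrete IPv4 pattern standing in for <regex_for_ip_address> (assumptions: GNU sed for -E, a file named access.log, stand-in data from printf):

```shell
# Stand-in log data.
printf 'x 1.2.3.4 y\nx 1.2.3.4 z\nnothing here\n' > access.log

# -n suppresses default output; s///p prints only lines where an
# IPv4-shaped string was found, reduced to just that address.
sed -nE 's/.*(([0-9]{1,3}\.){3}[0-9]{1,3}).*/\1/p' access.log | sort | uniq -c
```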