Grep IP addresses out of a log file

Posted on 2024-11-02 06:55:10

I am quite bad at using "basic?" unix commands, and this question puts my knowledge even more to the test. What I would like to do is grep all IP addresses from a log (e.g. access.log from Apache) and count how often they occur. Can I do that with one command, or do I need to write a script for that?

8 Answers

辞慾 2024-11-09 06:55:10

You'll need a short pipeline at least.

sed -e 's/\([0-9]\+\.[0-9]\+\.[0-9]\+\.[0-9]\+\).*$/\1/' -e t -e d access.log | sort | uniq -c

This will print each IP (IPv4 only, though), sorted and prefixed with its count.

I tested it with apache2's access.log (it's configurable though, so you'll need to check), and it worked for me. It assumes the IP-address is the first thing on each line.

The sed collects the IP-addresses (actually it looks for 4 sets of digits, with periods in between), and replaces the entire line with it. -e t continues to the next line if it managed to do a substitution, -e d deletes the line (if there was no IP address on it). sort sorts.. :) And uniq -c counts instances of consecutive identical lines (which, since we've sorted them, corresponds to the total count).
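As a runnable sketch of that pipeline (the log lines below are invented, `sort -rn` is added so the busiest address comes first, and `\+` assumes GNU sed, as in the answer):

```shell
# Build a tiny illustrative access.log (made-up data).
printf '%s\n' \
  '10.0.0.1 - - [09/Nov/2024:06:55:10] "GET / HTTP/1.1" 200 512' \
  '10.0.0.2 - - [09/Nov/2024:06:55:11] "GET /a HTTP/1.1" 200 99' \
  '10.0.0.1 - - [09/Nov/2024:06:55:12] "GET /b HTTP/1.1" 404 0' > access.log

# Reduce each line to its leading IPv4 address, count, busiest first.
sed -e 's/\([0-9]\+\.[0-9]\+\.[0-9]\+\.[0-9]\+\).*$/\1/' -e t -e d access.log \
  | sort | uniq -c | sort -rn
```

10.0.0.1 appears twice in the sample, so it is listed first.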

高冷爸爸 2024-11-09 06:55:10

None of the answers presented here worked for me, so here is a working one:

cat yourlogs.txt | grep -oE "\b([0-9]{1,3}\.){3}[0-9]{1,3}\b" | sort | uniq -c | sort

It uses grep to isolate all the IPs, then sorts them, counts them, and sorts that result again.
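The same count can also be done in a single pass with awk instead of sort | uniq -c; a sketch under the same assumed file name, with invented sample lines:

```shell
# Invented sample input; in practice this is your real log.
printf '%s\n' '1.2.3.4 - GET /' '1.2.3.4 - GET /x' '5.6.7.8 - GET /' > yourlogs.txt

# Tally the first IPv4-looking match on each line in an awk array,
# then print count and address, busiest first.
awk 'match($0, /[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/) { n[substr($0, RSTART, RLENGTH)]++ }
     END { for (ip in n) print n[ip], ip }' yourlogs.txt | sort -rn
```

This avoids sorting the full log, at the cost of holding one counter per distinct address in memory.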

迷路的信 2024-11-09 06:55:10

You can do the following (where datafile is the name of the log file):

egrep -o '[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}' datafile | sort | uniq -c

Edit: missed the part about counting addresses, now added. (The -o flag makes egrep print only the matched addresses rather than whole lines, so identical addresses collapse correctly under uniq -c.)

书信已泛黄 2024-11-09 06:55:10

egrep '[[:digit:]]{1,3}(\.[[:digit:]]{1,3}){3}' | awk '{print $1}' | sort | uniq -c

×纯※雪 2024-11-09 06:55:10

The following is a script I wrote several years ago. It greps out addresses from Apache access logs. I just tried running it on Ubuntu 11.10 (oneiric, 3.0.0-32-generic i686 GNU/Linux) and it works fine. Use Gvim or Vim to read the resulting file, unique_visits, which lists the unique IPs in a column. The key is in the lines used with grep; those expressions extract the IP address numbers. IPv4 only. You may need to go through and update the browser version numbers. Another similar script that I wrote for a Slackware system is here:
http://www.perpetualpc.net/srtd_bkmrk.html

#!/bin/sh
#eliminate search engine referrals and zombie hunters. combined_log is the original file
egrep '(google)|(yahoo)|(mamma)|(query)|(msn)|(ask.com)|(search)|(altavista)|(images.google)|(xb1)|(cmd.exe)|(trexmod)|(robots.txt)|(copernic.com)|(POST)' combined_log > search
#now sort them to eliminate duplicates and put them in order
sort -un search > search_sort
#do the same with original file
sort -un combined_log > combined_log_sort
#now get all the ip addresses. only the numbers
grep -o '[0-9][0-9]*[.][0-9][0-9]*[.][0-9][0-9]*[.][0-9][0-9]*' search_sort > search_sort_ip
grep -o '[0-9][0-9]*[.][0-9][0-9]*[.][0-9][0-9]*[.][0-9][0-9]*' combined_log_sort > combined_log_sort_ip
sdiff -s combined_log_sort_ip search_sort_ip > final_result_ip
#get rid of the extra column
grep -o '^\|[0-9][0-9]*[.][0-9][0-9]*[.][0-9][0-9]*[.][0-9][0-9]*' final_result_ip > bookmarked_ip
#remove stuff like browser versions and system versions
egrep -v '(4.4.2.0)|(1.6.3.1)|(0.9.2.1)|(4.0.0.42)|(4.1.8.0)|(1.305.2.109)|(1.305.2.12)|(0.0.43.45)|(5.0.0.0)|(1.6.2.0)|(4.4.5.0)|(1.305.2.137)|(4.3.5.0)|(1.2.0.7)|(4.1.5.0)|(5.0.2.6)|(4.4.9.0)|(6.1.0.1)|(4.4.9.0)|(5.0.8.6)|(5.0.2.4)|(4.4.8.0)|(4.4.6.0)' bookmarked_ip > unique_visits

exit 0

完美的未来在梦里 2024-11-09 06:55:10

Since in an IP address the three-digits-then-a-dot group repeats three times, we can write it this way:

cat filename | egrep -o "([[:digit:]]{1,3}\.){3}[[:digit:]]{1,3}"
                                      ^^^     ^       ^~~~~~~~   
                         Up_to_3_digits.     Repeat_thrice.   Last_section.

Even shorter, using a bash variable:

PAT=[[:digit:]]{1,3}
cat filename | egrep -o "($PAT\.){3}$PAT" 

To print only the unique IP addresses in the file, pipe the output through sort -u.
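A runnable sketch of the variable trick combined with sort -u (the file name and its contents are invented for illustration):

```shell
PAT='[[:digit:]]{1,3}'
printf '%s\n' 'a 192.168.0.1 x' 'b 192.168.0.1 y' 'c 10.0.0.7 z' > filename

# -o prints only the matched addresses; sort -u keeps each one once.
egrep -o "($PAT\.){3}$PAT" filename | sort -u
```

Quoting the pattern in the assignment keeps the shell from touching the brackets and braces before egrep sees them.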

兰花执着 2024-11-09 06:55:10

Using sed:

$ sed 's/.*\(<regex_for_ip_address>\).*/\1/' <filename> | sort | uniq -c

You can search the Internet for a regex for IP addresses and substitute it for <regex_for_ip_address>, e.g. from answers to a related question on Stack Overflow.
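For illustration only, here is one possible (loose) IPv4 regex dropped into that template; it is not necessarily the one from the linked answers, and the \b word boundaries plus -E assume GNU sed. Without \b, the greedy leading .* can eat the front of the address:

```shell
# Invented sample log.
printf '%s\n' 'x 203.0.113.9 GET /' 'y 203.0.113.9 GET /a' > logfile

# \b keeps the greedy .* from consuming part of the address.
sed -E 's/.*\b(([0-9]{1,3}\.){3}[0-9]{1,3})\b.*/\1/' logfile | sort | uniq -c
```

Note that lines containing no IP address pass through this sed unchanged, so on a real log you may want to filter them out first.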

破晓 2024-11-09 06:55:10

cat access.log | egrep -o '[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}' | sort | uniq -c | sort

(A sort is needed before uniq -c, since uniq only collapses adjacent identical lines.)