Grep IP addresses out of a log file

Posted on 2024-11-02 06:55:10

I am quite bad at using "basic?" unix commands, and this question puts my knowledge even more to the test. What I would like to do is grep all IP addresses from a log (e.g. access.log from Apache) and count how often they occur. Can I do that with one command, or do I need to write a script for that?

8 Answers

辞慾 2024-11-09 06:55:10

You'll need a short pipeline at least.

sed -e 's/\([0-9]\+\.[0-9]\+\.[0-9]\+\.[0-9]\+\).*$/\1/' -e t -e d access.log | sort | uniq -c

This will print each IP (IPv4 only, though), sorted and prefixed with its count.

I tested it with apache2's access.log (it's configurable though, so you'll need to check), and it worked for me. It assumes the IP-address is the first thing on each line.

The sed collects the IP-addresses (actually it looks for 4 sets of digits, with periods in between), and replaces the entire line with it. -e t continues to the next line if it managed to do a substitution, -e d deletes the line (if there was no IP address on it). sort sorts.. :) And uniq -c counts instances of consecutive identical lines (which, since we've sorted them, corresponds to the total count).
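As a runnable sketch of that pipeline (the log lines below are invented, `sort -rn` is added so the busiest address comes first, and `\+` assumes GNU sed, as in the answer):

```shell
# Build a tiny illustrative access.log (made-up data).
printf '%s\n' \
  '10.0.0.1 - - [09/Nov/2024:06:55:10] "GET / HTTP/1.1" 200 512' \
  '10.0.0.2 - - [09/Nov/2024:06:55:11] "GET /a HTTP/1.1" 200 99' \
  '10.0.0.1 - - [09/Nov/2024:06:55:12] "GET /b HTTP/1.1" 404 0' > access.log

# Reduce each line to its leading IPv4 address, count, busiest first.
sed -e 's/\([0-9]\+\.[0-9]\+\.[0-9]\+\.[0-9]\+\).*$/\1/' -e t -e d access.log \
  | sort | uniq -c | sort -rn
```

10.0.0.1 appears twice in the sample, so it is listed first.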

高冷爸爸 2024-11-09 06:55:10

None of the answers presented here worked for me, so here is a working one:

cat yourlogs.txt | grep -oE "\b([0-9]{1,3}\.){3}[0-9]{1,3}\b" | sort | uniq -c | sort

It uses grep to isolate all the IPs, then sorts them, counts them, and sorts that result again.
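The same count can also be done in a single pass with awk instead of sort | uniq -c; a sketch under the same assumed file name, with invented sample lines:

```shell
# Invented sample input; in practice this is your real log.
printf '%s\n' '1.2.3.4 - GET /' '1.2.3.4 - GET /x' '5.6.7.8 - GET /' > yourlogs.txt

# Tally the first IPv4-looking match on each line in an awk array,
# then print count and address, busiest first.
awk 'match($0, /[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/) { n[substr($0, RSTART, RLENGTH)]++ }
     END { for (ip in n) print n[ip], ip }' yourlogs.txt | sort -rn
```

This avoids sorting the full log, at the cost of holding one counter per distinct address in memory.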

迷路的信 2024-11-09 06:55:10

You can do the following (where datafile is the name of the log file):

egrep -o '[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}' datafile | sort | uniq -c

Edit: missed the part about counting addresses, now added. (The -o flag makes egrep print only the matched addresses rather than whole lines, so identical addresses collapse correctly under uniq -c.)

书信已泛黄 2024-11-09 06:55:10

egrep '[[:digit:]]{1,3}(\.[[:digit:]]{1,3}){3}' | awk '{print $1}' | sort | uniq -c

×纯※雪 2024-11-09 06:55:10

The following is a script I wrote several years ago. It greps out addresses from Apache access logs. I just tried running it on Ubuntu 11.10 (oneiric, 3.0.0-32-generic i686 GNU/Linux) and it works fine. Use Gvim or Vim to read the resulting file, unique_visits, which lists the unique IPs in a column. The key is in the lines used with grep; those expressions extract the IP address numbers. IPv4 only. You may need to go through and update the browser version numbers. Another similar script that I wrote for a Slackware system is here:
http://www.perpetualpc.net/srtd_bkmrk.html

#!/bin/sh
#eliminate search engine referrals and zombie hunters. combined_log is the original file
egrep '(google)|(yahoo)|(mamma)|(query)|(msn)|(ask.com)|(search)|(altavista)|(images.google)|(xb1)|(cmd.exe)|(trexmod)|(robots.txt)|(copernic.com)|(POST)' combined_log > search
#now sort them to eliminate duplicates and put them in order
sort -un search > search_sort
#do the same with original file
sort -un combined_log > combined_log_sort
#now get all the ip addresses. only the numbers
grep -o '[0-9][0-9]*[.][0-9][0-9]*[.][0-9][0-9]*[.][0-9][0-9]*' search_sort > search_sort_ip
grep -o '[0-9][0-9]*[.][0-9][0-9]*[.][0-9][0-9]*[.][0-9][0-9]*' combined_log_sort > combined_log_sort_ip
sdiff -s combined_log_sort_ip search_sort_ip > final_result_ip
#get rid of the extra column
grep -o '^\|[0-9][0-9]*[.][0-9][0-9]*[.][0-9][0-9]*[.][0-9][0-9]*' final_result_ip > bookmarked_ip
#remove stuff like browser versions and system versions
egrep -v '(4.4.2.0)|(1.6.3.1)|(0.9.2.1)|(4.0.0.42)|(4.1.8.0)|(1.305.2.109)|(1.305.2.12)|(0.0.43.45)|(5.0.0.0)|(1.6.2.0)|(4.4.5.0)|(1.305.2.137)|(4.3.5.0)|(1.2.0.7)|(4.1.5.0)|(5.0.2.6)|(4.4.9.0)|(6.1.0.1)|(4.4.9.0)|(5.0.8.6)|(5.0.2.4)|(4.4.8.0)|(4.4.6.0)' bookmarked_ip > unique_visits

exit 0

完美的未来在梦里 2024-11-09 06:55:10

Since in an IP address the three-digits-then-a-dot group repeats three times, we can write it this way:

cat filename | egrep -o "([[:digit:]]{1,3}\.){3}[[:digit:]]{1,3}"
                                      ^^^     ^       ^~~~~~~~   
                         Up_to_3_digits.     Repeat_thrice.   Last_section.

Even shorter, using a bash variable:

PAT=[[:digit:]]{1,3}
cat filename | egrep -o "($PAT\.){3}$PAT" 

To print only the unique IP addresses in the file, pipe the output through sort -u.
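A runnable sketch of the variable trick combined with sort -u (the file name and its contents are invented for illustration):

```shell
PAT='[[:digit:]]{1,3}'
printf '%s\n' 'a 192.168.0.1 x' 'b 192.168.0.1 y' 'c 10.0.0.7 z' > filename

# -o prints only the matched addresses; sort -u keeps each one once.
egrep -o "($PAT\.){3}$PAT" filename | sort -u
```

Quoting the pattern in the assignment keeps the shell from touching the brackets and braces before egrep sees them.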

兰花执着 2024-11-09 06:55:10

Using sed:

$ sed 's/.*\(<regex_for_ip_address>\).*/\1/' <filename> | sort | uniq -c

You can search the Internet for a regex for IP addresses and substitute it for <regex_for_ip_address>, e.g. from answers to a related question on Stack Overflow.
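For illustration only, here is one possible (loose) IPv4 regex dropped into that template; it is not necessarily the one from the linked answers, and the \b word boundaries plus -E assume GNU sed. Without \b, the greedy leading .* can eat the front of the address:

```shell
# Invented sample log.
printf '%s\n' 'x 203.0.113.9 GET /' 'y 203.0.113.9 GET /a' > logfile

# \b keeps the greedy .* from consuming part of the address.
sed -E 's/.*\b(([0-9]{1,3}\.){3}[0-9]{1,3})\b.*/\1/' logfile | sort | uniq -c
```

Note that lines containing no IP address pass through this sed unchanged, so on a real log you may want to filter them out first.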

破晓 2024-11-09 06:55:10

cat access.log | egrep -o '[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}' | sort | uniq -c | sort

(A sort is needed before uniq -c, since uniq only collapses adjacent identical lines.)