Parsing unique IP addresses from a mail log with awk

Posted on 2024-10-03 08:37:20

Yesterday I asked a question here about a one-liner, and mjschultz gave me an answer that I instantly fell in love with :) Awk just destroyed the task at hand, parsing a large log file (500+ MB) in a matter of seconds. Now I'm trying to port my other one-liners to awk.

This is the one in question:

grep "pop3\[" maillog | grep "User logged in" |  
egrep -o '([[:digit:]]{1,3}\.){3}[[:digit:]]{1,3}' | sort -u

I need the list of all unique IP addresses using pop3 to connect to the mail server.

This is an example log entry:

Nov 15 00:49:21 hostname pop3[19418]: login: [10.10.10.10] username plaintext User logged in

So I find all the lines containing "pop3[" and filter them for the "User logged in" part. Next I use egrep with a regex to extract the IP addresses, and sort -u to drop the duplicates.

This is what I have so far for my awk version:

awk '/pop3\[.*.User logged in/ {ip[$7]=0} END {for (address in ip)  
{ print address} }' maillog

This works perfectly, but as always, not all log entries are identical. For example, sometimes the IP gets moved to the 8th field, like here:

Nov 15 10:42:40 hostname pop3[2232]: login: hostname.domain.com [20.20.20.20] username plaintext User logged in

What would be the best way to catch those entries with awk as well?

As always, thanks in advance for all the great responses; you've taught me so much already :)

Comments (3)

天涯沦落人 2024-10-10 08:37:20

AWK code

Just match your IP format directly, but be careful that nothing else in the log uses the same bracketed format.

/pop3\[.*.User logged in/ {
    where = match($0, /\[[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/)
    if (where)
        ip[substr($0, RSTART+1, RLENGTH-1)] = 0
}

END {
    for (address in ip)
        print address
}

running at ideone

固执像三岁 2024-10-10 08:37:20

That looks more like Perl territory than Awk to me:

my %ip_addresses = ();
while (<>)
{
    next unless m/pop3\[/;
    next unless m/User logged in/;
    if (my($ip) = $_ =~ m/( \d{1,3} (?: [.] \d{1,3} ){3} )/msx)
    {
         $ip_addresses{$ip} = 1;
    }
}
foreach my $ip (sort keys %ip_addresses)
{
    print "$ip\n";
}

The sort is less than perfect - being alphabetic rather than numeric (so 192.1.168.10 will appear before 9.25.13.26). That can be fixed, of course.
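
One possible way to get numeric ordering, sketched here with sort rather than inside the Perl script: split on the dots and compare each octet numerically. The function name sort_ips is only an illustrative helper, not something from the answer above.

```shell
# Hypothetical helper: order dotted-quad IPv4 addresses numerically,
# octet by octet, instead of alphabetically.
sort_ips() {
    sort -t . -k1,1n -k2,2n -k3,3n -k4,4n
}

# Example: 9.25.13.26 now sorts before 192.1.168.10.
printf '%s\n' 192.1.168.10 9.25.13.26 10.10.10.10 | sort_ips
```

Piping the output of any of the approaches on this page through such a filter would leave the extraction logic untouched and only change the order of the final list.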

小瓶盖 2024-10-10 08:37:20

After seeing and trying these approaches I got a new idea.

belisarius's code does what I asked for, but since it has to do all the regex matching it's not the fastest one, and speed is what I'm after.

So I came up with this. As you can see, the "problematic" log lines have an extra field, making them 13 fields long instead of the normal 12. I just delete the extra field, which shifts the IP address back into field 7 and gives me the correct list of IP addresses. Then I use awk again to delete all duplicate entries:

awk '/pop3\[.*.User logged in/ {{if (NF == 13) $7="";gsub(FS "+",FS)};print $7}' /var/log/maillog |
awk '!($0 in a){a[$0];print}'

Ideone link if you want to see the code in action
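
A variation on the same idea, offered only as a sketch: in both sample entries from the question the IP is the 6th field from the end of the line, so it can be addressed as $(NF-5) without rewriting the record or running a second awk. This assumes every matching line ends with "username plaintext User logged in" and that the address is always wrapped in square brackets; the function name unique_pop3_ips and the third sample line are made up for illustration.

```shell
# Hypothetical helper: print each POP3 client IP once, in first-seen order.
# Assumption: the bracketed IP is always the 6th field from the end.
unique_pop3_ips() {
    awk '/pop3\[/ && /User logged in/ {
        ip = $(NF-5)                      # "[10.10.10.10]" in both layouts
        ip = substr(ip, 2, length(ip) - 2) # drop the surrounding brackets
        if (!(ip in seen)) {              # print only the first occurrence
            seen[ip]
            print ip
        }
    }' "$@"
}

# Demo on the two entries from the question plus one repeated address.
unique_pop3_ips <<'EOF'
Nov 15 00:49:21 hostname pop3[19418]: login: [10.10.10.10] username plaintext User logged in
Nov 15 10:42:40 hostname pop3[2232]: login: hostname.domain.com [20.20.20.20] username plaintext User logged in
Nov 15 10:43:02 hostname pop3[2240]: login: [10.10.10.10] username plaintext User logged in
EOF
```

For a real log it would be called as unique_pop3_ips /var/log/maillog. Unlike the two-stage pipeline, this version strips the brackets from the output; whether it is actually faster on a 500+ MB file would need to be measured.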
