Get all unique instances of a pattern in a log file using regex and grep

Posted 2025-01-10 17:56:49, word count 581, views 0, comments 0

I need to get a list of unique client computer names/ip addresses that are accessing a server from the access logs of the server.

The target log line looks like this:

2020-11-17 15:34:04.208 -0500 Information 94  XYZ-ASDF-FMP123  Client "%USERNAME% (QWER-L1212-W6) [11.22.333.44]" opening database "databasename" as "username".

In this example, the string (QWER-L1212-W6) [11.22.333.44] would be an example of a unique instance of a client computer/ip address.

So the result would be something like this:

(QWER-L1212-W6) [11.22.333.44]
(QWER-L1234-W7) [11.22.333.55]
etc...

I tried this without success:

grep --only-matching '\(.+\) \[.+\]' | sort --unique Access.log

The matching fails and the entire log line is returned.

Comments (2)

故人的歌 2025-01-17 17:56:49

Note that you are using the POSIX BRE regex flavor, since you pass neither -E nor -P to change grep's default. In POSIX BRE, \(...\) defines a capturing group, so your pattern does not match literal parentheses. There are more issues here, though.

You need to use

grep -o '([^()]*) \[[^][]*]' Access.log | sort -u

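Note the location of the input file argument: it belongs to grep, not sort. In your command, sort read Access.log directly and grep received no file, which is why entire log lines were printed.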

The ([^()]*) \[[^][]*] here is a POSIX BRE pattern that matches

  • ( - a literal ( char (a \( is the start of a capturing group)
  • [^()]* - zero or more chars other than ( and )
  • ) - a literal ) char (a \) is the end of a capturing group)
  • (space) - a literal space
  • \[ - a [ char
  • [^][]* - zero or more chars other than [ and ]
  • ] - a ] char.

See the online demo:

#!/bin/bash
s='2020-11-17 15:34:04.208 -0500 Information 94  XYZ-ASDF-FMP123  Client "%USERNAME% (QWER-L1212-W6) [11.22.333.44]" opening database "databasename" as "username".'
grep -o '([^()]*) \[[^][]*]' <<< "$s" | sort -u
# => (QWER-L1212-W6) [11.22.333.44]
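
The demo above feeds a single line to grep, so sort -u has nothing to deduplicate. As a rough sketch of the whole pipeline on a file, the script below writes three sample lines to a temporary file and runs the same command on it. The second and third lines, including their timestamps, are made up for illustration (the second client name is taken from the expected output in the question), and the temporary file stands in for Access.log.

```shell
#!/bin/bash
# Hypothetical stand-in for Access.log: one line from the question,
# one repeat of the same client, and one invented line for a second client.
log=$(mktemp)
cat > "$log" <<'EOF'
2020-11-17 15:34:04.208 -0500 Information 94  XYZ-ASDF-FMP123  Client "%USERNAME% (QWER-L1212-W6) [11.22.333.44]" opening database "databasename" as "username".
2020-11-17 15:35:10.001 -0500 Information 94  XYZ-ASDF-FMP123  Client "%USERNAME% (QWER-L1212-W6) [11.22.333.44]" opening database "databasename" as "username".
2020-11-17 15:36:22.417 -0500 Information 94  XYZ-ASDF-FMP123  Client "%USERNAME% (QWER-L1234-W7) [11.22.333.55]" opening database "databasename" as "username".
EOF

# The file name is an argument to grep; sort only sees grep's output.
unique_clients=$(grep -o '([^()]*) \[[^][]*]' "$log" | sort -u)
printf '%s\n' "$unique_clients"

rm -f "$log"
```

With this sample input the repeated client should appear only once, leaving two lines of output.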

小帐篷 2025-01-17 17:56:49

grep --only-matching '\(.+\) \[.+\]' file.log

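This fails because grep is not running in ERE mode (extended regular expressions, -E): in the default BRE mode the unescaped + is a literal plus sign, and \( \) delimit a group instead of matching literal parentheses. So for your case the following may work: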

grep -E --only-matching '\(.+\) \[.+\]' file.log
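
To see the difference between the two modes, here is a small sketch that runs the same pattern with and without -E against the sample line from the question. The variable names are only for illustration, and "|| true" is there so the script keeps going when grep finds nothing and exits with a non-zero status.

```shell
#!/bin/bash
# Sample log line copied from the question.
line='2020-11-17 15:34:04.208 -0500 Information 94  XYZ-ASDF-FMP123  Client "%USERNAME% (QWER-L1212-W6) [11.22.333.44]" opening database "databasename" as "username".'

# BRE (default): \( \) delimit a group and + is a literal plus sign,
# so the pattern finds nothing in this line.
bre_out=$(printf '%s\n' "$line" | grep -o '\(.+\) \[.+\]' || true)

# ERE (-E): \( \) are literal parentheses and + means "one or more".
ere_out=$(printf '%s\n' "$line" | grep -oE '\(.+\) \[.+\]' || true)

printf 'BRE result: <%s>\n' "$bre_out"
printf 'ERE result: <%s>\n' "$ere_out"
```

On this line the BRE result should be empty, while the ERE result should be the client string in parentheses and brackets.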

However, this regex is problematic because .+ is greedy: it consumes as many characters as it can before the final closing ) and the final closing ] on the line. Suppose a line in your log contains more than one (...) [...] substring, as the second line here does:

2020-11-17 15:34:04.208 -0500 Information 94  XYZ-ASDF-FMP123  Client "%USERNAME% (QWER-L1212-W6) [11.22.333.44]" opening database "databasename" as "username".
2020-11-17 15:34:04.208 -0500 Information 94  XYZ-ASDF-FMP123  Client "%USERNAME% (QWER-L1212-W6) [21.22.333.33]" opening database "databasename" as "username" (QWER-L1234-W7) [11.22.333.55]

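Then you will get incorrect results: on the second line the greedy match runs from the first ( to the last ]. The pattern '([^()]*) \[[^][]*]' also produces unwanted output there, because it additionally matches the trailing (QWER-L1234-W7) [11.22.333.55].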
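
A quick way to check this is to run both patterns against the second sample line on its own. This is only a sketch; the variable names are illustrative.

```shell
#!/bin/bash
# Second sample line from above, with an extra (...) [...] after the quoted part.
line='2020-11-17 15:34:04.208 -0500 Information 94  XYZ-ASDF-FMP123  Client "%USERNAME% (QWER-L1212-W6) [21.22.333.33]" opening database "databasename" as "username" (QWER-L1234-W7) [11.22.333.55]'

# Greedy ERE pattern: one long match from the first ( to the last ].
greedy=$(printf '%s\n' "$line" | grep -oE '\(.+\) \[.+\]' || true)

# Negated-class pattern: two separate matches, one of them unwanted.
bounded=$(printf '%s\n' "$line" | grep -o '([^()]*) \[[^][]*]' || true)

printf 'greedy:\n%s\n\n' "$greedy"
printf 'negated classes:\n%s\n' "$bounded"
```

The greedy pattern should print a single over-long match, and the negated-class pattern should print two lines, the second being the trailing substring that is not the client field.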

Since you are parsing access.log, where the format and positions of the fields are fixed, it is safer and more efficient to use awk for this extraction:

awk -F '"' '{sub(/^[^ ]* /, "", $2); print $2}' file.log

(QWER-L1212-W6) [11.22.333.44]
(QWER-L1212-W6) [21.22.333.33]
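
The awk command prints one value per log line, so repeated clients would still appear more than once. One possible way to get the unique list asked for in the question is to filter inside awk with an associative array; this is an addition to the command above, not part of it, and the array name seen and the temporary file are assumptions made for the sketch. Piping the original command through sort -u would be an alternative if the order does not matter.

```shell
#!/bin/bash
# Hypothetical stand-in for file.log: the two sample lines from above,
# plus a repeat of the first one to give the filter something to drop.
log=$(mktemp)
cat > "$log" <<'EOF'
2020-11-17 15:34:04.208 -0500 Information 94  XYZ-ASDF-FMP123  Client "%USERNAME% (QWER-L1212-W6) [11.22.333.44]" opening database "databasename" as "username".
2020-11-17 15:34:04.208 -0500 Information 94  XYZ-ASDF-FMP123  Client "%USERNAME% (QWER-L1212-W6) [21.22.333.33]" opening database "databasename" as "username" (QWER-L1234-W7) [11.22.333.55]
2020-11-17 15:34:04.208 -0500 Information 94  XYZ-ASDF-FMP123  Client "%USERNAME% (QWER-L1212-W6) [11.22.333.44]" opening database "databasename" as "username".
EOF

# Split on double quotes, strip the leading user name from field 2,
# then print each distinct value the first time it is seen.
unique_clients=$(awk -F '"' '{sub(/^[^ ]* /, "", $2)} !seen[$2]++ {print $2}' "$log")
printf '%s\n' "$unique_clients"

rm -f "$log"
```

Because the filter keeps the first occurrence of each value, the output stays in the order the clients first appear in the log.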