Get all unique instances of a pattern in a log file using regex and grep
I need to get a list of the unique client computer names/IP addresses that are accessing a server, taken from the server's access logs.
The target log line looks like this:
2020-11-17 15:34:04.208 -0500 Information 94 XYZ-ASDF-FMP123 Client "%USERNAME% (QWER-L1212-W6) [11.22.333.44]" opening database "databasename" as "username".
In this example, the string (QWER-L1212-W6) [11.22.333.44]
would be one unique instance of a client computer/IP address.
So the result would be something like this:
(QWER-L1212-W6) [11.22.333.44]
(QWER-L1234-W7) [11.22.333.55]
etc...
I tried this without success:
grep --only-matching '\(.+\) \[.+\]' | sort --unique Access.log
The matching fails and the entire log line is returned.
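Note that you are using the POSIX BRE regex flavor, since you pass neither -E nor -P to change the flavor from the default one. In POSIX BRE, \(...\) defines a capturing group, so your \( and \) do not match literal parentheses. There are more issues here, though: an unescaped + is not a quantifier in BRE, and Access.log is passed to sort instead of grep, so sort simply prints the lines of the file while grep has no input file to search.
You need to use the pattern ([^()]*) \[[^][]*] with grep --only-matching (-o) and pipe the result into sort --unique (-u). Note the location of the input file argument: it belongs to grep, not to sort.
The ([^()]*) \[[^][]*] here is a POSIX BRE pattern that matches:
( - a literal ( char (in BRE it is \( that starts a capturing group)
[^()]* - zero or more chars other than ( and )
) - a literal ) char (in BRE it is \) that ends a capturing group)
a space
\[ - a literal [ char
[^][]* - zero or more chars other than [ and ]
] - a literal ] char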
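The corrected pipeline can be sketched roughly as follows. This is only a minimal sketch: it assumes a grep that supports -o (GNU and BSD grep do) and a log in the format shown in the question. The function name unique_clients is made up for illustration and is not part of any tool.

```shell
#!/bin/sh
# Hypothetical helper: print each distinct "(computer) [ip]" pair found in a log file.
unique_clients() {
    # -o prints only the matched text instead of the whole line.
    # In POSIX BRE, unescaped ( and ) are literals, \[ is a literal [,
    # and a ] outside a bracket expression is a literal ].
    # The file is an argument to grep; sort only reads the pipe.
    grep -o '([^()]*) \[[^][]*]' "$1" | sort -u
}

# Usage, assuming the log is named Access.log as in the question:
# unique_clients Access.log
```

Passing the file name to grep and letting sort read from the pipe is the key difference from the command in the question.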
This is failing because you are not using ERE (extended regular expressions, the -E option) in grep, so the unescaped + is not treated as a quantifier and \( and \) are not literal parentheses. So for your case, adding -E to grep (and giving the file to grep rather than to sort) may work.
However, this regex is problematic because .+ is greedy: it will match one or more of any character before matching the closing ) and the closing ]. If a line in your log contains another (...) [...] substring, then you will get incorrect results. Incorrect results will also show up with the pattern '([^()]*) \[[^][]*]'.
Since you are using access.log, where the format and positions of fields are fixed, it is much safer and more efficient to use awk for this extraction.
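One way such a field-based extraction could look is sketched below. This is not necessarily the exact command originally posted. It assumes that the client string is the first double-quoted field on the line, that it is preceded by the word Client, and that neither the computer name nor the IP address contains spaces. The function name unique_clients_awk is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: extract "(computer) [ip]" by field position instead of a regex.
unique_clients_awk() {
    # -F'"' splits each line on double quotes, so $2 is the quoted client string,
    # e.g.: %USERNAME% (QWER-L1212-W6) [11.22.333.44]
    awk -F'"' '
        $1 ~ / Client $/ {                # only lines with Client right before the first quote
            n = split($2, part, " ")      # split the client string on blanks
            if (n >= 2) print part[n-1], part[n]   # last two words: (computer) [ip]
        }
    ' "$1" | sort -u
}

# Usage, assuming the log is named Access.log as in the question:
# unique_clients_awk Access.log
```

Taking the last two words of the quoted field means a user name that itself contains spaces or brackets does not affect the result, as long as the assumptions above hold.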