Python regex to extract FQDNs from syslog server logs

Posted 2024-10-08 03:20:33

I'm trying to build a regex to parse our syslogs. I was asked to account for each server that uses the service. I wrote a simple regex to pull out the FQDN, but it seems to be consuming too much of the line...

>>> string = "2010-12-13T00:00:02-05:00 <local3.info> suba1.suba2.example.com named[29959]: client 192.168.11.53#54608: query: subb1.subb2.example.com "
>>> regex = re.compile("\s.*?\.example\.com ")
>>> r = regex.search(string)
>>> r
<_sre.SRE_Match object at 0x896dae0bbf9e6bf0>

# Run findall
>>> regex.findall(string)
[u' <local3.info> suba1.suba2.example.com ', u' client 192.168.11.53#54608: query: subb1.subb2.example.com ']

As you can see, the findall with .*? is too generic, and the regex ends up consuming too much.
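Why the lazy .*? does not help: the regex engine returns the leftmost match, trying start positions from left to right, and laziness only makes the end of the match come as early as possible from a start that already works. Since \s succeeds at the very first space and . also matches spaces, the match runs from that space to the first ".example.com ". A minimal sketch using the question's pattern on the sample line:

```python
import re

line = ("2010-12-13T00:00:02-05:00 <local3.info> suba1.suba2.example.com "
        "named[29959]: client 192.168.11.53#54608: query: subb1.subb2.example.com")

# The question's pattern: the lazy quantifier shortens the END of the match,
# but the START is still the leftmost position where a match is possible.
lazy = re.compile(r"\s.*?\.example\.com ")
m = lazy.search(line)

# The match starts at the space right after the timestamp (index 25),
# so "<local3.info>" ends up inside it.
print(m.start(), repr(m.group()))
```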

Answers (4)

紧拥背影 2024-10-15 03:20:33

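Replace \s with \b and .*? with \S* (and drop the trailing space). Since \S cannot match whitespace, a match can no longer stretch across several fields of the line: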

>>> regex = re.compile(r'\b\S*\.example\.com')
>>> regex.findall(string)
[u'suba1.suba2.example.com', u'subb1.subb2.example.com']
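One caveat worth checking, offered as a hedged addition rather than part of the answer: \b\S*\.example\.com has no boundary after "com", so it also matches the leading part of a longer name. In this sketch b.example.community is a made-up host used only to show the edge case; adding \b after "com" rejects it:

```python
import re

# "b.example.community" is a hypothetical name used only for illustration.
text = "a.example.com b.example.community"

open_ended = re.compile(r"\b\S*\.example\.com")
anchored = re.compile(r"\b\S*\.example\.com\b")  # require a word boundary after "com"

print(open_ended.findall(text))  # ['a.example.com', 'b.example.com']
print(anchored.findall(text))    # ['a.example.com']
```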
新人笑 2024-10-15 03:20:33

The regex

r"query: ([\w\.]+)"

would match from query: to the end of the line, and the (unnamed) capturing group then gives you just the domain name.

If this isn't the output you need, could you describe the desired output as a data structure? I took a guess here.

The Python code might look like this:

import re

# Capture the name that follows "query:" in the log line
match = re.search(r"query: ([\w.]+)", string, re.IGNORECASE | re.MULTILINE)
if match:
    result = match.group(1)
else:
    result = ""

result would contain

subb1.subb2.example.com
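If both the reporting server and the queried name are wanted, named groups keep the fields readable. This is a sketch that assumes every line has exactly the layout of the sample line (timestamp, <facility.priority>, host, named[pid]:, client ip#port:, query: name); the group names are hypothetical, and real named logs may carry extra fields, so treat the pattern as a starting point:

```python
import re

# Assumed layout, based only on the sample line in the question:
# <timestamp> <facility.priority> <host> named[<pid>]: client <ip>#<port>: query: <name>
LINE_RE = re.compile(
    r"^(?P<timestamp>\S+)\s+<(?P<facility>[^>]+)>\s+(?P<host>\S+)\s+"
    r"named\[\d+\]:\s+client\s+(?P<client>[\d.]+)#\d+:\s+query:\s+(?P<query>\S+)"
)


def parse_line(line):
    """Return the named fields as a dict, or None if the line doesn't fit."""
    m = LINE_RE.search(line)
    return m.groupdict() if m else None


line = ("2010-12-13T00:00:02-05:00 <local3.info> suba1.suba2.example.com "
        "named[29959]: client 192.168.11.53#54608: query: subb1.subb2.example.com")
fields = parse_line(line)
print(fields["host"], fields["client"], fields["query"])
```

Lines that don't fit the assumed layout come back as None, so they can be skipped or logged separately.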
魂牵梦绕锁你心扉 2024-10-15 03:20:33

Try using:

regex = re.compile("\s\S*?\.example\.com ")
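Two side effects of this pattern are worth knowing: the matched text includes the surrounding whitespace, and a name at the very end of a line, with no space after it, is missed. One alternative, sketched here as a variation rather than the answerer's code, is to assert the surrounding whitespace with lookarounds instead of consuming it:

```python
import re

line = ("2010-12-13T00:00:02-05:00 <local3.info> suba1.suba2.example.com "
        "named[29959]: client 192.168.11.53#54608: query: subb1.subb2.example.com")

# Consumes one whitespace char on each side, so the name at the end of
# the line (nothing after it) is missed, and matches carry the spaces.
consuming = re.compile(r"\s\S*?\.example\.com ")
print(consuming.findall(line))   # [' suba1.suba2.example.com ']

# (?<!\S) = preceded by whitespace or the start of the string;
# (?!\S)  = followed by whitespace or the end of the string.
lookaround = re.compile(r"(?<!\S)\S+\.example\.com(?!\S)")
print(lookaround.findall(line))  # ['suba1.suba2.example.com', 'subb1.subb2.example.com']
```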
栖竹 2024-10-15 03:20:33
#!/usr/bin/env python3

import re

s = """2010-12-13T00:00:02-05:00 <local3.info>
    suba1.suba2.example.com named[29959]:
    client 192.168.11.53#54608: query: subb1.subb2.example.com"""

# \S+ already covers the dots in the subdomains; escape the literal dots
pattern = re.compile(r"\S+\.example\.com")

print(pattern.findall(s))
# => ['suba1.suba2.example.com', 'subb1.subb2.example.com']
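Since the question's goal is to account for each server, the extracted names can be tallied with collections.Counter. This sketch assumes the reporting server is always the field right after the <facility.priority> tag, as in the sample line; only the first input line comes from the question, and the other two (including other.example.com) are invented purely for illustration:

```python
import re
from collections import Counter

# Assumption: <timestamp> <facility.priority> <reporting host> ...
HOST_RE = re.compile(r"\S+\s+<[^>]+>\s+(\S+\.example\.com)\s")


def count_hosts(lines):
    """Count how many log lines each reporting host produced."""
    counts = Counter()
    for line in lines:
        m = HOST_RE.match(line)
        if m:
            counts[m.group(1)] += 1
    return counts


# Hypothetical input; only the first line is taken from the question.
lines = [
    "2010-12-13T00:00:02-05:00 <local3.info> suba1.suba2.example.com named[29959]: client 192.168.11.53#54608: query: subb1.subb2.example.com",
    "2010-12-13T00:00:03-05:00 <local3.info> suba1.suba2.example.com named[29959]: client 192.168.11.54#54609: query: subc1.example.com",
    "2010-12-13T00:00:04-05:00 <local3.info> other.example.com named[1234]: client 192.168.11.55#54610: query: subb1.subb2.example.com",
]
counts = count_hosts(lines)
print(counts)
```

To read a real log, pass an open file object instead of the list, since iterating a file yields one line at a time.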