Matching text that contains two parenthesized expressions before a keyword
I have this file of 10,000+ lines of messages from a game server, like so:
11.07.23 08:40:16 [INFO] NC: Moving violation: wolfman98 from yasmp (-90.8, 64.0, 167.5) to (-90.7, 64.0, 167.3) distance (0.0, 0.0, 0.2)
11.07.23 10:57:44 [INFO] NC: Moving violation: AKxiZeroDark from yasmp (-1228.3, 11.2, 1098.7) to (-1228.3, 11.2, 1098.7) distance (0.0, 0.0, 0.0)
The regex I currently have is:
\d{1,4}\.\d{1}
which so far matches every decimal number on the line:
11.07.23 08:40:16 [INFO] NC: Moving violation: wolfman98 from yasmp (-90.8, 64.0, 167.5) to (-90.7, 64.0, 167.3) distance (0.0, 0.0, 0.2)
I've been having trouble finding a way to get the part that only says:
(-1228.3, 11.2, 1098.7) to (-1228.3, 11.2, 1098.7)
before the word "distance", and without the timestamp at the beginning. Eventually I want to replace it so that the lines end up like this:
11.07.23 08:40:16 [INFO] NC: Moving violation: wolfman98 from yasmp (-#, #, #) to (-#, #, #) distance (0.0, 0.0, 0.2)
11.07.23 10:57:44 [INFO] NC: Moving violation: AKxiZeroDark from yasmp (-#, #, #) to (-#, #, #) distance (0.0, 0.0, 0.0)
A bit of extra information: the numbers may or may not be negative, and they range from one integer digit (1.0) to four (1234.0), which is why I need help matching only the part before the word "distance".
EDIT: It would even be fine if that whole part were removed entirely:
11.07.23 08:40:16 [INFO] NC: Moving violation: wolfman98 from yasmp distance (0.0, 0.0, 0.2)
11.07.23 10:57:44 [INFO] NC: Moving violation: AKxiZeroDark from yasmp distance (0.0, 0.0, 0.0)
Comments (4)
A fairly hairy-looking regex that extends your number-matching regex would be
\((?:-?\d{1,4}\.\d{1}(?:, |\))){3} to \((?:-?\d{1,4}\.\d{1}(?:, |\))){3}(?= distance)
Let's break that down a little. It is made up of two identical groups that match the two groups of numbers in parens:
\((?:-?\d{1,4}\.\d{1}(?:, |\))){3}
The regex now allows an optional - before the number, which makes the number match
-?\d{1,4}\.\d{1}
After each number there is either a comma and a space or a closing paren, so to iterate the number match we need that as well:
(?:, |\))
That entire beast is then prefixed with \( to get the opening paren of the number group. That regex is repeated twice to get the two groups of numbers, with the " to " match in between.
The final bit is a positive look-ahead to ensure that we only match number groups that are followed by the word distance. That word will not be included in the match, but it has to be there for the regex to match.
I've used non-capturing groups (the (?: ... ) stuff) because I don't know what you want to do with the captures. I've tried this out against your two example logfile lines using perl 5.12.2 and it seems to work.
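To make the replacement step concrete, here is a minimal sketch in Python (not the Perl used for the test above). It applies the same pattern and then swaps each number inside the match for #, keeping the minus signs as in the desired output. The names TRIPLE, COORDS, NUMBER and mask_coordinates are made up for this illustration.

```python
import re

# One parenthesized triple, exactly as in the answer's regex.
TRIPLE = r"\((?:-?\d{1,4}\.\d{1}(?:, |\))){3}"
# Two triples joined by " to ", followed (look-ahead only) by " distance".
COORDS = re.compile(TRIPLE + " to " + TRIPLE + r"(?= distance)")
# The asker's original number pattern, applied only inside the match.
NUMBER = re.compile(r"\d{1,4}\.\d")

def mask_coordinates(line):
    """Replace each number in the two coordinate groups with '#', keeping signs."""
    return COORDS.sub(lambda m: NUMBER.sub("#", m.group(0)), line)

sample = ("11.07.23 08:40:16 [INFO] NC: Moving violation: wolfman98 from yasmp "
          "(-90.8, 64.0, 167.5) to (-90.7, 64.0, 167.3) distance (0.0, 0.0, 0.2)")
print(mask_coordinates(sample))
```

Because the number substitution only runs on the text that COORDS matched, the timestamp and the distance triple are left alone.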
You will want to match from the start of the ( that opens the sequence to the end of the ) before distance. An unchecked, possibly too broad regexp could be:
\([-0-9., ]+\) to \([-0-9., ]+\)
but that may match things you don't want.
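As a rough illustration of both the idea and the caveat, the following Python sketch uses that broad pattern to delete the coordinate part (the variant from the question's EDIT). The trailing space in the pattern and the names BROAD and drop_coordinates are assumptions added for this example.

```python
import re

# The broad pattern from this answer, plus one trailing space so that
# deleting the match does not leave a double space behind.
BROAD = re.compile(r"\([-0-9., ]+\) to \([-0-9., ]+\) ")

def drop_coordinates(line):
    """Delete the '(x, y, z) to (x, y, z) ' part of a log line."""
    return BROAD.sub("", line)

log_line = ("11.07.23 10:57:44 [INFO] NC: Moving violation: AKxiZeroDark from yasmp "
            "(-1228.3, 11.2, 1098.7) to (-1228.3, 11.2, 1098.7) distance (0.0, 0.0, 0.0)")
print(drop_coordinates(log_line))

# The character class is permissive: it also accepts things that are not numbers.
print(bool(BROAD.search("(--..) to (, ,) ")))
```

The second print shows why the pattern may be too broad: any run of digits, dots, commas, spaces and minus signs inside the parens is accepted.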
Matches the numbers you want (tested in PHP).
Sounds like a job for perl:
Usage:
script.pl input.txt > output.txt
Or as a one-liner with simpler regexes. Just remove the first two parens, whatever they contain:
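The Perl code for this answer is not shown here, so the following is only a Python sketch of the idea described in the last sentence, not the answer's own script: drop the first two parenthesized groups on each line, whatever they contain. The pattern and the names FIRST_TWO_PARENS and strip_first_two_parens are assumptions made for this illustration.

```python
import re

# Assumed pattern: "(anything) to (anything) " where "anything" contains no ")".
FIRST_TWO_PARENS = re.compile(r"\([^)]*\) to \([^)]*\) ")

def strip_first_two_parens(line):
    """Remove the first '(...) to (...) ' pair from a line, if there is one."""
    return FIRST_TWO_PARENS.sub("", line, count=1)

log_lines = [
    "11.07.23 08:40:16 [INFO] NC: Moving violation: wolfman98 from yasmp "
    "(-90.8, 64.0, 167.5) to (-90.7, 64.0, 167.3) distance (0.0, 0.0, 0.2)",
    "11.07.23 10:57:44 [INFO] NC: Moving violation: AKxiZeroDark from yasmp "
    "(-1228.3, 11.2, 1098.7) to (-1228.3, 11.2, 1098.7) distance (0.0, 0.0, 0.0)",
]
for entry in log_lines:
    print(strip_first_two_parens(entry))
```

Since the pattern does not inspect the contents of the parens, it is simpler than the number-based regexes but relies on the log format having no other "(...) to (...)" pair earlier on the line.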