RegexKitLite doesn't match, but Perl-backed checkers do
I'm using RKL in a Cocoa app to parse log statements from a wrapped task.
Pattern:
(?:.+) \[.+?\] (.+) \[.+?\] logged in (?:.+)
Test data:
2011-07-11 00:48:19 [INFO] Preparing spawn area: 97
2011-07-11 00:48:19 [INFO] Done (2175837000ns)! For help, type "help" or "?"
2011-07-11 00:48:42 [INFO] mikeyward [/127.0.0.1:59561] logged in with entity id blahblah
Every RegEx tester I've tried on the internet successfully matches the third line and captures 'mikeyward'.
Objective-C code:
NSString *loggedInPattern = @"(?:.+) \\[.+?\\] (.+) \\[.+?\\] logged in (?:.+)";
NSArray *captures = [searchString arrayOfCaptureComponentsMatchedByRegex:loggedInPattern];
NSString *username = [captures objectAtIndex:0];
Problem:
Despite having checked to ensure that searchString is valid and contains the sample data, RKL fails to match the line, let alone capture the username. In the example above, an exception is thrown because the captures array is returned with zero objects and I'm not error-checking :)
Any assistance in understanding why regex checkers confirm the match and capture but RKL misses it would be very much appreciated.
Thanks~
Comments (2)
Your matcher is only doing single line matching. Use the version with options and pass it RKLMultiline.
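To see what a multiline option changes, here is a minimal sketch using Python's re module as a stand-in. It assumes RKLMultiline follows the same convention as re.MULTILINE, where the flag makes ^ and $ match at every line boundary instead of only at the start and end of the whole string. The anchored pattern and the helper name match_login_line are made up for this illustration; the question's own pattern has no anchors, and in Python, at least, the flag affects nothing but ^ and $.

```python
import re

# Sample log lines copied from the question.
LOG = "\n".join([
    "2011-07-11 00:48:19 [INFO] Preparing spawn area: 97",
    '2011-07-11 00:48:19 [INFO] Done (2175837000ns)! For help, type "help" or "?"',
    "2011-07-11 00:48:42 [INFO] mikeyward [/127.0.0.1:59561] logged in with entity id blahblah",
])

# Illustrative anchored pattern (not from the question): a whole line
# that contains " logged in ".
PATTERN = r"^.* logged in .*$"

def match_login_line(text, multiline):
    """Return the matched line, or None if the pattern does not match."""
    flags = re.MULTILINE if multiline else 0
    match = re.search(PATTERN, text, flags)
    return match.group(0) if match else None

# Without the flag, ^ can only match at the very start of the text,
# and "." never crosses a newline, so the third line is out of reach.
print(match_login_line(LOG, multiline=False))

# With the flag, ^ and $ match around each line, so the third line is found.
print(match_login_line(LOG, multiline=True))
```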
Your problem may be related to this one, or it might just be a case of catastrophic backtracking. My advice would be the same in either case: write the regex so none of the quantifiers have overlapping spans of influence. For example:
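The example pattern itself is missing from this copy of the answer. The sketch below is a reconstruction assembled from the fragments the answer goes on to discuss, so treat the exact pattern as an assumption rather than the author's original. It is checked here with Python's re module, not RegexKitLite, and the name find_login is hypothetical.

```python
import re

# Sample log lines copied from the question.
LOG = "\n".join([
    "2011-07-11 00:48:19 [INFO] Preparing spawn area: 97",
    '2011-07-11 00:48:19 [INFO] Done (2175837000ns)! For help, type "help" or "?"',
    "2011-07-11 00:48:42 [INFO] mikeyward [/127.0.0.1:59561] logged in with entity id blahblah",
])

# Assumed reconstruction: every quantifier is limited to the characters
# its field can contain, so no two quantifiers compete for the same text.
#   [ 0-9:-]+   timestamp (spaces, digits, colons, hyphens)
#   \[[A-Z]+\]  log level such as [INFO]
#   (\S+)       username, captured
#   \[[^\]]+\]  bracketed address; [^\]]+ is the escaped spelling of [^]]+
#   .+          the rest of the line
LOGIN_RE = re.compile(r"[ 0-9:-]+\[[A-Z]+\] (\S+) \[[^\]]+\] logged in .+")

def find_login(text):
    """Return the first logged-in username found in text, or None."""
    match = LOGIN_RE.search(text)
    return match.group(1) if match else None

print(find_login(LOG))
```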
In your regex, the first (?:.+) initially gobbles up all the characters in the line, only to have to give most of them back so the rest of the regex can have a chance to match. [ 0-9:-]+, on the other hand, stops consuming as soon as it sees a character that's not a space, a digit, a colon, or a hyphen. If the next character is not [, it goes no further, and the overall match attempt fails much more quickly than it would have before. Similarly, [A-Z]+ can't blow past the closing ], \S+ can't overrun the next space, and [^]]+ stops before the next ]. I didn't change the final .+ because it already does what we want it to, i.e., consume all the characters until the next newline or the end of the text.

This is how I would have written the regex anyway, but just out of curiosity, what happens if you leave your regex as it is but add line anchors?
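The anchored pattern is not shown in this copy either. Presumably it is the question's regex wrapped in ^ and $, which is what this sketch assumes. It runs under Python's re module with re.MULTILINE so the anchors apply to each line; whether RegexKitLite behaves the same way is not verified here, and logged_in_users is a hypothetical helper name.

```python
import re

# Sample log lines copied from the question.
LOG = "\n".join([
    "2011-07-11 00:48:19 [INFO] Preparing spawn area: 97",
    '2011-07-11 00:48:19 [INFO] Done (2175837000ns)! For help, type "help" or "?"',
    "2011-07-11 00:48:42 [INFO] mikeyward [/127.0.0.1:59561] logged in with entity id blahblah",
])

# The question's pattern, unchanged except for the added ^ and $ anchors.
ANCHORED_RE = re.compile(
    r"^(?:.+) \[.+?\] (.+) \[.+?\] logged in (?:.+)$",
    re.MULTILINE,  # lets ^ and $ match at every line boundary
)

def logged_in_users(text):
    """Return the captured username from every line that matches."""
    return [match.group(1) for match in ANCHORED_RE.finditer(text)]

print(logged_in_users(LOG))
```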
It's still hideously inefficient, but it might make the difference between not working and working badly. :D