Strange Perl regular expression behavior with parentheses
I'm pulling in some Wikipedia markup and I'm wanting to match the URLs in relative (on Wikipedia) links. I don't want to match any URL containing a colon (not counting the protocol colon), to avoid special pages and the like, so I have the following code:
while ($body =~ m|<a href="(?<url>/wiki/[^:"]+)|gis) {
my $url = $+{url};
print "$url\n";
}
Unfortunately, this code is not working quite as expected. Any URL that contains a parenthetical [i.e. /wiki/Eon_(geology)] is getting truncated prematurely just before the opening paren, so that URL would match as /wiki/Eon_. I've been looking at the code for a bit and I cannot figure out what I'm doing wrong. Can anyone provide some insight?
Comments (2)
There isn't anything wrong in this code as it stands, so long as your Perl is new enough to support these RE features. Tested with Perl 5.10.1.
Are you using an old Perl?
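One way to check this claim is a small standalone script that runs the pattern from the question against a link containing parentheses. The sample markup and the helper name extract_urls are assumptions made for illustration; the real $body would come from Wikipedia.

```perl
use strict;
use warnings;
use 5.010;    # named captures (?<url>...) and %+ need Perl 5.10 or later

# Collect every relative /wiki/ link found by the pattern from the question.
# extract_urls is a hypothetical helper name, not part of the original code.
sub extract_urls {
    my ($html) = @_;
    my @urls;
    while ($html =~ m|<a href="(?<url>/wiki/[^:"]+)|gis) {
        push @urls, $+{url};
    }
    return @urls;
}

# Hypothetical sample markup with one parenthesized article name.
my $body = '<a href="/wiki/Eon_(geology)" title="Eon (geology)">eon</a> '
         . '<a href="/wiki/Era">era</a>';

print "$_\n" for extract_urls($body);
# prints:
# /wiki/Eon_(geology)
# /wiki/Era
```

If this script prints the full /wiki/Eon_(geology) on your machine, the regex itself is probably not the cause, and the input data is worth a closer look.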
You didn't anchor the RE to the end of the string. Put a " afterwards.
While that is a problem, it isn't the problem he was trying to solve. The problem he was trying to solve was that there was nothing to match the method/hostname (http://en.wiki...) in the RE. Adding a .*? would help that, before the "(?"
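A minimal sketch of the closing-quote suggestion, assuming the href value is always wrapped in double quotes. Without the trailing ", a link such as /wiki/Special:Random is not skipped but captured as /wiki/Special; with it, the whole link is rejected. The sample markup and the helper name extract_article_urls are made up for illustration.

```perl
use strict;
use warnings;
use 5.010;    # named captures need Perl 5.10 or later

# Same pattern as in the question, plus a closing double quote after the
# capture, so a URL containing a colon fails to match instead of being cut
# off at the colon. extract_article_urls is a hypothetical helper name.
sub extract_article_urls {
    my ($html) = @_;
    my @urls;
    while ($html =~ m|<a href="(?<url>/wiki/[^:"]+)"|gis) {
        push @urls, $+{url};
    }
    return @urls;
}

# Hypothetical sample: one ordinary article link and one special page.
my $body = '<a href="/wiki/Eon_(geology)">eon</a> '
         . '<a href="/wiki/Special:Random">random</a>';

print "$_\n" for extract_article_urls($body);
# prints:
# /wiki/Eon_(geology)
```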