Returning regex captures with Perl's grep
Is it possible to return just the captured portion of a regex using Perl's grep function? I have code such as the following:
use LWP::Simple;
my $examples_content = get('http://example.com/javascript/reports/examples/');
my @hrefs = grep(/href="(.*)"/, split("\n", $examples_content));
print $hrefs[0];
What gets printed, though, is the whole line that contains the href,
when I'd like just: simple_chart.html
Comments (4)
Why are you using grep? This might do what you want:
Someone already mentioned this in a comment, but if you are dealing with HTML, I have a module that extracts links. If you don't mind depending on HTML::Parser, it's not a bad little tool:
I mostly use this module for quick and dirty jobs. Since it uses a real HTML parser, it won't extract false positives, such as href-like strings that appear in the text content of elements rather than in an actual tag attribute.
Most other people already addressed the issues with map and split, but you need to be careful with the regexes too:
You can see different quoting characters (or none at all), and case-insensitive tags and attributes. No matter what any spec or standard says, lots of things generate messed-up HTML, and many browsers accept it. I'm probably still missing things in that pattern. That's why I wrote the module.
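The pitfalls mentioned above (double, single, or missing quotes, and mixed-case attribute names) can be illustrated with a small regex sketch. This is not the pattern from the answer; it is a minimal sketch assuming the HTML is already in a string, and both the extract_hrefs helper and the sample markup are hypothetical. Even with these extra cases handled it is still only a regex, not a parser.

```perl
use strict;
use warnings;

# Hypothetical helper: pull href values out of a string of HTML with a regex.
# Accepts "double-quoted", 'single-quoted', and unquoted values in any case.
sub extract_hrefs {
    my ($html) = @_;
    my @hrefs;
    while ($html =~ m{
        \bhref \s* = \s*       # attribute name, optional spaces around the =
        (?:
            "([^"]*)"          # group 1: double-quoted value
          | '([^']*)'          # group 2: single-quoted value
          | ([^\s>"']+)        # group 3: unquoted value
        )
    }gix) {
        # Exactly one of the three groups matched; keep whichever it was.
        push @hrefs, defined $1 ? $1 : defined $2 ? $2 : $3;
    }
    return @hrefs;
}

# Hypothetical sample showing the three quoting styles and mixed case.
my $html = <<'HTML';
<li><a href="simple_chart.html">Simple chart</a></li>
<li><A HREF='bar_chart.html'>Bar chart</A></li>
<li><a href=pie_chart.html>Pie chart</a></li>
HTML

print "$_\n" for extract_hrefs($html);
```

Each added case makes the pattern longer, which is the answer's point: a real HTML parser handles these variations for you.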
grep might be the wrong tool for the job. Try just $examples_content =~ /href="(.*?)"/g in list context (for example, my @hrefs = $examples_content =~ /href="(.*?)"/g;). There's no need to split first, and the ? after .* makes the quantifier non-greedy, which keeps the href=".*" pattern from matching too much.
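A minimal sketch of that suggestion follows. It uses a hypothetical inline sample instead of the page fetched with LWP::Simple, so it runs offline. In list context, a //g match with one capture group returns every captured substring, so neither split nor grep is needed. The second match shows what the greedy .* does on a line that contains two links.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the page fetched with get() in the question.
my $examples_content = <<'HTML';
<li><a href="simple_chart.html">Simple</a> <a href="bar_chart.html">Bar</a></li>
<li><a href="pie_chart.html">Pie</a></li>
HTML

# List context + /g: returns every capture, across all lines at once.
my @hrefs = $examples_content =~ /href="(.*?)"/g;
print "$_\n" for @hrefs;    # simple_chart.html, bar_chart.html, pie_chart.html

# Greedy .* for comparison: since . does not match a newline, it stretches
# to the last double quote on the first line, swallowing the second link.
my ($greedy) = $examples_content =~ /href="(.*)"/;
print "$greedy\n";
```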
map can emulate grep easily by either returning or not returning a value:
That said, I agree with Amadan and BRPocock that, in this case, removing the split and just matching against the source will work better. I added this as an answer to show you how map can be used in other cases.
In the spirit of "there's more than one way to do it", the line:
could also be written:
if you prefer the order [output transform input] rather than [output input transform].
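A minimal sketch of the map-as-grep idea, keeping the question's line-by-line split. This is an illustration, not the answer's own code, and the sample input is hypothetical; it also uses the non-greedy (.*?) from the previous answer. Returning $1 keeps a value, and returning the empty list () drops the line, so map filters and transforms in a single pass.

```perl
use strict;
use warnings;

# Hypothetical sample input, one tag per line as in the question.
my $examples_content = join "\n",
    '<li><a href="simple_chart.html">Simple chart</a></li>',
    '<li>No link on this line</li>',
    '<li><a href="pie_chart.html">Pie chart</a></li>';

# Return the capture when the line matches, or () so the line contributes nothing.
my @hrefs = map { /href="(.*?)"/ ? $1 : () } split /\n/, $examples_content;

print "$_\n" for @hrefs;
```

Unlike grep, which can only keep or discard each whole line, map decides what each element becomes, so it can return just the captured part.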