Can Perl's grep return the captured part of a regex?

Posted 2024-12-20 03:30:05

Is it possible to return just the captured portion of a regex using Perl's grep function? I have code such as the following:

use LWP::Simple;
my $examples_content = get('http://example.com/javascript/reports/examples/');
my @hrefs = grep(/href="(.*)"/, split("\n", $examples_content));
print $hrefs[0];

What gets printed though is:

<a href="simple_chart.html">Stand-alone single-question charts</a>

When I'd like just: simple_chart.html

浅沫记忆 2024-12-27 03:30:05

Why are you using grep? This might do what you want:

my @hrefs = $examples_content =~ /href="(.*?)"/g;
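
To make the one-liner above concrete, here is a minimal, self-contained sketch. The sample markup stored in $examples_content is a made-up stand-in for the fetched page, not its real content. It relies on the fact that a /g match in list context returns the captured text from every match.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the page that get() would return.
my $examples_content = <<'HTML';
<a href="simple_chart.html">Stand-alone single-question charts</a>
<a href="multi_chart.html">Multi-question charts</a>
HTML

# List context plus /g: the match returns every capture from every match,
# so no split or grep is needed.
my @hrefs = $examples_content =~ /href="(.*?)"/g;

print "$_\n" for @hrefs;
```

With this sample input, @hrefs holds only the two attribute values rather than the lines that contain them.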

薄凉少年不暖心 2024-12-27 03:30:05

Someone already mentioned this in a comment, but if you are dealing with HTML, I have a module that extracts links. If you don't mind depending on HTML::Parser, it's not a bad little tool:

    use HTML::SimpleLinkExtor;

    my $extor = HTML::SimpleLinkExtor->new;
    $extor->parse($html);

    my @a_hrefs = $extor->a;    # by tag
    my @hrefs   = $extor->href; # by attribute

I mostly use this module for quick and dirty jobs. Since it uses a real HTML parser, it won't extract the false positives, such as similar things in the text (inside of tags).

Most other people already addressed the issues with map and split, but you need to be careful with the regexes too:

 my @hrefs = map {
      / \s href \s* = \s* (['"]) (.*?) \1 /ix ? $2 : ()
     } @lines;

You can see different quoting characters (or none at all), and case insensitive tags and attributes. No matter what any spec or standard says, lots of things generate messed up HTML and many browsers support it. I'm probably still missing things in that pattern. That's why I wrote the module.
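
As a rough illustration of what that pattern does and does not catch, the sketch below runs it over a few invented lines (the contents of @lines are assumptions made up for this example). The unquoted attribute is skipped, because the pattern requires an opening quote that it can match again with \1.

```perl
use strict;
use warnings;

# Hypothetical input lines, invented for illustration only.
my @lines = (
    q{<a href="simple_chart.html">double quotes</a>},
    q{<A HREF='other.html'>single quotes, upper case</A>},
    q{<a href = "spaced.html">spaces around the equals sign</a>},
    q{<a href=bare.html>no quotes at all</a>},
    q{<p>plain text, no link</p>},
);

# For each line: return the second capture when the pattern matches,
# or the empty list (which contributes nothing) when it does not.
my @hrefs = map {
    / \s href \s* = \s* (['"]) (.*?) \1 /ix ? $2 : ()
} @lines;

print "$_\n" for @hrefs;
```

Only the first three lines contribute a value here; the unquoted link and the plain paragraph are dropped, which is one reason a real parser is the safer choice.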

巾帼英雄 2024-12-27 03:30:05

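grep is probably not the right tool for this job. Try just $examples_content =~ /href="(.*?)"/g ... there is no need to split first, and the ? modifier makes the quantifier non-greedy, which keeps the href=".*" pattern from matching too much.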
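
A small sketch of the greedy versus non-greedy difference, using an invented line that contains two links (the value of $line is an assumption for illustration):

```perl
use strict;
use warnings;

# Hypothetical line with two links on it, invented for illustration.
my $line = q{<a href="a.html">A</a> <a href="b.html">B</a>};

# Greedy: .* runs to the last double quote on the line.
my @greedy = $line =~ /href="(.*)"/g;

# Non-greedy: .*? stops at the first closing double quote.
my @lazy = $line =~ /href="(.*?)"/g;

print "greedy: $_\n" for @greedy;
print "lazy:   $_\n" for @lazy;
```

The greedy version yields a single capture that spans both links, while the non-greedy version yields a.html and b.html separately.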

弄潮 2024-12-27 03:30:05

map can emulate grep easily by either returning or not returning a value:

my @hrefs = map(/href="(.*?)"/g, split("\n", $examples_content));

I agree with Amadan and BRPocock that removing the split and just matching against the source will work better in this case; I added this as an answer to show you how map can be used for other cases.

In the spirit of there being more than one way to do it, the line:

my @hrefs = $examples_content =~ /href="(.*?)"/g;

could also be written:

my @hrefs = map /href="(.*?)"/g, $examples_content;

if you prefer the order [output transform input] rather than [output input transform]
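
To show why grep printed a whole line while map returns the captures, here is a minimal side-by-side sketch. The contents of @lines are invented for illustration and stand in for the result of splitting the fetched page.

```perl
use strict;
use warnings;

# Hypothetical lines, standing in for split("\n", $examples_content).
my @lines = (
    q{<a href="simple_chart.html">Stand-alone single-question charts</a>},
    q{<p>no link on this line</p>},
    q{<a href="x.html">X</a> <a href="y.html">Y</a>},
);

# grep only tests each element and keeps the original element when the
# test is true, so the result is whole lines and the captures are discarded.
my @matching_lines = grep { /href="(.*?)"/ } @lines;

# map evaluates the match in list context and keeps whatever it returns:
# all captures for a matching line, nothing for a non-matching one.
my @hrefs = map { /href="(.*?)"/g } @lines;

print "grep kept ", scalar(@matching_lines), " lines\n";
print "map returned: @hrefs\n";
```

Here grep keeps two complete lines, while map returns the three attribute values and silently drops the line without a link.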
