Multiline regular expression

Posted on 2024-09-17 17:14:53

I'm trying to extract two pieces of data from this text:

<a href="http://english317.ning.com/profiles/blogs/bad-business-writing-487">Continue</a>
                                      </div>
                <p class="small">

                                                    Added by <a href="/profile/KemberleyRamirez">Kemberley Ramirez</a> on September 2, 2010 at 11:38pm   

I'd like to get the text after /blogs/ (e.g. "bad-business-writing-487") and also the "Added by" string, i.e. the student's name and submission date (e.g. "Kemberley Ramirez on September 2, 2010 at 11:38pm").

I'm using UltraEdit with Perl expressions.

情绪操控生活 2024-09-24 17:14:53

I don't know what exactly you are trying to match, but you are better off using a proper HTML parser:

#!/usr/bin/perl

use strict; use warnings;

use HTML::TokeParser::Simple;

my $parser = HTML::TokeParser::Simple->new(\*DATA);

my $blog_re = qr{^http://english317\.ning\.com/profiles/blogs/(.+)\z};
my $profile_re = qr{^/profile/(\w+)\z};

while ( my $tag = $parser->get_tag('a') ) {
    next unless my ($href) = $tag->get_attr('href');
    if ( $href =~ $blog_re or $href =~ $profile_re ) {
        print "[$1]\n";
    }
}

__DATA__
<a href="http://english317.ning.com/profiles/blogs/bad-business-writing-487">Continue</a>
                                      </div>
                <p class="small">

                                                    Added by <a href="/profile/KemberleyRamirez">Kemberley Ramirez</a> on September 2, 2010 at 11:38pm

氛圍 2024-09-24 17:14:53

Using PowerGREP in "dot matches newline" mode, I came up with this:

(?>profiles/blogs/(.*?)").*?added by(.*?)</a>(.*?2010.*?\d{2}[ap]m)

followed by an extra processing search to strip the leftover tag from the second capture:
<?a.*?>
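
For readers working in Perl rather than PowerGREP, the clean-up step can be sketched as below. This is only an illustration: the `$captured` string is what the second group of the regex above would hold for the sample in the question (assuming a case-insensitive search), and the tag-stripping pattern `<[^>]*>` is a stricter stand-in for `<?a.*?>`, not the answer's own expression.

```perl
use strict;
use warnings;

# What the second capture group holds for the sample in the question:
# the opening tag is still attached to the name.
my $captured = ' <a href="/profile/KemberleyRamirez">Kemberley Ramirez';

# Remove anything that looks like a tag, then trim surrounding whitespace.
( my $name = $captured ) =~ s/<[^>]*>//g;
$name =~ s/^\s+|\s+$//g;

print "$name\n";
```

Restricting the pattern to `<[^>]*>` keeps the substitution from touching a plain letter "a" inside the name.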

迷乱花海 2024-09-24 17:14:53

The /s and /m modifiers control how multi-line strings are handled: /s lets . match a newline, and /m lets ^ and $ match at line boundaries.
See perlretut.
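
A small sketch of the difference, using a two-line string adapted from the question (the variable names are made up for the illustration):

```perl
use strict;
use warnings;

# Two-line sample; the wording is adapted from the question.
my $text = "Added by Kemberley Ramirez\non September 2, 2010";

# Without /s, "." stops at the newline, so the match cannot span both lines.
my $plain = $text =~ /Ramirez.*September/ ? 1 : 0;

# With /s, "." also matches the newline.
my $dotall = $text =~ /Ramirez.*September/s ? 1 : 0;

# Without /m, "^" anchors only at the very start of the string.
my $no_m = $text =~ /^on/ ? 1 : 0;

# With /m, "^" also anchors right after each newline.
my $with_m = $text =~ /^on/m ? 1 : 0;

print "plain=$plain dotall=$dotall no_m=$no_m with_m=$with_m\n";
# prints: plain=0 dotall=1 no_m=0 with_m=1
```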

You probably want something like rrr's regexps with the /s modifier, or something like this (untested):

$foo =~ m|blogs/([^"]+).*?Added by <[^>]+>([^<]+)</a>|s;

Using m|| instead of // avoids having to escape the slashes.
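
To show how this idea might be extended to also pick up the date, here is a self-contained sketch run against the sample from the question. The names `$entry_re`, `@entries`, `slug` and `byline` are invented for the example, and the third capture group (the rest of the "Added by" line) is an addition that is not part of the expression above; treat it as one possible approach, checked only against this single sample.

```perl
use strict;
use warnings;

# Sample input copied from the question.
my $html = <<'END_HTML';
<a href="http://english317.ning.com/profiles/blogs/bad-business-writing-487">Continue</a>
                                      </div>
                <p class="small">

                    Added by <a href="/profile/KemberleyRamirez">Kemberley Ramirez</a> on September 2, 2010 at 11:38pm
END_HTML

my $entry_re = qr{
    /profiles/blogs/ ([^"]+) "      # capture 1: slug, up to the closing quote
    .*?                             # lazily skip ahead; crosses newlines because of s
    Added \s+ by \s+
    <a\b[^>]*> ([^<]+) </a>         # capture 2: link text, i.e. the name
    \s* ([^<\n]*[^<\s])             # capture 3: rest of that line, trailing space trimmed
}sx;

# Collect one record per blog entry found in the input.
my @entries;
while ( $html =~ /$entry_re/g ) {
    push @entries, { slug => $1, byline => "$2 $3" };
}

for my $e (@entries) {
    print "$e->{slug}\t$e->{byline}\n";
}
```

The lazy `.*?` matters once the input holds several entries: a greedy `.*` under /s would run to the last "Added by" in the file and pair the first slug with the wrong name.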

回心转意 2024-09-24 17:14:53

The following should work across multiple lines:

.*blogs\/(\S+)".*(?:\n.*)*<a.*>(.*)<\/a>(.*)