How to use Perl and a regular expression to put only the filename (not the full path) into $1

Posted 2024-10-20 10:09:49, 616 words, 7 views, 0 comments

I want to keep only the filenames (not full paths) and add the filename to some bbcode.

Here is the HTML to be converted:

<a href=/path/to/full/image.jpg rel=prettyPhoto><img rel=prettyPhoto src=/path/to/thumb/image.jpg /></a>

Note that the attributes are unquoted: I cannot have rel="foo" with double quotes.

Here is what I have in Perl to perform the conversion:

s/\<a href=(.+?)\ rel=prettyPhoto\>\<img rel=prettyPhoto src=(.+?) \/>\<\/a\>/\[box\]$1\[\/box\]/gi;

This converts the HTML to:

[box]/path/to/full/image.jpg[/box]

But this is what I want as a result:

[box]image.jpg[/box]

The HTML must remain the same. So how do I change my Perl so that $1 contains only the filename?

鲸落 2024-10-27 10:09:49

s/\<a href=(?:.*\/)?(.+?)\ rel=prettyPhoto\>\<img rel=prettyPhoto src=(.+?) \/>\<\/a\>/\[box\]$1\[\/box\]/gi;

(?:.*\/)?

This will match the longest part of the path that ends in a /, without capturing it. The final ? makes the group optional, so an href with no directory part still matches.
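To see the optional non-capturing group at work, here is a minimal, self-contained sketch that runs the substitution against the sample HTML from the question. The variable name `$html` is an assumption made for illustration, and the pattern is written with `{}` delimiters only to cut down on backslashes.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample markup from the question; $html is an illustrative name.
my $html = '<a href=/path/to/full/image.jpg rel=prettyPhoto>'
         . '<img rel=prettyPhoto src=/path/to/thumb/image.jpg /></a>';

# (?:.*/)? consumes everything up to the last slash of the href
# without capturing it, so the first capture group holds only the
# text that follows that slash.
$html =~ s{<a href=(?:.*/)?(.+?) rel=prettyPhoto><img rel=prettyPhoto src=(.+?) /></a>}
          {\[box\]$1\[/box\]}gi;

print "$html\n";   # prints [box]image.jpg[/box]
```

The escaped brackets in the replacement are kept from the original answer; they stop Perl from reading `$1[` as the start of an array subscript.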

雅心素梦 2024-10-27 10:09:49

I don't know if it handles fringe cases, but I got this to work:

#!/usr/bin/perl

use strict;
use warnings;

my $in = '<a href=/path/to/full/image.jpg rel=prettyPhoto><img rel=prettyPhoto src=/path/to/thumb/image.jpg /></a>';

$in =~ s/\<a href=.*?([^\/]+)\ rel=prettyPhoto\>\<img rel=prettyPhoto src=(.+?) \/>\<\/a\>/\[box\]$1\[\/box\]/gi;

print $in . "\n";

However, wouldn't you rather do something like:

#!/usr/bin/perl

use strict;
use warnings;

use HTML::TokeParser;
my $p = HTML::TokeParser->new(\*DATA);

my $token = $p->get_tag("a");
my $token_attribs = $token->[1];
my $bb_code;

if ($token_attribs->{rel} eq 'prettyPhoto') {

  my $url = $token_attribs->{href};
  my @split_path = split(m'/', $url);

  $bb_code = '[box]' . $split_path[-1] . '[/box]';
}

print $bb_code . "\n";
__DATA__
<a href=/path/to/full/image.jpg rel=prettyPhoto><img rel=prettyPhoto src=/path/to/thumb/image.jpg /></a>

using an HTML parser (like HTML::TokeParser, which has examples in the documentation) to find the url for you? Much better than relying on regexing the HTML by hand.

琉璃繁缕 2024-10-27 10:09:49

I suggest you use the right tools for the job, like these:

use Data::Dumper;
use HTML::PullParser;
use URI;

# $doc_handle is not defined in this snippet; it is assumed to hold
# the HTML document to parse.
die '' . $! || $@ 
    unless my $p = HTML::PullParser->new(
      doc         =>  $doc_handle
    , start       => 'tag, attr'
    , report_tags => ['a']
    );

my @file_names;
while ( my $t = $p->get_token ) { 
    next unless $t    and my ( $tag_name, $attr ) = @$t;
    next unless $attr and my $href = $attr->{href};
    next unless my $uri = URI->new( $href );
    next unless my $path = $uri->path;
    # keep everything after the last slash of the path
    push @file_names, substr( $path, rindex( $path, '/' ) + 1 );
    # or it's safe to use a regex here:
    # push @file_names, $path =~ m{([^/]+)$};
}

# Dump only returns a string, so print it to see the result.
print Data::Dumper->Dump( [ \@file_names ], [ '*file_names' ] );

Friends don't let friends parse HTML with regexes.

度的依靠╰つ 2024-10-27 10:09:49

Don't capture the whole thing. Use non-capturing groups with (?:...). That way you can further subdivide the part you match from the part you capture.
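A minimal sketch of that idea, assuming the unquoted href never contains whitespace or `>` (as in the sample HTML). The directory part sits in a non-capturing group and only the last path segment is captured. The name `$html` and the second link are invented for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Two links on one line; the second one is made up for this example.
my $html = '<a href=/path/to/full/image.jpg rel=prettyPhoto>'
         . '<img rel=prettyPhoto src=/path/to/thumb/image.jpg /></a>'
         . '<a href=/other/photo.png rel=prettyPhoto>'
         . '<img rel=prettyPhoto src=/other/thumb/photo.png /></a>';

$html =~ s{
    <a\ href=
    (?:[^\s>]*/)?        # directory part: matched but not captured
    ([^\s/>]+)           # captured: last path segment (the filename)
    \ rel=prettyPhoto>
    <img\ rel=prettyPhoto\ src=[^\s>]+\ />
    </a>
}{\[box\]$1\[/box\]}gix;

print "$html\n";   # prints [box]image.jpg[/box][box]photo.png[/box]
```

Because the character classes cannot cross a space or `>`, each link is matched on its own. A greedy `.*` in the same position could run across both links when they share a line.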

恍梦境° 2024-10-27 10:09:49

This obviously won't work inside the regex itself, but you could run the split function on $1 and take the last element of the resulting array.
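One way to apply that suggestion within a single substitution is the `/e` modifier, which evaluates the replacement as Perl code. This is only a sketch, not the commenter's own code, and `$html` is an illustrative name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $html = '<a href=/path/to/full/image.jpg rel=prettyPhoto>'
         . '<img rel=prettyPhoto src=/path/to/thumb/image.jpg /></a>';

# The first group still captures the full path. The replacement is
# code (/e) that splits that path on "/" and keeps the last element.
$html =~ s{<a href=(.+?) rel=prettyPhoto><img rel=prettyPhoto src=(.+?) /></a>}
          { '[box]' . ( split m{/}, $1 )[-1] . '[/box]' }gie;

print "$html\n";   # prints [box]image.jpg[/box]
```

The pattern itself stays exactly as in the question; only the replacement changes.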

南渊 2024-10-27 10:09:49

What about:

s/\<a href=.*\/(.+?)\ rel=prettyPhoto\>\<img rel=prettyPhoto src=(.+?) \/>\<\/a\>/\[box\]$1\[\/box\]/gsi;
