How can I remove all attributes of p elements in an HTML file using Perl?

Posted 2024-12-11 15:24:35

I'd like to remove all attributes of <p> in an HTML file by using this simple Perl command line:

$ perl -pe 's/<p[^>]*>/<p>/' input.html

However, it won't substitute a tag such as <p class="hello"> when the tag spans multiple lines, e.g.:

<p 
class="hello">

So I tried to remove the line endings first, by running:

# command-1
$ perl -pe 's/\n/ /' input.html > input-tmp.html
# command-2
$ perl -pe 's/<p[^>]*>/<p>/g' input-tmp.html > input-final.html

Questions:

Is there an option in (Perl) regex to try the match across multiple lines?
Can I combine the two commands above (command-1 and command-2) into one? Basically, the first command needs to complete execution before the second one starts.

一抹淡然 2024-12-18 15:24:35

-p is short for

LINE: while (<>) {
   ...
} continue {
   print
      or die "-p destination: $!\n";
}

As you can see, $_ only contains one line at a time, so the pattern can't possibly match something that spans more than one line. You can fool Perl into thinking the whole file is one line using -0777.

perl -0777 -pe's/<p[^>]*>/<p>/g' input.html

Command line options are documented in perlrun.
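The same whole-file substitution can also live in a script file. Below is a minimal sketch, not a drop-in replacement: the helper name strip_p_attrs is hypothetical, and the \b after p is an addition to the one-liner above. Without it, <p[^>]*> also rewrites the opening tags of other elements whose names start with p, such as <pre>, <param> or <progress>. It is still a regex, so a > inside an attribute value or a <p inside a comment can trip it up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: drop all attributes from <p ...> opening tags.
# [^>] also matches "\n", so a tag split across lines is handled as long
# as the whole tag is inside $html. The \b (an addition to the one-liner)
# keeps <pre>, <param>, <progress>, ... from matching.
sub strip_p_attrs {
    my ($html) = @_;
    $html =~ s/<p\b[^>]*>/<p>/g;
    return $html;
}

# Roughly the same job as `perl -0777 -pe ...`: read each file named on
# the command line in one go ($/ localized to undef) and print the result
# to STDOUT. With no file arguments, nothing is read.
for my $file (@ARGV) {
    open my $fh, '<', $file or die "Cannot open $file: $!";
    my $html = do { local $/; <$fh> };
    close $fh;
    print strip_p_attrs($html);
}
```

Run it as perl SCRIPT input.html > input-final.html, with SCRIPT being whatever name you save it under. If the HTML may contain uppercase <P> tags, add the /i modifier to the substitution.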

追我者格杀勿论 2024-12-18 15:24:35

If you write a short script and put it in its own file, you can easily invoke it with a simple command line.

Improving the following script is left as an exercise:

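#!/usr/bin/perl

use warnings; use strict;
use HTML::TokeParser::Simple;

# Usage: perl SCRIPT ELEMENT FILE...   (cleaned HTML goes to STDOUT)
run(\@ARGV);

sub run {
    my ($argv, $opt) = @_;

    # First argument: the element whose attributes should be dropped.
    my $el = shift @$argv;

    # Remaining arguments: the HTML files to clean.
    for my $src (@$argv) {
        clean_attribs($src, $el, $opt);
    }
}

sub clean_attribs {
    my ($src, $el, $opt) = @_;
    # Anchored on both ends, so "p" matches the tag name p but not pre.
    my $el_pat = qr/^$el\z/;

    my $parser = HTML::TokeParser::Simple->new($src, %$opt);

    while (my $token = $parser->get_token) {
        if ($token->is_start_tag($el_pat)) {
            # Re-emit the start tag with its name only, dropping attributes.
            my $tag = $token->get_tag;
            print "<$tag>";
        }
        else {
            # Copy every other token through unchanged.
            print $token->as_is;
        }
    }
}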

羁拥 2024-12-18 15:24:35

perl -pe 'undef $/; s/<p[^>]*>/<p>/g'
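One caveat with this answer: because undef $/ runs inside the -p loop, the first line has already been read with the normal line-by-line $/ before it takes effect, so a tag that starts on line 1 and ends on line 2 is still missed. Putting it in a BEGIN block (perl -pe 'BEGIN { undef $/ } s/<p[^>]*>/<p>/g'), or using -0777, makes it apply before the first read. The minimal sketch below imitates both orderings with an in-memory filehandle; the run_p_loop helper and the sample string standing in for the input file are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample input: the <p> tag starts on line 1 and ends on line 2.
my $html = qq{<p\nclass="hello">Hi</p>\n};

# Imitates the -p loop over $html. With $undef_before_loop false, $/ is
# undefined only inside the loop body (like "undef $/;" in the -e code);
# with it true, $/ is undefined before the first read (like BEGIN { undef $/ }).
sub run_p_loop {
    my ($undef_before_loop) = @_;
    local $/ = $undef_before_loop ? undef : "\n";
    open my $fh, '<', \$html or die "Cannot open in-memory file: $!";
    my $out = '';
    while (<$fh>) {
        undef $/;
        s/<p[^>]*>/<p>/g;
        $out .= $_;
    }
    close $fh;
    return $out;
}

print run_p_loop(0);   # first line read alone: the tag is left untouched
print run_p_loop(1);   # whole string read at once: attributes removed
```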

病女 2024-12-18 15:24:35

$ perl -pe 's/\n/ /; s/<p[^>]*>/<p>/gs;' input.html > input-final.html
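A note on how this version behaves: the s modifier only changes what . matches, and this pattern has no ., while [^>] matches newlines with or without it. What decides the outcome is how much text -p puts in $_ per iteration, and s/\n/ / on a single line does not pull the next line in. The minimal sketch below runs this answer's two substitutions once per line and then once on the whole input; the apply_answer helper and the sample string standing in for input.html are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample input: the <p> tag starts on line 1 and ends on line 2.
my $html = qq{<p\nclass="hello">Hi</p>\n};

# Applies this answer's two substitutions to each record read from $html.
# $slurp false: one line per record, as with plain -p.
# $slurp true:  the whole string as one record, as with -0777 -p.
sub apply_answer {
    my ($slurp) = @_;
    local $/ = $slurp ? undef : "\n";
    open my $fh, '<', \$html or die "Cannot open in-memory file: $!";
    my $out = '';
    while (my $rec = <$fh>) {
        $rec =~ s/\n/ /;
        $rec =~ s/<p[^>]*>/<p>/gs;
        $out .= $rec;
    }
    close $fh;
    return $out;
}

print apply_answer(0), "\n";   # lines are joined, but the attributes remain
print apply_answer(1);         # the attributes are removed
```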

About the author: 向地狱狂奔
