How can I use a Perl regex to replace multiple words in an HTML attribute (each word mapped through a hash to a substitute word)?

Posted on 2024-07-30 20:41:33 | length: 354 | views: 5 | comments: 0

I'm writing an HTML obfuscator, and I have a hash that maps user-friendly names (of ids and classes) to obfuscated names (like a, b, c, etc.). I'm having trouble coming up with a regexp that replaces something like

<div class="left tall">

with

<div class="a b">

If tags could only accept one class, the regexp would simply be something like

s/(class|id)="(.*?)"/$1="$hash{$2}"/

How should I correct this to account for multiple class names within the quotes? Preferably, the solution should be Perl compatible.


﹏半生如梦愿梦如真 2024-08-06 20:41:33

You shouldn't be using a regex for this in the first place. You are trying to do too much with one regex (see "Can you provide some examples of why it is hard to parse XML and HTML with a regex?" for why). What you need is an HTML parser. See "Can you provide an example of parsing HTML with your favorite parser?" for examples using a variety of parsers.

Take a look at HTML::Parser. Here is a (probably incomplete) implementation:

#!/usr/bin/perl

use strict;
use warnings;

use HTML::Parser;

{
    my %map = (
        foo => "f",
        bar => "b",
    );

    sub start {
        my ($tag, $attr) = @_;
        my $attr_string = '';
        for my $key (keys %$attr) {
            if ($key eq 'class') {
                my @classes = split " ", $attr->{$key};
                # Use exists rather than || so that a mapped value
                # of "0" (or "") is still applied; names missing
                # from %map pass through unchanged.
                $attr->{$key} = join " ",
                    map { exists $map{$_} ? $map{$_} : $_ } @classes;
            }
            $attr_string .= qq/ $key="$attr->{$key}"/;
        }

        print "<$tag$attr_string>";
    }
}

sub text {
    print shift;
}

sub end {
    my $tag = shift;
    print "</$tag>";
}

my $p = HTML::Parser->new(
    start_h => [ \&start, "tagname,attr" ],
    text_h  => [ \&text, "text" ],  # raw text; "dtext" would decode entities
    end_h   => [ \&end, "tagname" ],
);

$p->parse_file(\*DATA);

__DATA__
<html>
    <head>
        <title>foo</title>
    </head>
    <body>
        <span class="foo">Foo!</span> <span class="bar">Bar!</span>
        <span class="foo bar">Foo Bar!</span>
        This should not be touched: class="foo"
    </body>
</html>

东京女 2024-08-06 20:41:33

I guess I'd do this:

s/  
    (class|id)="([^"]+)"
/   
    $1 . '="' . (
        join ' ', map { $hash{$_} } split m!\s+!, $2
    ) . '"'
/ex;
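
To try the substitution above on the example from the question, here is a minimal runnable sketch. It is not the answerer's exact code: it adds the `/g` flag so every attribute in the string is rewritten, and it falls back to the original name when a word is missing from the map (the snippet above would emit an undefined value in that case). The `obfuscate_attrs` name and the contents of `%hash` are assumptions made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical mapping, using the names from the question.
my %hash = (left => 'a', tall => 'b');

# Rewrite every class="..." / id="..." value in a string of HTML.
# Names missing from the map are passed through unchanged.
sub obfuscate_attrs {
    my ($html, $map) = @_;
    $html =~ s{
        \b(class|id)="([^"]*)"    # \b keeps e.g. subclass="..." from matching
    }{
        $1 . '="' . join(' ',
            map { exists $map->{$_} ? $map->{$_} : $_ } split ' ', $2
        ) . '"'
    }gex;
    return $html;
}

# Prints: <div class="a b"><p id="a">x</p></div>
print obfuscate_attrs('<div class="left tall"><p id="left">x</p></div>', \%hash), "\n";
```

Like any regex-only approach, this also rewrites `class="..."` when it appears in ordinary page text or inside a script, and it ignores single-quoted or unquoted attribute values, which is the case the parser-based answer is meant to handle.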
