Perl html-parsing perl-module

Perl HTML::Strip whitelist

Posted on 2024-12-03 03:49:46

Is there a way to give the HTML::Strip module a whitelist so that it preserves certain tags?

Given the markup below:

<div><b>test</b></div>

stripping it with this code:

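use strict;
use warnings;
use HTML::Strip;

my $hs = HTML::Strip->new();
# Three-argument open with a lexical filehandle and an error check
open my $fh, '<', 'test.markup' or die "Cannot open test.markup: $!";
my $raw_html = do { local $/; <$fh> };    # read the whole file, not only its first line
close $fh;

my $clean_text = $hs->parse($raw_html);
$hs->eof;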

produces this output:

test

However, I would like to whitelist the <b> tag and get the output below instead:

<b>test</b>

EDIT: one solution

Using HTML::StripScripts::Parser

use HTML::StripScripts::Parser;

my $hss = HTML::StripScripts::Parser->new(
     {
         Context => 'Inline',
         EscapeFiltered  => 0,
         BanAllBut       => [qw(i b u)],
     },
     strict_comment => 0,
     strict_names   => 0,
);

$hss->filter_html("<div><b>test</b></div>");
my $cooked = $hss->filtered_document;
$cooked =~ s/<!--filtered-->//g;    # drop the placeholders left where tags were removed
print $cooked;    # <b>test</b>

Comments (1)

许仙没带伞 2024-12-10 03:49:46

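Reading through the Perl wrapper and the underlying XS code, I found no whitelist functionality.

It could be added, but it is not 100% trivial - the code already checks tag names for "strip" tags, e.g.

As another approach, O'Reilly's RegEx book offers a regex recipe for stripping HTML tags, including whitelist support.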
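To give a rough idea of what a regex-based whitelist might look like, here is a minimal sketch using only core Perl. It is not the recipe from the book, and `strip_tags_except` is a hypothetical helper name, not part of HTML::Strip. It assumes reasonably well-formed markup: it does not handle a `>` inside an attribute value, leaves comments and the contents of script or style elements untouched, and keeps any attributes on whitelisted tags, so it should not be relied on to sanitize untrusted input.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of HTML::Strip): remove every tag whose
# name is not in the whitelist, keeping the text between the tags.
sub strip_tags_except {
    my ($html, @allowed) = @_;
    my %keep = map { lc($_) => 1 } @allowed;

    # $1 is the whole opening or closing tag, $2 is the tag name.
    # Whitelisted tags are put back unchanged; all others are dropped.
    $html =~ s{(<\s*/?\s*([A-Za-z][A-Za-z0-9]*)\b[^>]*>)}{
        $keep{ lc($2) } ? $1 : ''
    }ge;

    return $html;
}

print strip_tags_except('<div><b>test</b></div>', qw(i b u)), "\n";
# prints: <b>test</b>
```

Tag names are compared case-insensitively, so `<B>` is kept when `b` is whitelisted.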

If you don't want to mess with regexes, try HTML::StripScripts::Parser - it seems to use a whitelist.
