HTML Purifier: Conditionally removing elements based on their attributes
As per the HTML Purifier smoketest, 'malformed' URIs are occasionally discarded to leave behind an attribute-less anchor tag, e.g.
<a href="javascript:document.location='http://www.google.com/'">XSS</a>
becomes <a>XSS</a>
...as well as occasionally being stripped down to the protocol, e.g.
<a href="http://1113982867/">XSS</a>
becomes <a href="http:/">XSS</a>
While that's unproblematic, per se, it's a bit ugly. Instead of trying to strip these out with regular expressions, I was hoping to use HTML Purifier's own library capabilities / injectors / plug-ins / whathaveyou.
Point of reference: Handling attributes
Conditionally removing an attribute in HTML Purifier is easy. Here the library offers the class HTMLPurifier_AttrTransform with the method confiscateAttr().
While I don't personally use the functionality of confiscateAttr(), I do use an HTMLPurifier_AttrTransform as per this thread to add target="_blank" to all anchors.
// more configuration stuff up here
$htmlDef = $htmlPurifierConfiguration->getHTMLDefinition(true);
$anchor = $htmlDef->addBlankElement('a');
$anchor->attr_transform_post[] = new HTMLPurifier_AttrTransform_Target();
// purify down here
HTMLPurifier_AttrTransform_Target is a very simple class, of course.
class HTMLPurifier_AttrTransform_Target extends HTMLPurifier_AttrTransform
{
    public function transform($attr, $config, $context) {
        // I could call $this->confiscateAttr() here to throw away an
        // undesired attribute
        $attr['target'] = '_blank';
        return $attr;
    }
}
That part works like a charm, naturally.
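For the stripped-down href case specifically, a minimal sketch of the confiscateAttr() route might look like the following. The class name HTMLPurifier_AttrTransform_RemoveLoneHttp and the include path are assumptions for illustration; the "http:/" comparison simply mirrors the output shown earlier.

```php
<?php
// Assumption: HTML Purifier is installed and on the include path.
require_once 'HTMLPurifier.auto.php';

// Hypothetical transform: drops an href that was reduced to the bare
// "http:/" stub and leaves every other attribute untouched.
class HTMLPurifier_AttrTransform_RemoveLoneHttp extends HTMLPurifier_AttrTransform
{
    public function transform($attr, $config, $context)
    {
        if (isset($attr['href']) && $attr['href'] === 'http:/') {
            // confiscateAttr() unsets the key and returns its previous value
            $this->confiscateAttr($attr, 'href');
        }
        return $attr;
    }
}
```

It would be registered the same way as the Target transform, through $anchor->attr_transform_post[]. Note that this only removes the attribute; the anchor element itself stays in place.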
Handling elements
Perhaps I'm not squinting hard enough at HTMLPurifier_TagTransform, or am looking in the wrong place(s), or am generally not understanding it, but I can't seem to figure out a way to conditionally remove elements.
Say, something to the effect of:
// more configuration stuff up here
$htmlDef = $htmlPurifierConfiguration->getHTMLDefinition(true);
$anchor = $htmlDef->addElementHandler('a');
$anchor->elem_transform_post[] = new HTMLPurifier_ElementTransform_Cull();
// add target as per 'point of reference' here
// purify down here
With the Cull class extending something that has a confiscateElement() ability, or comparable, wherein I could check for a missing href attribute or an href attribute with the content http:/.
HTMLPurifier_Filter
I understand I could create a filter, but the examples (Youtube.php and ExtractStyleBlocks.php) suggest I'd be using regular expressions in that, which I'd really rather avoid, if it is at all possible. I'm hoping for an onboard or quasi-onboard solution that makes use of HTML Purifier's excellent parsing capabilities.
Returning null in a child class of HTMLPurifier_AttrTransform unfortunately doesn't cut it.
Anyone have any smart ideas, or am I stuck with regexes? :)
Success! Thanks to Ambush Commander and mcgrailm in another question, I am now using a hilariously simple solution:
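A minimal sketch of what such a solution could look like, assuming it hinges on marking href as a required attribute of the anchor element (an assumption, not necessarily the exact code used). The trailing asterisk in 'href*' is HTML Purifier's shorthand for "required"; the sample input and the cache setting are illustrative only.

```php
<?php
// Assumption: HTML Purifier is installed and on the include path.
require_once 'HTMLPurifier.auto.php';

$config = HTMLPurifier_Config::createDefault();
// Skip the on-disk definition cache so this sketch needs no writable directory.
$config->set('Cache.DefinitionImpl', null);

$htmlDef = $config->getHTMLDefinition(true);
// The asterisk marks href as required: an <a> left without a valid href
// after attribute validation should be dropped, with its text preserved.
$htmlDef->addAttribute('a', 'href*', 'URI');

$purifier = new HTMLPurifier($config);
echo $purifier->purify(
    '<a href="javascript:alert(1)">XSS</a> <a href="http://example.com/">ok</a>'
);
```

An href that was merely reduced to http:/ still counts as present, so it would have to be removed by an attribute transform first for the anchor to be dropped as well.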
It works, it works, bahahahaHAHAHAHA...! *manic laughter, gurgling noises, keels over with a smile on her face*
The fact that you can't remove elements with a TagTransform appears to have been an implementation detail. The classic mechanism for removing nodes (a smidge higher-level than just tags) is to use an Injector though.
Anyway, the particular piece of functionality you're looking for is already implemented as %AutoFormat.RemoveEmpty
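Switching that directive on is a one-line configuration change. The sketch below assumes the library is on the include path and uses a made-up input string. As far as I understand the directive, it targets elements that have no content, so an attribute-less anchor that still wraps text may be left alone.

```php
<?php
// Assumption: HTML Purifier is installed and on the include path.
require_once 'HTMLPurifier.auto.php';

$config = HTMLPurifier_Config::createDefault();
// Enable the built-in injector that removes empty elements.
$config->set('AutoFormat.RemoveEmpty', true);

$purifier = new HTMLPurifier($config);
// Illustrative input: one empty anchor, one anchor that still has text.
$clean = $purifier->purify('<p>Before <a></a> after <a>text</a></p>');
echo $clean;
```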
For perusal, this is my current solution. It works, but bypasses HTML Purifier entirely.
I'd still much rather have a good HTML Purifier solution to this, so, just as a heads-up, this answer won't end up self-accepted. But in case no better answer ends up coming around, at least it might help those with similar issues. :)
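Purely as an illustration of a post-processing pass that runs outside HTML Purifier yet avoids regular expressions, here is a sketch using PHP's DOM extension. The function name unwrapDeadAnchors is hypothetical, and this is not necessarily the approach this answer used.

```php
<?php
// Hypothetical helper: unwraps <a> elements that have no href, or whose
// href is the bare "http:/" stub, keeping their child nodes in place.
function unwrapDeadAnchors(string $html): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    // Wrap the fragment in a container so it can be extracted again;
    // the XML declaration hints UTF-8 to the HTML parser.
    $doc->loadHTML('<?xml encoding="UTF-8"><div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $wrapper = $doc->getElementsByTagName('div')->item(0);
    // Copy to an array first: the live node list changes while we edit.
    $anchors = iterator_to_array($wrapper->getElementsByTagName('a'));
    foreach ($anchors as $anchor) {
        $href = trim($anchor->getAttribute('href'));
        if ($href !== '' && $href !== 'http:/') {
            continue; // a usable link, leave it alone
        }
        // Move the children out in front of the anchor, then drop it.
        while ($anchor->firstChild) {
            $anchor->parentNode->insertBefore($anchor->firstChild, $anchor);
        }
        $anchor->parentNode->removeChild($anchor);
    }

    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

It would be called on the string returned by purify(), so the input is already sanitized and well-formed by the time the DOM pass sees it.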