HTML Purifier: Converting <body> to <div>
Premise
I'd like to use HTML Purifier to transform <body> tags to <div> tags, to preserve inline styling on the <body> element, e.g. <body style="background-color:#000000;">Hi there.</body> would turn into <div style="background-color:#000000;">Hi there.</div>. I'm looking at a combination of a custom tag and a TagTransform class.
Current setup
In my configuration section, I'm currently doing this:
$htmlDef = $this->configuration->getHTMLDefinition(true);
// defining the element to avoid triggering 'Element 'body' is not supported'
$bodyElem = $htmlDef->addElement('body', 'Block', 'Flow', 'Core');
$bodyElem->excludes = array('body' => true);
// add the transformation rule
$htmlDef->info_tag_transform['body'] = new HTMLPurifier_TagTransform_Simple('div');
...as well as allowing <body> and its style (and class, and id) attributes via the configuration directives (they're part of a working, large list that's parsed into HTML.AllowedElements and HTML.AllowedAttributes).
I've turned definition caching off.
$config->set('Cache.DefinitionImpl', null);
Unfortunately, in this setup, it seems like HTMLPurifier_TagTransform_Simple never has its transform() method called.
HTML.Parent?
I presume the culprit is my HTML.Parent, which is set to 'div', since, quite naturally, <div> does not allow a child <body> element. However, setting HTML.Parent to 'html' nets me:
ErrorException: Cannot use unrecognized element as parent
Adding...
$htmlElem = $htmlDef->addElement('html', 'Block', 'Flow', 'Core');
$htmlElem->excludes = array('html' => true);
...gets rid of that error message but still doesn't transform the tag - it's removed instead.
Adding...
$htmlElem = $htmlDef->addElement('html', 'Block', 'Custom: head?, body', 'Core');
$htmlElem->excludes = array('html' => true);
...also does nothing, because it nets me an error message:
ErrorException: Trying to get property of non-object
[...]/library/HTMLPurifier/Strategy/FixNesting.php:237
[...]/library/HTMLPurifier/Strategy/Composite.php:18
[...]/library/HTMLPurifier.php:181
[...]
I'm still tweaking around with the last option now, trying to figure out the exact syntax I need to provide, but if someone knows how to help me based on their own past experience, I'd appreciate any pointers in the right direction.
HTML.TidyLevel?
My HTML.TidyLevel, the only other culprit I can imagine, is set to 'heavy'. I've yet to try all possible combinations here, but so far it makes no difference.
(Since I've only been working on this on the side, I struggle to recall which combinations I've already tried; otherwise I would list them here. As it is, I'm not confident I wouldn't miss or misreport something. I might edit this section later, once I've done some dedicated testing!)
Full Configuration
My configuration data is stored in JSON and then parsed into HTML Purifier. Here's the file:
{
    "CSS" : {
        "MaxImgLength" : "800px"
    },
    "Core" : {
        "CollectErrors" : true,
        "HiddenElements" : {
            "script" : true,
            "style" : true,
            "iframe" : true,
            "noframes" : true
        },
        "RemoveInvalidImg" : false
    },
    "Filter" : {
        "ExtractStyleBlocks" : true
    },
    "HTML" : {
        "MaxImgLength" : 800,
        "TidyLevel" : "heavy",
        "Doctype" : "XHTML 1.0 Transitional",
        "Parent" : "html"
    },
    "Output" : {
        "TidyFormat" : true
    },
    "Test" : {
        "ForceNoIconv" : true
    },
    "URI" : {
        "AllowedSchemes" : {
            "http" : true,
            "https" : true,
            "mailto" : true,
            "ftp" : true
        },
        "DisableExternalResources" : true
    }
}
(URI.Base, URI.Munge and Cache.SerializerPath are also set, but I've removed them in this paste. Also, a caveat on HTML.Parent: as mentioned, this is usually set to 'div'.)
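For readers wondering how a nested JSON file like this maps onto HTML Purifier's dotted directive names, here is a minimal sketch. The helper name flattenDirectives and the sample JSON string are hypothetical and not part of the question; the only library calls assumed are HTMLPurifier_Config::createDefault() and set(), shown in comments because they need the library loaded.

```php
<?php
// Hypothetical helper (not from the question): turns
// {"HTML": {"Parent": "div"}} into ['HTML.Parent' => 'div'].
function flattenDirectives(array $data): array
{
    $flat = [];
    foreach ($data as $namespace => $directives) {
        foreach ($directives as $name => $value) {
            // Only the first level is flattened; nested arrays such as
            // Core.HiddenElements stay intact as lookup tables.
            $flat[$namespace . '.' . $name] = $value;
        }
    }
    return $flat;
}

// Sample input, shortened from the configuration above.
$json = '{"HTML": {"TidyLevel": "heavy", "Parent": "div"},'
      . ' "Core": {"HiddenElements": {"script": true}}}';
$directives = flattenDirectives(json_decode($json, true));

// With the library loaded, each pair could then be applied like this:
// $config = HTMLPurifier_Config::createDefault();
// foreach ($directives as $key => $value) {
//     $config->set($key, $value);
// }
```

Keeping the flattening separate from the set() calls makes it easy to inspect which directive names are actually being passed, for example to confirm what HTML.Parent ends up as.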
Comments (2)
This code is the reason why what you're doing doesn't work:
You can turn it off by setting %Core.ConvertDocumentToFragment to false; if the rest of your code is bug-free, it should work straight from there. I don't believe your bodyElem definition is necessary.
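As a minimal sketch of that suggestion, the directive can be set like any other configuration option. The sketch assumes the library's standard HTMLPurifier.auto.php loader is on the include path; the variable names are illustrative only.

```php
<?php
// Assumes HTML Purifier is installed and its autoloader is reachable.
require_once 'HTMLPurifier.auto.php';

$config = HTMLPurifier_Config::createDefault();

// Stop HTML Purifier from reducing a full document to the contents
// of its <body> before the tag transforms get a chance to run.
$config->set('Core.ConvertDocumentToFragment', false);

$purifier = new HTMLPurifier($config);
```

In the JSON configuration from the question, this would presumably correspond to a "ConvertDocumentToFragment" : false entry under "Core".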
Wouldn't it be much easier to do:
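One simple way to read this suggestion, purely as an assumption about what is meant, is to rewrite the <body> tags with a plain string replacement before handing the markup to HTML Purifier. The function name bodyToDiv below is hypothetical.

```php
<?php
// Hypothetical pre-processing step: rename <body> to <div> before purifying.
function bodyToDiv(string $html): string
{
    // Rewrite the opening tag, keeping any attributes it carries.
    $html = preg_replace('~<body(\s[^>]*)?>~i', '<div$1>', $html);
    // Rewrite the closing tag.
    return preg_replace('~</body\s*>~i', '</div>', $html);
}

$converted = bodyToDiv('<body style="background-color:#000000;">Hi there.</body>');
// $converted can now be passed to $purifier->purify() as usual.
```

This is only a crude textual rewrite, so the result should still go through HTML Purifier afterwards; the style attribute is then validated like that of any other <div>.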