HTML Purifier: Converting <body> to <div>

Posted on 2024-08-31 18:37:03

Premise

I'd like to use HTML Purifier to transform <body> tags into <div> tags in order to preserve inline styling on the <body> element, e.g. <body style="background-color:#000000;">Hi there.</body> would become <div style="background-color:#000000;">Hi there.</div>. I'm looking at a combination of a custom tag and a TagTransform class.

Current setup

In my configuration section, I'm currently doing this:

$htmlDef  = $this->configuration->getHTMLDefinition(true);
// defining the element to avoid triggering 'Element 'body' is not supported'
$bodyElem = $htmlDef->addElement('body', 'Block', 'Flow', 'Core');
$bodyElem->excludes = array('body' => true);
// add the transformation rule
$htmlDef->info_tag_transform['body'] = new HTMLPurifier_TagTransform_Simple('div');

...as well as allowing <body> and its style (and class, and id) attribute via the configuration directives (they're part of a working, large list that's parsed into HTML.AllowedElements and HTML.AllowedAttributes).

I've turned definition caching off.

$config->set('Cache.DefinitionImpl', null);

Unfortunately, in this setup, it seems like HTMLPurifier_TagTransform_Simple never has its transform() method called.

HTML.Parent?

I presume the culprit is my HTML.Parent, which is set to 'div' since, quite naturally, <div> does not allow a child <body> element. However, setting HTML.Parent to 'html' nets me:

ErrorException: Cannot use unrecognized element as parent

Adding...

$htmlElem = $htmlDef->addElement('html', 'Block', 'Flow', 'Core');
$htmlElem->excludes = array('html' => true);

...gets rid of that error message but still doesn't transform the tag - it's removed instead.

Adding...

$htmlElem = $htmlDef->addElement('html', 'Block', 'Custom: head?, body', 'Core');
$htmlElem->excludes = array('html' => true);

...also does nothing, because it nets me an error message:

ErrorException: Trying to get property of non-object

[...]/library/HTMLPurifier/Strategy/FixNesting.php:237
[...]/library/HTMLPurifier/Strategy/Composite.php:18
[...]/library/HTMLPurifier.php:181
[...]

I'm still tweaking around with the last option now, trying to figure out the exact syntax I need to provide, but if someone knows how to help me based on their own past experience, I'd appreciate any pointers in the right direction.

HTML.TidyLevel?

The only other culprit I can imagine is HTML.TidyLevel, which is set to 'heavy'. I've yet to try all possible combinations, but so far changing it has made no difference.

(Since I've only been working on this on the side, I struggle to recall which combinations I've already tried. I'd list them here otherwise, but I'm not confident I wouldn't miss something I've done or misreport something. I might edit this section later once I've done some dedicated testing, though!)

Full Configuration

My configuration data is stored in JSON and then parsed into HTML Purifier. Here's the file:

{
    "CSS" : {
        "MaxImgLength" : "800px"
    },
    "Core" : {
        "CollectErrors" : true,
        "HiddenElements" : {
            "script"   : true,
            "style"    : true,
            "iframe"   : true,
            "noframes" : true
        },
        "RemoveInvalidImg" : false
    },
    "Filter" : {
        "ExtractStyleBlocks" : true
    },
    "HTML" : {
        "MaxImgLength" : 800,
        "TidyLevel"    : "heavy",
        "Doctype"      : "XHTML 1.0 Transitional",
        "Parent"       : "html"
    },
    "Output" : {
        "TidyFormat"   : true
    },
    "Test" : {
        "ForceNoIconv" : true
    },
    "URI" : {
        "AllowedSchemes" : {
            "http"     : true,
            "https"    : true,
            "mailto"   : true,
            "ftp"      : true
        },
        "DisableExternalResources" : true
    }
}

(URI.Base, URI.Munge and Cache.SerializerPath are also set, but I've removed them in this paste. Also, HTML.Parent caveat: As mentioned, usually, this is set to 'div'.)
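The loading step itself is not shown in the post. As a rough sketch of one way it could work, assuming the top-level JSON keys are namespaces and the second-level keys are directive names, the nested structure can be flattened into dotted directive names. flattenDirectives is a hypothetical helper, not part of HTML Purifier, and the JSON string is a shortened stand-in for the file above.

```php
<?php
// Hypothetical helper: turns {"HTML": {"Parent": "div"}} into
// ['HTML.Parent' => 'div'], the dotted form used for directive names.
function flattenDirectives(array $namespaces): array
{
    $directives = array();
    foreach ($namespaces as $namespace => $settings) {
        foreach ($settings as $name => $value) {
            // Values are passed through untouched, so lookup tables such as
            // Core.HiddenElements stay as arrays.
            $directives[$namespace . '.' . $name] = $value;
        }
    }
    return $directives;
}

// Shortened stand-in for the configuration file shown above.
$json = '{"HTML": {"TidyLevel": "heavy", "Parent": "div"}, "Core": {"HiddenElements": {"script": true}}}';
$directives = flattenDirectives(json_decode($json, true));
print_r($directives);
```

With the library loaded, each name/value pair could then be handed to $config->set($name, $value) on an HTMLPurifier_Config instance.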

Comments (2)

你对谁都笑 2024-09-07 18:37:03

This code is the reason why what you're doing doesn't work:

/**
 * Takes a string of HTML (fragment or document) and returns the content
 * @todo Consider making protected
 */
public function extractBody($html) {
    $matches = array();
    $result = preg_match('!<body[^>]*>(.*)</body>!is', $html, $matches);
    if ($result) {
        return $matches[1];
    } else {
        return $html;
    }
}
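To see the effect in isolation, here is a minimal standalone sketch that copies the same regular expression into a free function. extractBodySketch is a name made up for this illustration; it only mimics the method quoted above and does not call HTML Purifier.

```php
<?php
// Standalone copy of the regex quoted above, for illustration only.
function extractBodySketch(string $html): string
{
    $matches = array();
    if (preg_match('!<body[^>]*>(.*)</body>!is', $html, $matches)) {
        // Only the inner content survives; the <body> tag and its
        // attributes (including style) are discarded here.
        return $matches[1];
    }
    // No <body> element found: the input is returned unchanged.
    return $html;
}

$document = '<body style="background-color:#000000;">Hi there.</body>';
echo extractBodySketch($document), "\n"; // prints "Hi there."

$fragment = '<p>No body tag here.</p>';
echo extractBodySketch($fragment), "\n"; // prints the fragment unchanged
```

Because the tag is already gone by the time the tokens are processed, a transform registered for 'body' has nothing left to act on.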

You can turn this off by setting %Core.ConvertDocumentToFragment to false; if the rest of your code is bug-free, it should work straight from there. I don't believe your bodyElem definition is necessary.
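Put together with the transform from the question, that suggestion might look roughly like the sketch below. It assumes HTML Purifier is installed and that the require path matches your setup, and it has not been verified against a particular version; whether a nested <body> survives tokenizing may also depend on which lexer is in use.

```php
<?php
// Assumption: HTML Purifier is installed; adjust the path to your setup.
require_once 'HTMLPurifier.auto.php';

$config = HTMLPurifier_Config::createDefault();
$config->set('Cache.DefinitionImpl', null);            // as in the question
$config->set('Core.ConvertDocumentToFragment', false); // skip the <body> extraction step
$config->set('HTML.Parent', 'div');                    // the asker's usual parent

// Register the transform only; per the answer, no custom 'body' element
// definition should be needed.
$htmlDef = $config->getHTMLDefinition(true);
$htmlDef->info_tag_transform['body'] = new HTMLPurifier_TagTransform_Simple('div');

$purifier = new HTMLPurifier($config);
echo $purifier->purify('<body style="background-color:#000000;">Hi there.</body>');
```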

权谋诡计 2024-09-07 18:37:03

Wouldn't it be much easier to do:

// Replace the opening-tag prefix and the closing-tag suffix.
$search = array('<body', 'body>');
$replace = array('<div', 'div>');

$html = '<body style="background-color:#000000;">Hi there.</body>';

echo str_replace($search, $replace, $html);

// Output: <div style="background-color:#000000;">Hi there.</div>
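One caveat with plain string replacement: the needle 'body>' also occurs inside <tbody> and </tbody>, so tables would be mangled. A slightly stricter variant, offered only as a sketch, anchors the match on the tag name with a regular expression. bodyToDiv is a hypothetical helper name, and like the original suggestion it does no sanitizing of its own.

```php
<?php
// Hypothetical helper; not part of HTML Purifier.
function bodyToDiv(string $html): string
{
    // Match "<body" or "</body" only when "body" is the whole tag name,
    // so "<tbody>" is left alone; the i flag also catches "<BODY>".
    // Falls back to the input if preg_replace reports an error (null).
    return preg_replace('!<(/?)body\b!i', '<$1div', $html) ?? $html;
}

$html = '<body style="background-color:#000000;"><table><tbody><tr><td>Hi there.</td></tr></tbody></table></body>';
echo bodyToDiv($html), "\n";
// prints: <div style="background-color:#000000;"><table><tbody><tr><td>Hi there.</td></tr></tbody></table></div>
```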