innerHTML converts CDATA to comments

Posted 2024-11-30 01:29:52

I'm trying to insert some HTML into a page using JavaScript, and the HTML I'm inserting contains CDATA blocks.

I'm finding, in Firefox and Chrome, that the CDATA is getting converted to a comment.

The HTML is not under my control, so it's difficult for me to avoid using CDATA.

Given a div on the page with id "test", the following test case:

document.getElementById('test').innerHTML = '<![CDATA[foo]]> bar'

causes the following HTML to be appended to the 'test' div:

<!--[CDATA[foo]]--> bar

Is there any way I can use JavaScript to insert HTML containing CDATA into a document verbatim?

Comments (7)

爱已欠费 2024-12-07 01:29:52

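document.createCDATASection looks like it should do it, but it throws a NotSupportedError when called on an HTML document. The real answer to your question is that although HTML5 does have CDATA sections, the HTML parser only honors them inside foreign content such as SVG and MathML. Anywhere else, <![CDATA[ is parsed as the start of a bogus comment, which is why you see <!--[CDATA[foo]]-->.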
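Here is a minimal string-level model of what happens to the OP's markup. It assumes the HTML5 tokenizer rules for ordinary (non-SVG, non-MathML) content: "<![CDATA[" opens a bogus comment whose data starts with "[CDATA[" and runs only as far as the next ">". The function name modelCdataInHtmlContent is hypothetical. It illustrates the rule and is neither a browser API nor a full parser.

```javascript
// Hypothetical illustration, not a browser API: rewrite each "<![CDATA[..."
// the way an HTML parser treats it in ordinary HTML content, i.e. as a
// bogus comment whose data begins with "[CDATA[" and ends at the next ">".
function modelCdataInHtmlContent(markup) {
  return markup.replace(/<!(\[CDATA\[[^>]*)>/g, "<!--$1-->");
}

// The OP's test case reproduces the observed result.
console.log(modelCdataInHtmlContent("<![CDATA[foo]]> bar")); // <!--[CDATA[foo]]--> bar

// The comment stops at the first ">", so "b]]>" is left over as plain text.
console.log(modelCdataInHtmlContent("<![CDATA[a>b]]>")); // <!--[CDATA[a-->b]]>
```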

EDIT

CDATA sections just aren't part of the HTML 4 definition, so browsers won't recognize them in ordinary HTML content.

But fixing this doesn't require a full DOM parser. Here's a simple lexical solution that rewrites each CDATA section as equivalent escaped HTML text before you assign the result to innerHTML:

function htmlWithCDATASectionsToHtmlWithout(html) {
    var ATTRS = "(?:[^>\"\']|\"[^\"]*\"|\'[^\']*\')*",
        // names of tags with RCDATA or CDATA content.
        SCRIPT = "[sS][cC][rR][iI][pP][tT]",
        STYLE = "[sS][tT][yY][lL][eE]",
        TEXTAREA = "[tT][eE][xX][tT][aA][rR][eE][aA]",
        TITLE = "[tT][iI][tT][lL][eE]",
        XMP = "[xX][mM][pP]",
        SPECIAL_TAG_NAME = [SCRIPT, STYLE, TEXTAREA, TITLE, XMP].join("|"),
        ANY = "[\\s\\S]*?",
        AMP = /&/g,
        LT = /</g,
        GT = />/g;
    return html.replace(new RegExp(
        // Entities and text
        "[^<]+" +
        // Comment
        "|<!--"+ANY+"-->" +
        // Regular tag
        "|<\/?(?!"+SPECIAL_TAG_NAME+")[a-zA-Z]"+ATTRS+">" +
        // Special tags
        "|<\/?"+SCRIPT  +"\\b"+ATTRS+">"+ANY+"<\/"+SCRIPT  +"\\s*>" +
        "|<\/?"+STYLE   +"\\b"+ATTRS+">"+ANY+"<\/"+STYLE   +"\\s*>" +
        "|<\/?"+TEXTAREA+"\\b"+ATTRS+">"+ANY+"<\/"+TEXTAREA+"\\s*>" +
        "|<\/?"+TITLE   +"\\b"+ATTRS+">"+ANY+"<\/"+TITLE   +"\\s*>" +
        "|<\/?"+XMP     +"\\b"+ATTRS+">"+ANY+"<\/"+XMP     +"\\s*>" +
        // CDATA section.  Content in capturing group 1.
        "|<!\\[CDATA\\[("+ANY+")\\]\\]>" +
        // A loose less-than
        "|<", "g"),

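        function (token, cdataContent) {
          // Escape the CDATA content so it is inserted as literal text.
          // "&" is escaped first so the other entities are not double-escaped.
          return "string" === typeof cdataContent
              ? cdataContent.replace(AMP, "&amp;").replace(LT, "&lt;")
                .replace(GT, "&gt;")
              : token === "<"
              ? "&lt;"  // Normalize loose less-thans.
              : token;
        });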
}

Given

<b>foo</b><![CDATA[<i>bar</i>]]>

it produces

<b>foo</b>&lt;i&gt;bar&lt;/i&gt;

and given something that looks like a CDATA section inside a script or other special tag or comment, it correctly does not muck with it:

<script>/*<![CDATA[*/foo=bar<baz&//]]></script><![CDATA[fish: <><]]>

becomes

<script>/*<![CDATA[*/foo=bar<baz&//]]></script>fish: &lt;&gt;&lt;

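For comparison, here is a condensed sketch of the same lexical idea, assuming an ES2015+ environment. It uses the "i" flag and a back-reference instead of mixed-case character classes. Comments, raw-text elements and ordinary tags are matched and kept unchanged, and only real CDATA sections are rewritten. The name cdataSectionsToText is hypothetical. Unlike the function above, it leaves loose "<" characters as they are.

```javascript
// Condensed sketch of the lexical approach (hypothetical name, ES2015+).
// Comments, raw-text elements and ordinary tags are matched and returned
// unchanged, so CDATA-like text inside them is never touched; each real
// CDATA section is replaced by its content escaped as HTML text.
function cdataSectionsToText(html) {
  const ATTRS = "(?:[^>\"']|\"[^\"]*\"|'[^']*')*";
  const RAW_TEXT = "script|style|textarea|title|xmp";
  const pattern = new RegExp(
    "<!--[\\s\\S]*?-->" +                                          // comment
    "|<(" + RAW_TEXT + ")\\b" + ATTRS + ">[\\s\\S]*?</\\1\\s*>" + // raw-text element (group 1)
    "|</?[a-z]" + ATTRS + ">" +                                    // ordinary tag
    "|<!\\[CDATA\\[([\\s\\S]*?)\\]\\]>",                           // CDATA section (group 2)
    "gi" // "i" stands in for the [sS][cC][rR]... character classes
  );
  return html.replace(pattern, (token, rawTag, cdata) =>
    cdata === undefined
      ? token // not a CDATA section: keep as-is
      : cdata.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;")
  );
}

console.log(cdataSectionsToText("<b>foo</b><![CDATA[<i>bar</i>]]>"));
// <b>foo</b>&lt;i&gt;bar&lt;/i&gt;
console.log(cdataSectionsToText(
  "<script>/*<![CDATA[*/foo=bar<baz&//]]></script><![CDATA[fish: <><]]>"));
// <script>/*<![CDATA[*/foo=bar<baz&//]]></script>fish: &lt;&gt;&lt;
```

The back-reference (\1) makes a single alternative cover all five raw-text tags. With the "i" flag, it also matches a closing tag written in different case from the opening one.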
带上头具痛哭 2024-12-07 01:29:52

You could try to use innerText instead of innerHTML.

魔法少女 2024-12-07 01:29:52

I would just strip the CDATA tags using a regular expression like so:

document.getElementById('test').innerHTML = '<![CDATA[foo]]> bar'.replace(/<!\[CDATA\[([\s\S]*?)\]\]>/g, "$1")

Which results in 'test' having:

foo bar

That way the content of the CDATA sections is preserved without one having to worry about any of it becoming commented out. Unfortunately, this may break whatever required your documents to use CDATA sections to begin with.
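One caveat with a pattern built on ".*": it is greedy, and "." does not match line breaks. As a result, two CDATA sections on one line are merged into a single match, and a section that spans several lines is not matched at all. A lazy "[\s\S]*?" avoids both problems. The two arrow functions below are hypothetical names used only for this comparison.

```javascript
// Greedy: ".*" runs to the LAST "]]>" on the line and cannot cross "\n".
const stripGreedy = (s) => s.replace(/<!\[CDATA\[(.*)\]\]>/g, "$1");
// Lazy: "[\s\S]*?" stops at the FIRST "]]>" and also matches newlines.
const stripLazy = (s) => s.replace(/<!\[CDATA\[([\s\S]*?)\]\]>/g, "$1");

const twoSections = "<![CDATA[a]]> and <![CDATA[b]]>";
console.log(stripGreedy(twoSections)); // a]]> and <![CDATA[b
console.log(stripLazy(twoSections));   // a and b

const multiLine = "<![CDATA[x\ny]]>";
console.log(stripGreedy(multiLine) === multiLine); // true: no match at all
console.log(stripLazy(multiLine) === "x\ny");      // true
```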

向地狱狂奔 2024-12-07 01:29:52

Convert the <, > and & characters to their entities, like this:

document.getElementById('test').innerHTML = '&lt;![CDATA[foo]]&gt; bar'

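Escaping by hand does not scale to HTML you don't control, so a small helper can do the conversion instead. This sketch uses a hypothetical name, escapeHtml, and handles only the three characters that matter in text content. It must replace "&" first. Otherwise, the "&" in an already produced "&lt;" would be escaped again, giving "&amp;lt;".

```javascript
// Hypothetical helper: escape text so that assigning it to innerHTML
// displays it literally. "&" is replaced first to avoid double-escaping.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

console.log(escapeHtml("<![CDATA[foo]]> bar")); // &lt;![CDATA[foo]]&gt; bar
console.log(escapeHtml("a & <b>"));             // a &amp; &lt;b&gt;
```

In the browser, document.getElementById('test').innerHTML = escapeHtml('<![CDATA[foo]]> bar') then shows the CDATA markup literally as text. That is the same visible result as assigning the unescaped string to textContent. Any markup inside the CDATA is displayed, not rendered.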
烟织青萝梦 2024-12-07 01:29:52

That is because CDATA converts < and > to their HTML entities (&lt; and &gt;). Try converting the entities back to < and >.

You can read more about it here.
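Converting the entities back can be sketched as follows. The sketch assumes that only &lt;, &gt; and &amp; occur; a general decoder would also have to handle numeric and other named character references. The name unescapeBasicEntities is hypothetical. "&amp;" is decoded last so that "&amp;lt;" becomes "&lt;" rather than "<".

```javascript
// Hypothetical helper that understands only &lt;, &gt; and &amp;.
// "&amp;" is decoded last so an escaped entity such as "&amp;lt;"
// turns into the literal text "&lt;" instead of "<".
function unescapeBasicEntities(text) {
  return text
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&amp;/g, "&");
}

console.log(unescapeBasicEntities("&lt;i&gt;bar&lt;/i&gt;")); // <i>bar</i>
console.log(unescapeBasicEntities("&amp;lt;"));               // &lt;
```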

不寐倦长更 2024-12-07 01:29:52

If you make your page XHTML rather than HTML (served as application/xhtml+xml, so the browser uses its XML parser), then the auto-comment "feature" for CDATA might not happen. You do need to jump through the hoops that XHTML requires, such as a DOCTYPE and whatever else.

It seems a bit arbitrary, and any application that depends on CDATA is broken IMHO, but hopefully you get it working.

怎樣才叫好 2024-12-07 01:29:52

I still encountered this problem in 2020 :-(
The slight difference from the OP's case: I needed to inject XML (not HTML) into a div.
Unfortunately, applying @Mike Samuel's answer transformed the initial <?xml ... into &lt;?xml ...
I just had to add the following clause to the regex: "|<\\?[xX][mM][lL]"+ANY+"\\?>".

The complete function for XML:

function xmlWithCDATASectionsToXmlWithout(xml) {
    var ATTRS = "(?:[^>\"\']|\"[^\"]*\"|\'[^\']*\')*",
        // names of tags with RCDATA or CDATA content.
        SCRIPT = "[sS][cC][rR][iI][pP][tT]",
        STYLE = "[sS][tT][yY][lL][eE]",
        TEXTAREA = "[tT][eE][xX][tT][aA][rR][eE][aA]",
        TITLE = "[tT][iI][tT][lL][eE]",
        XMP = "[xX][mM][pP]",
        SPECIAL_TAG_NAME = [SCRIPT, STYLE, TEXTAREA, TITLE, XMP].join("|"),
        ANY = "[\\s\\S]*?",
        AMP = /&/g,
        LT = /</g,
        GT = />/g;
    return xml.replace(new RegExp(
            // Entities and text
            "[^<]+" +
            // initial XML TAG
            "|<\\?[xX][mM][lL]"+ANY+"\\?>" +
            // Comment
            "|<!--"+ANY+"-->" +
            // Regular tag
            "|<\/?(?!"+SPECIAL_TAG_NAME+")[a-zA-Z]"+ATTRS+">" +
            // Special tags
            "|<\/?"+SCRIPT  +"\\b"+ATTRS+">"+ANY+"<\/"+SCRIPT  +"\\s*>" +
            "|<\/?"+STYLE   +"\\b"+ATTRS+">"+ANY+"<\/"+STYLE   +"\\s*>" +
            "|<\/?"+TEXTAREA+"\\b"+ATTRS+">"+ANY+"<\/"+TEXTAREA+"\\s*>" +
            "|<\/?"+TITLE   +"\\b"+ATTRS+">"+ANY+"<\/"+TITLE   +"\\s*>" +
            "|<\/?"+XMP     +"\\b"+ATTRS+">"+ANY+"<\/"+XMP     +"\\s*>" +
            // CDATA section.  Content in capturing group 1.
            "|<!\\[CDATA\\[("+ANY+")\\]\\]>" +
            // A loose less-than
            "|<", "g"
        ),
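        function (token, cdataContent) {
            // Escape the CDATA content so it is inserted as literal text;
            // "&" is escaped first so the other entities are not double-escaped.
            return "string" === typeof cdataContent
                    ? cdataContent.replace(AMP, "&amp;").replace(LT, "&lt;")
                        .replace(GT, "&gt;")
                    : token === "<"
                    ? "&lt;"  // Normalize loose less-thans.
                    : token;
        }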
    );
}
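To see why that extra clause is needed, here is a cut-down sketch of the "loose less-than" rule, run with and without the clause. Without it, the "<" of "<?xml" matches none of the other alternatives, so it falls through to the final "<" branch and is escaped. The function escapeLooseLessThans and its keepXmlDeclaration flag are hypothetical. The simplified tag pattern also ignores quoted ">" inside attributes, which the full function handles through ATTRS.

```javascript
// Hypothetical, cut-down tokenizer: keeps text, comments and simple tags,
// and escapes every other "<" as "&lt;". keepXmlDeclaration toggles the
// extra "<?xml ... ?>" clause.
function escapeLooseLessThans(markup, keepXmlDeclaration) {
  const alternatives = [
    "[^<]+",             // text
    "<!--[\\s\\S]*?-->", // comment
    "</?[a-zA-Z][^>]*>", // simple tag (quoted ">" not handled)
  ];
  if (keepXmlDeclaration) {
    alternatives.push("<\\?[xX][mM][lL][\\s\\S]*?\\?>"); // XML declaration
  }
  alternatives.push("<"); // anything else starting with "<" is a loose less-than
  return markup.replace(new RegExp(alternatives.join("|"), "g"),
    (token) => (token === "<" ? "&lt;" : token));
}

const doc = '<?xml version="1.0"?><a>1 < 2</a>';
console.log(escapeLooseLessThans(doc, false)); // &lt;?xml version="1.0"?><a>1 &lt; 2</a>
console.log(escapeLooseLessThans(doc, true));  // <?xml version="1.0"?><a>1 &lt; 2</a>
```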