
Is there a way to escape a CDATA end token in XML?

Posted 2024-07-08 17:09:36 · 348 words · 9 views · 0 comments

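I was wondering if there is any way to escape a CDATA end token (]]>) within a CDATA section in an XML document. Or, more generally, whether there is some escape sequence for use within a CDATA section (though if one exists, I guess it would probably only make sense to escape the begin or end tokens anyway).

Basically, can you have a begin or end token embedded in a CDATA section and tell the parser not to interpret it, but to treat it as just another character sequence?

If you find yourself trying to do this, you should probably refactor your XML structure or your code. But even though I've been working with XML on a daily basis for the last 3 years or so and have never had this problem, I was wondering whether it is possible. Just out of curiosity.

Edit:

Other than using HTML encoding...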

锦欢 2024-07-15 17:09:36

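You have to break your data into pieces to conceal the ]]>.

Here's the whole thing:

<![CDATA[]]]]><![CDATA[>]]>

The first <![CDATA[]]]]> has the ]]. The second <![CDATA[>]]> has the >.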

走过海棠暮 2024-07-15 17:09:36

You cannot escape a CDATA end sequence. Production rule 20 of the XML specification is quite clear:

[20]    CData      ::=      (Char* - (Char* ']]>' Char*))

EDIT: This production rule literally means "A CData section may contain anything you want BUT the sequence ']]>'. No exception."

EDIT2: The same section also reads:

Within a CDATA section, only the CDEnd string is recognized as markup, so that left angle brackets and ampersands may occur in their literal form; they need not (and cannot) be escaped using "&lt;" and "&amp;". CDATA sections cannot nest.

In other words, it's not possible to use entity references, markup, or any other form of interpreted syntax. The only parsed text inside a CDATA section is ]]>, and it terminates the section.

Hence, it is not possible to escape ]]> within a CDATA section.
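
To illustrate the quoted rule, here is a minimal sketch using Python's standard xml.etree.ElementTree. The helper name cdata_text and the sample strings are made up for this illustration. It shows that entity references are left untouched inside a CDATA section but expanded outside of one.

```python
import xml.etree.ElementTree as ET

def cdata_text(xml_source: str) -> str:
    """Parse a small document and return the root element's text content."""
    return ET.fromstring(xml_source).text

# Inside a CDATA section the references are plain characters.
inside = cdata_text("<r><![CDATA[&lt; and &amp;]]></r>")

# Outside a CDATA section the same references are expanded by the parser.
outside = cdata_text("<r>&lt; and &amp;</r>")

print(repr(inside))
print(repr(outside))
```

Since nothing inside the section is interpreted, there is no syntax left over that could serve as an escape for ]]>.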

EDIT3: The same section also reads:

2.7 CDATA Sections
[Definition: CDATA sections may occur anywhere character data may occur; they are used to escape blocks of text containing characters which would otherwise be recognized as markup. CDATA sections begin with the string "<![CDATA[" and end with the string "]]>":]

So a CDATA section may appear anywhere character data may occur, which includes multiple adjacent CDATA sections in place of a single one. That makes it possible to split the ]]> token and put its two parts in adjacent CDATA sections.

For example:

<![CDATA[Certain tokens like ]]> can be difficult and <invalid>]]>

should be written as

<![CDATA[Certain tokens like ]]]]><![CDATA[> can be difficult and <valid>]]>
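
The split can be automated. The following is only a sketch in Python, assuming the text is wrapped by simple string concatenation; the function name to_cdata is hypothetical. It replaces every ]]> with ]]]]><![CDATA[> and then checks, with the standard xml.etree.ElementTree parser, that the original text comes back unchanged.

```python
import xml.etree.ElementTree as ET

def to_cdata(text: str) -> str:
    """Wrap text in CDATA, splitting every ']]>' across two adjacent sections."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"

original = "Certain tokens like ]]> can be difficult and <valid>"
document = "<root>" + to_cdata(original) + "</root>"

# The parser concatenates adjacent CDATA sections into one text value.
recovered = ET.fromstring(document).text

print(document)
print(recovered == original)
```

The parser sees two adjacent sections, but an application reading the element's text sees a single string containing the literal ]]>.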

倾其所爱 2024-07-15 17:09:36

Simply replace ]]> with ]]]]><![CDATA[>

小苏打饼 2024-07-15 17:09:36

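You do not escape the ]]>, but you can escape the > after ]] by inserting ]]><![CDATA[ before the >. Think of it as being like a \ in a C/Java/PHP/Perl string, but needed only before a > and after a ]].

By the way, S.Lott's answer is the same as this, just worded differently.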

旧梦荧光笔 2024-07-15 17:09:36

S. Lott's answer is right: you don't encode the end tag, you break it across multiple CDATA sections.

How to run across this problem in the real world: using an XML editor to create an XML document that will be fed into a content-management system, try to write an article about CDATA sections. Your ordinary trick of embedding code samples in a CDATA section will fail you here. You can imagine how I learned this.

But under most circumstances, you won't encounter this, and here's why: if you want to store (say) the text of an XML document as the content of an XML element, you'll probably use a DOM method, e.g.:

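XmlDocument doc = new XmlDocument();
XmlElement elm = doc.CreateElement("foo");
// InnerText is stored as character data, so markup characters are escaped on output.
elm.InnerText = "<![CDATA[Is this a problem?]]>";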

And the DOM quite reasonably escapes the < and the >, which means that you haven't inadvertently embedded a CDATA section in your document.
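
The same effect can be seen outside .NET. As a rough analogue (not the .NET API discussed above), this sketch assigns a CDATA-looking string to an element's text with Python's xml.etree.ElementTree and serializes it. The element name foo follows the snippet above; everything else is assumed for illustration.

```python
import xml.etree.ElementTree as ET

elm = ET.Element("foo")
# Assigned as ordinary text, not as markup.
elm.text = "<![CDATA[Is this a problem?]]>"

# The serializer escapes the markup characters in text content.
serialized = ET.tostring(elm, encoding="unicode")
print(serialized)

# Parsing the output again gives back the original string.
round_tripped = ET.fromstring(serialized).text
print(round_tripped == elm.text)
```

No CDATA section ends up in the output, so the ]]> problem never arises on this path.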

Oh, and this is interesting:

XmlDocument doc = new XmlDocument();

XmlElement elm = doc.CreateElement("doc");
doc.AppendChild(elm);

string data = "<![CDATA[This is an embedded CDATA section]]>";
// CreateCDataSection accepts data that contains "]]>" without complaint.
XmlCDataSection cdata = doc.CreateCDataSection(data);
elm.AppendChild(cdata);

This is probably an idiosyncrasy of the .NET DOM, but that code doesn't throw an exception. The exception gets thrown here:

Console.Write(doc.OuterXml);

I'd guess that what's happening under the hood is that the XmlDocument is using an XmlWriter to produce its output, and the XmlWriter checks for well-formedness as it writes.
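
For comparison, Python's xml.dom.minidom seems to behave in a similar way. This is a different library from the .NET DOM described above, so treat it only as a parallel illustration; the helper try_serialize is a name invented for this sketch. Creating the CDATA node succeeds, and the problem only surfaces when the document is serialized.

```python
from xml.dom import minidom

doc = minidom.Document()
elm = doc.createElement("doc")
doc.appendChild(elm)

data = "<![CDATA[This is an embedded CDATA section]]>"
# Creating and attaching the node is accepted even though data contains "]]>".
cdata = doc.createCDATASection(data)
elm.appendChild(cdata)

def try_serialize(document):
    """Return the serialized XML, or a message if serialization is refused."""
    try:
        return document.toxml()
    except ValueError as exc:
        return "serialization failed: " + str(exc)

result = try_serialize(doc)
print(result)
```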

我最亲爱的 2024-07-15 17:09:36

Here's another case in which ]]> needs to be escaped. Suppose we need to save a perfectly valid HTML document inside a CDATA block of an XML document, and the HTML source happens to have its own CDATA block. For example:

<htmlSource><![CDATA[ 
    ... html ...
    <script type="text/javascript">
        /* <![CDATA[ */
        -- some working javascript --
        /* ]]> */
    </script>
    ... html ...
]]></htmlSource>

The commented CDATA suffix needs to be changed to:

        /* ]]]]><![CDATA[> */

since an XML parser isn't going to know how to handle JavaScript comment blocks.
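
A quick way to see the difference is to feed both variants to a parser. The sketch below uses Python's xml.etree.ElementTree with a shortened, made-up script body (SCRIPT) standing in for the HTML above. The unmodified version should fail to parse because the outer section ends at the inner ]]>, while the split version parses and yields the original source.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the HTML source with its own commented CDATA block.
SCRIPT = "<script>/* <![CDATA[ */ var x = 1; /* ]]> */</script>"

def parses(xml_source: str) -> bool:
    """Report whether the string is accepted as well-formed XML."""
    try:
        ET.fromstring(xml_source)
        return True
    except ET.ParseError:
        return False

# Embedded as-is: the inner "]]>" closes the outer section too early.
naive = "<htmlSource><![CDATA[" + SCRIPT + "]]></htmlSource>"

# Embedded with the end token split across two adjacent sections.
split = (
    "<htmlSource><![CDATA["
    + SCRIPT.replace("]]>", "]]]]><![CDATA[>")
    + "]]></htmlSource>"
)

print(parses(naive))
print(parses(split))
```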

寄人书 2024-07-15 17:09:36

In PHP: '<![CDATA[' . implode(']]]]><![CDATA[>', explode(']]>', $string)) . ']]>'

内心激荡 2024-07-15 17:09:36

A cleaner way in PHP:

   function safeCData($string)
   {
      return '<![CDATA[' . str_replace(']]>', ']]]]><![CDATA[>', $string) . ']]>';
   }

Don't forget to use a multibyte-safe str_replace if required (for a $string that is not Latin-1):

   function mb_str_replace($search, $replace, $subject, &$count = 0)
   {
      if (!is_array($subject))
      {
         $searches = is_array($search) ? array_values($search) : array ($search);
         $replacements = is_array($replace) ? array_values($replace) : array ($replace);
         $replacements = array_pad($replacements, count($searches), '');
         foreach ($searches as $key => $search)
         {
            $parts = mb_split(preg_quote($search), $subject);
            $count += count($parts) - 1;
            $subject = implode($replacements[$key], $parts);
         }
      }
      else
      {
         foreach ($subject as $key => $value)
         {
            $subject[$key] = mb_str_replace($search, $replace, $value, $count);
         }
      }
      return $subject;
   }

旧故 2024-07-15 17:09:36

I'd just like to add that it also works if you break the CDATA end token ]]> between the two ] characters, like this: ]]]><![CDATA[]>

For example:

<![CDATA[Certain tokens like ]]]><![CDATA[]> can be difficult and <valid> but <unconventional>]]>

However, the more common convention is to break the ]]> before the >, as shown in the other answers here.

<![CDATA[Certain tokens like ]]]]><![CDATA[> can be difficult and <valid> and <conventional>]]>
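
Both split positions can be checked side by side. This is a small sketch with Python's xml.etree.ElementTree; the sample strings are shortened versions of the examples above, and the variable names are invented here.

```python
import xml.etree.ElementTree as ET

# Split before the ">" (the conventional position).
before_gt = "<r><![CDATA[a ]]]]><![CDATA[> b]]></r>"

# Split between the two "]" characters.
between_brackets = "<r><![CDATA[a ]]]><![CDATA[]> b]]></r>"

text_before_gt = ET.fromstring(before_gt).text
text_between = ET.fromstring(between_brackets).text

print(repr(text_before_gt))
print(repr(text_between))
print(text_before_gt == text_between)
```

Either way the three characters of ]]> never appear together inside a single section, which is all the grammar requires.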

沦落红尘 2024-07-15 17:09:36

See this structure:

<![CDATA[
   <![CDATA[
      <div>Hello World</div>
   ]]]]><![CDATA[>
]]>

For the inner CDATA tag(s) you must close with ]]]]><![CDATA[> instead of ]]>. Simple as that.
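
To see what a parser makes of this structure, here is a sketch using Python's xml.dom.minidom, which keeps CDATA sections as separate nodes. The wrapping root element is added only so that the fragment forms a complete document. The inner "section" is not a section at all: it is literal text carried by two adjacent real sections.

```python
from xml.dom import minidom

source = (
    "<root><![CDATA[\n"
    "   <![CDATA[\n"
    "      <div>Hello World</div>\n"
    "   ]]]]><![CDATA[>\n"
    "]]></root>"
)

root = minidom.parseString(source).documentElement

# Collect the data of every CDATA section node under the root element.
sections = [
    node.data
    for node in root.childNodes
    if node.nodeType == node.CDATA_SECTION_NODE
]
text = "".join(sections)

print(len(sections))
print(text)
```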
