Is there a way to escape a CDATA end token in XML?

Posted 2024-07-08 17:09:36

I was wondering if there is any way to escape a CDATA end token (]]>) within a CDATA section in an XML document. Or, more generally, whether there is some escape sequence for use within a CDATA section (if one exists, I guess it would only make sense for escaping the begin or end tokens anyway).

Basically, can you embed a begin or end token in a CDATA section and tell the parser not to interpret it, but to treat it as just another character sequence?

If you find yourself trying to do that, you should probably just refactor your XML structure or your code. But even though I've been working with XML on a daily basis for the last 3 years or so and have never had this problem, I was wondering whether it is possible. Just out of curiosity.

Edit:

Other than using HTML encoding...

Comments (10)

锦欢 2024-07-15 17:09:36

You have to break your data into pieces to conceal the ]]>.

Here's the whole thing:

<![CDATA[]]]]><![CDATA[>]]>

The first <![CDATA[]]]]> has the ]]. The second <![CDATA[>]]> has the >.
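
To check the claim, here is a minimal sketch using Python's standard xml.etree.ElementTree parser. The wrapper element name r is a hypothetical choice made only so that the two sections have a parent; it is not part of the answer above.

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper element <r>; its only content is the two adjacent
# CDATA sections from the answer above.
doc = "<r><![CDATA[]]]]><![CDATA[>]]></r>"

# The parser hands back the character data of both sections joined together.
root = ET.fromstring(doc)
print(root.text)
```

The parser reports the two sections as one run of character data, so the application sees the three characters of the end token as ordinary text.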

走过海棠暮 2024-07-15 17:09:36

You cannot escape a CDATA end sequence. Production rule 20 of the XML specification is quite clear:

[20]    CData      ::=      (Char* - (Char* ']]>' Char*))

EDIT: This production rule literally means "A CData section may contain anything you want BUT the sequence ']]>'. No exception.".

EDIT2: The same section also reads:

Within a CDATA section, only the CDEnd string is recognized as markup, so that left angle brackets and ampersands may occur in their literal form; they need not (and cannot) be escaped using "&lt;" and "&amp;". CDATA sections cannot nest.

In other words, it's not possible to use entity references, markup or any other form of interpreted syntax. The only parsed text inside a CDATA section is ]]>, and it terminates the section.

Hence, it is not possible to escape ]]> within a CDATA section.

EDIT3: The same section also reads:

2.7 CDATA Sections

[Definition: CDATA sections may occur anywhere character data may occur; they are used to escape blocks of text containing characters which would otherwise be recognized as markup. CDATA sections begin with the string "<![CDATA[" and end with the string "]]>":]

So a CDATA section may appear anywhere character data may occur, which includes several adjacent CDATA sections in place of a single one. That makes it possible to split the ]]> token and put its two parts in adjacent CDATA sections.

For example:

<![CDATA[Certain tokens like ]]> can be difficult and <invalid>]]> 

should be written as

<![CDATA[Certain tokens like ]]]]><![CDATA[> can be difficult and <valid>]]> 
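
As a rough check of the two forms above, the following sketch feeds both to Python's xml.etree.ElementTree. The wrapper element r and the variable names are assumptions added for illustration.

```python
import xml.etree.ElementTree as ET

# Both strings reuse the example text above inside a hypothetical <r> element.
bad = "<r><![CDATA[Certain tokens like ]]> can be difficult and <invalid>]]></r>"
good = "<r><![CDATA[Certain tokens like ]]]]><![CDATA[> can be difficult and <valid>]]></r>"

# The unsplit form ends the CDATA section early, so the rest is parsed as markup.
try:
    ET.fromstring(bad)
    bad_is_well_formed = True
except ET.ParseError:
    bad_is_well_formed = False

# The split form parses, and the adjacent sections are joined into one string.
text = ET.fromstring(good).text
print(bad_is_well_formed)
print(text)
```
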
倾其所爱 2024-07-15 17:09:36

Simply replace ]]> with ]]]]><![CDATA[>.
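
A minimal sketch of that replacement in Python, assuming the text contains only characters that are legal in XML. The helper name wrap_cdata, the sample payload and the wrapper element r are hypothetical.

```python
import xml.etree.ElementTree as ET


def wrap_cdata(text: str) -> str:
    """Wrap text in CDATA, splitting every ']]>' across two sections."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


# Sample payload with two end tokens, one of them preceded by an extra ']'.
payload = "a ]]> b ]]]> c"
wrapped = wrap_cdata(payload)
print(wrapped)

# Parse the result back to confirm the original text survives unchanged.
round_trip = ET.fromstring("<r>" + wrapped + "</r>").text
print(round_trip == payload)
```

Parsing the result back is a cheap way to confirm that nothing was lost or altered by the split.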

小苏打饼 2024-07-15 17:09:36

You do not escape the ]]> as a whole; you escape the > after ]] by inserting ]]><![CDATA[ before the >. Think of it like a \ in a C/Java/PHP/Perl string, except that it is needed only before a > that follows ]].

BTW, S.Lott's answer is the same as this, just worded differently.

旧梦荧光笔 2024-07-15 17:09:36

S. Lott's answer is right: you don't encode the end tag, you break it across multiple CDATA sections.

How to run across this problem in the real world: using an XML editor to create an XML document that will be fed into a content-management system, try to write an article about CDATA sections. Your ordinary trick of embedding code samples in a CDATA section will fail you here. You can imagine how I learned this.

But under most circumstances, you won't encounter this, and here's why: if you want to store (say) the text of an XML document as the content of an XML element, you'll probably use a DOM method, e.g.:

XmlElement elm = doc.CreateElement("foo");
elm.InnerText = "<![CDATA[Is this a problem?]]>";

And the DOM quite reasonably escapes the < and the >, which means that you haven't inadvertently embedded a CDATA section in your document.
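
The same behaviour can be observed outside .NET. The sketch below uses Python's xml.etree.ElementTree rather than the .NET DOM from the answer, so it is an analogy under that assumption, not a reproduction of XmlElement.InnerText.

```python
import xml.etree.ElementTree as ET

# Assign text that looks like a CDATA section to an ordinary element.
elm = ET.Element("foo")
elm.text = "<![CDATA[Is this a problem?]]>"

# On serialization the angle brackets are written as entity references,
# so no CDATA section appears in the output.
serialized = ET.tostring(elm, encoding="unicode")
print(serialized)

# Parsing the output again gives back the original string.
reparsed = ET.fromstring(serialized).text
print(reparsed == elm.text)
```
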

Oh, and this is interesting:

XmlDocument doc = new XmlDocument();

XmlElement elm = doc.CreateElement("doc");
doc.AppendChild(elm);

string data = "<![CDATA[This is an embedded CDATA section]]>";
XmlCDataSection cdata = doc.CreateCDataSection(data);
elm.AppendChild(cdata);

This is probably an idiosyncrasy of the .NET DOM, but that doesn't throw an exception. The exception gets thrown here:

Console.Write(doc.OuterXml);

I'd guess that what's happening under the hood is that the XmlDocument is using an XmlWriter to produce its output, and the XmlWriter checks for well-formedness as it writes.
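
Python's xml.dom.minidom appears to behave in a comparable way, which suggests the pattern is not unique to .NET. This is a sketch against minidom only; it says nothing about how XmlDocument is implemented.

```python
from xml.dom.minidom import Document

doc = Document()
elm = doc.createElement("doc")
doc.appendChild(elm)

# Creating and attaching the node succeeds even though the data holds ']]>'.
data = "<![CDATA[This is an embedded CDATA section]]>"
cdata = doc.createCDATASection(data)
elm.appendChild(cdata)

# The problem only surfaces when the document is serialized.
try:
    doc.toxml()
    error = None
except ValueError as exc:
    error = str(exc)
print(error)
```
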

我最亲爱的 2024-07-15 17:09:36

Here's another case in which ]]> needs to be escaped. Suppose we need to save a perfectly valid HTML document inside a CDATA block of an XML document, and the HTML source happens to have its own CDATA block. For example:

<htmlSource><![CDATA[ 
    ... html ...
    <script type="text/javascript">
        /* <![CDATA[ */
        -- some working javascript --
        /* ]]> */
    </script>
    ... html ...
]]></htmlSource>

The commented CDATA suffix needs to be changed to:

        /* ]]]]><![CDATA[> */

since an XML parser isn't going to know how to handle JavaScript comment blocks.
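
A sketch of that scenario in Python, assuming the HTML is available as a string. The script body is a stand-in for the "working javascript" placeholder above, and the variable names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Stand-in HTML fragment whose script carries its own commented CDATA markers.
html = """<script type="text/javascript">
    /* <![CDATA[ */
    var x = 1;
    /* ]]> */
</script>"""

# Split the inner end token, then wrap everything in the outer CDATA section.
escaped = html.replace("]]>", "]]]]><![CDATA[>")
doc = "<htmlSource><![CDATA[" + escaped + "]]></htmlSource>"

# The consumer of the XML gets the HTML back exactly as it was stored.
recovered = ET.fromstring(doc).text
print(recovered == html)
```

Note that the inner <![CDATA[ start marker needs no treatment at all; only the end token has to be split.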

寄人书 2024-07-15 17:09:36

In PHP: '<![CDATA['.implode(']]]]><![CDATA[>', explode(']]>', $string)).']]>'

内心激荡 2024-07-15 17:09:36

A cleaner way in PHP:

   function safeCData($string)
   {
      return '<![CDATA[' . str_replace(']]>', ']]]]><![CDATA[>', $string) . ']]>';
   }

Don't forget to use a multibyte-safe str_replace if required (non-Latin-1 $string):

   function mb_str_replace($search, $replace, $subject, &$count = 0)
   {
      if (!is_array($subject))
      {
         $searches = is_array($search) ? array_values($search) : array ($search);
         $replacements = is_array($replace) ? array_values($replace) : array ($replace);
         $replacements = array_pad($replacements, count($searches), '');
         foreach ($searches as $key => $search)
         {
            $parts = mb_split(preg_quote($search), $subject);
            $count += count($parts) - 1;
            $subject = implode($replacements[$key], $parts);
         }
      }
      else
      {
         foreach ($subject as $key => $value)
         {
            $subject[$key] = mb_str_replace($search, $replace, $value, $count);
         }
      }
      return $subject;
   }
旧故 2024-07-15 17:09:36

I'd just like to add that it also works if you break the CDATA end tag ]]> between the ]], like this: ] ]]><![CDATA[ ]>

For example:

<![CDATA[Certain tokens like ]]]><![CDATA[]> can be difficult and <valid> but <unconventional>]]> 

However, the generally accepted convention is to break the ]]> before the >, as shown in the other answers here.

<![CDATA[Certain tokens like ]]]]><![CDATA[> can be difficult and <valid> and <conventional>]]> 
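
Both split positions should yield the same character data. A short sketch with Python's xml.etree.ElementTree, using shortened sample text and a hypothetical wrapper element r:

```python
import xml.etree.ElementTree as ET

# Split between the two ']' characters versus split before the '>'.
unconventional = "<r><![CDATA[tokens like ]]]><![CDATA[]> here]]></r>"
conventional = "<r><![CDATA[tokens like ]]]]><![CDATA[> here]]></r>"

a = ET.fromstring(unconventional).text
b = ET.fromstring(conventional).text
print(a == b)
print(a)
```
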
沦落红尘 2024-07-15 17:09:36

See this structure:

<![CDATA[
   <![CDATA[
      <div>Hello World</div>
   ]]]]><![CDATA[>
]]>

For the inner CDATA section(s) you must close with ]]]]><![CDATA[> instead of ]]>. Simple as that.
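
To see what a parser makes of the structure above, here is a sketch with Python's xml.etree.ElementTree. The wrapper element r is an assumption added so the fragment forms a complete document.

```python
import xml.etree.ElementTree as ET

# The structure above, placed inside a hypothetical <r> element.
doc = """<r><![CDATA[
   <![CDATA[
      <div>Hello World</div>
   ]]]]><![CDATA[>
]]></r>"""

# The inner start marker and the reassembled end token both arrive as text.
text = ET.fromstring(doc).text
print(text)
```

The inner section is never parsed as a section; its start marker and its end token both reach the application as literal text.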
