Is there a way to escape a CDATA end token in XML?
I was wondering if there is any way to escape a CDATA end token (`]]>`) within a CDATA section in an XML document. Or, more generally, whether there is some escape sequence for use within a CDATA section (though if one exists, I guess it would probably only make sense to escape begin or end tokens anyway).
Basically, can you have a begin or end token embedded in a CDATA section and tell the parser not to interpret it, but to treat it as just another character sequence?
You should probably just refactor your XML structure or your code if you find yourself trying to do that. Still, even though I've been working with XML on a daily basis for the last 3 years or so and have never had this problem, I was wondering if it was possible. Just out of curiosity.
Edit:
Other than using HTML encoding...
You have to break your data into pieces to conceal the `]]>`.
Here's the whole thing:
<![CDATA[]]]]><![CDATA[>]]>
The first `<![CDATA[]]]]>` has the `]]`. The second `<![CDATA[>]]>` has the `>`.
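One way to convince yourself that the split works is to feed it to a parser. This is a minimal sketch using Python's standard `xml.etree.ElementTree`, which is my choice for illustration and not something the answer mentions; the wrapper element name `r` is a placeholder.

```python
import xml.etree.ElementTree as ET

# The two adjacent CDATA sections from the answer, inside a placeholder element.
doc = "<r><![CDATA[]]]]><![CDATA[>]]></r>"

root = ET.fromstring(doc)

# Adjacent character data is joined, so the element text is the literal token.
print(root.text)
```

The first section contributes `]]`, the second contributes `>`, and the application only ever sees the joined text.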
You cannot escape a CDATA end sequence. Production rule 20 of the XML specification is quite clear:
[20] CData ::= (Char* - (Char* ']]>' Char*))
EDIT: This production rule literally means "A CDATA section may contain anything you want BUT the sequence ']]>'. No exception."
EDIT2: The same section also reads:
> Within a CDATA section, only the CDEnd string is recognized as markup, so that left angle brackets and ampersands may occur in their literal form; they need not (and cannot) be escaped using "&lt;" and "&amp;". CDATA sections cannot nest.
In other words, it's not possible to use entity references, markup or any other form of interpreted syntax. The only parsed text inside a CDATA section is `]]>`, and it terminates the section. Hence, it is not possible to escape `]]>` within a CDATA section.
EDIT3: The same section also reads:
> CDATA sections may occur anywhere character data may occur; they are used to escape blocks of text containing characters which would otherwise be recognized as markup.
So there may be a CDATA section anywhere character data may occur, including multiple adjacent CDATA sections in place of a single CDATA section. That makes it possible to split the `]]>` token and put its two parts in adjacent CDATA sections. For example, a `]]>` inside the content should be written as `]]]]><![CDATA[>`.
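The three points from the specification can be checked against a real parser. The sketch below assumes Python's `xml.etree.ElementTree` (backed by expat); the element name `r` and the sample strings are placeholders of mine, not taken from the answer.

```python
import xml.etree.ElementTree as ET

# Entity references are not expanded inside CDATA: "&lt;" stays literal text.
literal = ET.fromstring("<r><![CDATA[&lt;b&gt;]]></r>").text

# An unsplit "]]>" closes the section early and leaves a stray "]]>" behind,
# which the parser refuses.
try:
    ET.fromstring("<r><![CDATA[a ]]> b ]]></r>")
    rejected = False
except ET.ParseError:
    rejected = True

# Splitting the token across two adjacent sections parses cleanly.
split_text = ET.fromstring("<r><![CDATA[a ]]]]><![CDATA[> b ]]></r>").text

print(literal, rejected, repr(split_text))
```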
Simply replace `]]>` with `]]]]><![CDATA[>`.
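As a rough sketch of that replacement in code (Python here purely for illustration; `to_cdata` is a hypothetical helper name, not part of any library):

```python
def to_cdata(text: str) -> str:
    """Wrap text in a CDATA section, splitting every "]]>" across two sections."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


# The sample string contains "]]>" as part of ordinary-looking code.
wrapped = to_cdata("if (a[b[0]]>1) {}")
print(wrapped)
```

Text without the token passes through unchanged apart from the wrapper, so the helper can be applied unconditionally.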
You do not escape the `]]>`, but you escape the `>` after `]]` by inserting `]]><![CDATA[` before the `>`. Think of this just like a `\` in a C/Java/PHP/Perl string, except that it is only needed before a `>` and after a `]]`.
BTW, S.Lott's answer is the same as this, just worded differently.
S. Lott's answer is right: you don't encode the end tag, you break it across multiple CDATA sections.
How to run across this problem in the real world: using an XML editor to create an XML document that will be fed into a content-management system, try to write an article about CDATA sections. Your ordinary trick of embedding code samples in a CDATA section will fail you here. You can imagine how I learned this.
But under most circumstances, you won't encounter this, and here's why: if you want to store (say) the text of an XML document as the content of an XML element, you'll probably use a DOM method to set the element's text. And the DOM quite reasonably escapes the < and the >, which means that you haven't inadvertently embedded a CDATA section in your document.
Oh, and this is interesting: creating a CDATA section whose text contains `]]>` through the .NET DOM doesn't throw an exception, which is probably an idiosyncrasy of the .NET DOM. The exception gets thrown later, when the document is written out.
I'd guess that what's happening under the hood is that the XmlDocument is using an XmlWriter to produce its output, and the XmlWriter checks for well-formedness as it writes.
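The answer is about the .NET DOM, which I have not reproduced here. As a loosely analogous sketch, Python's standard library shows the same two behaviours in the CPython versions I am aware of: element text is escaped on output, and `xml.dom.minidom` accepts a `]]>` when the CDATA node is created but raises when serializing. The element names and sample text are placeholders.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

# 1. Setting element text through a tree API escapes the markup characters,
#    so no CDATA section is embedded by accident.
elm = ET.Element("foo")
elm.text = "<![CDATA[Is this a problem?]]>"
escaped = ET.tostring(elm, encoding="unicode")
print(escaped)

# 2. minidom accepts "]]>" when the CDATA node is created ...
doc = minidom.Document()
root = doc.createElement("doc")
doc.appendChild(root)
root.appendChild(doc.createCDATASection("Is this a problem? ]]>"))

# ... and only complains when the document is serialized.
try:
    doc.toxml()
    raised_on_write = False
except ValueError:
    raised_on_write = True
print(raised_on_write)
```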
Here's another case in which `]]>` needs to be escaped. Suppose we need to save a perfectly valid HTML document inside a CDATA block of an XML document, and the HTML source happens to have its own CDATA block, for example a script whose body sits between commented-out CDATA markers. The `]]>` in the commented CDATA suffix needs to be changed to `]]]]><![CDATA[>`, since an XML parser isn't going to know how to handle JavaScript comment blocks.
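A small round trip illustrates this case. The HTML below is a made-up sample with a script wrapped in commented CDATA markers, `page` is a placeholder element name, and Python's `xml.etree.ElementTree` stands in for whatever parser consumes the XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical HTML payload whose script carries its own commented CDATA block.
html = """<html><head><script type="text/javascript">
/* <![CDATA[ */
var a = 1;
/* ]]> */
</script></head><body></body></html>"""

# Split the inner "]]>" so it cannot close the outer section.
payload = html.replace("]]>", "]]]]><![CDATA[>")
xml_doc = "<page><![CDATA[" + payload + "]]></page>"

# The parser hands back the HTML exactly as it was written.
recovered = ET.fromstring(xml_doc).text
print(recovered == html)
```

The inner `<![CDATA[` needs no treatment, because inside a CDATA section only the end token is recognized.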
In PHP (with the separator passed to implode first, since the legacy argument order is no longer accepted in PHP 8):
'<![CDATA['.implode(']]]]><![CDATA[>', explode(']]>', $string)).']]>'
A cleaner way in PHP is a single str_replace call over `$string`.
Don't forget to use a multibyte-safe str_replace if required (non latin1 `$string`).
I'd just like to add that it also works if you break the CDATA end tag `]]>` between the two `]` characters, like this: `]]]><![CDATA[]>` (that is, `]`, then `]]><![CDATA[`, then `]>`).
However, it is the globally accepted convention to break the `]]>` before the `>`, as shown in the other answers here.
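Both split points can be compared with a parser. This sketch again assumes Python's `xml.etree.ElementTree`; the element name `r` and the surrounding `x` and `y` are placeholders.

```python
import xml.etree.ElementTree as ET

# Conventional split: "]]" ends the first section, ">" starts the second.
before_gt = ET.fromstring("<r><![CDATA[x]]]]><![CDATA[>y]]></r>").text

# Alternative split: "]" ends the first section, "]>" starts the second.
between_brackets = ET.fromstring("<r><![CDATA[x]]]><![CDATA[]>y]]></r>").text

print(before_gt == between_brackets)
```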
When one CDATA section has to sit inside another, the inner CDATA tag(s) must be closed with `]]]]><![CDATA[>` instead of `]]>`. Simple as that.