What characters need to be escaped in XML documents?
What characters must be escaped in XML documents, or where could I find such a list?
Comments (10)
If you use an appropriate class or library, they will do the escaping for you. Many XML issues are caused by string concatenation.
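As a minimal sketch of what "let the library do it" can look like, here is one way to do it with Python's standard library. The sample string and the element and attribute names (`note`, `title`) are made up for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

raw = 'Tom & Jerry <say> "hi"'

# escape() replaces &, < and > for use in text content.
escaped_text = escape(raw)

# quoteattr() escapes the value and wraps it in suitable quotes.
quoted_attr = quoteattr(raw)

# ElementTree escapes text and attribute values when it serializes,
# so no manual string concatenation is needed.
elem = ET.Element("note", {"title": raw})
elem.text = raw
serialized = ET.tostring(elem, encoding="unicode")

# Parsing the serialized form gives back the original strings.
parsed = ET.fromstring(serialized)
print(escaped_text)
print(parsed.text == raw, parsed.get("title") == raw)
```

Building the tree and serializing it is usually the safer of the two, since the escaping then cannot be forgotten for a single value.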
XML escape characters
There are only five:
"   &quot;
'   &apos;
<   &lt;
>   &gt;
&   &amp;
Which characters need escaping depends on where the special character is used.
The examples can be validated at the W3C Markup Validation Service.
Text
The safe way is to escape all five characters in text. However, the three characters ", ', and > needn't be escaped in text.
Attributes
The safe way is to escape all five characters in attributes. However, the > character needn't be escaped in attributes.
The ' character needn't be escaped in attributes if the quotes are ".
Likewise, the " character needn't be escaped in attributes if the quotes are '.
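The text and attribute rules above can be checked with any conforming parser. The sketch below uses Python's `xml.etree.ElementTree`. The `valid` element and `attribute` name are placeholders, and `is_well_formed` is a helper introduced only for this example.

```python
import xml.etree.ElementTree as ET

# In text, ", ' and > may appear literally.
text_value = ET.fromstring("<valid>\"'></valid>").text

# In attributes, > may appear literally, and a quote character is fine
# as long as it differs from the delimiter around the value.
gt_attr = ET.fromstring('<valid attribute=">"/>').get("attribute")
apos_attr = ET.fromstring("<valid attribute=\"'\"/>").get("attribute")
quot_attr = ET.fromstring("<valid attribute='\"'/>").get("attribute")


def is_well_formed(doc: str) -> bool:
    """Return True if the parser accepts the document."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


# A literal < or & is rejected in both text and attribute values.
bad_docs = [
    "<valid><</valid>",
    "<valid>&</valid>",
    '<valid attribute="<"/>',
    '<valid attribute="&"/>',
]
bad_results = [is_well_formed(doc) for doc in bad_docs]
```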
Comments
None of the five special characters should be escaped in comments. Entity references are not expanded there, so an escape would show up literally.
CDATA
None of the five special characters should be escaped in CDATA sections, for the same reason.
Processing instructions
None of the five special characters should be escaped in XML processing instructions.
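To see why escaping is the wrong thing to do in these three places, here is a small sketch using Python's standard library. The document strings and the `process` target are invented for the example. It shows that the special characters pass through untouched and that an `&lt;` written inside a comment or CDATA section stays as the literal four characters.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

# CDATA: the five characters are taken literally, and &lt; is not expanded.
cdata_text = ET.fromstring("<valid><![CDATA[\"'<>&]]></valid>").text
cdata_entity = ET.fromstring("<valid><![CDATA[&lt;]]></valid>").text

# Comment and processing instruction: minidom keeps both as nodes,
# so their raw content can be inspected.
doc = minidom.parseString(
    "<valid><!-- \"'<>& &lt; --><?process <\"'&> ?></valid>"
)
comment, pi = doc.documentElement.childNodes

print(repr(cdata_text), repr(cdata_entity))
print(repr(comment.data), pi.target, repr(pi.data))
```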
XML vs. HTML
HTML has its own set of escape codes which cover a lot more characters.
New, simplified answer to an old, commonly asked question...
Simplified XML Escaping (prioritized, 100% complete)
Always (90% important to remember)
Escape < as &lt; unless < is starting a <tag/> or other markup.
Escape & as &amp; unless & is starting an &entity;.
Attribute Values (9% important to remember)
attr=" 'Single quotes' are ok within double quotes."
attr=' "Double quotes" are ok within single quotes.'
Escape " as &quot; and ' as &apos; otherwise.
Comments, CDATA, and Processing Instructions (0.9% important to remember)
<!-- Within comments --> nothing has to be escaped, but no -- strings are allowed.
<![CDATA[ Within CDATA ]]> nothing has to be escaped, but no ]]> strings are allowed.
<?PITarget Within PIs ?> nothing has to be escaped, but no ?> strings are allowed.
Esoterica (0.1% important to remember)
Escape ]]> as ]]&gt; unless ]]> is ending a CDATA section. (This rule applies to character data in general, even outside a CDATA section.)
Perhaps this will help:
List of XML and HTML character entity references:
That article lists the following five predefined XML entities: &quot; ("), &amp; (&), &apos; ('), &lt; (<), and &gt; (>).
According to the specifications of the World Wide Web Consortium (W3C), there are 5 characters that must not appear in their literal form in an XML document, except when used as markup delimiters or within a comment, a processing instruction, or a CDATA section. In all the other cases, these characters must be replaced either using the corresponding entity or the numeric reference according to the following table:
Original character    XML entity replacement    XML numeric replacement
<                     &lt;                      &#60;
>                     &gt;                      &#62;
"                     &quot;                    &#34;
&                     &amp;                     &#38;
'                     &apos;                    &#39;
Notice that the aforementioned entities can also be used in HTML, with the exception of &apos;, which was introduced with XHTML 1.0 and is not declared in HTML 4. For this reason, and to ensure backward compatibility, the XHTML specification recommends the use of &#39; instead.
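The numeric column is just the decimal code point of each character, which a short Python sketch can confirm. The `PREDEFINED` mapping and the `references` helper are names invented for this example. Each named and numeric reference is parsed back to check that both decode to the same character.

```python
import xml.etree.ElementTree as ET

# The five predefined entity names, keyed by the character they stand for.
PREDEFINED = {"<": "lt", ">": "gt", '"': "quot", "&": "amp", "'": "apos"}


def references(char: str) -> tuple:
    """Return the (named, numeric) references for a predefined character."""
    return f"&{PREDEFINED[char]};", f"&#{ord(char)};"


decoded = {}
for char in PREDEFINED:
    named, numeric = references(char)
    # Named form goes in an attribute, numeric form in the text.
    elem = ET.fromstring(f"<c a='{named}'>{numeric}</c>")
    decoded[char] = (elem.get("a"), elem.text)
    print(char, named, numeric)
```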
Escaping characters is different for tags and attributes.
For tags:
For attributes:
From Character Data and Markup:
In addition to the commonly known five characters [<, >, &, ", and '], I would also escape the vertical tab character (0x0B). It is valid UTF-8, but not valid XML 1.0, and even many libraries (including the highly portable (ANSI C) library libxml2) miss it and silently output invalid XML.
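One caveat worth checking: as far as the XML 1.0 character rules go, the vertical tab is not allowed even as a numeric character reference, so "escaping" it in practice means removing or replacing it. A quick sketch with Python's expat-based parser shows this (the `parses` helper is only for this example):

```python
import xml.etree.ElementTree as ET


def parses(doc: str) -> bool:
    """Return True if the parser accepts the document."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


literal_vt = parses("<a>\x0b</a>")       # raw vertical tab
referenced_vt = parses("<a>&#x0B;</a>")  # vertical tab as a character reference
tab_ok = parses("<a>\t&#x9;</a>")        # ordinary tab, both forms

print(literal_vt, referenced_vt, tab_ok)
```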
Abridged from: XML, Escaping
There are five predefined entities: &lt; (<), &gt; (>), &amp; (&), &apos; ('), and &quot; (").
"All permitted Unicode characters may be represented with a numeric character reference." For example:
Most of the control characters and other Unicode ranges are specifically excluded, meaning (I think) they can't occur either escaped or direct:
Valid characters in XML
The accepted answer is not correct. The best approach is to use a library for escaping XML.
As mentioned in this other question
"Basically, the control characters and characters out of the Unicode ranges are not allowed. This means also that calling for example the character entity is forbidden."
If you only escape the five characters, you can run into problems such as "An invalid XML character (Unicode: 0xc) was found".
It depends on the context. For the content, it is < and &, and ]]> (a three-character string rather than a single character).
For attribute values, it is <, &, ", and '.
For CDATA, it is ]]>.
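Since ]]> cannot be escaped inside a CDATA section, a common workaround is to end the section in the middle of the sequence and open a new one. Here is a minimal sketch in Python. The function name `to_cdata` is invented for the example, and it assumes the input has no characters that XML 1.0 forbids.

```python
import xml.etree.ElementTree as ET


def to_cdata(value: str) -> str:
    """Wrap a string in CDATA, splitting any ]]> across two sections."""
    # "]]>" becomes "]]" + end of section + start of section + ">".
    return "<![CDATA[" + value.replace("]]>", "]]]]><![CDATA[>") + "]]>"


wrapped = to_cdata("a]]>b <&")
roundtrip = ET.fromstring("<t>" + wrapped + "</t>").text
print(wrapped)
print(roundtrip)
```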
Only < and & are required to be escaped if they are to be treated as character data and not markup. See section 2.4, Character Data and Markup, of the XML specification.