What characters do I need to escape in XML documents?

Posted on 2025-01-16 19:57:00

What characters must be escaped in XML documents, or where could I find such a list?

Comments (10)

傲鸠 2025-01-23 19:57:00

If you use an appropriate class or library, they will do the escaping for you. Many XML issues are caused by string concatenation.
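As a minimal sketch of letting a library do the escaping, the snippet below uses Python's standard `xml.etree.ElementTree`. The element name `note`, the attribute name `title`, and the helper `build_note` are made up for illustration.

```python
import xml.etree.ElementTree as ET


def build_note(text: str, title: str) -> str:
    """Serialize a <note> element; ElementTree escapes the text and the attribute."""
    note = ET.Element("note", {"title": title})  # hypothetical names
    note.text = text
    return ET.tostring(note, encoding="unicode")


raw_text = 'a < b && "c"'
raw_title = 'Tom & "Jerry"'
xml_string = build_note(raw_text, raw_title)
print(xml_string)

# Parsing the serialized string gives back the original values unchanged.
parsed = ET.fromstring(xml_string)
assert parsed.text == raw_text
assert parsed.get("title") == raw_title
```

Because the strings are never spliced into markup by hand, there is no place to forget an escape.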

XML escape characters

There are only five:

"   &quot;
'   &apos;
<   &lt;
>   &gt;
&   &amp;

Escaping characters depends on where the special character is used.

The examples can be validated at the W3C Markup Validation Service.

Text

The safe way is to escape all five characters in text. However, the three characters ", ' and > needn't be escaped in text:

<?xml version="1.0"?>
<valid>"'></valid>

Attributes

The safe way is to escape all five characters in attributes. However, the > character needn't be escaped in attributes:

<?xml version="1.0"?>
<valid attribute=">"/>

The ' character needn't be escaped in attributes if the quotes are ":

<?xml version="1.0"?>
<valid attribute="'"/>

Likewise, the " needn't be escaped in attributes if the quotes are ':

<?xml version="1.0"?>
<valid attribute='"'/>
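The quoting rules above are what `xml.sax.saxutils.quoteattr` from the Python standard library applies when it picks a quote style. A small sketch, where the helper `make_element` is hypothetical:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr


def make_element(value: str) -> str:
    """Build <valid attribute=.../>; quoteattr chooses the quotes and escapes the value."""
    return f"<valid attribute={quoteattr(value)}/>"


samples = ["'", '"', "both ' and \"", "a < b & c > d"]
for value in samples:
    document = make_element(value)
    print(document)
    # The parser recovers exactly the value we started with.
    assert ET.fromstring(document).get("attribute") == value
```

When the value contains only one kind of quote, `quoteattr` wraps it in the other kind, so no quote needs escaping. When it contains both, it falls back to double quotes and writes `&quot;`.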

Comments

None of the five special characters should be escaped in comments, because entity references are not expanded there:

<?xml version="1.0"?>
<valid>
<!-- "'<>& -->
</valid>

CDATA

None of the five special characters should be escaped in CDATA sections, because entity references are not expanded there:

<?xml version="1.0"?>
<valid>
<![CDATA["'<>&]]>
</valid>

Processing instructions

None of the five special characters should be escaped in XML processing instructions, because entity references are not expanded there:

<?xml version="1.0"?>
<?process <"'&> ?>
<valid/>

XML vs. HTML

HTML has its own set of escape codes which cover a lot more characters.

走走停停 2025-01-23 19:57:00

New, simplified answer to an old, commonly asked question...

Simplified XML Escaping (prioritized, 100% complete)

  1. Always (90% important to remember)

    • Escape < as &lt; unless < is starting a <tag/> or other markup.
    • Escape & as &amp; unless & is starting an &entity;.
  2. Attribute Values (9% important to remember)

    • attr=" 'Single quotes' are ok within double quotes."
    • attr=' "Double quotes" are ok within single quotes.'
    • Escape " as &quot; and ' as &apos; otherwise.
  3. Comments, CDATA, and Processing Instructions (0.9% important to remember)

    • <!-- Within comments --> nothing has to be escaped but no -- strings are allowed.
    • <![CDATA[ Within CDATA ]]> nothing has to be escaped, but no ]]> strings are allowed.
    • <?PITarget Within PIs ?> nothing has to be escaped, but no ?> strings are allowed.
  4. Esoterica (0.1% important to remember)

    • Escape ]]> as ]]&gt; unless ]]> is ending a CDATA section.
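The forbidden strings in item 3 are the one thing to check when writing comments, CDATA sections, or PIs by hand. The sketch below is one way to handle them in Python. The helper names are hypothetical, and splitting a CDATA section in two around `]]>` is a common workaround rather than something the answer above prescribes.

```python
import xml.etree.ElementTree as ET


def to_cdata(text: str) -> str:
    """Wrap text in CDATA, splitting every ]]> across two adjacent sections."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


def comment_is_safe(text: str) -> bool:
    """A comment body may not contain -- and may not end with a hyphen."""
    return "--" not in text and not text.endswith("-")


def pi_is_safe(text: str) -> bool:
    """A processing-instruction body may not contain ?>."""
    return "?>" not in text


payload = "if (a[b[0]]>c && d<e) { print(\"'\") }"
document = "<valid>" + to_cdata(payload) + "</valid>"
print(document)
assert ET.fromstring(document).text == payload
```

The parser joins the adjacent CDATA sections back into one text node, so the reader sees the original string.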

幸福%小乖 2025-01-23 19:57:00

Perhaps this will help:

List of XML and HTML character entity references:

In SGML, HTML and XML documents, the
logical constructs known as character
data and attribute values consist of
sequences of characters, in which each
character can manifest directly
(representing itself), or can be
represented by a series of characters
called a character reference, of which
there are two types: a numeric
character reference and a character
entity reference. This article lists
the character entity references that
are valid in HTML and XML documents.

That article lists the following five predefined XML entities:

quot  "
amp   &
apos  '
lt    <
gt    >
忆沫 2025-01-23 19:57:00

According to the specifications of the World Wide Web Consortium (W3C), there are 5 characters that must not appear in their literal form in an XML document, except when used as markup delimiters or within a comment, a processing instruction, or a CDATA section. In all other cases, these characters must be replaced with either the corresponding entity or the numeric reference, according to the following table:

Original character    XML entity replacement    XML numeric replacement
<                     &lt;                      &#60;
>                     &gt;                      &#62;
"                     &quot;                    &#34;
&                     &amp;                     &#38;
'                     &apos;                    &#39;

Notice that these entities can also be used in HTML, with the exception of &apos;, which was introduced with XHTML 1.0 and is not declared in HTML 4. For this reason, and to ensure backward compatibility, the XHTML specification recommends the use of &#39; instead.
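A sketch of that replacement in Python, assuming you want named entities for <, > and & but numeric references for the two quote characters so the output is also safe in HTML 4. The function name `escape_all_five` is made up. `xml.sax.saxutils.escape` handles &, < and > itself and takes the extra mappings as a dictionary.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Numeric references for the quotes, following the table above.
QUOTE_REFERENCES = {'"': "&#34;", "'": "&#39;"}


def escape_all_five(text: str) -> str:
    """Escape &, <, > by name and the two quote characters by number."""
    return escape(text, QUOTE_REFERENCES)


original = "<a href='x'>Tom & \"Jerry\"</a>"
escaped = escape_all_five(original)
print(escaped)

# An XML parser turns the escaped form back into the original text.
assert ET.fromstring("<v>" + escaped + "</v>").text == original
```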

十级心震 2025-01-23 19:57:00

Escaping characters is different for tags and attributes.

For tags:

 < &lt;
 > &gt; (only for compatibility, read below)
 & &amp;

For attributes:

" &quot;
' &apos;

From Character Data and Markup:

The ampersand character (&) and the left angle bracket (<) must not
appear in their literal form, except when used as markup delimiters,
or within a comment, a processing instruction, or a CDATA section. If
they are needed elsewhere, they must be escaped using either numeric
character references or the strings "&amp;" and "&lt;"
respectively. The right angle bracket (>) may be represented using the
string "&gt;", and must, for compatibility, be escaped using either
"&gt;" or a character reference when it appears in the string "]]>"
in content, when that string is not marking the end of a CDATA
section.

To allow attribute values to contain both single and double quotes,
the apostrophe or single-quote character (') may be represented as
"&apos;", and the double-quote character (") as "&quot;".

数理化全能战士 2025-01-23 19:57:00

In addition to the commonly known five characters [<, >, &, ", and '], I would also escape the vertical tab character (0x0B). It is valid UTF-8, but not valid XML 1.0, and even many libraries (including the highly portable (ANSI C) library libxml2) miss it and silently output invalid XML.
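XML 1.0 does not allow the vertical tab at all, not even as a character reference, so in practice the options are to remove or replace it before serializing. A minimal sketch of such a filter, based on the `Char` production of XML 1.0 (the function name and the default replacement are assumptions):

```python
import re

# Everything outside the XML 1.0 Char production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_INVALID_XML10 = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_invalid_xml_chars(text: str, replacement: str = "") -> str:
    """Remove (or replace) characters that XML 1.0 forbids."""
    return _INVALID_XML10.sub(replacement, text)


dirty = "line one\x0bline two\x0c"
print(repr(strip_invalid_xml_chars(dirty)))
print(repr(strip_invalid_xml_chars(dirty, replacement="?")))
```

This filter runs in addition to escaping the five characters, not instead of it.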

み青杉依旧 2025-01-23 19:57:00

Abridged from: XML, Escaping

There are five predefined entities:

&lt; represents "<"
&gt; represents ">"
&amp; represents "&"
&apos; represents '
&quot; represents "

"All permitted Unicode characters may be represented with a numeric character reference." For example:
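To illustrate numeric character references, the Python sketch below parses a decimal and a hexadecimal reference to the same code point. U+4E2D is an arbitrary character picked for illustration.

```python
import xml.etree.ElementTree as ET

# Decimal and hexadecimal references to the same code point (0x4E2D == 20013).
decimal_form = ET.fromstring("<c>&#20013;</c>").text
hex_form = ET.fromstring("<c>&#x4E2D;</c>").text
assert decimal_form == hex_form == "\u4e2d"

# Going the other way: ElementTree's default ASCII output writes
# non-ASCII characters as numeric character references.
element = ET.Element("c")
element.text = "\u4e2d"
ascii_bytes = ET.tostring(element)
print(ascii_bytes)
```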

Most of the control characters and other Unicode ranges are specifically excluded, meaning (I think) they can't occur either escaped or direct:

Valid characters in XML

忆沫 2025-01-23 19:57:00

The accepted answer is not correct. It is best to use a library to escape XML.

As mentioned in this other question:

"Basically, the control characters and characters out of the Unicode ranges are not allowed. This means also that calling for example the character entity is forbidden."

If you only escape the five characters, you can run into errors such as "An invalid XML character (Unicode: 0xc) was found".
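The failure is easy to reproduce. In the sketch below, Python's `xml.etree.ElementTree` rejects a form feed (0xC) whether it appears literally or as a character reference. The helper `is_well_formed` is hypothetical, and the error message will differ from the one quoted above because it comes from a different parser.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document: str) -> bool:
    """Return True if the parser accepts the document."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


print(is_well_formed("<a>&#xC;</a>"))   # form feed as a character reference
print(is_well_formed("<a>\x0c</a>"))    # literal form feed
print(is_well_formed("<a>&#x9;</a>"))   # tab is one of the allowed control characters
```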

魂归处 2025-01-23 19:57:00

It depends on the context. For content, it is < and &, and ]]> (a three-character string rather than a single character).

For attribute values, it is <, &, ", and '.

For CDATA, it is ]]>.
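A sketch of context-specific escaping along these lines, written in Python with plain string replacement. The function names are made up. For attribute values the sketch escapes only the quote character actually used as the delimiter, and it ignores whitespace normalization of attribute values.

```python
import xml.etree.ElementTree as ET


def escape_content(text: str) -> str:
    """Minimal escaping for element content: &, < and the string ]]>."""
    return text.replace("&", "&amp;").replace("<", "&lt;").replace("]]>", "]]&gt;")


def escape_attribute(text: str, quote: str = '"') -> str:
    """Minimal escaping for an attribute value, returned with its quotes."""
    text = text.replace("&", "&amp;").replace("<", "&lt;")
    if quote == '"':
        text = text.replace('"', "&quot;")
    else:
        text = text.replace("'", "&apos;")
    return quote + text + quote


content = "x ]]> y < & > \" '"
attribute = "a<b & \"q\" 's'"
document = f"<v a={escape_attribute(attribute)}>{escape_content(content)}</v>"
print(document)

parsed = ET.fromstring(document)
assert parsed.text == content
assert parsed.get("a") == attribute
```

The & replacement has to run first. Otherwise the ampersands introduced by the other replacements would be escaped a second time.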

心在旅行 2025-01-23 19:57:00

Only < and & are required to be escaped if they are to be treated as character data and not markup:

2.4 Character Data and Markup
