What's the accepted way of storing quoted data in XML?

Posted on 2024-07-06 12:56:56

What's the accepted way of storing quoted data in XML?

For example, for a node, which is correct?

  • (a) <name>Jesse "The Body" Ventura</name>
  • (b) <name>Jesse \"The Body\" Ventura</name>
  • (c) <name>Jesse &quot;The Body&quot; Ventura</name>
  • (d) none of the above (please specify)

If (a), what do you do for attributes? If (c), is it really appropriate to mix HTML & XML? Similarly, how do you handle single and curly quotes?

Comments (7)

π浅易 2024-07-13 12:56:56

The correct answers are (a) and (c), because " is not a character that must be encoded in element data.

You should always XML-encode characters such as >, <, and & when they are NOT inside a CDATA section, so that they cannot be mistaken for markup. These are the key characters to be concerned about in element data.

For attributes, you also have to be careful with ' and " inside the value, depending on which of the two you use to surround it.

I've found that encoding " and ' everywhere is often the better idea, because it also helps when converting to other formats in which " or ' might cause problems.
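As a rough illustration of the two levels of escaping described above, here is a minimal Python sketch using the standard library's xml.sax.saxutils.escape. The sample string and the variable names are made up for the example; the quote-to-entity mapping is passed in explicitly because escape() only handles &, < and > by default.

```python
from xml.sax.saxutils import escape

# Hypothetical sample value containing quotes and markup characters.
raw = 'Jesse "The Body" Ventura <jesse@example.com> & friends'

# Minimal escaping for element content: only &, < and > are replaced.
text_safe = escape(raw)

# Stricter escaping: additionally replace both quote characters.
quote_entities = {'"': "&quot;", "'": "&apos;"}
fully_escaped = escape(raw, quote_entities)

print(text_safe)
print(fully_escaped)
```

Both results are valid element content; the second is simply the more defensive form.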

看海 2024-07-13 12:56:56

Character data inside XML elements can contain quote characters without escaping them. The only characters that are not permitted inside an XML element are '<', '&' and '>' (and the '>' character is only disallowed if it's part of a "]]>" sequence of characters).

That's not to say that escaping the quotes is not a good idea - I'm just saying that not escaping the quotes is perfectly valid XML. See section 2.4 - "Character Data and Markup" in the XML spec.

So both (a) and (c) are OK.

As far as attributes are concerned, attribute values can be enclosed in either single or double quotes, so if it contains one or the other you can use the opposite one to enclose the value. If it'll contain both, then you'll have to use a character entity for one or both.
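The rule of picking the opposite quote, and falling back to an entity when both appear, is what Python's xml.sax.saxutils.quoteattr does. A small sketch, with made-up sample values:

```python
from xml.sax.saxutils import quoteattr

# Value contains only double quotes: it gets wrapped in single quotes.
only_double = quoteattr('Jesse "The Body" Ventura')

# Value contains only single quotes: it gets wrapped in double quotes.
only_single = quoteattr("Jesse 'The Body' Ventura")

# Value contains both: wrapped in double quotes, inner " becomes &quot;.
both = quoteattr("""Jesse "The Body" O'Ventura""")

for quoted in (only_double, only_single, both):
    print(quoted)
```

The returned string already includes the surrounding quotes, so it can be placed directly after the = of an attribute.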

As far as 'curly-quotes' are concerned, if you're talking about the special, non-ASCII quotes that Word sometimes converts quotes to: they have no special meaning in XML, so you can do whichever you prefer (but they can't be used to enclose attribute values). You'll also need to make sure the character encoding for the document is correct, so they are interpreted correctly.

绝情姑娘 2024-07-13 12:56:56

Double quotes in text nodes can be represented either as the double-quote character or as the &quot; entity. Double quotes in attribute values can be represented as the double-quote character if the value is delimited by single quotes, and vice versa; otherwise, escape them as &quot; (or &apos; for single quotes).

This is only relevant if you're a) editing XML in a non-XML-aware text editor or b) creating XML programmatically through string manipulation. Generally speaking, you should avoid (a) unless you really know what you're doing, or at least have a way of checking the well-formedness of your XML after editing is complete.

And you should avoid (b) under all circumstances. Never create XML through string manipulation; always use a DOM or some other tool.
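To make "use a DOM or some other tool" concrete, here is a minimal sketch with Python's xml.etree.ElementTree. The nickname attribute is hypothetical and only there to show attribute escaping; the point is that the library decides what needs escaping in text and in attribute values.

```python
import xml.etree.ElementTree as ET

# Build the element through the API instead of concatenating strings.
# The "nickname" attribute is an invented example, not from the question.
name = ET.Element("name", {"nickname": 'The "Body"'})
name.text = 'Jesse "The Body" Ventura & <friends>'

# Serialise; the library escapes &, < and > in text, and also " in attributes.
xml_text = ET.tostring(name, encoding="unicode")
print(xml_text)

# Parsing the output gives back exactly the strings that were put in.
round_tripped = ET.fromstring(xml_text)
print(round_tripped.text == name.text)
```

The quotes in the element text are left alone, while the ones in the attribute value are written as &quot;, which matches the rules described in the answers above.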

心作怪 2024-07-13 12:56:56

You shouldn't worry about how things are encoded in your XML. You should always use a proper library for generating XML documents. There are too many gotchas in XML to get it right by yourself. I've seen tons of invalid XML documents come my way because somebody thought they could generate proper XML themselves, without using a library. All major programming languages in use today have XML libraries.

余厌 2024-07-13 12:56:56

For example, for a node, which is correct?

The XML specification itself doesn't talk about nodes (other than when comparing DTD syntax to finite automaton regex). A DOM node can be attribute, element, text or any of the other node types.

Inside a text node, you only need to escape characters which the parser would interpret as starting a different node - so you escape & and < as &amp; and &lt;.

For portability, it's often a good idea to escape curly quotes, but there is no reason to escape plain quotes in XML text.

Inside an attribute node, you have to escape less-than and ampersand as before, and also whichever quote you used to delimit the attribute.

<foo attribute="'ok'" attribute2='"also-ok"' attribute3="&quot;needed&quot;"/>

It's usually easier to get in the habit of only using one type and always escaping it. I write quite a bit of XSLT and favour using " outside and ' inside:

<xsl:value-of select="person[@name = 'bob']"/>

If you get paranoid with the escaping, the XPath becomes less readable:

<xsl:value-of select="person[@name = &apos;bob&apos;]"/>

If (c), is it really appropriate to mix HTML & XML?

XML defines the named entities amp, gt, lt, apos, and quot.

HTML defines many more entities.

You can and should use the XML named entities in XML in preference to using a numeric entity.

The lt entity escapes < and should be used in text and attribute values.
The amp entity escapes & and should be used in text and attribute values.
The apos and quot entities escape ' and " and should be used in attribute values.
The gt entity is a bit useless - there is almost never a syntactic requirement to escape > in XML. Maybe > only agreed to work with < if it got equal billing.

The other one I use a lot in XSLT that generates source code is &#10;, which inserts a new line. &nl; would have been more use than &gt;.

Similarly, how do you handle single and curly quotes?

XML is designed to mark up Unicode text, and the curly quotes have no special meaning in it. However, it's not uncommon for the encoding used for an XML document to be misinterpreted in the wild. So if it's in a closed environment and you can guarantee correct Unicode encoding at producer and consumer, then I'd just put it in the XML. Otherwise use a numeric character entity. That's true of any character with a code-point above 127 - there's nothing special about curly quotes.
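For the "otherwise use a numeric character entity" case, one possible approach in Python is to serialise to ASCII and let the library emit the numeric references. A minimal sketch, assuming xml.etree.ElementTree and a made-up sample value:

```python
import xml.etree.ElementTree as ET

# Sample text with curly quotes (U+201C and U+201D).
curly = "Jesse \u201cThe Body\u201d Ventura"

name = ET.Element("name")
name.text = curly

# With no encoding given, tostring() returns ASCII bytes and writes every
# non-ASCII character as a numeric character reference.
ascii_bytes = ET.tostring(name)
print(ascii_bytes)

# A parser turns the references back into the original characters.
restored = ET.fromstring(ascii_bytes).text
print(restored == curly)
```

Because the output is pure ASCII, a consumer that guesses the wrong encoding still reads the curly quotes correctly.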

那支青花 2024-07-13 12:56:56

The correct answer is 'C'.

Single quotes don't really cause a problem, but you need to be careful of ampersands and left angle brackets.

稀香 2024-07-13 12:56:56

It depends really. If all you want to do is have quotes in your XML string, then 'A'.

But if there is meaning or you need to abstract the quote (i18n for example), XML affords richer options. For example:

<name>
  <given>Jesse</given>
  <family>Ventura</family>
  <nickName>the Body</nickName>
</name>

Overkill in many situations. But if you need to correctly handle many of the world's varied - and frequently inconsistent - naming schemes, I'd think about encoding your names along these lines. XML is great for this.
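As a sketch of what "abstracting the quote" could look like on the consuming side, the snippet below reads the structured name and adds the quote characters only at display time. The display_name helper and its parameters are hypothetical, written just for this illustration.

```python
import xml.etree.ElementTree as ET

doc = """<name>
  <given>Jesse</given>
  <family>Ventura</family>
  <nickName>the Body</nickName>
</name>"""

def display_name(name_el, open_quote='"', close_quote='"'):
    """Compose a display string; the quote style is chosen by the caller."""
    given = name_el.findtext("given", default="")
    family = name_el.findtext("family", default="")
    nick = name_el.findtext("nickName")
    parts = [given]
    if nick:
        parts.append(f"{open_quote}{nick}{close_quote}")
    parts.append(family)
    # Skip empty parts so missing elements do not leave double spaces.
    return " ".join(p for p in parts if p)

name_el = ET.fromstring(doc)
plain = display_name(name_el)
curly = display_name(name_el, "\u201c", "\u201d")
print(plain)
print(curly)
```

The stored data contains no quote characters at all, so the question of how to escape them never comes up.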
