
What exactly are PCDATA and CDATA?

Posted 2024-07-19 12:31:56, 273 words, 12 views, 0 comments

The loose definitions of PCDATA and CDATA seem to be:

PCDATA is character data, but it is parsed.
CDATA is character data that is not parsed.

But then someone told me that CDATA is actually parsed, or that PCDATA actually isn't parsed... so it's a bit confusing. Does anyone know what the real deal is?

Update: I actually added the PCDATA definition on Wikipedia... so don't take this answer too seriously, since it's just my rough understanding of it.


用心笑 2024-07-26 12:31:56

From Wikipedia:

PCDATA

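Simply speaking, PCDATA stands for Parsed Character Data. That means the characters are to be parsed by the XML, XHTML or HTML parser (&lt; will be changed to <, <p> will be taken to mean a paragraph, etc.). Compare that to CDATA, where the characters are not parsed by the XML, XHTML or HTML parser.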

CDATA

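The term CDATA, meaning character data, is used for distinct but related purposes in the markup languages SGML and XML. The term indicates that a certain portion of the document is general character data, rather than non-character data or character data with a more specific, limited structure.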


凉月流沐 2024-07-26 12:31:56

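Both PCDATA and CDATA are parsed. They are both character data.

Both must contain only valid characters. For example, if your document encoding is UTF-8, the content of a CDATA section must still be valid UTF-8. So random binary data can prevent the document from being well-formed. Also, a CDATA section is still parsed, if only to find the end-of-section marker. But other markup-like characters, such as <, > and &, are ignored by the parser and passed through as-is.

On the other hand, in PCDATA a literal < or & (and, in attribute values, a ' or " that matches the delimiting quote) must be escaped, or it will be interpreted as markup.

So yes, CDATA sections really are parsed, but I don't know why you were told that PCDATA isn't parsed.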
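A quick way to check these claims is to feed a few snippets to Python's standard-library xml.etree.ElementTree, which is backed by the expat parser. This is a sketch; the helper is_well_formed is a made-up name, not part of any API.

```python
import xml.etree.ElementTree as ET


def is_well_formed(doc):
    """Hypothetical helper: True if ElementTree can parse the document."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


# PCDATA: a literal '<' is read as the start of a tag, so this is rejected.
print(is_well_formed("<a>1 < 2</a>"))                      # False

# PCDATA with escapes: the parser expands &lt; and &amp; back into characters.
print(ET.fromstring("<a>1 &lt; 2 &amp; 3</a>").text)       # 1 < 2 & 3

# CDATA section: '<' and '&' pass through, and entity-like text stays literal.
print(ET.fromstring("<a><![CDATA[1 < 2 & 3]]></a>").text)  # 1 < 2 & 3
print(ET.fromstring("<a><![CDATA[&lt;]]></a>").text)       # &lt;

# A CDATA section is still decoded and scanned: bytes that are not valid
# UTF-8 make the whole document ill-formed.
print(is_well_formed(b"<a><![CDATA[\xff]]></a>"))          # False
```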


一身仙ぐ女味 2024-07-26 12:31:56

PCDATA - Parsed Character Data

CDATA - (Unparsed) Character Data

http://www.w3schools.com/XML/xml_cdata.asp


摇划花蜜的午后 2024-07-26 12:31:56


PCDATA is text that will be parsed by a parser. Tags inside the text will be treated as markup and entities will be expanded.

CDATA is text that will not be parsed by a parser. Tags inside the text will not be treated as markup and entities will not be expanded.

By default, everything is PCDATA. In the following example, ignoring the root, <bar> will be parsed: it will have no text content, just one child element, <test>.

<?xml version="1.0"?>
<foo>
<bar><test>content!</test></bar>
</foo>

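When we want to specify in a DTD that an element will only contain text, and no child elements, we use the keyword #PCDATA, because this keyword specifies that the element must contain parsable character data. In that text, a literal less-than (<) or ampersand (&) must be escaped (as &lt; and &amp;), because the parser would otherwise treat it as markup. Greater-than (>), quote (') and double quote (") are allowed as-is in element text, except that > must be escaped in the sequence ]]>; they may still be written as &gt;, &apos; and &quot;.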
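In a DTD this looks like <!ELEMENT bar (#PCDATA)>. The sketch below is only an illustration with made-up content; Python's standard-library parsers check well-formedness but do not validate against the DTD, so the declaration is not enforced here. It shows that >, ' and " can appear literally in element text, while < and & are written as &lt; and &amp;.

```python
import xml.etree.ElementTree as ET

# Hypothetical document whose internal DTD subset declares <bar> as text-only.
# ElementTree reads the DOCTYPE but does not enforce the declaration.
doc = """<?xml version="1.0"?>
<!DOCTYPE bar [
  <!ELEMENT bar (#PCDATA)>
]>
<bar>a > b, 'single' and "double" quotes; escaped: &lt; and &amp;</bar>"""

bar = ET.fromstring(doc)
print(bar.tag)   # bar
# '>' and both quote characters were accepted as-is; the escapes were expanded.
print(bar.text)  # a > b, 'single' and "double" quotes; escaped: < and &
```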

In the next example, the content of bar is a CDATA section, so it isn't parsed as markup: bar has no child elements, and its text content is "<test>content!</test>".

<?xml version="1.0"?>
<foo>
<bar><![CDATA[<test>content!</test>]]></bar>
</foo>

There are several content models in SGML. The #PCDATA content model says that an element may contain plain text. The "parsed" part of it means that markup (including PIs, comments and SGML directives) in it is parsed instead of displayed as raw text. It also means that entity references are replaced.
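To see what "parsed" means for #PCDATA content, Python's xml.dom.minidom can list the nodes it builds from a small made-up element: the entity reference is replaced, and the comment and the processing instruction (with the arbitrary target app) become nodes of their own rather than raw text.

```python
from xml.dom import minidom

doc = minidom.parseString("<a>x &amp; y<!-- note --><?app data?></a>")

# Each child of <a> is a separate node created by the parser.
for node in doc.documentElement.childNodes:
    print(node.nodeName, repr(node.data))

# Output:
# #text 'x & y'
# #comment ' note '
# app 'data'
```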

Another content model that allows plain-text contents is CDATA. In XML, an element's content model cannot be declared as CDATA, but in SGML, CDATA declared content means that markup and entity references in the element's contents are ignored. In attributes of CDATA type, however, entity references are replaced.
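XML has no equivalent, but HTML's script element is a familiar case of CDATA declared content: the HTML 4 DTD declares it that way, and HTML5 calls it a raw text element. Python's html.parser applies the same rule to script and style, so it can illustrate the behavior described above. This is an HTML parser, not an SGML one, and TextCollector is a hypothetical helper.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical helper: gathers the character data seen inside each tag."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.current = None
        self.text = {}

    def handle_starttag(self, tag, attrs):
        self.current = tag

    def handle_endtag(self, tag):
        self.current = None

    def handle_data(self, data):
        if self.current is not None:
            # Data may arrive in several chunks, so append rather than overwrite.
            self.text[self.current] = self.text.get(self.current, "") + data


collector = TextCollector()
collector.feed("<script>if (a < b && c) { x = '<p>'; }</script><p>a &amp; b</p>")
collector.close()

print(collector.text["script"])  # if (a < b && c) { x = '<p>'; }
print(collector.text["p"])       # a & b
```

Inside script, the < and the <p> in the string are passed through untouched; inside p, the &amp; reference is replaced.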

In XML, #PCDATA is the only plain-text content model; you use it whenever you want to allow text contents in an element at all. CDATA-style content can be written explicitly with CDATA section markup inside #PCDATA content, but an element's contents cannot be declared as CDATA by default.
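Put differently, a CDATA section is just another way of writing character data inside #PCDATA content. A small check with ElementTree, using a made-up element name msg: escaped text, a CDATA section, and a mix of both all yield the same string.

```python
import xml.etree.ElementTree as ET

# Three spellings of the same character data inside a #PCDATA element.
escaped = "<msg>if a &lt; b &amp;&amp; c</msg>"
cdata = "<msg><![CDATA[if a < b && c]]></msg>"
mixed = "<msg>if a <![CDATA[< b &&]]> c</msg>"

texts = [ET.fromstring(doc).text for doc in (escaped, cdata, mixed)]
print(texts)
# ['if a < b && c', 'if a < b && c', 'if a < b && c']
```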

In a DTD, the type of an attribute that holds arbitrary text is CDATA. The CDATA keyword in an attribute declaration has a different meaning from a CDATA section in an XML document. In a CDATA section, every valid XML character is allowed, including <, >, &, ' and ", except the sequence "]]>", which ends the section.
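Two consequences can be checked with ElementTree, using a made-up note element and title attribute: in a CDATA-type attribute, entity references are still replaced, and because ]]> always ends a CDATA section, text containing that sequence has to be split across two sections.

```python
import xml.etree.ElementTree as ET

# Attribute declared as CDATA: "&amp;" in the value is still expanded.
doc = """<!DOCTYPE note [
  <!ATTLIST note title CDATA #IMPLIED>
]>
<note title="Fish &amp; Chips"/>"""
print(ET.fromstring(doc).get("title"))  # Fish & Chips

# To put "]]>" inside CDATA, end the first section after "]]" and start
# a new one with ">"; the parser joins the pieces into one text value.
split = "<note><![CDATA[a]]]]><![CDATA[>b]]></note>"
print(ET.fromstring(split).text)  # a]]>b
```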

#PCDATA is not a valid attribute type; it is used in element content models for "leaf" text.

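#PCDATA is prefixed with a hash sign (also known as a pound sign or octothorpe) for historical reasons: in SGML, # is the reserved name indicator, which marks PCDATA as a keyword rather than the name of an element type. XML inherited it, along with #REQUIRED, #IMPLIED and #FIXED.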


悲念泪 2024-07-26 12:31:56

Your first definition is correct.

PCDATA is parsed, which means entities are expanded and the text is treated as markup. CDATA is not parsed by an XML parser.


寄居者 2024-07-26 12:31:56

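It would save a lot of ugly manual overrides if script elements were simply declared as CDATA by default in the XHTML DTD... Why would a script block ever contain other elements? If such elements exist, they will be processed by the JS interpreter in DOM manipulation operations, in which case the XML parser should still ignore them completely before the document is inserted and rendered. I suppose it may have been designed to force the use of external script resource files, which is a good thing in the end.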
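The manual override in question is usually the idiom below: because XHTML served as XML is parsed as XML, script text containing < or & is wrapped in a CDATA section, and the // in front of the markers keeps them as JavaScript comments when the page is parsed as HTML instead. A sketch with ElementTree and a made-up script body:

```python
import xml.etree.ElementTree as ET

# Unwrapped: '<' and '&&' in the script break XML parsing.
plain = '<script type="text/javascript">if (a < b && c) { go(); }</script>'

# The usual XHTML workaround: hide the CDATA markers behind // comments.
wrapped = """<script type="text/javascript">
//<![CDATA[
if (a < b && c) { go(); }
//]]>
</script>"""

try:
    ET.fromstring(plain)
except ET.ParseError as err:
    print("plain:", err.__class__.__name__)  # plain: ParseError

print(repr(ET.fromstring(wrapped).text))
# '\n//\nif (a < b && c) { go(); }\n//\n'
```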
