What exactly are PCDATA and CDATA?

Posted on 2024-07-19 12:31:56

It seems that a loose definition of PCDATA and CDATA is that

  1. PCDATA is character data, but is to be parsed.
  2. CDATA is character data, and is not to be parsed.

But then someone told me that CDATA is actually parsed, or that PCDATA is actually not parsed... so it is a bit confusing. Does anyone know what the real deal is?

Update: I actually added the PCDATA definition to Wikipedia... so don't take that answer too seriously, as that's only my rough understanding of it.

Comments (6)

用心笑 2024-07-26 12:31:56

From WIKI:

PCDATA

Simply speaking, PCDATA stands for Parsed Character Data. That means the characters are to be parsed by the XML, XHTML, or HTML parser (&lt; will be changed to <, <p> will be taken to mean a paragraph tag, etc.). Compare that with CDATA, where the characters are not to be parsed by the XML, XHTML, or HTML parser.
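To make the quoted description concrete, here is a small sketch using Python's standard-library xml.etree.ElementTree, which is an XML parser (the Wikipedia text also mentions XHTML and HTML parsers, which this does not cover). The sample documents are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Parsed character data: "&lt;" is an entity reference that the parser expands
# to "<", and "<p>" is recognized as a start tag that creates a child element.
body = ET.fromstring("<body>1 &lt; 2<p>para</p></body>")
print(repr(body.text))                  # '1 < 2'
print([child.tag for child in body])    # ['p']

# Inside a CDATA section the same characters are handed over untouched:
# "&lt;" stays four literal characters and "<p>" is plain text, not a tag.
body2 = ET.fromstring("<body><![CDATA[1 &lt; 2<p>para</p>]]></body>")
print(repr(body2.text))                 # '1 &lt; 2<p>para</p>'
print([child.tag for child in body2])   # []
```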

CDATA

The term CDATA, meaning character data, is used for distinct but related purposes in the markup languages SGML and XML. The term indicates that a certain portion of the document is general character data, rather than non-character data or character data with a more specific, limited structure.

凉月流沐 2024-07-26 12:31:56

Both PCDATA and CDATA are parsed. They are both character data.

They both must include only valid characters. For example, if your document encoding is UTF-8, the content of CDATA sections must still be valid UTF-8 characters, so random binary data will probably prevent the document from being well-formed. Also, CDATA sections are still parsed, if only to find the ]]> marker that ends the section. But other markup-like characters, such as <, > and &, are ignored and passed through as-is by the parser.
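A quick sketch of these points, assuming Python's built-in xml.etree.ElementTree (backed by expat). The is_well_formed helper is a made-up convenience function for the demo, not part of any library.

```python
import xml.etree.ElementTree as ET

def is_well_formed(doc: str) -> bool:
    """Hypothetical helper: True if the expat-backed ElementTree parser accepts doc."""
    try:
        ET.fromstring(doc)
    except ET.ParseError:
        return False
    return True

# Markup-like characters inside a CDATA section are passed through as text.
print(ET.fromstring("<r><![CDATA[a < b && c > d]]></r>").text)  # a < b && c > d

# The section is still scanned for its "]]>" terminator. Without one, "</r>"
# is swallowed as section text and the document never closes.
print(is_well_formed("<r><![CDATA[abc</r>"))  # False

# Content must still consist of legal XML characters; U+0001 is not one,
# even inside a CDATA section.
print(is_well_formed("<r><![CDATA[\x01]]></r>"))  # False

# Because the first "]]>" always ends the section, a literal "]]>" has to be
# split across two adjacent sections.
print(ET.fromstring("<r><![CDATA[a]]]]><![CDATA[>b]]></r>").text)  # a]]>b
```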

OTOH, in PCDATA, literal < and & (and, in attribute values, a ' or " that matches the delimiting quote) must be escaped, or they will be interpreted as markup. Entities will also be expanded.
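The PCDATA side can be checked the same way. This is again a sketch with Python's xml.etree.ElementTree; the brand entity and the is_well_formed helper are made up for this example.

```python
import xml.etree.ElementTree as ET

def is_well_formed(doc: str) -> bool:
    """Hypothetical helper: True if the expat-backed ElementTree parser accepts doc."""
    try:
        ET.fromstring(doc)
    except ET.ParseError:
        return False
    return True

# A literal "&" in element text starts an entity reference, so a bare one is an error.
print(is_well_formed("<r>salt & pepper</r>"))  # False

# Escaped forms are expanded by the parser: a predefined entity (&amp;), a numeric
# character reference (&#60;), and an entity declared in the internal DTD subset.
doc = """<!DOCTYPE r [<!ENTITY brand "ACME">]>
<r>salt &amp; pepper &#60;3 &brand;</r>"""
print(ET.fromstring(doc).text)  # salt & pepper <3 ACME

# In attribute values, a quote that matches the delimiter must be escaped too.
print(is_well_formed('<a title="say "hi""/>'))                        # False
print(ET.fromstring('<a title="say &quot;hi&quot;"/>').get("title"))  # say "hi"
```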

So yes, CDATA sections are indeed parsed. I am not sure why you were told that PCDATA is not parsed though.

一身仙ぐ女味 2024-07-26 12:31:56

PCDATA - Parsed Character Data

CDATA - (Unparsed) Character Data

http://www.w3schools.com/XML/xml_cdata.asp

摇划花蜜的午后 2024-07-26 12:31:56
  • PCDATA is text that will be parsed by a parser. Tags inside the text
    will be treated as markup and entities will be expanded.
  • CDATA is text that will not be parsed by a parser. Tags inside the text will
    not be treated as markup and entities will not be expanded.

By default, everything is PCDATA. In the following example, ignoring the root, <bar> will be parsed, and it will have no text content, only one child element.

<?xml version="1.0"?>
<foo>
<bar><test>content!</test></bar>
</foo>
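Parsing that document with Python's xml.etree.ElementTree (my choice of parser for illustration) shows bar ending up with a child element rather than text:

```python
import xml.etree.ElementTree as ET

doc = """<?xml version="1.0"?>
<foo>
<bar><test>content!</test></bar>
</foo>"""

root = ET.fromstring(doc)
bar = root.find("bar")
print(bar.text)                      # None: no text before the child element
print([child.tag for child in bar])  # ['test']
print(bar.find("test").text)         # content!
```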

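When we want to specify that an element will only contain text, and no child elements, we use the keyword #PCDATA (as in <!ELEMENT bar (#PCDATA)>), because this keyword specifies that the element must contain parsable character data: text in which the less-than (<) and ampersand (&) characters appear only in escaped form (&lt; and &amp;). The greater-than (>) character needs escaping only when it would complete the sequence ]]>, and the quote (') and double quote (") characters need escaping only inside an attribute value delimited by the same kind of quote.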
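A quick way to check which of these characters an XML parser actually rejects in element text is simply to try them. This is a sketch using Python's xml.etree.ElementTree; is_well_formed is a made-up helper.

```python
import xml.etree.ElementTree as ET

def is_well_formed(doc: str) -> bool:
    """Hypothetical helper: True if the expat-backed ElementTree parser accepts doc."""
    try:
        ET.fromstring(doc)
    except ET.ParseError:
        return False
    return True

# ">" and both kinds of quote are accepted as literal characters in element text.
r = ET.fromstring("""<r>a > b, it's "quoted"</r>""")
print(r.text)  # a > b, it's "quoted"

# "<" and "&" are not; they have to be written as &lt; and &amp;.
print(is_well_formed("<r>a < b</r>"))  # False
print(is_well_formed("<r>a & b</r>"))  # False
print(ET.fromstring("<r>a &lt; b &amp; c</r>").text)  # a < b & c
```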

In the next example, bar contains a CDATA section, so its contents aren't parsed as markup: bar has no child elements, and its text content is "<test>content!</test>".

<?xml version="1.0"?>
<foo>
<bar><![CDATA[<test>content!</test>]]></bar>
</foo>
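The same kind of ElementTree sketch applied to the CDATA version shows the difference: no child element, just text. Note that ElementTree does not remember that the text came from a CDATA section; if you serialize the tree again, it escapes the < characters instead of writing a CDATA section.

```python
import xml.etree.ElementTree as ET

doc = """<?xml version="1.0"?>
<foo>
<bar><![CDATA[<test>content!</test>]]></bar>
</foo>"""

bar = ET.fromstring(doc).find("bar")
print(len(bar))  # 0: no child elements
print(bar.text)  # <test>content!</test>
```"
</test>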

There are several content models in SGML. The #PCDATA content model says that an element may contain plain text. The "parsed" part of it means that markup (including PIs, comments and SGML directives) in it is parsed instead of displayed as raw text. It also means that entity references are replaced.
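XML keeps this behaviour for comments and processing instructions inside #PCDATA content. Here is a sketch using Python's xml.dom.minidom (chosen because, unlike ElementTree's default tree builder, it keeps comment and PI nodes); the render PI target is made up:

```python
from xml.dom import Node, minidom

# Readable names for the node types we expect to see.
KIND = {
    Node.TEXT_NODE: "text",
    Node.COMMENT_NODE: "comment",
    Node.PROCESSING_INSTRUCTION_NODE: "processing instruction",
}

root = minidom.parseString("<r>before<!-- note --><?render fast?>after</r>").documentElement

# The comment and the PI are recognized as markup and become their own nodes,
# instead of showing up as raw text inside the element.
parts = [(KIND[n.nodeType], n.data) for n in root.childNodes]
print(parts)
# [('text', 'before'), ('comment', ' note '), ('processing instruction', 'fast'), ('text', 'after')]
```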

Another type of content model allowing plain text contents is CDATA. In XML, an element's content model cannot be declared as CDATA, but in SGML, CDATA content means that markup and entity references in the element's contents are ignored. In attributes of CDATA type, however, entity references are replaced.

In XML, #PCDATA is the only plain-text content model; you use it whenever you want to allow any text content in an element. Unparsed text can still be included explicitly through CDATA sections (<![CDATA[ ... ]]>) inside #PCDATA content, but an element's content cannot be declared as CDATA.

In a DTD, an attribute whose value is arbitrary text is declared with type CDATA. The CDATA keyword in an attribute declaration has a different meaning from a CDATA section in an XML document. In a CDATA section, all characters are legal (including <, >, &, ' and ") except the sequence ]]>, which ends the section.
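To illustrate the two meanings of CDATA, here is a sketch with Python's xml.etree.ElementTree. The title attribute is undeclared, and the XML spec says a non-validating parser should treat undeclared attributes as if declared CDATA; is_well_formed is a made-up helper.

```python
import xml.etree.ElementTree as ET

def is_well_formed(doc: str) -> bool:
    """Hypothetical helper: True if the expat-backed ElementTree parser accepts doc."""
    try:
        ET.fromstring(doc)
    except ET.ParseError:
        return False
    return True

# CDATA-type attribute: entity and character references in the value are replaced...
a = ET.fromstring('<a title="Tom &amp; Jerry &#60;3"/>')
print(a.get("title"))  # Tom & Jerry <3

# ...but it is not a CDATA section: a literal "<" is still forbidden in the value.
print(is_well_formed('<a title="1 < 2"/>'))  # False

# In a CDATA section, <, >, &, ' and " are all legal as literal characters;
# only the terminating "]]>" is special.
r = ET.fromstring("""<r><![CDATA[<>&'"]]></r>""")
print(r.text)  # <>&'"
```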

#PCDATA is not appropriate for the type of an attribute. It is used for the type of "leaf" text.

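#PCDATA is prefixed with a hash (also known as a pound sign or octothorpe) for historical reasons: XML inherited it from SGML, where the hash is the reserved name indicator that sets keywords such as #PCDATA, #REQUIRED and #IMPLIED apart from user-defined names.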

悲念泪 2024-07-26 12:31:56

Your first definition is correct.

PCDATA is parsed, which means that entities are expanded and tags in the text are treated as markup. CDATA is not parsed by an XML parser.

寄居者 2024-07-26 12:31:56

If only script elements were declared as CDATA by default in the XHTML DTDs, it would save a lot of ugly manual overrides... Why would script blocks contain other elements? If there are such elements, they are handled by the JS interpreter in DOM manipulation actions, in which case they should still be completely ignored by the XML parser before document insertion and rendering. I suppose it may have been designed to force the use of external script resource files, which is ultimately a good thing.
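For anyone wondering what the "manual override" looks like in practice, here is a sketch using Python's xml.etree.ElementTree as a stand-in for an XML parser reading XHTML (the XHTML namespace is left out for brevity, and is_well_formed is a made-up helper):

```python
import xml.etree.ElementTree as ET

def is_well_formed(doc: str) -> bool:
    """Hypothetical helper: True if the expat-backed ElementTree parser accepts doc."""
    try:
        ET.fromstring(doc)
    except ET.ParseError:
        return False
    return True

# To an XML parser, script text is ordinary #PCDATA, so "<" and "&&" break it.
plain = "<script>if (a < b && c) { go(); }</script>"
print(is_well_formed(plain))  # False

# Wrapping the code in a CDATA section is the usual manual override.
wrapped = "<script><![CDATA[if (a < b && c) { go(); }]]></script>"
print(ET.fromstring(wrapped).text)  # if (a < b && c) { go(); }
```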
