What actually are PCDATA and CDATA?
It seems that a loose definition of PCDATA and CDATA is:
- PCDATA is character data that is to be parsed.
- CDATA is character data that is not to be parsed.
But then someone told me that CDATA is actually parsed, or that PCDATA is actually not parsed, so it is all a bit confusing. Does anyone know what the real deal is?
Update: I actually added the PCDATA definition on Wikipedia, so don't take that answer too seriously, as it is only my rough understanding.
Comments (6)
From WIKI:
PCDATA
CDATA
Both PCDATA and CDATA are parsed. They are both character data.
They both must contain only valid characters. For example, if your document encoding is UTF-8, the content of CDATA sections must still be valid UTF-8. So random binary data will probably prevent the document from being well-formed. CDATA sections are also still parsed, if only to find the end-of-section delimiter (]]>). But other markup-like characters, such as <, > and &, are ignored and passed through as-is by the parser.
In PCDATA, on the other hand, literal < and & (and ' or " in attribute values) must be escaped, or they will be interpreted as markup. Entities will also be expanded.
So yes, CDATA sections are indeed parsed. I am not sure why you were told that PCDATA is not parsed, though.
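The behaviour described in this answer can be checked with a short Python sketch. It assumes the standard library's `xml.etree.ElementTree` (a non-validating, expat-based parser); the sample document and its element names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: the element names are invented for this sketch.
doc = """<root>
  <escaped>1 &lt; 2 &amp;&amp; 3 &gt; 2</escaped>
  <cdata><![CDATA[1 < 2 && 3 > 2]]></cdata>
  <literal><![CDATA[&lt; is not expanded here]]></literal>
</root>"""

root = ET.fromstring(doc)
escaped_text = root.find("escaped").text   # entity references are expanded
cdata_text = root.find("cdata").text       # markup-like characters pass through
literal_text = root.find("literal").text   # "&lt;" is kept as typed


def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text, False on a parse error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# A bare "<" in ordinary element content is read as the start of markup.
unescaped_ok = is_well_formed("<root>1 < 2</root>")
# A CDATA section is still checked for legal characters (U+0001 is not one).
control_char_ok = is_well_formed("<root><![CDATA[\x01]]></root>")

print(escaped_text)
print(cdata_text)
print(literal_text)
print(unescaped_ok, control_char_ok)
```

Both `escaped_text` and `cdata_text` come out as the same string, which is the sense in which a CDATA section is only a different way of writing character data, not a different kind of data.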
PCDATA - Parsed Character Data
CDATA - (Unparsed) Character Data
http://www.w3schools.com/XML/xml_cdata.asp
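PCDATA is text that will be parsed: tags inside it will be treated as markup and entities will be expanded.
CDATA is text that will not be parsed: tags inside it will not be treated as markup and entities will not be expanded.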
By default, everything is PCDATA. In the following example, ignoring the root, <bar> will be parsed, and it will have no content, only one child.
When we want to specify that an element will contain only text, and no child elements, we use the keyword PCDATA, because this keyword specifies that the element must contain parsable character data: that is, any text in which the markup characters less-than (<) and ampersand (&) appear only in escaped form. Greater-than (>), single quote (') and double quote (") may be escaped as well, but in element content they do not have to be.
In the next example, bar is CDATA, and isn't parsed, and has the content "<test>content!</test>".
There are several content models in SGML. The #PCDATA content model says that an element may contain plain text. The "parsed" part of it means that markup (including PIs, comments and SGML directives) in it is parsed instead of displayed as raw text. It also means that entity references are replaced.
Another content model that allows plain text content is CDATA. In SGML, it means that markup and entity references in the element's content are ignored; in XML, an element's content model cannot be implicitly set to CDATA. In attributes of CDATA type, however, entity references are replaced.
In XML, #PCDATA is the only plain text content model. You use it whenever you want to allow text content in an element at all. CDATA content can be used explicitly through CDATA section markup inside #PCDATA, but element content cannot be declared as CDATA by default.
In a DTD, the type of an attribute that contains text must be CDATA. The CDATA keyword in an attribute declaration has a different meaning from a CDATA section in an XML document. In a CDATA section, all characters are legal (including <, >, &, ' and ") except the "]]>" end delimiter.
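The two meanings of CDATA can be put side by side in a small Python sketch. The `note` element, its `title` attribute and the internal DTD subset are hypothetical, and `xml.etree.ElementTree` does not validate against the DTD; the declarations are there only to show where each keyword appears.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: "note" and "title" are illustrative names only.
doc = """<!DOCTYPE note [
  <!ELEMENT note (#PCDATA)>
  <!ATTLIST note title CDATA #IMPLIED>
]>
<note title="R &amp; D"><![CDATA[R &amp; D]]></note>"""

note = ET.fromstring(doc)

# Attribute of type CDATA: the entity reference is still replaced.
attribute_value = note.get("title")
# CDATA section in element content: the same characters are kept verbatim.
section_text = note.text

print(attribute_value)
print(section_text)
```

The attribute value ends up as `R & D`, while the CDATA section keeps the five characters `&amp;` untouched, which is why the keyword in an attribute declaration should not be read as "unparsed".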
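#PCDATA is not appropriate as the type of an attribute. It is used for the type of "leaf" text.
#PCDATA is prefixed with a hash (also known as a "hashtag" or octothorp) for historical reasons: the convention is inherited from SGML, where the hash marks a reserved name so that the keyword cannot be confused with an element that happens to be named PCDATA.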
Your first definition is correct.
PCDATA is parsed, which means that entities are expanded and that text is treated as markup. CDATA is not parsed by an XML parser.
If only <script> elements were set to CDATA by default in the XHTML DTDs, it would save a lot of ugly manual overrides. Why would script blocks contain other elements? If there are such elements, they are handled by the JS interpreter in DOM manipulation actions, in which case they should still be completely ignored by the XML parser before document insertion and rendering. I suppose it may have been designed to force the use of external script resource files, which is ultimately a good thing.
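The "manual override" mentioned here is the CDATA wrapper that inline scripts need when a page is processed as XML. A minimal Python sketch, assuming a generic XML parser (`xml.etree.ElementTree`) stands in for an XHTML one and using a made-up script body:

```python
import xml.etree.ElementTree as ET

# Made-up inline script containing characters that look like markup.
script_body = "if (a < b && b < c) { run(); }"

bare = "<script>" + script_body + "</script>"
wrapped = "<script><![CDATA[" + script_body + "]]></script>"


def parse_script(xml_text):
    """Return the script text, or None if the XML parser rejects the input."""
    try:
        return ET.fromstring(xml_text).text
    except ET.ParseError:
        return None


bare_result = parse_script(bare)        # rejected: "<" and "&" read as markup
wrapped_result = parse_script(wrapped)  # accepted: body passed through as-is

print(bare_result)
print(wrapped_result)
```

Without the wrapper the parser stops at the first `<`, since `<script>` content is ordinary #PCDATA as far as XML is concerned; with it, the script text reaches the consumer unchanged.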