Is an XML declaration required for a valid XML file?
I am parsing an XML file using the SAX parser from Xerces.
Is the XML declaration <?xml version="1.0" encoding="UTF-8"?> required?
In XML 1.0, the XML Declaration is optional. See section 2.8 of the XML 1.0 Recommendation, where it says it "should" be used -- which means it is recommended, but not mandatory. In XML 1.1, however, the declaration is mandatory. See section 2.8 of the XML 1.1 Recommendation, where it says "MUST" be used. It even goes on to state that if the declaration is absent, that automatically implies the document is an XML 1.0 document.
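To make the "optional in XML 1.0" point concrete, here is a minimal sketch that feeds the same document to a SAX parser with and without a declaration. It assumes the SAX parser bundled with the JDK (obtained through SAXParserFactory, and derived from Xerces) rather than a standalone Xerces distribution; the class name DeclarationOptionalDemo and the helper isWellFormed are hypothetical names chosen for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; not part of Xerces or the original answer.
final class DeclarationOptionalDemo {

    /** Returns true if the JDK's SAX parser accepts the text as well-formed XML. */
    static boolean isWellFormed(String xml) {
        try {
            byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
            SAXParserFactory factory = SAXParserFactory.newInstance();
            // DefaultHandler ignores content events and rethrows fatal errors.
            factory.newSAXParser().parse(new ByteArrayInputStream(bytes), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            // Covers SAXParseException, which signals a well-formedness error.
            return false;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String withDeclaration = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><root>hello</root>";
        String withoutDeclaration = "<root>hello</root>";
        System.out.println("with declaration:    " + isWellFormed(withDeclaration));
        System.out.println("without declaration: " + isWellFormed(withoutDeclaration));
    }
}
```

Both inputs should be accepted, since a document with no declaration is simply treated as XML 1.0.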
Note that in an XML declaration, encoding and standalone are both optional. Only version is mandatory. Also, these are not attributes, so if they are present they must appear in this order: version, followed by any encoding, followed by any standalone.
If you don't specify the encoding in this way, XML parsers try to guess what encoding is being used. The XML 1.0 Recommendation describes one possible way character encoding can be autodetected. In practice, this is not much of a problem if the input is encoded as UTF-8, UTF-16 or US-ASCII. Autodetection doesn't work when it encounters 8-bit encodings that use characters outside the US-ASCII range (e.g. ISO 8859-1), so avoid creating these if you can.
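The ordering rule for version, encoding and standalone can be checked with a small experiment. The sketch below assumes the SAX parser bundled with the JDK (derived from Xerces); DeclarationOrderDemo and isWellFormed are hypothetical names used only for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; not part of Xerces or the original answer.
final class DeclarationOrderDemo {

    /** Returns true if the JDK's SAX parser accepts the text as well-formed XML. */
    static boolean isWellFormed(String xml) {
        try {
            byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.newSAXParser().parse(new ByteArrayInputStream(bytes), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            return false; // the parser rejected the document
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String[] declarations = {
            "<?xml version=\"1.0\"?>",                                    // version only
            "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>", // full, correct order
            "<?xml version=\"1.0\" standalone=\"yes\"?>",                   // encoding skipped
            "<?xml encoding=\"UTF-8\" version=\"1.0\"?>",                   // encoding before version
            "<?xml version=\"1.0\" standalone=\"yes\" encoding=\"UTF-8\"?>", // standalone before encoding
            "<?xml encoding=\"UTF-8\"?>"                                    // version missing
        };
        for (String declaration : declarations) {
            System.out.println(isWellFormed(declaration + "<root/>") + "  " + declaration);
        }
    }
}
```

The first three declarations follow the required order and should be accepted; the last three should be rejected as not well-formed.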
The standalone declaration indicates whether the XML document can be correctly processed without its DTD. People rarely use it. These days, it is bad practice to design an XML format that is missing information without its DTD.
Update:
A "prolog error/invalid utf-8 encoding" error indicates that the actual data the parser found inside the file did not match the encoding that the XML declaration says it is. Or in some cases the data inside the file did not match the autodetected encoding.
Since your file contains a byte-order mark (BOM), it should be in UTF-16 encoding. I suspect that your declaration says <?xml version="1.0" encoding="UTF-8"?>, which is obviously incorrect when the file has been changed into UTF-16 by Notepad. The simple solution is to remove the encoding and simply say <?xml version="1.0"?>. You could also edit it to say encoding="UTF-16", but that would be wrong for the original file (which wasn't in UTF-16) or if the file somehow gets changed back to UTF-8 or some other encoding.
Don't bother trying to remove the BOM; that's not the cause of the problem. Using Notepad or WordPad to edit XML is the real problem!
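As a rough illustration of the suggested fix, the sketch below builds a UTF-16 document whose declaration omits the encoding, inspects its byte-order mark, and parses it. It assumes the SAX parser bundled with the JDK (derived from Xerces). The class BomDemo and the helpers sniffBom and parseText are hypothetical names, and sniffBom only recognises the UTF-8 and UTF-16 marks (a BOM can also appear on UTF-8 files, so it is worth checking which one an editor wrote).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; not part of Xerces or the original answer.
final class BomDemo {

    /** Returns the encoding indicated by a leading BOM, or null if none is recognised. */
    static String sniffBom(byte[] b) {
        if (b.length >= 3 && (b[0] & 0xFF) == 0xEF && (b[1] & 0xFF) == 0xBB && (b[2] & 0xFF) == 0xBF) {
            return "UTF-8";
        }
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFE && (b[1] & 0xFF) == 0xFF) {
            return "UTF-16BE";
        }
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFF && (b[1] & 0xFF) == 0xFE) {
            return "UTF-16LE";
        }
        return null;
    }

    /** Parses the bytes and returns all character data found in the document. */
    static String parseText(byte[] bytes) {
        StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(new ByteArrayInputStream(bytes), handler);
        } catch (SAXException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // No encoding in the declaration: the parser must autodetect it.
        String xml = "<?xml version=\"1.0\"?><note>caf\u00e9</note>";
        // Java's UTF_16 charset writes a big-endian BOM followed by big-endian data.
        byte[] utf16 = xml.getBytes(StandardCharsets.UTF_16);
        System.out.println("BOM indicates: " + sniffBom(utf16));
        System.out.println("parsed text:   " + parseText(utf16));
    }
}
```

Because the declaration makes no claim about the encoding, the parser is free to rely on the BOM, which is why dropping the encoding keeps the file readable whichever Unicode encoding the editor saved it in.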
The XML declaration is optional, so your XML is well-formed without it. However, it is recommended to include it so that parsers do not make wrong assumptions, particularly about the encoding used.
It is only required if you aren't using the default values for version and encoding (which you are in that example).