Getting rid of the encoding in lxml

Posted on 2024-09-02 13:30:15

I am trying to print an XML file using lxml and Python.

Here is the code:

>>> from lxml import etree
>>> root = etree.Element('root')
>>> child = etree.SubElement(root, 'child')
>>> print(etree.tostring(root, pretty_print=True, xml_declaration=True, encoding=None).decode())

Output:

<?xml version='1.0' encoding='ASCII'?>
<root>
  <child/>
</root>

As you can see, I passed encoding=None, but the final output still shows encoding='ASCII', which I guess is expected. If I leave out the encoding argument entirely, it still shows ASCII.

Is there any way to get only the XML version in the declaration, without the encoding part? I want the output to look like this:

<?xml version='1.0'?>

Comments (1)

糖果控 2024-09-09 13:30:15

It shouldn't matter what lxml.etree outputs as long as it's valid XML. If you really want to, you can glue strings together:

b'<?xml version="1.0"?>\n' + etree.tostring(root, pretty_print=True, encoding='ASCII')
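On Python 3, where etree.tostring returns bytes when given a named encoding, the same idea could be sketched roughly as follows. The helper name tostring_version_only is made up for illustration and is not part of lxml. The sketch assumes UTF-8 output is acceptable, since that is the encoding a parser assumes when neither a declaration nor a byte order mark says otherwise.

```python
from lxml import etree


def tostring_version_only(root):
    """Hypothetical helper: serialize with a version-only XML declaration."""
    # xml_declaration=False suppresses lxml's own declaration, so nothing
    # has to be stripped afterwards. UTF-8 is chosen (an assumption of this
    # sketch) because it is the default a parser falls back to.
    body = etree.tostring(
        root,
        pretty_print=True,
        xml_declaration=False,
        encoding="UTF-8",
    )
    # Prepend a hand-written declaration that names only the version.
    return b'<?xml version="1.0"?>\n' + body


root = etree.Element("root")
etree.SubElement(root, "child")

result = tostring_version_only(root)
print(result.decode("utf-8"))
```

The result is still bytes, so it can be written straight to a file opened in binary mode; decode it only for display.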

It's unclear why you want to remove it, since a parser ultimately needs to know which character encoding the document uses to make sense of it. The XML 1.0 spec includes a method for guessing the encoding, and seems to encourage the use of encoding declarations:

In the absence of [external information], it is a fatal error ... for an entity which begins with neither a Byte Order Mark nor an encoding declaration to use an encoding other than UTF-8.

...

Unless an encoding is determined by a higher-level protocol, it is also a fatal error if an XML entity contains no encoding declaration and its content is not legal UTF-8 or UTF-16.
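
To see what those two rules mean in practice, here is a small sketch that feeds lxml two documents with a version-only declaration. The sample text and variable names are invented for illustration, and the exact error message depends on the underlying libxml2 version, so the code only checks that an error is raised.

```python
from lxml import etree

# Non-ASCII text ("caf" + e-acute), written with an escape so the source
# file's own encoding does not matter.
text = '<?xml version="1.0"?><root>caf\u00e9</root>'

# Case 1: bytes are UTF-8 and no encoding is declared. This is the default
# the spec allows, so the parser accepts it.
parsed = etree.fromstring(text.encode("utf-8"))
print(parsed.text == "caf\u00e9")

# Case 2: bytes are Latin-1 but still undeclared. The content is not legal
# UTF-8, which the spec treats as a fatal error.
rejected = False
try:
    etree.fromstring(text.encode("latin-1"))
except etree.XMLSyntaxError as exc:
    rejected = True
    print("rejected:", exc)
```

So dropping the encoding from the declaration is only safe when the serialized bytes really are UTF-8 (or UTF-16 with a byte order mark), or pure ASCII, which is a subset of UTF-8.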
