Getting rid of the encoding in lxml

Posted 2024-09-02 13:30:15


I am trying to print an XML file using lxml and Python.

Here is the code:

>>> from lxml import etree
>>> root = etree.Element('root')
>>> child = etree.SubElement(root, 'child')
>>> print(etree.tostring(root, pretty_print=True, xml_declaration=True, encoding=None).decode())

Output:

<?xml version='1.0' encoding='ASCII'?>
<root>
  <child/>
</root>

As you can see, I passed encoding=None, but the output still contains encoding='ASCII'. I guess that is expected: even if I leave out the encoding argument entirely, it still shows ASCII.

Is there any way to get an XML declaration with just the version and without the encoding part? I want the output to look like this:

<?xml version='1.0'?>


Answer by 糖果控, 2024-09-09 13:30:15

It shouldn't matter what lxml.etree outputs as long as it's valid XML. If you really want to, you can glue strings together:

# lxml writes no declaration by default for ASCII output, so prepend one (bytes + bytes)
b'<?xml version="1.0"?>\n' + etree.tostring(root, pretty_print=True, encoding='ASCII')
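Instead of gluing strings, you could also post-process the declaration lxml has already written. The following is a minimal sketch, not an lxml feature: `strip_encoding_declaration` is a hypothetical helper that uses a regular expression to drop only the `encoding` pseudo-attribute from a leading XML declaration. Because XML 1.0 only allows a missing encoding declaration when the document is UTF-8 (or UTF-16 with a byte order mark), the sketch refuses input that is not valid UTF-8.

```python
import re

# Matches a leading XML declaration up to and including its version, followed
# by the encoding pseudo-attribute, e.g. for
#   <?xml version='1.0' encoding='ASCII'?>
#   group 1 = "<?xml version='1.0'"   (kept)
#   " encoding='ASCII'"               (dropped)
_DECL_WITH_ENCODING = re.compile(
    rb"""\A(<\?xml\s+version=(["'])[^"']*\2)\s+encoding=(["'])[^"']*\3"""
)


def strip_encoding_declaration(data: bytes) -> bytes:
    """Drop the encoding pseudo-attribute from a leading XML declaration.

    Hypothetical helper, not part of lxml. A document without a declaration,
    or whose declaration has no encoding, comes back unchanged.
    Raises ValueError if data is not valid UTF-8.
    """
    # Without an encoding declaration (and without a BOM), XML 1.0 requires
    # the document to be UTF-8, so refuse anything that would be misread.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise ValueError(
            "content is not valid UTF-8; keep the encoding declaration"
        ) from exc
    return _DECL_WITH_ENCODING.sub(rb"\1", data, count=1)
```

Assuming the `root` from the question, `strip_encoding_declaration(etree.tostring(root, pretty_print=True, xml_declaration=True))` should give the version-only declaration asked for. ASCII bytes are always valid UTF-8, so dropping the encoding declaration is safe for this output.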

It's unclear why you want to remove it, since a parser ultimately needs to know which character encoding a document uses before it can make sense of anything. The XML 1.0 specification describes a way of guessing the encoding, and seems to encourage the use of encoding declarations:

In the absence of [external information], it is a fatal error ... for an entity which begins with neither a Byte Order Mark nor an encoding declaration to use an encoding other than UTF-8.
...
Unless an encoding is determined by a higher-level protocol, it is also a fatal error if an XML entity contains no encoding declaration and its content is not legal UTF-8 or UTF-16.
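The rule quoted above can be illustrated with a small detection function. This is a simplified, hypothetical sketch (`guess_xml_encoding` is not a standard library or lxml function): it checks for a UTF-8 or UTF-16 byte order mark, then for an encoding declaration, and otherwise falls back to UTF-8 as the spec requires. The spec's full non-normative autodetection table (XML 1.0, Appendix F) also covers UTF-32, BOM-less UTF-16 and EBCDIC, which this sketch deliberately leaves out.

```python
import re

# Byte order marks checked before anything else. Simplified: UTF-32 BOMs
# (which start with the same bytes as UTF-16) are not distinguished here.
_BOMS = [
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xfe\xff", "utf-16-be"),
    (b"\xff\xfe", "utf-16-le"),
]

# Encoding pseudo-attribute in an ASCII-compatible XML declaration; the name
# pattern follows the spec's EncName production.
_ENCODING_DECL = re.compile(
    rb"""\A<\?xml\s+version=(["'])[^"']*\1\s+encoding=(["'])([A-Za-z][A-Za-z0-9._-]*)\2"""
)


def guess_xml_encoding(data: bytes) -> str:
    """Guess the character encoding of an XML document (simplified sketch)."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    match = _ENCODING_DECL.match(data)
    if match:
        return match.group(3).decode("ascii")
    # Neither a BOM nor an encoding declaration: XML 1.0 requires UTF-8.
    return "utf-8"
```

Applied to the question's output, this returns `'ASCII'`; applied to the version-only declaration the asker wants, it returns `'utf-8'`, which still decodes that document correctly because ASCII is a subset of UTF-8.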
