Getting rid of encoding in lxml
I am trying to print an XML file using lxml and Python.
Here is the code:
>>> from lxml import etree
>>> root = etree.Element('root')
>>> child = etree.SubElement(root, 'child')
>>> print etree.tostring(root, pretty_print = True, xml_declaration = True, encoding = None)
Output:
<?xml version='1.0' encoding='ASCII'?>
<root>
  <child/>
</root>
As you can see, I have declared encoding = None, but the final output still shows encoding='ASCII', which I guess is expected. If I leave out the encoding argument, it still shows ASCII.
Is there any way I can just get the XML version tag and not the encoding part? I want the output to be like this:
<?xml version='1.0'>
It shouldn't matter what lxml.etree outputs as long as it's valid XML. If you really want to, you can glue strings together:
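One way to glue the strings together is sketched below. It assumes Python 3 and an lxml version where passing encoding="unicode" makes etree.tostring return a str with no XML declaration. The helper name tostring_with_bare_declaration is hypothetical, not part of lxml.

```python
from lxml import etree


def tostring_with_bare_declaration(root):
    """Serialize `root` and prepend a declaration that has no encoding part.

    Hypothetical helper for illustration; not an lxml API.
    """
    # encoding="unicode" returns text (str) and emits no XML declaration,
    # so the declaration can be written by hand.
    body = etree.tostring(root, pretty_print=True, encoding="unicode")
    return "<?xml version='1.0'?>\n" + body


root = etree.Element("root")
child = etree.SubElement(root, "child")

xml_text = tostring_with_bare_declaration(root)
print(xml_text)
```

The result is a str, so whoever writes it to disk still has to pick an encoding. Writing it as UTF-8 keeps the file consistent with what a parser assumes when no encoding is declared.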
It's unclear why you want to remove it, since ultimately an XML parser needs to know what charset the document is in to make sense of anything. The XML 1.0 spec includes a method of guessing charsets, and seems to encourage the use of encoding declarations.
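To give a rough idea of what that guessing involves, here is a simplified sketch in the spirit of the spec's non-normative autodetection appendix. It only looks at byte order marks and the byte layout of a leading "<" or "<?". It skips EBCDIC and the unusual UCS-4 byte orders, and it never reads the encoding declaration itself. The function name sniff_xml_encoding is hypothetical.

```python
def sniff_xml_encoding(data: bytes) -> str:
    """Guess an encoding family from the first bytes of an XML document.

    Simplified illustration only; real parsers do considerably more.
    """
    # Byte order marks. The 4-byte marks are checked first because
    # FF FE 00 00 (UTF-32 LE) also starts with FF FE (UTF-16 LE).
    boms = [
        (b"\x00\x00\xfe\xff", "utf-32-be"),
        (b"\xff\xfe\x00\x00", "utf-32-le"),
        (b"\xef\xbb\xbf", "utf-8"),
        (b"\xfe\xff", "utf-16-be"),
        (b"\xff\xfe", "utf-16-le"),
    ]
    for prefix, name in boms:
        if data.startswith(prefix):
            return name

    # No BOM: the zero bytes around the leading "<" reveal the code unit
    # width and byte order.
    patterns = [
        (b"\x00\x00\x00<", "utf-32-be"),
        (b"<\x00\x00\x00", "utf-32-le"),
        (b"\x00<\x00?", "utf-16-be"),
        (b"<\x00?\x00", "utf-16-le"),
    ]
    for prefix, name in patterns:
        if data.startswith(prefix):
            return name

    # Anything else is treated as UTF-8, the default when there is neither
    # a BOM nor an encoding declaration.
    return "utf-8"


print(sniff_xml_encoding(b"<?xml version='1.0'?><root/>"))
print(sniff_xml_encoding("<?xml version='1.0'?><root/>".encode("utf-16-le")))
```

A document with no BOM and no encoding declaration falls through to UTF-8 here, which is why a declaration without an encoding part is only safe when the bytes really are UTF-8 (or plain ASCII).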