在 Python 中漂亮地打印 XML
在 Python 中漂亮打印 XML 的最佳方法是什么(或者有多种方法)?
What is the best way (or are the various ways) to pretty print XML in Python?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
在 Python 中漂亮打印 XML 的最佳方法是什么(或者有多种方法)?
What is the best way (or are the various ways) to pretty print XML in Python?
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
接受
或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
发布评论
评论(28)
它不会在文本节点内添加空格或换行符,除非您要求它:
您可以指定缩进单位应该是什么以及换行符应该是什么样子。
该文档位于 http://www.yattag.org 主页上。
It won't add spaces or newlines inside text nodes, unless you ask for it with:
You can specify what the indentation unit should be and what the newline should look like.
The doc is on http://www.yattag.org homepage.
我编写了一个解决方案来遍历现有的 ElementTree 并使用 text/tail 按照人们通常的预期缩进它。
I wrote a solution to walk through an existing ElementTree and use text/tail to indent it as one typically expects.
这是一个 Python3 解决方案,它消除了丑陋的换行问题(大量空格),并且与大多数其他实现不同,它只使用标准库。
我在此处找到了如何解决常见的换行问题。
Here's a Python3 solution that gets rid of the ugly newline issue (tons of whitespace), and it only uses standard libraries unlike most other implementations.
I found how to fix the common newline issue here.
您可以使用流行的外部库 xmltodict,以及
unparse
和pretty= True
您将获得最佳结果:full_document=False
与在顶部。
You can use popular external library xmltodict, with
unparse
andpretty=True
you will get best result:full_document=False
against<?xml version="1.0" encoding="UTF-8"?>
at the top.Python 的 XML 漂亮打印 看起来非常适合这项任务。 (命名也适当。)
另一种方法是使用 pyXML,它有一个 PrettyPrint 函数。
XML pretty print for python looks pretty good for this task. (Appropriately named, too.)
An alternative is to use pyXML, which has a PrettyPrint function.
我在寻找“如何漂亮地打印 html” 时发现了这个问题。
使用此线程中的一些想法,我调整了 XML 解决方案以适用于 XML 或 HTML:
使用时,它是这样的:
返回:
I found this question while looking for "how to pretty print html"
Using some of the ideas in this thread I adapted the XML solutions to work for XML or HTML:
When used this is what it looks like:
Which returns:
您可以尝试这种变体...
安装
BeautifulSoup
和后端lxml
(解析器)库:处理您的 XML 文档:
You can try this variation...
Install
BeautifulSoup
and the backendlxml
(parser) libraries:Process your XML document:
看一下 vkbeautify 模块。
它是我非常流行的同名 javascript/nodejs 插件的 python 版本。 它可以漂亮地打印/缩小 XML、JSON 和 CSS 文本。 输入和输出可以是任意组合的字符串/文件。 它非常紧凑并且没有任何依赖性。
示例:
Take a look at the vkbeautify module.
It is a python version of my very popular javascript/nodejs plugin with the same name. It can pretty-print/minify XML, JSON and CSS text. Input and output can be string/file in any combinations. It is very compact and doesn't have any dependency.
Examples:
如果您不想重新解析,则可以使用 xmlpp.py 库 以及 <代码>get_pprint()函数。 对于我的用例来说,它运行良好且顺利,无需重新解析为 lxml ElementTree 对象。
An alternative if you don't want to have to reparse, there is the xmlpp.py library with the
get_pprint()
function. It worked nice and smoothly for my use cases, without having to reparse to an lxml ElementTree object.我找到了一种快速、简单的方法来很好地格式化和打印 xml 文件:
输出:
I found a fast and easy way to nicely format and print an xml file:
Outuput:
我遇到了这个问题,并像这样解决了它:
在我的代码中,这个方法是这样调用的:
这只有效,因为 etree 默认情况下使用
两个空格
来缩进,我发现它并没有非常强调缩进,因此不漂亮。 我找不到 etree 的任何设置或任何函数的参数来更改标准 etree 缩进。 我喜欢使用 etree 的简单性,但这确实让我很烦。I had this problem and solved it like this:
In my code this method is called like this:
This works only because etree by default uses
two spaces
to indent, which I don't find very much emphasizing the indentation and therefore not pretty. I couldn't ind any setting for etree or parameter for any function to change the standard etree indent. I like how easy it is to use etree, but this was really annoying me.对于带有中文的 xml 来说效果很好!
It's working well for the xml with Chinese!
如果由于某种原因您无法使用其他用户提到的任何 Python 模块,我建议使用以下针对 Python 2.7 的解决方案:
据我所知,此解决方案适用于具有 < code>xmllint 软件包已安装。
If for some reason you can't get your hands on any of the Python modules that other users mentioned, I suggest the following solution for Python 2.7:
As far as I know, this solution will work on Unix-based systems that have the
xmllint
package installed.用于将整个 xml 文档转换为漂亮的 xml 文档
(例如:假设您已提取[解压缩]一个 LibreOffice Writer .odt 或 .ods 文件,并且您希望将丑陋的“content.xml”文件转换为漂亮的文件,以用于自动 git 版本控制以及 .odt/.ods 文件的
git difftool
ing,例如我正在实现 这里)参考资料:
For converting an entire xml document to a pretty xml document
(ex: assuming you've extracted [unzipped] a LibreOffice Writer .odt or .ods file, and you want to convert the ugly "content.xml" file to a pretty one for automated git version control and
git difftool
ing of .odt/.ods files, such as I'm implementing here)References:
使用
etree.indent
和etree.tostring
输出
删除命名空间和前缀
输出
Use
etree.indent
andetree.tostring
output
Removing namespaces and prefixes
output
使用内置的 xml 库将给定的 XML 文件漂亮地打印到标准输出:
与 Python 2.7.18 配合得很好。
Use the built-in xml library to pretty-print a given XML file to stdout:
Works sufficiently well with Python 2.7.18.
我用几行代码解决了这个问题,打开文件,遍历它并添加缩进,然后再次保存。 我正在处理小型 xml 文件,并且不想添加依赖项或为用户安装更多库。 不管怎样,这就是我最终得到的结果:
它对我有用,也许有人会用它:)
I solved this with some lines of code, opening the file, going trough it and adding indentation, then saving it again. I was working with small xml files, and did not want to add dependencies, or more libraries to install for the user. Anyway, here is what I ended up with:
It works for me, perhaps someone will have some use of it :)
lxml 是最近更新的,并且包含一个漂亮的打印功能
查看 lxml 教程:
https://lxml.de/tutorial.html
lxml is recent, updated, and includes a pretty print function
Check out the lxml tutorial:
https://lxml.de/tutorial.html
另一个解决方案是借用这个
缩进
函数,与 Python 2.5 以来内置的 ElementTree 库一起使用。看起来是这样的:
Another solution is to borrow this
indent
function, for use with the ElementTree library that's built in to Python since 2.5.Here's what that would look like:
你有几个选择。
如果您使用的是 Python 3.9+,最简单的选择是:
xml。 etree.ElementTree.indent()
包含电池并且输出漂亮。
示例代码:
BeautifulSoup.prettify()
BeautifulSoup 可能是 Python 最简单的解决方案 3.9.
输出:
这是我的首选答案。 默认参数按原样工作。 但文本内容分散在单独的行上,就好像它们是嵌套元素一样。
lxml.etree.parse()
输出更漂亮,但带有参数。< /em>
产生:
这对我来说没有问题。
xml.dom.minidom.parse()
没有外部依赖项,但有后处理。
输出与上面相同,但代码更多。
You have a few options.
If you are using Python 3.9+, your simplest option is:
xml.etree.ElementTree.indent()
Batteries included and pretty output.
Sample code:
BeautifulSoup.prettify()
BeautifulSoup may be the simplest solution for Python < 3.9.
Output:
This is my goto answer. The default arguments work as is. But text contents are spread out on separate lines as if they were nested elements.
lxml.etree.parse()
Prettier output but with arguments.
Produces:
This works for me with no issues.
xml.dom.minidom.parse()
No external dependencies but post-processing.
The output is the same as above, but it's more code.
这是我的(hacky?)解决方案来解决丑陋的文本节点问题。
上面的代码将生成:
而不是这样:
免责声明: 可能存在一些限制。
Here's my (hacky?) solution to get around the ugly text node problem.
The above code will produce:
Instead of this:
Disclaimer: There are probably some limitations.
从 Python 3.9 开始,ElementTree 具有用于漂亮打印 XML 树的
indent()
函数。请参阅 https://docs.python。 org/3/library/xml.etree.elementtree.html#xml.etree.ElementTree.indent。
示例用法:
优点是它不需要任何额外的库。 有关更多信息,请查看 https://bugs.python.org/issue14465 和 https://github.com/python/cpython/pull/15200
As of Python 3.9, ElementTree has an
indent()
function for pretty-printing XML trees.See https://docs.python.org/3/library/xml.etree.elementtree.html#xml.etree.ElementTree.indent.
Sample usage:
The upside is that it does not require any additional libraries. For more information check https://bugs.python.org/issue14465 and https://github.com/python/cpython/pull/15200
正如其他人指出的那样,lxml 内置了一个漂亮的打印机。
但请注意,默认情况下它将 CDATA 部分更改为普通文本,这可能会产生令人讨厌的结果。
这是一个保留输入文件并仅更改缩进的 Python 函数(请注意
strip_cdata=False
)。 此外,它确保输出使用 UTF-8 作为编码而不是默认的 ASCII(注意encoding='utf-8'
):示例用法:
As others pointed out, lxml has a pretty printer built in.
Be aware though that by default it changes CDATA sections to normal text, which can have nasty results.
Here's a Python function that preserves the input file and only changes the indentation (notice the
strip_cdata=False
). Furthermore it makes sure the output uses UTF-8 as encoding instead of the default ASCII (notice theencoding='utf-8'
):Example usage:
如果您有 xmllint,您可以生成一个子进程并使用它。
xmllint --format
将其输入 XML 漂亮地打印到标准输出。请注意,此方法使用 python 外部的程序,这使得它有点像 hack。
If you have
xmllint
you can spawn a subprocess and use it.xmllint --format <file>
pretty-prints its input XML to standard output.Note that this method uses an program external to python, which makes it sort of a hack.
我尝试编辑上面“ade”的答案,但在我最初匿名提供反馈后,Stack Overflow 不允许我编辑。 这是用于漂亮打印 ElementTree 的函数的错误较少的版本。
I tried to edit "ade"s answer above, but Stack Overflow wouldn't let me edit after I had initially provided feedback anonymously. This is a less buggy version of the function to pretty-print an ElementTree.
如果您使用 DOM 实现,则每个实现都有自己的内置漂亮打印形式:
如果您使用其他东西而没有自己的漂亮打印机 - 或者那些漂亮打印机并不完全按照您的方式进行操作想要 - 你可能必须编写或子类化你自己的序列化程序。
If you're using a DOM implementation, each has their own form of pretty-printing built-in:
If you're using something else without its own pretty-printer — or those pretty-printers don't quite do it the way you want — you'd probably have to write or subclass your own serialiser.
我对 minidom 的漂亮打印有一些问题。 每当我尝试用给定编码之外的字符漂亮地打印文档时,我都会收到 UnicodeError ,例如,如果文档中有 β 并且我尝试了
doc.toprettyxml(encoding='latin-1'). 这是我的解决方法:
I had some problems with minidom's pretty print. I'd get a UnicodeError whenever I tried pretty-printing a document with characters outside the given encoding, eg if I had a β in a document and I tried
doc.toprettyxml(encoding='latin-1')
. Here's my workaround for it: