Is there a way to get a line number from an ElementTree Element?
I'm parsing some XML files using Python 3.2.1's cElementTree, and during parsing I noticed that some of the tags were missing attribute information. Is there an easy way to get the line numbers of those elements in the XML file?
It took me a while to work out how to do this using Python 3.x (3.3.2 here), so I thought I would summarize:
Looking at the docs, I see no way to do this with cElementTree.
However, I've had luck with lxml's version of the XML implementation.
It's supposed to be almost a drop-in replacement, built on libxml2, and its elements have a sourceline attribute (as well as a lot of other XML features). The only caveat is that I've only used it in Python 2.x; I'm not sure how or if it works under 3.x, but it might be worth a look.
Addendum:
From their front page, they say:
So it looks like Python 3.x is OK.
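As a rough illustration of the sourceline attribute mentioned above, here is a minimal sketch. It assumes lxml is installed; the sample document and the helper name lines_missing_attribute are made up for this example.

```python
from lxml import etree  # third-party package (assumed installed)

# Hypothetical sample: the second <item> lacks the "id" attribute.
SAMPLE = b"""<root>
  <item id="1"/>
  <item/>
  <item id="3"/>
</root>"""


def lines_missing_attribute(xml_bytes, tag, attr):
    """Return the source line of every <tag> element that lacks `attr`."""
    root = etree.fromstring(xml_bytes)
    # lxml records the line of each element's start tag in .sourceline
    return [el.sourceline for el in root.iter(tag) if attr not in el.attrib]


print(lines_missing_attribute(SAMPLE, "item", "id"))
```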
I've done this in ElementTree by subclassing ElementTree.XMLTreeBuilder. The subclass has access to self._parser (the Expat parser), which has the properties _parser.CurrentLineNumber and _parser.CurrentColumnNumber.
http://docs.python.org/py3k/library/pyexpat.html?highlight=xml.parser#xmlparser-objects has details about these attributes.
During parsing you could print out the info, or store these values in attributes of the output XML elements.
If your XML file includes additional XML files, you have to do some extra work to keep track of the current XML file; I don't remember the details, and it was not well documented.
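The private _parser attribute is an implementation detail that may differ between Python versions, so the sketch below takes a related but different route: instead of subclassing, it drives an Expat parser directly and feeds an ElementTree TreeBuilder, reading CurrentLineNumber in the start-element handler. The function name parse_with_lines is invented for this example, and namespaces and included files are not handled.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat


def parse_with_lines(xml_text):
    """Parse xml_text and return (root, {element: start_line})."""
    builder = ET.TreeBuilder()
    parser = expat.ParserCreate()
    lines = {}

    def start(tag, attrs):
        # TreeBuilder.start() returns the element it just opened
        element = builder.start(tag, attrs)
        # Inside the handler, CurrentLineNumber is the line of the start tag
        lines[element] = parser.CurrentLineNumber

    parser.StartElementHandler = start
    parser.EndElementHandler = builder.end
    parser.CharacterDataHandler = builder.data
    parser.Parse(xml_text, True)  # True marks this as the final chunk
    return builder.close(), lines


root, lines = parse_with_lines("<root>\n  <item id='1'/>\n  <item/>\n</root>")
for el in root.iter("item"):
    if "id" not in el.attrib:
        print("missing id on line", lines[el])
```

The line numbers are kept in a separate dictionary rather than on the elements, so the parsed tree itself stays unmodified.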
One (hackish) way of doing this is to insert a dummy attribute holding the line number into each element before parsing. Here's how I did this with minidom:
python reporting line/column of origin of XML node
This can be trivially adapted to cElementTree (or in fact any other Python XML parser).
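A minimal sketch of the dummy-attribute idea, using ElementTree rather than minidom. It is not the code from the linked answer: the regular expression, the attribute name _line, and the function name annotate_lines are assumptions made for this example, and the regex would misfire on text that looks like a start tag inside comments or CDATA sections.

```python
import re
import xml.etree.ElementTree as ET

# Matches the "<name" part of a start tag (not end tags, comments, or <?xml?>)
START_TAG = re.compile(r"<([A-Za-z_][\w.:-]*)")


def annotate_lines(xml_text, attr="_line"):
    """Insert attr="<line number>" into every start tag, line by line."""
    out = []
    for number, line in enumerate(xml_text.splitlines(keepends=True), start=1):
        out.append(
            START_TAG.sub(
                lambda m, n=number: '%s %s="%d"' % (m.group(0), attr, n),
                line,
            )
        )
    return "".join(out)


source = "<root>\n  <item id='1'/>\n  <item/>\n</root>"
root = ET.fromstring(annotate_lines(source))
for el in root.iter("item"):
    if "id" not in el.attrib:
        print("missing id on line", el.get("_line"))
```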
Another way to do this is to keep track of the lines as they are parsed, and use the ElementTree.iterparse method. The code below only returns one line at a time to the XML parser, so the listener can get the current line number. It doesn't help with the column, but given that the original question is about the line number, this works. You could also record the ending line number by listening for the "end" event and setting a different attribute, etc.
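A minimal sketch of this line-by-line approach. The names LineCountingReader and elements_with_lines are invented for this example, and it relies on iterparse yielding the events for each chunk before asking for the next one. The reported number is the line on which the start tag is completed, so a tag spread over several lines is attributed to its last line.

```python
import io
import xml.etree.ElementTree as ET


class LineCountingReader:
    """File-like wrapper that hands the parser one line per read() call."""

    def __init__(self, source):
        self.source = source
        self.line_number = 0

    def read(self, size=-1):
        # Ignore the requested size and return exactly one line
        line = self.source.readline()
        if line:
            self.line_number += 1
        return line


def elements_with_lines(source):
    """Yield (element, line_number) for each start tag in a file-like source."""
    reader = LineCountingReader(source)
    for _event, elem in ET.iterparse(reader, events=("start",)):
        yield elem, reader.line_number


sample = io.StringIO("<root>\n  <item id='1'/>\n  <item/>\n</root>")
for elem, line in elements_with_lines(sample):
    if elem.tag == "item" and "id" not in elem.attrib:
        print("missing id on line", line)
```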