ElementTree namespace inconveniences
I can't control the quality of the XML that I get. In some cases it is:
<COLLADA xmlns="http://www.collada.org/2005/11/COLLADASchema" version="1.4.1">
...
</COLLADA>
in others I get:
<COLLADA>...</COLLADA>
and I guess I should also handle
<collada:COLLADA xmlns:collada="http://www.collada.org/2005/11/COLLADASchema">
...
</collada:COLLADA>
It's the same schema all over, and I only need one parser to process it. How can I handle all these cases? I need XPath and other lxml goodies to get through this. How do I make it consistent during etree.parse time? I don't want to check on namespaces every time I need to use XPath.
Comments (1)
My usual recommendation is to preprocess it first, to normalize the namespaces. This has two benefits: the normalization code is highly reusable, because it doesn't depend on how the data is being processed subsequently; and the logic to process the data is considerably simplified.
If the documents only use this one namespace, or none, and do not use qualified names in the content of text or attribute nodes, then the transformation to achieve this normalization is very simple:
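As one possible illustration of such a normalization (a sketch, not necessarily the transformation this answer had in mind), the Python below moves every element that has no namespace into the COLLADA namespace right after parsing. It uses the standard library's xml.etree.ElementTree, and the names normalize_namespace, parse_collada and NSMAP are made up for this example. It relies on the same assumptions as above: the documents use this one namespace or none, and there are no qualified names inside text or attribute values.

```python
import xml.etree.ElementTree as ET

COLLADA_NS = "http://www.collada.org/2005/11/COLLADASchema"
NSMAP = {"c": COLLADA_NS}  # hypothetical prefix, used only in path queries


def normalize_namespace(root, ns=COLLADA_NS):
    """Move every element that has no namespace into `ns`, in place."""
    for el in root.iter():
        # Comments and processing instructions carry a non-string tag; skip them.
        if isinstance(el.tag, str) and not el.tag.startswith("{"):
            el.tag = "{%s}%s" % (ns, el.tag)
    return root


def parse_collada(text):
    """Parse an XML string and return a root with normalized namespaces."""
    return normalize_namespace(ET.fromstring(text))


# The three input shapes from the question, each with one child element added.
samples = [
    '<COLLADA xmlns="http://www.collada.org/2005/11/COLLADASchema" '
    'version="1.4.1"><asset/></COLLADA>',
    '<COLLADA><asset/></COLLADA>',
    '<collada:COLLADA '
    'xmlns:collada="http://www.collada.org/2005/11/COLLADASchema">'
    '<collada:asset/></collada:COLLADA>',
]

for text in samples:
    root = parse_collada(text)
    # After normalization the same namespaced query works on every variant.
    assert root.tag == "{%s}COLLADA" % COLLADA_NS
    assert root.find("c:asset", NSMAP) is not None
```

lxml.etree exposes the same iter() and tag interface, so the loop should carry over with little or no change; there the queries would go through root.xpath("//c:asset", namespaces=NSMAP) with one fixed prefix map, regardless of how the input spelled its namespace.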