XML parsing: ElementTree (etree) vs. minidom

Posted on 2024-12-14 01:13:57

I've been using minidom to parse XML for years. Now I've suddenly learned about ElementTree. My question is: which is better for parsing? That is:

  • Which is faster?
  • Which uses less memory?
  • Does either have any O(n^2) behavior I should worry about?
  • Is one being deprecated in favor of the other?

Why do we have two interfaces?

Thanks.

Comments (2)

诗化ㄋ丶相逢 2024-12-21 01:13:57

The DOM and SAX interfaces are the classic ways to work with XML. Python had to provide them because they are well-known and standard.

The ElementTree package was intended to provide a more Pythonic interface. It is all about making things easier for the programmer.
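To make the "more Pythonic" point concrete, here is a minimal sketch that extracts the same data through both APIs. The sample document and the function names are invented for illustration; only the standard-library calls are real.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Hypothetical sample document, invented for this illustration.
XML = (
    "<library>"
    "<book id='1'><title>Dune</title></book>"
    "<book id='2'><title>Emma</title></book>"
    "</library>"
)


def titles_minidom(text):
    """Collect book titles through the W3C-style DOM API."""
    doc = minidom.parseString(text)
    titles = []
    for book in doc.getElementsByTagName("book"):
        title_node = book.getElementsByTagName("title")[0]
        # In the DOM, text lives in a child text node, not on the element.
        titles.append(title_node.firstChild.data)
    return titles


def titles_etree(text):
    """Collect the same titles through ElementTree."""
    root = ET.fromstring(text)
    # findtext() returns the text of the first matching child directly.
    return [book.findtext("title") for book in root.iter("book")]


print(titles_minidom(XML))
print(titles_etree(XML))
```

Both functions return the same list; the ElementTree version simply needs fewer steps because element text is exposed directly rather than through separate text nodes.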

Depending on your build, each of these has an underlying C implementation that makes it run fast.

None of the above tools is being deprecated. Each has its merits (SAX, for example, doesn't need to read the whole input into memory).
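As a rough sketch of why SAX can get by without holding the whole document in memory: the parser calls back into a handler as it reads, and the handler keeps only what it needs. The `TitleCounter` class, the `count_titles` helper, and the sample data below are hypothetical names made up for this example.

```python
import io
import xml.sax


class TitleCounter(xml.sax.ContentHandler):
    """Counts <title> elements without building any tree."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        # Called once per opening tag as the parser streams through the input.
        if name == "title":
            self.count += 1


def count_titles(stream):
    """Parse a file-like object (or filename) and return the number of titles."""
    handler = TitleCounter()
    xml.sax.parse(stream, handler)
    return handler.count


# Invented sample input; in practice this would be a large file on disk.
SAMPLE = (
    b"<library>"
    b"<book><title>Dune</title></book>"
    b"<book><title>Emma</title></book>"
    b"</library>"
)

n = count_titles(io.BytesIO(SAMPLE))
print(n)
```

The only state kept here is a single integer, so memory use should stay roughly flat regardless of input size. The trade-off is that the handler has to track any context it cares about by hand.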

There is also a third-party module called lxml, which is a popular choice (full-featured and fast).

安人多梦 2024-12-21 01:13:57

Python probably has two interfaces because ElementTree was integrated into the standard library a good deal later than minidom. The reason it was added was likely its far more "Pythonic" API compared to the W3C-controlled DOM.

If you're concerned about speed, there's also lxml, which builds an ElementTree-compatible tree using libxml2 and should be quite fast. Its developers publish a benchmark suite comparing lxml to ElementTree's Python and C implementations.
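If you would rather measure than take anyone's word for it, a small timing harness along these lines may help. It is only a sketch: the synthetic document, the `make_document` and `time_parsers` helpers, and the default sizes are all assumptions made for illustration, and results will vary by machine and Python build. If lxml is installed, `lxml.etree.fromstring` could be added to the dictionary in the same way.

```python
import timeit
import xml.etree.ElementTree as ET
from xml.dom import minidom


def make_document(n_items):
    """Build a synthetic XML string with n_items <item> elements (invented test data)."""
    body = "".join(f"<item id='{i}'>value {i}</item>" for i in range(n_items))
    return f"<root>{body}</root>"


def time_parsers(n_items=2000, repeat=5):
    """Return total seconds each parser takes to parse the same text `repeat` times."""
    text = make_document(n_items)
    return {
        "minidom": timeit.timeit(lambda: minidom.parseString(text), number=repeat),
        "ElementTree": timeit.timeit(lambda: ET.fromstring(text), number=repeat),
    }


timings = time_parsers()
for name, seconds in timings.items():
    print(f"{name}: {seconds:.4f} s")
```

Running it at several values of `n_items` also gives a crude check on the O(n^2) worry: if doubling the input roughly doubles the time, parsing is scaling about linearly for that document shape.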

If you're concerned about memory use, you shouldn't be using a tree API anyway. PullDOM might be a better choice, but I'm extrapolating from experience with Java's excellent pull parser; there doesn't seem to be much current information on PullDOM.
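For reference, a minimal sketch of the pull style using the standard library's `xml.dom.pulldom`. The log format, the `error_messages` helper, and the sample data are invented for this example; the idea is to stream through events and expand only the elements of interest into DOM nodes.

```python
import io
from xml.dom import pulldom

# Invented sample input; in practice this would be a large file.
SAMPLE = (
    b"<log>"
    b"<entry level='info'>started</entry>"
    b"<entry level='error'>disk full</entry>"
    b"<entry level='info'>stopped</entry>"
    b"</log>"
)


def error_messages(stream):
    """Return the text of every <entry level='error'> element in the stream."""
    messages = []
    events = pulldom.parse(stream)
    for event, node in events:
        if event == pulldom.START_ELEMENT and node.tagName == "entry":
            if node.getAttribute("level") == "error":
                # Pull in just this element's children as a small DOM fragment.
                events.expandNode(node)
                text = "".join(
                    child.data
                    for child in node.childNodes
                    if child.nodeType == child.TEXT_NODE
                )
                messages.append(text)
    return messages


errors = error_messages(io.BytesIO(SAMPLE))
print(errors)
```

Entries that do not match are never expanded, so only the matching fragments are materialized as DOM nodes at any one time.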
