Deciding when to use XmlDocument vs. XmlReader

Posted on 2024-08-06 13:38:03

I'm optimizing a custom object -> XML serialization utility. It's all done and working, so that's not the issue.

It worked by loading a file into an XmlDocument object, then recursively going through all the child nodes.

I figured that perhaps using XmlReader instead of having XmlDocument loading/parsing the entire thing would be faster, so I implemented that version as well.

The algorithms are exactly the same; I use a wrapper class to abstract the functionality of dealing with an XmlNode vs. an XmlReader. For instance, the GetChildren methods yield return either a child XmlNode or a subtree XmlReader.

So I wrote a test driver to test both versions, using a non-trivial data set (a 900 KB XML file with around 1,350 elements).

However, using JetBrains dotTRACE, I see that the XmlReader version is actually slower than the XmlDocument version! It seems that there is some significant processing involved in XmlReader read calls when I'm iterating over child nodes.

So I say all that to ask this:

What are the advantages/disadvantages of XmlDocument and XmlReader, and in what circumstances should you use either?

My guess is that there is a file size threshold at which XmlReader becomes more economical in performance, as well as less memory-intensive. However, that threshold seems to be above 1MB.
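One way to probe for such a threshold is to time both approaches on generated documents of increasing size. The following is only a rough sketch: `GenerateXml`, `CountWithDocument` and `CountWithReader` are hypothetical helpers invented for illustration (they are not the wrapper classes described above), and single-run `Stopwatch` timings like these are indicative at best.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Text;
using System.Xml;

public static class LoadTiming
{
    // Hypothetical helper: builds a wide, shallow document with `count` <item> children.
    public static string GenerateXml(int count)
    {
        var sb = new StringBuilder("<root>");
        for (int i = 0; i < count; i++)
            sb.Append("<item id=\"").Append(i).Append("\">value ").Append(i).Append("</item>");
        return sb.Append("</root>").ToString();
    }

    // DOM approach: parse the whole document into memory, then query it.
    public static int CountWithDocument(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc.GetElementsByTagName("item").Count;
    }

    // Streaming approach: visit each node once, keeping nothing.
    public static int CountWithReader(string xml)
    {
        int count = 0;
        using (var reader = XmlReader.Create(new StringReader(xml)))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "item")
                    count++;
            }
        }
        return count;
    }

    public static void Main()
    {
        foreach (int size in new[] { 1_000, 10_000, 100_000 })
        {
            string xml = GenerateXml(size);

            var sw = Stopwatch.StartNew();
            int viaDocument = CountWithDocument(xml);
            long documentMs = sw.ElapsedMilliseconds;

            sw.Restart();
            int viaReader = CountWithReader(xml);
            long readerMs = sw.ElapsedMilliseconds;

            Console.WriteLine($"{size} items: XmlDocument {documentMs} ms ({viaDocument}), XmlReader {readerMs} ms ({viaReader})");
        }
    }
}
```

For numbers worth acting on, repeat each measurement several times and discard the first (JIT warm-up) run.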

I'm calling ReadSubtree every time to process child nodes:

public override IEnumerable<IXmlSourceProvider> GetChildren ()
{
    XmlReader xr = myXmlSource.ReadSubtree ();
    // skip past the current element
    xr.Read ();

    while (xr.Read ())
    {
        if (xr.NodeType != XmlNodeType.Element) continue;
        yield return new XmlReaderXmlSourceProvider (xr);
    }
}

That test applies to a lot of objects at a single level (i.e. wide & shallow) - but I wonder how well XmlReader fares when the XML is deep & wide? I.e. the XML I'm dealing with is much like a data object model, 1 parent object to many child objects, etc: 1..M..M..M

I also don't know beforehand the structure of the XML I'm parsing, so I can't optimize for it.

Comments (5)

给不了的爱 2024-08-13 13:38:03

I've generally looked at it not from a fastest perspective, but rather from a memory utilization perspective. All of the implementations have been fast enough for the usage scenarios I've used them in (typical enterprise integration).

However, where I've fallen down, and sometimes spectacularly, is not taking into account the general size of the XML I'm working with. If you think about it up front you can save yourself some grief.

XML tends to bloat when loaded into memory, at least with a DOM reader like XmlDocument or XPathDocument. Something like 10:1? The exact ratio is hard to quantify, but a document that is 1 MB on disk can easily take 10 MB or more in memory.

A process using any reader that loads the whole document into memory in its entirety (XmlDocument/XPathDocument) can suffer from large object heap fragmentation, which can ultimately lead to OutOfMemoryExceptions (even with available memory) resulting in an unavailable service/process.

Since objects of 85,000 bytes or more end up on the large object heap, and you've got a 10:1 size explosion with a DOM reader, you can see it doesn't take much before your XML documents are being allocated from the large object heap.
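The ratio is easy to estimate for your own documents. Below is a minimal sketch that compares the UTF-8 size of a generated document with the growth in managed heap after loading it into an XmlDocument. `GenerateXml` and `MeasureDomBytes` are hypothetical names made up for this illustration, and `GC.GetTotalMemory` only gives an approximation, so treat the printed ratio as a ballpark figure.

```csharp
using System;
using System.Text;
using System.Xml;

public static class DomFootprint
{
    // Hypothetical helper: builds a document with `count` small elements.
    public static string GenerateXml(int count)
    {
        var sb = new StringBuilder("<root>");
        for (int i = 0; i < count; i++)
            sb.Append("<item id=\"").Append(i).Append("\">value ").Append(i).Append("</item>");
        return sb.Append("</root>").ToString();
    }

    // Approximate managed-heap growth caused by loading the document as a DOM.
    public static long MeasureDomBytes(string xml)
    {
        long before = GC.GetTotalMemory(true);
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        long after = GC.GetTotalMemory(true);
        GC.KeepAlive(doc); // keep the DOM alive until after the second measurement
        return after - before;
    }

    public static void Main()
    {
        string xml = GenerateXml(50_000);
        long onDisk = Encoding.UTF8.GetByteCount(xml);
        long inMemory = MeasureDomBytes(xml);
        Console.WriteLine($"UTF-8 size: {onDisk} bytes, DOM: ~{inMemory} bytes, ratio ~{(double)inMemory / onDisk:F1}:1");
    }
}
```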

XmlDocument is very easy to use. Its only real drawback is that it loads the whole XML document into memory to process. It's seductively simple to use.

XmlReader is a stream-based reader, so it will keep your process memory utilization generally flatter, but it is more difficult to use.

XPathDocument tends to be a faster, read-only version of XmlDocument, but still suffers from memory 'bloat'.
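To make the two DOM-style options concrete, here is a small sketch that runs the same XPath query against an XmlDocument and an XPathDocument. The `orders`/`order`/`total` names and the `DomQueries` class are invented for the example.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;

public static class DomQueries
{
    // Made-up sample data for the illustration.
    public const string Sample = "<orders><order total=\"10\"/><order total=\"32\"/></orders>";

    // Editable DOM: XmlDocument + SelectNodes.
    public static int SumWithXmlDocument(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        int sum = 0;
        foreach (XmlNode attribute in doc.SelectNodes("/orders/order/@total"))
            sum += int.Parse(attribute.Value);
        return sum;
    }

    // Read-only store: XPathDocument + XPathNavigator.
    public static int SumWithXPathDocument(string xml)
    {
        var doc = new XPathDocument(new StringReader(xml));
        XPathNavigator navigator = doc.CreateNavigator();
        XPathNodeIterator totals = navigator.Select("/orders/order/@total");
        int sum = 0;
        while (totals.MoveNext())
            sum += int.Parse(totals.Current.Value);
        return sum;
    }

    public static void Main()
    {
        Console.WriteLine(SumWithXmlDocument(Sample));
        Console.WriteLine(SumWithXPathDocument(Sample));
    }
}
```

Both variants still hold the entire document in memory; the difference is only in the API and in whether the tree can be modified.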

平生欢 2024-08-13 13:38:03

XmlDocument is an in-memory representation of the entire XML document. Therefore if your document is large, then it will consume much more memory than if you had read it using XmlReader.

This is assuming that when you use XmlReader you read and process the elements one by one and then discard them. If you use XmlReader and construct another intermediary structure in memory, then you have the same problem, and you're defeating the purpose of it.
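A minimal sketch of that "process and discard" style, assuming a made-up document of `order` elements with a `total` attribute: the reader visits each element once, folds it into a running sum, and keeps nothing else around.

```csharp
using System;
using System.IO;
using System.Xml;

public static class StreamingSum
{
    // Reads <order total="..."/> elements one at a time; only the running sum is retained.
    public static decimal SumTotals(TextReader input)
    {
        decimal sum = 0;
        using (var reader = XmlReader.Create(input))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "order")
                {
                    string total = reader.GetAttribute("total");
                    if (total != null)
                        sum += XmlConvert.ToDecimal(total);
                }
            }
        }
        return sum;
    }

    public static void Main()
    {
        var xml = "<orders><order total=\"10.5\"/><order total=\"31.5\"/></orders>";
        Console.WriteLine(SumTotals(new StringReader(xml)));
    }
}
```

Collecting every element into a list inside that loop instead would bring the memory profile back to roughly that of a DOM.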

Google for "SAX versus DOM" to read more about the difference between the two models of processing XML.

英雄似剑 2024-08-13 13:38:03

Another consideration is that XmlReader might be more robust for handling less-than-perfectly-formed XML. I recently created a client which consumed an XML stream, but the stream didn't have the special characters escaped correctly in URIs contained in some of the elements. XmlDocument and XPathDocument refused to load the XML at all, whereas using XmlReader I was able to extract the information I needed from the stream.
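A small sketch of that difference, using a made-up document whose second item contains an unescaped `&`. Note that XmlReader still throws an XmlException when it reaches the malformed part; the point is only that everything read before that position remains usable, whereas a failed XmlDocument load leaves you with nothing. `MalformedInput` and its methods are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class MalformedInput
{
    // The second <item> contains an unescaped '&', so the document is not well-formed.
    public const string Broken = "<items><item>ok</item><item>a & b</item></items>";

    // All-or-nothing: a DOM load either succeeds completely or throws.
    public static bool DocumentLoads(string xml)
    {
        try { new XmlDocument().LoadXml(xml); return true; }
        catch (XmlException) { return false; }
    }

    // Streaming: collect item values until the first parse error.
    public static List<string> ReadWhatWeCan(string xml)
    {
        var values = new List<string>();
        try
        {
            using (var reader = XmlReader.Create(new StringReader(xml)))
            {
                while (!reader.EOF)
                {
                    if (reader.NodeType == XmlNodeType.Element && reader.Name == "item")
                        values.Add(reader.ReadElementContentAsString()); // leaves the reader just past </item>
                    else
                        reader.Read();
                }
            }
        }
        catch (XmlException)
        {
            // Stop at the first malformed construct; keep what was read so far.
        }
        return values;
    }

    public static void Main()
    {
        Console.WriteLine($"XmlDocument loaded: {DocumentLoads(Broken)}");
        Console.WriteLine($"XmlReader recovered: {string.Join(", ", ReadWhatWeCan(Broken))}");
    }
}
```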

绝不服输 2024-08-13 13:38:03

There is a size threshold at which XmlDocument becomes slower, and eventually unusable. But the actual value of the threshold will depend on your application and XML content, so there are no hard and fast rules.

If your XML file can contain large lists (say tens of thousands of elements), you should definitely be using XmlReader.
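For large lists, one common middle ground is to stream with XmlReader but materialize each list item as a small XElement via `XNode.ReadFrom`, so only one item is in memory at a time. This is a sketch under assumptions: `StreamElements` is a hypothetical helper, the `customer` element name is invented, and it expects the matching elements not to be nested inside each other.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class LargeLists
{
    // Lazily yields each element called `name` as its own small XElement.
    public static IEnumerable<XElement> StreamElements(TextReader input, string name)
    {
        using (var reader = XmlReader.Create(input))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == name)
                    yield return (XElement)XNode.ReadFrom(reader); // advances the reader past the element
                else
                    reader.Read();
            }
        }
    }

    public static void Main()
    {
        var xml = "<customers>"
                + "<customer id=\"1\"><name>Ann</name></customer>"
                + "<customer id=\"2\"><name>Bob</name></customer>"
                + "</customers>";

        foreach (XElement customer in StreamElements(new StringReader(xml), "customer"))
            Console.WriteLine($"{(string)customer.Attribute("id")}: {(string)customer.Element("name")}");
    }
}
```

The loop deliberately avoids calling `Read()` right after `XNode.ReadFrom`, because the reader is already positioned on the node that follows the element and an extra `Read()` would skip an adjacent sibling.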

萌能量女王 2024-08-13 13:38:03

The encoding difference is because two different measurements are being mixed. UTF-32 requires 4 bytes per character, and is inherently slower than single byte data.

If you look at the large (100K) element test, you see that the time increases by about 70 ms for each case regardless of the loading method used.

This is a (nearly) constant difference caused specifically by the per-character overhead.
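The size side of that per-character overhead is easy to see by encoding the same text both ways. A tiny sketch; the sample string and the `EncodingSizes` helper are made up for illustration, and it says nothing about decoding speed itself.

```csharp
using System;
using System.Text;

public static class EncodingSizes
{
    // Number of bytes needed to encode the text, without a byte order mark.
    public static int Utf8Bytes(string s) => Encoding.UTF8.GetByteCount(s);
    public static int Utf32Bytes(string s) => Encoding.UTF32.GetByteCount(s);

    public static void Main()
    {
        string xml = "<item id=\"1\">value</item>";
        Console.WriteLine($"UTF-8: {Utf8Bytes(xml)} bytes, UTF-32: {Utf32Bytes(xml)} bytes");
    }
}
```

For ASCII-only markup like this, the UTF-32 form is four times the size of the UTF-8 form, so the parser has four times as many bytes to read and decode for the same content.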
