Deciding when to use XmlDocument vs. XmlReader

Published 2024-08-06 13:38:03

I'm optimizing a custom object -> XML serialization utility, and it's all done and working and that's not the issue.

It works by loading a file into an XmlDocument object, then recursively going through all the child nodes.

I figured that perhaps using XmlReader instead of having XmlDocument load and parse the entire thing would be faster, so I implemented that version as well.

The algorithms are exactly the same: I use a wrapper class to abstract the functionality of dealing with an XmlNode vs. an XmlReader. For instance, the GetChildren method yield-returns either a child XmlNode or a subtree XmlReader.

So I wrote a test driver to test both versions, using a non-trivial data set (a 900 KB XML file with around 1,350 elements).

However, using JetBrains dotTrace, I see that the XmlReader version is actually slower than the XmlDocument version! There seems to be some significant processing involved in the XmlReader Read calls when I'm iterating over child nodes.

So I say all that to ask this:

What are the advantages/disadvantages of XmlDocument and XmlReader, and in what circumstances should you use either?

My guess is that there is a file size threshold at which XmlReader becomes more economical in performance, as well as less memory-intensive. However, that threshold seems to be above 1MB.

I'm calling ReadSubtree every time to process child nodes:

public override IEnumerable<IXmlSourceProvider> GetChildren ()
{
    XmlReader xr = myXmlSource.ReadSubtree ();
    // skip past the current element
    xr.Read ();

    while (xr.Read ())
    {
        if (xr.NodeType != XmlNodeType.Element) continue;
        yield return new XmlReaderXmlSourceProvider (xr);
    }
}
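For comparison, here is a minimal, self-contained sketch of both traversal styles on a toy document (the sample XML and variable names are illustrative, not from the original utility). On a flat document the two counts agree; note that with nested XML, the ReadSubtree loop above would also visit descendant elements unless each yielded child consumes its own subtree in turn:

```csharp
using System;
using System.IO;
using System.Xml;

const string xml = "<root><a/><b/><c>text</c></root>";

// DOM approach: load everything, then walk the child nodes.
var doc = new XmlDocument();
doc.LoadXml(xml);
int domCount = 0;
foreach (XmlNode child in doc.DocumentElement.ChildNodes)
    if (child.NodeType == XmlNodeType.Element)
        domCount++;

// Streaming approach: ReadSubtree over the root, as in the post.
int readerCount = 0;
using (var outer = XmlReader.Create(new StringReader(xml)))
{
    outer.MoveToContent();               // position on <root>
    using (var xr = outer.ReadSubtree())
    {
        xr.Read();                       // skip past the current element
        while (xr.Read())
            if (xr.NodeType == XmlNodeType.Element)
                readerCount++;
    }
}

Console.WriteLine($"{domCount} {readerCount}"); // 3 3
```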

That test applies to a lot of objects at a single level (i.e. wide & shallow) - but I wonder how well XmlReader fares when the XML is deep & wide? I.e. the XML I'm dealing with is much like a data object model, 1 parent object to many child objects, etc: 1..M..M..M

I also don't know beforehand the structure of the XML I'm parsing, so I can't optimize for it.

给不了的爱 2024-08-13 13:38:03

I've generally looked at it not from a fastest perspective, but rather from a memory utilization perspective. All of the implementations have been fast enough for the usage scenarios I've used them in (typical enterprise integration).

However, where I've fallen down, and sometimes spectacularly, is not taking into account the general size of the XML I'm working with. If you think about it up front you can save yourself some grief.

XML tends to bloat when loaded into memory, at least with a DOM reader like XmlDocument or XPathDocument. Something like 10:1? The exact amount is hard to quantify, but if it's 1MB on disk it will be 10MB in memory, or more, for example.

A process using any reader that loads the whole document into memory in its entirety (XmlDocument/XPathDocument) can suffer from large object heap fragmentation, which can ultimately lead to OutOfMemoryExceptions (even with available memory) resulting in an unavailable service/process.

Since objects that are greater than 85K in size end up on the large object heap, and you've got a 10:1 size explosion with a DOM reader, you can see it doesn't take much before your XML documents are being allocated from the large object heap.

XmlDocument is very easy to use. Its only real drawback is that it loads the whole XML document into memory to process. It's seductively simple to use.

XmlReader is a stream-based reader, so it will keep your process's memory utilization generally flatter, but it is more difficult to use.

XPathDocument tends to be a faster, read-only version of XmlDocument, but still suffers from memory 'bloat'.
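The trade-offs above can be seen in a small sketch using all three APIs on the same document (the sample XML is illustrative): XmlDocument gives read/write DOM access, XPathDocument gives read-only querying, and XmlReader streams forward-only so memory stays flat regardless of document size:

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;

const string xml = "<orders><order id=\"1\"/><order id=\"2\"/></orders>";

// XmlDocument: whole DOM in memory, read/write, simplest API.
var doc = new XmlDocument();
doc.LoadXml(xml);
int domOrders = doc.SelectNodes("/orders/order").Count;

// XPathDocument: whole document in memory, read-only, faster for XPath.
var xpathDoc = new XPathDocument(new StringReader(xml));
int xpathOrders = xpathDoc.CreateNavigator().Select("/orders/order").Count;

// XmlReader: forward-only stream; nothing already read is retained.
int streamedOrders = 0;
using (var xr = XmlReader.Create(new StringReader(xml)))
    while (xr.ReadToFollowing("order"))
        streamedOrders++;

Console.WriteLine($"{domOrders} {xpathOrders} {streamedOrders}"); // 2 2 2
```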

平生欢 2024-08-13 13:38:03

XmlDocument is an in-memory representation of the entire XML document. Therefore if your document is large, then it will consume much more memory than if you had read it using XmlReader.

This is assuming that when you use XmlReader you read and process the elements one by one and then discard them. If you use XmlReader to construct another intermediary structure in memory, then you have the same problem, and you're defeating its purpose.

Google for "SAX versus DOM" to read more about the difference between the two models of processing XML.
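A sketch of the read-and-discard pattern the answer describes: each element is consumed as it streams past, and only the aggregate survives (the element and attribute names here are made up for illustration):

```csharp
using System;
using System.IO;
using System.Xml;

const string xml = "<items><item price=\"10\"/><item price=\"25\"/><item price=\"7\"/></items>";

// Only the running total is kept; processed elements are never retained.
int total = 0;
using (var xr = XmlReader.Create(new StringReader(xml)))
    while (xr.ReadToFollowing("item"))
        total += int.Parse(xr.GetAttribute("price"));

Console.WriteLine(total); // 42
```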

英雄似剑 2024-08-13 13:38:03

Another consideration is that XmlReader might be more robust for handling less-than-perfectly-formed XML. I recently created a client which consumed an XML stream, but the stream didn't have special characters escaped correctly in URIs contained in some of the elements. XmlDocument and XPathDocument refused to load the XML at all, whereas using XmlReader I was able to extract the information I needed from the stream.
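A sketch of one way a reader can salvage data from ill-formed input (this is not the original client's code; the document and URLs are invented). XmlDocument.LoadXml would reject the whole string because of the unescaped '&', while XmlReader yields everything up to the point of failure:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// The unescaped '&' in the second <link> makes this document ill-formed.
const string xml =
    "<links><link>http://example.com/a</link>" +
    "<link>http://example.com/b?x=1&y=2</link></links>";

var recovered = new List<string>();
try
{
    using var xr = XmlReader.Create(new StringReader(xml));
    while (xr.ReadToFollowing("link"))
        recovered.Add(xr.ReadElementContentAsString());
}
catch (XmlException)
{
    // Parsing fails at the malformed region; keep what was read so far.
}

Console.WriteLine(recovered.Count); // the first, well-formed link survives
```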

绝不服输 2024-08-13 13:38:03

There is a size threshold at which XmlDocument becomes slower, and eventually unusable. But the actual value of the threshold will depend on your application and XML content, so there are no hard and fast rules.

If your XML file can contain large lists (say tens of thousands of elements), you should definitely be using XmlReader.

萌能量女王 2024-08-13 13:38:03

The encoding difference is because two different measurements are being mixed. UTF-32 requires 4 bytes per character, and is inherently slower than single byte data.

If you look at the large (100K) element test, you see that the time increases by about 70 ms for each case, regardless of the loading method used.

This is a (nearly) constant difference caused specifically by the per-character overhead.
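The per-character cost is easy to see in the raw byte counts (the sample string is illustrative):

```csharp
using System;
using System.Text;

const string markup = "<root>hello</root>";

// UTF-32 stores every character in 4 bytes; UTF-8 uses 1 byte for ASCII,
// so the same markup is 4x larger before parsing even begins.
int utf8Bytes = Encoding.UTF8.GetByteCount(markup);
int utf32Bytes = Encoding.UTF32.GetByteCount(markup);

Console.WriteLine($"{utf8Bytes} {utf32Bytes}"); // 18 72
```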
