Deciding when to use XmlDocument vs XmlReader
I'm optimizing a custom object -> XML serialization utility, and it's all done and working, so that's not the issue.

It worked by loading a file into an XmlDocument object, then recursively going through all the child nodes. I figured that perhaps using XmlReader instead of having XmlDocument load/parse the entire thing would be faster, so I implemented that version as well.

The algorithms are exactly the same: I use a wrapper class to abstract the functionality of dealing with an XmlNode vs. an XmlReader. For instance, the GetChildren methods yield return either a child XmlNode or a subtree XmlReader.
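For concreteness, the XmlDocument side of such a wrapper might look something like this. This is a hypothetical sketch: only `IXmlSourceProvider` and `GetChildren` appear in the question; the interface shape, the class name `XmlNodeXmlSourceProvider`, and the demo document are all assumed.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Assumed shape of the abstraction described in the question.
public interface IXmlSourceProvider
{
    IEnumerable<IXmlSourceProvider> GetChildren();
}

public class XmlNodeXmlSourceProvider : IXmlSourceProvider
{
    private readonly XmlNode myNode;

    public XmlNodeXmlSourceProvider(XmlNode node) { myNode = node; }

    // DOM version: the children already exist as objects in memory,
    // so enumeration is just a walk over XmlNode.ChildNodes.
    public IEnumerable<IXmlSourceProvider> GetChildren()
    {
        foreach (XmlNode child in myNode.ChildNodes)
        {
            if (child.NodeType != XmlNodeType.Element) continue;
            yield return new XmlNodeXmlSourceProvider(child);
        }
    }
}

public static class Demo
{
    public static void Main()
    {
        var doc = new XmlDocument();
        doc.LoadXml("<root><a/><b/><c/></root>");
        int n = 0;
        foreach (var _ in new XmlNodeXmlSourceProvider(doc.DocumentElement).GetChildren()) n++;
        Console.WriteLine(n); // 3
    }
}
```

The key asymmetry vs. the XmlReader version below: here every node is already materialized, so `GetChildren` does no parsing work at all.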
So I wrote a test driver to test both versions, using a non-trivial data set (a 900kb XML file with around 1,350 elements). However, using JetBrains dotTrace, I see that the XmlReader version is actually slower than the XmlDocument version! It seems that there is some significant processing involved in XmlReader read calls when I'm iterating over child nodes.
So I say all that to ask this: what are the advantages/disadvantages of XmlDocument and XmlReader, and in what circumstances should you use either?

My guess is that there is a file-size threshold at which XmlReader becomes more economical in performance, as well as less memory-intensive. However, that threshold seems to be above 1MB.
I'm calling ReadSubtree every time to process child nodes:

public override IEnumerable<IXmlSourceProvider> GetChildren ()
{
    XmlReader xr = myXmlSource.ReadSubtree ();
    // skip past the current element
    xr.Read ();

    while (xr.Read ())
    {
        if (xr.NodeType != XmlNodeType.Element) continue;
        yield return new XmlReaderXmlSourceProvider (xr);
    }
}
That test applies to a lot of objects at a single level (i.e. wide & shallow), but I wonder how well XmlReader fares when the XML is deep & wide? I.e. the XML I'm dealing with is much like a data object model, 1 parent object to many child objects, etc.: 1..M..M..M

I also don't know beforehand the structure of the XML I'm parsing, so I can't optimize for it.
I've generally looked at it not from a fastest perspective, but rather from a memory utilization perspective. All of the implementations have been fast enough for the usage scenarios I've used them in (typical enterprise integration).
However, where I've fallen down, and sometimes spectacularly, is not taking into account the general size of the XML I'm working with. If you think about it up front you can save yourself some grief.
XML tends to bloat when loaded into memory, at least with a DOM reader like XmlDocument or XPathDocument. Something like 10:1? The exact amount is hard to quantify, but if it's 1MB on disk it will be 10MB in memory, or more, for example.

A process using any reader that loads the whole document into memory in its entirety (XmlDocument/XPathDocument) can suffer from large object heap fragmentation, which can ultimately lead to OutOfMemoryExceptions (even with available memory), resulting in an unavailable service/process.

XmlDocument is very easy to use. Its only real drawback is that it loads the whole XML document into memory to process. It's seductively simple to use.

XmlReader is a stream-based reader, so it will keep your process memory utilization generally flatter, but it is more difficult to use.

XPathDocument tends to be a faster, read-only version of XmlDocument, but it still suffers from memory 'bloat'.
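You can get a rough feel for this bloat yourself. A minimal sketch, assuming a synthetic ~800KB document built in memory as a stand-in for a file; GC.GetTotalMemory is only an approximation, so the exact ratio will vary by runtime and document shape:

```csharp
using System;
using System.Text;
using System.Xml;

class BloatDemo
{
    static void Main()
    {
        // Build a large XML string in memory (illustrative stand-in for a file on disk).
        var sb = new StringBuilder("<root>");
        for (int i = 0; i < 20000; i++)
            sb.Append("<item id=\"").Append(i).Append("\">payload text</item>");
        sb.Append("</root>");
        string xml = sb.ToString();

        long before = GC.GetTotalMemory(forceFullCollection: true);
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        long after = GC.GetTotalMemory(forceFullCollection: true);

        // The DOM's node objects typically occupy several times the raw text size.
        Console.WriteLine(after - before > xml.Length
            ? "DOM larger than raw text"
            : "DOM smaller");
        GC.KeepAlive(doc);
    }
}
```

The absolute numbers depend on the runtime, but the direction is consistent: tens of thousands of XmlElement/XmlText/XmlAttribute objects cost far more than the characters they were parsed from.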
XmlDocument is an in-memory representation of the entire XML document. Therefore if your document is large, then it will consume much more memory than if you had read it using XmlReader.
This is assuming that when you use XmlReader you read and process the elements one by one and then discard them. If you use XmlReader and construct another intermediary structure in memory, then you have the same problem, and you're defeating the purpose of it.
Google for "SAX versus DOM" to read more about the difference between the two models of processing XML.
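The read-and-discard pattern this answer describes might look like the following sketch. The document and element names (`items`/`item`) are illustrative; the point is that nothing is retained between iterations:

```csharp
using System;
using System.IO;
using System.Xml;

class ReadAndDiscard
{
    static void Main()
    {
        // Illustrative document; imagine one far too big to hold as a DOM.
        const string xml = "<items><item>1</item><item>2</item><item>3</item></items>";

        int sum = 0;
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.MoveToContent();           // position on <items>
            reader.ReadStartElement("items"); // consume it, move to first child
            while (reader.NodeType == XmlNodeType.Element)
            {
                // ReadElementContentAsInt reads the value and moves past the
                // end tag; the node is then discarded, so memory stays flat.
                sum += reader.ReadElementContentAsInt();
            }
        }
        Console.WriteLine(sum); // 6
    }
}
```

Note the loop tests `NodeType` instead of calling `Read()` again: `ReadElementContentAsInt` already leaves the reader positioned on the node after the end tag, and an extra `Read()` would silently skip every other element.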
Another consideration is that XmlReader might be more robust for handling less-than-perfectly-formed XML. I recently created a client which consumed an XML stream, but the stream didn't have the special characters escaped correctly in URIs contained in some of the elements. XmlDocument and XPathDocument refused to load the XML at all, whereas using XmlReader I was able to extract the information I needed from the stream.
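The salvage technique might look like this sketch. The document and element names (`feed`/`uri`) are invented for illustration; the second URI contains an unescaped `&`, which makes the whole document ill-formed, so XmlDocument.Load would reject it outright:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

class SalvageExample
{
    static void Main()
    {
        // The second <uri> has an unescaped '&' - the document is ill-formed.
        const string xml =
            "<feed><uri>http://a.example/x</uri>" +
            "<uri>http://b.example/?p=1&q=2</uri></feed>";

        var uris = new List<string>();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            try
            {
                while (reader.ReadToFollowing("uri"))
                {
                    uris.Add(reader.ReadElementContentAsString());
                }
            }
            catch (XmlException)
            {
                // Because XmlReader streams, everything parsed before the
                // bad character has already been salvaged into `uris`.
            }
        }
        Console.WriteLine($"{uris.Count}: {uris[0]}");
    }
}
```

A DOM loader validates the whole document before handing you anything, so one bad character loses everything; the streaming reader hands over each node as it goes, so the failure only costs you the data after the error.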
There is a size threshold at which XmlDocument becomes slower, and eventually unusable. But the actual value of the threshold will depend on your application and XML content, so there are no hard and fast rules.
If your XML file can contain large lists (say tens of thousands of elements), you should definitely be using XmlReader.
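For the large-list case, a minimal sketch (the `records` document and the target id are illustrative) of streaming through tens of thousands of elements without ever building a DOM:

```csharp
using System;
using System.IO;
using System.Xml;

class LargeListExample
{
    static void Main()
    {
        // Stand-in for a file containing tens of thousands of elements.
        var sw = new StringWriter();
        sw.Write("<records>");
        for (int i = 0; i < 50000; i++) sw.Write($"<record id=\"{i}\"/>");
        sw.Write("</records>");

        string found = null;
        using (XmlReader reader = XmlReader.Create(new StringReader(sw.ToString())))
        {
            // Only one node is materialized at a time, regardless of list size.
            while (reader.ReadToFollowing("record"))
            {
                if (reader.GetAttribute("id") == "42")
                {
                    found = reader.GetAttribute("id");
                    break; // stop early; the rest of the stream is never parsed
                }
            }
        }
        Console.WriteLine(found); // 42
    }
}
```

The early `break` is something a DOM approach can't give you: XmlDocument pays the full load cost up front even if you only need one record near the start.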
The encoding difference is because two different measurements are being mixed. UTF-32 requires 4 bytes per character, and is inherently slower than single-byte data.

If you look at the large (100K) element test, you see that the time increases by about 70 ms for each case regardless of the loading method used. This is a (nearly) constant difference caused specifically by the per-character overhead.
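The per-character cost difference is easy to see directly (a trivial sketch using the standard Encoding classes):

```csharp
using System;
using System.Text;

class EncodingSizes
{
    static void Main()
    {
        const string s = "hello";
        // ASCII text: 1 byte per character in UTF-8, 4 in UTF-32.
        Console.WriteLine(Encoding.UTF8.GetByteCount(s));  // 5
        Console.WriteLine(Encoding.UTF32.GetByteCount(s)); // 20
    }
}
```

Four times the bytes means four times the I/O and decode work per character, which is why the overhead scales with document size rather than with the choice of XmlDocument vs XmlReader.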