Improving XML parsing performance
I have a few XML documents with a bunch of XPath queries. I want only the app I design to change the XML docs :). I have mechanisms to check the integrity of each document and to guard against someone messing around with it.
Right now, XML parsing with C# .NET doesn't give me performance I'm happy with.
The options I have are:
- Use a new parser (even port to C++).
- Write a schema (it might take a long, long time). Does it improve parsing performance?
- Play around with more .NET classes or libraries. XmlReader is what I'm using now.
Can someone tell me which of these will make me smile more for the time I spend on it?
Updated with some info:
I only want to parse and read some or all of the attributes. Writing to or changing the doc is not what I want.
I might want XSD support later on. (I don't know what the future holds for me.)
Current performance: 50 MB of XML, spread across 1000 files, in 14 seconds.
Opening and closing the files also takes some time, I guess! (That is included.)
I'm looking for half of that time.
Comments (3)
Have you read "Improving XML Performance"?
Without seeing some code that shows exactly what you are doing, along with timings, it is hard to comment on what is slow, but I suspect that it is not the parsing itself that is slow.
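One way to check that suspicion is to time the file reads and the parsing separately. The following is only a rough sketch under assumptions: the class name `ParseTiming`, the `*.xml` file pattern, and the choice to count attributes as the "work" are all made up for illustration and are not taken from the question.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Xml;

// Hypothetical helper (not from the question): separates disk time from parse time.
public static class ParseTiming
{
    // Walks the document with XmlReader and returns how many attributes were seen.
    public static int CountAttributes(Stream xml)
    {
        var settings = new XmlReaderSettings
        {
            IgnoreComments = true,
            IgnoreWhitespace = true,
            DtdProcessing = DtdProcessing.Ignore
        };
        int count = 0;
        using (XmlReader reader = XmlReader.Create(xml, settings))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && reader.HasAttributes)
                {
                    while (reader.MoveToNextAttribute())
                    {
                        count++; // reader.Name and reader.Value are available here
                    }
                    reader.MoveToElement(); // step back from the last attribute to its element
                }
            }
        }
        return count;
    }

    // Accumulates I/O time and parse time over every *.xml file in a folder.
    public static void Measure(string folder)
    {
        var io = new Stopwatch();
        var parse = new Stopwatch();
        int attributes = 0;
        foreach (string path in Directory.EnumerateFiles(folder, "*.xml"))
        {
            io.Start();
            byte[] bytes = File.ReadAllBytes(path); // disk access only
            io.Stop();

            parse.Start();
            using (var stream = new MemoryStream(bytes))
            {
                attributes += CountAttributes(stream); // parsing only, from memory
            }
            parse.Stop();
        }
        Console.WriteLine(
            $"I/O: {io.ElapsedMilliseconds} ms, parse: {parse.ElapsedMilliseconds} ms, attributes: {attributes}");
    }
}
```

If the I/O figure dominates, a different parser is unlikely to help much; if the parse figure dominates, the reader settings and the per-node work are the places to look.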
It is not recommended that you create a new parser when Microsoft has already provided one. Using a schema (with XmlSerializer) greatly simplifies the coding process. In addition, sgen (the XML Serializer Generator) creates an XML serialization assembly for the types in a specified assembly, which improves the startup performance of an XmlSerializer when it serializes or deserializes objects of those types.
You can also use the xsd.exe tool to generate a schema from an XML file. The generated schema can then be modified to suit your needs.
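As a minimal sketch of the XmlSerializer approach: the `Catalog` and `Item` types below and the `<catalog><item id="..." name="..."/></catalog>` document shape are hypothetical, since the question does not show its XML. In practice, classes like these could be written by hand or generated with xsd.exe from a schema.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical document shape: <catalog><item id="1" name="bolt"/>...</catalog>
[XmlRoot("catalog")]
public class Catalog
{
    [XmlElement("item")]
    public Item[] Items { get; set; }
}

public class Item
{
    [XmlAttribute("id")]
    public int Id { get; set; }

    [XmlAttribute("name")]
    public string Name { get; set; }
}

public static class CatalogLoader
{
    // Create the serializer once and reuse it for every file. The first construction
    // for a type generates serialization code, which is the startup cost sgen targets.
    private static readonly XmlSerializer Serializer = new XmlSerializer(typeof(Catalog));

    // Maps the XML attributes onto the typed properties above.
    public static Catalog Load(TextReader xml)
    {
        return (Catalog)Serializer.Deserialize(xml);
    }
}
```

A call such as `CatalogLoader.Load(new StringReader(text))` then returns typed objects, so the attribute values can be read as properties instead of through XPath queries. Whether this is faster than a hand-written XmlReader loop for a given set of files would need to be measured.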
First, I wonder if you're using the term "XML parser" correctly. I ask because many people refer incorrectly to the processing they do on the XML, after parsing (for example using XSLT) as if it were part of the parsing.
Secondly, what performance are you getting from the parser, and what performance do you require? Any exercise in performance improvement should start with these two numbers, and there is no point making any suggestions until we know how much they differ.
Validating your source document against a schema will generally increase parsing time rather than decreasing it.
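To illustrate the last point, here is a small sketch of the two XmlReader configurations being compared. The class and method names are made up for illustration; only `XmlReaderSettings`, `ValidationType`, and `XmlSchemaSet` are real .NET types. With `ValidationType.Schema`, the reader checks the nodes it reads against the supplied schemas in addition to the well-formedness parsing it does anyway.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Schema;

// Hypothetical helper (not from the thread): builds the two configurations side by side.
public static class ReaderSettingsFactory
{
    // Plain well-formedness parsing: no schema work is done.
    public static XmlReaderSettings NonValidating()
    {
        return new XmlReaderSettings
        {
            ValidationType = ValidationType.None,
            IgnoreWhitespace = true
        };
    }

    // Schema validation: nodes are additionally checked against the XSD set.
    public static XmlReaderSettings Validating(XmlSchemaSet schemas)
    {
        var settings = new XmlReaderSettings
        {
            ValidationType = ValidationType.Schema,
            IgnoreWhitespace = true
        };
        settings.Schemas = schemas;
        return settings;
    }

    // Reads the whole document with the given settings and counts its elements.
    public static int CountElements(TextReader xml, XmlReaderSettings settings)
    {
        int count = 0;
        using (XmlReader reader = XmlReader.Create(xml, settings))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element)
                {
                    count++;
                }
            }
        }
        return count;
    }
}
```

Running `CountElements` over the same files with each configuration, under a stopwatch, would show how much the validation pass adds for a particular schema.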