Inspecting XML in a stream
If I have a large XML document that I don't want to load entirely into memory, and some configurable value such as an XPath statement (or another format that identifies a path to an element in the XML), is it possible to read the XML from a stream node by node until I find the location I am looking for?
We need to build facilities to pull a value out of XML without knowing the schema. All we have is the XML document and an XPath statement. We could probably switch to something other than XPath, but we really want to avoid loading the whole document: we need to process in real time, the XML could be fairly large, and the volume could get high.
Comments (3)
LibXML2 provides a streaming API (where you can parse a document a chunk at a time) and also XPath. Mixing the two isn't as straightforward as with the standard DOM parser, but it's possible to do on a per-element basis. See here for more info: http://xmlsoft.org/xmlreader.html#Mixing
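To illustrate the "per-element" idea, here is a minimal sketch in Python. It does not use libxml2; it uses the standard library's `xml.etree.ElementTree.iterparse`, which only supports a limited XPath subset. The function name `first_match`, the `order`/`customer` element names, and the sample data are made up for the example. The document is streamed, and the small XPath is evaluated only against each completed record's subtree.

```python
import io
import xml.etree.ElementTree as ET


def first_match(stream, record_tag, relative_xpath):
    """Stream the document and, for each completed <record_tag> element,
    evaluate a small XPath against that element's subtree only.
    Returns the text of the first match, or None if nothing matches."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag != record_tag:
            continue
        hit = elem.find(relative_xpath)
        if hit is not None:
            # Stop early: no further chunks are read from the stream.
            return hit.text
        # Drop this record's children and text before moving on.
        # (The emptied element itself stays attached to its parent.)
        elem.clear()
    return None


# Hypothetical sample data, for illustration only.
sample = io.BytesIO(
    b"<orders>"
    b"<order id='1'><customer><name>Ann</name></customer></order>"
    b"<order id='2'><customer vip='yes'><name>Bob</name></customer></order>"
    b"</orders>"
)
result = first_match(sample, "order", "customer[@vip='yes']/name")
```

This assumes you can split the configured path into a "record" element and a path relative to it. Memory use is then bounded by the largest single record rather than by the whole document.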
You can do this with Saxon-EE. The simplest approach is probably XQuery document projection; see http://www.saxonica.com/documentation/sourcedocs/projection.xml
Try XMLDog: http://code.google.com/p/jlibs/wiki/XMLDog
XMLDog can evaluate XPath expressions using SAX (i.e., without loading the whole document into memory).
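For a rough feel of how path matching over SAX events can work, here is a small sketch. It is not XMLDog's API or implementation; it uses Python's standard `xml.sax`, and it only handles plain absolute paths such as `/catalog/book/title` (no predicates, axes, or namespaces). The names `PathHandler`, `extract`, and `_Found` are hypothetical. The handler keeps a stack of open element names and aborts the parse as soon as the first matching element closes.

```python
import io
import xml.sax


class _Found(Exception):
    """Raised to abort parsing as soon as the value is known."""


class PathHandler(xml.sax.ContentHandler):
    def __init__(self, path):
        super().__init__()
        self.target = path.strip("/").split("/")
        self.stack = []     # names of the currently open elements
        self.buffer = None  # collects text while inside the target element

    def startElement(self, name, attrs):
        self.stack.append(name)
        if self.stack == self.target:
            self.buffer = []

    def characters(self, content):
        if self.buffer is not None:
            self.buffer.append(content)

    def endElement(self, name):
        if self.buffer is not None and self.stack == self.target:
            # The target element just closed: stop parsing here.
            raise _Found("".join(self.buffer))
        self.stack.pop()


def extract(stream, path):
    """Return the text of the first element at `path`, or None."""
    try:
        xml.sax.parse(stream, PathHandler(path))
    except _Found as found:
        return found.args[0]
    return None


# Hypothetical sample data, for illustration only.
sample = (
    b"<catalog>"
    b"<book><title>First</title></book>"
    b"<book><title>Second</title></book>"
    b"</catalog>"
)
title = extract(io.BytesIO(sample), "/catalog/book/title")
```

Only the stack of open element names and the text of the matched element are held in memory, so usage depends on nesting depth rather than document size. Anything beyond simple child steps needs a real streaming XPath engine.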