Is it possible to parse sub-trees with the Groovy XmlSlurper?
Does anyone know whether it is possible to utilise XmlSlurper in such a way that individual sub-trees can be pulled from a very large XML document and processed individually?
Imagine you've got a huge XML feed containing a root element that has thousands of direct child elements that you can process individually. Obviously, reading the whole document into memory is a no-no but, as each child of the root is itself modestly sized, it would be nice to stream through the document but apply XmlSlurper niceness to each of the child elements in turn. As each child element is processed, garbage collection can clean up the memory used to process it. In this way we get the great ease of XmlSlurper (such concise syntax) with the low memory footprint of streaming (e.g. SAX).
I'd be interested to know if anyone has ideas on this and/or whether you've come across this requirement yourselves.
Comments (2)
Initializing an XmlSlurper instance means calling one of its overloaded parse(..) methods (or the parseText(String) method). Upon this call, XmlSlurper will (using SAX events, at least) construct an in-memory GPathResult that holds the complete information on the XML elements and attributes, and their structure.

So, no, XmlSlurper does not provide an API for parsing only portions of an XML document.

What can be done is extending XmlSlurper and overriding the parse*(..) methods: pre-process the XML with a custom SAX handler (see http://groovy.codehaus.org/Reading+XML+with+Groovy+and+SAX), gather the desired portions of XML, and forward these to one of the XmlSlurper.parse*(..) methods.

You can use the StAX API together with XmlSlurper to parse subtrees.
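One way to sketch the custom SAX handler idea from the first comment: XmlSlurper is itself a SAX DefaultHandler and exposes getDocument(), so an outer handler can forward only the events of one root child at a time and collect a GPathResult per child. The class name ChildSlurpingHandler, the sample feed and the closure are made up for illustration, and the import assumes Groovy 3 or later (older versions ship the class as groovy.util.XmlSlurper). Treat this as a rough sketch rather than a tested recipe.

```groovy
import groovy.xml.XmlSlurper            // assumption: Groovy 3+; groovy.util.XmlSlurper in older versions
import javax.xml.parsers.SAXParserFactory
import org.xml.sax.Attributes
import org.xml.sax.InputSource
import org.xml.sax.helpers.DefaultHandler

// Hypothetical helper: streams the document and hands each direct child
// of the root element to a closure as a GPathResult.
class ChildSlurpingHandler extends DefaultHandler {
    private final Closure onChild
    private final XmlSlurper slurper = new XmlSlurper()
    private int depth = 0                // 1 = root element, 2 = direct child of the root

    ChildSlurpingHandler(Closure onChild) { this.onChild = onChild }

    @Override
    void startElement(String uri, String localName, String qName, Attributes attrs) {
        depth++
        if (depth == 2) slurper.startDocument()          // reset the slurper for a new sub-tree
        if (depth >= 2) slurper.startElement(uri, localName, qName, attrs)
    }

    @Override
    void characters(char[] ch, int start, int length) {
        if (depth >= 2) slurper.characters(ch, start, length)
    }

    @Override
    void endElement(String uri, String localName, String qName) {
        if (depth >= 2) slurper.endElement(uri, localName, qName)
        if (depth == 2) {
            slurper.endDocument()
            onChild.call(slurper.getDocument())          // GPathResult rooted at the child element
        }
        depth--
    }
}

// Made-up sample feed; a real feed would be read from a file or network stream
def xml = '''<feed>
  <item id="1"><title>First</title></item>
  <item id="2"><title>Second</title></item>
</feed>'''

def titles = []
def factory = SAXParserFactory.newInstance()
factory.namespaceAware = true
factory.newSAXParser().parse(new InputSource(new StringReader(xml)),
        new ChildSlurpingHandler({ item -> titles << "${item.@id}:${item.title}".toString() }))
```

Only one child sub-tree is held by the slurper at any time, so each GPathResult becomes eligible for garbage collection once the closure returns, provided the closure does not keep a reference to it.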
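The StAX suggestion could look roughly like the following: read the feed as a stream of StAX events, copy the events of each direct child of the root into a small string buffer, and hand that buffer to XmlSlurper.parseText. The helper name eachRootChild and the sample feed are invented for this sketch, and the import again assumes Groovy 3 or later. It is only one possible arrangement and has not been tried against a large feed.

```groovy
import groovy.xml.XmlSlurper            // assumption: Groovy 3+; groovy.util.XmlSlurper in older versions
import javax.xml.stream.XMLInputFactory
import javax.xml.stream.XMLOutputFactory

// Hypothetical helper: calls onChild with a GPathResult for every direct child of the root.
def eachRootChild(Reader input, Closure onChild) {
    def events = XMLInputFactory.newInstance().createXMLEventReader(input)
    def outputFactory = XMLOutputFactory.newInstance()
    def slurper = new XmlSlurper()
    int depth = 0                        // 1 = root element, 2 = direct child of the root
    StringWriter buffer = null
    def writer = null
    try {
        while (events.hasNext()) {
            def event = events.nextEvent()
            if (event.isStartElement()) {
                depth++
                if (depth == 2) {        // a new sub-tree starts: open a fresh buffer
                    buffer = new StringWriter()
                    writer = outputFactory.createXMLEventWriter(buffer)
                }
            }
            if (writer != null) writer.add(event)        // copy events while inside a sub-tree
            if (event.isEndElement()) {
                if (depth == 2) {        // sub-tree complete: slurp it and drop the buffer
                    writer.flush()
                    writer.close()
                    onChild.call(slurper.parseText(buffer.toString()))
                    writer = null
                    buffer = null
                }
                depth--
            }
        }
    } finally {
        events.close()
    }
}

// Made-up sample feed
def feed = '<feed><item id="1"><title>First</title></item><item id="2"><title>Second</title></item></feed>'
def ids = []
eachRootChild(new StringReader(feed)) { item -> ids << item.@id.text() }
```

This sketch does not carry over namespace declarations made on the root element, so a namespaced feed would need extra handling before the buffered sub-tree can be parsed on its own.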