Is it possible to parse sub-trees with Groovy XmlSlurper?

Posted 2024-09-30 17:15:50

Does anyone know whether XmlSlurper can be used in such a way that individual sub-trees are pulled from a very large XML document and processed one at a time?

Imagine you've got a huge XML feed whose root element has thousands of direct child elements that can each be processed independently. Reading the whole document into memory is obviously a no-no but, as each child of the root is itself modestly sized, it would be nice to stream through the document and apply XmlSlurper niceness to each child element in turn. Once a child element has been processed, garbage collection can reclaim the memory used for it. That way we get the great ease of XmlSlurper (such concise syntax) with the low memory footprint of streaming (e.g. SAX).

I'd be interested to know whether anyone has ideas on this and/or has come across this requirement themselves.

Comments (2)

软的没边 2024-10-07 17:15:50

Using an XmlSlurper instance means calling one of its overloaded parse(..) methods (or the parseText(String) method). On that call, XmlSlurper will construct (driven by SAX events, at least) an in-memory GPathResult that holds complete information about the XML elements, their attributes, and their structure.

So no, XmlSlurper does not provide an API for parsing only portions of an XML document.
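For reference, here is a minimal sketch of ordinary XmlSlurper use. It assumes Groovy 3 or later (hence the groovy.xml import) and a made-up in-memory feed; the element names feed, item and title are illustrative only. The point is that parseText(..) builds the complete tree before any GPath expression is evaluated.

```groovy
import groovy.xml.XmlSlurper

// Made-up sample document; a real feed would be far larger
def xml = '''<feed>
  <item id="1"><title>first</title></item>
  <item id="2"><title>second</title></item>
</feed>'''

// parseText builds a GPathResult for the entire document in memory
def feed = new XmlSlurper().parseText(xml)

// GPath navigation only works on the fully built tree
def titles = feed.item.collect { it.title.text() }
def ids = feed.item.collect { it.@id.text() }
println "titles=${titles} ids=${ids}"
```

This is convenient for small documents, but memory use grows with the size of the whole input, which is exactly the problem the question describes.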

What can be done is to extend XmlSlurper and override the parse*(..) methods: pre-process the XML with a custom SAX handler (see http://groovy.codehaus.org/Reading+XML+with+Groovy+and+SAX), gather the desired portions of XML, and forward these to one of the XmlSlurper.parse*(..) methods.
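One possible reading of this suggestion, sketched under assumptions rather than taken from the answer: instead of subclassing XmlSlurper, a plain SAX handler forwards the events of each matching element to a fresh XmlSlurper, which is itself a SAX ContentHandler. SubtreeHandler, targetName and onSubtree are hypothetical names, the feed is made up, and the groovy.xml import assumes Groovy 3 or later.

```groovy
import groovy.xml.XmlSlurper
import javax.xml.parsers.SAXParserFactory
import org.xml.sax.Attributes
import org.xml.sax.helpers.DefaultHandler

// Hypothetical helper: feeds each <targetName> subtree to its own XmlSlurper
class SubtreeHandler extends DefaultHandler {
    String targetName          // qualified name of the elements to extract
    Closure onSubtree          // called with the GPathResult of each subtree
    private XmlSlurper slurper // non-null only while inside a matching element
    private int depth = 0      // nesting depth inside the current subtree

    @Override
    void startElement(String uri, String localName, String qName, Attributes atts) {
        if (slurper == null && qName == targetName) {
            slurper = new XmlSlurper()
            slurper.startDocument()
        }
        if (slurper != null) {
            depth++
            slurper.startElement(uri, localName, qName, atts)
        }
    }

    @Override
    void characters(char[] ch, int start, int length) {
        slurper?.characters(ch, start, length)
    }

    @Override
    void endElement(String uri, String localName, String qName) {
        if (slurper == null) return
        slurper.endElement(uri, localName, qName)
        if (--depth == 0) {
            // Subtree complete: hand it over, then drop it so it can be collected
            slurper.endDocument()
            onSubtree.call(slurper.document)
            slurper = null
        }
    }
}

// Usage with a made-up in-memory feed
def xml = '<feed><item id="1"><title>first</title></item><item id="2"><title>second</title></item></feed>'
def seen = []
def handler = new SubtreeHandler(targetName: 'item',
        onSubtree: { item -> seen << "${item.@id}:${item.title}".toString() })
SAXParserFactory.newInstance().newSAXParser()
        .parse(new ByteArrayInputStream(xml.getBytes('UTF-8')), handler)
println seen
```

Only one subtree is held in memory at a time, so the footprint depends on the size of the largest matching element rather than on the size of the document.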

柠檬色的秋千 2024-10-07 17:15:50

You can use the StAX API together with XmlSlurper to parse subtrees.

// Example of using StAX to split a large XML document and parse one element at a time with XmlSlurper

import groovy.xml.XmlSlurper // Groovy 3+; drop this line on Groovy 2.x, where groovy.util.XmlSlurper is imported by default
import javax.xml.stream.XMLInputFactory
import javax.xml.transform.TransformerFactory
import javax.xml.transform.sax.SAXResult
import javax.xml.transform.stax.StAXSource

def url = new URL("http://repo2.maven.org/maven2/archetype-catalog.xml")
url.withInputStream { inputStream ->
    def xmlStreamReader = XMLInputFactory.newInstance().createXMLStreamReader(inputStream)
    // Identity transformer: passes the StAX events through to the SAX handler unchanged
    def transformer = TransformerFactory.newInstance().newTransformer()
    while (xmlStreamReader.hasNext()) {
        if (xmlStreamReader.isStartElement() && xmlStreamReader.getLocalName() == 'archetype') {
            // XmlSlurper is a SAX ContentHandler, so it can receive just this element's subtree
            def xmlSlurper = new XmlSlurper()
            transformer.transform(new StAXSource(xmlStreamReader), new SAXResult(xmlSlurper))
            def archetype = xmlSlurper.document
            println "${archetype.groupId} ${archetype.artifactId} ${archetype.version}"
            // Do not advance here: the transform has already consumed the subtree and may have
            // stepped past its end tag, so an extra next() could skip an adjacent element
        } else {
            xmlStreamReader.next()
        }
    }
}
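
To try the technique without the network download, here is a hedged sketch that wraps the same idea in a hypothetical helper, eachSubtree, and runs it on a made-up in-memory catalog. It assumes Groovy 3 or later for the groovy.xml import. This variant advances the reader only when the current event is not a match, which keeps it from skipping matching elements that directly follow one another.

```groovy
import groovy.xml.XmlSlurper
import javax.xml.stream.XMLInputFactory
import javax.xml.transform.TransformerFactory
import javax.xml.transform.sax.SAXResult
import javax.xml.transform.stax.StAXSource

// Hypothetical helper: calls handler with a GPathResult for each element named elementName
def eachSubtree(InputStream input, String elementName, Closure handler) {
    def reader = XMLInputFactory.newInstance().createXMLStreamReader(input)
    def transformer = TransformerFactory.newInstance().newTransformer()
    try {
        while (reader.hasNext()) {
            if (reader.isStartElement() && reader.localName == elementName) {
                // The transform consumes the subtree and moves the reader forward itself
                def slurper = new XmlSlurper()
                transformer.transform(new StAXSource(reader), new SAXResult(slurper))
                handler.call(slurper.document)
            } else {
                reader.next()
            }
        }
    } finally {
        reader.close()
    }
}

// Made-up catalog with no whitespace between the two archetype elements
def xml = '<archetype-catalog><archetypes>' +
        '<archetype><groupId>g1</groupId><artifactId>a1</artifactId></archetype>' +
        '<archetype><groupId>g2</groupId><artifactId>a2</artifactId></archetype>' +
        '</archetypes></archetype-catalog>'

def ids = []
eachSubtree(new ByteArrayInputStream(xml.getBytes('UTF-8')), 'archetype') { a ->
    ids << "${a.groupId}:${a.artifactId}".toString()
}
println ids
```

Each GPathResult covers a single archetype element, so it can be discarded as soon as the closure returns.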