What is the best way to retrieve two pieces of data from an XML file?

Posted on 2024-08-28 03:03:15

I've got an XML document that is in either a pre- or post-FO-transformation state, and I need to extract some information from it. In the pre-transformation case, I need to pull out two tags that represent the pageWidth and pageHeight; in the post-transformation case, I need to extract the page-height and page-width parameters from a specific tag (I forget which one off the top of my head).

What I'm looking for is an efficient/easily maintainable way to grab these two elements. I'd like to only read the document a single time fetching the two things I need.

I initially started writing something that would use BufferedReader + FileReader, but that means doing string searches, which gets messy when tags span multiple lines. I then looked at the DOMParser, which seems ideal, but I'd rather not read the entire file into memory if I can help it: the files could be large, and the tags I'm looking for will nearly always be close to the top of the file. I then looked into SAXParser, but that seems like a big pile of complicated overkill for what I'm trying to accomplish.

Anybody have any advice? Or simple implementations that would accomplish my goal? Thanks.

Edit: I forgot to mention that, due to various limitations, whatever I use has to be built in to core Java; I can't use or download any third-party XML tools.

Comments (3)

断念 2024-09-04 03:03:15

While XPath is very good for querying XML data, I am not aware of a good and fast XPath implementation for Java (they all use at least a DOM model).

I would recommend sticking with StAX. It is extremely fast even for huge files, and its cursor API is rather trivial:

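import java.io.FileInputStream;
import java.io.InputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamReader;

// createXMLStreamReader has no overload that takes a file name,
// so open the file as a stream first.
InputStream in = new FileInputStream("my.xml");
XMLInputFactory f = XMLInputFactory.newInstance();
XMLStreamReader r = f.createXMLStreamReader(in);
try {
  while (r.hasNext()) {
    r.next();
    . . .
  }
} finally {
  r.close();   // frees parser resources; does not close the underlying stream
  in.close();
}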

Consult the StAX tutorial and the XMLStreamReader javadocs for more information.
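
To make the loop above concrete for the question's two cases, here is a minimal sketch using only JDK classes. It assumes the pre-transformation values are the text of elements named exactly pageWidth and pageHeight, and that the post-transformation values are page-width and page-height attributes on some element (in XSL-FO that would typically be fo:simple-page-master, but the question does not confirm this). The class and method names are made up for illustration.

```java
import java.io.Reader;
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class PageSizeReader {

    /** Returns {width, height}; an entry is null if it was not found. */
    public static String[] readPageSize(Reader in) throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(in);
        String width = null;
        String height = null;
        try {
            // Leave the loop as soon as both values are known, so the
            // remainder of a large file is not walked through.
            while ((width == null || height == null) && r.hasNext()) {
                if (r.next() != XMLStreamConstants.START_ELEMENT) {
                    continue;
                }
                String name = r.getLocalName();
                if (name.equals("pageWidth")) {            // assumed element name
                    width = r.getElementText().trim();
                } else if (name.equals("pageHeight")) {    // assumed element name
                    height = r.getElementText().trim();
                } else {
                    // Assumed post-transformation form: values held in attributes.
                    String w = r.getAttributeValue(null, "page-width");
                    String h = r.getAttributeValue(null, "page-height");
                    if (w != null) width = w;
                    if (h != null) height = h;
                }
            }
        } finally {
            r.close();
        }
        return new String[] { width, height };
    }

    public static void main(String[] args) throws Exception {
        String xml = "<doc><pageWidth>8.5in</pageWidth>"
                   + "<pageHeight>11in</pageHeight><body/></doc>";
        String[] size = readPageSize(new StringReader(xml));
        System.out.println(size[0] + " x " + size[1]);
    }
}
```

For a real file, pass a FileReader or an InputStreamReader instead of the StringReader and close it once the method returns. Because the cursor only moves forward, the document is read once and both values come out of the same pass.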

凉风有信 2024-09-04 03:03:15

You can use XPath to search for your tags. Here is a tutorial on forming XPath expressions. And here is an article on using XPath with Java.
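
As a rough illustration of the XPath route with only the classes that ship with the JDK (javax.xml.parsers and javax.xml.xpath), the sketch below parses the document once into a DOM and evaluates two expressions against it. Note that this loads the whole document into memory, which the question wanted to avoid. The element and attribute names in the expressions are assumptions taken from the question's wording, and the class and method names are made up for illustration.

```java
import java.io.Reader;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XPathPageSize {

    /** Returns {width, height}; an entry is the empty string if nothing matched. */
    public static String[] readPageSize(Reader in) throws Exception {
        // Parse once, then run both queries against the in-memory tree.
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(in));
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Each union matches either the assumed element or the assumed attribute;
        // evaluate(...) returns the string value of the first match in document order.
        String width = xpath.evaluate("//pageWidth | //@page-width", doc);
        String height = xpath.evaluate("//pageHeight | //@page-height", doc);
        return new String[] { width.trim(), height.trim() };
    }

    public static void main(String[] args) throws Exception {
        String xml = "<doc><pageWidth>8.5in</pageWidth>"
                   + "<pageHeight>11in</pageHeight></doc>";
        String[] size = readPageSize(new StringReader(xml));
        System.out.println(size[0] + " x " + size[1]);
    }
}
```

The expressions stay short and easy to maintain, at the cost of building the full tree before the first query runs.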


An easy-to-use parser (DOM, SAX) is dom4j. It would be considerably easier to use than the built-in SAXParser.

草莓味的萝莉 2024-09-04 03:03:15

Try "XMLDog".

It uses SAX to evaluate XPath expressions.
