Parse an XML file without a root element in Java

Posted 2024-09-08 14:12:14

I have this XML file which doesn't have a root node. Other than manually adding a "fake" root element, is there any way I can parse such an XML file in Java? Thanks.

Comments (6)

川水往事 2024-09-15 14:12:14

I suppose you could create a new implementation of InputStream that wraps the one you'll be parsing from. This implementation would return the bytes of the opening root tag before the bytes from the wrapped stream and the bytes of the closing root tag afterwards. That would be fairly simple to do.
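A minimal sketch of such a wrapper, assuming the file is UTF-8 and has no `<?xml ...?>` declaration of its own (a declaration is only allowed at the very start of a document, so it would end up illegally placed after the injected tag). The class name `RootWrappingInputStream` and the helper `parseRootless` are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical wrapper: yields "<root>", then every byte of the wrapped
// stream, then "</root>", so the parser sees a single root element.
class RootWrappingInputStream extends InputStream {
    private final InputStream body;
    private final byte[] prefix;
    private final byte[] suffix;
    private int prefixPos = 0;
    private int suffixPos = 0;
    private boolean bodyDone = false;

    RootWrappingInputStream(InputStream body, String rootName) {
        this.body = body;
        this.prefix = ("<" + rootName + ">").getBytes(StandardCharsets.UTF_8);
        this.suffix = ("</" + rootName + ">").getBytes(StandardCharsets.UTF_8);
    }

    @Override
    public int read() throws IOException {
        if (prefixPos < prefix.length) {
            return prefix[prefixPos++] & 0xFF;   // 1. opening root tag
        }
        if (!bodyDone) {
            int b = body.read();                 // 2. original rootless content
            if (b != -1) {
                return b;
            }
            bodyDone = true;
        }
        if (suffixPos < suffix.length) {
            return suffix[suffixPos++] & 0xFF;   // 3. closing root tag
        }
        return -1;                               // end of the combined stream
    }

    @Override
    public void close() throws IOException {
        body.close();
    }

    // Hypothetical helper: parse a rootless UTF-8 fragment under <rootName>.
    static Document parseRootless(String fragment, String rootName) throws Exception {
        InputStream in = new ByteArrayInputStream(fragment.getBytes(StandardCharsets.UTF_8));
        try (InputStream wrapped = new RootWrappingInputStream(in, rootName)) {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(wrapped);
        }
    }
}
```

Only `read()` is overridden here; the inherited bulk `read(byte[], int, int)` calls it byte by byte, so for large files you may want to pass in a `BufferedInputStream` or override the bulk method as well.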

I may be faced with this problem too. Legacy code, eh?

Ian.

Edit: You could also look at java.io.SequenceInputStream which allows you to append streams to one another. You would need to put your prefix and suffix in byte arrays and wrap them in ByteArrayInputStreams but it's all fairly straightforward.
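A sketch of the `SequenceInputStream` route, again assuming UTF-8 input without an XML declaration. With only three parts you can nest its two-argument constructor instead of building an `Enumeration`. `SequenceWrap`, `wrap` and `parse` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class SequenceWrap {
    // Prefix and suffix are byte arrays wrapped in ByteArrayInputStreams;
    // the nested SequenceInputStreams read open tag, body, close tag in order.
    static InputStream wrap(InputStream body, String rootName) {
        InputStream open = new ByteArrayInputStream(
                ("<" + rootName + ">").getBytes(StandardCharsets.UTF_8));
        InputStream close = new ByteArrayInputStream(
                ("</" + rootName + ">").getBytes(StandardCharsets.UTF_8));
        return new SequenceInputStream(new SequenceInputStream(open, body), close);
    }

    // Parses a rootless stream (e.g. a FileInputStream) under a synthetic <root>.
    static Document parse(InputStream rootlessXml) throws Exception {
        try (InputStream wrapped = wrap(rootlessXml, "root")) {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(wrapped);
        }
    }
}
```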

指尖微凉心微凉 2024-09-15 14:12:14

Your XML document needs a single root element to be considered well-formed. Without one, you will not be able to parse it as a document with a standard XML parser.
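To see this with the JDK's built-in DOM parser: two sibling elements at the top level are rejected with a fatal `SAXParseException`, while the same elements inside one root parse fine. `WellFormedCheck.parseError` is a hypothetical helper; the exact error text depends on the parser implementation, so the sketch only reports whether an error occurred.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class WellFormedCheck {
    // Returns null if the text parses as a well-formed document,
    // otherwise the parser's error message.
    static String parseError(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // Replace the default handler, which prints fatal errors to stderr.
        builder.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) { }
            @Override public void error(SAXParseException e) { }
            @Override public void fatalError(SAXParseException e) throws SAXParseException {
                throw e;
            }
        });
        try {
            builder.parse(new InputSource(new StringReader(xml)));
            return null;
        } catch (SAXParseException e) {
            return e.getMessage();
        }
    }
}
```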

无边思念无边月 2024-09-15 14:12:14

One way is to provide your own dummy wrapper document that pulls in the original 'XML' (the not well-formed 'XML'; there is probably a proper term for that) as an external entity, without touching it:

Syntax

<!DOCTYPE some_root_elem SYSTEM "/home/ego/some.dtd"
[
  <!ENTITY entity-name "Some value to be inserted at the entity">
]>

Example:

<!DOCTYPE dummy [
<!ENTITY data SYSTEM "http://wherever-my-data-is">
]>
<dummy>
&data;
</dummy>
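A sketch of driving this trick from Java with the JDK's DOM parser, assuming the rootless content sits in a local UTF-8 file. The wrapper is built as a string in memory and only the file's URI goes into the `ENTITY` declaration, so the file itself is never modified. This relies on external entity resolution, which the JDK's default `DocumentBuilderFactory` configuration allows but which XXE hardening deliberately turns off, so use it only on trusted input. `EntityWrap`, `parseViaEntity` and `demo` are hypothetical names.

```java
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class EntityWrap {
    // Parses a rootless file by referencing it as an external entity
    // from a dummy wrapper document.
    static Document parseViaEntity(Path rootlessFile) throws Exception {
        String wrapper =
                "<!DOCTYPE dummy [\n"
                + "<!ENTITY data SYSTEM \"" + rootlessFile.toUri() + "\">\n"
                + "]>\n"
                + "<dummy>&data;</dummy>";
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Expansion is the default; set explicitly so &data; becomes real nodes.
        factory.setExpandEntityReferences(true);
        return factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(wrapper)));
    }

    // Usage sketch: write a rootless fragment to a temp file, then parse it.
    static Document demo(String fragment) throws Exception {
        Path tmp = Files.createTempFile("rootless", ".xml");
        try {
            Files.write(tmp, fragment.getBytes(StandardCharsets.UTF_8));
            return parseViaEntity(tmp);
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```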
东走西顾 2024-09-15 14:12:14

You could use a more lenient parser such as Jsoup. Its XML mode (`Parser.xmlParser()`) can parse XML without a single root element.

甜心 2024-09-15 14:12:14

I think that even if some API had an option for this, it would only return the first node of the "XML" (the one that looks like a root) and discard the rest.

So the answer is probably to do it yourself. Scanner or StringTokenizer might do the trick.

Maybe some HTML parsers could help; they are usually less strict.

情域 2024-09-15 14:12:14

Here's what I did:

There's an old java.io.SequenceInputStream class, so old that it takes an Enumeration rather than a List or similar.

With it, you can prepend and append the root element tags (<div> and </div> in my case) around your rootless XML stream. (You shouldn't do it by concatenating Strings, for performance and memory reasons.)

// Needs: java.io.*, java.nio.charset.StandardCharsets, java.util.Arrays,
// java.util.Collections, java.util.Enumeration, javax.xml.parsers.*,
// javax.xml.xpath.*, org.w3c.dom.Document (ParserContext comes from the application)
public void tryExtractHighestHeader(ParserContext context)
{
    String xhtmlString = context.getBody();
    if (xhtmlString == null || xhtmlString.isEmpty())
        return;

    // The XHTML needs to be wrapped, because it has no root element.
    InputStream divStart = new ByteArrayInputStream("<div>".getBytes(StandardCharsets.UTF_8));
    InputStream divEnd = new ByteArrayInputStream("</div>".getBytes(StandardCharsets.UTF_8));
    InputStream is = new ByteArrayInputStream(xhtmlString.getBytes(StandardCharsets.UTF_8));
    // Collections.enumeration() adapts the List to the Enumeration SequenceInputStream expects.
    Enumeration<InputStream> streams = Collections.enumeration(Arrays.asList(divStart, is, divEnd));

    try (SequenceInputStream wrapped = new SequenceInputStream(streams)) {
        DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = builderFactory.newDocumentBuilder();
        Document xmlDocument = builder.parse(wrapped);

        // From here you can do whatever you like, but keep in mind the extra <div> element.
        XPath xPath = XPathFactory.newInstance().newXPath();
    }
    catch (Exception e) {
        // Keep the original exception as the cause so its stack trace isn't lost.
        throw new RuntimeException("Failed parsing XML: " + e.getMessage(), e);
    }
}
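About the extra element: every XPath expression has to go through the synthetic `<div>`. A small sketch reusing the same `SequenceInputStream` wrapping; `WrappedXPath` and `eval` are hypothetical names, and the fragment is assumed to be well-formed apart from the missing root (for example, no HTML-only entities such as `&nbsp;`).

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Collections;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

class WrappedXPath {
    // Wraps the rootless fragment in <div>...</div> and parses it.
    static Document parseWrapped(String rootless) throws Exception {
        InputStream in = new SequenceInputStream(Collections.enumeration(Arrays.asList(
                new ByteArrayInputStream("<div>".getBytes(StandardCharsets.UTF_8)),
                new ByteArrayInputStream(rootless.getBytes(StandardCharsets.UTF_8)),
                new ByteArrayInputStream("</div>".getBytes(StandardCharsets.UTF_8)))));
        return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
    }

    // Evaluates an XPath expression against the wrapped document and returns
    // its string value. The fragment's top-level elements are children of /div.
    static String eval(String rootless, String expression) throws Exception {
        Document doc = parseWrapped(rootless);
        XPath xPath = XPathFactory.newInstance().newXPath();
        return xPath.evaluate(expression, doc);
    }
}
```

With the fragment `<h2>Intro</h2><p>Text</p>`, `/div/h2` yields `Intro` while `/h2` yields an empty string, because the document element is now `div`.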