Is there a Java XML API that can parse a document without resolving character entities?

Posted on 2024-08-12 05:43:10

I have a program that needs to parse XML that contains character entities. The program itself doesn't need to have them resolved, and the list of them is large and will change, so I want to avoid explicit support for these entities if I can.

Here's a simple example:

<?xml version="1.0" encoding="UTF-8"?>
<xml>Hello there &something;</xml>

Is there a Java XML API that can parse a document successfully without resolving (non-standard) character entities? Ideally it would translate them into a special event or object that could be handled specially, but I'd settle for an option that would silently suppress them.

Answer & Example:

Skaffman gave me the answer: use a StAX parser with IS_REPLACING_ENTITY_REFERENCES set to false.

Here's the code I whipped up to try it out:

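import java.io.FileInputStream;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.events.EntityReference;
import javax.xml.stream.events.XMLEvent;

// Ask the parser to report entity references as events instead of expanding them.
XMLInputFactory inputFactory = XMLInputFactory.newInstance();
inputFactory.setProperty(XMLInputFactory.IS_REPLACING_ENTITY_REFERENCES, false);
XMLEventReader reader = inputFactory.createXMLEventReader(
    new FileInputStream("your file here"));

// Print the name of every unresolved entity reference.
while (reader.hasNext()) {
    XMLEvent event = reader.nextEvent();
    if (event.isEntityReference()) {
        EntityReference ref = (EntityReference) event;
        System.out.println("Entity Reference: " + ref.getName());
    }
}
reader.close();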

For the above XML, it will print "Entity Reference: something".
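
For the "handled specially" part of the question, here is a minimal sketch that rebuilds the text content and writes each unresolved reference back out literally as `&name;`. The class name `EntityPreservingText` and the helper `textWithRawEntities` are made up for illustration, and the sketch assumes the StAX implementation in use honours IS_REPLACING_ENTITY_REFERENCES = false for undeclared entities (as described above); other implementations may throw an XMLStreamException instead.

```java
import java.io.StringReader;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.EntityReference;
import javax.xml.stream.events.XMLEvent;

class EntityPreservingText {

    // Hypothetical helper: concatenates character data and re-emits every
    // unresolved entity reference in its literal "&name;" form.
    static String textWithRawEntities(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Assumption: the implementation reports references as events when this is false.
        factory.setProperty(XMLInputFactory.IS_REPLACING_ENTITY_REFERENCES, Boolean.FALSE);

        XMLEventReader reader = factory.createXMLEventReader(new StringReader(xml));
        StringBuilder out = new StringBuilder();
        try {
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isCharacters()) {
                    // Ordinary text between tags.
                    out.append(event.asCharacters().getData());
                } else if (event.isEntityReference()) {
                    // Keep the reference as written instead of resolving it.
                    EntityReference ref = (EntityReference) event;
                    out.append('&').append(ref.getName()).append(';');
                }
            }
        } finally {
            reader.close();
        }
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<xml>Hello there &something;</xml>";
        System.out.println(textWithRawEntities(xml));
    }
}
```

With the parser bundled in the JDK, the sample document should come back as the original text with the reference still in place. Silently suppressing the references instead is just a matter of dropping the `isEntityReference()` branch.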

Comments (4)

雨后咖啡店 2024-08-19 05:43:10

The StAX API has support for the notion of not replacing character entity references, by way of the IS_REPLACING_ENTITY_REFERENCES property:

"Requires the parser to replace internal entity references with their replacement text and report them as characters"

This can be set on an XMLInputFactory, which is then in turn used to construct an XMLEventReader or XMLStreamReader. However, the API documentation only says that setting the property requires the implementation to perform the replacement; it does not promise that setting it to false forces the implementation to leave the references unreplaced. Still, it's got to be worth a try.
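
Since the behaviour is implementation-dependent, one way to try it is a small sketch with the cursor API (XMLStreamReader) that first asks the factory whether it supports the property and then collects the names of any references it reports. The class name `UnresolvedEntityScanner` and the helper `entityNames` are made up for illustration; whether an undeclared entity is reported as an ENTITY_REFERENCE event or rejected with an XMLStreamException is an assumption about the parser on your classpath.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class UnresolvedEntityScanner {

    // Hypothetical helper: returns the names of all entity references
    // that the parser reports without expanding them.
    static List<String> entityNames(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        String property = XMLInputFactory.IS_REPLACING_ENTITY_REFERENCES;

        // Fail early if this implementation does not know the property at all.
        if (!factory.isPropertySupported(property)) {
            throw new IllegalStateException("Property not supported: " + property);
        }
        factory.setProperty(property, Boolean.FALSE);

        List<String> names = new ArrayList<>();
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.ENTITY_REFERENCE) {
                    // For ENTITY_REFERENCE events, getLocalName() returns the entity name.
                    names.add(reader.getLocalName());
                }
            }
        } finally {
            reader.close();
        }
        return names;
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(entityNames("<xml>Hello there &something;</xml>"));
    }
}
```

If the implementation ignores the setting and tries to expand the reference anyway, the undeclared entity should surface as an XMLStreamException from `next()`, which at least tells you quickly that this parser is not suitable.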

瘫痪情歌 2024-08-19 05:43:10

This works for me only when support for external entities is also disabled:

XMLInputFactory inputFactory = XMLInputFactory.newInstance();
inputFactory.setProperty(XMLInputFactory.IS_REPLACING_ENTITY_REFERENCES, false);
inputFactory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);

夜未央樱花落 2024-08-19 05:43:10

A SAX parse with an org.xml.sax.EntityResolver might suit your purpose. You could certainly suppress the entities, and you could probably find a way to leave them unresolved.

This tutorial seems the most relevant: it shows how to resolve entities into strings.

指尖凝香 2024-08-19 05:43:10

I am not a Java developer, but I "think" the Java XML classes support functionality similar to .NET's for accomplishing this. In .NET, on the XmlReaderSettings class, you set the ProhibitDtd property to false and the XmlResolver property to null. This causes the parser to ignore externally referenced entities without throwing an exception when they are read. I just did a Google search for "Java ignore enity" and got lots of hits, some of which appear to address this topic. I realize this is not a complete answer to your question, but it should point you in a useful direction.
