Building a DOM with Xerces and Java - how to prevent ampersands from being escaped

Posted on 2025-01-04 20:51:29

I am using Xerces in Java to build a DOM. For one of the fields that becomes a text node in the DOM, the data is delivered from a source that has already turned any non-ASCII and/or XML special characters into their entity names or numbers, e.g. "Banana&#174;".

I know the system's design is wrong, in that the data source shouldn't be doing this, but that is out of my control. What I am wondering is whether there is a way to prevent this value from being escaped and turned into "Banana&amp;#174;" without decoding it first? (I know the serializer will implicitly convert any chars it needs to, so I could enter the raw char after decoding.)

Example code:

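    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.w3c.dom.Document;
    import org.w3c.dom.Element;
    import org.w3c.dom.Text;
    import org.w3c.dom.bootstrap.DOMImplementationRegistry;
    import org.w3c.dom.ls.DOMImplementationLS;
    import org.w3c.dom.ls.LSOutput;
    import org.w3c.dom.ls.LSSerializer;

    // Build <Companies><Company>...</Company></Companies>
    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    DocumentBuilder db = dbf.newDocumentBuilder();
    Document dom = db.newDocument();
    Element root = dom.createElement("Companies");
    dom.appendChild(root);
    Element company = dom.createElement("Company");
    // The source delivers the value already escaped as a character reference
    Text t = dom.createTextNode("Banana&#174;");
    company.appendChild(t);
    root.appendChild(company);
    // Serialize the document to stdout with the DOM Level 3 Load and Save API
    DOMImplementationRegistry dir = DOMImplementationRegistry.newInstance();
    DOMImplementationLS impl =
        (DOMImplementationLS)dir.getDOMImplementation("LS");
    LSSerializer writer = impl.createLSSerializer();
    LSOutput output = impl.createLSOutput();
    output.setByteStream(System.out);
    writer.write(dom, output);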

Example Output:

    <?xml version="1.0" encoding="UTF-8"?>
    <Companies><Company>Banana&amp;#174;</Company></Companies>

Comments (1)

你好,陌生人 2025-01-11 20:51:29

If you could somehow declare it in a CDATA section, it should be passed through as is.
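
A minimal sketch of what that suggestion could look like, assuming the same JDK-bundled DOM and LS classes as in the question. The class `CdataSketch` and its `serialize` helper are hypothetical names used only for illustration. It swaps `createTextNode` for the standard `Document.createCDATASection`, so the serializer writes the string without escaping the ampersand:

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;

class CdataSketch {
    // Hypothetical helper: puts the pre-escaped value in a CDATA section
    // instead of a text node, then serializes the document to a String.
    static String serialize(String preEscaped) throws Exception {
        Document dom = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .newDocument();
        Element root = dom.createElement("Companies");
        dom.appendChild(root);
        Element company = dom.createElement("Company");
        // CDATA content is written verbatim, so '&' is not turned into '&amp;'
        company.appendChild(dom.createCDATASection(preEscaped));
        root.appendChild(company);

        DOMImplementationLS impl = (DOMImplementationLS)
                DOMImplementationRegistry.newInstance().getDOMImplementation("LS");
        LSSerializer writer = impl.createLSSerializer();
        return writer.writeToString(dom);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(serialize("Banana&#174;"));
    }
}
```

One caveat with this approach: a CDATA section carries literal characters, so a parser reading the output sees the seven characters `&#174;` rather than a character reference for ®. Whether that is acceptable depends on what the consumer of the XML expects.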
