Building a DOM with xerces and Java - how to prevent ampersand escaping
I am using xerces in Java to build a DOM. For one of the fields that becomes a text node in the DOM, the data is delivered from a source that has already turned any non-ASCII and/or XML special characters into their entity names or numbers, e.g. "Banana&#174;".
I know the system's design is wrong, since the data source shouldn't be doing this, but that is out of my control. What I am wondering is whether there is a way to prevent this from being escaped and turned into "Banana&amp;#174;" without decoding it first. (I know the serializer will implicitly escape any characters it needs to, so I could insert the raw character after decoding.)
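The decoding route mentioned above can be sketched as follows. This is a minimal sketch that assumes the source only emits decimal or hex numeric character references (like &#174;) plus the five entities XML predefines; the class and method names (EntityDecoder.decode) are hypothetical. Named HTML entities such as &reg; are left untouched and would need a lookup table (Apache Commons Text's StringEscapeUtils.unescapeHtml4 covers the HTML 4 set).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: decodes numeric character references and the five
// XML-predefined entities; anything else (e.g. &reg;) is left as-is.
class EntityDecoder {
    private static final Pattern REF =
            Pattern.compile("&(#[0-9]+|#[xX][0-9a-fA-F]+|amp|lt|gt|quot|apos);");

    static String decode(String in) {
        Matcher m = REF.matcher(in);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String body = m.group(1);
            String replacement;
            try {
                if (body.startsWith("#x") || body.startsWith("#X")) {
                    // Hex reference, e.g. &#xAE;
                    replacement = new String(Character.toChars(Integer.parseInt(body.substring(2), 16)));
                } else if (body.startsWith("#")) {
                    // Decimal reference, e.g. &#174;
                    replacement = new String(Character.toChars(Integer.parseInt(body.substring(1))));
                } else {
                    switch (body) {
                        case "amp":  replacement = "&";  break;
                        case "lt":   replacement = "<";  break;
                        case "gt":   replacement = ">";  break;
                        case "quot": replacement = "\""; break;
                        default:     replacement = "'";  break; // apos
                    }
                }
            } catch (IllegalArgumentException e) {
                // Overflowing or out-of-range code point: keep the reference unchanged
                replacement = m.group();
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String decoded = decode("Banana&#174;");
        System.out.println(decoded.equals("Banana\u00AE"));
    }
}
```

The decoded string would then go into dom.createTextNode(...), and the serializer escapes only what XML requires (such as & and <), writing ® either directly or as a character reference depending on the output encoding.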
Example code:
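import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Text;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSOutput;
import org.w3c.dom.ls.LSSerializer;

DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document dom = db.newDocument();
Element root = dom.createElement("Companies");
dom.appendChild(root);
Element company = dom.createElement("Company");
// The value arrives from the source with the character reference already in it
Text t = dom.createTextNode("Banana&#174;");
company.appendChild(t);
root.appendChild(company);
DOMImplementationRegistry dir = DOMImplementationRegistry.newInstance();
DOMImplementationLS impl =
        (DOMImplementationLS) dir.getDOMImplementation("LS");
LSSerializer writer = impl.createLSSerializer();
LSOutput output = impl.createLSOutput();
output.setByteStream(System.out);
// Text-node content is escaped on output, so '&' is written as "&amp;"
writer.write(dom, output);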
Example output:
<?xml version="1.0" encoding="UTF-8"?>
<Companies><Company>Banana&amp;#174;</Company></Companies>
If you put the value in a CDATA section (Document.createCDATASection instead of createTextNode), it should be passed through as is.
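A minimal sketch of the CDATA suggestion, assuming the JDK's built-in xerces DOM and its LSSerializer. createCDATASection is standard DOM API; the class and method names (CdataDemo.toXml) are hypothetical, and writing to a StringWriter instead of System.out is only there so the result is easy to inspect.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.CDATASection;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSOutput;
import org.w3c.dom.ls.LSSerializer;

class CdataDemo {
    // Builds <Companies><Company><![CDATA[...]]></Company></Companies>
    // and serializes it with the DOM Level 3 Load and Save serializer.
    static String toXml(String preEscaped) {
        try {
            Document dom = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element root = dom.createElement("Companies");
            dom.appendChild(root);
            Element company = dom.createElement("Company");
            // CDATA content is written verbatim, so '&' is not turned into "&amp;"
            CDATASection cdata = dom.createCDATASection(preEscaped);
            company.appendChild(cdata);
            root.appendChild(company);

            DOMImplementationLS impl = (DOMImplementationLS) dom.getImplementation();
            LSSerializer writer = impl.createLSSerializer();
            LSOutput output = impl.createLSOutput();
            StringWriter sw = new StringWriter();
            output.setCharacterStream(sw);
            output.setEncoding("UTF-8");
            writer.write(dom, output);
            return sw.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toXml("Banana&#174;"));
    }
}
```

One caveat: a parser does not expand character references inside CDATA, so a consumer reading this document gets the literal text Banana&#174;, not Banana®. That is only what you want if the downstream system expects the pre-escaped string; otherwise decoding before createTextNode is the route that yields ® in the parsed result.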