Need an application to fix XML with unescaped characters
This XML (it has an .rdf file extension, but it is XML) was generated by an automatic tool, but unfortunately it contains various "unescaped" strings, like
<tag xml:lang="fr">L'insuline (du latin insula, île) </tag>
and the parser (and the reasoner software) crashes on them...
Java or PHP solutions would work for me too!
Thanks,
Celso
Comments (3)
Here's a general method that I use a lot to make sure a String is escaped properly for XML.
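The method itself did not survive on this page. What follows is a minimal sketch of such a helper, not the answerer's code. It escapes the five characters that XML predefines entities for (`&`, `<`, `>`, `"`, `'`). The class name `XmlEscape` and the method `escapeXml` are hypothetical.

```java
public final class XmlEscape {

    // Hypothetical helper: replaces the five XML special characters
    // with their predefined entity references; all other chars are copied as-is.
    public static String escapeXml(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeXml("a < b & \"c\""));
    }
}
```

Non-ASCII characters such as `î` pass through unchanged. Escaping is only needed for markup characters, not for accented letters.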
The XML given by the OP is well-formed: the single-quote character is valid, and so is the circumflex "î", so neither needs escaping. I would make sure you're using a text encoding such as UTF-8. Here's a quick Java example that does an identity transformation:
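The answer's original example is missing from this page. This is a minimal sketch of an identity transformation with the JDK's built-in `javax.xml.transform` API, not the answerer's code. Calling `newTransformer()` without a stylesheet copies the input unchanged, and the output encoding is set to UTF-8. The names `IdentityTransform` and `identity` are hypothetical. The input is read from a `String`, which sidesteps any byte-decoding step.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public final class IdentityTransform {

    // Hypothetical helper: parses the XML and re-serializes it unchanged,
    // declaring UTF-8 as the output encoding.
    public static String identity(String xml) {
        try {
            // No stylesheet argument => identity transformer
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            // Thrown if the input is not well-formed
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<tag xml:lang=\"fr\">L'insuline (du latin insula, île) </tag>";
        System.out.println(identity(xml));
    }
}
```

If the OP's fragment round-trips through this without an exception, the parser accepts it as well-formed, which supports the point that nothing in it needs escaping.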
The XML fragment given by the OP looks well-formed. Neither the apostrophe nor the i-circumflex needs escaping. The most likely problem is that the XML is encoded in ISO-8859-1 but lacks an XML declaration, so the parser assumes it is UTF-8. The solution, then, is to add the XML declaration
<?xml version="1.0" encoding="iso-8859-1"?>
, which tells the parser how to decode the characters. (For a document containing only ASCII characters, ISO-8859-1 and UTF-8 are indistinguishable, so this problem only surfaces when you use characters outside the ASCII range.)

A word of advice: if you had given the error message generated by the parser, you wouldn't have got so many incorrect answers.
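The diagnosis above can be checked with a minimal Java sketch. The class and method names are hypothetical, and it assumes the JDK's default `DocumentBuilder`. The fragment is encoded as ISO-8859-1 bytes, where `î` becomes the single byte 0xEE, which is not valid UTF-8 in this position. It is then parsed once without a declaration and once with one. The exact exception the parser raises for bad UTF-8 may vary between JDK versions, so both `SAXException` and `IOException` are treated as a failed parse.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public final class EncodingDemo {

    // Hypothetical helper: true if the bytes parse as a well-formed XML document.
    public static boolean parses(byte[] bytes) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new ByteArrayInputStream(bytes));
            return true;
        } catch (SAXException | IOException e) {
            // Malformed UTF-8 surfaces as one of these, depending on the JDK
            return false;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String body = "<tag xml:lang=\"fr\">L'insuline (du latin insula, île) </tag>";
        String declared = "<?xml version=\"1.0\" encoding=\"iso-8859-1\"?>" + body;
        // Both documents are stored as ISO-8859-1 bytes
        byte[] noDecl = body.getBytes(StandardCharsets.ISO_8859_1);
        byte[] withDecl = declared.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println("without declaration: " + parses(noDecl));
        System.out.println("with declaration:    " + parses(withDecl));
    }
}
```

The same bytes fail without the declaration, because the parser decodes them as UTF-8. They succeed once the declaration names the real encoding. That is why the parser's error message (typically mentioning an invalid UTF-8 byte sequence) would have pointed straight at the cause.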