XMLReader breaks on strange characters
Whenever XMLReader tries to parse the XML file I'm feeding it, it breaks on "½" and on a period that looks like this "."
Whenever I try to delete either character from the XML feed, the editor deletes the character in front of it first. So they behave like foreign or differently encoded characters.
What are my options for fixing this? I can't edit the XML file every time. Thanks a lot.
Comments (1)
You have to fix the program or process that creates the "XML" file. (I put "XML" in quotes, because actually, you would like it to be an XML file, but it isn't one.) You might be able to patch or repair or recover the data, but that's not a long-term solution.
The anecdotal evidence suggests that the "½" character is encoded as two bytes, suggesting it is encoded as UTF-8, while the "é" character is encoded as one byte, suggesting it is encoded as ISO 8859-1. That means that two different processes have written to the file, writing to it using different encodings. (Perhaps it was originally created in one encoding, and then modified using an editor that didn't know what the original encoding was.) That isn't going to work.
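If you need a stopgap while the producer is being fixed, one option is to normalise the bytes before handing them to the parser. The following is only a minimal sketch in Python, not part of the original answer. It assumes that every byte which is not part of a valid UTF-8 sequence was written as ISO 8859-1, which is a heuristic and can guess wrong (for example, if the stray bytes are really Windows-1252). The function name `repair_mixed_encoding` is made up for illustration.

```python
def repair_mixed_encoding(raw: bytes) -> str:
    """Decode bytes that mix valid UTF-8 with stray single-byte characters.

    Assumption: any byte that breaks UTF-8 decoding is ISO 8859-1.
    """
    pieces = []
    pos = 0
    while pos < len(raw):
        try:
            # Try to decode everything that is left as UTF-8.
            pieces.append(raw[pos:].decode("utf-8"))
            break
        except UnicodeDecodeError as err:
            # err.start is relative to the slice, so shift it by pos.
            bad = pos + err.start
            # Everything before the offending byte is valid UTF-8.
            pieces.append(raw[pos:bad].decode("utf-8"))
            # Treat the single offending byte as ISO 8859-1 (assumption).
            pieces.append(raw[bad:bad + 1].decode("latin-1"))
            pos = bad + 1
    return "".join(pieces)


if __name__ == "__main__":
    # "½" written as UTF-8 (two bytes), "é" written as ISO 8859-1 (one byte).
    sample = b"1\xc2\xbd caf\xe9 menu"
    fixed = repair_mixed_encoding(sample)
    # Re-encode consistently before passing the data to an XML parser.
    clean_bytes = fixed.encode("utf-8")
    print(clean_bytes)
```

After this step the data is in a single encoding, so the XML declaration (if any) should say UTF-8 as well. This only patches the symptom; the process that writes the file with two encodings still needs to be fixed.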