Invalid token error when parsing a UTF-8 encoded XML file
I get an invalid token error while parsing an XML file with UTF-8 encoding.
The error occurs when the parser encounters the extended ASCII character 'â' { "â", "â" }.
When I change the encoding from UTF-8 to ISO-8859-1, parsing succeeds. But my application should support UTF-8, ASCII, and extended ASCII characters. What should I do to achieve this?
Any ideas are welcome.
Thanks in advance for your time and solution.
Comments (1)
Telling a parser that a Latin-1 file is UTF-8, by setting the encoding attribute of the XML declaration, will produce an error similar to the one you report.
If the 'â' character (U+00E2) appears in a UTF-8 encoded file, it is encoded in that file as a two-byte sequence (0xC3 0xA2), whereas in ISO-8859-1 it is the single byte 0xE2. So if you are not changing the bytes in the file when you say you are changing the encoding, you are not changing the file's encoding; you are only telling the parser that a non-UTF-8 file is UTF-8.
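To make the point concrete, here is a minimal sketch in Python (an assumption, since the question does not say which language or parser is in use). It builds the same tiny document in both encodings, shows the differing bytes for 'â', and reproduces the failure when ISO-8859-1 bytes are labelled as UTF-8. The helper names `try_parse` and `reencode_latin1_to_utf8` are hypothetical, introduced only for illustration.

```python
import xml.etree.ElementTree as ET

UTF8_DECL = b'<?xml version="1.0" encoding="UTF-8"?>'
LATIN1_DECL = b'<?xml version="1.0" encoding="ISO-8859-1"?>'

body = "<root>\u00e2</root>"           # U+00E2 is 'â'
utf8_body = body.encode("utf-8")       # 'â' becomes the two bytes C3 A2
latin1_body = body.encode("iso-8859-1")  # 'â' becomes the single byte E2


def try_parse(data: bytes) -> str:
    """Parse XML bytes and return the root text, or a description of the error."""
    try:
        return ET.fromstring(data).text
    except ET.ParseError as exc:
        return f"ParseError: {exc}"


def reencode_latin1_to_utf8(data: bytes) -> bytes:
    """Actually change the bytes: decode as ISO-8859-1, then encode as UTF-8."""
    return data.decode("iso-8859-1").encode("utf-8")


# Declaration and bytes agree: both of these parse.
ok_utf8 = try_parse(UTF8_DECL + utf8_body)
ok_latin1 = try_parse(LATIN1_DECL + latin1_body)

# Declaration says UTF-8 but the bytes are ISO-8859-1: the parser rejects E2.
mismatch = try_parse(UTF8_DECL + latin1_body)

# Fix: convert the bytes themselves, then label them UTF-8.
fixed = try_parse(UTF8_DECL + reencode_latin1_to_utf8(latin1_body))

if __name__ == "__main__":
    print(utf8_body.hex(" "), "|", latin1_body.hex(" "))
    print(ok_utf8, ok_latin1, fixed)
    print(mismatch)
```

If the application has to accept files in several encodings, the usual approach is to make sure each file's XML declaration matches the bytes it actually contains, or to convert everything to UTF-8 on input as `reencode_latin1_to_utf8` does. Plain ASCII needs no special handling, because ASCII bytes are valid in both UTF-8 and ISO-8859-1.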