XmlDocument.Load() method fails to decode € (euro)
I have an XML document, file.xml, which is encoded in ISO-Latin-15 (aka ISO-Latin-9):
<?xml version="1.0" encoding="iso-8859-15"?>
<root xmlns="http://stackoverflow.com/demo">
<f>€.txt</f>
</root>
Using my favorite text editor, I can confirm that this file is correctly encoded in ISO-Latin-15 (it is not UTF-8).
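One way to double-check this without relying on an editor is to look at the raw bytes: ISO-8859-15 stores € as the single byte 0xA4, while UTF-8 stores it as E2 82 AC. The helper below is a hypothetical sketch (EuroByteCheck is not part of the question's code) and is only a rough heuristic that assumes every other character in the file is ASCII.

```csharp
using System.IO;

// Hypothetical helper: inspects raw bytes to see how the euro sign was stored,
// independently of any editor or decoder.
public static class EuroByteCheck
{
    // ISO-8859-15 stores U+20AC as the single byte 0xA4;
    // UTF-8 stores it as the three-byte sequence E2 82 AC.
    public static string Classify(byte[] bytes)
    {
        for (int i = 0; i < bytes.Length; i++)
        {
            bool utf8Euro = i + 2 < bytes.Length
                && bytes[i] == 0xE2 && bytes[i + 1] == 0x82 && bytes[i + 2] == 0xAC;
            if (utf8Euro)
                return "utf-8";

            // Crude heuristic: only meaningful when all other characters are ASCII.
            if (bytes[i] == 0xA4)
                return "iso-8859-15";
        }
        return "no euro sign found";
    }

    // Reads the file as bytes, so no decoding step can hide the problem.
    public static string ClassifyFile(string path) => Classify(File.ReadAllBytes(path));
}
```

If the editor is right, EuroByteCheck.ClassifyFile("file.xml") should return "iso-8859-15" for the file shown above.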
My software is written in C# and needs to extract the element f:
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.Load("file.xml");
In the real code, I use an XmlResolver to set credentials, but basically my code is as simple as that. Loading goes smoothly and no exception is raised.
My problem appears when I extract the value:
// xnsm is the XmlNamespaceManager, with the prefix "d" mapped to http://stackoverflow.com/demo
XmlNode n = xmlDoc.SelectSingleNode("//d:root/d:f", xnsm);
string filename = null;
if (n != null)
    filename = n.InnerText;
The Visual Studio debugger displays filename = □.txt.
This could simply be a Visual Studio display bug, but unfortunately File.Exists(filename) returns false, even though the file actually exists.
What's wrong?
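The question does not show how xnsm is built. Because <root> declares a default namespace, an unprefixed XPath such as //root/f matches nothing in XPath 1.0, so the namespace has to be mapped to a prefix. Below is a minimal, self-contained sketch of the extraction step; the class name EuroXmlRepro and the prefix "d" are hypothetical choices, and the XML is loaded from a string so that only the namespace handling is exercised.

```csharp
using System.Xml;

// Hypothetical reproduction of the extraction step from the question.
public static class EuroXmlRepro
{
    public const string DemoNamespace = "http://stackoverflow.com/demo";

    // The sample document from the question. The XML declaration is omitted
    // because LoadXml reads already-decoded characters from a string.
    public const string Sample =
        "<root xmlns=\"http://stackoverflow.com/demo\"><f>\u20AC.txt</f></root>";

    public static XmlDocument LoadSample()
    {
        var xmlDoc = new XmlDocument();
        xmlDoc.LoadXml(Sample);
        return xmlDoc;
    }

    public static string ExtractFileName(XmlDocument xmlDoc)
    {
        // Map the default namespace to a prefix so XPath can address it.
        var xnsm = new XmlNamespaceManager(xmlDoc.NameTable);
        xnsm.AddNamespace("d", DemoNamespace);

        XmlNode n = xmlDoc.SelectSingleNode("//d:root/d:f", xnsm);
        return n?.InnerText; // null when the element is not found
    }
}
```

With this setup, EuroXmlRepro.ExtractFileName(EuroXmlRepro.LoadSample()) should return "€.txt", whereas xmlDoc.SelectSingleNode("//root/f") without a prefix returns null.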
If I remember correctly, the XmlDocument.Load(string) method always assumes UTF-8, regardless of the XML encoding declaration. You would have to create a StreamReader with the correct encoding and pass that as the parameter.

EDIT: I just stumbled across KB308061 from Microsoft. There's an interesting passage:
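The StreamReader approach from this answer could look roughly like the following minimal sketch. It assumes .NET 5 or later, where ISO-8859-15 (code page 28605) is only available after registering CodePagesEncodingProvider; on .NET Framework the encoding is built in, and this provider would require the System.Text.Encoding.CodePages package. The class name Latin9Loader is hypothetical.

```csharp
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper: loads an XML file while forcing ISO-8859-15 decoding.
public static class Latin9Loader
{
    static Latin9Loader()
    {
        // Assumption: .NET 5+; makes legacy code pages such as iso-8859-15 resolvable.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    public static XmlDocument Load(string path)
    {
        Encoding latin9 = Encoding.GetEncoding("iso-8859-15");
        var xmlDoc = new XmlDocument();

        // The StreamReader turns bytes into characters with the encoding chosen here,
        // so XmlDocument receives text that is already decoded.
        using (var reader = new StreamReader(path, latin9))
        {
            xmlDoc.Load(reader);
        }
        return xmlDoc;
    }
}
```

Usage would be var xmlDoc = Latin9Loader.Load("file.xml"); in place of new XmlDocument() followed by xmlDoc.Load("file.xml"). The design choice here is to pin the decoding explicitly instead of relying on the encoding declaration inside the file.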
Don't just use the debugger or the console to display the string as a string.
Instead, dump the contents of the string one character at a time.
That will show you the real contents of the string, in terms of Unicode code points, instead of being constrained by what the current font can display.
Use the Unicode code charts to look up the characters specified.
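A minimal sketch of such a character-by-character dump follows; the class name StringDump is hypothetical. It prints each UTF-16 code unit in hex, which equals the Unicode code point for characters in the Basic Multilingual Plane such as €.

```csharp
using System.Text;

// Hypothetical helper: describes a string as a list of hex character values.
public static class StringDump
{
    // One line per UTF-16 code unit: index, then the value as U+XXXX.
    public static string Dump(string text)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            sb.AppendLine($"{i}: U+{(int)c:X4}");
        }
        return sb.ToString();
    }
}
```

Calling Console.Write(StringDump.Dump(filename)) on the extracted value makes the first line decisive: U+20AC means the string is correct and only the debugger's rendering is at fault, U+00A4 (¤) would point to the bytes being decoded as ISO-8859-1, and U+FFFD would typically point to an invalid UTF-8 decode.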
Does your XML declare its encoding correctly? Is encoding="iso-8859-15" really ISO-Latin-15?
Ideally, you should put your content inside a CDATA section, so the XML would look like this:
<f><![CDATA[€.txt]]></f>
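When the XML is generated from code, the CDATA section can be created through the DOM instead of by string concatenation. This is a minimal sketch; the class name CDataExample is hypothetical. Note that a CDATA section only changes how markup characters such as < and & are escaped; the characters inside it are still decoded with the document's encoding.

```csharp
using System.Xml;

// Hypothetical helper: builds <f><![CDATA[...]]></f> with the DOM API.
public static class CDataExample
{
    public static XmlDocument Build(string fileName)
    {
        var doc = new XmlDocument();
        XmlElement f = doc.CreateElement("f");
        // CreateCDataSection wraps the text in <![CDATA[ ... ]]> when serialized.
        f.AppendChild(doc.CreateCDataSection(fileName));
        doc.AppendChild(f);
        return doc;
    }
}
```

Reading the value back with InnerText returns the plain text, without the CDATA markers.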
Ideally, you should also escape all special characters with their URL-encoded (or HTTP-encoded) equivalents, because XML is typically used for communicating over HTTP.
I don't know the exact escape code for €, but it would be something of this sort:
The above should make € be transmitted correctly through the XML.
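For reference, here is a minimal sketch of two ways to escape €; the class name EuroEscapes is hypothetical. Uri.EscapeDataString percent-encodes non-ASCII characters as their UTF-8 bytes, so € becomes %E2%82%AC. Inside XML itself, the standard escape is a numeric character reference, &#x20AC; (hex) or &#8364; (decimal), which denotes U+20AC regardless of the file's encoding.

```csharp
using System;
using System.Xml;

// Hypothetical helper: shows URL-encoding and XML character references for the euro sign.
public static class EuroEscapes
{
    // Percent-encoding based on UTF-8 bytes: "€.txt" becomes "%E2%82%AC.txt".
    public static string UrlEncode(string value) => Uri.EscapeDataString(value);

    // The XML parser resolves &#x20AC; to U+20AC while loading.
    public static string ReadWithCharacterReference()
    {
        var doc = new XmlDocument();
        doc.LoadXml("<f>&#x20AC;.txt</f>");
        return doc.DocumentElement.InnerText;
    }
}
```

The character-reference form keeps the XML file itself pure ASCII, so it is unaffected by whichever encoding a reader assumes.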