XmlDocument.Load() method fails to decode € (euro)

Posted on 2024-10-07 10:41:19

I have an XML document, file.xml, which is encoded in ISO-8859-15 (aka ISO Latin-9):

<?xml version="1.0" encoding="iso-8859-15"?>
<root xmlns="http://stackoverflow.com/demo">
  <f>€.txt</f>
</root>

From my favorite text editor, I can tell that this file really is encoded in ISO-8859-15 (it is not UTF-8).

My software is written in C# and needs to extract the element f.

XmlDocument xmlDoc = new XmlDocument();
xmlDoc.Load("file.xml");

In real life, I have an XmlResolver to set credentials, but basically my code is as simple as that. Loading goes smoothly and no exception is raised.

The problem appears when I extract the value:

// xnsm is the XmlNamespaceManager
XmlNode n = xmlDoc.SelectSingleNode("//root/f", xnsm);
if (n != null)
{
    String filename = n.InnerText;
}

The Visual Studio debugger displays filename = □.txt

That alone could be just a Visual Studio display bug. Unfortunately, File.Exists(filename) returns false, even though the file actually exists.

What's wrong?
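
For anyone trying to reproduce the lookup step in isolation, here is a minimal sketch. It loads the XML from an in-memory string, so file encoding plays no role, and it assumes the namespace manager maps a prefix to the document's default namespace. The prefix "d" and the class and method names are hypothetical, not taken from the code above.

```csharp
using System;
using System.Xml;

public static class LookupSketch
{
    // Returns the text of <f>, or null if the element is not found.
    public static string ExtractF(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml); // in-memory string: no byte decoding happens here

        // "d" is a hypothetical prefix; any prefix works as long as it is
        // mapped to the document's default namespace URI.
        var xnsm = new XmlNamespaceManager(doc.NameTable);
        xnsm.AddNamespace("d", "http://stackoverflow.com/demo");

        XmlNode n = doc.SelectSingleNode("//d:root/d:f", xnsm);
        return n?.InnerText;
    }

    public static void Main()
    {
        string xml =
            "<root xmlns=\"http://stackoverflow.com/demo\"><f>\u20AC.txt</f></root>";
        Console.WriteLine(ExtractF(xml) == "\u20AC.txt"); // prints "True"
    }
}
```

If this returns the expected string but the file-based version does not, the difference lies in how the bytes of file.xml are decoded, not in the XPath query.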

断舍离 2024-10-14 10:41:19

If I remember correctly, the XmlDocument.Load(string) method always assumes UTF-8, regardless of the encoding declared in the XML.

You would have to create a StreamReader with the correct encoding and pass that as the parameter instead.

// Decode the file explicitly as ISO-8859-15, then hand the reader to Load.
using (var reader = new StreamReader("file.xml", Encoding.GetEncoding("iso-8859-15")))
{
    xmlDoc.Load(reader);
}

EDIT:

I just stumbled across KB308061 from Microsoft. There's an interesting passage:

Specify the encoding declaration in
the XML declaration section of the XML
document. For example, the following
declaration indicates that the
document is in UTF-16 Unicode encoding
format:
<?xml version="1.0" encoding="UTF-16"?>
Note that this declaration only
specifies the encoding format of an
XML document and does not modify or
control the actual encoding format of
the data.
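
To see why the choice of decoder matters for this particular character, the sketch below decodes the same bytes twice. It makes no claim about what XmlDocument.Load(string) does internally; it only shows that byte 0xA4 is the euro sign in ISO-8859-15 but the generic currency sign in ISO-8859-1. The class and method names are hypothetical, and the RegisterProvider call assumes .NET Core 3.0 or later / .NET 5+, where legacy code pages are opt-in (on .NET Framework that line should not be needed).

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class EncodingSketch
{
    // Builds "<root><f>", the single byte 0xA4, then ".txt</f></root>".
    public static byte[] SampleBytes()
    {
        byte[] prefix = Encoding.ASCII.GetBytes("<root><f>");
        byte[] suffix = Encoding.ASCII.GetBytes(".txt</f></root>");
        var ms = new MemoryStream();
        ms.Write(prefix, 0, prefix.Length);
        ms.WriteByte(0xA4);
        ms.Write(suffix, 0, suffix.Length);
        return ms.ToArray();
    }

    // Decodes the bytes with an explicitly chosen encoding and returns
    // the text of the document element's first child (<f>).
    public static string FirstChildText(byte[] xmlBytes, Encoding encoding)
    {
        var doc = new XmlDocument();
        using (var reader = new StreamReader(new MemoryStream(xmlBytes), encoding))
        {
            doc.Load(reader);
        }
        return doc.DocumentElement.FirstChild.InnerText;
    }

    public static void Main()
    {
        // Assumption: running on .NET Core / .NET 5+.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        byte[] bytes = SampleBytes();
        string latin9 = FirstChildText(bytes, Encoding.GetEncoding("iso-8859-15"));
        string latin1 = FirstChildText(bytes, Encoding.GetEncoding("iso-8859-1"));

        Console.WriteLine("{0:X4}", (int)latin9[0]); // prints "20AC" (euro sign)
        Console.WriteLine("{0:X4}", (int)latin1[0]); // prints "00A4" (currency sign)
    }
}
```

A file name that starts with U+00A4 instead of U+20AC would explain File.Exists returning false, so comparing the first character against these two values is a quick way to tell which decoder was actually used.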

靑春怀旧 2024-10-14 10:41:19

Don't just use the debugger or the console to display the string as a string.

Instead, dump the contents of the string, one character at a time. For example:

foreach (char c in filename)
{
    Console.WriteLine("{0}: {1:x4}", c, (int) c);
}

That will show you the real contents of the string, in terms of Unicode code points, instead of being constrained by what the current font can display.

Use the Unicode code charts to look up the characters specified.
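
One caveat with a per-char loop is that a C# char is a UTF-16 code unit, so characters outside the Basic Multilingual Plane show up as two surrogate values. The following variant is a sketch that reports whole code points instead. The class and method names are hypothetical, and it assumes the string is well-formed UTF-16 (char.ConvertToUtf32 throws on an unpaired surrogate).

```csharp
using System;
using System.Collections.Generic;

public static class CodePointDump
{
    // Returns each Unicode code point of s formatted as "U+XXXX".
    public static List<string> Dump(string s)
    {
        var result = new List<string>();
        for (int i = 0; i < s.Length; i++)
        {
            int codePoint = char.ConvertToUtf32(s, i);
            if (char.IsHighSurrogate(s[i]))
            {
                i++; // skip the low half of the surrogate pair
            }
            result.Add("U+" + codePoint.ToString("X4"));
        }
        return result;
    }

    public static void Main()
    {
        // Correctly decoded file name: starts with the euro sign.
        Console.WriteLine(string.Join(" ", Dump("\u20AC.txt")));
        // prints "U+20AC U+002E U+0074 U+0078 U+0074"

        // The same bytes decoded as ISO-8859-1: starts with the currency sign.
        Console.WriteLine(string.Join(" ", Dump("\u00A4.txt")));
        // prints "U+00A4 U+002E U+0074 U+0078 U+0074"
    }
}
```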

夏夜暖风 2024-10-14 10:41:19

Does your XML declare its encoding correctly? Is encoding="iso-8859-15" really the right name for the encoding you call Iso-latin-15?
Ideally, you should put your content inside a CDATA section, so the XML would look like <f><![CDATA[€.txt]]></f>
Ideally, you should also escape all special characters with their equivalent URL-encoded (or HTTP-encoded) values, because XML is typically used for communicating over HTTP.

I don't know the exact escape code for €, but it would be something of this sort