XML parsing issue in German culture - ASP.NET

Posted 2024-10-29 06:34:15

Coding Platform: ASP.NET WebForms 4.0 with C#

Background: I am reading some values from XML, and everything works in my locale (en-US). The XML looks like this:

<?xml version="1.0" encoding="utf-32" ?>
<settings>
  <UserRegistration>AutoAuthorize</UserRegistration>
  <OpenIDProfile>PromptUser</OpenIDProfile>
  <EnableSpamProtection>Yes</EnableSpamProtection>
  <MaxAllowedOpenID>2</MaxAllowedOpenID>
  <WebsiteURL>http://localhost:70707/blah/</WebsiteURL>
  <FacebookOAuthURL>https://graph.facebook.com/oauth/authorize?</FacebookOAuthURL>
  <FacebookAccessTokenURL>https://graph.facebook.com/oauth/access_token?</FacebookAccessTokenURL>
  <FacebookRedirectPage>ausgefüllt.aspx</FacebookRedirectPage>
  <FacebookAppID>192328104139846</FacebookAppID>
  <FacebookAppKey>29daeb58d8ae84cc22181f4073e4ed9d</FacebookAppKey>
  <FacebookAppSecret>b94e9ddd20efc47b3227e7333925fdd8</FacebookAppSecret>
  <FacebookScope>email</FacebookScope>
  <EmailSettingsDisplayName>admin</EmailSettingsDisplayName>
  <EmailSettingsEmail>[email protected]</EmailSettingsEmail>
  <EmailSettingsPassword>192185135098207157230060249027191124199097098215</EmailSettingsPassword>
</settings>

Problem

I packaged the whole thing and sent it to my client for testing. The test environment is:

Server: Windows Server 2008 R2 64 bit
Locale: German (de-DE)

Now, when I try to read the XML, Elmah reports two errors. The first is:

System.Xml.XmlException: '????', hexadecimal value 0xA000D, is an invalid character. Line 1, position 40.
   at System.Xml.XmlTextReaderImpl.Throw(String res, String[] args)
   at System.Xml.XmlTextReaderImpl.ParseRootLevelWhitespace()
   at System.Xml.XmlTextReaderImpl.ParseDocumentContent()
   at System.Xml.Linq.XDocument.Load(XmlReader reader, LoadOptions options)
   at System.Xml.Linq.XDocument.Load(String uri, LoadOptions options)
   at Administrator_SiteSettings.SaveSettingsButton_Click(Object sender, EventArgs e) in c:\Webs\ThirdPartyLogins\Administrator\SiteSettings.aspx.cs:line 48

I load these XML node values into a Dictionary, and this error is followed by a key-not-found error from the dictionary.
Is encoding the culprit?
What could be wrong in my code?


Update: I just read "UTF-8, UTF-16, and UTF-32". Will changing to utf-8 help?


Update 2: Two things that might clarify the issue further:

1) After changing the encoding to utf-16, I got a new error:

System.Xml.XmlException: '.', hexadecimal value 0x00, is an invalid character. Line 1, position 39.

2) The XML I pasted earlier was incomplete. The file has more nodes, some of which contain URLs as node data. Could that be an issue? I have updated the XML above as well.


Answer by 昔日梦未散, 2024-11-05 06:34:15

Short answer: Yes, the encoding is the culprit; the correct encoding is utf-16.
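As a quick pre-deployment check, you can compare the encoding implied by a file's byte order mark (BOM) with the one named in its XML declaration. This is a minimal Python sketch, not a description of how System.Xml works internally: the helper names `bom_encoding`, `declared_encoding` and `declaration_mismatch` are hypothetical, and it assumes the file begins with a BOM (a file without one would need content-based detection, which this sketch does not attempt).

```python
import codecs
import re

# UTF-32 LE's BOM (FF FE 00 00) starts with UTF-16 LE's BOM (FF FE),
# so the UTF-32 entries must be checked first.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]

# Matches encoding="..." (or '...') inside the <?xml ... ?> declaration.
_DECL = re.compile(r"""<\?xml[^>]*encoding\s*=\s*["']([A-Za-z0-9._-]+)["']""")


def bom_encoding(data: bytes):
    """Return the encoding family implied by a leading BOM, or None."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


def declared_encoding(data: bytes, actual: str):
    """Decode the start of the file with `actual` and return the declared encoding."""
    head = data[:400].decode(actual, errors="ignore")
    match = _DECL.search(head)
    return match.group(1).lower() if match else None


def declaration_mismatch(data: bytes) -> bool:
    """True when the BOM and the XML declaration disagree about the encoding."""
    actual = bom_encoding(data)
    if actual is None:
        return False  # no BOM: this sketch cannot tell
    declared = declared_encoding(data, actual)
    return declared is not None and declared != actual


# Usage: UTF-16 LE bytes whose declaration claims utf-32.
sample = '<?xml version="1.0" encoding="utf-32" ?>\r\n<settings/>'
data = codecs.BOM_UTF16_LE + sample.encode("utf-16-le")
print(declaration_mismatch(data))  # True
```

Changing the declaration to `utf-16` (or re-saving the file in whatever encoding the declaration names) makes the two agree, and the check then returns False.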

Long answer: The clue lies in the exception text, where it says "hexadecimal value 0xA000D" and "line 1, position 40".

When XmlReader reads your file, it first reads the XML declaration (everything between <?xml and ?>) to determine which encoding to use for the rest of the file. In this case the declaration says UTF-32. So immediately after reading the > character at the end of the declaration, it switches to using UTF-32 encoding. As your linked article explains, UTF-32 uses 4 bytes to represent each character, so the XmlReader reads the next 4 bytes from the file and tries to interpret them as a character. (This lines up with your error message, since line 1 position 40 is immediately after the > character.)
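The bytes-per-character figure is easy to confirm. This illustrative Python snippet (`bytes_per_char` is a made-up helper) encodes the `>` character with the BOM-less codecs, so the count is not inflated by a byte order mark. The 1/2/4 split holds for characters in the Basic Multilingual Plane; UTF-8 and UTF-16 are variable-width in general, while UTF-32 always uses 4 bytes.

```python
def bytes_per_char(encoding: str, ch: str = ">") -> int:
    """Bytes that `encoding` uses for the single character `ch` (no BOM)."""
    return len(ch.encode(encoding))


for enc in ("utf-8", "utf-16-le", "utf-32-le"):
    print(enc, bytes_per_char(enc))
# utf-8 1
# utf-16-le 2
# utf-32-le 4
```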

If the file really were UTF-32, what would the next 4 bytes be? Well, the next thing in the file after the > character is a newline, which is made up of two characters, carriage return and linefeed (in Unicode, 0D and 0A respectively). So we would expect the next 4 bytes to be 0D 00 00 00, and the next 4 after that would be 0A 00 00 00 (remember, Windows is little-endian).
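The expected byte sequence can be reproduced in Python. The `utf-32-le` codec produces little-endian output without a BOM, matching the paragraph's assumption about the bytes that would follow the declaration in a genuine UTF-32 file.

```python
import struct

# CR LF as UTF-32 little-endian: each character occupies one 4-byte unit.
crlf_utf32 = "\r\n".encode("utf-32-le")
print(crlf_utf32.hex(" "))               # 0d 00 00 00 0a 00 00 00

# Reading the bytes back as two little-endian 32-bit integers recovers CR and LF.
print(struct.unpack("<2I", crlf_utf32))  # (13, 10)
```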

But as the error message states, the actual "character" read was A000D, which means the next 4 bytes were 0D 00 0A 00 (again, remember little-endian). That's pretty close, but apparently only 2 bytes are being used for each character instead of 4. Well, we have a name for that, don't we? It's called UTF-16!
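The same arithmetic works in the other direction: encode CR LF as UTF-16 little-endian (what the file evidently contains) and read those 4 bytes as a single little-endian 32-bit unit, as a reader expecting UTF-32 would. A Python sketch:

```python
import struct

# What the file actually contains after the declaration: CR LF in UTF-16LE.
crlf_utf16 = "\r\n".encode("utf-16-le")
print(crlf_utf16.hex(" "))          # 0d 00 0a 00

# A reader expecting UTF-32 consumes all 4 bytes as one code unit.
(value,) = struct.unpack("<I", crlf_utf16)
print(hex(value))                   # 0xa000d

# Python's own UTF-32LE decoder yields that same single code point.
print(crlf_utf16.decode("utf-32-le") == chr(0xA000D))  # True
```

The printed value is the 0xA000D reported in the exception, which is why the declaration, not the file contents, is what needs changing.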
